Effect of fusing features from multiple DCNN architectures in image classification
- Author(s): Thangarajah Akilan 1 ; Qingming Jonathan Wu 1 ; Hui Zhang 2
- Affiliations:
  1: Department of Electrical and Computer Engineering, University of Windsor, 401 Sunset Avenue, Windsor, Canada
  2: College of Electrical and Information Engineering, Changsha University of Science and Technology, Changsha, People's Republic of China
- Source: Volume 12, Issue 7, July 2018, pp. 1102–1110
- DOI: 10.1049/iet-ipr.2017.0232, Print ISSN 1751-9659, Online ISSN 1751-9667
Automatic image classification has become a necessary task for handling the rapidly growing volume of digital images. The field has branched out into many algorithms and adopted new techniques. Among them, feature fusion-based image classification methods have traditionally relied on hand-crafted features. However, it has been shown that bottleneck features extracted from pre-trained convolutional neural networks (CNNs) can improve classification accuracy. Hence, this study analyses the effect of fusing such cues from multiple architectures without relying on any hand-crafted features. First, CNN features are extracted from three different pre-trained models, namely AlexNet, VGG-16, and Inception-V3. Then, a generalised feature space is formed by employing principal component reconstruction and energy-level normalisation, where the features from each individual CNN are mapped into a common subspace and combined using arithmetic rules to construct fused feature vectors (FFVs). This transformation plays a vital role in creating an appearance-invariant representation by capturing complementary information from the different high-level features. Finally, a multi-class linear support vector machine is trained. The experimental results demonstrate that such multi-modal CNN feature fusion is well suited to image/object classification tasks, yet, surprisingly, it has not been explored extensively by the computer vision research community.
Inspec keywords: neural nets; image representation; principal component analysis; computer vision; image reconstruction; feature extraction; image classification
Other keywords: computer vision; image statistics representation; pre-trained deep convolutional neural networks; feature extraction; DCNN architectures; generalised feature space; energy-level normalisation; multiclass linear support vector machine; automatic image classification; principal component reconstruction; fused feature vectors; FFV
Subjects: Image recognition; Computer vision and image processing techniques; Neural computing techniques; Other topics in statistics
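The pipeline the abstract outlines can be sketched in a few lines of numpy. This is a minimal illustration, not the authors' implementation: the bottleneck features of AlexNet, VGG-16, and Inception-V3 are replaced by random stand-ins of the customary dimensionalities (4096, 4096, 2048), the common-subspace dimension `k` is an arbitrary assumption, and the final multi-class linear SVM stage is omitted. Only the structure — per-model PCA projection, energy-level (unit L2) normalisation, and arithmetic fusion into FFVs — follows the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for bottleneck features from three pre-trained CNNs
# (illustrative dimensions: AlexNet fc7 = 4096, VGG-16 fc7 = 4096,
#  Inception-V3 final pooling = 2048).
n_samples = 40
feats = {
    "alexnet": rng.normal(size=(n_samples, 4096)),
    "vgg16": rng.normal(size=(n_samples, 4096)),
    "inception_v3": rng.normal(size=(n_samples, 2048)),
}

def pca_project(X, k):
    """Project X onto its top-k principal components (principal component
    reconstruction into a common k-dimensional subspace)."""
    Xc = X - X.mean(axis=0)
    # SVD of the centred data yields the principal directions in Vt.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def energy_normalise(Z):
    """Scale each feature vector to unit L2 norm (energy-level normalisation)."""
    return Z / np.linalg.norm(Z, axis=1, keepdims=True)

k = 32  # common subspace dimension (a free parameter in this sketch)
mapped = [energy_normalise(pca_project(X, k)) for X in feats.values()]

# Arithmetic fusion rules over the common subspace:
ffv_sum = mapped[0] + mapped[1] + mapped[2]  # element-wise sum
ffv_concat = np.hstack(mapped)               # concatenation as an alternative

print(ffv_sum.shape)     # (40, 32)
print(ffv_concat.shape)  # (40, 96)
```

Because every model's features are normalised to the same energy level in the same subspace, no single architecture dominates the fused vector; the resulting FFVs would then be fed to a multi-class linear SVM as in the paper.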