Effect of fusing features from multiple DCNN architectures in image classification
- Author(s): Thangarajah Akilan 1 ; Qingming Jonathan Wu 1 ; Hui Zhang 2
- Affiliations:
  1: Department of Electrical and Computer Engineering, University of Windsor, 401 Sunset Avenue, Windsor, Canada
  2: College of Electrical and Information Engineering, Changsha University of Science and Technology, Changsha, People's Republic of China
- Source: Volume 12, Issue 7, July 2018, pp. 1102–1110
- DOI: 10.1049/iet-ipr.2017.0232, Print ISSN 1751-9659, Online ISSN 1751-9667
Automatic image classification has become a necessary task for handling the rapidly growing volume of digital images. The field has branched out into many algorithms and adopted new techniques. Among them, feature fusion-based image classification methods have traditionally relied on hand-crafted features. However, it has been shown that bottleneck features extracted from pre-trained convolutional neural networks (CNNs) can improve classification accuracy. Hence, this study analyses the effect of fusing such cues from multiple architectures without relying on any hand-crafted features. First, CNN features are extracted from three different pre-trained models, namely AlexNet, VGG-16, and Inception-V3. Then, a generalised feature space is formed by employing principal component reconstruction and energy-level normalisation, where the features from each individual CNN are mapped into a common subspace and combined using arithmetic rules to construct fused feature vectors (FFVs). This transformation plays a vital role in creating an appearance-invariant representation by capturing complementary information from the different high-level features. Finally, a multi-class linear support vector machine is trained. The experimental results demonstrate that such multi-modal CNN feature fusion is well suited to image/object classification tasks, yet, surprisingly, it has not been explored extensively by the computer vision research community.
Inspec keywords: neural nets; image representation; principal component analysis; computer vision; image reconstruction; feature extraction; image classification
Other keywords: computer vision; image statistics representation; pre-trained deep convolutional neural networks; feature extraction; DCNN architectures; generalised feature space; energy-level normalisation; multiclass linear support vector machine; automatic image classification; principal component reconstruction; fused feature vectors; FFV
Subjects: Image recognition; Computer vision and image processing techniques; Neural computing techniques; Other topics in statistics
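The pipeline the abstract outlines can be sketched in a few lines of numpy. This is a minimal illustration, not the authors' implementation: the bottleneck features of AlexNet, VGG-16, and Inception-V3 are replaced by random stand-ins of the customary dimensionalities (4096, 4096, 2048), the common-subspace dimension `k` is an arbitrary assumption, and the final multi-class linear SVM stage is omitted. Only the structure — per-model PCA projection, energy-level (unit L2) normalisation, and arithmetic fusion into FFVs — follows the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for bottleneck features from three pre-trained CNNs
# (illustrative dimensions: AlexNet fc7 = 4096, VGG-16 fc7 = 4096,
#  Inception-V3 final pooling = 2048).
n_samples = 40
feats = {
    "alexnet": rng.normal(size=(n_samples, 4096)),
    "vgg16": rng.normal(size=(n_samples, 4096)),
    "inception_v3": rng.normal(size=(n_samples, 2048)),
}

def pca_project(X, k):
    """Project X onto its top-k principal components (principal component
    reconstruction into a common k-dimensional subspace)."""
    Xc = X - X.mean(axis=0)
    # SVD of the centred data yields the principal directions in Vt.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def energy_normalise(Z):
    """Scale each feature vector to unit L2 norm (energy-level normalisation)."""
    return Z / np.linalg.norm(Z, axis=1, keepdims=True)

k = 32  # common subspace dimension (a free parameter in this sketch)
mapped = [energy_normalise(pca_project(X, k)) for X in feats.values()]

# Arithmetic fusion rules over the common subspace:
ffv_sum = mapped[0] + mapped[1] + mapped[2]  # element-wise sum
ffv_concat = np.hstack(mapped)               # concatenation as an alternative

print(ffv_sum.shape)     # (40, 32)
print(ffv_concat.shape)  # (40, 96)
```

Because every model's features are normalised to the same energy level in the same subspace, no single architecture dominates the fused vector; the resulting FFVs would then be fed to a multi-class linear SVM as in the paper.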