Effect of fusing features from multiple DCNN architectures in image classification


Automatic image classification has become a necessary task for handling the rapidly growing volume of digital images, and it has spawned many algorithms and techniques. Among them, feature fusion-based image classification methods have traditionally relied on hand-crafted features. However, it has been shown that bottleneck features extracted from pre-trained convolutional neural networks (CNNs) can improve classification accuracy. Hence, this study analyses the effect of fusing such cues from multiple architectures without relying on any hand-crafted features. First, CNN features are extracted from three different pre-trained models, namely AlexNet, VGG-16, and Inception-V3. Then, a generalised feature space is formed through principal component reconstruction and energy-level normalisation, in which the features from each CNN are mapped into a common subspace and combined using arithmetic rules to construct fused feature vectors (FFVs). This transformation plays a vital role in creating an appearance-invariant representation that captures the complementary information of the different high-level features. Finally, a multi-class linear support vector machine is trained on the FFVs. The experimental results demonstrate that such multi-modal CNN feature fusion is well suited to image/object classification tasks, yet it has so far received surprisingly little attention from the computer vision research community.
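The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: synthetic matrices stand in for the real AlexNet/VGG-16/Inception-V3 bottleneck features, the common-subspace dimension and the element-wise-sum fusion rule are assumptions, and energy-level normalisation is approximated by unit L2 scaling.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples, n_classes, d_common = 300, 3, 64
y = rng.integers(0, n_classes, n_samples)

# Stand-ins for bottleneck features from three pre-trained CNNs
# (real dimensionalities would be e.g. 4096, 4096, 2048).
feats = {
    "alexnet":   rng.normal(size=(n_samples, 256)) + y[:, None],
    "vgg16":     rng.normal(size=(n_samples, 512)) + y[:, None],
    "inception": rng.normal(size=(n_samples, 128)) + y[:, None],
}

def to_common_space(X, d):
    """Project features into a d-dimensional subspace via PCA,
    then normalise each vector to unit energy (L2 norm)."""
    Z = PCA(n_components=d).fit_transform(X)
    return Z / np.linalg.norm(Z, axis=1, keepdims=True)

# Map every CNN's features into the shared subspace.
common = [to_common_space(X, d_common) for X in feats.values()]

# Fuse with a simple arithmetic rule (element-wise sum) to build the FFVs.
ffv = np.sum(common, axis=0)

# Train the multi-class linear SVM on the fused feature vectors.
X_tr, X_te, y_tr, y_te = train_test_split(
    ffv, y, test_size=0.3, random_state=0)
clf = LinearSVC(C=1.0).fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
print(f"fused-feature accuracy: {acc:.2f}")
```

In practice the stand-in matrices would be replaced by activations taken from a late layer of each pre-trained network, and the fusion rule (sum, product, concatenation, etc.) would be chosen empirically.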


