Hierarchical spatial pyramid max pooling based on SIFT features and sparse coding for image classification

Hong Han; Qiqiang Han; Xiaojun Li; Jianyin Gu

Affiliation (all authors): School of Electronic Engineering, Xidian University, No. 2 South Taibai Road, Xi'an 710071, People's Republic of China
Source: IET Computer Vision, Volume 7, Issue 2, April 2013, pp. 144–150
DOI: 10.1049/iet-cvi.2012.0145, Print ISSN 1751-9632, Online ISSN 1751-9640

Received 12/07/2012, Revised 29/10/2012, Accepted 22/12/2012

Good image representations are essential for many computer vision tasks. In this study, the authors propose a hierarchical spatial pyramid max pooling method based on scale-invariant feature transform (SIFT) features and sparse coding, which builds image representations through a hierarchical network. The method has three parts: SIFT feature extraction, sparse coding and spatial pyramid max pooling. To mimic the visual cortex, spatial pyramid max pooling is first performed on the original SIFT features within each image patch, instead of using the original SIFT features directly as is usual. This distils the features and extracts the most distinctive and significant feature in each local patch, called the SIFT-pooled feature. Next, a dictionary is trained on randomly sampled SIFT-pooled features using the K-singular value decomposition (K-SVD) algorithm, and all SIFT-pooled features are sparse-coded with the trained dictionary. Finally, spatial pyramid max pooling is carried out again at the image level on the sparse codes of all image patches, and the image representation is built by concatenating the pooled features of each level. The authors combine this representation with a simple linear support vector machine (SVM) for image classification on three datasets, Caltech-101, Caltech-256 and 15-Scenes; the experimental results show that the algorithm achieves performance competitive with recently published results.
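
The abstract describes the pipeline only in outline. As an illustration, the following NumPy sketch strings the stages together under stated assumptions; it is not the authors' code. Sparse coding uses orthogonal matching pursuit (OMP) and the dictionary is learned with a plain K-SVD loop. The pyramid levels (1×1, 2×2 and 4×4 at the image level; 1×1 and 2×2 inside a patch), the dictionary size, the sparsity level `k` and pooling the absolute values of the sparse codes are all assumptions, and `encode_image` with its patch tuple layout is a hypothetical helper. Random non-negative vectors stand in for dense SIFT descriptors, since SIFT extraction itself is out of scope.

```python
import numpy as np

# Sketch only: pyramid levels, dictionary size and sparsity k are assumptions,
# not values taken from the paper.


def omp(D, x, k):
    """Orthogonal matching pursuit: code x using at most k columns (atoms) of D.
    Assumes the atoms have unit L2 norm, as the K-SVD below produces."""
    residual, support, sol = x.astype(float), [], np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))  # atom most correlated with residual
        if j in support:  # no new atom helps; stop early
            break
        support.append(j)
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)  # refit on the support
        residual = x - D[:, support] @ sol
        if np.linalg.norm(residual) < 1e-10:
            break
    coef = np.zeros(D.shape[1])
    coef[support] = sol
    return coef


def ksvd(X, n_atoms, k, n_iter=10, seed=0):
    """Plain K-SVD dictionary learning. X has shape (dim, n_samples)."""
    rng = np.random.default_rng(seed)
    # Initialise the atoms with randomly chosen training samples, L2-normalised.
    D = X[:, rng.choice(X.shape[1], n_atoms, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12
    for _ in range(n_iter):
        # Sparse-coding stage: OMP code for every training sample.
        A = np.column_stack([omp(D, x, k) for x in X.T])
        # Dictionary-update stage: refit each atom by a rank-1 SVD of its residual.
        for j in range(n_atoms):
            users = np.nonzero(A[j])[0]
            if users.size == 0:
                continue  # unused atom: leave it unchanged
            A[j, users] = 0
            E = X[:, users] - D @ A[:, users]
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, j] = U[:, 0]
            A[j, users] = s[0] * Vt[0]
    return D


def spm_max_pool(codes, xy, width, height, levels=(1, 2, 4)):
    """Spatial pyramid max pooling over an L x L grid for each L in levels.
    codes: (n, d) vectors; xy: (n, 2) positions (x, y) inside a width x height region.
    Returns the concatenated per-cell maxima of |codes| (zeros for empty cells)."""
    pooled = []
    for L in levels:
        cx = np.minimum((xy[:, 0] * L // width).astype(int), L - 1)
        cy = np.minimum((xy[:, 1] * L // height).astype(int), L - 1)
        for gy in range(L):
            for gx in range(L):
                mask = (cx == gx) & (cy == gy)
                pooled.append(np.abs(codes[mask]).max(axis=0) if mask.any()
                              else np.zeros(codes.shape[1]))
    return np.concatenate(pooled)


def encode_image(patches, D, k, width, height, patch_levels=(1, 2), image_levels=(1, 2, 4)):
    """Hypothetical helper. Each patch is (descriptors (m, d), local_xy (m, 2),
    patch_size, centre_xy), with positions in pixels."""
    # Layer 1: SIFT-pooled feature per patch (pyramid max pooling inside the patch).
    pooled = [spm_max_pool(s, xy, p, p, patch_levels) for s, xy, p, _ in patches]
    # Sparse-code every SIFT-pooled feature over the learned dictionary.
    codes = np.stack([omp(D, f, k) for f in pooled])
    # Layer 2: image-level pyramid max pooling of the codes at the patch centres.
    centres = np.array([c for *_, c in patches], dtype=float)
    return spm_max_pool(codes, centres, width, height, image_levels)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random non-negative 128-D vectors stand in for dense SIFT descriptors.
    patches = [(rng.random((16, 128)), rng.random((16, 2)) * 16, 16, rng.random(2) * 200)
               for _ in range(60)]
    # Learn the dictionary on SIFT-pooled features (all patches here; the paper samples randomly).
    feats = np.stack([spm_max_pool(s, xy, p, p, (1, 2)) for s, xy, p, _ in patches])
    D = ksvd(feats.T, n_atoms=32, k=5, n_iter=3)
    rep = encode_image(patches, D, k=5, width=200, height=200)
    print(rep.shape)  # (672,): 32 atoms x (1 + 4 + 16) cells
```

Run as a script, the demo prints `(672,)`, that is 32 atoms times 21 pyramid cells. One such vector per image would then be passed to a linear SVM (for example scikit-learn's `LinearSVC`). Taking the maximum inside each cell keeps the strongest response of every atom, while concatenating several grid sizes retains coarse spatial layout.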

