© The Institution of Engineering and Technology
Effective representation and classification of scenes in very high resolution (VHR) remote sensing images underpins a wide range of applications. Although robust low-level image features have proven effective for scene classification, they are not semantically meaningful and therefore struggle with challenging visual recognition tasks. In this study, the authors propose a new and effective auto-encoder-based method for learning a shared mid-level visual dictionary. This dictionary serves as a shared, universal basis for discovering mid-level visual elements. On the one hand, the mid-level visual dictionary learnt with machine learning techniques is more discriminative and carries richer semantic information than traditional low-level visual words. On the other hand, it is more robust to occlusions and image clutter. In the authors' scene-classification scheme, discriminative mid-level visual elements, rather than individual pixels or low-level image features, are used to represent images. This representation captures much of the high-level meaning and content of an image, facilitating challenging remote sensing scene-classification tasks. Comprehensive evaluations on a challenging VHR remote sensing image data set and comparisons with state-of-the-art approaches demonstrate the effectiveness and superiority of the proposed method.
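To make the abstract's core idea concrete, the sketch below shows how a dictionary can be learnt with an auto-encoder: hidden units act as dictionary atoms, image patches are encoded against them, and the patch codes are pooled into a single image descriptor. This is only a minimal illustrative sketch, not the authors' actual method; the patch size, hidden-layer size, learning rate, and the synthetic data are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class TinyAutoencoder:
    """Single-hidden-layer auto-encoder; its hidden units play the role
    of a learnt 'dictionary' over which patches are encoded."""

    def __init__(self, n_in, n_hidden, lr=0.1):
        self.W1 = rng.normal(0, 0.1, (n_hidden, n_in))  # encoder weights
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0, 0.1, (n_in, n_hidden))  # decoder weights (atoms)
        self.b2 = np.zeros(n_in)
        self.lr = lr

    def encode(self, X):
        # Hidden activations = code of each patch over the dictionary
        return sigmoid(X @ self.W1.T + self.b1)

    def step(self, X):
        """One gradient-descent step on mean squared reconstruction error."""
        H = self.encode(X)
        R = H @ self.W2.T + self.b2          # linear reconstruction
        E = R - X                            # residual
        n = X.shape[0]
        # Backpropagate through decoder, then encoder
        gW2 = E.T @ H / n
        gb2 = E.mean(0)
        dH = (E @ self.W2) * H * (1 - H)     # sigmoid derivative
        gW1 = dH.T @ X / n
        gb1 = dH.mean(0)
        self.W2 -= self.lr * gW2; self.b2 -= self.lr * gb2
        self.W1 -= self.lr * gW1; self.b1 -= self.lr * gb1
        return float((E ** 2).mean())

# Synthetic stand-in for flattened 8x8 image patches (200 patches, 64 dims)
X = rng.normal(0, 1, (200, 64))
ae = TinyAutoencoder(n_in=64, n_hidden=16)
losses = [ae.step(X) for _ in range(200)]

# Represent an "image" by pooling its patch codes over the learnt dictionary
codes = ae.encode(X)
image_descriptor = codes.max(axis=0)  # one 16-dim mid-level descriptor
```

The resulting descriptor summarises which dictionary elements respond anywhere in the image, which is the basic intuition behind replacing raw pixels or low-level features with mid-level elements.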
References
1) Song, H.O., Zickler, S., Althoff, T., et al.: 'Sparselet models for efficient multiclass object detection'. Proc. European Conf. Computer Vision, Firenze, Italy, 2012, pp. 802–815.
2) Han, J., Zhou, P., Zhang, D., et al.: 'Efficient, simultaneous detection of multi-class geospatial targets based on visual saliency modeling and discriminative learning of sparse coding', ISPRS J. Photogramm. Remote Sens., 2014, 89, pp. 37–48 (doi: 10.1016/j.isprsjprs.2013.12.011).
3) Nocedal, J.: 'Updating quasi-Newton matrices with limited storage', Math. Comput., 1980, 35, (151), pp. 773–782 (doi: 10.1090/S0025-5718-1980-0572855-7).
4) Hinton, G.E., Salakhutdinov, R.R.: 'Reducing the dimensionality of data with neural networks', Science, 2006, 313, (5786), pp. 504–507.
5) Tang, J., Shao, L., Li, X.: 'Efficient dictionary learning for visual categorization', Comput. Vis. Image Underst., 2014, 124, pp. 91–98 (doi: 10.1016/j.cviu.2014.02.007).
6) Moustakidis, S., Mallinis, G., Koutsias, N., Theocharis, J.B., Petridis, V.: 'SVM-based fuzzy decision trees for classification of high spatial resolution remote sensing images', IEEE Trans. Geosci. Remote Sens., 2012, 50, (1), pp. 149–169 (doi: 10.1109/TGRS.2011.2159726).
7) Rumelhart, D.E., Hinton, G.E., Williams, R.J.: 'Learning representations by back-propagating errors', Nature, 1986, 323, (6088), pp. 533–536.
8) Juneja, M., Vedaldi, A., Jawahar, C.V., Zisserman, A.: 'Blocks that shout: distinctive parts for scene classification'. Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition, Portland, USA, 2013, pp. 923–930.
9) Ren, J., Zabalza, J., Marshall, S., Zheng, J.: 'Effective feature extraction and data reduction in remote sensing using hyperspectral imaging', IEEE Signal Process. Mag., 2014, 31, (4), pp. 149–154 (doi: 10.1109/MSP.2014.2312071).
10) Yang, Y., Newsam, S.: 'Spatial pyramid co-occurrence for image classification'. Proc. IEEE Int. Conf. Computer Vision, Barcelona, Spain, 2011, pp. 1465–1472.
11) Yang, Y., Newsam, S.: 'Geographic image retrieval using local invariant features', IEEE Trans. Geosci. Remote Sens., 2013, 51, (2), pp. 818–832 (doi: 10.1109/TGRS.2012.2205158).
12) Ng, A.: 'CS294A lecture notes: sparse autoencoder' (Stanford University, Palo Alto, 2010).
13) Wen, X., Shao, L., Fang, W., Xue, Y.: 'Efficient feature selection and classification for vehicle detection', IEEE Trans. Circuits Syst. Video Technol., 2014.
14) Li, H., Gu, H., Han, Y., Yang, J.: 'Object-oriented classification of high-resolution remote sensing imagery based on an improved colour structure code and a support vector machine', Int. J. Remote Sens., 2010, 31, (6), pp. 1453–1470 (doi: 10.1080/01431160903475266).
15) Longbotham, N., Chaapel, C., Bleiler, L., et al.: 'Very high resolution multiangle urban classification analysis', IEEE Trans. Geosci. Remote Sens., 2012, 50, (4), pp. 1155–1170 (doi: 10.1109/TGRS.2011.2165548).
16) Kokkinos, I.: 'Shufflets: shared mid-level parts for fast object detection'. Proc. IEEE Int. Conf. Computer Vision, Sydney, Australia, 2013, pp. 1393–1400.
17) Tuia, D., Volpi, M., Dalla Mura, M., Rakotomamonjy, A., Flamary, R.: 'Automatic feature learning for spatio-spectral image classification with sparse SVM', IEEE Trans. Geosci. Remote Sens., 2014, 52, (10), pp. 6062–6074 (doi: 10.1109/TGRS.2013.2294724).
18) Fernando, B., Fromont, E., Tuytelaars, T.: 'Mining mid-level features for image classification', Int. J. Comput. Vis., 2014, 108, (3), pp. 186–203 (doi: 10.1007/s11263-014-0700-1).
19) Zabalza, J., Ren, J., Marshall, S., Wang, J.: 'Singular spectrum analysis for effective feature extraction in hyperspectral imaging', IEEE Geosci. Remote Sens. Lett., 2014, 11, (11), pp. 1886–1890 (doi: 10.1109/LGRS.2014.2312754).
20) Doersch, C., Gupta, A., Efros, A.A.: 'Mid-level visual element discovery as discriminative mode seeking'. Proc. Conf. Advances in Neural Information Processing Systems, NV, USA, 2013, pp. 494–502.
21) Cheng, G., Han, J., Zhou, P., Guo, L.: 'Multi-class geospatial object detection and geographic image classification based on collection of part detectors', ISPRS J. Photogramm. Remote Sens., 2014, 98, pp. 119–132 (doi: 10.1016/j.isprsjprs.2014.10.002).
22) Cheng, G., Guo, L., Zhao, T., et al.: 'Automatic landslide detection from remote-sensing imagery using a scene classification method based on BoVW and pLSA', Int. J. Remote Sens., 2013, 34, (1), pp. 45–59 (doi: 10.1080/01431161.2012.705443).
23) Yang, Y., Newsam, S.: 'Bag-of-visual-words and spatial extensions for land-use classification'. Proc. ACM SIGSPATIAL Int. Conf. Advances in Geographic Information Systems, San Jose, USA, 2010, pp. 270–279.
24) Li, Q., Wu, J., Tu, Z.: 'Harvesting mid-level visual concepts from large-scale internet images'. Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition, Portland, USA, 2013, pp. 851–858.
25) Xu, S., Fang, T., Li, D., Wang, S.: 'Object classification of aerial images with bag-of-visual words', IEEE Geosci. Remote Sens. Lett., 2010, 7, (2), pp. 366–370 (doi: 10.1109/LGRS.2009.2035644).
26) Lowe, D.G.: 'Distinctive image features from scale-invariant keypoints', Int. J. Comput. Vis., 2004, 60, (2), pp. 91–110.
27) Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.-A.: 'Extracting and composing robust features with denoising autoencoders'. Proc. Conf. Advances in Neural Information Processing Systems, British Columbia, Canada, 2008, pp. 1096–1103.
28) Munoz-Mari, J., Tuia, D., Camps-Valls, G.: 'Semisupervised classification of remote sensing images with active queries', IEEE Trans. Geosci. Remote Sens., 2012, 50, (10), pp. 3751–3763 (doi: 10.1109/TGRS.2012.2185504).
29) Huang, X., Zhang, L., Wang, L.: 'Evaluation of morphological texture features for mangrove forest mapping and species discrimination using multispectral IKONOS imagery', IEEE Geosci. Remote Sens. Lett., 2009, 6, (3), pp. 393–397 (doi: 10.1109/LGRS.2009.2014398).
30) Sun, J., Ponce, J.: 'Learning discriminative part detectors for image classification and cosegmentation'. Proc. IEEE Int. Conf. Computer Vision, Sydney, Australia, 2013, pp. 3400–3407.
31) Zhu, F., Shao, L.: 'Weakly-supervised cross-domain dictionary learning for visual recognition', Int. J. Comput. Vis., 2014, 109, (1–2), pp. 42–59 (doi: 10.1007/s11263-014-0703-y).
32) Lazebnik, S., Schmid, C., Ponce, J.: 'Beyond bags of features: spatial pyramid matching for recognizing natural scene categories'. Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition, NY, USA, 2006, pp. 2169–2178.
33) Shao, L., Liu, L., Li, X.: 'Feature learning for image classification via multiobjective genetic programming', IEEE Trans. Neural Netw. Learn. Syst., 2014, 25, (7), pp. 1359–1371 (doi: 10.1109/TNNLS.2013.2293418).
34) Zhang, L., Zhen, X., Shao, L.: 'Learning object-to-class kernels for scene classification', IEEE Trans. Image Process., 2014, 23, (8), pp. 3241–3253 (doi: 10.1109/TIP.2014.2328894).
35) Cheng, G., Han, J., Zhou, P., Guo, L.: 'Scalable multi-class geospatial object detection in high-spatial-resolution remote sensing images'. Proc. IEEE Int. Geoscience and Remote Sensing Symp., Quebec, Canada, 2014, pp. 2479–2482.
36) Singh, S., Gupta, A., Efros, A.A.: 'Unsupervised discovery of mid-level discriminative patches'. Proc. European Conf. Computer Vision, Firenze, Italy, 2012, pp. 73–86.
37) Cheng, G., Han, J., Guo, L., et al.: 'Object detection in remote sensing imagery using a discriminatively trained mixture model', ISPRS J. Photogramm. Remote Sens., 2013, 85, pp. 32–43 (doi: 10.1016/j.isprsjprs.2013.08.001).
38) Bhagavathy, S., Manjunath, B.S.: 'Modeling and detection of geospatial objects using texture motifs', IEEE Trans. Geosci. Remote Sens., 2006, 44, (12), pp. 3706–3715 (doi: 10.1109/TGRS.2006.881741).
39) Cheriyadat, A.M.: 'Unsupervised feature learning for aerial scene classification', IEEE Trans. Geosci. Remote Sens., 2014, 52, (1), pp. 439–451 (doi: 10.1109/TGRS.2013.2241444).
40) Li, F.F., Perona, P.: 'A Bayesian hierarchical model for learning natural scene categories'. Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition, San Diego, USA, 2005, pp. 524–531.
41) Felzenszwalb, P.F., Girshick, R.B., McAllester, D., Ramanan, D.: 'Object detection with discriminatively trained part-based models', IEEE Trans. Pattern Anal. Mach. Intell., 2010, 32, (9), pp. 1627–1645.
42) Girshick, R.B., Song, H.O., Darrell, T.: 'Discriminatively activated sparselets'. Proc. Int. Conf. Machine Learning, Atlanta, USA, 2013, pp. 196–204.
43) Dalal, N., Triggs, B.: 'Histograms of oriented gradients for human detection'. Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition, San Diego, USA, 2005, pp. 886–893.