Auto-encoder-based shared mid-level visual dictionary learning for scene classification using very high resolution remote sensing images

Effective representation and classification of scenes using very high resolution (VHR) remote sensing images underpin a wide range of applications. Although robust low-level image features have proven effective for scene classification, they are not semantically meaningful and thus struggle with challenging visual recognition tasks. In this study, the authors propose a new and effective auto-encoder-based method for learning a shared mid-level visual dictionary. This dictionary serves as a shared, universal basis for discovering mid-level visual elements. On the one hand, the mid-level visual dictionary learnt with machine-learning techniques is more discriminative and carries richer semantic information than traditional low-level visual words. On the other hand, it is more robust to occlusions and image clutter. In the authors' scene-classification scheme, images are represented by discriminative mid-level visual elements rather than by individual pixels or low-level image features. This new image representation captures much of the high-level meaning and content of an image, facilitating challenging remote sensing image scene-classification tasks. Comprehensive evaluations on a challenging VHR remote sensing image data set, together with comparisons against state-of-the-art approaches, demonstrate the effectiveness and superiority of the proposed method.
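
As a concrete illustration of the core idea, the sketch below shows how a sparse auto-encoder (in the spirit of Ng's sparse-autoencoder lecture notes) can learn a patch-level dictionary. It is a minimal NumPy sketch, not the authors' exact pipeline, and every size and hyper-parameter in it (8 x 8 patches, 64 hidden units, sparsity target rho, penalty weight beta, weight decay, learning rate) is an assumption chosen for illustration only:

    # Minimal sparse auto-encoder sketch for learning a patch dictionary.
    # All sizes and hyper-parameters below are illustrative assumptions,
    # not the values used in the paper.
    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    class SparseAutoencoder:
        def __init__(self, n_visible=64, n_hidden=64, rho=0.05, beta=3.0, lam=1e-4):
            self.W1 = rng.normal(0, 0.1, (n_hidden, n_visible))  # encoder weights
            self.b1 = np.zeros(n_hidden)
            self.W2 = rng.normal(0, 0.1, (n_visible, n_hidden))  # decoder weights
            self.b2 = np.zeros(n_visible)
            self.rho, self.beta, self.lam = rho, beta, lam

        def step(self, X, lr=0.1):
            """One batch of gradient descent on squared error + KL sparsity + L2."""
            m = X.shape[0]
            A1 = sigmoid(X @ self.W1.T + self.b1)       # hidden activations
            A2 = sigmoid(A1 @ self.W2.T + self.b2)      # reconstruction
            rho_hat = A1.mean(axis=0)                   # mean activation per hidden unit
            # output-layer delta (squared-error loss, sigmoid output)
            d2 = (A2 - X) * A2 * (1 - A2)
            # hidden-layer delta, with the KL-divergence sparsity term added
            sparse = self.beta * (-self.rho / rho_hat + (1 - self.rho) / (1 - rho_hat))
            d1 = (d2 @ self.W2 + sparse) * A1 * (1 - A1)
            self.W2 -= lr * (d2.T @ A1 / m + self.lam * self.W2)
            self.b2 -= lr * d2.mean(axis=0)
            self.W1 -= lr * (d1.T @ X / m + self.lam * self.W1)
            self.b1 -= lr * d1.mean(axis=0)
            return float(np.mean((A2 - X) ** 2))

    # Toy usage: random stand-ins for 8x8 grey patches, flattened to [0, 1]^64.
    patches = rng.random((1000, 64))
    ae = SparseAutoencoder()
    for epoch in range(50):
        loss = ae.step(patches)

Each row of W1 then behaves as a learnt dictionary atom, and a patch is encoded by its hidden activations sigmoid(W1 @ x + b1); in the authors' scheme, such learnt atoms would serve as the shared basis from which discriminative mid-level visual elements are discovered.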

Inspec keywords: geophysical image processing; learning (artificial intelligence); image resolution; image classification; remote sensing; image representation

Other keywords: scene classification; auto-encoder-based shared mid-level visual dictionary learning; image representation; scene-classification scheme; very high resolution remote sensing images; image clutter; VHR remote sensing images; mid-level visual dictionary; auto-encoder-based method; mid-level visual elements; rich semantic information; occlusions; machine-learning technique

Subjects: Computer vision and image processing techniques; Atmospheric, ionospheric and magnetospheric techniques and equipment; Instrumentation and techniques for geophysical, hydrospheric and lower atmosphere research; Image recognition; Knowledge engineering techniques; Geophysics computing
