Recurrent neural networks for remote sensing image classification


Automatically classifying an image has been a central problem in computer vision for decades, and a plethora of models has been proposed, from handcrafted-feature solutions to more sophisticated approaches such as deep learning. The authors address remote sensing image classification, a problem important to many real-world applications. They introduce a novel deep recurrent architecture that incorporates high-level feature descriptors to tackle this challenging task. Their solution is based on the general encoder–decoder framework. To the best of the authors' knowledge, this is the first study to use a recurrent network structure on this task. The experimental results show that the proposed framework outperforms previous work on the three datasets widely used in the literature, achieving a state-of-the-art accuracy of 97.29% on the UC Merced dataset.
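The core idea the abstract describes — high-level feature descriptors read by a recurrent encoder whose final state is decoded into class scores — can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the GRU update, layer sizes, and the stand-in random "CNN descriptors" are all assumptions; only the 21-class output mirrors the UC Merced dataset mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)

def gru_step(x, h, W, U, b):
    """One GRU step; gate weights stacked as [update; reset; candidate]."""
    d = h.shape[0]
    z = 1 / (1 + np.exp(-(W[:d] @ x + U[:d] @ h + b[:d])))            # update gate
    r = 1 / (1 + np.exp(-(W[d:2*d] @ x + U[d:2*d] @ h + b[d:2*d])))   # reset gate
    n = np.tanh(W[2*d:] @ x + U[2*d:] @ (r * h) + b[2*d:])            # candidate state
    return (1 - z) * h + z * n

# Illustrative sizes: a 7x7 grid of 512-d descriptors (as a CNN backbone
# might emit), a 128-d recurrent state, and UC Merced's 21 scene classes.
feat_dim, hidden, n_classes, seq_len = 512, 128, 21, 49

# Stand-in for the high-level descriptors of one image (random here).
features = rng.normal(size=(seq_len, feat_dim))

W = rng.normal(scale=0.01, size=(3 * hidden, feat_dim))
U = rng.normal(scale=0.01, size=(3 * hidden, hidden))
b = np.zeros(3 * hidden)
W_out = rng.normal(scale=0.01, size=(n_classes, hidden))

# Encoder: fold the descriptor sequence into a single hidden state.
h = np.zeros(hidden)
for x in features:
    h = gru_step(x, h, W, U, b)

# Decoder: linear read-out of the encoding, then softmax over classes.
logits = W_out @ h
probs = np.exp(logits - logits.max())
probs /= probs.sum()
pred = int(np.argmax(probs))
```

In a trained system the descriptors would come from a pretrained CNN and the weights would be learned end to end; the sketch only shows how the encoder–decoder pieces fit together.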


