Scene text recognition by learning co-occurrence of strokes based on spatiality embedded dictionary

Text information contained in scene images is very helpful for high-level image understanding. In this study, the authors propose learning the co-occurrence of local strokes for scene text recognition by using a spatiality embedded dictionary (SED). Unlike a spatial pyramid, which partitions images into grids to incorporate spatial information, the authors' SED associates every codeword with a particular response region and thereby introduces more precise spatial information for robust character recognition. After localised soft coding and max pooling in the first layer, a sparse dictionary is learned to model the co-occurrence of several local strokes, which further improves classification performance. Experimental results on two scene character recognition datasets, ICDAR2003 and Chars74K, demonstrate that their character recognition method outperforms state-of-the-art methods. In addition, competitive word recognition results are reported on four benchmark word recognition datasets, ICDAR2003, ICDAR2011, ICDAR2013 and Street View Text (SVT), when their character recognition method is combined with a conditional random field language model.
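The two-layer pipeline summarised above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation, and several details are assumptions: (i) each SED codeword has a feature centre and a rectangular response region in normalised image coordinates, and a codeword can only respond to local descriptors whose position falls inside that region; (ii) localised soft coding follows the k-nearest-codeword soft assignment with a Gaussian kernel described by Liu, Wang and Liu ("In defense of soft-assignment coding", ICCV 2011), with illustrative parameters `k` and `beta`; (iii) the second-layer co-occurrence dictionary is given rather than learned, and the sparse codes are computed with plain ISTA for an L1-regularised least-squares problem, a solver swapped in for brevity. All function names and the toy data are hypothetical.

```python
import math

# Hypothetical SED layout: each codeword has a feature centre and a
# rectangular response region (x0, y0, x1, y1) in normalised image coordinates.


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def in_region(pos, region):
    """True if a normalised (x, y) position lies inside the closed rectangle."""
    x, y = pos
    x0, y0, x1, y1 = region
    return x0 <= x <= x1 and y0 <= y <= y1


def localised_soft_code(desc, pos, centres, regions, k=2, beta=1.0):
    """Soft-assign one descriptor to its k nearest codewords, considering only
    codewords whose response region contains the descriptor's position."""
    code = [0.0] * len(centres)
    candidates = [j for j, r in enumerate(regions) if in_region(pos, r)]
    nearest = sorted(candidates, key=lambda j: sq_dist(desc, centres[j]))[:k]
    if not nearest:
        return code  # no codeword's response region covers this position
    d_min = sq_dist(desc, centres[nearest[0]])
    # Gaussian weights, shifted by the smallest distance for numerical stability.
    weights = {j: math.exp(-beta * (sq_dist(desc, centres[j]) - d_min)) for j in nearest}
    total = sum(weights.values())
    for j, w in weights.items():
        code[j] = w / total
    return code


def sed_max_pool(descs, positions, centres, regions, k=2, beta=1.0):
    """First layer: code every descriptor, then max-pool per codeword."""
    pooled = [0.0] * len(centres)
    for desc, pos in zip(descs, positions):
        code = localised_soft_code(desc, pos, centres, regions, k, beta)
        pooled = [max(p, c) for p, c in zip(pooled, code)]
    return pooled


def soft_threshold(v, t):
    """Proximal operator of t * |v| (shrinks v towards zero by t)."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def sparse_code(z, atoms, lam=0.1, steps=200):
    """Second layer: ISTA for min_a 0.5*||z - sum_i a_i*atoms[i]||^2 + lam*||a||_1."""
    # Step size 1/L, with L the squared Frobenius norm of the dictionary,
    # an upper bound on the largest eigenvalue of D^T D.
    lip = sum(v * v for atom in atoms for v in atom) or 1.0
    a = [0.0] * len(atoms)
    for _ in range(steps):
        recon = [sum(a[i] * atoms[i][d] for i in range(len(atoms))) for d in range(len(z))]
        resid = [r - t for r, t in zip(recon, z)]
        grad = [sum(atom[d] * resid[d] for d in range(len(z))) for atom in atoms]
        a = [soft_threshold(ai - gi / lip, lam / lip) for ai, gi in zip(a, grad)]
    return a


# Toy example: two stand-in "stroke" descriptors, one codeword pair per image half.
centres = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
regions = [(0.0, 0.0, 0.5, 1.0)] * 2 + [(0.5, 0.0, 1.0, 1.0)] * 2
descs = [[1.0, 0.0], [0.0, 1.0]]
positions = [(0.25, 0.5), (0.75, 0.5)]

pooled = sed_max_pool(descs, positions, centres, regions)
print([round(v, 3) for v in pooled])  # [0.881, 0.119, 0.119, 0.881]

# Hypothetical co-occurrence atoms over the four first-layer codewords.
atoms = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
print([round(v, 3) for v in sparse_code(pooled, atoms)])  # [0.831, 0.069]
```

In the toy example the left half of the image holds one type of stand-in descriptor and the right half the other, so the second-layer atom encoding exactly that pairing receives most of the weight. Because each codeword only sees its own region, the same feature centre in different regions yields separate responses, which is the kind of spatial precision a fixed grid only approximates. In a full system the resulting sparse vector would be passed to a classifier; learning the second-layer dictionary itself (for example with online dictionary learning) is outside this sketch.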

Inspec keywords: dictionaries; text detection; character recognition

Other keywords: max pooling; ICDAR2003 dataset; high-level image understanding; scene text recognition; sparse dictionary; spatiality embedded dictionary; local strokes; localised soft coding; robust character recognition; SED; text information; Chars74K dataset; scene images

Subjects: Image recognition; Computer vision and image processing techniques; Information analysis and indexing

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-cvi.2014.0022