Text contained in scene images is very helpful for high-level image understanding. In this study, the authors propose to learn the co-occurrence of local strokes for scene text recognition by using a spatiality embedded dictionary (SED). Unlike the spatial pyramid, which partitions images into grids to incorporate spatial information, the authors' SED associates every codeword with a particular response region and thereby introduces more precise spatial information for robust character recognition. After localised soft coding and max pooling in the first layer, a sparse dictionary is learned to model the co-occurrence of several local strokes, which further improves classification performance. Experimental results on two scene character recognition datasets, ICDAR2003 and Chars74K, demonstrate that their character recognition method outperforms state-of-the-art methods. In addition, competitive word recognition results are reported on four benchmark word recognition datasets, ICDAR2003, ICDAR2011, ICDAR2013 and Street View Text, when their character recognition method is combined with a conditional random field language model.
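
The first-layer encoding described in the abstract can be illustrated with a minimal sketch. It assumes a common form of localised soft-assignment coding (normalised Gaussian weights over the k nearest codewords) followed by max pooling. The abstract does not say how a codeword's response region is defined, so the square window around a per-codeword centre used here is purely an assumption, as are the names `localized_soft_code`, `encode_with_regions`, `centers`, `radius`, `k` and `beta`. The second-layer sparse dictionary is not covered.

```python
import numpy as np


def localized_soft_code(x, codebook, k=3, beta=1.0):
    """Soft-assign descriptor x to its k nearest codewords (weights sum to 1)."""
    d2 = np.sum((codebook - x) ** 2, axis=1)      # squared distances to codewords
    k = min(k, len(codebook))
    nearest = np.argsort(d2)[:k]                  # indices of the k nearest codewords
    # Subtract the minimum distance before exponentiating for numerical stability.
    w = np.exp(-beta * (d2[nearest] - d2[nearest].min()))
    code = np.zeros(len(codebook))
    code[nearest] = w / w.sum()
    return code


def encode_with_regions(descs, positions, codebook, centers, radius, k=3, beta=1.0):
    """Max-pool soft codes; a codeword responds only inside its own region.

    Assumption: the response region of codeword j is the square window
    |position - centers[j]| <= radius (hypothetical, not from the paper).
    """
    pooled = np.zeros(len(codebook))
    for x, p in zip(descs, positions):
        in_region = np.all(np.abs(centers - p) <= radius, axis=1)
        idx = np.flatnonzero(in_region)           # codewords allowed at position p
        if idx.size == 0:
            continue
        code = localized_soft_code(x, codebook[idx], k, beta)
        pooled[idx] = np.maximum(pooled[idx], code)   # max pooling per codeword
    return pooled


# Toy usage: four 2-D codewords, two tied to the top-left and two to a far region.
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
centers = np.array([[0, 0], [0, 0], [10, 10], [10, 10]])
descs = np.array([[0.0, 0.0]])
positions = np.array([[1, 1]])
feature = encode_with_regions(descs, positions, codebook, centers, radius=2)
```

In this toy setting only the first two codewords can respond, because the descriptor lies outside the regions of the other two. This is the sense in which position is tied to the codeword itself rather than to a grid cell of the image.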

Scene text recognition by learning co-occurrence of strokes based on spatiality embedded dictionary
