© The Institution of Engineering and Technology
Text contained in scene images is very helpful for high-level image understanding. In this study, the authors propose to learn the co-occurrence of local strokes for scene text recognition by using a spatiality embedded dictionary (SED). Spatial pyramid matching incorporates spatial information by partitioning images into grids. In contrast, the authors' SED associates each codeword with a particular response region, which introduces more precise spatial information for robust character recognition. After localised soft coding and max pooling in the first layer, a sparse dictionary is learned to model the co-occurrence of several local strokes, which further improves classification performance. Experimental results on two scene character recognition datasets, ICDAR2003 and Chars74K, show that their character recognition method outperforms state-of-the-art methods. In addition, competitive word recognition results are reported on four benchmark word recognition datasets (ICDAR2003, ICDAR2011, ICDAR2013 and Street View Text) when their character recognition method is combined with a conditional random field language model.
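The first-layer encoding described above (localised soft coding of local stroke descriptors, restricted by each codeword's response region, followed by max pooling) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each codeword's response region is an axis-aligned rectangle in the character image and that a descriptor located at a given position may only be coded against codewords whose region contains that position. The function names `localized_soft_code` and `sed_encode`, the parameters `k` and `beta`, and the `(r0, c0, r1, c1)` region format are hypothetical. The second-layer sparse dictionary that models stroke co-occurrence is not shown.

```python
import math


def localized_soft_code(x, codebook, k=3, beta=1.0, eligible=None):
    """Localised soft-assignment coding: only the k nearest eligible codewords
    receive non-zero weight, and those weights are normalised to sum to 1."""
    candidates = range(len(codebook)) if eligible is None else eligible
    # Squared Euclidean distance from descriptor x to each candidate codeword.
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, codebook[j])), j) for j in candidates
    )
    nearest = dists[:k]
    code = [0.0] * len(codebook)
    if not nearest:
        # No codeword responds at this location, so the code stays all-zero.
        return code
    d_min = nearest[0][0]
    # Subtracting d_min avoids underflow; it cancels out after normalisation.
    weights = [math.exp(-beta * (d - d_min)) for d, _ in nearest]
    total = sum(weights)
    for (_, j), w in zip(nearest, weights):
        code[j] = w / total
    return code


def sed_encode(features, codebook, regions, k=3, beta=1.0):
    """Encode one character image.

    features: list of ((row, col), descriptor) pairs.
    regions[j]: hypothetical response region (r0, c0, r1, c1) of codeword j;
    a feature at (row, col) may activate codeword j only if it lies inside it.
    Returns the max-pooled code, one value per codeword.
    """
    pooled = [0.0] * len(codebook)
    for (r, c), x in features:
        eligible = [
            j
            for j, (r0, c0, r1, c1) in enumerate(regions)
            if r0 <= r < r1 and c0 <= c < c1
        ]
        code = localized_soft_code(x, codebook, k, beta, eligible)
        # Max pooling: keep the strongest response of each codeword.
        pooled = [max(p, v) for p, v in zip(pooled, code)]
    return pooled
```

For example, with a four-word codebook in which codewords 0 and 1 respond only in the top half of a 16 by 16 image and codewords 2 and 3 only in the bottom half, a stroke descriptor in the top half can never activate the bottom-half codewords, even if its appearance matches them. That restriction is how this sketch ties spatial position to each codeword, instead of pooling separately over a fixed grid as a spatial pyramid does.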
http://iet.metastore.ingenta.com/content/journals/10.1049/iet-cvi.2014.0022