Natural scene text detection based on multiscale connectionist text proposal network

Recognising text in natural scene images is widely used in social production, but existing recognition methods struggle to identify text accurately in complex environments, and detection accuracy largely determines recognition efficiency. A text detection method based on a Multiscale Connectionist Text Proposal Network is proposed: a Multiscale Region Proposal Network regresses and classifies the extracted regions to obtain the final candidate regions. Taking a large number of commodity images as the dataset, the multiscale connectionist text proposal network is used to detect and localise the text regions in the images. The experimental results show that the proposed algorithm improves detection accuracy in complex environments.

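As a rough illustration of the kind of detection head described above, the sketch below (PyTorch; not the authors' implementation, and all module names, channel sizes and the anchor count are illustrative assumptions) shows a multiscale region-proposal-style head that, for each scale of a shared feature pyramid, predicts a text/non-text score and per-anchor box offsets, in the spirit of CTPN and Faster R-CNN proposal networks:

    # Minimal sketch of a multiscale text-proposal head (assumed design, not from the paper).
    import torch
    import torch.nn as nn

    class MultiscaleTextProposalHead(nn.Module):
        def __init__(self, in_channels=(256, 512, 512), num_anchors=10):
            super().__init__()
            self.heads = nn.ModuleList()
            for ch in in_channels:
                self.heads.append(nn.ModuleDict({
                    # 3x3 conv mixes local context before prediction
                    "conv": nn.Conv2d(ch, 256, kernel_size=3, padding=1),
                    # 2 logits per anchor: text vs. non-text
                    "cls": nn.Conv2d(256, num_anchors * 2, kernel_size=1),
                    # 2 offsets per anchor: vertical centre and height (CTPN-style regression)
                    "reg": nn.Conv2d(256, num_anchors * 2, kernel_size=1),
                }))

        def forward(self, feature_maps):
            # feature_maps: one tensor per scale, each of shape (N, C_i, H_i, W_i)
            outputs = []
            for feat, head in zip(feature_maps, self.heads):
                x = torch.relu(head["conv"](feat))
                scores = head["cls"](x)   # (N, num_anchors*2, H, W)
                offsets = head["reg"](x)  # (N, num_anchors*2, H, W)
                outputs.append((scores, offsets))
            return outputs

    if __name__ == "__main__":
        # Fake feature pyramid standing in for multiscale backbone (e.g. VGG-16) outputs.
        feats = [torch.randn(1, 256, 64, 64),
                 torch.randn(1, 512, 32, 32),
                 torch.randn(1, 512, 16, 16)]
        for scores, offsets in MultiscaleTextProposalHead()(feats):
            print(scores.shape, offsets.shape)

The per-scale score and offset maps would then be decoded against anchors and merged (e.g. by non-maximum suppression and text-line construction) to obtain the final candidate text regions.
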
Inspec keywords: image colour analysis; text analysis; neural nets; object detection; image classification; edge detection; feature extraction; natural scenes; text detection; image segmentation

Other keywords: existing identification methods; detection accuracy; text content area; natural scene pictures; Multiscale-Region Proposal Network; natural scene text detection; multiscale connectionist text proposal network; complex environments; multiscale joint text proposal network; text detection method

Subjects: Optical, image and video signal processing; Image recognition; Computer vision and image processing techniques; Document processing and analysis techniques
