Real-time text tracking in natural scenes

IET Computer Vision


The authors present a system that automatically detects, recognises and tracks text in natural scenes in real time. The method focuses on large text found in outdoor environments, such as shop signs, street names and billboards. Building on the authors' previously developed techniques for scene text detection and orientation estimation, the main contribution of this work is a complete end-to-end scene text reading system based on text tracking. The authors propose a set of unscented Kalman filters that maintain each text region's identity and continuously track the homography that maps the text into a fronto-parallel view, which makes the system resilient to erratic camera motion and wide-baseline changes in orientation. The system is designed for continuous, unsupervised operation in a handheld or wearable device over long periods of time. It is completely automatic and features quick failure recovery and interactive text reading. It is also highly parallelised to make full use of the available processing power and achieve real-time operation. The authors demonstrate the performance of the system on sequences recorded in outdoor scenarios.
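
As a rough illustration of the idea behind an unscented Kalman filter over a homography, the sketch below shows only the unscented transform step: a Gaussian belief over homography parameters is pushed through the nonlinear projection of one text-box corner. It is not the authors' implementation. The 8-parameter state (with the last matrix entry fixed to 1), the `kappa` value, and the names `sigma_points`, `unscented_transform` and `project_corner` are all assumptions made for this example, since the abstract does not specify them.

```python
import numpy as np

def sigma_points(mean, cov, kappa=1.0):
    """Return the 2n+1 sigma points and weights of the basic unscented transform."""
    n = mean.size
    # Lower-triangular factor L with L @ L.T == (n + kappa) * cov.
    root = np.linalg.cholesky((n + kappa) * cov)
    # Each row of root.T is one column of L, i.e. one spread direction.
    points = np.vstack([mean, mean + root.T, mean - root.T])
    weights = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    weights[0] = kappa / (n + kappa)
    return points, weights

def unscented_transform(f, mean, cov, kappa=1.0):
    """Propagate a Gaussian (mean, cov) through a nonlinear function f."""
    points, weights = sigma_points(mean, cov, kappa)
    mapped = np.array([f(p) for p in points])
    new_mean = weights @ mapped
    diff = mapped - new_mean
    new_cov = (weights[:, None] * diff).T @ diff
    return new_mean, new_cov

def project_corner(h, corner=(1.0, 1.0)):
    """Map a corner of the fronto-parallel text box into the image.

    Assumption: h holds the first 8 entries of a 3x3 homography in
    row-major order, with the last entry fixed to 1.
    """
    H = np.append(h, 1.0).reshape(3, 3)
    x, y, w = H @ np.array([corner[0], corner[1], 1.0])
    return np.array([x / w, y / w])

# Hypothetical belief: identity homography with small, independent uncertainty.
h0 = np.array([1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0])
P0 = 1e-4 * np.eye(8)
corner_mean, corner_cov = unscented_transform(project_corner, h0, P0)
```

A full filter would add a motion model, a measurement update and one such filter per tracked text region; those parts are omitted here because the abstract gives no detail on them.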


