Statistical geometric components of straight lines (SGCSL) feature extraction method for offline Arabic/Persian handwritten words recognition

In this study, the authors present a new feature extraction method for handwritten Arabic/Persian word recognition. The feature is based on the angle, number, location, and size of straight lines, which together represent geometric and quantitative attributes of a word. First, the word image is divided into m × n windows and straight lines are extracted from each window. The proposed features are then computed from these lines and combined. Finally, the image features are used to train and test a support vector machine classifier. The proposed method is tested on three datasets: IBN-SINA and IFN/ENIT for Arabic word recognition, and Iran-cities for Persian word recognition. The recognition accuracy of the proposed method is about 67.47%, 86.22% and 80.78% on the Iran-cities, IBN-SINA and IFN/ENIT datasets, respectively, which the authors report is better than state-of-the-art methods.
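
The window-and-line idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact feature definitions, so everything here is an assumption, including the function name `window_line_features`, the reading of "m × n" as a grid of m rows by n columns, the assignment of a line to a window by its midpoint, and the choice of per-window statistics (count, mean orientation, mean length, mean midpoint position). The line segments are assumed to come from a separate line detector, for example a Hough transform.

```python
import math
import numpy as np


def window_line_features(segments, image_shape, m=2, n=3):
    """Per-window statistics of straight line segments (illustrative only).

    segments:    iterable of (x1, y1, x2, y2) in pixel coordinates,
                 assumed to come from an external line detector.
    image_shape: (height, width) of the word image.
    Returns a flat vector of length m * n * 5:
    [count, mean angle, mean length, mean x, mean y] for each window.
    """
    height, width = image_shape
    win_h, win_w = height / m, width / n
    diag = math.hypot(win_h, win_w)  # used to normalise segment length

    # One bucket of per-segment measurements for each window of the grid.
    buckets = [[[] for _ in range(n)] for _ in range(m)]
    for x1, y1, x2, y2 in segments:
        cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
        # Assumption: a segment belongs to the window holding its midpoint.
        row = min(int(cy // win_h), m - 1)
        col = min(int(cx // win_w), n - 1)
        # Orientation folded into [0, pi), since a line has no direction.
        angle = math.atan2(y2 - y1, x2 - x1) % math.pi
        length = math.hypot(x2 - x1, y2 - y1)
        buckets[row][col].append((
            angle / math.pi,               # angle, scaled to [0, 1)
            length / diag,                 # size, relative to window diagonal
            (cx - col * win_w) / win_w,    # location within window, x
            (cy - row * win_h) / win_h,    # location within window, y
        ))

    features = []
    for row in range(m):
        for col in range(n):
            items = buckets[row][col]
            # A plain mean is a simplification; orientations near 0 and pi
            # would need circular statistics in a careful implementation.
            means = np.mean(np.array(items), axis=0) if items else np.zeros(4)
            features.extend([float(len(items)), *means])
    return np.array(features)
```

A vector built this way has a fixed length for every word image, so it could be passed directly to a support vector machine classifier for training and testing, which is the role the abstract assigns to the combined features.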

Inspec keywords: text analysis; feature extraction; handwritten character recognition; handwriting recognition; natural language processing; support vector machines; hidden Markov models; image classification

Other keywords: handwritten Arabic/Persian language word recognition; Arabic words; statistical geometric components; offline Arabic/Persian handwritten words recognition; feature extraction method; geometric attributes; Persian words recognition; training; state-of-the-art methods; word image; quantitative attributes; recognition accuracy; straight lines; testing support vector machine classifier; IFN/ENIT Arabic dataset

Subjects: Knowledge engineering techniques; Natural language interfaces; Image recognition; Optical, image and video signal processing; Computer vision and image processing techniques; Other topics in statistics

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-ipr.2017.0839