Recognition of isolated digits using DNN–HMM and harmonic noise model

Speech recognition is a constantly developing field. In this study, the authors present a new speech recognition system applied to the Arabic language. The proposed system is based on the harmonic plus noise model (HNM), which is more commonly used in speech synthesis and is known for providing excellent speech production quality. Their contribution lies in replacing the conventional mel-frequency cepstrum coefficient (MFCC) parameters with a set of acoustic parameters extracted through the HNM estimation process. The HNM allows a better-adapted processing by distinguishing voiced from unvoiced speech frames and by characterising the harmonic property of speech. As is common, their system consists of a training phase and a recognition phase. Deep neural networks and hidden Markov models (DNN–HMM) are used to model the voiced frames, which correspond to the harmonic part; the DNN model is estimated with static and dynamic parameters. The unvoiced frames, which represent the noise part of the HNM, are clustered with an HMM. Spoken Arabic digits are used to measure the performance of the proposed recognition system, and a comparison with the MFCC-based approach is performed.
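The abstract does not say how the dynamic parameters are derived from the static ones. As a minimal sketch, assuming the regression-based delta formula widely used in speech front ends, the following Python function appends first-order dynamic features to a matrix of static per-frame features. The function name `add_deltas` and the window half-width `width=2` are hypothetical choices for illustration, not details taken from the article.

```python
import numpy as np

def add_deltas(static, width=2):
    """Append regression-based delta (dynamic) features to static features.

    static: array of shape (frames, dims), one feature vector per frame.
    width:  frames used on each side of the current frame (assumed value).
    """
    static = np.asarray(static, dtype=float)
    n_frames = static.shape[0]
    # Repeat the edge frames so every frame has `width` neighbours per side.
    padded = np.pad(static, ((width, width), (0, 0)), mode="edge")
    # Normalising term of the regression: 2 * sum of n^2 for n = 1..width.
    denom = 2.0 * sum(n * n for n in range(1, width + 1))
    delta = np.zeros_like(static)
    for n in range(1, width + 1):
        ahead = padded[width + n : width + n + n_frames]
        behind = padded[width - n : width - n + n_frames]
        delta += n * (ahead - behind)
    delta /= denom
    # Static and dynamic parameters side by side: shape (frames, 2 * dims).
    return np.hstack([static, delta])
```

For a feature that rises by one unit per frame, the delta column equals 1 away from the edges, and for a constant feature it is 0 everywhere, which is a quick sanity check on the slope estimate.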


