Recognition of isolated digits using DNN–HMM and harmonic noise model

Speech recognition is a field in constant development. In this study, the authors present a new speech recognition system applied to the Arabic language. The proposed system is based on the harmonic plus noise model (HNM), a model more commonly used in speech synthesis tasks, where it is known for producing speech of excellent quality. Their contribution lies in replacing the conventional mel-frequency cepstral coefficient (MFCC) parameters with a set of acoustic parameters extracted through the HNM estimation process. The HNM enables more tailored processing by distinguishing voiced from unvoiced speech frames and by characterising the harmonic structure of speech. As is common, the system comprises a training phase and a recognition phase. A hybrid of deep neural networks and hidden Markov models (DNN–HMM) is used to model the voiced frames, which correspond to the harmonic part; the DNN is estimated with both static and dynamic parameters. The unvoiced frames, which represent the noise part of the HNM, are modelled with an HMM. Spoken Arabic digits are used to measure the performance of the proposed recognition system, and a comparison with the MFCC-based approach is carried out.
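To illustrate the general idea behind such an HNM-based front end, the Python sketch below computes log harmonic amplitudes at multiples of an estimated pitch for a voiced frame and applies a crude voiced/unvoiced decision to route a frame towards a harmonic (DNN–HMM) or noise (HMM) branch. This is a minimal sketch under stated assumptions, not the authors' implementation: the pitch value, the number of harmonics, the energy/zero-crossing voicing heuristic and its thresholds, and the function names harmonic_amplitudes and is_voiced are all hypothetical choices made for illustration.

```python
"""Minimal sketch of HNM-style frame features (illustrative, not the paper's method)."""
import numpy as np


def harmonic_amplitudes(frame, fs, f0, n_harmonics=20):
    """Log amplitudes sampled at multiples of f0 (hypothetical acoustic features).

    frame       : 1-D array with one voiced analysis frame
    fs          : sampling rate in Hz
    f0          : estimated fundamental frequency of the frame in Hz
    n_harmonics : number of harmonics kept below the Nyquist frequency
    """
    window = np.hanning(len(frame))
    spectrum = np.abs(np.fft.rfft(frame * window))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)

    feats = []
    for k in range(1, n_harmonics + 1):
        fk = k * f0
        if fk >= fs / 2:                            # stop at the Nyquist frequency
            break
        bin_k = np.argmin(np.abs(freqs - fk))       # nearest FFT bin to the k-th harmonic
        feats.append(np.log(spectrum[bin_k] + 1e-10))
    return np.array(feats)


def is_voiced(frame, energy_thresh=1e-4, zcr_thresh=0.25):
    """Crude voiced/unvoiced decision from frame energy and zero-crossing rate.

    This heuristic only stands in for the HNM voicing decision; the
    thresholds are arbitrary illustration values.
    """
    energy = np.mean(frame ** 2)
    zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0
    return energy > energy_thresh and zcr < zcr_thresh


if __name__ == "__main__":
    fs = 16000
    t = np.arange(0, 0.025, 1.0 / fs)               # one 25 ms analysis frame
    frame = 0.5 * np.sin(2 * np.pi * 120 * t)       # synthetic voiced frame at 120 Hz
    if is_voiced(frame):
        print(harmonic_amplitudes(frame, fs, f0=120.0))   # features for the DNN-HMM branch
    else:
        print("unvoiced frame -> noise (HMM) branch")
```

In the system described in the paper, the parameters of the harmonic and noise parts are supplied by the HNM estimation process itself; the sketch only conveys the frame-wise split between the two branches.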
