Memory-efficient buffering method and enhanced reference template for embedded automatic speech recognition system

This work realises a memory-efficient embedded automatic speech recognition (ASR) system on a resource-constrained platform. A buffering method, called ultra-low queue-accumulator buffering, is presented to use the constrained memory efficiently when extracting linear prediction cepstral coefficient (LPCC) features in the embedded ASR system. The optimal LPCC order is evaluated to balance recognition accuracy against computational cost. In the decoding stage, the proposed enhanced cross-words reference templates (CWRTs) method is incorporated into the template matching method to achieve speaker independence in ASR tasks without the large memory burden of the conventional CWRTs method. The proposed techniques are implemented on a 16-bit GPCE063A microprocessor platform running at a 49.152 MHz clock, with a sampling rate of 8 kHz. Experimental results demonstrate that recognition accuracy reaches 95.22% on a 30-sentence speaker-independent embedded ASR task, using only 0.75 kB of RAM.
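The pipeline described above — LPCC feature extraction followed by template matching — can be sketched as follows. This is a minimal illustrative sketch in Python, not the paper's GPCE063A implementation: it computes LPCCs from one speech frame via autocorrelation, the Levinson–Durbin recursion, and the standard LPC-to-cepstrum recursion, then scores a template with classic dynamic time warping (DTW), a common template-matching distance in such systems. The function names, the order p = 10, and the synthetic demo frame are assumptions made for the example.

```python
import math

def autocorr(frame, p):
    """Short-time autocorrelation r[0..p] of one speech frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(p + 1)]

def levinson_durbin(r, p):
    """Levinson-Durbin recursion: solve for LPC coefficients a_1..a_p."""
    a = [0.0] * (p + 1)
    e = r[0]
    for i in range(1, p + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / e
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        e *= (1.0 - k * k)  # prediction error shrinks at each order
    return a[1:]            # a_1 .. a_p

def lpc_to_cepstrum(a, q):
    """Standard recursion from LPC coefficients to q cepstral coefficients."""
    p = len(a)
    c = [0.0] * (q + 1)
    for m in range(1, q + 1):
        c[m] = (a[m - 1] if m <= p else 0.0) + sum(
            (k / m) * c[k] * a[m - k - 1] for k in range(max(1, m - p), m))
    return c[1:]

def dtw(seq_a, seq_b):
    """Classic DTW distance between two sequences of feature vectors."""
    inf = float('inf')
    n, m = len(seq_a), len(seq_b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(seq_a[i - 1], seq_b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

# Demo: one synthetic 30 ms frame at 8 kHz (240 samples), order p = 10.
frame = [math.sin(0.3 * t) + 0.05 * (((t * 7919) % 101) / 50.0 - 1.0)
         for t in range(240)]
lpcc = lpc_to_cepstrum(levinson_durbin(autocorr(frame, 10), 10), 12)
```

On a 16-bit target such as the GPCE063A, these recursions would typically run in fixed-point arithmetic, with the buffering scheme feeding the autocorrelation accumulators one sample at a time rather than holding a whole frame in RAM.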

Inspec keywords: buffer storage; microprocessor chips; speech coding; decoding; feature extraction; speaker recognition

Other keywords: decoding part; 30-sentence speaker-independent embedded ASR task; memory-efficient embedded automatic speech recognition system; frequency 8 kHz; memory-efficient buffering method; constrained memory; enhanced cross-word reference template method; microprocessor GPCE063A platform; word length 16 bit; RAM; template matching method; LPCC feature extraction; frequency 49.152 MHz; resource-constrained platform; embedded ASR system; enhanced reference template; linear prediction cepstral coefficient feature extraction; CWRTs method; speaker-independent characteristic; ultra-low queue-accumulator buffering

Subjects: Memory circuits; Speech processing techniques; Speech recognition and synthesis; Semiconductor storage; Microprocessors and microcomputers; Speech and audio coding

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-cdt.2014.0008