Memory-efficient buffering method and enhanced reference template for embedded automatic speech recognition system

This work realises a memory-efficient embedded automatic speech recognition (ASR) system on a resource-constrained platform. A buffering method, called ultra-low queue-accumulator buffering, is presented to use the constrained memory efficiently when extracting linear prediction cepstral coefficient (LPCC) features in the embedded ASR system. The optimal LPCC order is evaluated to balance recognition accuracy against computational cost. In the decoding stage, the proposed enhanced cross-words reference templates (CWRTs) method is incorporated into the template matching method to achieve speaker independence in ASR tasks without the large memory burden of the conventional CWRTs method. The proposed techniques are implemented on a 16-bit GPCE063A microprocessor platform running at a 49.152 MHz clock, with a sampling rate of 8 kHz. Experimental results demonstrate that recognition accuracy reaches 95.22% on a 30-sentence speaker-independent embedded ASR task, using only 0.75 kB of RAM.
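The pipeline described above — LPCC feature extraction followed by template matching — can be sketched as follows. This is a minimal illustrative sketch in Python, not the paper's GPCE063A implementation: it computes LPCCs from one speech frame via autocorrelation, the Levinson–Durbin recursion, and the standard LPC-to-cepstrum recursion, then scores a template with classic dynamic time warping (DTW), a common template-matching distance in such systems. The function names, the order p = 10, and the synthetic demo frame are assumptions made for the example.

```python
import math

def autocorr(frame, p):
    """Short-time autocorrelation r[0..p] of one speech frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(p + 1)]

def levinson_durbin(r, p):
    """Levinson-Durbin recursion: solve for LPC coefficients a_1..a_p."""
    a = [0.0] * (p + 1)
    e = r[0]
    for i in range(1, p + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / e
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        e *= (1.0 - k * k)  # prediction error shrinks at each order
    return a[1:]            # a_1 .. a_p

def lpc_to_cepstrum(a, q):
    """Standard recursion from LPC coefficients to q cepstral coefficients."""
    p = len(a)
    c = [0.0] * (q + 1)
    for m in range(1, q + 1):
        c[m] = (a[m - 1] if m <= p else 0.0) + sum(
            (k / m) * c[k] * a[m - k - 1] for k in range(max(1, m - p), m))
    return c[1:]

def dtw(seq_a, seq_b):
    """Classic DTW distance between two sequences of feature vectors."""
    inf = float('inf')
    n, m = len(seq_a), len(seq_b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(seq_a[i - 1], seq_b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

# Demo: one synthetic 30 ms frame at 8 kHz (240 samples), order p = 10.
frame = [math.sin(0.3 * t) + 0.05 * (((t * 7919) % 101) / 50.0 - 1.0)
         for t in range(240)]
lpcc = lpc_to_cepstrum(levinson_durbin(autocorr(frame, 10), 10), 12)
```

On a 16-bit target such as the GPCE063A, these recursions would typically run in fixed-point arithmetic, with the buffering scheme feeding the autocorrelation accumulators one sample at a time rather than holding a whole frame in RAM.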

Inspec keywords: buffer storage; microprocessor chips; speech coding; decoding; feature extraction; speaker recognition

Other keywords: decoding part; 30-sentence speaker-independent embedded ASR task; memory-efficient embedded automatic speech recognition system; frequency 8 kHz; memory-efficient buffering method; constrained memory; enhanced cross-word reference template method; microprocessor GPCE063A platform; word length 16 bit; RAM; template matching method; LPCC feature extraction; frequency 49.152 MHz; resource-constrained platform; embedded ASR system; enhanced reference template; linear prediction cepstral coefficient feature extraction; CWRTs method; speaker-independent characteristic; ultra-low queue-accumulator buffering

Subjects: Memory circuits; Speech processing techniques; Speech recognition and synthesis; Semiconductor storage; Microprocessors and microcomputers; Speech and audio coding

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-cdt.2014.0008