Comparative study of automatic speech recognition techniques

Michelle Cutajar; Edward Gatt; Ivan Grech; Owen Casha; Joseph Micallef

Author(s): Michelle Cutajar¹; Edward Gatt¹; Ivan Grech¹; Owen Casha¹; Joseph Micallef¹
Affiliations: ¹ Faculty of Information and Communication Technology, Department of Microelectronics and Nanoelectronics, University of Malta, Tal-Qroqq, Msida, MSD 2080, Malta
Source: Volume 7, Issue 1, February 2013, pp. 25–46
DOI: 10.1049/iet-spr.2012.0151, Print ISSN 1751-9675, Online ISSN 1751-9683

Received 21/05/2012, Revised 26/11/2012, Accepted 08/01/2013

Over the past decades, extensive research has been carried out on possible implementations of automatic speech recognition (ASR) systems. The best-known techniques in the field are mel-frequency cepstral coefficients and hidden Markov models. However, other methods, such as wavelet-based transforms, artificial neural networks and support vector machines, are becoming more popular. This review article presents a comparative study of the different approaches that have been proposed for ASR and that are widely used today.

