Comparative study of automatic speech recognition techniques

IET Signal Processing

Over the past decades, extensive research has been carried out on possible implementations of automatic speech recognition (ASR) systems. The best-known techniques in the field are mel-frequency cepstral coefficients (MFCCs) for feature extraction and hidden Markov models (HMMs) for acoustic modelling. Other methods, such as wavelet-based transforms, artificial neural networks and support vector machines, are also becoming more popular. This review article presents a comparative study of the approaches that have been proposed for ASR and are widely used today.
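
As a brief illustration of the mel-frequency idea named above, the sketch below converts between hertz and mels and places filter centre frequencies evenly on the mel scale. It is a minimal sketch, not the article's method: the conversion formula (2595 log10(1 + f/700)) is one common convention assumed here, and the function names `hz_to_mel`, `mel_to_hz` and `mel_centre_frequencies` are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to mels (assumed convention: 2595*log10(1 + f/700))."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel: convert mels back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_centre_frequencies(n_filters, f_min_hz, f_max_hz):
    """Return n_filters centre frequencies (Hz) spaced evenly on the mel scale.

    The band edges f_min_hz and f_max_hz are excluded, as is usual when
    laying out a triangular filterbank.
    """
    lo, hi = hz_to_mel(f_min_hz), hz_to_mel(f_max_hz)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]


if __name__ == "__main__":
    # Example values (illustrative only): 10 filters between 0 Hz and 4 kHz.
    for f in mel_centre_frequencies(10, 0.0, 4000.0):
        print(round(f, 1))
```

Because the spacing is uniform in mels, the centre frequencies sit close together at low frequencies and spread out at high frequencies, which is the perceptual warping that mel-frequency cepstral features rely on.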

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-spr.2012.0151