Speaker identification using multimodal neural networks and wavelet analysis

Rapid technological progress in recent years has led to a tremendous rise in the use of biometric authentication systems. The objective of this research is to investigate the problem of identifying a speaker from their voice regardless of the spoken content. In this study, the authors designed and implemented a novel text-independent multimodal speaker identification system based on wavelet analysis and neural networks. The wavelet analysis comprises the discrete wavelet transform, the wavelet packet transform, wavelet sub-band coding and Mel-frequency cepstral coefficients (MFCCs). The learning module comprises general regression, probabilistic and radial basis function neural networks, whose individual decisions are fused through a majority voting scheme. The system was found to be competitive: it improved the identification rate by 15% compared with classical MFCC features, and it reduced the identification time by 40% compared with the back-propagation neural network, the Gaussian mixture model and principal component analysis. Performance tests conducted on the GRID corpus show that this approach achieves faster identification and greater accuracy than traditional approaches, and that it is applicable to real-time, text-independent speaker identification systems.
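Two ingredients of the pipeline described above can be illustrated compactly: a single level of Haar wavelet decomposition (the simplest instance of the sub-band coding step) and majority voting over the labels produced by several classifiers. The sketch below is illustrative only and is not the authors' implementation; the function names and the use of the Haar wavelet are assumptions for the example.

```python
from collections import Counter

def haar_dwt(signal):
    """One level of the Haar discrete wavelet transform: split a
    signal into an approximation (low-pass) sub-band and a detail
    (high-pass) sub-band, each half the original length."""
    if len(signal) % 2:                      # pad odd-length input
        signal = list(signal) + [signal[-1]]
    approx = [(signal[i] + signal[i + 1]) / 2 ** 0.5
              for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 ** 0.5
              for i in range(0, len(signal), 2)]
    return approx, detail

def majority_vote(predictions):
    """Fuse the speaker labels proposed by several classifiers
    (e.g. GRNN, PNN, RBF networks) by simple majority voting."""
    return Counter(predictions).most_common(1)[0][0]
```

For example, a constant signal yields zero detail coefficients, and `majority_vote(["spk3", "spk1", "spk3"])` returns `"spk3"`. Deeper decompositions and the wavelet packet transform follow by recursing on the sub-bands.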
