Dual-channel VTS feature compensation for noise-robust speech recognition on mobile devices

One way to improve automatic speech recognition (ASR) performance on the latest mobile devices, which may be used in a wide variety of noisy environments, is to exploit the small microphone arrays embedded in them. Since classic beamforming techniques achieve rather limited performance with such small arrays, specific techniques are being developed to exploit this novel feature efficiently for noise-robust ASR. In this study, a novel dual-channel minimum mean square error (MMSE)-based feature compensation method is proposed, which relies on a vector Taylor series (VTS) expansion of a dual-channel speech distortion model. In contrast to the single-channel VTS approach (which can be considered the state of the art in feature compensation), the authors' technique takes particular advantage of the spatial properties of speech and noise. The proposal is evaluated for a dual-microphone smartphone, a case of particular interest, using the AURORA2-2C synthetic corpus. Word recognition results, also validated on real noisy speech data, show that the proposed method is more accurate than both minimum variance distortionless response (MVDR) beamforming and single-channel VTS feature compensation, clearly outperforming them especially at low signal-to-noise ratios (SNRs).
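
The abstract names single-channel VTS feature compensation as the state-of-the-art baseline. As a minimal sketch of that baseline only (not the authors' dual-channel method, whose equations are not given here), the Python code below performs an MMSE estimate of clean features from one noisy frame. It assumes log-Mel features, a diagonal-covariance clean-speech GMM, known noise statistics, the additive-noise model y = log(exp(x) + exp(n)) with channel distortion ignored, and the common simplified estimate x_hat = y - sum_k P(k|y)(mu_y_k - mu_x_k). The function names and all numeric values are hypothetical.

```python
import cmath  # unused here; kept out of the way of the math-only sketch
import math
from typing import List, Sequence, Tuple


def log_add(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def log_gauss_diag(y: Sequence[float], mean: Sequence[float],
                   var: Sequence[float]) -> float:
    """Log-density of a Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (yi - m) ** 2 / v)
               for yi, m, v in zip(y, mean, var))


def vts_mmse_compensate(
    y: Sequence[float],
    weights: Sequence[float],
    means_x: Sequence[Sequence[float]],
    vars_x: Sequence[Sequence[float]],
    mean_n: Sequence[float],
    var_n: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Hypothetical single-channel VTS/MMSE estimate of clean log-Mel features.

    Assumed distortion model per bin: y = log(exp(x) + exp(n)).
    Each GMM component k is linearised around (mu_x_k, mu_n).
    Returns (x_hat, posteriors P(k | y)).
    """
    log_post: List[float] = []
    offsets: List[List[float]] = []
    for w, mu_x, var_x in zip(weights, means_x, vars_x):
        # Zeroth-order VTS term: noisy-speech mean of component k.
        mu_y = [log_add(a, b) for a, b in zip(mu_x, mean_n)]
        # dy/dx = exp(x) / (exp(x) + exp(n)) = exp(x - y); dy/dn = 1 - dy/dx.
        jac = [math.exp(a - my) for a, my in zip(mu_x, mu_y)]
        # First-order VTS: noisy-speech variance of component k.
        var_y = [j * j * vx + (1.0 - j) ** 2 * vn
                 for j, vx, vn in zip(jac, var_x, var_n)]
        log_post.append(math.log(w) + log_gauss_diag(y, mu_y, var_y))
        offsets.append([my - mx for my, mx in zip(mu_y, mu_x)])

    # Normalise component posteriors with the log-sum-exp trick.
    top = max(log_post)
    post = [math.exp(lp - top) for lp in log_post]
    total = sum(post)
    post = [p / total for p in post]

    # Simplified MMSE estimate: x_hat = y - sum_k P(k | y) * (mu_y_k - mu_x_k).
    x_hat = [yi - sum(p * off[d] for p, off in zip(post, offsets))
             for d, yi in enumerate(y)]
    return x_hat, post


if __name__ == "__main__":
    # Toy 2-component, 2-bin clean-speech GMM (illustrative values only).
    x_hat, post = vts_mmse_compensate(
        y=[10.2, 8.3],
        weights=[0.6, 0.4],
        means_x=[[10.0, 8.0], [4.0, 3.0]],
        vars_x=[[1.0, 1.0], [1.0, 1.0]],
        mean_n=[5.0, 5.0],
        var_n=[0.5, 0.5],
    )
    print("posteriors:", post)
    print("compensated features:", x_hat)
```

Because log(exp(x) + exp(n)) is always larger than x, the subtracted offset is positive, so the sketch always lowers the noisy features toward the clean-speech model. Fuller VTS variants also use the linearised cross-covariance (E[x | y, k] = mu_x + J var_x / var_y (y - mu_y) per bin) and re-estimate the noise statistics, which this sketch omits.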

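The abstract also compares against MVDR beamforming, whose limited gains with small arrays motivate the work. For reference, a minimal per-frequency-bin sketch of MVDR weights, w = R^-1 d / (d^H R^-1 d), for a two-microphone array is given below. It assumes the 2x2 noise spatial covariance R and the target steering vector d have already been estimated (that step is not shown); the diagonal loading value, the function names and the toy interferer direction are illustrative assumptions, not the configuration used in the study.

```python
import cmath
from typing import Sequence, Tuple

Vec2 = Tuple[complex, complex]


def mvdr_weights_2mic(R: Sequence[Sequence[complex]], d: Vec2,
                      loading: float = 1e-3) -> Vec2:
    """Hypothetical MVDR weights w = R^-1 d / (d^H R^-1 d) for one frequency bin.

    R: 2x2 Hermitian noise spatial covariance matrix.
    d: steering (relative transfer function) vector of the target speech.
    loading: diagonal loading added to R (assumed value, for robustness).
    """
    a = R[0][0] + loading
    b = R[0][1]
    c = R[1][0]
    e = R[1][1] + loading
    det = a * e - b * c
    if abs(det) < 1e-12:
        raise ValueError("loaded noise covariance is (near) singular")
    # Closed-form 2x2 inverse: R^-1 = [[e, -b], [-c, a]] / det.
    rinv_d = ((e * d[0] - b * d[1]) / det, (-c * d[0] + a * d[1]) / det)
    # d^H R^-1 d is real and positive for Hermitian positive-definite R.
    denom = d[0].conjugate() * rinv_d[0] + d[1].conjugate() * rinv_d[1]
    return (rinv_d[0] / denom, rinv_d[1] / denom)


def apply_beamformer(w: Vec2, x: Vec2) -> complex:
    """Beamformer output w^H x for one STFT bin of the two channels."""
    return w[0].conjugate() * x[0] + w[1].conjugate() * x[1]


if __name__ == "__main__":
    d = (1 + 0j, 1 + 0j)                 # target assumed broadside
    v = (1 + 0j, cmath.exp(-1j * 1.0))   # hypothetical interferer direction
    sigma = 0.01                         # hypothetical diffuse noise floor
    # Noise covariance: rank-one interferer v v^H plus a small diagonal term.
    R = [[v[i] * v[j].conjugate() + (sigma if i == j else 0.0)
          for j in range(2)] for i in range(2)]
    w = mvdr_weights_2mic(R, d)
    print("target gain:", abs(apply_beamformer(w, d)))
    print("interferer gain:", abs(apply_beamformer(w, v)))
```

The constraint w^H d = 1 passes the target unchanged, and the single remaining degree of freedom of a two-element array is spent on minimising noise power. With only two closely spaced microphones there is little spatial resolution to exploit, which is consistent with the limited beamforming performance noted in the abstract.
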
http://iet.metastore.ingenta.com/content/journals/10.1049/iet-spr.2016.0182