
Distributed multichannel speech enhancement based on perceptually-motivated Bayesian estimators of the spectral amplitude

In this study, the authors propose multichannel weighted Euclidean (WE) and weighted cosh (WCOSH) cost function estimators for speech enhancement in the distributed microphone scenario. The goal of the work is to illustrate the advantages of utilising additional microphones and modified cost functions for improving the signal-to-noise ratio (SNR), segmental SNR (SSNR), log-likelihood ratio (LLR) and perceptual evaluation of speech quality (PESQ) objective metrics over the corresponding single-channel baseline estimators. As with their single-channel counterparts, the perceptually-motivated multichannel WE and WCOSH estimators are functions of a weighting law parameter, which influences the attenuation of the noisy spectral amplitude through a spectral gain function, emphasises spectral peak (formant) information, and accounts for auditory masking effects. In the simulation results, the multichannel WE and WCOSH estimators improved SSNR, LLR and PESQ over both the single-channel baselines and the unweighted cost functions, with the best improvements occurring for negative values of the weighting law parameter across all input SNR levels and noise types.
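To make the role of the weighting law parameter concrete, the following is a minimal Python sketch, not the authors' multichannel implementation. It assumes the usual single-channel forms of the two cost functions, WE: x^p (x - x̂)^2 and WCOSH: (x/x̂ + x̂/x - 1) x^p, whose Bayesian minimisers are E[X^(p+1)|Y] / E[X^p|Y] and sqrt(E[X^(p+1)|Y] / E[X^(p-1)|Y]) respectively. The discrete posterior (`amps`, `probs`) and all function names are hypothetical and chosen only for illustration; a real estimator would use the posterior implied by the speech and noise models.

```python
def moment(amps, probs, order):
    """Posterior moment E[X**order] over a discrete set of amplitudes."""
    return sum(w * x ** order for x, w in zip(amps, probs))


def we_cost(x, xhat, p):
    """Assumed weighted Euclidean cost: x**p * (x - xhat)**2."""
    return x ** p * (x - xhat) ** 2


def wcosh_cost(x, xhat, p):
    """Assumed weighted cosh cost: (x/xhat + xhat/x - 1) * x**p."""
    return (x / xhat + xhat / x - 1.0) * x ** p


def we_estimate(amps, probs, p):
    """Minimiser of the expected WE cost: E[X**(p+1)] / E[X**p]."""
    return moment(amps, probs, p + 1) / moment(amps, probs, p)


def wcosh_estimate(amps, probs, p):
    """Minimiser of the expected WCOSH cost: sqrt(E[X**(p+1)] / E[X**(p-1)])."""
    return (moment(amps, probs, p + 1) / moment(amps, probs, p - 1)) ** 0.5


def expected_cost(cost, amps, probs, p, xhat):
    """Expected value of a cost function under the discrete posterior."""
    return sum(w * cost(x, xhat, p) for x, w in zip(amps, probs))


# Hypothetical toy posterior over clean spectral amplitudes (illustration only).
amps = [0.5, 1.0, 2.0, 4.0]
probs = [0.1, 0.4, 0.4, 0.1]

if __name__ == "__main__":
    for p in (-1.0, 0.0, 1.0):
        print(p, we_estimate(amps, probs, p), wcosh_estimate(amps, probs, p))
```

With p = 0 the WE estimate reduces to the posterior mean, that is, the unweighted case. In this toy posterior a negative p pulls both estimates towards smaller amplitudes, which corresponds to stronger attenuation; whether that carries over to a given speech and noise model is not something this sketch establishes.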

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-spr.2012.0167