Deep neural network-based linear predictive parameter estimations for speech enhancement

This study presents a speech enhancement technique that improves noise-corrupted speech via deep neural network (DNN)-based estimation of the linear predictive (LP) parameters of speech and noise. For the LP coefficients, an enhanced estimation method using a DNN with multiple layers is proposed. Excitation variances are then estimated via a maximum-likelihood scheme using the observed noisy speech and the estimated LP coefficients. A time-smoothed Wiener filter is further introduced to improve the quality of the enhanced speech. Performance was evaluated via log spectral distance, a composite measure based on multivariate adaptive regression splines modelling, and segmental signal-to-noise ratio. The experimental results showed that the proposed scheme outperformed competing methods.
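
The final filtering and scoring stages described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the LP coefficients and excitation variances of speech and noise are already available (the DNN estimator and the maximum-likelihood variance step are omitted), it assumes a first-order recursive average as the time smoothing, and all function names and the smoothing constant `alpha` are hypothetical.

```python
import numpy as np

def ar_power_spectrum(lp_coeffs, excitation_var, n_fft=512):
    """Power spectrum sigma^2 / |A(e^{jw})|^2 of an all-pole (LP) model.

    lp_coeffs holds a_1..a_p for A(z) = 1 + sum_k a_k z^{-k}.
    """
    a = np.concatenate(([1.0], np.asarray(lp_coeffs, dtype=float)))
    A = np.fft.rfft(a, n_fft)  # A(e^{jw}) sampled on [0, pi]
    return excitation_var / np.maximum(np.abs(A) ** 2, 1e-12)

def wiener_gain(speech_psd, noise_psd):
    """Per-frequency Wiener gain from speech and noise power spectra."""
    return speech_psd / (speech_psd + noise_psd)

def smooth_gains(gains, alpha=0.5):
    """Assumed smoothing rule: first-order recursive average over frames (rows)."""
    gains = np.asarray(gains, dtype=float)
    out = np.empty_like(gains)
    out[0] = gains[0]
    for t in range(1, len(gains)):
        out[t] = alpha * out[t - 1] + (1.0 - alpha) * gains[t]
    return out

def log_spectral_distance(psd_ref, psd_est):
    """Root-mean-square difference of two power spectra in dB."""
    diff = 10.0 * np.log10(psd_ref) - 10.0 * np.log10(psd_est)
    return float(np.sqrt(np.mean(diff ** 2)))

# Toy example: low-pass first-order speech model against white noise.
speech_psd = ar_power_spectrum([-0.9], 1.0)
noise_psd = ar_power_spectrum([], 0.5)
gain = wiener_gain(speech_psd, noise_psd)
```

In this toy case the gain stays close to 1 at low frequencies, where the speech model dominates, and drops towards high frequencies, where the flat noise spectrum dominates.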

Inspec keywords: speech enhancement; parameter estimation; regression analysis; splines (mathematics); neural nets; maximum likelihood estimation; Wiener filters

Other keywords: time-smoothed Wiener filter; maximum-likelihood scheme; LP parameter estimation method; signal-to-noise ratio; speech enhancement technique; performance evaluation; composite multivariate adaptive regression spline modelling; linear predictive parameter estimation; DNN; deep neural network; log spectral distance evaluation

Subjects: Other topics in statistics; Neural computing techniques; Speech processing techniques; Speech and audio signal processing; Interpolation and function approximation (numerical analysis); Filtering methods in signal processing

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-spr.2016.0477