This study presents a speech enhancement technique that improves noise-corrupted speech through deep neural network (DNN)-based estimation of the linear predictive (LP) parameters of speech and noise. For LP coefficient estimation, an enhanced method using a DNN with multiple layers is proposed. The excitation variances are then estimated with a maximum-likelihood scheme from the observed noisy speech and the estimated LP coefficients. A time-smoothed Wiener filter is further introduced to improve the quality of the enhanced speech. Performance is evaluated using the log spectral distance, a composite measure based on multivariate adaptive regression splines modelling, and the segmental signal-to-noise ratio. The experimental results show that the proposed scheme outperforms the competing methods.
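
As an illustration of how LP parameters can drive a Wiener filter, the following is a minimal sketch and not the authors' implementation. It assumes the convention A(z) = 1 + sum_k a_k z^-k, models each source spectrum as excitation variance divided by |A|^2, and uses first-order recursive smoothing of the gain across frames. The function names and the smoothing factor `alpha` are hypothetical; the abstract does not specify how the time smoothing is performed.

```python
import cmath
import math


def ar_power_spectrum(lp_coeffs, excitation_var, n_bins):
    """AR power spectrum sigma^2 / |A(e^{jw})|^2 on n_bins frequencies in [0, pi].

    Assumed convention: A(z) = 1 + sum_k a_k z^-k, with lp_coeffs = [a_1, ..., a_p].
    """
    spectrum = []
    for i in range(n_bins):
        w = math.pi * i / (n_bins - 1)
        a = 1 + sum(c * cmath.exp(-1j * w * (k + 1)) for k, c in enumerate(lp_coeffs))
        spectrum.append(excitation_var / abs(a) ** 2)
    return spectrum


def wiener_gain(speech_psd, noise_psd):
    """Per-bin Wiener gain: speech power over speech-plus-noise power."""
    return [s / (s + n) for s, n in zip(speech_psd, noise_psd)]


def smooth_gain(prev_gain, gain, alpha=0.5):
    """First-order recursive smoothing across frames (alpha is an assumed value)."""
    return [alpha * p + (1 - alpha) * g for p, g in zip(prev_gain, gain)]


# Toy frame: low-pass speech model (a_1 = -0.9) in white noise of variance 0.1.
speech_psd = ar_power_spectrum([-0.9], 1.0, 5)
noise_psd = ar_power_spectrum([], 0.1, 5)
gain = wiener_gain(speech_psd, noise_psd)
smoothed = smooth_gain([1.0] * 5, gain, alpha=0.5)
```

In this toy frame the gain is close to 1 at low frequencies, where the speech model dominates, and lower near the highest bin, where the noise is relatively stronger. In the scheme described above, the LP coefficients would come from the DNN and the excitation variances from the maximum-likelihood step; here they are hand-picked values.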

Deep neural network-based linear predictive parameter estimations for speech enhancement
