© The Institution of Engineering and Technology
This study presents a speech enhancement technique that improves noise-corrupted speech by estimating the linear predictive (LP) parameters of speech and noise with a deep neural network (DNN). For the LP coefficients, an enhanced estimation method using a DNN with multiple layers is proposed. The excitation variances are then estimated by a maximum-likelihood scheme from the observed noisy speech and the estimated LP coefficients. A time-smoothed Wiener filter is further introduced to improve the quality of the enhanced speech. Performance was evaluated using log spectral distance, a composite measure based on multivariate adaptive regression splines modelling, and segmental signal-to-noise ratio. The experimental results showed that the proposed scheme outperformed competing methods.
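As an illustration of the final stage described in the abstract, the following minimal sketch builds a Wiener gain from LP (autoregressive) spectra of speech and noise and smooths it over time. It is not the authors' implementation: the function names, the FFT size, the sign convention A(z) = 1 + Σ a_k z^{-k}, and the first-order recursive smoothing with factor `alpha` are all assumptions made for this example, and the LP coefficients and excitation variances are taken as given rather than estimated by a DNN and a maximum-likelihood step.

```python
import numpy as np


def ar_power_spectrum(lp_coeffs, excitation_var, n_fft=512):
    """AR model power spectrum: excitation_var / |A(e^{jw})|^2.

    Assumed convention: A(z) = 1 + sum_k a_k z^{-k}, where lp_coeffs = [a_1, ..., a_p].
    """
    a = np.concatenate(([1.0], np.asarray(lp_coeffs, dtype=float)))
    A = np.fft.rfft(a, n_fft)  # A evaluated on n_fft // 2 + 1 frequency bins
    # Small floor guards against division by zero near spectral poles.
    return excitation_var / np.maximum(np.abs(A) ** 2, 1e-12)


def wiener_gain(speech_spec, noise_spec):
    """Per-bin Wiener gain H = P_s / (P_s + P_n)."""
    return speech_spec / (speech_spec + noise_spec)


def smooth_gains(gains, alpha=0.5):
    """First-order recursive smoothing along the frame axis (axis 0).

    This smoothing rule is an assumed, hypothetical choice; the abstract
    does not specify how the time smoothing is carried out.
    """
    gains = np.asarray(gains, dtype=float)
    smoothed = np.empty_like(gains)
    smoothed[0] = gains[0]
    for t in range(1, len(gains)):
        smoothed[t] = alpha * smoothed[t - 1] + (1.0 - alpha) * gains[t]
    return smoothed
```

In use, one would compute `ar_power_spectrum` for the speech and noise models of each frame, stack the resulting `wiener_gain` vectors into a frames-by-bins array, pass that array through `smooth_gains`, and multiply the noisy short-time spectrum by the smoothed gains before resynthesis.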
http://iet.metastore.ingenta.com/content/journals/10.1049/iet-spr.2016.0477