Deep neural network-based linear predictive parameter estimations for speech enhancement


This study presents a speech enhancement technique that improves noise-corrupted speech via deep neural network (DNN)-based estimation of the linear predictive (LP) parameters of speech and noise. For the LP coefficients, an enhanced estimation method using a DNN with multiple layers was proposed. Excitation variances were then estimated via a maximum-likelihood scheme using the observed noisy speech and the estimated LP coefficients. A time-smoothed Wiener filter was further introduced to improve the quality of the enhanced speech. Performance was evaluated via log spectral distance, a composite measure based on multivariate adaptive regression splines modelling, and segmental signal-to-noise ratio. The experimental results revealed that the proposed scheme outperformed competing methods.
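
The abstract does not give formulas, so the following is only a minimal sketch of how LP parameters are commonly turned into a Wiener gain, not the authors' implementation. It assumes the standard autoregressive spectrum, sigma^2 / |A(e^{jw})|^2, for both speech and noise, and a first-order recursive smoothing of the gain over frames. The function names, the smoothing form, and the constant `alpha` are all hypothetical choices made for illustration.

```python
import numpy as np


def ar_power_spectrum(lp_coeffs, excitation_var, n_fft=512):
    """AR model power spectrum: excitation_var / |A(e^{jw})|^2.

    lp_coeffs holds a_1..a_p of A(z) = 1 + sum_k a_k z^{-k}
    (sign convention is an assumption of this sketch).
    """
    a = np.concatenate(([1.0], np.asarray(lp_coeffs, dtype=float)))
    A = np.fft.rfft(a, n_fft)  # A(e^{jw}) sampled on n_fft // 2 + 1 bins
    return excitation_var / np.maximum(np.abs(A) ** 2, 1e-12)


def wiener_gain(speech_psd, noise_psd):
    """Per-bin Wiener gain from speech and noise power spectra."""
    return speech_psd / (speech_psd + noise_psd)


def smooth_gain(prev_gain, gain, alpha=0.5):
    """Recursive smoothing over time (assumed form, alpha is hypothetical)."""
    return alpha * prev_gain + (1.0 - alpha) * gain
```

In use, the speech and noise LP coefficients and excitation variances estimated for each frame would be passed to `ar_power_spectrum`, and the smoothed gain would be applied to the noisy short-time spectrum of that frame.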

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-spr.2016.0477