
Hidden Markov model-based speech enhancement using multivariate Laplace and Gaussian distributions

In this paper, statistical speech enhancement using a hidden Markov model (HMM) is studied, and new techniques for applying non-Gaussian distributions are proposed. The superiority of non-Gaussian distributions in online adaptive noise suppression algorithms has already been demonstrated; in this study, the approach is instead formulated in an HMM-based minimum mean-square error (MMSE) estimator in which the a priori models are trained off-line. In addition, an analytical study is presented of distributions other than the autoregressive (AR) Gaussian distribution, such as the Laplace distribution, in order to construct an accurate HMM as the a priori model for discrete Fourier transform and discrete cosine transform feature vectors of the speech signal. In the proposed framework, an HMM-based MMSE estimator based on a Gaussian assumption with a diagonal covariance matrix is provided in place of the AR hypothesis employed in the conventional AR-HMM-based speech enhancement algorithm. The proposed methods are evaluated experimentally in the presence of four different noise types at various signal-to-noise ratio levels, and the results demonstrate their superiority over AR-HMM in most conditions.
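To make the contrast between the two state distributions concrete, the following is a minimal sketch, not the paper's formulation. It assumes that the Laplace model factorizes into independent per-dimension Laplace components with the same variances as the diagonal Gaussian, and it computes per-frame state weights from prior weights only, ignoring HMM transition probabilities. The function names (`gaussian_logpdf`, `laplace_logpdf`, `state_posteriors`) and the example values are hypothetical.

```python
import math

def gaussian_logpdf(x, mean, var):
    """Log-density of a Gaussian with diagonal covariance (per-dimension variances)."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def laplace_logpdf(x, mean, var):
    """Log-density of independent Laplace components (an assumed simplification).

    A Laplace variable with scale b has variance 2*b^2, so b = sqrt(var / 2)
    matches the variance of the Gaussian model above.
    """
    total = 0.0
    for xi, m, v in zip(x, mean, var):
        b = math.sqrt(v / 2.0)
        total += -math.log(2 * b) - abs(xi - m) / b
    return total

def state_posteriors(x, states, logpdf):
    """Posterior weight of each state for a single frame.

    `states` is a list of (prior_weight, mean, var) tuples. Transition
    probabilities are ignored here; a full HMM would use the forward recursion.
    """
    logs = [math.log(w) + logpdf(x, mean, var) for w, mean, var in states]
    peak = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in logs]
    norm = sum(exps)
    return [e / norm for e in exps]

# Hypothetical two-state model over a 2-dimensional feature vector.
states = [(0.5, [0.0, 0.0], [1.0, 1.0]),
          (0.5, [3.0, 3.0], [1.0, 1.0])]
frame = [0.2, -0.1]
gauss_weights = state_posteriors(frame, states, gaussian_logpdf)
laplace_weights = state_posteriors(frame, states, laplace_logpdf)
```

In an MMSE estimator of this kind, weights like these would combine the per-state conditional estimates of the clean feature vector. The heavier tails of the Laplace density assign more probability to large-magnitude coefficients than a Gaussian of equal variance does.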

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-spr.2014.0032