Non-intrusive speech quality assessment using multi-resolution auditory model features for degraded narrowband speech

A multi-resolution framework based on an auditory perception-based wavelet packet transform is incorporated into a multi-resolution auditory model (MRAM) and used for non-intrusive objective speech quality estimation. The MRAM models the human auditory system in greater time-frequency detail than earlier models used for non-intrusive speech quality estimation. The objective Mean Opinion Score (MOS) of a degraded narrowband speech utterance is estimated with a Gaussian Mixture Model (GMM) probabilistic approach applied to an MRAM-based feature vector. For comparison with the MRAM features, features based on a recent auditory model (Lyon's auditory model), mel-frequency cepstral coefficients (MFCC), and line spectral frequencies (LSF) are each used independently. A combination of MFCC and LSF features with MRAM features, again using the GMM probabilistic approach, is also proposed and investigated. The performance of these feature vectors is evaluated and compared with ITU-T Recommendation P.563 and a recently published method by computing the correlation coefficient and the root-mean-square error between the subjective MOS and the estimated objective MOS. The proposed method, which combines MRAM, MFCC, and LSF features for non-intrusive speech quality estimation, performs better than both of these reference algorithms.
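The two evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes that the correlation coefficient is the Pearson correlation and that both measures are computed directly over paired per-condition scores; the abstract does not state either detail. The function names and the sample scores are hypothetical and are not taken from the paper.

```python
import math


def pearson_correlation(subjective, objective):
    """Pearson correlation between subjective and estimated objective MOS."""
    n = len(subjective)
    if n != len(objective) or n < 2:
        raise ValueError("need two equally sized lists with at least 2 scores")
    mean_s = sum(subjective) / n
    mean_o = sum(objective) / n
    # Covariance and variances (unnormalised; the 1/n factors cancel).
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(subjective, objective))
    var_s = sum((s - mean_s) ** 2 for s in subjective)
    var_o = sum((o - mean_o) ** 2 for o in objective)
    if var_s == 0 or var_o == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_s * var_o)


def rmse(subjective, objective):
    """Root-mean-square error between subjective and estimated objective MOS."""
    n = len(subjective)
    if n != len(objective) or n == 0:
        raise ValueError("need two equally sized, non-empty lists")
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(subjective, objective)) / n)


# Hypothetical scores for illustration only (not data from the paper).
subjective_mos = [1.0, 2.0, 3.0, 4.0]
objective_mos = [1.5, 2.5, 3.5, 4.5]

print("correlation:", pearson_correlation(subjective_mos, objective_mos))
print("rmse:", rmse(subjective_mos, objective_mos))
```

A higher correlation and a lower root-mean-square error both indicate closer agreement between the estimated and the subjective scores. The sample data also show why both are reported: a constant offset leaves the correlation unchanged while still contributing to the error.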

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-spr.2014.0214