Sparse representation-based quasi-clean speech construction for speech quality assessment under complex environments

A non-intrusive speech quality assessment method for complex environments was proposed. In the proposed approach, a new sparse representation-based speech reconstruction algorithm was presented to obtain quasi-clean speech from the noisy degraded signal. First, an over-complete dictionary of the clean speech power spectrum was learned with the K-singular value decomposition (K-SVD) algorithm. Then, in the sparse representation stage, the stopping residual error was set adaptively from the estimated cross-correlation and the noise spectrum, the latter adjusted by an a posteriori SNR-weighted factor, and orthogonal matching pursuit (OMP) was applied to reconstruct the clean speech spectrum from the noisy speech. The quasi-clean speech served as the reference input to a modified PESQ perceptual model, and the mean opinion score of the noisy degraded speech was obtained by estimating the distortions between the quasi-clean speech and the degraded speech. Experimental results show that the proposed approach obtains a correlation coefficient of 0.925 on the NOIZEUS complex environment database, a level 99% similar to the performance of the intrusive standard ITU-T PESQ, and outperforms the non-intrusive standard ITU-T P.563 by 7.1%.
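The sparse representation stage can be illustrated with a minimal sketch of orthogonal matching pursuit that stops once the residual norm falls below a threshold. This is not the authors' implementation: the abstract does not give the formula for the adaptive stopping residual, so the `eps` argument here is only a placeholder for it, and the function name `omp_residual_stop` and its parameters are hypothetical. The dictionary `D` is assumed to have been learned beforehand (for example with K-SVD), with one atom per column.

```python
import numpy as np


def omp_residual_stop(D, y, eps, max_atoms=None):
    """Greedy OMP: keep adding atoms until ||y - D x||_2 <= eps.

    D   : (n_features, n_atoms) dictionary, one atom per column (assumed given)
    y   : (n_features,) observed spectrum frame
    eps : stopping residual norm; a stand-in for the adaptive threshold
          described in the abstract, whose exact form is not stated there
    """
    n_atoms = D.shape[1]
    if max_atoms is None:
        max_atoms = min(D.shape)

    x = np.zeros(n_atoms)
    support = []                      # indices of the atoms selected so far
    coef = np.zeros(0)
    residual = np.asarray(y, dtype=float).copy()

    while np.linalg.norm(residual) > eps and len(support) < max_atoms:
        # Select the unused atom most correlated with the current residual.
        corr = np.abs(D.T @ residual)
        if support:
            corr[support] = 0.0
        support.append(int(np.argmax(corr)))

        # Refit all selected atoms jointly by least squares, then update
        # the residual (this joint refit is the "orthogonal" step).
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef

    if support:
        x[support] = coef
    return x
```

With a larger `eps` (a higher assumed noise level) the loop stops earlier and keeps fewer atoms, so less of the noise is reconstructed; with `eps` near zero the observation is fitted almost exactly. For instance, with `D = np.eye(4)` and `y = np.array([3.0, 0.0, 2.0, 0.0])`, a threshold of `2.5` keeps only the first atom, while `1e-9` recovers both non-zero coefficients.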


