Performance improvement of monaural speech separation system using image analysis techniques

This research work proposes an image analysis-based algorithm that enhances the time–frequency (TF) mask obtained in the initial segmentation of a CASA-based monaural speech separation system, so as to improve speech quality and intelligibility. The algorithm consists of labelling the initial segmentation mask, boundary extraction, active pixel detection, and elimination of the non-active pixels related to noise. In the labelling step, the TF mask is separated into a periodicity pixel (P) matrix and a non-periodicity pixel (NP) matrix. Speech boundaries are then created by connecting all possible nearby units of the P and NP matrices. Some speech boundaries may enclose noisy TF units as holes; these holes are treated by the proposed algorithm. The proposed algorithm is evaluated with quality and intelligibility measures such as signal-to-noise ratio (SNR), perceptual evaluation of speech quality (PESQ), percentage of energy loss (P_EL), percentage of noise residue (P_NR), coherence speech intelligibility index (CSII), normalised covariance metric (NCM), and short-time objective intelligibility (STOI). The experimental results show that the proposed algorithm improves speech quality, increasing the SNR by an average of 9.91 dB and reducing the noise residue by an average of 25.6%, and also improves speech intelligibility in terms of CSII, NCM, and STOI when compared with the input noisy speech mixture.
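The labelling, boundary-extraction, and hole-treatment steps map naturally onto standard binary-image morphology. The sketch below illustrates the general idea in Python, assuming the TF mask is a 2-D boolean array; the function name refine_tf_mask, the min_region_size threshold, and the use of scipy.ndimage connected-component labelling and hole filling are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy import ndimage

def refine_tf_mask(mask, min_region_size=20):
    """Sketch of morphological post-processing of a binary TF mask.

    mask: 2-D boolean array (frequency channels x time frames);
    True marks periodicity (P) units, False non-periodicity (NP) units.
    """
    # Group connected speech-dominated TF units into candidate regions,
    # loosely analogous to the boundary-extraction step.
    labelled, n_regions = ndimage.label(mask)

    # Discard small isolated regions: treated here as the non-active
    # (noise-related) pixels to be eliminated.
    sizes = ndimage.sum(mask, labelled, index=np.arange(1, n_regions + 1))
    keep = np.zeros(n_regions + 1, dtype=bool)
    keep[1:] = sizes >= min_region_size
    cleaned = keep[labelled]

    # Fill holes: noisy TF units fully enclosed by a speech boundary
    # are re-labelled as speech, as in the hole-treatment step.
    return ndimage.binary_fill_holes(cleaned)

# Usage on a toy mask (frequency x time):
mask = np.zeros((8, 10), dtype=bool)
mask[2:6, 2:8] = True      # a speech-dominated region...
mask[4, 5] = False         # ...with a noise hole inside it
mask[0, 0] = True          # an isolated noise blob
print(refine_tf_mask(mask))
```

In this sketch, region size stands in for the active-pixel test: small disconnected blobs are removed as noise, while holes enclosed by a retained speech boundary are filled back in.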
