Voicing detection based on adaptive aperiodicity thresholding for speech enhancement in non-stationary noise

In this study, the authors present a novel voicing detection algorithm that employs the well-known aperiodicity measure to detect voiced speech in signals contaminated with non-stationary noise. The method computes a signal-adaptive decision threshold that accounts for the current noise level, so voicing can be detected by direct comparison with the extracted aperiodicity. The threshold is updated at each frame from a simple estimate of the current noise power and therefore adapts to fluctuating noise conditions. Once the aperiodicity is computed, the method requires only a small number of operations, and it can be implemented on resource-constrained devices (such as hearing aids) if an efficient approximation of the difference function is used to extract the aperiodicity. Evaluation over a database of speech sentences degraded by several types of noise shows that the proposed voicing classifier is robust across noise types and signal-to-noise ratios. In addition, to evaluate the applicability of the method to speech enhancement, a simple F0-based speech enhancement algorithm integrating the proposed classifier is implemented. The system achieves competitive results, in terms of objective measures, when compared with other well-known speech enhancement approaches.
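
As a rough illustration of the kind of processing the abstract describes, the sketch below extracts a YIN-style cumulative-mean-normalised difference function as the aperiodicity measure and compares it against a threshold that is raised as the estimated noise power grows. The noise tracker, the constants base_thr and slope, and the threshold formula are illustrative assumptions for this sketch, not the paper's actual rule.

```python
import numpy as np

def aperiodicity(frame, fs, f0_min=60.0, f0_max=400.0):
    """Aperiodicity of one frame via the YIN cumulative-mean-normalised
    difference function. Low values indicate a periodic (voiced) frame."""
    tau_min = int(fs / f0_max)
    tau_max = int(fs / f0_min)
    frame = frame - np.mean(frame)
    d = np.zeros(tau_max + 1)
    for tau in range(1, tau_max + 1):              # difference function d(tau)
        diff = frame[:-tau] - frame[tau:]
        d[tau] = np.dot(diff, diff)
    cum = np.cumsum(d[1:])
    cmndf = d[1:] * np.arange(1, tau_max + 1) / np.maximum(cum, 1e-12)
    return float(np.min(cmndf[tau_min - 1:]))      # restrict to plausible lags

def voiced_frames(x, fs, frame_len=0.032, hop=0.010,
                  alpha=0.95, base_thr=0.35, slope=0.5):
    """Frame-wise voicing decisions with an adaptive aperiodicity threshold.
    The noise power is tracked as a crude smoothed minimum of the frame
    power (a stand-in for a proper minimum-statistics estimator), and the
    threshold is raised with the estimated noise-to-signal ratio so that
    voiced frames are still caught in heavier noise. base_thr and slope
    are illustrative constants, not values from the paper."""
    N, H = int(frame_len * fs), int(hop * fs)
    noise_pow = None
    decisions = []
    for start in range(0, len(x) - N + 1, H):
        frame = x[start:start + N]
        p = np.mean(frame ** 2) + 1e-12            # current frame power
        noise_pow = p if noise_pow is None else min(
            alpha * noise_pow + (1 - alpha) * p, p)
        nsr = min(noise_pow / p, 1.0)              # estimated noise-to-signal ratio
        threshold = base_thr + slope * nsr         # more permissive in heavy noise
        decisions.append(aperiodicity(frame, fs) < threshold)
    return decisions
```

The per-frame work beyond the difference function is a handful of multiplications and a comparison, which is consistent with the abstract's claim that the classifier is cheap once the aperiodicity has been computed.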
