
Robust speech recognition using harmonic features

IET Signal Processing

In this study, the authors propose a speech recognition system that uses harmonic-structure information to detect harmonic features in noisy environments. The proposed algorithm first extracts the harmonic components of the speech signal by convolving it with a sine function. Setting the frequency of the sine function equal to the fundamental frequency of the speech signal allows the harmonic components to be extracted. The signal reconstructed by summing the extracted harmonic components is found to correlate highly with the original signal. The frame energy of the harmonic components is then processed further into dynamic harmonic features, which are used together with either the European Telecommunications Standards Institute (ETSI) front-end processed mel-frequency cepstral coefficient (MFCC) features or the perceptual linear prediction (PLP) features in the speech recognition system. The proposed enhanced system achieves a higher recognition rate than a speech recognition system based on the ETSI front-end processed MFCC (or PLP) features alone.
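The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact form of the sine function convolution, so the sketch assumes it can be approximated by correlating each frame with a sine/cosine pair at multiples of a known fundamental frequency. All function names (`extract_harmonics`, `reconstruct`, `pearson`, `harmonic_log_energy`, `delta`), the choice of three harmonics, the frame length, and the first-order difference used as a "dynamic" feature are assumptions made for illustration only.

```python
import math


def extract_harmonics(frame, f0, fs, n_harmonics):
    """Return the first n_harmonics components of one frame, one list each."""
    size = len(frame)
    components = []
    for k in range(1, n_harmonics + 1):
        w = 2.0 * math.pi * k * f0 / fs
        sin_k = [math.sin(w * n) for n in range(size)]
        cos_k = [math.cos(w * n) for n in range(size)]
        # Correlate the frame with a sine/cosine pair at k * f0 (assumed
        # reading of the method). The 2/size scaling is exact only when the
        # frame spans a whole number of pitch periods.
        a = 2.0 * sum(x * s for x, s in zip(frame, sin_k)) / size
        b = 2.0 * sum(x * c for x, c in zip(frame, cos_k)) / size
        components.append([a * s + b * c for s, c in zip(sin_k, cos_k)])
    return components


def reconstruct(components):
    """Sum the harmonic components sample by sample."""
    return [sum(samples) for samples in zip(*components)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


def harmonic_log_energy(signal, f0, fs, frame_len, n_harmonics=3):
    """Log energy of the reconstructed harmonic signal, one value per frame."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        recon = reconstruct(extract_harmonics(frame, f0, fs, n_harmonics))
        energies.append(math.log(sum(v * v for v in recon) + 1e-12))
    return energies


def delta(values):
    """First-order difference, a simple stand-in for a dynamic feature."""
    return [values[i] - values[i - 1] for i in range(1, len(values))]


# Synthetic voiced signal: f0 = 200 Hz at fs = 8 kHz, so a 400-sample frame
# holds exactly 10 pitch periods (values chosen for illustration).
fs, f0, frame_len = 8000, 200.0, 400
clean = [math.sin(2 * math.pi * f0 * n / fs)
         + 0.5 * math.cos(2 * math.pi * 2 * f0 * n / fs)
         for n in range(4 * frame_len)]
recon = reconstruct(extract_harmonics(clean[:frame_len], f0, fs, 3))
print(pearson(recon, clean[:frame_len]))
print(delta(harmonic_log_energy(clean, f0, fs, frame_len)))
```

In a real front end the fundamental frequency would have to be estimated per frame and would vary over time, and the resulting harmonic features would be appended to the MFCC or PLP feature vector. The sketch fixes `f0` only to keep the example short.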

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-spr.2013.0094