Spoken-word recognition using dynamic features analysed by two-dimensional cepstrum

IEE Proceedings I (Communications, Speech and Vision)

This paper describes two-dimensional cepstrum (TDC) analysis and its application to word and monosyllable recognition. The TDC can simultaneously represent several kinds of information contained in the speech waveform: static and dynamic features, as well as global and fine frequency structure. Noise reduction and speech enhancement can be performed easily using the TDC. Word and monosyllable recognition experiments based on dynamic programming (DP) matching of a time sequence of the TDC confirm that the global static features (spectral envelope) and the global dynamic features are both effective for speech recognition. A speaker-independent word recognition algorithm for noisy speech is also proposed, which recognises words by the similarity of their dynamic features. The algorithm employs linear matching instead of nonlinear DP matching, requires only a small amount of memory, and achieves high speed and high accuracy in recognition. At present, the recognition rate is 89.0% at ∞ dB and 70.0% at 0 dB signal-to-noise ratio.
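
The contrast between linear matching and nonlinear DP matching mentioned in the abstract can be illustrated with a minimal Python sketch. This is a generic textbook illustration, not the authors' algorithm: the function names (`frame_distance`, `linear_match`, `dp_match`), the Euclidean frame distance, the unweighted DP step pattern, and the toy one-dimensional "features" are all assumptions made for the example. The abstract does not specify how the TDC features are compared.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def linear_match(ref, test):
    """Linear matching: align the two sequences by uniform time scaling.

    Each test frame j is compared with the reference frame whose index
    is j scaled linearly onto the reference time axis. Cost is O(m).
    """
    n, m = len(ref), len(test)
    total = 0.0
    for j in range(m):
        # Uniformly scaled index into the reference sequence.
        i = round(j * (n - 1) / (m - 1)) if m > 1 else 0
        total += frame_distance(ref[i], test[j])
    return total


def dp_match(ref, test):
    """Nonlinear DP matching (basic dynamic time warping).

    Fills an (n+1) x (m+1) table of accumulated distances, allowing
    horizontal, vertical, and diagonal steps. Cost is O(n * m) in both
    time and memory, which is what linear matching avoids.
    """
    n, m = len(ref), len(test)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(ref[i - 1], test[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # reference advances
                                 cost[i][j - 1],      # test advances
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# Toy example (hypothetical data): the test pattern dwells on the first
# "sound" for four frames, i.e. a non-uniform stretch of the reference.
ref = [[0.0], [1.0], [2.0]]
test = [[0.0], [0.0], [0.0], [0.0], [1.0], [2.0]]

print(dp_match(ref, test))      # 0.0: DP warps the time axis to fit
print(linear_match(ref, test))  # 3.0: uniform scaling misaligns frames
```

The sketch shows the trade-off only in general terms: DP matching absorbs non-uniform timing differences at quadratic cost, while linear matching is cheaper in time and memory but depends on the features themselves being robust to timing variation.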

http://iet.metastore.ingenta.com/content/journals/10.1049/ip-i-2.1989.0017