Modelling speech signals using formant frequencies as an intermediate representation

IET Signal Processing

This paper considers multiple-level segmental hidden Markov models (M-SHMMs), in which the relationship between symbolic and acoustic representations of speech is regulated by a formant-based intermediate representation. New TIMIT phone recognition results are presented, confirming that the theoretical upper bound on performance is achieved provided that either the intermediate representation or the formant-to-acoustic mapping is sufficiently rich. The way in which M-SHMMs exploit formant-based information is also investigated, using singular value decomposition of the formant-to-acoustic mappings and linear discriminant analysis. The analysis shows that if the intermediate layer contains information that is linearly related to the spectral representation, that information is used in preference to explicit formant frequencies, even though the latter are useful for phone discrimination. In summary, although these results confirm the utility of M-SHMMs for automatic speech recognition, they also provide empirical evidence of the value of nonlinear formant-to-acoustic mappings.
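
As a rough illustration of how a singular value decomposition can show which intermediate-layer features a linear mapping relies on, the following sketch decomposes a small matrix. Everything in it is assumed for illustration only: the matrix `W`, its dimensions, the choice of three formant-like columns plus one spectrum-related column, and the helper name `dominant_input_directions` are hypothetical and are not taken from the paper.

```python
import numpy as np


def dominant_input_directions(W, k=1):
    """Return the k largest singular values of W and the matching
    right singular vectors (directions in the intermediate space)."""
    # Rows of Vt are directions in the input (intermediate) space,
    # ordered by decreasing singular value.
    _, s, Vt = np.linalg.svd(W, full_matrices=False)
    return s[:k], Vt[:k]


# Hypothetical linear mapping from a 4-dimensional intermediate layer
# to a 3-dimensional acoustic vector (values are made up).
# Columns 0-2 stand in for formant frequencies; column 3 stands in for
# a feature that is linearly related to the spectral representation.
W = np.array([
    [0.1, 0.0, 0.0, 2.0],
    [0.0, 0.1, 0.0, 2.0],
    [0.0, 0.0, 0.1, 2.0],
])

s, Vt = dominant_input_directions(W, k=1)

# Absolute loading of each intermediate feature on the dominant direction.
weights = np.abs(Vt[0])
most_used_feature = int(np.argmax(weights))
print("largest singular value:", s[0])
print("feature with the largest loading:", most_used_feature)
```

In this toy matrix the dominant right singular vector loads mainly on the last column, which is the kind of pattern that would indicate a mapping drawing on the spectrum-related feature more than on the formant-like ones.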

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-spr_20060179