© The Institution of Engineering and Technology
Multiple-level segmental hidden Markov models (M-SHMMs), in which the relationship between symbolic and acoustic representations of speech is regulated by a formant-based intermediate representation, are considered. New phone recognition results on TIMIT are presented. They confirm that the theoretical upper bound on performance is achieved provided that either the intermediate representation or the formant-to-acoustic mapping is sufficiently rich. The way in which M-SHMMs exploit formant-based information is also investigated, using singular value decomposition (SVD) of the formant-to-acoustic mappings and linear discriminant analysis (LDA). The analysis shows that if the intermediate layer contains information that is linearly related to the spectral representation, that information is used in preference to explicit formant frequencies, even though the latter are useful for phone discrimination. In summary, these results confirm the utility of M-SHMMs for automatic speech recognition, but they also provide empirical evidence of the value of nonlinear formant-to-acoustic mappings.
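The SVD analysis of the formant-to-acoustic mappings can be illustrated with a short sketch. It assumes, for illustration only, that one mapping is a single linear matrix W taking a d-dimensional intermediate (formant-based) vector to a D-dimensional acoustic vector. The right singular vectors of W then show which intermediate-layer directions the mapping amplifies most. The function name `dominant_input_directions`, the feature names, the 12-dimensional acoustic vector and all weights are hypothetical; this is not the paper's code or data, and not necessarily the exact analysis the authors performed.

```python
import numpy as np


def dominant_input_directions(W, names):
    """Rank intermediate-layer directions by how strongly a linear map amplifies them.

    W is a (D x d) matrix mapping a d-dimensional intermediate vector f
    to a D-dimensional acoustic vector y = W @ f. Returns a list of
    (singular value, {feature name: share}) pairs, largest singular value first.
    """
    if W.shape[1] != len(names):
        raise ValueError("need one name per column of W")
    # Thin SVD: W = U @ diag(s) @ Vt, with s sorted in descending order.
    _, s, Vt = np.linalg.svd(W, full_matrices=False)
    ranked = []
    for sigma, v in zip(s, Vt):
        # Each row of Vt is a unit vector, so its squared components sum to 1
        # and can be read as the share of each intermediate feature.
        shares = {name: float(c ** 2) for name, c in zip(names, v)}
        ranked.append((float(sigma), shares))
    return ranked


# Synthetic example (invented numbers, not data from the paper):
# three formant-like inputs with weak linear effects on a 12-dimensional
# acoustic vector, plus one hypothetical feature "aux" with a strong one.
names = ["F1", "F2", "F3", "aux"]
rng = np.random.default_rng(0)
# Broadcasting scales each column of the 12 x 4 matrix by its own factor.
W = rng.normal(size=(12, 4)) * np.array([0.1, 0.1, 0.1, 1.0])

for sigma, shares in dominant_input_directions(W, names):
    top = max(shares, key=shares.get)
    print(f"singular value {sigma:.3f}: dominated by {top} ({shares[top]:.2f})")
```

With these synthetic weights the largest singular value is associated almost entirely with the hypothetical "aux" feature. This is only a toy analogue of the abstract's finding that linearly related information in the intermediate layer is exploited in preference to formant frequencies.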
http://iet.metastore.ingenta.com/content/journals/10.1049/iet-spr_20060179