With a view to using an articulatory representation in automatic recognition of conversational speech, two nonlinear methods for mapping from formants to short-term spectra were investigated: multilayer perceptrons (MLPs) and radial basis function (RBF) networks. Five schemes for dividing the TIMIT data according to phone class were tested. The r.m.s. error of the RBF networks was 10% less than that of the MLPs, and the scheme based on discrete articulatory regions gave the greatest improvement over a single network.
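As an illustration of the kind of mapping described above, the following is a minimal sketch of an RBF network regression, not the authors' implementation. It assumes Gaussian basis functions, centres drawn at random from the training inputs, and output weights fitted by linear least squares. The data are synthetic stand-ins (not TIMIT), and all names (`fit_rbf`, `predict_rbf`, `rms_error`, the two-way `labels` split) are hypothetical.

```python
import numpy as np

def rbf_design(X, centres, width):
    """Gaussian basis activations for each input, plus a bias column."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    phi = np.exp(-d2 / (2.0 * width ** 2))
    return np.hstack([phi, np.ones((X.shape[0], 1))])

def fit_rbf(X, Y, n_centres=20, width=1.0, seed=0):
    """Pick centres from the training inputs, then solve for output weights."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=n_centres, replace=False)
    centres = X[idx]
    W, *_ = np.linalg.lstsq(rbf_design(X, centres, width), Y, rcond=None)
    return centres, W

def predict_rbf(X, centres, W, width=1.0):
    return rbf_design(X, centres, width) @ W

def rms_error(Y_true, Y_pred):
    return float(np.sqrt(np.mean((Y_true - Y_pred) ** 2)))

# Synthetic stand-ins (assumption): 3 "formant" inputs, 2 "spectral" outputs.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(200, 3))
Y = np.stack([np.sin(X.sum(axis=1)), np.cos(X[:, 0] * X[:, 1])], axis=1)

# Single network trained on all the data.
centres, W = fit_rbf(X, Y, n_centres=30, width=0.5)
err_single = rms_error(Y, predict_rbf(X, centres, W, width=0.5))

# One network per class, using a hypothetical two-class split of the data.
labels = (X[:, 0] > 0).astype(int)
pred = np.empty_like(Y)
for c in np.unique(labels):
    m = labels == c
    cc, Wc = fit_rbf(X[m], Y[m], n_centres=15, width=0.5, seed=int(c))
    pred[m] = predict_rbf(X[m], cc, Wc, width=0.5)
err_split = rms_error(Y, pred)

print(f"single network r.m.s. error: {err_single:.4f}")
print(f"per-class networks r.m.s. error: {err_split:.4f}")
```

Which of the two configurations gives the lower error depends on the data and on how the classes are defined; the sketch only shows how a per-class scheme can be compared with a single network using the same r.m.s. measure.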
http://iet.metastore.ingenta.com/content/journals/10.1049/el_20020436