© The Institution of Engineering and Technology
The application of probabilistic linear discriminant analysis (PLDA) to speaker adaptation for automatic speech recognition based on hidden Markov models is proposed. The set of acoustic models of each training speaker is expressed as a matrix, and each column of that matrix is treated as a sample. This overcomes the small-sample problem that PLDA can encounter when only one sample is available for each training speaker. In continuous speech recognition experiments, the PLDA-based approach outperforms both the principal component analysis (PCA)-based approach and the two-dimensional PCA-based approach for adaptation data longer than 12 s.
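The column-as-sample idea can be illustrated with a minimal sketch. It is not the authors' implementation: the matrix layout (rows as feature dimensions, columns as model components), the sizes, the random data, and the helper names `columns_as_samples` and `within_speaker_scatter` are all assumptions made for illustration. The sketch only shows why a single vector per speaker leaves no within-speaker variability to estimate, whereas several columns per speaker do.

```python
import numpy as np


def columns_as_samples(speaker_matrices):
    """Turn each column of each speaker's matrix into one labelled sample."""
    samples, labels = [], []
    for speaker_id, matrix in enumerate(speaker_matrices):
        samples.append(matrix.T)  # shape: (n_columns, dim)
        labels.append(np.full(matrix.shape[1], speaker_id))
    return np.vstack(samples), np.concatenate(labels)


def within_speaker_scatter(X, y):
    """Sum of per-speaker scatter matrices around each speaker's mean."""
    dim = X.shape[1]
    scatter = np.zeros((dim, dim))
    for speaker_id in np.unique(y):
        centered = X[y == speaker_id] - X[y == speaker_id].mean(axis=0)
        scatter += centered.T @ centered
    return scatter


# Hypothetical sizes and random stand-in data (assumptions, not from the paper).
rng = np.random.default_rng(0)
n_speakers, dim, n_columns = 4, 5, 30
speaker_matrices = [rng.normal(size=(dim, n_columns)) for _ in range(n_speakers)]

# One sample per speaker: the whole model flattened into a single long vector.
X_single = np.stack([m.ravel() for m in speaker_matrices])
y_single = np.arange(n_speakers)
S_w_single = within_speaker_scatter(X_single, y_single)

# Many samples per speaker: each matrix column is its own sample.
X_cols, y_cols = columns_as_samples(speaker_matrices)
S_w_cols = within_speaker_scatter(X_cols, y_cols)

print("single-vector within-speaker scatter is zero:", np.allclose(S_w_single, 0))
print("column samples per speaker:", n_columns, "of dimension", dim)
print("rank of column-based within-speaker scatter:", np.linalg.matrix_rank(S_w_cols))
```

With one vector per speaker, every sample equals its own speaker mean, so the within-speaker scatter is exactly zero. Splitting each model into columns gives each speaker multiple lower-dimensional samples, from which a within-speaker term can be estimated.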
http://iet.metastore.ingenta.com/content/journals/10.1049/el.2013.2223