Speaker adaptation using probabilistic linear discriminant analysis for continuous speech recognition

The application of probabilistic linear discriminant analysis (PLDA) to speaker adaptation for automatic speech recognition based on hidden Markov models is proposed. The set of acoustic models of each training speaker is expressed as a matrix, and each column of that matrix is treated as a sample; this overcomes the small sample problem that PLDA can encounter when only one sample is available for each training speaker. In continuous speech recognition experiments, the PLDA-based approach outperforms the principal component analysis (PCA)-based approach and the two-dimensional PCA-based approach for adaptation data longer than 12 s.
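
The column-as-sample idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the matrix layout, so the following is an assumption for illustration only: each training speaker's model is taken to be a D x M matrix whose M columns are D-dimensional Gaussian mean vectors. The function name `speaker_matrices_to_samples` and the toy sizes are hypothetical and are not taken from the paper. Under that assumption, S speakers yield S*M labelled samples, so each speaker contributes M samples to PLDA training rather than a single one.

```python
import numpy as np


def speaker_matrices_to_samples(speaker_matrices):
    """Turn per-speaker model matrices into labelled column samples.

    speaker_matrices: array-like of shape (S, D, M), one D x M matrix per
    training speaker (assumed layout: columns are mean vectors).
    Returns (samples, labels) with shapes (S*M, D) and (S*M,).
    """
    mats = np.asarray(speaker_matrices, dtype=float)
    if mats.ndim != 3:
        raise ValueError("expected shape (speakers, dim, columns)")
    n_speakers, dim, n_cols = mats.shape
    # Put the column axis next to the speaker axis, then flatten the two,
    # so that row s * M + m holds column m of speaker s.
    samples = mats.transpose(0, 2, 1).reshape(n_speakers * n_cols, dim)
    # Every column inherits the identity of the speaker it came from.
    labels = np.repeat(np.arange(n_speakers), n_cols)
    return samples, labels


# Toy data only: 4 speakers, 3-dimensional means, 5 columns per speaker.
rng = np.random.default_rng(0)
toy = rng.normal(size=(4, 3, 5))
samples, labels = speaker_matrices_to_samples(toy)
print(samples.shape, labels.shape)  # (20, 3) (20,)
```

With one supervector per speaker, the same toy data would give only 4 samples; the column view gives 20, five per speaker, which is the sense in which the small sample problem is relieved. Fitting the PLDA model itself is outside the scope of this sketch.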

Inspec keywords: speech recognition; hidden Markov models; statistical analysis; probability

Other keywords: speaker adaptation; automatic speech recognition; principal component analysis; small sample problem; continuous speech recognition; two-dimensional PCA based approach; probabilistic linear discriminant analysis; hidden Markov models; PLDA based approach

Subjects: Markov processes; Speech processing techniques; Speech recognition and synthesis

http://iet.metastore.ingenta.com/content/journals/10.1049/el.2013.2223