Skew Gaussian mixture models for speaker recognition

Avi Matza; Yuval Bistritz

Author(s): Avi Matza¹ and Yuval Bistritz¹
Affiliations: ¹ School of Electrical Engineering, Tel-Aviv University
Source: Volume 8, Issue 8, October 2014, pp. 860–867
DOI: 10.1049/iet-spr.2013.0270, Print ISSN 1751-9675, Online ISSN 1751-9683

Received 05/07/2013, Revised 04/03/2014, Accepted 21/03/2014, Published 28/10/2014

Gaussian mixture models (GMMs) are widely used in speech and speaker recognition. This study explores the idea that a mixture of skew Gaussians might better capture feature vectors that tend to have skewed empirical distributions. It begins by deriving an expectation maximisation (EM) algorithm to train a mixture of two-piece skew Gaussians, which turns out to be not much more complicated than the usual EM algorithm used to train symmetric GMMs. Next, the algorithm is used to compare skew and symmetric GMMs in some simple speaker recognition experiments that use Mel frequency cepstral coefficients (MFCC) and line spectral frequencies (LSF) as the feature vectors. MFCC are among the most popular feature vectors in speech and speaker recognition applications. LSF were chosen because they exhibit significantly more skewed distributions than MFCC and because they are widely used [together with the related immittance spectral frequencies (ISF)] in speech transmission standards. In the reported experiments, models with skew Gaussians performed better than models with symmetric Gaussians, and skew GMMs with LSF compared favourably with both skew and symmetric GMMs that used MFCC.
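To make the notion of a two-piece skew Gaussian concrete, the following is a minimal sketch of the standard univariate two-piece (split) normal density and of a mixture built from it. The abstract does not give the paper's exact parameterisation, so the function names and the parameters `mu`, `sigma_left` and `sigma_right` are assumptions chosen for illustration. The sketch covers density evaluation only, not the EM training algorithm derived in the paper.

```python
import math

def two_piece_normal_pdf(x, mu, sigma_left, sigma_right):
    """Univariate two-piece normal density with mode mu (illustrative parameterisation).

    A half-Gaussian with scale sigma_left is used below the mode and one with
    scale sigma_right above it. Both halves share one normalising constant,
    so the density is continuous at mu and integrates to 1.
    """
    norm = math.sqrt(2.0 / math.pi) / (sigma_left + sigma_right)
    sigma = sigma_left if x < mu else sigma_right
    return norm * math.exp(-0.5 * ((x - mu) / sigma) ** 2)

def skew_mixture_pdf(x, weights, mus, sigmas_left, sigmas_right):
    """Density of a mixture of two-piece normals; weights are assumed to sum to 1."""
    return sum(
        w * two_piece_normal_pdf(x, m, sl, sr)
        for w, m, sl, sr in zip(weights, mus, sigmas_left, sigmas_right)
    )

if __name__ == "__main__":
    # A right-skewed component: narrow below the mode, wide above it.
    print(two_piece_normal_pdf(1.5, mu=1.0, sigma_left=0.5, sigma_right=2.0))
    # A two-component skew mixture evaluated at a single point.
    print(skew_mixture_pdf(0.0, [0.3, 0.7], [-1.0, 2.0], [1.0, 0.5], [0.5, 1.5]))
```

When `sigma_left` equals `sigma_right`, the density reduces to the ordinary Gaussian, which is why a mixture of such components generalises the symmetric GMM at the cost of one extra scale parameter per component and dimension.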

