Using Jitter and Shimmer in speaker verification
- Author(s): M. Farrús and J. Hernando
- DOI: 10.1049/iet-spr.2008.0147
- Affiliation: TALP Research Centre, Department of Signal Theory and Communications, Universitat Politècnica de Catalunya, Barcelona, Spain
- Source: Volume 3, Issue 4, July 2009, p. 247–257
- Print ISSN 1751-9675, Online ISSN 1751-9683
Jitter and shimmer measure the cycle-to-cycle variations of the fundamental frequency and of the amplitude, respectively. Both features have been widely used to describe pathological voices, and since they characterise particular aspects of individual voices, they are expected to carry a certain degree of speaker specificity. In the current work, jitter and shimmer are successfully used in a speaker verification experiment. Moreover, both measures are combined with spectral and prosodic features using several types of normalisation and fusion techniques in order to obtain better verification results. The overall speaker verification system is also improved by using histogram equalisation as a normalisation technique prior to fusing the features by means of support vector machines.
Inspec keywords: speaker recognition; support vector machines; jitter
Subjects: Knowledge engineering techniques; Speech recognition and synthesis; Speech processing techniques
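To make the abstract's definition of jitter and shimmer as cycle-to-cycle variations concrete, the following is a minimal sketch in Python. It assumes the commonly used "local" (relative) form of both measures: the mean absolute difference between consecutive cycles divided by the mean value. The abstract does not state which exact variants the authors used, so the function names and this choice of formula are illustrative assumptions. The pitch periods and per-cycle peak amplitudes are assumed to have been extracted beforehand by some pitch-marking tool.

```python
def mean_abs_consecutive_diff(values):
    """Mean absolute difference between each cycle and the previous one."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(cur - prev) for prev, cur in zip(values[:-1], values[1:])]
    return sum(diffs) / len(diffs)


def relative_perturbation(values):
    """Cycle-to-cycle variation normalised by the mean value (dimensionless)."""
    mean_value = sum(values) / len(values)
    if mean_value == 0:
        raise ValueError("mean value is zero; relative measure is undefined")
    return mean_abs_consecutive_diff(values) / mean_value


def local_jitter(periods_s):
    """Assumed definition: relative variation of consecutive pitch periods."""
    return relative_perturbation(periods_s)


def local_shimmer(peak_amplitudes):
    """Assumed definition: relative variation of consecutive peak amplitudes."""
    return relative_perturbation(peak_amplitudes)


if __name__ == "__main__":
    # Hypothetical values for a short voiced segment (not data from the paper).
    periods = [0.0100, 0.0102, 0.0099, 0.0101, 0.0100]  # seconds per cycle
    amplitudes = [0.80, 0.78, 0.82, 0.79, 0.81]         # linear peak amplitude
    print("local jitter :", local_jitter(periods))
    print("local shimmer:", local_shimmer(amplitudes))
```

Both measures are ratios, so they are independent of the unit used for the periods or the amplitudes. A perfectly periodic signal with constant amplitude gives zero for both.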