Using Jitter and Shimmer in speaker verification

Jitter and shimmer measure the cycle-to-cycle variations of the fundamental frequency and of the waveform amplitude, respectively. Both features have been widely used to describe pathological voices, and since they capture characteristics of individual voices, they can be expected to carry a certain degree of speaker specificity. In this work, jitter and shimmer are successfully used in a speaker verification experiment. Both measures are also combined with spectral and prosodic features using several normalisation and fusion techniques to obtain better verification results. The overall speaker verification system is further improved by applying histogram equalisation as a normalisation step prior to fusing the features with support vector machines.
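The definitions above translate directly into the standard "local" (relative) jitter and shimmer measures, and the normalisation step can be illustrated with a rank-based histogram equalisation. The sketch below is a minimal illustration under those assumptions: Praat-style local definitions, a standard-normal target distribution for the equalisation, and hypothetical per-cycle values. It is not the authors' implementation.

```python
"""Minimal sketch of local jitter/shimmer and rank-based histogram
equalisation, assuming Praat-style definitions and a Gaussian target
distribution (assumptions, not the paper's exact implementation)."""

import numpy as np
from scipy.stats import norm, rankdata


def local_jitter(periods):
    """Relative jitter: mean absolute difference between consecutive
    pitch periods, divided by the mean period."""
    periods = np.asarray(periods, dtype=float)
    if periods.size < 2:
        raise ValueError("need at least two pitch periods")
    return np.mean(np.abs(np.diff(periods))) / np.mean(periods)


def local_shimmer(amplitudes):
    """Relative shimmer: mean absolute difference between consecutive
    peak amplitudes, divided by the mean amplitude."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    if amplitudes.size < 2:
        raise ValueError("need at least two amplitude peaks")
    return np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes)


def histogram_equalise(values):
    """Map values to a standard normal reference distribution by passing
    their empirical CDF (ranks) through the Gaussian inverse CDF."""
    values = np.asarray(values, dtype=float)
    u = rankdata(values) / (values.size + 1)  # empirical CDF in (0, 1)
    return norm.ppf(u)


if __name__ == "__main__":
    # Hypothetical per-cycle measurements from a voiced segment,
    # e.g. extracted with a pitch tracker such as Praat.
    periods = np.array([0.0100, 0.0102, 0.0099, 0.0101, 0.0100])  # seconds
    amps = np.array([0.80, 0.78, 0.82, 0.79, 0.81])               # linear peak amplitude
    print(f"local jitter  = {local_jitter(periods):.4f}")
    print(f"local shimmer = {local_shimmer(amps):.4f}")

    # Equalising a small set of raw feature values before fusion.
    print("equalised:", histogram_equalise([0.3, 1.2, 0.7, 2.5, 0.9]))
```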

Inspec keywords: speaker recognition; support vector machines; jitter

Other keywords: shimmer; prosodic feature; pathological voices; normalisation techniques; fusion techniques; support vector machines; speaker verification; histogram equalisation; spectral feature; jitter

Subjects: Knowledge engineering techniques; Speech recognition and synthesis; Speech processing techniques
