Using Jitter and Shimmer in speaker verification

Jitter and shimmer measure the cycle-to-cycle variations of the fundamental frequency and of the waveform amplitude, respectively. Both features have been widely used to describe pathological voices, and since they capture characteristics of individual voices, they can be expected to carry a certain degree of speaker specificity. In this work, jitter and shimmer are successfully used in a speaker verification experiment. Both measures are also combined with spectral and prosodic features using several normalisation and fusion techniques to obtain better verification results. The overall speaker verification system is further improved by applying histogram equalisation as a normalisation step prior to fusing the features with support vector machines.
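The definitions above translate directly into the standard "local" (relative) jitter and shimmer measures, and the normalisation step can be illustrated with a rank-based histogram equalisation. The sketch below is a minimal illustration under those assumptions: Praat-style local definitions, a standard-normal target distribution for the equalisation, and hypothetical per-cycle values. It is not the authors' implementation.

```python
"""Minimal sketch of local jitter/shimmer and rank-based histogram
equalisation, assuming Praat-style definitions and a Gaussian target
distribution (assumptions, not the paper's exact implementation)."""

import numpy as np
from scipy.stats import norm, rankdata


def local_jitter(periods):
    """Relative jitter: mean absolute difference between consecutive
    pitch periods, divided by the mean period."""
    periods = np.asarray(periods, dtype=float)
    if periods.size < 2:
        raise ValueError("need at least two pitch periods")
    return np.mean(np.abs(np.diff(periods))) / np.mean(periods)


def local_shimmer(amplitudes):
    """Relative shimmer: mean absolute difference between consecutive
    peak amplitudes, divided by the mean amplitude."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    if amplitudes.size < 2:
        raise ValueError("need at least two amplitude peaks")
    return np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes)


def histogram_equalise(values):
    """Map values to a standard normal reference distribution by passing
    their empirical CDF (ranks) through the Gaussian inverse CDF."""
    values = np.asarray(values, dtype=float)
    u = rankdata(values) / (values.size + 1)  # empirical CDF in (0, 1)
    return norm.ppf(u)


if __name__ == "__main__":
    # Hypothetical per-cycle measurements from a voiced segment,
    # e.g. extracted with a pitch tracker such as Praat.
    periods = np.array([0.0100, 0.0102, 0.0099, 0.0101, 0.0100])  # seconds
    amps = np.array([0.80, 0.78, 0.82, 0.79, 0.81])               # linear peak amplitude
    print(f"local jitter  = {local_jitter(periods):.4f}")
    print(f"local shimmer = {local_shimmer(amps):.4f}")

    # Equalising a small set of raw feature values before fusion.
    print("equalised:", histogram_equalise([0.3, 1.2, 0.7, 2.5, 0.9]))
```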

Inspec keywords: speaker recognition; support vector machines; jitter

Other keywords: shimmer; prosodic feature; pathological voices; normalisation techniques; fusion techniques; support vector machines; speaker verification; histogram equalisation; spectral feature; jitter

Subjects: Knowledge engineering techniques; Speech recognition and synthesis; Speech processing techniques
