Automatic speaker verification on narrowband and wideband lossy coded clean speech

Substantial progress has been achieved in voice-based biometrics in recent times, but a variety of challenges remain for the speech research community. One such obstacle is reliable speaker authentication from speech signals degraded by lossy compression. Compression is commonplace in modern telecommunications, such as mobile telephony, VoIP services, teleconferencing, voice messaging and gaming. In this study, the authors investigate the effect of lossy speech compression on text-independent speaker verification. Voice biometrics performance is evaluated on clean speech signals distorted by state-of-the-art narrowband (NB) and wideband (WB) speech codecs. The tests are performed in both channel-matched and channel-mismatched scenarios. The results show that coded WB speech lowers the equal error rate by 1–3% compared with coded NB speech, even at the lowest investigated bitrates. It is also shown that the Enhanced Voice Services (EVS) codec does not provide better results than the other codecs involved in this study.
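The equal error rate (EER) used above is the operating point at which the false acceptance rate equals the false rejection rate. As a minimal sketch of how this figure can be estimated from verification scores, the Python function below sweeps every observed score as a decision threshold and reports the point where the two error rates are closest. The function name, the score lists and the convention that higher scores mean "same speaker" are assumptions made for illustration; this is not the authors' evaluation code.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER from two lists of verification scores.

    genuine  -- scores of target (same-speaker) trials
    impostor -- scores of non-target (different-speaker) trials
    Assumes a trial is accepted when its score is >= the threshold.
    """
    if not genuine or not impostor:
        raise ValueError("both score lists must be non-empty")
    best_gap, eer = None, None
    # Try every observed score as a candidate threshold.
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # false acceptance rate
        frr = sum(s < t for s in genuine) / len(genuine)     # false rejection rate
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Hypothetical scores, chosen only to illustrate the calculation.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(genuine_scores, impostor_scores))  # 0.25
```

With only a handful of trials the estimate is coarse; evaluations on real corpora typically interpolate between neighbouring thresholds on the error trade-off curve.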

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-bmt.2016.0119