Speaker verification with short utterances: a review of challenges, trends and opportunities

Automatic speaker verification (ASV) technology now achieves a reasonable level of accuracy in voice-based biometric applications. However, it requires an adequate amount of speech data for enrolment and verification; otherwise, performance degrades considerably. For this reason, the trade-off between convenience and security is difficult to maintain in practical scenarios, and utterance duration remains a critical issue when deploying a voice biometric system in real-world applications. A large body of research has been carried out to address the limited-data issue within the scope of ASV, and work on mitigating the challenges posed by short utterances has risen significantly in recent years. In this study, the authors present an extensive survey of speaker verification with short utterances, covering studies from the recent past as well as the latest research offering various solutions and analyses. The review also summarises the major findings of studies on the duration-variability problem in ASV systems. Finally, the authors discuss a number of possible future directions to promote further research in this field.
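
To make the duration-sensitivity claim concrete, the following is a minimal, self-contained sketch (not taken from any of the surveyed systems) that contrasts GMM-UBM log-likelihood-ratio scores obtained from long and short test segments on synthetic "MFCC-like" features; the feature dimensions, frame counts and model sizes are illustrative assumptions only.

```python
# Toy illustration (synthetic data, not from the reviewed paper): how test-utterance
# duration affects GMM-UBM log-likelihood-ratio scores in a simulated ASV task.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
DIM = 20             # feature dimension (MFCC-like vectors)
FRAMES_LONG = 3000   # roughly 30 s of test speech at 100 frames/s
FRAMES_SHORT = 200   # roughly 2 s of test speech

def speaker_features(offset, n_frames):
    """Synthetic 'speaker': Gaussian features with a speaker-specific mean offset."""
    return rng.normal(loc=offset, scale=1.0, size=(n_frames, DIM))

# Universal background model (UBM) trained on pooled data from several 'speakers'.
ubm_data = np.vstack([speaker_features(rng.normal(0, 0.5, DIM), 2000) for _ in range(10)])
ubm = GaussianMixture(n_components=8, covariance_type='diag', random_state=0).fit(ubm_data)

# Target speaker: enrolment data and a speaker-dependent model.
target_offset = rng.normal(0, 0.5, DIM)
enrol = speaker_features(target_offset, 3000)
spk = GaussianMixture(n_components=8, covariance_type='diag', random_state=0).fit(enrol)

def llr(test):
    """Average per-frame log-likelihood ratio: speaker model vs UBM."""
    return spk.score(test) - ubm.score(test)

# Repeated genuine trials: scores from short test segments are far noisier,
# which is the duration-variability problem the review addresses.
long_scores = [llr(speaker_features(target_offset, FRAMES_LONG)) for _ in range(20)]
short_scores = [llr(speaker_features(target_offset, FRAMES_SHORT)) for _ in range(20)]
print(f"long  trials: mean LLR {np.mean(long_scores):.3f}, std {np.std(long_scores):.3f}")
print(f"short trials: mean LLR {np.mean(short_scores):.3f}, std {np.std(short_scores):.3f}")
```

On such synthetic data the short-trial scores typically show a much larger spread than the long-trial scores, mirroring the increased score variance, and hence degraded accuracy and calibration, that the surveyed work attributes to short utterances.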
