Influence of speaker de-identification in depression detection

Depression is a common mental disorder that is usually addressed by outpatient treatment, which favours patients' inclusion in society. This raises the need for tools to remotely monitor patients' emotional state, which can be done via telephone or the Internet using speech processing approaches. However, these strategies raise privacy concerns, since the patients' speech is transmitted and subsequently stored on servers. The use of speech de-identification to protect these patients' privacy seems straightforward, but the influence of this procedure on how the disease manifests in the patients' speech has not yet been addressed. Hence, this study evaluates the performance of an automatic depression level estimation system on original and de-identified speech, in order to analyse the influence of the de-identification procedure on depression detection. Two de-identification approaches based on voice transformation via frequency warping and amplitude scaling are assessed; both can be applied to any speaker without additional training. Experiments carried out in the framework of the Audio/Visual Emotion Challenge (AVEC) 2014 show that the proposed de-identification approaches achieve promising de-identification results at the expense of only a slight degradation in depression detection.
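To illustrate the kind of transformation involved, the sketch below applies a bilinear frequency warping and a uniform amplitude scaling to a magnitude spectrum. This is a minimal, hypothetical example of the general technique named in the abstract, not the authors' actual transformation functions; the warping formula, the `alpha` parameter, and the toy spectrum are assumptions for illustration only.

```python
import numpy as np

def warp_frequency_axis(magnitude, alpha):
    """Bilinear frequency warping of a magnitude spectrum (illustrative).

    For |alpha| < 1 the bilinear map is monotone on [0, pi], shifting
    spectral content along the normalised-frequency axis.  Implemented
    here by resampling the spectrum at the warped frequency positions.
    """
    n = len(magnitude)
    omega = np.linspace(0.0, np.pi, n)  # original normalised-frequency axis
    # Bilinear warping: w' = w + 2 * arctan(alpha*sin(w) / (1 - alpha*cos(w)))
    warped = omega + 2.0 * np.arctan2(alpha * np.sin(omega),
                                      1.0 - alpha * np.cos(omega))
    # Read the original spectrum at the warped positions
    return np.interp(warped, omega, magnitude)

def amplitude_scale(magnitude, gain_db):
    """Uniform amplitude scaling, expressed in dB."""
    return magnitude * 10.0 ** (gain_db / 20.0)

# Toy example: a flat spectrum with a single spectral peak
spec = np.ones(257)
spec[60] = 10.0
out = amplitude_scale(warp_frequency_axis(spec, alpha=0.1), gain_db=-3.0)
```

Because both operations are simple parametric maps of the spectrum, they can be applied to any speaker without training speaker-specific models, which is the property the abstract highlights.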

Inspec keywords: medical signal detection; speaker recognition; medical signal processing; patient monitoring; data protection

Other keywords: automatic depression level estimation system; frequency warping; Internet; emotional state monitoring; outpatient treatments; speaker de-identification; privacy protection; telephone; voice transformation; mental disorder; amplitude scaling; depression detection; speech processing

Subjects: Speech processing techniques; Signal detection; Speech recognition and synthesis; Biology and medical computing; Biomedical engineering; Data security; Biomedical measurement and imaging

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-spr.2016.0731