Automatic speech discrete labels to dimensional emotional values conversion method

Dimensional emotion estimation (e.g. arousal and valence) from spontaneous and realistic expressions has drawn increasing commercial attention. However, applying dimensional emotion estimation technology remains a challenge because of issues such as manual annotation and evaluation. In this work, the authors introduce an automatic annotation and emotion prediction model. The automatic annotation is performed in three main steps: (i) label initialisation, (ii) automatic label annotation, and (iii) label optimisation. The approach has been validated on databases in different languages covering different types of emotional expression, including spontaneous, acted and induced expressions. Compared with the unoptimised predicted labels, optimisation improves the concordance correlation coefficient (CCC) values by an average of 0.104 for arousal and 0.051 for valence. Furthermore, the standard variation between annotated values and the ground truth is reduced to an average of 0.44 for arousal and 0.34 for valence. Finally, the CCC values using the proposed model reach 0.58 for arousal and 0.28 for valence, which further supports the feasibility and reliability of the proposed model. The proposed method can be used to reduce labour-intensive and time-consuming manual annotation work.
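The abstract reports its results as CCC values, so a short illustration of how that metric is commonly computed may help. The sketch below uses the standard definition of Lin's concordance correlation coefficient with population (divide-by-n) statistics. The function name `concordance_cc` and the sample values are assumptions made for illustration and are not taken from the article, whose exact evaluation code is not shown here.

```python
from typing import Sequence


def concordance_cc(pred: Sequence[float], gold: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient between two sequences.

    CCC = 2 * cov(pred, gold) / (var(pred) + var(gold) + (mean_p - mean_g)^2)
    Hypothetical helper for illustration; uses population statistics.
    """
    if len(pred) != len(gold) or len(pred) < 2:
        raise ValueError("inputs must have equal length of at least 2")

    n = len(pred)
    mean_p = sum(pred) / n
    mean_g = sum(gold) / n

    # Population variances and covariance (divide by n, not n - 1).
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_g = sum((g - mean_g) ** 2 for g in gold) / n
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gold)) / n

    denom = var_p + var_g + (mean_p - mean_g) ** 2
    if denom == 0:
        # Both sequences are constant and equal: the ratio is undefined.
        raise ValueError("CCC is undefined for identical constant sequences")
    return 2 * cov / denom


if __name__ == "__main__":
    # Made-up arousal values, purely illustrative.
    predicted = [0.1, 0.4, 0.35, 0.8]
    annotated = [0.2, 0.5, 0.30, 0.7]
    print(round(concordance_cc(predicted, annotated), 3))
```

Unlike Pearson correlation, the CCC penalises a constant offset or scale mismatch between predictions and ground truth: a prediction that tracks the annotation perfectly but is shifted by a fixed amount scores below 1.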
