NIML: non-intrusive machine learning-based speech quality prediction on VoIP networks

Rami S. Alkhawaldeh; Saed Khawaldeh; Usama Pervaiz; Moatsum Alawida; Hamzah Alkhawaldeh

Author(s): Rami S. Alkhawaldeh (1); Saed Khawaldeh (2, 3, 4, 5); Usama Pervaiz (2, 3, 4, 5); Moatsum Alawida (6); Hamzah Alkhawaldeh (7)
- Affiliations: 1: Department of Computer Information Systems, The University of Jordan, Aqaba 77110, Jordan;
  2: Erasmus+ Joint Master Program in Medical Imaging and Applications, University of Burgundy, France;
  3: Erasmus+ Joint Master Program in Medical Imaging and Applications, University of Cassino, Italy;
  4: Erasmus+ Joint Master Program in Medical Imaging and Applications, University of Girona, Spain;
  5: Sensor Informatics and Medical Technology Group, Department of Electrical Engineering and Automation, Aalto University, Finland;
  6: School of Computer Sciences, University Sains Malaysia, 11800 USM, Pulau Pinang, Malaysia;
  7: King Hussein School of Computing Sciences, Princess Sumaya University for Technology, Amman 11941, Jordan
Source: Volume 13, Issue 16, 08 October 2019, pp. 2609–2616
DOI: 10.1049/iet-com.2018.5430, Print ISSN 1751-8628, Online ISSN 1751-8636

Received 23/05/2018, Revised 28/02/2019, Accepted 04/07/2019, Published 17/07/2019

Voice over Internet Protocol (VoIP) networks have emerged as a promising telecommunication medium for transmitting voice signals. An essential question for researchers is how to estimate the quality of voice transmitted over VoIP for several purposes, such as design and technical issues. Voice quality is evaluated with two kinds of methods: subjective and objective. In this study, the authors propose a non-intrusive machine learning-based (NIML) objective method to estimate voice quality. In particular, they build a training set of parameters, taken from both the network and the voice itself, with the corresponding voice quality as labels. The labels are obtained using the perceptual evaluation of speech quality (PESQ) method, an intrusive algorithm. The authors then train a set of classifiers on this training set to build models that estimate the quality of the transmitted voice. The experimental results show that the classifier models perform well, with the Random Forest model outperforming the other models at a precision of 94.1%, a recall of 94.2%, and a receiver operating characteristic (ROC) area of 99.2%.
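The precision and recall metrics named in the abstract can be illustrated with a small sketch. The code below computes both for a multi-class quality classifier, averaged over classes and weighted by class support. The averaging scheme, the function name `weighted_precision_recall`, and the example labels are assumptions made for illustration; the abstract does not state how the reported figures were averaged, and the labels are not data from the study.

```python
from collections import Counter


def weighted_precision_recall(y_true, y_pred):
    """Return (precision, recall) averaged over classes, weighted by class support."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    support = Counter(y_true)    # true instances per class
    predicted = Counter(y_pred)  # predictions per class
    # Correct predictions (true positives) per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    total = len(y_true)
    precision = 0.0
    recall = 0.0
    for label, n in support.items():
        tp = correct[label]
        # A class that is never predicted contributes a precision of 0
        class_precision = tp / predicted[label] if predicted[label] else 0.0
        class_recall = tp / n
        weight = n / total
        precision += class_precision * weight
        recall += class_recall * weight
    return precision, recall


# Hypothetical quality labels (assumed for illustration, not from the study)
y_true = ["good", "good", "fair", "fair", "poor", "poor"]
y_pred = ["good", "fair", "fair", "fair", "poor", "good"]

precision, recall = weighted_precision_recall(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

With support weighting, recall reduces to plain accuracy (here 4 of 6 predictions are correct), while precision additionally penalises classes that are predicted too often.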

