NIML: non-intrusive machine learning-based speech quality prediction on VoIP networks

IET Communications

Voice over Internet Protocol (VoIP) networks have recently emerged as a promising telecommunication medium for transmitting voice signals. A key question for researchers is how to estimate the quality of voice transmitted over VoIP, for purposes such as network design and addressing technical issues. Voice quality can be evaluated with two types of methods: subjective and objective. In this study, the authors propose a non-intrusive machine learning-based (NIML) objective method to estimate voice quality. In particular, they build a training set of parameters, taken from both the network and the voice signal itself, with the quality of each voice sample as its label. These quality labels are obtained with the perceptual evaluation of speech quality (PESQ) method, an intrusive algorithm. The authors then train a set of classifiers on this training set to build models that estimate the quality of the transmitted voice. The experimental results show that the classifier models perform well; the Random Forest model outperforms the others, achieving a precision of 94.1%, a recall of 94.2%, and an area under the receiver operating characteristic (ROC) curve of 99.2%.
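
The labelling and evaluation steps described above can be sketched in Python. This is a minimal, hypothetical sketch, not the authors' implementation, and every function name in it is illustrative. It assumes that raw PESQ scores are converted to MOS-LQO with the ITU-T P.862.1 mapping, that MOS-LQO is binned into three quality classes whose thresholds are made up here, and that the reported precision and recall are support-weighted averages over classes (the "weighted average" convention used by tools such as Weka). The ROC area is computed for a single binary split using the rank (Mann-Whitney) formulation. Training the classifiers themselves, such as a Random Forest, would be done with a machine-learning library and is not shown.

```python
"""Hypothetical sketch of PESQ-based labelling and classifier scoring.

Not the authors' code. Assumptions: ITU-T P.862.1 mapping for MOS-LQO,
illustrative class thresholds, support-weighted precision/recall.
"""
import math
from collections import Counter


def pesq_to_mos_lqo(raw: float) -> float:
    """Map a raw PESQ (P.862) score to MOS-LQO with the ITU-T P.862.1 function."""
    return 0.999 + (4.999 - 0.999) / (1.0 + math.exp(-1.4945 * raw + 4.6607))


def quality_label(mos: float) -> str:
    """Bin MOS-LQO into classes. Thresholds are assumptions, not the paper's."""
    if mos >= 4.0:
        return "good"
    if mos >= 3.0:
        return "fair"
    return "poor"


def weighted_precision_recall(y_true, y_pred):
    """Per-class precision and recall, averaged with weights = class support."""
    support = Counter(y_true)
    n = len(y_true)
    precision = recall = 0.0
    for cls, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        predicted = sum(1 for p in y_pred if p == cls)
        precision += (tp / predicted if predicted else 0.0) * count / n
        recall += tp / n  # (tp / count) * (count / n); sums to overall accuracy
    return precision, recall


def roc_auc(labels, scores):
    """Binary ROC area: chance a random positive outscores a random negative.

    Ties count as one half. `labels` are truthy for the positive class.
    """
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    raw_scores = [4.2, 3.6, 2.1, 1.5]  # made-up raw PESQ scores
    y_true = [quality_label(pesq_to_mos_lqo(r)) for r in raw_scores]
    y_pred = ["good", "fair", "poor", "fair"]  # made-up classifier output
    print(y_true, weighted_precision_recall(y_true, y_pred))
```

For a multi-class model, a per-class ROC area can be obtained by treating each class in turn as the positive class (one-vs-rest) and then weighting those areas by class support, in the same way as precision and recall above.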


