High-band feature extraction for artificial bandwidth extension using deep neural network and H∞ optimisation

This work aims to enhance the quality of narrowband (0–4 kHz) voice signals by restoring their missing high-frequency components in the 4–8 kHz range. The proposed artificial bandwidth extension framework uses H∞ optimisation. In this context, a signal model is used to obtain a better representation of the wideband (0–8 kHz) information of a signal. The H∞ optimisation yields the synthesis filter for a given signal model, and this filter is used to synthesise the high-band (4–8 kHz) signal. The narrowband signal and the estimated high-band signal are added in the discrete Fourier transform domain, which removes the information leaked by the synthesis filter and the non-ideal low-pass filter. Gain adjustment is applied to the estimated high-band signal so that its energy equals that of the true high-band signal. Because speech signals are non-stationary, the synthesis filters and their corresponding gains vary widely. A deep neural network (DNN) is therefore used to estimate the synthesis filter and gain from the given narrowband information. The authors analyse the performance of the DNN model on two data sets, using both objective and subjective analyses.
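The gain adjustment and discrete Fourier transform addition described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `match_energy` and `dft_add`, the 16 kHz sampling rate, the hard 4 kHz bin split, and the per-frame processing are all assumptions made for illustration, and the abstract does not specify how the two spectra are actually combined.

```python
import numpy as np


def match_energy(est_hb, true_hb, eps=1e-12):
    """Scale est_hb so its energy equals that of true_hb (hypothetical helper).

    Returns the scaled signal and the gain that was applied.
    """
    gain = np.sqrt(np.sum(true_hb ** 2) / (np.sum(est_hb ** 2) + eps))
    return gain * est_hb, gain


def dft_add(nb_frame, hb_frame, fs=16000, cutoff_hz=4000):
    """Combine two equal-length frames in the DFT domain (assumed scheme).

    Bins below cutoff_hz are taken from the narrowband frame and bins at or
    above cutoff_hz from the high-band frame, so content that leaked into
    the wrong band of either input is discarded.
    """
    n = len(nb_frame)
    nb_spec = np.fft.rfft(nb_frame)
    hb_spec = np.fft.rfft(hb_frame)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    wb_spec = np.where(freqs < cutoff_hz, nb_spec, hb_spec)
    return np.fft.irfft(wb_spec, n=n)


# Toy frame: 320 samples at 16 kHz, so DFT bins fall every 50 Hz.
fs, n = 16000, 320
t = np.arange(n) / fs
tone_1k = np.sin(2 * np.pi * 1000 * t)
tone_6k = np.sin(2 * np.pi * 6000 * t)

# Each input carries a small leaked component in the other band.
nb = tone_1k + 0.1 * np.sin(2 * np.pi * 5000 * t)
hb = tone_6k + 0.1 * np.sin(2 * np.pi * 2000 * t)
wb = dft_add(nb, hb, fs=fs)

# Energy matching against a reference high-band frame.
scaled, gain = match_energy(0.25 * tone_6k, tone_6k)
```

In this toy case the leaked 5 kHz and 2 kHz components are dropped, and `wb` contains only the 1 kHz and 6 kHz tones. During training the true high-band signal is available to compute the gain; at test time the abstract states that the DNN estimates the gain from narrowband information instead.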

Inspec keywords: low-pass filters; audio signal processing; feature extraction; speech enhancement; neural nets

Other keywords: frequency 4.0 kHz to 8.0 kHz; narrowband signal; given signal model; given narrowband information; frequency 0.0 kHz to 8.0 kHz; deep neural network; high-frequency components; frequency 0.0 kHz to 4.0 kHz; speech signals; artificial bandwidth extension framework; high-band feature extraction; synthesis filter; high-band signal

Subjects: Other topics in statistics; Speech and audio signal processing; Neural nets; Filtering methods in signal processing; Speech processing techniques

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-spr.2020.0214