This is an open access article published by the IET under the Creative Commons Attribution-NonCommercial License (http://creativecommons.org/licenses/by-nc/3.0/)
Statistical speech reconstruction for larynx-related dysphonia has achieved good performance using Gaussian mixture models and, more recently, restricted Boltzmann machine arrays; however, deep neural network (DNN)-based systems have been hampered by the limited amount of training data available from individual voice-loss patients. The authors propose a novel DNN structure that allows a partially supervised training approach on spectral features from smaller data sets, yielding results that compare favourably with the current state of the art.
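The core operation described above is a regression: a network maps source spectral features (e.g. from whispered or dysphonic speech) to target spectral features of phonated speech, trained on a small set of paired frames. The following is a minimal sketch of that idea only; the layer sizes, synthetic paired data, and plain gradient descent are assumptions for illustration and do not reproduce the authors' partially supervised DNN structure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic paired frames standing in for aligned spectral features:
# 200 examples, 24-dim source -> 24-dim target (both hypothetical sizes).
n, d = 200, 24
X = rng.standard_normal((n, d))
W_true = rng.standard_normal((d, d)) * 0.3
Y = np.tanh(X @ W_true)  # stand-in for the aligned target spectra

# One hidden layer of 32 tanh units with a linear output layer.
h = 32
W1 = rng.standard_normal((d, h)) * 0.1; b1 = np.zeros(h)
W2 = rng.standard_normal((h, d)) * 0.1; b2 = np.zeros(d)

def forward(X):
    H = np.tanh(X @ W1 + b1)
    return H, H @ W2 + b2

# Plain full-batch gradient descent on mean-squared error.
lr = 0.05
for epoch in range(500):
    H, P = forward(X)
    err = P - Y                       # dMSE/dP (up to a constant factor)
    gW2 = H.T @ err / n; gb2 = err.mean(0)
    dH = (err @ W2.T) * (1 - H ** 2)  # back-propagate through tanh
    gW1 = X.T @ dH / n; gb1 = dH.mean(0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

mse = ((forward(X)[1] - Y) ** 2).mean()
print(f"final training MSE: {mse:.4f}")
```

In practice the converted spectral envelope would then be passed to a vocoder (the paper's references use TANDEM-STRAIGHT-style analysis/synthesis) to resynthesise audible speech; with very small patient-specific data sets, the regularisation and partially supervised pre-training the authors propose matter far more than this toy fit suggests.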