Inverse filter based excitation model for HMM-based speech synthesis system

Even today, speech generated by the hidden Markov model (HMM)-based speech synthesis system (HTS) still sounds buzzy because the excitation signal is modelled improperly. This study proposes an efficient excitation modelling approach to improve the quality of HTS. In the proposed method, the residual signal obtained by inverse filtering is parameterised into excitation features, and HMMs are used to model these excitation parameters. During synthesis, the excitation signal is constructed by overlap-adding natural residual segments and is then modified to match the target source features generated by the HMMs. The proposed approach is incorporated into HTS. Performance evaluation results indicate that the proposed method improves synthesis quality and outperforms state-of-the-art approaches for modelling the excitation signal.
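The abstract does not specify the exact parameterisation, so the following is only a minimal Python sketch of the two generic building blocks it names, under stated assumptions. It obtains a residual by linear-prediction (LPC) inverse filtering, using the autocorrelation method and the Levinson-Durbin recursion. It then builds a voiced excitation by overlap-adding Hann-windowed copies of a residual segment at target pitch periods. RMS scaling stands in for "modifying the excitation as per the target source features". All function names, the window choice and the toy pitch periods are hypothetical; this is not the authors' implementation.

```python
import math
from typing import List

# Hypothetical helpers: a sketch of LPC inverse filtering plus residual
# overlap-add, not the method described in the paper.


def autocorrelation(x: List[float], order: int) -> List[float]:
    """Biased autocorrelation r[0..order] of one analysis frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]


def levinson_durbin(r: List[float], order: int) -> List[float]:
    """Solve the LPC normal equations by the Levinson-Durbin recursion.

    Returns a = [1, a1, ..., ap] for the inverse filter
    A(z) = 1 + a1 z^-1 + ... + ap z^-p.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    if err <= 0.0:  # silent frame: keep A(z) = 1
        return a
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient of stage i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k  # prediction error energy after stage i
        if err <= 0.0:  # perfectly predictable frame: stop early
            break
    return a


def inverse_filter(x: List[float], a: List[float]) -> List[float]:
    """LPC residual e[n] = sum_j a[j] * x[n - j], i.e. x filtered by A(z)."""
    return [
        sum(a[j] * x[n - j] for j in range(len(a)) if n - j >= 0)
        for n in range(len(x))
    ]


def overlap_add_excitation(segment: List[float], periods: List[int]) -> List[float]:
    """Overlap-add Hann-windowed copies of a residual segment, advancing by
    one target pitch period (in samples) after each copy."""
    seg_len = len(segment)
    if seg_len < 2:
        raise ValueError("segment must contain at least two samples")
    win = [0.5 - 0.5 * math.cos(2 * math.pi * n / (seg_len - 1)) for n in range(seg_len)]
    out = [0.0] * (sum(periods) + seg_len)
    mark = 0
    for period in periods:
        for n in range(seg_len):
            out[mark + n] += win[n] * segment[n]
        mark += period
    return out


def scale_to_rms(x: List[float], target_rms: float) -> List[float]:
    """Scale a signal so its RMS equals target_rms (a stand-in for a
    target energy parameter generated by the HMMs)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return [v * target_rms / rms for v in x] if rms > 0.0 else list(x)


if __name__ == "__main__":
    # Toy frame: a pulse at n = 20 driving the all-pole filter 1 / (1 - 0.9 z^-1).
    frame = [0.0] * 20 + [0.9 ** n for n in range(180)]
    a = levinson_durbin(autocorrelation(frame, 1), 1)
    residual = inverse_filter(frame, a)  # close to a single pulse at n = 20
    segment = residual[:41]  # 41-sample segment centred on the pulse
    excitation = overlap_add_excitation(segment, [80, 78, 76])  # hypothetical periods
    excitation = scale_to_rms(excitation, 0.1)
    print(round(a[1], 3), len(excitation))
```

Centring the segment on the residual pulse matters because the Hann window is zero at its ends; a segment whose pulse sits at an edge would be windowed away. When the target periods are shorter than the segment, successive copies genuinely overlap and sum.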

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-spr.2017.0546