
Approach for time-scale modification of speech based on TCNMF

A novel approach for time-scale modification (TSM) of speech based on temporally continuous nonnegative matrix factorisation (TCNMF) is presented. First, the magnitude spectrum of the speech is factorised into a set of nonnegative basis spectra and their time-varying gains. The TSM problem is then transformed into an interpolation problem on the time-varying gains, which yields better performance than traditional methods based on waveform overlap-add. The superiority of the proposed approach is confirmed by comparative tests against the traditional methods OLA, SOLA, WSOLA and PSOLA.
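The factorise-then-interpolate pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses plain NMF with the Lee and Seung multiplicative updates (Euclidean cost) in place of TCNMF, so the temporal-continuity penalty is omitted. It also assumes linear interpolation of each gain trajectory as the interpolation scheme. The function names (`nmf`, `stretch_gains`, `tsm_magnitude`), the rank and the iteration count are hypothetical choices for illustration. The sketch only produces the time-scaled magnitude spectrogram. Resynthesising a waveform would additionally require phase reconstruction, for example Griffin-Lim style signal estimation from the modified STFT magnitude, which is not shown.

```python
import numpy as np


def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Plain NMF via Lee-Seung multiplicative updates (Euclidean cost).

    V: nonnegative magnitude spectrogram, shape (n_freq, n_frames).
    Returns basis spectra W (n_freq, rank) and time-varying gains H (rank, n_frames).
    Assumption: no temporal-continuity penalty is used, so this is NOT TCNMF.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_frames)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep W and H nonnegative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def stretch_gains(H, alpha):
    """Linearly resample each gain trajectory to round(alpha * n_frames) frames.

    alpha > 1 lengthens the signal (slower speech); alpha < 1 shortens it.
    Linear interpolation is an assumed scheme, chosen for simplicity.
    """
    n_frames = H.shape[1]
    n_out = max(1, int(round(alpha * n_frames)))
    src = np.arange(n_frames)
    # Fractional positions in the original frame axis sampled by each output frame.
    dst = np.linspace(0, n_frames - 1, n_out)
    return np.vstack([np.interp(dst, src, row) for row in H])


def tsm_magnitude(V, alpha, rank=8):
    """Time-scale a magnitude spectrogram by interpolating NMF gains only.

    The basis spectra W are left untouched, so spectral content (and hence
    pitch) is preserved while the temporal evolution is stretched.
    """
    W, H = nmf(V, rank)
    return W @ stretch_gains(H, alpha)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Stand-in for |STFT| of a speech signal: 129 frequency bins x 50 frames.
    V = np.abs(rng.standard_normal((129, 50)))
    V_slow = tsm_magnitude(V, alpha=1.5)
    print(V_slow.shape)  # (129, 75)
```

The design choice mirrors the abstract: only the gain matrix is modified, so the time axis is stretched without resampling the waveform or splicing overlapping segments as OLA-family methods do.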

http://iet.metastore.ingenta.com/content/journals/10.1049/el.2012.3262