© The Institution of Engineering and Technology
Unit selection speech systems generate synthetic speech by concatenation of acoustic units extracted from a natural recording. Given a large speech database, the sequence of units with the best global cost is chosen by means of a Viterbi search. In this reported work, it is shown that small subcosts not related to perceptual measures can affect the sequence of units that is finally chosen, with a potential effect on the quality of synthetic speech. A segmentwise unit selection approach that minimises this effect is then proposed.
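The Viterbi search mentioned above can be illustrated with a minimal sketch. It assumes the common formulation in which the global cost is the sum of a per-unit target cost and a concatenation (join) cost between consecutive units. The function name `viterbi_unit_selection`, the callables `target_cost` and `concat_cost`, and the toy pitch-based costs are hypothetical and chosen only for illustration; this is neither the cost function used in the reported work nor the proposed segmentwise approach.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Unit = TypeVar("Unit")


def viterbi_unit_selection(
    candidates: Sequence[Sequence[Unit]],
    target_cost: Callable[[int, Unit], float],
    concat_cost: Callable[[Unit, Unit], float],
) -> Tuple[List[Unit], float]:
    """Return the candidate sequence with the lowest global cost.

    candidates[i] holds the database units that may fill target position i.
    Global cost = sum of target costs + sum of concatenation costs
    (an assumed, generic formulation).
    """
    if not candidates or any(len(c) == 0 for c in candidates):
        raise ValueError("every target position needs at least one candidate")

    # best[j]: lowest cumulative cost of any path ending in candidate j
    # of the position processed so far.
    best = [target_cost(0, u) for u in candidates[0]]
    backpointers: List[List[int]] = []

    for i in range(1, len(candidates)):
        new_best: List[float] = []
        pointers: List[int] = []
        for unit in candidates[i]:
            # Cost of reaching `unit` from each candidate of the previous position.
            costs = [
                best[k] + concat_cost(prev, unit)
                for k, prev in enumerate(candidates[i - 1])
            ]
            k_min = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[k_min] + target_cost(i, unit))
            pointers.append(k_min)
        best = new_best
        backpointers.append(pointers)

    # Trace back from the cheapest final candidate.
    j = min(range(len(best)), key=best.__getitem__)
    total = best[j]
    path = [j]
    for pointers in reversed(backpointers):
        j = pointers[j]
        path.append(j)
    path.reverse()

    return [candidates[i][j] for i, j in enumerate(path)], total


# Toy example: each unit is (label, pitch in Hz); both costs are
# absolute pitch differences (hypothetical subcosts).
toy_targets = [100, 100]
toy_candidates = [
    [("a1", 100), ("a2", 110)],
    [("b1", 130), ("b2", 108)],
]
units, total = viterbi_unit_selection(
    toy_candidates,
    target_cost=lambda i, u: abs(u[1] - toy_targets[i]),
    concat_cost=lambda p, u: abs(p[1] - u[1]),
)
print([label for label, _ in units], total)
```

Because the search minimises one summed cost over the whole utterance, any subcost added to `target_cost` or `concat_cost` takes part in the comparison between paths, whether or not it reflects a perceptual measure.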
http://iet.metastore.ingenta.com/content/journals/10.1049/el.2011.0315