Audio-visual speaker identification with asynchronous articulatory feature



A visual component provides information that usefully complements the acoustic signal, and the natural asynchrony between acoustic and visual cues can be represented effectively by asynchronous articulatory features. A new approach to speaker identification is presented, using an articulatory feature-based audio-visual model built on a dynamic Bayesian network (DBN). Experiments on the bimodal audio-visual CMU database yielded satisfactory identification results.
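The Letter itself contains no code, so the sketch below is only a rough illustration of the modelling idea in Python/NumPy rather than a DBN toolkit: each articulatory stream (audio and visual) follows its own left-to-right state sequence, transitions let the two streams advance independently, and a coupling bound (max_async) limits how far their state indices may drift apart. The class and function names, the Gaussian emission densities, the state topology and all parameter values are illustrative assumptions, not the authors' implementation.

import numpy as np

class AsyncAVSpeakerModel:
    """Per-speaker two-stream model with a bounded audio-visual asynchrony."""

    def __init__(self, mu_a, var_a, mu_v, var_v, max_async=1, p_advance=0.4):
        self.mu_a, self.var_a = mu_a, var_a   # (Na, Da): audio-state Gaussian params
        self.mu_v, self.var_v = mu_v, var_v   # (Nv, Dv): visual-state Gaussian params
        self.max_async = max_async            # max allowed lag between state indices
        self.p_advance = p_advance            # per-frame probability a stream advances

    def _log_gauss(self, x, mu, var):
        # Diagonal-covariance Gaussian log-density of frame x under every state.
        return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var, axis=1)

    def log_likelihood(self, audio, visual):
        # Forward algorithm over the joint (audio_state, visual_state) space.
        Na, Nv, T = len(self.mu_a), len(self.mu_v), len(audio)
        within = np.abs(np.subtract.outer(np.arange(Na), np.arange(Nv))) <= self.max_async
        stay, adv = np.log1p(-self.p_advance), np.log(self.p_advance)
        log_alpha = np.full((Na, Nv), -np.inf)
        log_alpha[0, 0] = 0.0                 # both streams begin in their first state
        for t in range(T):
            log_alpha += (self._log_gauss(audio[t], self.mu_a, self.var_a)[:, None]
                          + self._log_gauss(visual[t], self.mu_v, self.var_v)[None, :])
            if t == T - 1:
                break
            nxt = np.full_like(log_alpha, -np.inf)
            # Each stream independently stays put or advances one state (left-to-right).
            for di, ca in ((0, stay), (1, adv)):
                for dj, cv in ((0, stay), (1, adv)):
                    src = log_alpha[:Na - di, :Nv - dj] + ca + cv
                    tgt = nxt[di:, dj:]
                    np.logaddexp(tgt, src, out=tgt)
            nxt[~within] = -np.inf            # enforce the asynchrony bound
            log_alpha = nxt
        return np.logaddexp.reduce(log_alpha, axis=None)

def identify(models, audio, visual):
    # Pick the enrolled speaker whose model best explains the paired observations.
    return max(models, key=lambda spk: models[spk].log_likelihood(audio, visual))

Setting max_async=0 forces the two streams into lockstep and recovers an ordinary synchronous multi-stream model; loosening the bound is what lets the model capture visual articulators that lead or lag the acoustics, which is the asynchrony the Letter exploits.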

Inspec keywords: audio-visual systems; speaker recognition; belief networks

Other keywords: articulatory feature-based audio-visual model; audio-visual speaker identification; Bayesian network; visual component; apparent asynchrony; audio-visual bimodal CMU database; audio information; asynchronous articulatory feature

Subjects: Speech recognition and synthesis; Knowledge engineering techniques; Speech processing techniques
