Audio-visual speaker identification with asynchronous articulatory feature

Electronics Letters


Visual information usefully complements audio information, and the apparent asynchrony between acoustic and visual cues can be represented effectively by asynchronous articulatory features. A new approach to speaker identification is presented that uses an articulatory feature-based audio-visual model built on a dynamic Bayesian network. Satisfactory results were achieved in experiments on the audio-visual bimodal CMU database.
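As a rough illustration of what bounded asynchrony between an audio stream and a visual stream can mean, the Python sketch below scores two left-to-right state sequences whose state indices may drift apart by at most `max_async` steps, then picks the speaker whose model scores highest. This is not the model of the letter: it is a simplified two-stream best-path search rather than a dynamic Bayesian network over articulatory features. The function names, the `max_async` parameter, the toy log-likelihood tables, and the assumption that both streams have the same number of states are all hypothetical.

```python
NEG_INF = float("-inf")


def async_two_stream_score(audio_ll, visual_ll, max_async=1):
    """Best-path log-score of two left-to-right state streams.

    audio_ll[t][i]  : log-likelihood of audio frame t under audio state i.
    visual_ll[t][j] : log-likelihood of visual frame t under visual state j.
    Assumption: both streams have the same number of states, and the state
    indices i and j may differ by at most max_async (bounded asynchrony).
    """
    n_frames = len(audio_ll)
    n_a = len(audio_ll[0])
    n_v = len(visual_ll[0])
    # best[(i, j)] = best log-score of any path ending in joint state (i, j).
    best = {(0, 0): audio_ll[0][0] + visual_ll[0][0]}
    for t in range(1, n_frames):
        nxt = {}
        for (i, j), score in best.items():
            # Each stream either stays in its state or advances by one.
            for di in (0, 1):
                for dj in (0, 1):
                    ni, nj = i + di, j + dj
                    if ni >= n_a or nj >= n_v:
                        continue
                    if abs(ni - nj) > max_async:
                        continue  # asynchrony bound violated
                    cand = score + audio_ll[t][ni] + visual_ll[t][nj]
                    if cand > nxt.get((ni, nj), NEG_INF):
                        nxt[(ni, nj)] = cand
        best = nxt
    # Both streams must end in their final states.
    return best.get((n_a - 1, n_v - 1), NEG_INF)


def identify_speaker(models, max_async=1):
    """Return (best speaker name, all scores).

    models maps a speaker name to (audio_ll, visual_ll), i.e. the per-frame
    log-likelihoods of the test utterance under that speaker's model.
    """
    scores = {
        name: async_two_stream_score(a_ll, v_ll, max_async)
        for name, (a_ll, v_ll) in models.items()
    }
    return max(scores, key=scores.get), scores


# Toy data: the audio stream changes state one frame before the visual stream.
audio = [[0.0, -10.0], [-10.0, 0.0], [-10.0, 0.0]]
visual = [[0.0, -10.0], [0.0, -10.0], [-10.0, 0.0]]
flat = [[-5.0, -5.0]] * 3  # an uninformative competing speaker model

winner, scores = identify_speaker({"A": (audio, visual), "B": (flat, flat)})
```

On this toy data, allowing one step of asynchrony lets the path follow the audio change first and the visual change one frame later, whereas forcing `max_async=0` makes one of the two streams switch at the wrong frame and lowers the score.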


