Audio-visual speaker identification with asynchronous articulatory feature

Visual information is a valuable complement to the audio signal, and the apparent asynchrony between acoustic and visual cues can be represented effectively by asynchronous articulatory features. A new approach to speaker identification is presented, based on an articulatory feature-based audio-visual model built on a dynamic Bayesian network (DBN). Experiments on the audio-visual bimodal CMU database yield satisfactory identification results.
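
The letter gives no implementation detail beyond the abstract, but the core idea of fusing loosely synchronised audio and visual articulatory-feature streams can be sketched in a few lines. The Python snippet below is a hypothetical illustration, not the authors' DBN model: per-frame log-likelihoods from an audio stream and a visual stream are combined with a stream weight, and a small frame-lag search stands in for the asynchrony that the DBN would model explicitly. All names and parameters (fused_score, identify, audio_weight, max_lag) are assumptions made for this sketch.

    # Hypothetical sketch only -- not the implementation described in the letter.
    # Each speaker model is assumed to yield per-frame log-likelihoods for the
    # audio and visual articulatory-feature streams; a stream weight and a small
    # lag search stand in for the DBN's explicit asynchrony modelling.
    import numpy as np

    def fused_score(audio_ll, visual_ll, audio_weight=0.7, max_lag=2):
        """Combine two per-frame log-likelihood sequences, allowing the streams
        to be offset by up to max_lag frames; the best-aligned offset is kept."""
        best = -np.inf
        T = min(len(audio_ll), len(visual_ll))
        for lag in range(-max_lag, max_lag + 1):
            a = audio_ll[max(0, lag):T + min(0, lag)]
            v = visual_ll[max(0, -lag):T + min(0, -lag)]
            n = min(len(a), len(v))
            score = audio_weight * a[:n].sum() + (1.0 - audio_weight) * v[:n].sum()
            best = max(best, score)
        return best

    def identify(speaker_scores):
        """speaker_scores: dict mapping speaker id -> (audio_ll, visual_ll) arrays.
        Returns the speaker whose fused score is highest."""
        return max(speaker_scores, key=lambda s: fused_score(*speaker_scores[s]))

In practice the stream weight would be tuned on held-out data, and the lag search here is only a crude stand-in for the state-level asynchrony that a DBN can represent directly.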
