New interface for musical instruments using lip reading

As smart audio-visual multimedia devices are developed for a widening range of applications, interest has grown in human–computer interaction (HCI) interfaces tailored to specific environments. Considerable effort has also gone into building HCI interfaces into musical instruments to make them more intuitive, comfortable and expressive. However, most traditional HCI interfaces are not applicable, because both hands are likely to be occupied while playing a musical instrument. Lip reading, an HCI method that analyses lip motion to recognise spoken words, can be used in this setting. This study proposes a lip reading method and, as a specific example of an interface for musical instruments, presents a guitar effector application. The proposed method uses a constrained local model instead of the conventional active appearance model for effective facial feature tracking. It also uses a dynamic time warping-based classifier for word recognition, which suits simple real-time implementation of lip reading. The proposed method achieves 85.0% word recognition accuracy on the OuluVS database and is applied effectively to the proposed guitar effector application.
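The abstract does not give the details of the dynamic time warping (DTW) classifier, so the following is only a minimal sketch of the general technique: textbook DTW with nearest-template matching, not the authors' implementation. The function names, the two-dimensional "mouth width, mouth height" features and the command words are hypothetical, chosen purely for illustration.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW distance between two sequences of feature vectors."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            cost[i][j] = d + min(
                cost[i - 1][j],      # frame of a repeats against b[j-1]
                cost[i][j - 1],      # frame of b repeats against a[i-1]
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def classify(query, templates):
    """Return the label of the template with the smallest DTW distance."""
    best_label, best_dist = None, math.inf
    for label, sequence in templates:
        d = dtw_distance(query, sequence)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


# Hypothetical templates: one (width, height) lip feature pair per video frame.
templates = [
    ("on", [(0, 0), (1, 1), (2, 2), (1, 1), (0, 0)]),
    ("off", [(0, 0), (0, 2), (0, 4), (0, 2), (0, 0)]),
]

# A slower utterance of the first word: same shape, more frames.
query = [(0, 0), (0, 0), (1, 1), (2, 2), (2, 2), (1, 1), (0, 0)]
print(classify(query, templates))
```

Because DTW lets frames repeat during alignment, the slower query still matches its template at zero cost. This tolerance to speaking rate, together with the small template set, is one plausible reason such a classifier suits a simple real-time command interface.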


