New interface for musical instruments using lip reading

As smart audio-visual multimedia devices are developed for various applications, there has been growing interest in effective human–computer interaction (HCI) interfaces for specific environments. There have also been considerable efforts to build HCI interfaces into musical instruments, with the aim of bringing intuitiveness, comfort and expressiveness to them. However, most traditional HCI interfaces are not applicable here, because both hands are usually occupied while playing a musical instrument. In this environment, lip reading can be used instead: it is an HCI method that analyses lip motion to recognise spoken words. In this study, a lip reading method is proposed together with an application of it. As a specific example of an interface for musical instruments, a guitar effector application is presented. The proposed lip reading method uses a constrained local model instead of the conventional active appearance model for effective facial feature tracking. It also uses a dynamic time warping-based classifier for word recognition, which is well suited to a simple real-time implementation of lip reading. The proposed method achieves 85.0% word recognition accuracy on the OuluVS database and is applied effectively to the proposed guitar effector application.
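The abstract names a dynamic time warping (DTW)-based word classifier but does not describe it in detail. The following is a minimal sketch of the general technique, assuming each utterance has already been converted into a sequence of per-frame lip feature vectors (for example, values derived from the tracked lip landmarks) and that a word is recognised by picking the stored template with the smallest DTW cost. The function names, the Euclidean frame distance, the one-nearest-template rule and the command words "on"/"off" are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import Dict, List, Optional, Sequence

# Assumption: one lip-feature vector per video frame (e.g. values derived
# from tracked lip landmarks). The feature choice here is hypothetical.
Frame = Sequence[float]


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two per-frame feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a: Sequence[Frame], seq_b: Sequence[Frame]) -> float:
    """Classic DTW cost between two sequences that may differ in length."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # seq_a frame repeats against seq_b
                cost[i][j - 1],      # seq_b frame repeats against seq_a
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def classify(query: Sequence[Frame],
             templates: Dict[str, List[Sequence[Frame]]]) -> Optional[str]:
    """Hypothetical nearest-template rule: return the word whose stored
    example has the lowest DTW cost to the query (None if no templates)."""
    best_word: Optional[str] = None
    best_cost = float("inf")
    for word, examples in templates.items():
        for example in examples:
            c = dtw_distance(query, example)
            if c < best_cost:
                best_word, best_cost = word, c
    return best_word


if __name__ == "__main__":
    # Toy 1-D features; "on" and "off" are hypothetical command words.
    templates = {
        "on": [[[0.0], [1.0], [2.0]]],
        "off": [[[2.0], [1.0], [0.0]]],
    }
    query = [[0.0], [0.5], [1.0], [2.0]]  # a slower "on"-like trajectory
    print(classify(query, templates))  # prints "on"
```

Because a nearest-template DTW classifier only stores example sequences and compares them at recognition time, adding a new command word amounts to recording a few more templates; each comparison costs O(nm) for sequences of lengths n and m, which stays small for short spoken words.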

Inspec keywords: audio-visual systems; multimedia systems; human computer interaction; motion estimation; word processing; feature extraction; image classification; musical instruments

Other keywords: dynamic time warping-based classifier; OuluVS database; guitar effector application; lip reading method; lip motion analyses; HCI interface method; musical instruments; human computer interaction; facial feature tracking; constrained local model; smart audiovisual multimedia device; spoken word recognition accuracy; active appearance model

Subjects: Humanities computing; Image recognition; Computer vision and image processing techniques; User interfaces; Multimedia; Multimedia communications; Word processing

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-ipr.2014.1014