Although many 3D head pose estimation methods based on monocular vision can achieve an accuracy of 5°, reducing the number of required training samples and avoiding hardware parameters as input features remain among the biggest challenges in head pose estimation. To address these challenges, the authors propose an accurate head pose estimation method that can serve as an extension to facial key point detection systems. The basic idea is to use the normalised distances between key points as input features, and to use ℓ¹-minimisation to select a sparse set of training samples that reflects the mapping between the feature vector space and the head pose space. The head pose of a test sample is then represented as a linear combination of the head poses of these selected samples. Experimental results show that the authors’ method achieves an accuracy of 2.6° without any extra hardware parameters or subject-specific information. In addition, the method can still estimate head pose under large head movements and varying illumination.
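The pipeline summarised above (normalised key-point distances as features, ℓ¹-minimisation over a dictionary of training samples, and a combination of the selected samples' poses) can be sketched as follows. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions not taken from the paper: the features are all pairwise key-point distances scaled to unit L2 norm; the ℓ¹ problem is posed as the LASSO relaxation min ½‖Dx − y‖² + λ‖x‖₁ and solved by iterative soft-thresholding (ISTA); poses are (yaw, pitch, roll) triples; and the coefficients are clipped to be non-negative and rescaled to sum to one before combining poses. The function names and the default `lam` are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Pose = Tuple[float, float, float]  # assumed (yaw, pitch, roll), e.g. in degrees


def normalised_distance_features(points: Sequence[Point]) -> List[float]:
    """All pairwise key-point distances, scaled to unit L2 norm.

    Assumption: dividing by the norm of the distance vector removes image
    scale; the paper's exact normalisation may differ.
    """
    dists = [math.dist(p, q) for p, q in combinations(points, 2)]
    norm = math.sqrt(sum(d * d for d in dists))
    if norm == 0.0:
        raise ValueError("key points must not all coincide")
    return [d / norm for d in dists]


def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t*|v|: shrinks v towards zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def l1_coefficients(columns: Sequence[Sequence[float]], y: Sequence[float],
                    lam: float = 0.05, n_iter: int = 2000) -> List[float]:
    """ISTA for min_x 0.5*||D x - y||^2 + lam*||x||_1.

    Each entry of `columns` is one training sample's feature vector (a column
    of D). The LASSO form is an assumed stand-in for the paper's l1 problem.
    """
    # ||D||_2^2 <= ||D||_F^2, so 1/||D||_F^2 is a safe (conservative) step size.
    step = 1.0 / sum(v * v for col in columns for v in col)
    x = [0.0] * len(columns)
    for _ in range(n_iter):
        # Residual r = y - D x.
        r = list(y)
        for xi, col in zip(x, columns):
            if xi != 0.0:
                for k, v in enumerate(col):
                    r[k] -= xi * v
        # Gradient step on the quadratic term (x + step * D^T r), then shrink.
        x = [soft_threshold(xi + step * sum(c * rk for c, rk in zip(col, r)),
                            step * lam)
             for xi, col in zip(x, columns)]
    return x


def estimate_pose(columns: Sequence[Sequence[float]], poses: Sequence[Pose],
                  y: Sequence[float], lam: float = 0.05,
                  n_iter: int = 2000) -> Pose:
    """Pose of test features y as a combination of the selected training poses.

    Assumption: negative coefficients are dropped and the rest are rescaled to
    sum to one, giving a weighted average of the selected poses.
    """
    weights = [max(xi, 0.0) for xi in l1_coefficients(columns, y, lam, n_iter)]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no training sample selected; try a smaller lam")
    return tuple(sum(w * p[k] for w, p in zip(weights, poses)) / total
                 for k in range(3))
```

In use, each training face contributes one column (its feature vector from `normalised_distance_features`) and one pose, and a test face's feature vector is passed as `y`. Larger values of `lam` drive more coefficients to exactly zero, so fewer training samples contribute to the estimate.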

Training-based head pose estimation under monocular vision