Facial Point Classifier using Convolution Neural Network and Cascade Facial Point Detector

Classification of Facial Feature Points Using a Convolutional Neural Network and a Cascade Facial Feature Point Detector

  • Received : 2015.08.23
  • Accepted : 2016.01.25
  • Published : 2016.03.01


Facial expressions and human behavior have recently attracted considerable interest. Human-robot interaction (HRI) researchers use digital image processing, pattern recognition, and machine learning to study them. Facial feature point detection algorithms are essential for face recognition, gaze tracking, and expression and emotion recognition. In this paper, a cascade facial feature point detector is used to find facial feature points such as the eyes, nose, and mouth. However, the detector has difficulty extracting the feature points from some images, because images differ in conditions such as size, color, and brightness. We therefore propose an algorithm that corrects the output of the cascade facial feature point detector with a convolutional neural network. The structure of the convolutional neural network is based on Yann LeCun's LeNet-5. The inputs to the convolutional neural network are the color and gray images output by the cascade facial feature point detector. The images were resized to $32{\times}32$. In addition, the gray images were converted to the YUV format. These gray and color images serve as the basis for the convolutional neural network. We then classified about 1,200 test images of subjects. The results show that the proposed method is more accurate than the cascade facial feature point detector alone, because the algorithm corrects the results produced by the detector.
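
The preprocessing and network dimensions described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the nearest-neighbor resizing, the BT.601 conversion matrix, and the function names `resize_nearest`, `rgb_to_yuv`, and `lenet5_feature_sizes` are all assumptions made for illustration, since the abstract does not state which resizing method or YUV variant was used. The layer sizes follow the standard LeNet-5 layout (5x5 convolutions without padding, 2x2 subsampling), which is why a $32{\times}32$ input is a natural choice.

```python
import numpy as np

# BT.601 RGB -> YUV matrix (assumed; the abstract does not name the YUV variant).
RGB_TO_YUV = np.array([
    [0.299, 0.587, 0.114],        # Y: luminance
    [-0.14713, -0.28886, 0.436],  # U: blue-difference chroma
    [0.615, -0.51499, -0.10001],  # V: red-difference chroma
])


def resize_nearest(image, size=32):
    """Resize an H x W (x C) array to size x size by nearest-neighbor sampling."""
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size  # source row for each output row
    cols = np.arange(size) * w // size  # source column for each output column
    return image[rows[:, None], cols]


def rgb_to_yuv(image):
    """Convert an H x W x 3 RGB array with values in [0, 1] to YUV."""
    return image @ RGB_TO_YUV.T


def lenet5_feature_sizes(input_size=32, kernel=5, pool=2):
    """Spatial size of the feature maps after C1, S2, C3, S4, and C5 in LeNet-5."""
    sizes = []
    s = input_size
    for layer in ("C1", "S2", "C3", "S4", "C5"):
        if layer.startswith("C"):
            s = s - kernel + 1  # unpadded convolution shrinks the map
        else:
            s = s // pool       # subsampling halves the map
        sizes.append((layer, s))
    return sizes


# Hypothetical detector output: a random 48 x 64 RGB patch stands in for a crop.
patch = np.random.default_rng(0).random((48, 64, 3))
small = resize_nearest(patch)  # 32 x 32 x 3
yuv = rgb_to_yuv(small)        # 32 x 32 x 3, channels ordered Y, U, V
print(yuv.shape)
print(lenet5_feature_sizes())
```

With a $32{\times}32$ input, the feature maps shrink to 28, 14, 10, 5, and finally 1 pixel per side, so the last convolutional layer acts as a fully connected layer over the whole patch.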


cascade facial point detector;convolutional neural network;YUV format;human-robot interaction





Supported by: National Research Foundation of Korea