Multimodal Biometrics Recognition from Facial Video with Missing Modalities Using Deep Learning

  • Maity, Sayan (Dept. of Industrial Engineering, University of Miami)
  • Abdel-Mottaleb, Mohamed (Dept. of Electrical and Computer Engineering, University of Miami)
  • Asfour, Shihab S. (Dept. of Electrical and Computer Engineering, University of Miami)
  • Received : 2018.04.26
  • Accepted : 2019.03.23
  • Published : 2020.02.29


Biometric identification using multiple modalities has attracted the attention of many researchers because it produces more robust and trustworthy results than single-modality biometrics. In this paper, we present a novel multimodal recognition system that trains a deep learning network to automatically learn features after extracting multiple biometric modalities from a single data source, i.e., facial video clips. Using the different modalities present in the facial video clips (left ear, left profile face, frontal face, right profile face, and right ear), we train supervised denoising auto-encoders to automatically extract robust and non-redundant features. The automatically learned features are then used to train modality-specific sparse classifiers that perform the multimodal recognition. Moreover, the proposed technique proved robust when some of the above modalities were missing during testing. The proposed system has three main components: detection, which consists of modality-specific detectors that automatically detect images of the different modalities present in facial video clips; feature selection, which uses a network of supervised denoising sparse auto-encoders to capture discriminative representations that are robust to illumination and pose variations; and classification, which consists of a set of modality-specific sparse representation classifiers for unimodal recognition, followed by score-level fusion of the recognition results of the available modalities. Experiments conducted on the constrained facial video dataset (WVU) and the unconstrained facial video dataset (HONDA/UCSD) yielded Rank-1 recognition rates of 99.17% and 97.14%, respectively. The multimodal recognition accuracy demonstrates the superiority and robustness of the proposed approach irrespective of the illumination, non-planar movement, and pose variations present in the video clips, even when some modalities are missing.
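
The score-level fusion over the available modalities can be illustrated with a minimal sketch. The abstract does not state the normalization scheme or the fusion rule, so the min-max normalization and the mean rule below are assumptions chosen for illustration, as are the function names and the convention that a higher score means a better match (residual-based classifiers would need their scores inverted first). A modality that was not detected in the test clip is passed as `None` and is simply left out of the average.

```python
from typing import Dict, List, Optional


def min_max_normalize(scores: List[float]) -> List[float]:
    """Rescale one modality's scores to [0, 1] (assumed normalization)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All subjects scored identically; this modality carries no evidence.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_scores(per_modality: Dict[str, Optional[List[float]]]) -> List[float]:
    """Average normalized scores over the modalities that are present.

    Each list holds one similarity score per enrolled subject (higher is
    better). A value of None marks a missing modality.
    """
    available = [s for s in per_modality.values() if s is not None]
    if not available:
        raise ValueError("at least one modality must be available")
    n_subjects = len(available[0])
    if any(len(s) != n_subjects for s in available):
        raise ValueError("all modalities must score the same subjects")
    normalized = [min_max_normalize(s) for s in available]
    # Mean rule (assumed): average across modalities, subject by subject.
    return [sum(column) / len(normalized) for column in zip(*normalized)]


def rank1_identity(fused: List[float]) -> int:
    """Return the index of the subject with the highest fused score."""
    return max(range(len(fused)), key=fused.__getitem__)


if __name__ == "__main__":
    # Hypothetical scores for three enrolled subjects; both left ear and
    # right profile face are missing from this test clip.
    scores = {
        "left_ear": None,
        "left_profile": [0.2, 0.9, 0.1],
        "frontal_face": [0.3, 0.8, 0.4],
        "right_profile": None,
        "right_ear": [0.5, 0.6, 0.1],
    }
    fused = fuse_scores(scores)
    print("fused scores:", fused)
    print("rank-1 subject index:", rank1_identity(fused))
```

Because the average is taken only over the modalities that are present, the same routine serves for a full five-modality clip and for a clip in which several modalities are missing.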

