
Face Recognition using Correlation Filters and Support Vector Machine in Machine Learning Approach

  • Long, Hoang (Dept. of Artificial Intelligence Convergence, Pukyong National University) ;
  • Kwon, Oh-Heum (Dept. of IT Convergence and Application Engineering, Pukyong National University) ;
  • Lee, Suk-Hwan (Dept. of Computer Engineering, Dong-A University) ;
  • Kwon, Ki-Ryong (Dept. of IT Convergence and Application Engineering, Pukyong National University)
  • Received : 2020.11.09
  • Accepted : 2021.04.06
  • Published : 2021.04.30

Abstract

Face recognition has gained significant attention because of its applications in many industries: security, healthcare, and marketing. In this paper, we present a face recognition method that combines correlation filters (CF) with a Support Vector Machine (SVM). First, we evaluate and compare the performance of four correlation filters: minimum average correlation energy (MACE), maximum average correlation height (MACH), unconstrained minimum average correlation energy (UMACE), and optimal trade-off (OT). Second, we propose a machine learning approach that uses the OT correlation filter for feature extraction and SVM for classification. Numerical results on the National Cheng Kung University (NCKU) and Pointing'04 face databases show that the proposed OT-SVM method achieves higher accuracy in face recognition than other machine learning methods. Our approach does not require a graphics card for training, so it can run well on low-spec hardware such as embedded systems.


1. INTRODUCTION

Correlation filters (CF), such as the popular MACH and MACE filters and their variants[1,2], have been applied successfully to various pattern recognition tasks. In CF methods, we design a frequency-domain array that emphasizes the desired parts and deemphasizes the undesired parts of the training images, so that the filter tolerates changes in these images. For instance, face recognition is performed by cross-correlating an input image with a filter and examining the correlation output: a test image matches the training images if a sharp peak appears on the correlation plane. Fig. 1 presents the implementation of face recognition using a CF and Fast Fourier Transforms (FFTs).


Fig. 1. Applying correlation filters in face recognition.
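As a concrete illustration of the pipeline in Fig. 1, the following is a minimal NumPy sketch of frequency-domain cross-correlation, under the common convention that correlation corresponds to multiplying the image spectrum by the conjugate of the filter spectrum; it is an illustrative sketch, not the authors' implementation.

    import numpy as np

    def correlate_freq(image, H):
        # Correlate a 2D image with a filter given by its 2D DFT H (same size as the image).
        X = np.fft.fft2(image, s=H.shape)
        G = X * np.conj(H)               # frequency-domain cross-correlation
        g = np.real(np.fft.ifft2(G))     # correlation plane
        return np.fft.fftshift(g)        # center the peak for easier inspection

A genuine test image then produces a sharp peak in the returned correlation plane, while an impostor image does not.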

The MACH filter is a type of correlation filter that maximizes the correlation peak intensity in response to the average of the training images. Another category of CF is the MACE filter, which produces sharp correlation peaks and offers excellent discrimination performance[3].

In this paper, we develop a machine learning method for face recognition based on the OT correlation filter. Section 2 summarizes related work, section 3 presents the MACE, MACH, UMACE, and OT filters and their combination with SVM, section 4 shows experimental results, and section 5 gives the conclusion.

2. RELATED WORKS

A facial security system has two states, accept and reject, determined by matching a test face against the faces in the database. Correlation filters, such as the MACE filter, can handle this task well. Savvides et al. [4] used a facial expression database from the Advanced Multimedia Processing lab at CMU to evaluate the performance of the MACE filter. Zhu et al. [5] introduced the feature correlation filter (FCF), an extended version of the MACE filter in feature space, tested on the CMU database. Another line of work on correlation filters is the MACH filter, proposed by Nevel and Mahalanobis [6] for classifying laser radar imagery. The MACH filter is also an efficient method for action recognition [7].

The Random Forest (RF) algorithm, a supervised machine learning method, is popular in computer vision and in facial recognition. The authors in [8] presented an improved random forest-based method for face recognition that weights the features when constructing the tree predictors of the forest. Several popular feature extraction methods have been used in [9,10] for face classification, such as Local Binary Patterns, spatial pyramid-based local descriptors, and grayscale intensity features.

In this paper, we first evaluate facial recognition with four correlation filters: MACE, MACH, UMACE, and OT; we then combine the OT filter with SVM to improve the recognition accuracy. We use the NCKU face database[11] and the Pointing'04 face database[12] for testing because the CMU dataset is not free. Before moving to the next section, we review the idea of SVM for solving classification problems. Fig. 2 shows the two classification cases in SVM: linearly separable and non-linearly separable. For the first case, SVM finds the decision boundary (hyperplane) with the largest margin separating the two classes[13]. The second case can be handled with the kernel trick, which has three popular kernel types: linear, polynomial, and Radial Basis Function (RBF). We use the RBF kernel in the proposed method because Yekkehkhany et al. indicate that SVM with the RBF kernel achieves higher classification accuracy[14].


Fig. 2. SVM: (a) linearly separable and (b) non-linear separable.
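As a small illustration of this classifier, the sketch below trains scikit-learn's SVC with an RBF kernel; the random vectors are only placeholders for the PSR features introduced in section 3, and the sizes are assumptions that merely mirror the NCKU split used later.

    import numpy as np
    from sklearn.svm import SVC

    # Placeholder data: random vectors stand in for the PSR feature vectors of section 3.
    rng = np.random.default_rng(0)
    n_classes, n_per_class, n_features = 15, 59, 15
    X_train = rng.normal(size=(n_classes * n_per_class, n_features))
    y_train = np.repeat(np.arange(n_classes), n_per_class)

    clf = SVC(kernel="rbf", C=1.0, gamma="scale")   # RBF kernel, as indicated in [14]
    clf.fit(X_train, y_train)
    print(clf.predict(rng.normal(size=(5, n_features))))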

3. DESIGN OF CORRELATION FILTERS – SUPPORT VECTOR MACHINE

3.1. MACE, MACH, UMACE, and OT filter design

Assume that we have M training images, denoted by xi (i = 1, 2, ..., M), each of dimension d1×d2. Vector h represents the correlation filter in the spatial domain.

We convert an image from the spatial domain to the frequency domain by the discrete Fourier transform (DFT). Xi and H denote the 2D DFTs of the image xi and the filter h, respectively.

The correlation result of the ith image and the filter is:

\(g_{i}(p, q)=h(p, q) \otimes x_{i}(p, q)\)       (1)

where ⊗ represents the correlation operator, xi(p, q) is the ith image, and (p, q) are pixel coordinates. We denote the 2D DFT of the correlation function gi by G. The MACE filter regulates the shape of the correlation plane gi(p, q): to control the shape of the correlation output, we minimize the average correlation energy (ACE) of the correlation planes of the training images[15].

The ACE can be expressed as:

\(E_{\text {avg }}=h^{\dagger} D h\)       (2)

where the diagonal matrix \(D_i = X_i X_i^*\) is the power spectral density of the ith training image, and the diagonal matrix \(D=\frac{1}{M} \sum_{i=1}^{M} D_{i}\) is its mean over the training set.

The MACE filter minimizes the ACE, so we find the vector h that minimizes \(h^{\dagger} D h\) subject to the constraint \(X^{\dagger} h = u\). The solution can be written as:

\(h=D^{-1} X\left(X^{\dagger} D^{-1} X\right)^{-1} u\)       (3)

So far, we have obtained the formula for the MACE filter; the remaining filters are derived below.
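Before moving on, here is a minimal NumPy sketch of equation (3). It assumes the M training images share the same size, places their vectorized 2D DFTs in the columns of X, and sets every entry of the constraint vector u to 1 (the usual choice for the true class); it is an illustrative reading of the formula, not the authors' implementation.

    import numpy as np

    def mace_filter(images, u=None):
        # Build a MACE filter (eq. 3) from a list of same-sized 2D images;
        # the filter is returned as a 2D array in the frequency domain.
        M = len(images)
        shape = images[0].shape
        X = np.stack([np.fft.fft2(img).ravel() for img in images], axis=1)  # d x M
        d = np.mean(np.abs(X) ** 2, axis=1) + 1e-12   # diagonal of D (average power spectrum)
        u = np.ones(M, dtype=complex) if u is None else u
        Dinv_X = X / d[:, None]                       # D^{-1} X without forming D explicitly
        A = X.conj().T @ Dinv_X                       # X^† D^{-1} X  (M x M)
        h = Dinv_X @ np.linalg.solve(A, u)            # eq. (3)
        return h.reshape(shape)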

The MACH filter minimizes the variance of the correlation plane across the training images and maximizes the average correlation output at the origin. These filters are observed to be statistically optimal under at least a Gaussian noise assumption. As a result, MACH filters are usually more distortion tolerant than MACE filters. The MACH filter in the frequency domain is defined as

\(h=S^{-1} w\)       (4)

\(S=\frac{1}{M d_{1} d_{2}} \sum_{i=1}^{M}\left(I_{i}-W\right)\left(I_{i}-W\right)^{*}\)       (5)

where S is a d1 × d2 diagonal matrix that measures the similarity of the training images to the class mean; w is a column vector of the Fourier coefficients of the average training image; and Ii and W are d1 × d2 diagonal matrices whose diagonals contain the Fourier coefficients of the ith training image and of the mean training image, respectively. The UMACE filter, a special case of the MACH filter obtained by substituting D for S, can be written as:

\(h=\mu \lambda D^{-1} w\)       (6)

where the term µλ is a constant, generally set to 1, which does not affect the filter performance.
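Because S, D, Ii, and W are all diagonal, equations (4) to (6) reduce to element-wise operations on the image spectra. A short NumPy sketch under that reading (again an illustration, not the authors' code) is:

    import numpy as np

    def mach_umace_filters(images):
        # Return (MACH, UMACE) filters (eqs. 4-6) as 2D frequency-domain arrays.
        M = len(images)
        d1, d2 = images[0].shape
        X = np.stack([np.fft.fft2(img) for img in images])        # M spectra of size d1 x d2
        w = X.mean(axis=0)                                        # mean training spectrum
        s = np.sum(np.abs(X - w) ** 2, axis=0) / (M * d1 * d2)    # diagonal of S (eq. 5)
        d = np.mean(np.abs(X) ** 2, axis=0)                       # diagonal of D
        eps = 1e-12                                               # guard against zero bins
        h_mach = w / (s + eps)                                    # eq. (4): h = S^{-1} w
        h_umace = w / (d + eps)                                   # eq. (6) with mu*lambda = 1
        return h_mach, h_umace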

So far, we have obtained the formulas for the MACE, MACH, and UMACE filters. The MACE filter is a particular instance of the well-known Synthetic Discriminant Function (SDF)[16]. The SDF filter involves four competing criteria, namely, output noise variance (ONV), average correlation height (ACH), average similarity measure (ASM), and average correlation energy (ACE). The ideal filter should reduce ONV, ACE, and ASM and increase ACH, which leads to the optimal trade-off filter. The energy function is given by equation (7)[17].

\(\begin{aligned} E(h) &=\alpha(O N V)+\beta(A C E)+\gamma(A S M)-\delta(A C H) \\ &=\alpha h^{\dagger} C h+\beta h^{\dagger} D h+\gamma h^{\dagger} S h-\delta\left|h^{\dagger} m\right| \end{aligned}\)       (7)

We minimize this energy function with respect to h, with the trade-off parameters held constant; the result is the optimal trade-off (OT) filter:

\(h=(\alpha C+\beta D+\gamma S)^{-1} m\)       (8)

where α, β, and γ are the nonnegative optimal trade-off parameters.
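Treating the diagonal matrices element-wise as above, and taking the noise covariance C as the identity (a common white-noise simplification that the paper does not specify), a sketch of equation (8) with the trade-off values used later (α = β = 0.5, γ = 1) is:

    import numpy as np

    def ot_filter(images, alpha=0.5, beta=0.5, gamma=1.0):
        # Optimal trade-off filter (eq. 8) as a 2D frequency-domain array;
        # C is taken as the identity (white-noise assumption).
        M = len(images)
        d1, d2 = images[0].shape
        X = np.stack([np.fft.fft2(img) for img in images])
        m = X.mean(axis=0)                                        # mean training spectrum
        d = np.mean(np.abs(X) ** 2, axis=0)                       # diagonal of D
        s = np.sum(np.abs(X - m) ** 2, axis=0) / (M * d1 * d2)    # diagonal of S
        c = np.ones((d1, d2))                                     # diagonal of C
        return m / (alpha * c + beta * d + gamma * s)             # eq. (8)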

3.2 OT-SVM design

Figs. 3 and 4 show the proposed method. Suppose we have N different people in our dataset. We build one correlation filter hk from the M training images of the kth person. We apply the Fourier transform to all training images of the N classes and use the results to design the correlation filters hk (k = 1, 2, …, N). Each Fourier-transformed training image is then multiplied by each filter, and the inverse Fourier transform of each product gives a correlation plane, which traditional correlation filter methods use to locate the correlation peak. In our method, we instead use these correlation planes to compute the peak-to-sidelobe ratio (PSR) values given by equation (9). The PSR values form the input feature vectors of the SVM in the training phase. The testing phase proceeds in the same way, except that the SVM predicts the class from the PSR values, which is then the class of the test image. Each test image produces a 1×N feature vector because there are N correlation filters.


Fig. 3. The proposed method applied to the training set.


Fig. 4. The proposed method applied to the testing set.

\(P S R=\frac{\max (\operatorname{cor} r)-\operatorname{mean}(\operatorname{cor} r)}{s t d(\operatorname{cor} r)}\)       (9)

where max(corr), mean(corr), and std(corr) are the maximum, mean, and the standard deviation of the correlation plane, respectively.
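To connect equation (9) with the pipeline of Figs. 3 and 4, the sketch below computes a 1×N PSR feature vector for one image given N class filters (for example, built with ot_filter above) and feeds such vectors to the RBF-kernel SVM; the variable names in the commented usage lines are hypothetical.

    import numpy as np
    from sklearn.svm import SVC

    def psr(corr):
        # Peak-to-sidelobe ratio of a correlation plane (eq. 9).
        return (corr.max() - corr.mean()) / corr.std()

    def psr_features(image, filters):
        # 1 x N feature vector: one PSR value per class filter (Figs. 3 and 4).
        X = np.fft.fft2(image)
        planes = [np.real(np.fft.ifft2(X * np.conj(H))) for H in filters]
        return np.array([psr(g) for g in planes])

    # Hypothetical usage: per_class_images is a list of N lists of training images,
    # train_images / train_labels hold all training images and their class indices.
    # filters = [ot_filter(imgs) for imgs in per_class_images]
    # X_train = np.stack([psr_features(img, filters) for img in train_images])
    # clf = SVC(kernel="rbf").fit(X_train, train_labels)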

4. NUMERICAL RESULTS

The experiments run on an Intel Core i7-7700 CPU with 32 GB RAM. First, we use the NCKU face database to estimate the performance of the correlation filters. We randomly pick 15 subjects from the database. Each subject has 74 images in total, half of which were generated by rotating the head in 5-degree steps from +90° (the right profile) to -90° (the left profile). The authors in [10] then used commercial image-processing software to horizontally flip these 37 images and obtain the remaining 37. Fig. 5 shows an example from the NCKU face database: pictures from 0° (frontal) to -90° (left profile) obtained by this process.


Fig. 5. The different views of one class in NCKU face database.

For each subject, we split the data into 59 training images and 15 testing images, giving 885 training and 225 testing images for the whole dataset. As seen in Table 1, the MACE filter yields an accuracy of 84.00%, while MACH and UMACE yield 46.67% and 82.22%, respectively. Fig. 6 shows the detailed result of the MACE filter. The MACH filter gives the worst result. In contrast, the MACE filter classifies most of the classes well, except for two groups: persons 5 and 7 (group 1) each have an accuracy of 46.7%, and three persons in group 2 have the same accuracy of 66.7%.

Table 1. The accuracy of various correlation filters on the NCKU database.



Fig. 6. The confusion matrix of MACE.

For each person in the dataset, say person A, we construct one correlation filter from the 59 training images. For each test image, we calculate the correlation output with each correlation filter. A sharp peak appears on the correlation plane if the test image belongs to person A, and no distinct peak appears otherwise, as shown in Fig. 7.


Fig. 7. The correlation peak shows how well the training and test images match: (a) No match and (b) Match.

So far, we have compared the accuracy of the four correlation filters: MACE, MACH, UMACE, and OT. Next, we evaluate the combination of the correlation filters and SVM for face recognition. We use the correlation filters to compute the PSR values, which are used as the input vectors of the SVM.

Firstly, we compare the proposed method with the original correlation filters, as seen in Table 1. The proposed method improves the accuracy by 12.44, 46.22, 13.78, and 11.99 percentage points over the traditional MACE, MACH, UMACE, and OT filters, respectively. All four filters achieve over 90% accuracy when SVM classifies the extracted features. Specifically, OT-SVM achieves the highest accuracy of 99.55%. Fig. 8 gives the detailed result of MACE-SVM in face recognition: MACE-SVM has five classes below 100% accuracy. MACH-SVM has the lowest accuracy at 92.89% but still improves significantly over the original MACH filter (+46.22 points).


Fig. 8. The confusion matrix of MACE-SVM.

Secondly, we use both the NCKU and Pointing'04 face databases to compare the proposed method with other machine learning methods. For a fair comparison, we split the face databases the same way as the authors in [8,9]. Specifically, 10% of the Pointing'04 images are used for testing. With NCKU, we use a subset containing the first 37 images of every subject (see [8]), one-third of which forms the testing set. We chose γ = 1 and α = β = 0.5 to create the optimal trade-off filter. The OT filter extracts the features from the images, and SVM classifies these features. OT-SVM increases the accuracy by 11.03 percentage points on Pointing'04 and 5.87 points on NCKU compared to the other machine learning methods (see Table 2).

Table 2. The accuracy of various methods on the Pointing’04 and NCKU database (N/A: Not Available).


Table 2 also shows the results of several deep learning approaches on the Pointing'04 dataset. The deep learning approach in [18] evaluates the Pointing'04 dataset with three different architectures: AlexNet, VGGNet, and ResNet50. Reference [19] applies a specific convolutional neural network structure, ConvNets, with two loss functions: alpha divergence (alpha) and Kullback-Leibler (KL) divergence. Two different CNN models are evaluated on the Pointing'04 dataset in [20]. OT-SVM is more accurate than the other deep learning methods, except the ConvNet (KL) method. However, the ConvNet (KL) method requires a good GPU (an NVIDIA K40 with 12 GB memory), while a machine learning-based method such as OT-SVM runs well without a GPU.

5. CONCLUSION

A face recognition system is a computational technology that uses algorithms to recognize and then validate recorded faces against facial images stored in a database. Face recognition is an essential topic in computer vision, and many researchers have studied this subject in various ways for applications such as surveillance systems. This paper presents an application of machine learning and correlation filters to face recognition. The OT filter extracts the features from images, and SVM classifies them. Experimental results show that the OT-SVM method gives higher accuracy than the traditional filters and other machine learning and deep learning methods. OT-SVM does not require a GPU, so it can run well on limited hardware such as embedded systems. For future work, we will integrate the proposed method with embedded hardware and develop a full face recognition system.

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2020R1I1A306659411, 2020R1F1A1069124) and the Ministry of Trade, Industry, and Energy for its financial support of the project titled “the establishment of advanced marine industry open laboratory and development of realistic convergence content”.

References

  1. Q.W. Shahid and A.N. Alvi, "Object Tracking using MACH Filter and Optical Flow in Cluttered Scenes and Variable Lighting Conditions," World Academy of Science, Engineering and Technology, Vol. 60, pp. 709-712, 2009.
  2. M. David and L.Y. Yu, "Face Recognition Subject to Variations in Facial Expression, Illumination and Pose using Correlation Filters," Journal Computer Vision and Image Understanding, Vol. 104, No. 1, pp. 1-15, 2006. https://doi.org/10.1016/j.cviu.2006.06.004
  3. G. Verma and A. Sinha, "Design of Advanced Correlation Filters for Finger Knuckle Print Authentication Systems," Proceedings of International Conference on Computer Vision and Image Processing, pp. 47-56, 2017.
  4. M. Savvides, B.V. Kumar, and P. Khosla, "Face Verification using Correlation Filters," Proceedings of the 3rd IEEE Automatic Identification Advanced Technologies, pp. 56-61, 2002.
  5. X. Zhu, S. Liao, Z. Lei, R. Liu, and S.Z. Li, "Feature Correlation Filter for Face Recognition," Proceedings of International Conference on Biometrics, pp. 77-86, 2007.
  6. A.J.V. Nevel and A. Mahalanobis, "Comparative Study of Maximum Average Correlation Height Filter Variants using Ladar Imagery," Optical Engineering, Vol. 42, No. 2, pp. 541- 551, 2003. https://doi.org/10.1117/1.1531975
  7. J. Ahmed, S. Abbasi, and M.Z. Shaikh, "Fast Spatiotemporal MACH Filter for Action Recognition," Machine Vision and Applications, Vol. 24, No. 5, pp. 909-918, 2013. https://doi.org/10.1007/s00138-013-0484-2
  8. R. Zhu, G. Sang, Y. Cai et al., "Head Pose Estimation with Improved Random Regression Forests," Proceedings of the 8th Chinese Conference on Biometric Recognition (CCBR '13), pp. 457-465, 2013.
  9. S. Gaoli, H. Chen, and Q. Zhao, "Head Pose Estimation with Improved Random Regression Forests," Mathematical Problems in Engineering, Vol. 2015, 2015.
  10. K.T. Kim and J.Y. Choi, "Using Spatial Pyramid Based Local Descriptor for Face Recognition," Journal of Korea Multimedia Society, Vol. 20, No. 5, pp. 758-768, 2017. https://doi.org/10.9717/kmms.2017.20.5.758
  11. The National Cheng Kung University Face Database, http://www.datatang.com/data/14866 (accessed March 10, 2020).
  12. Pointing'04 Database, http://www-prima.inrialpes.fr/perso/Gourier/Faces/HPDatabase.html (accessed March 10, 2020).
  13. A. Geron, Hands-On Machine Learning with Scikit-Learn & TensorFlow, O'Reilly Media, Inc., CA 95472, USA, 2017.
  14. B. Yekkehkhany, A. Safari, S. Homayouni, and M. Hasanlou, "A Comparison Study of Different Kernel Functions for Svm-Based Classification of Multi-Temporal Polarimetry SAR Data," International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol. XL-2/W3, pp. 281-285, 2014. https://doi.org/10.5194/isprsarchives-XL-2-W3-281-2014
  15. A. Mahalanobis, B.V.K.V. Kumar, and D. Casasent, "Minimum Average Correlation Energy Filters," Applied Optics, Vol. 26, No. 17, pp. 3633-3640, 1987. https://doi.org/10.1364/AO.26.003633
  16. B.V.K.V. Kumar, J.A. Fernandez, A. Rodriguez, and V.N. Boddeti, "Recent Advances in Correlation Filter Theory and Application," Proc. SPIE, Vol. 9094, pp. 909404, 2014.
  17. B.V.K.V. Kumar, D.W. Carlson, and A. Mahalanobis, "Optimal Trade-off Synthetic Discriminant Function Filters for Arbitrary Devices," Optics Letters, Vol. 19, pp. 1556-1558, 1994. https://doi.org/10.1364/OL.19.001556
  18. Y. Wang, W. Liang, J. Shen, Y. Jia, and L. F. Yu, "A Deep Coarse-to-fine Network for Head Pose Estimation from Synthetic Data," Pattern Recognition, Vol. 94, pp. 196-206, 2019. https://doi.org/10.1016/j.patcog.2019.05.026
  19. B.B. Gao, C. Xing, C.W. Xie, J. Wu, and X. Geng, "Deep Label Distribution Learning with Label Ambiguity," IEEE Transactions on Image Processing, Vol. 26, No. 6, pp. 2825-2838, 2017. https://doi.org/10.1109/TIP.2017.2689998
  20. S. Lee and T. Saitoh, "Head Pose Estimation Using Convolutional Neural Network," Proceedings of the IT Convergence and Security, pp. 164-171, 2017.