Anthropomorphic Animal Face Masking using Deep Convolutional Neural Network based Animal Face Classification

  • Khan, Rafiul Hasan (Dept. of IT Convergence and Application Engineering, Pukyong National University) ;
  • Lee, Youngsuk (Research Institute for Image & Culture Content, Dongguk University) ;
  • Lee, Suk-Hwan (Dept. of Information Security, Tongmyong University) ;
  • Kwon, Oh-Jun (Dept. of Computer Software Engineering, Dongeui University) ;
  • Kwon, Ki-Ryong (Dept. of IT Convergence and Application Engineering, Pukyong National University)
  • Received : 2019.03.22
  • Accepted : 2019.05.03
  • Published : 2019.05.31

Abstract

Anthropomorphism is the attribution of human traits, emotions, or intentions to non-human entities. Anthropomorphic animal face masking is the process of mapping human facial characteristics onto an animal face. In this research, we propose a compact system that finds the animal face most resembling a given human face using a Deep Convolutional Neural Network (DCNN) and then applies morphing between them. The process first determines which animal most resembles the particular human face through DCNN-based animal face classification, and then performs triangulation-based morphing between the human face and the most similar animal face. In place of the conventional manual Control Point selection performed by an animator, we propose a Viola-Jones-based Control Point selection process that detects the facial features of the human face and extracts the Control Points automatically. To realize our approach, we built our own dataset of ten thousand animal faces and a fourteen-layer DCNN. The simulation results demonstrate, first, that the accuracy of our proposed DCNN architecture exceeds that of related methods for animal face classification, and second, that the proposed morphing method completes the morphing process with less deformation and without any human assistance.

1. INTRODUCTION

 Anthropomorphism is the process of ascribing human characteristics to non-human things, and it can be realized by morphing between a human and an animal, either between faces or between whole bodies. Until now, research related to anthropomorphism has treated these steps independently: animal classes have been chosen arbitrarily, without considering resemblance. Image morphing is the gradual transformation or transition from one image to another [1]. In recent times, morphing has become very popular in film and television, and a large body of research is under way on techniques for producing smooth transitions between images. The morphing process couples image warping with color interpolation: image warping applies 2D geometric transformations to the images to retain geometric alignment between their features, while color interpolation blends their colors [2].

 Several morphing algorithms have been published, such as One-Dimensional Morphing [3, 4], Cross-Dissolve Morphing [5], Mesh Warping [6], Field Morphing [7], and Triangulation-Based Morphing [8]. Conventional morphing between two images begins with an animator establishing corresponding positions of mesh nodes, line segments, curves, or points; the correspondence is maintained by taking their positions manually, and the question of resemblance is often disregarded. The key purpose of conventional morphing is to find a subtle technique for smooth transition between images, but if we take similarity into account, the process becomes much smoother than expected. On finding similarity, there has been research on animal face classification, such as that of Tibor Trnovszky et al. [9] and Guobin Chen et al. [10]. There are also high-end DCNN architectures such as AlexNet [11], GoogLeNet [12], and VGG19 [13]. Since our concentration was on the morphing process, we wanted to build a DCNN that is low in computing cost but at the same time more accurate than comparable algorithms.

 In this paper, we propose a compact system built around a Deep Convolutional Neural Network (DCNN) that compares the features of the morphing images in such a way that only the most similar images are forwarded to the morphing process. DCNN is currently considered the state-of-the-art technique for image classification; it is composed of neurons that self-optimize through learning [14]. We took advantage of this technique and built our own classification model, which produces the destination image for the morphing process. We began by collecting and modifying a ten-class base dataset. We then built our fourteen-layer DCNN, which includes updated components such as the Batch Normalization Layer, and trained the model with Stochastic Gradient Descent with Momentum (SGDM). We used this trained network to find which animal class is most similar to a particular human face; its output is the most similar animal-class image, which is carried on to the morphing process. It should be noted that we resized images where required. After the features of the human face and the most similar animal face have been compared, the animal face is sent to the morphing process, which starts by taking Control Points. As an alternative to the manual Control Point selection of the conventional morphing process, we propose a Viola-Jones [15] based Control Point selection system that takes the Control Points automatically. This system can be applied to any point-based morphing process, but for the sake of simplicity we follow Triangulation-Based Morphing in this paper, using Delaunay triangulation.

 Our results show that the accuracy of our DCNN is better than that of the related DCNN algorithms, and that the morphing output is much smoother and shows less deformation compared to its counterparts.


Fig. 1. Representation of the proposed method.

2. RELATED WORKS

2.1 Tibor Trnovszky et al.

 In this paper [9], Tibor Trnovszky et al. proposed a fully connected DCNN dedicated to five-class animal face classification. They built a five-class dataset (bear, hog, deer, fox, and wolf) containing one hundred images per class. Their proposed DCNN structure is shown in Fig. 2. The input image has a dimension of 32×32×3 (1,024 pixels per channel), and their convolutional network is divided into 8 blocks.

 In block A, animal faces from the dataset were used as input data. Each animal face was resized to 32×32×3 pixels to improve the computation time, and the input database was expanded by scaling, rotating, and shifting the images to provide better experimental results. Block B was a 2D convolutional layer containing 16 feature maps with a 3×3 kernel, followed by a Rectified Linear Unit (ReLU) as the activation function. Block C contains a max pooling layer with a 2×2 kernel, subsequently followed by a dropout layer with a probability of 25%; the dropout layer was used to reduce overfitting. In block D, a second 2D convolutional layer was used with the same parameters as the first, but with the number of feature maps doubled to 32. In block E, the max pooling layer and dropout layer were used with the same values as in block C. In block F, a standard dense layer, popularly known as the fully connected layer, was used with 256 neurons and ReLU activation. Block G is a dropout layer with its probability set to 25%. For the eighth block, H, an output dense (fully connected) layer with a softmax activation function was used for the five classes.
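 For illustration, the block structure above can be written as a minimal Keras sketch. The feature-map counts, kernel sizes, dropout rates, and neuron counts follow the description in [9]; the 'same' padding and the placement of the Flatten layer are our assumptions, so this is a sketch of the structure rather than the authors' implementation.

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(32, 32, 3)),                               # block A: 32x32x3 input
    layers.Conv2D(16, (3, 3), padding="same", activation="relu"),  # block B: 16 maps, 3x3 kernel
    layers.MaxPooling2D((2, 2)),                                   # block C: 2x2 max pooling
    layers.Dropout(0.25),                                          #          + 25% dropout
    layers.Conv2D(32, (3, 3), padding="same", activation="relu"),  # block D: 32 maps, 3x3 kernel
    layers.MaxPooling2D((2, 2)),                                   # block E: pooling + dropout
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(256, activation="relu"),                          # block F: 256 neurons
    layers.Dropout(0.25),                                          # block G: 25% dropout
    layers.Dense(5, activation="softmax"),                         # block H: five classes
])
```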


Fig. 2. DCNN proposed by Tibor Trnovszky et al. [9].

2.2 Guobin Chen et al.

 In this paper [10], Guobin Chen et al. proposed a novel DCNN-based species recognition algorithm for wild animal classification on very challenging camera-trap imagery. The images were captured with a motion-triggered camera; the moving foreground was selected as the region of interest and fed to the proposed DCNN-based species recognition algorithm. They designed a DCNN with 3 convolutional layers and 3 max-pooling layers. Each convolutional layer has a 9×9 kernel, while each pooling layer has a 2×2 kernel.

 The 128×128 input layer was followed by a convolutional layer whose output was a 120×120 matrix. This layer was followed by the first pooling layer, which produces 32 feature maps of size 60×60; these were fed to the second convolutional layer, whose output is 64 feature maps with a dimension of 52×52 each. This output is then forwarded to the second pooling layer, which produces 64 feature maps with a dimension of 26×26 each. The second pooling layer is followed by the third convolutional layer, and subsequently the third pooling layer produces an output of 32 feature maps with a dimension of 9×9 each. After the convolution stages, a fully connected layer turns the convolutional output into a 2592-dimensional vector. This first fully connected layer is followed by another fully connected layer and a softmax layer; the softmax layer has 20 neurons and determines the label of the input image. They also used a data augmentation step during training.


Fig. 3. DCNN proposed by Guobin Chen et al. [10].

3. PROPOSED METHOD

 The purpose of this work was to build a system that uses deep learning to extract features from images, compares them, and at the same time produces an automatic morphing process. The orthodox morphing process does not check similarity, and it needs a human touch to set the correspondence, which tends to make the morphing less smooth and more fragile. In our proposed method, the process begins by taking a human face and finding the most similar animal class using our custom DCNN architecture; this is the advantage of our proposed method, as it settles the similarity issue before morphing starts. The Viola-Jones [15] algorithm then enables the system to take the corresponding human-face points automatically. Since our animal classes were fixed, we took their corresponding points manually, keeping a constant mapping to the human face at all times. After all the Control Points have been taken, Delaunay-triangulation-based morphing is applied between the human face and the animal face. The principal steps of this method are portrayed in Fig. 1.

3.1 Animal Dataset

 For the dataset, we collected images from various non-commercial sources. The created dataset includes ten classes of animals (bear, cat, deer, dog, elephant, horse, lion, rabbit, raccoon, and rat). Each class contains 1,000 images, and all images were cropped to the face. All images were then resized to 227×227×3, since our network works with that dimension. The process is illustrated in Fig. 4, and Fig. 5 shows 40 images from the created animal dataset.

 There are variations in the animal faces: some photos show a frontal view, some a side view, and so on. Since our goal was to work with faces, all images were taken in a frontal position, with some allowance for slight side movement. Some animal images also differed in scale. The accuracy of an animal recognition system depends on the quality of the image dataset: the more distinct features it contains, the higher the accuracy the system can achieve. Specifically, variance in color, edges, and corners differentiates the classes. We therefore maintained a refined quality while choosing the images for the dataset.
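 As an illustration of this preparation step, the following OpenCV sketch crops a face with a manually chosen bounding box and resizes it to the 227×227×3 network input. The file names, box coordinates, and the use of OpenCV itself are our assumptions; the paper does not specify the tooling.

```python
import cv2

def prepare_face(src_path, box, dst_path, size=227):
    """Crop an animal face using a manual bounding box, then resize to the network input."""
    x, y, w, h = box
    img = cv2.imread(src_path)
    face = cv2.resize(img[y:y + h, x:x + w], (size, size))  # 227x227x3 output
    cv2.imwrite(dst_path, face)

# Hypothetical example:
# prepare_face("raw/raccoon_0001.jpg", (40, 30, 180, 180), "dataset/raccoon/0001.png")
```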


Fig. 4. Dataset creation process.


Fig. 5. The example of the created animal database.

3.2 The proposed architecture for animal face classification

 Our proposed DCNN is built with fourteen layers: two Convolution Layers, three ReLU Layers, two Batch Normalization Layers, two Max Pooling Layers, two Fully Connected Layers, and one each of an Input Layer, a Softmax Layer, and a Classification Output Layer. Our goal was to create a network with low computing cost but a high accuracy rate, so we went through the current work on DCNN-based animal face classification. All of these works use similar functions except for the normalization function. When a large amount of data is analyzed, normalization becomes handy. In image processing, normalization is used to change the range of pixel intensity values and is also known as contrast stretching or histogram stretching. Its purpose is usually to bring the image, or another type of signal, into a more familiar or normal range, achieving a consistent dynamic range across a set of data, signals, or images so as to avoid distraction or fatigue. Two types of normalization are mainly used in image processing: channel-wise local response normalization and batch normalization. In this research, we used batch normalization.

 A batch normalization layer normalizes each input across a mini-batch. To establish fast DCNN training and to reduce the sensitivity to network initialization, batch normalization layers are used between the convolutional layers and the pooling layers. A batch normalization layer normalizes its inputs \(x_{i}\) by first calculating the mean \(\mu_{B}\) and variance \(\sigma_{B}^{2}\) over a mini-batch and over each input channel. Then, it calculates the normalized activations as

 \(\hat{x}_{i}=\frac{x_{i}-\mu_{B}}{\sqrt{\sigma_{B}^{2}+\epsilon}}\)       (1)

 Here, \(\epsilon\) improves numerical stability when the mini-batch variance is very small. To allow for the possibility that inputs with zero mean and unit variance are not optimal for the layer that follows the batch normalization layer, the layer shifts the input by a learnable offset \(\beta\) and scales it by a learnable scale factor \(\gamma\). Basically, \(\beta\) and \(\gamma\) are themselves learnable parameters which get updated during network training.

\(y_{i}=\gamma \hat{x}_{i}+\beta\)       (2)

 Batch normalization layers normalize the activations and gradients propagating through a neural network, which turns network training into an easier optimization problem. Batch normalization also allows us to increase the learning rate: since the optimization problem is easier, the parameter updates can be larger and the network can learn faster.
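 As a numerical illustration, the NumPy sketch below implements the training-time forward pass of Eqs. (1) and (2) for one mini-batch of feature maps. The (N, H, W, C) tensor layout is our assumption, and the running statistics used at inference time are omitted.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Training-time batch normalization over a mini-batch x of shape (N, H, W, C)."""
    mu = x.mean(axis=(0, 1, 2), keepdims=True)   # per-channel mean over the mini-batch
    var = x.var(axis=(0, 1, 2), keepdims=True)   # per-channel variance
    x_hat = (x - mu) / np.sqrt(var + eps)        # Eq. (1): normalized activations
    return gamma * x_hat + beta                  # Eq. (2): learnable scale and shift

x = np.random.randn(8, 27, 27, 128).astype(np.float32)
y = batch_norm_forward(x, gamma=np.ones(128, np.float32), beta=np.zeros(128, np.float32))
print(y.mean(), y.std())  # approximately 0 and 1
```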

 So, the whole operation starts by taking 227×227×3 images in the Input Layer. From the Input Layer, the first Convolution Layer takes the images and applies a convolution with a stride of 4×4; its output is 128 feature maps with a dimension of 5×5, which pass through the first ReLU Layer to the first Batch Normalization Layer, whose output is 128 normalized feature maps. The first Max Pooling Layer takes the output of the first Batch Normalization Layer and applies max pooling with a dimension of 3×3 and a stride of 2×2. The second Convolution Layer then takes the output of the first Max Pooling Layer and applies a convolution with a stride of 2×2; its output is 384 feature maps with a dimension of 3×3 each. This output travels through the second ReLU Layer and the second Batch Normalization Layer to the second Max Pooling Layer, which applies pooling with a stride of 1×1 and outputs 384 feature maps with a dimension of 3×3. Here the learning process ends and the classification process starts. The first Fully Connected Layer takes the output of the second Max Pooling Layer and turns it into a vector of size 384, which goes through the third ReLU Layer to the second Fully Connected Layer. The second Fully Connected Layer keeps only the 10 most activated features and sends them to the Softmax Layer. Finally, the Classification Output Layer decides on the most activated class and labels the input with the corresponding class name. The whole operation is shown in Fig. 7.
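 For concreteness, the fourteen-layer stack described above might look like the following Keras sketch. The layer order, feature-map counts, strides, and the final ten-way softmax follow the text; the convolution kernel sizes and the padding of the second pooling layer are not stated in the paper, so those values are assumptions.

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(227, 227, 3)),                            # 1  Input Layer
    layers.Conv2D(128, (11, 11), strides=(4, 4)),                 # 2  Convolution 1 (kernel assumed)
    layers.ReLU(),                                                # 3  ReLU 1
    layers.BatchNormalization(),                                  # 4  Batch Normalization 1
    layers.MaxPooling2D((3, 3), strides=(2, 2)),                  # 5  Max Pooling 1
    layers.Conv2D(384, (3, 3), strides=(2, 2)),                   # 6  Convolution 2 (kernel assumed)
    layers.ReLU(),                                                # 7  ReLU 2
    layers.BatchNormalization(),                                  # 8  Batch Normalization 2
    layers.MaxPooling2D((3, 3), strides=(1, 1), padding="same"),  # 9  Max Pooling 2 (padding assumed)
    layers.Flatten(),                                             #    flatten to a feature vector
    layers.Dense(384),                                            # 10 Fully Connected 1 (384-d)
    layers.ReLU(),                                                # 11 ReLU 3
    layers.Dense(10),                                             # 12 Fully Connected 2 (10 classes)
    layers.Softmax(),                                             # 13 Softmax; 14 classification output
])
model.summary()
```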


Fig. 6. Our proposed DCNN structure.


Fig. 7. Our proposed DCNN architecture.

3.3 Morphing Process

 Image morphing is a popular image processing technique that produces transitions between images, and there is a variety of morphing methods based on interpolating the positions and colors of pixels in two images. At present, there appears to be no universal criterion for evaluating the quality or realism of a morph, let alone of a morphing method. However, owing to its low deformation and the small transform coefficients between neighboring triangles, we propose triangulation-based morphing. It involves three steps: Control Point selection, warping, and color transition [8].

 Any image morphing technique is realized by coupling image warping with color interpolation. The idea is to find a warping method that changes the source image into the destination image. During the morphing process, the source image is faded out and gradually changed, while the destination image slowly comes in, distorted towards the source image, and is faded in. The first image in the sequence is similar to the source image; the middle image is the average, a fifty-percent blend of the source image fading out and the destination image fading in; and the last images are similar to the destination image. If the middle frame looks good, it is assumed that the entire morphing sequence will look good.

 The morphing process begins by taking the corresponding Control Points; the smoothness and the deformation depend mostly on their correspondence. There is a conventional Control Point selection process, but in this paper we propose a new Viola-Jones [15] based Control Point selection process. Both systems are described below, followed by the triangulation-based morphing process.

 3.3.1 Conventional Control Point Selection process

 Conventional Control Point selection starts by opening an animator, which displays the source image alongside the destination image. The points selected in the source image are known as moving points, while the points selected in the destination image are known as fixed points. The selection must follow the correspondence: one should plot the same number of points on both images, and the points have to lie in similar regions of the faces. For example, the centers and corners of the eyes must carry the same number of points in both images, and the same holds for the centers, corners, and edges of the nose, the mouth, and the whole face. After this process, the points are extracted and forwarded to the morphing process.


Fig. 8. Conventional Control Point Selection process.

3.3.2 Proposed Control Point Selection process

 In the conventional Control Point selection system, the points have to be extracted manually: this step comes at the beginning of the morphing process, and an operator has to pick the points by hand. Since our purpose was to build a fully compact morphing system, we had to get rid of this manual Control Point extraction, so we built an automatic system that takes the Control Points without an operator's assistance.

 Our proposed Control Point selection system uses the Viola-Jones [15] algorithm to locate the facial features of the source image and extracts the corresponding points with a built-in point extraction algorithm. For the moving points of the source image, we first apply the Viola-Jones [15] algorithm to locate the facial features: face, eyes, nose, and mouth. After detecting the facial features, our system draws circles around them, and from those circles our algorithm collects the moving Control Points. The order in which the circles are drawn and the points are taken is: face, eyes, nose, and mouth. First, after locating the face, the system takes points around its circle at every 30-degree interval; then, for the right eye, left eye, nose, and mouth in turn, it takes points around each circle at every 90-degree interval. Overall, the system collects thirty-two Control Points while maintaining the order of the facial features, and this process is repeated every time a new human face is taken as the input or source image; it is shown in Fig. 9. For the destination image we followed a different strategy. Because the animal classes are permanent, our destination images were fixed: we predefined ten destination images, one from each animal class, and took their fixed points manually. To keep the correspondence between the moving points and the fixed points, we took the fixed points from the destination images in the same order as the moving points. These points were fixed and stored, and every time morphing is run the system retrieves them from the store; this process is portrayed in Fig. 10. After all the moving and fixed points have been collected, they are forwarded to the morphing process.
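 As a rough illustration of this step, the OpenCV sketch below detects the face and eyes with the stock Viola-Jones Haar cascades and samples points around the detected regions at the stated angular intervals. Standard OpenCV does not ship nose and mouth cascades, so those detectors are omitted here; the cascade parameters and circle radii are our assumptions, not the paper's.

```python
import cv2
import numpy as np

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye.xml")

def circle_points(cx, cy, r, step_deg):
    """Sample points on a circle at fixed angular intervals."""
    angles = np.deg2rad(np.arange(0, 360, step_deg))
    return [(cx + r * np.cos(a), cy + r * np.sin(a)) for a in angles]

def moving_points(gray):
    """Collect moving Control Points from a grayscale source image."""
    pts = []
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.1, 5)[:1]:
        pts += circle_points(x + w / 2, y + h / 2, w / 2, 30)   # face: every 30 degrees
        roi = gray[y:y + h, x:x + w]
        for (ex, ey, ew, eh) in eye_cascade.detectMultiScale(roi, 1.1, 5)[:2]:
            pts += circle_points(x + ex + ew / 2, y + ey + eh / 2,
                                 ew / 2, 90)                    # each eye: every 90 degrees
    return np.array(pts, dtype=np.float32)
```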


Fig. 9. Control Points Selection from the source image.


Fig. 10. Control Points Selection from destination images.

3.3.3 Triangulation-based Morphing process

 The triangulation-based morphing method first slices the designated space into a suitable set of triangles, with the given data points as the triangle corners. In our model, we took sixteen peripheral points along the edges of the images and fixed them as corner points for both the source image and the destination image. Each of the triangles is then processed independently.

 Among the triangulation methods, Delaunay triangulation maximizes the minimum inner angle of all triangles, avoiding thin triangles. Its fundamental property is the Delaunay criterion, also known as the empty-circumcircle criterion: for a set of points in 2-D, a Delaunay triangulation ensures that the circumcircle associated with each triangle contains no other point in its interior. In the illustration in Fig. 11, the circumcircles associated with T1 and T2 contain no point in their interior, so this triangulation is a Delaunay triangulation.

 So, according to our proposed model, the morphing process begins by drawing the Control Points on the source image and the destination image. Then, following the Delaunay triangulation method, triangles are drawn from the Control Points. This triangulation process is portrayed in Fig. 12.
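 The triangulation itself is straightforward with SciPy. In the sketch below, the layout of the sixteen peripheral points (corners plus equally spaced edge points) and the random stand-in for the thirty-two facial points are our assumptions; only the point counts come from the text.

```python
import numpy as np
from scipy.spatial import Delaunay

h, w = 227, 227
# Sixteen peripheral points: the four corners plus equally spaced points on each edge
e = np.linspace(0, 1, 5)
peripheral = np.array(
    [(t * (w - 1), 0) for t in e] +            # top edge (5 points)
    [(t * (w - 1), h - 1) for t in e] +        # bottom edge (5 points)
    [(0, t * (h - 1)) for t in e[1:-1]] +      # left edge (3 points)
    [(w - 1, t * (h - 1)) for t in e[1:-1]],   # right edge (3 points)
    dtype=np.float32)

facial = np.random.rand(32, 2).astype(np.float32) * [w - 1, h - 1]  # stand-in control points
points = np.vstack([facial, peripheral])

tri = Delaunay(points)   # satisfies the empty-circumcircle criterion
print(len(tri.simplices), "triangles")
```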

 We fixed sixty frames to complete the whole morphing process, with the blending parameter α advancing from frame to frame. For every frame, intermediate points are produced and used during morphing; the formula for the intermediate points is

\(\text{IntermPoints} = (1-\alpha)\cdot\text{movingPoints} + \alpha\cdot\text{fixedPoints}\)       (4)

 For every single morphing frame, transformation matrices were calculated from the source image and the destination image. Then the triangle indices were calculated along with the transformed coordinates. Finally, cross-dissolved pixel values were acquired from the transformed coordinates with the equation portrayed in Eq. (5):

\(\text{Morphing} = (1-\alpha)\cdot\text{im1Warp} + \alpha\cdot\text{im2Warp}\)       (5)

 Here, im1 represents the source image and im2 the destination image. This formula was applied to every single frame over all sixty frames. The value of α ranges from 0 to 1: when α = 0, the frame is similar to the source image, and when α = 1, the frame is similar to the destination image.
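 For illustration, the sketch below computes one frame of the triangulation-based morph with OpenCV and NumPy: Eq. (4) gives the intermediate triangle corners, each triangle of both images is warped onto its intermediate triangle with an affine transform, and Eq. (5) cross-dissolves the two warped images. Warping the whole image per triangle and masking, as done here, is simple but wasteful; a production version would warp per-triangle bounding boxes.

```python
import cv2
import numpy as np

def morph_frame(im1, im2, moving, fixed, simplices, alpha):
    """One morphing frame: moving/fixed are (N, 2) control points, simplices is (T, 3)."""
    interm = (1 - alpha) * moving + alpha * fixed              # Eq. (4): intermediate points
    warp1 = np.zeros_like(im1, dtype=np.float32)
    warp2 = np.zeros_like(im2, dtype=np.float32)
    size = im1.shape[1::-1]                                    # (width, height)
    for idx in simplices:
        t1, t2, t = moving[idx], fixed[idx], interm[idx]
        m1 = cv2.getAffineTransform(t1.astype(np.float32), t.astype(np.float32))
        m2 = cv2.getAffineTransform(t2.astype(np.float32), t.astype(np.float32))
        mask = np.zeros(im1.shape[:2], np.float32)
        cv2.fillConvexPoly(mask, t.astype(np.int32), 1.0)      # rasterize this triangle
        mask = mask[..., None]
        warp1 = warp1 * (1 - mask) + cv2.warpAffine(im1, m1, size) * mask
        warp2 = warp2 * (1 - mask) + cv2.warpAffine(im2, m2, size) * mask
    return ((1 - alpha) * warp1 + alpha * warp2).astype(np.uint8)  # Eq. (5): cross-dissolve

# Sixty frames with alpha running from 0 (source) to 1 (destination):
# frames = [morph_frame(src, dst, mv, fx, tri.simplices, a) for a in np.linspace(0, 1, 60)]
```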


Fig. 11. Delaunay triangulation.


Fig. 12. Delaunay triangulation applied on source and destination image.

4. RESULTS AND DISCUSSION

 Our system starts by taking a human face as the input or source image. With the help of the Convolutional Neural Network, we find the best-matching animal class; the system then collects the matching animal image from the predefined store and proceeds to the morphing operation. Here we discuss our results for both the Convolutional Neural Network and the morphing process. The whole operation was run on a system with an Intel Core i7 CPU at 3.40 GHz, 8 GB RAM, and a single NVIDIA GeForce GTX 1060 3 GB GPU.

 As discussed above, we have our network and the two related networks proposed by Tibor Trnovszky et al. [9] and Guobin Chen et al. [10]. For the simulation of all these networks we followed the same principle, running each on our own dataset, which was divided eighty percent to twenty percent between training and validation. For every simulation we used Stochastic Gradient Descent with Momentum with an initial learning rate of 0.001, a maximum of 70 epochs, and a mini-batch size of 256; the dataset was also shuffled during training. To boost the performance of the networks, we used image augmentation. The simulation results of all the networks are shown below and compared with our network.
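 In Keras terms, this training setup could look like the sketch below. The paper's layer vocabulary suggests MATLAB's Deep Learning Toolbox, so this is a translation rather than the original code; the momentum value of 0.9 and the augmentation ranges are our assumptions, and `model` is the network sketched in Section 3.2.

```python
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# SGDM with the stated initial learning rate; momentum 0.9 is an assumption
model.compile(optimizer=SGD(learning_rate=0.001, momentum=0.9),
              loss="categorical_crossentropy", metrics=["accuracy"])

# 80/20 training/validation split with light augmentation (scale, rotate, shift)
datagen = ImageDataGenerator(rotation_range=15, width_shift_range=0.1,
                             height_shift_range=0.1, zoom_range=0.1,
                             validation_split=0.2)
train = datagen.flow_from_directory("dataset", target_size=(227, 227),
                                    batch_size=256, subset="training")
val = datagen.flow_from_directory("dataset", target_size=(227, 227),
                                  batch_size=256, subset="validation")
# model.fit(train, validation_data=val, epochs=70, shuffle=True)
```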

 In Table 1 we can see that the accuracy of our proposed network is considerably higher than that of the networks proposed by Guobin Chen et al. and Tibor Trnovszky et al. These results were acquired through cross-validation, with an eighty-to-twenty ratio of training to validation data. The cross-validation used ten randomly chosen training/validation splits, keeping the same ratio throughout [16]; the results shown in Table 1 are the averages over all ten sets. Later, we ran our proposed network on the whole dataset: Fig. 13 shows the number of animal classes it recognized successfully out of the total dataset. We also ran the networks of Guobin Chen et al. and Tibor Trnovszky et al. on the whole dataset; the number of animals recognized by each network, including ours, is shown in Table 2. From all these results, we conclude that our network performs considerably better than the comparable networks.

Table 1. Simulation results of all the DCNN networks


Table 2. Number of recognized animals by all the DCNN networks



Fig. 13. Confusion matrix of our proposed network.


Fig. 14. Input image and destination image.

 According to our proposed model, we took an example human face as the input image and used our trained DCNN to find the most matching animal class. The human face shown in Fig. 14(a) matched most strongly with the 'Dog' class, so the system took the predefined 'Dog' face shown in Fig. 14(b), along with its Control Points, from the store and proceeded to the morphing process. As described above, the morphing process calculated the Control Points of the human face (the input image) automatically and applied Delaunay triangulation to both the input image and the destination image. After obtaining all the mappings, the system applied the morphing algorithm and produced a morphing video of sixty frames from beginning to end. Fig. 15 shows the key steps of the morphing process, with the input image at 0% and the destination image at 100%.

 Some more example outputs are displayed in Fig. 16: Fig. 16(a) shows morphing between an Asian girl and a cat, Fig. 16(b) between a black girl and a lion, and Fig. 16(c) between a white man and an elephant.


Fig. 15. Steps of morphing process.


Fig. 16. Some example outputs of our proposed morphing process.

5. CONCLUSION

 This paper has discussed the combination of the morphing process with a DCNN, and the method proposed in this study proves its efficiency in the morphing process. Morphing algorithms all share the same components, namely feature specification, warp generation, and transition control, and triangulation-based morphing is no different; the effectiveness of a morphing tool is determined by the manner in which these components are addressed. However, all conventional morphing processes disregard the resemblance issue, whereas our method finds the similarity first and only then applies morphing. The use of the Convolutional Neural Network gives our method another dimension and the ability to build a smooth base for the morphing process: DCNN is the state-of-the-art feature extraction technique, and we used it to extract the features of animal faces and human faces, compare them, and find the animal class most similar to a particular human face. The inclusion of the automatic Control Point selection system reduces the possibility of human error and delivers an easy-to-use morphing process. The Viola-Jones [15] based automatic Control Point selection system is the first of its kind, and its use brings uniqueness to our system compared with all the present manual Control Point selection based morphing processes.

 In the future, it is possible to extend this work by modifying the Convolutional Neural Network's feature extraction and using the features' coordinate positions as the Control Points for the morphing process. The use of the automatic Control Point selection system in other morphing techniques should also boost them.

References

  1. T. Prince, Y. Fragoso, and O. Gaye, "Applications of Image Morphing Techniques to Analyze Changes in Our Environment," Open Access Library Journal, Vol. 3, No. 9, pp. 1-13, 2016.
  2. G. Wolberg, "Image Morphing: A Survey," The Visual Computer, Vol. 14, Issue 8-9, pp. 360-372, 1998. https://doi.org/10.1007/s003710050148
  3. P. Visutsak and K. Prachumrak, "The Skeleton Pruning-Smoothing Algorithm for Realistic Character Animation," Journal of Man, Machine and Technology, Vol. 2, No. 1, pp. 21-34, 2013.
  4. P. Visutsak and K. Prachumrak, "Geodesic-Based Skeleton Smoothing," International Journal of Mathematical Models and Methods in Applied Sciences, Vol. 5, Issue 4, pp. 713-721, 2011.
  5. G. Wolberg, "Recent Advances in Image Morphing," Proceeding of Computer Graphics International, pp. 64-71, 1996.
  6. S. Karungaru, M. Fukumi, and N. Akamatsu, "Morphing Face Images using Automatically Specified Features," 46th IEEE International Midwest Symposium, Vol. 2, pp. 741-744, 2003.
  7. N. Tomoyuki, T. Fujii, and E. Nakamae, "Metamorphosis Using Bezier Clipping," Proceedings of the First Pacific Conference on Computer Graphics and Applications, pp. 162-173, 1993.
  8. V. Mainkar and N.B. Sambre, "Morphing by Triangulation Method," International Journal of Advance Research in Computer Science and Management Studies, Vol. 3, Issue 2, pp. 249-254, 2015.
  9. T. Trnovszky, P. Kamencay, R. Orjesek, M. Benco, and P. Sykora, "Animal Recognition System Based on Convolutional Neural Network," Advances in Electrical and Electronic Engineering, Vol. 15, No. 3, pp. 517-525, 2017.
  10. G. Chen, T.X. Han, Z. He, R. Kays, and T. Forrester, "Deep Convolutional Neural Network Based Species Recognition for Wild Animal Monitoring," Proceeding of IEEE International Conference on Image Processing, pp. 858-862, 2014.
  11. A. Krizhevsky, I. Sutskever, and G. Hinton, "Imagenet Classification with Deep Convolutional Neural Networks," Advances in Neural Information Processing Systems, pp. 1106-1114, 2012.
  12. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, et al., "Going Deeper with Convolutions," Proceeding of IEEE Conference on Computer Vision and Pattern Recognition, Vol. 1, pp. 1-9, 2015.
  13. K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-scale Image Recognition," arXiv preprint arXiv: 1409.1556, 2014.
  14. K. O'Shea and R. Nash, "An Introduction to Convolutional Neural Networks," arXiv preprint arXiv:1511.08458, 2015.
  15. Y.Q. Wang, "An Analysis of the Viola-Jones Face Detection Algorithm," Image Processing on Line, pp. 129-148, 2014.
  16. L.C. Ow Tiong, S. Tae Kim, and Y. Man Ro, "Multimodal Face Biometrics by Using Convolutional Neural Network," Journal of Korea Multimedia Society, Vol. 20, No. 2, pp. 170-178, 2017. https://doi.org/10.9717/kmms.2017.20.2.170
