A Decision Tree based Real-time Hand Gesture Recognition Method using Kinect

  • Chang, Guochao (Department of Electronics Computer Engineering, Chonnam National University) ;
  • Park, Jaewan (Department of Electronics Computer Engineering, Chonnam National University) ;
  • Oh, Chimin (Department of Electronics Computer Engineering, Chonnam National University) ;
  • Lee, Chilwoo (Department of Electronics Computer Engineering, Chonnam National University)
  • Received : 2012.11.29
  • Accepted : 2013.10.28
  • Published : 2013.12.30


Hand gestures are among the most popular communication methods in everyday life. In human-computer interaction applications, hand gesture recognition provides a natural way of communication between humans and computers. There are two main approaches to hand gesture recognition: glove-based and vision-based. In this paper, we propose a vision-based hand gesture recognition method using Kinect. Using depth information makes the hand detection process efficient and robust. Finger labeling lets the system classify poses according to finger names and the relationships between fingers, which makes the classification more effective and accurate. Two gesture sets can be recognized by our system. In our experiments, the average accuracy on the American Sign Language (ASL) number gesture set is 94.33%, and that on the general gesture set is 95.01%. Since our system runs in real time and has a high recognition rate, it can be embedded into various applications.


1. Introduction

In the field of interaction between humans and computers, the human-computer interface (HCI) has become more and more important, and the use of hand gestures has become an important part of HCI in recent years. Research on hand gesture recognition has a wide range of applications, such as aided communication between deaf and hearing people [1,2], support for voice recognition, control of virtual reality (VR) [3], and robotics [4]. There are two approaches to hand gesture recognition: recognition based on data gloves and recognition based on vision. Vision-based Hand Gesture Recognition (HGR) in particular has become a research hotspot that many scholars have pursued with great enthusiasm, and vision-based gesture recognition systems are one of the main development trends at present and for the future.

Much research has addressed human hand gesture recognition with different methods, which can be divided into appearance-based approaches and model-based approaches. Appearance-based approaches usually learn a nonlinear mapping from features extracted directly from images or other input data to the hand configuration. Model-based approaches create a geometric hand model and estimate the current hand state by matching the model to the observed image features.

Appearance-based approaches avoid the direct search problem and are generally quicker when the mapping can be learned. The main structure of appearance-based approaches is shown in figure 1. Popular appearance features include hand color and shape, local hand features, and optical flow. Local-feature methods extract certain image features such as fingertips or hand edges, and use heuristics to find configurations or combinations of these features specific to an individual hand gesture.

Fig. 1. The general processing of appearance-based approaches.

In this paper, we use the Kinect as the input device and propose a vision-based system to recognize hand gestures. Microsoft Kinect is a motion-sensing peripheral; as a general-purpose, low-cost 3D input device, it is ideal for developing HCI systems. Since the Kinect's launch, many HCI systems have been developed with it. However, few hand gesture recognition systems have been developed, and those that exist recognize only a few gestures. We use the Kinect to acquire depth information, which improves the efficiency and robustness of the segmentation result. The finger detection and finger labeling results are used as prior information in our system. A decision tree classifier is designed to recognize hand gestures according to the number of fingers, the finger labeling result, and the angles between fingers.

This paper is organized as follows. Section 2 introduces works related to this paper. Section 3 describes our hand gesture recognition method, and the experimental results are shown in section 4. Finally, section 5 concludes the paper and describes future work.



2. Related Works

Hand gesture recognition has developed rapidly in recent years and many visual analysis methods have been proposed, but it is still a challenging problem in the field of human-computer interaction. This section reviews some of the existing related works.

2.1 Hand Detection

In order to achieve hand gesture recognition, we need to accurately segment the hand from the input image. Robust hand detection is the most difficult problem in building a hand gesture-based interaction system [5]. Many methods have been proposed to detect bare hands in uncontrolled environments, using appearance, shape, color, depth, and context as cues. It is difficult to achieve good results due to the high number of degrees of freedom of the hand, the resulting shapes and shadows, complex backgrounds, and so on. Because of these complexities, hand detection remains an open problem worldwide.

In order to achieve hand segmentation, previous studies limited the environment. Rehg and Kanade [6] used a special setting where the hand is the only object in a very simple background. Other researchers took an alternative approach that requires users to wear a special marker. Since these markers are usually gloves that are very distinct from other objects in the environment, or are imprinted with a custom pattern, these methods achieve good hand detection and tracking: a color glove was used by Wang and Popović [3], and a glove with attached LEDs by Park and Yoon [7].

Recently many researchers have developed more effective hand detection systems using depth data. These methods usually use a set of cameras or a 3D camera to produce a 3D image. Liu and Fujimura proposed a hand gesture recognition method using a sequence of real-time depth images acquired by active sensing hardware [8]. Van den Bergh and Van Gool [5] presented an improved real-time gesture interaction system using a Time-of-Flight (ToF) camera; their color-based detection achieves 92.0% correct detection, while the depth-based detection achieves 99.2%. Similarly, An et al. [9] also used a ToF camera to detect hands and fingertips. As a common 3D input device, the Kinect has received much attention due to its high performance and reasonable price, and many researchers have used it to develop systems that include hand detection methods [10-12].

2.2 Hand Gesture Recognition System

So far many gesture recognition systems have been proposed; the approaches to vision-based hand gesture recognition can be divided into two categories: 3D hand model based approaches and appearance based approaches [13]. Details of the various approaches can be found in [14,15], and a review of depth image based hand gesture recognition in [16]. In this paper we briefly introduce the methods most related to our work.

The system of Van den Bergh and Van Gool [5] can recognize six different key postures: open hand, fist, pointing up, L-shape, pointing at the camera, and thumb up. Their results showed that RGB-based recognition achieved a 99.54% correct recognition rate, depth-based recognition achieved 99.07%, and combining both achieved 99.54%. This shows that depth-based recognition performs well for constructing an HGR system; however, combining RGB-based and depth-based recognition does not always improve accuracy. A fast hand gesture recognition system based on Histogram of Oriented Gradients (HOG) features and the AdaBoost training algorithm was proposed in [10]; for situations such as hands held in front of the body or objects similar to hands, its miss and false detection rates remain high. Raheja et al. [11] proposed a method to identify fingertips and the centre of the palm. The accuracy of fingertip detection was near 100% when all fingers were open, and the accuracy of palm-centre detection was around 90%. The system of Ren et al. [12] can recognize 10 gestures, but it cannot run in real time; the mean accuracy of near-convex decomposition based Finger-Earth Mover's Distance (FEMD) is 93.9%, and that of threshold decomposition is 90.6%.


3. Decision Tree Classifier based Hand Gesture Recognition Method

Similar to other gesture recognition methods, our method mainly consists of hand detection and gesture recognition. Candescent NUI is an open source project under the Berkeley Software Distribution (BSD) License that has been used to build many Natural User Interface (NUI) applications. Our system is built on it; figure 2 shows an overview of our method.

Fig. 2. The framework of our hand gesture recognition system.

3.1 Hand Detection

Hand detection is the preliminary work of hand gesture recognition. At this stage, we need to separate the hands from the background, and the segmentation result directly influences the average recognition rate. To improve the segmentation result, we set the depth threshold to the range 50 cm to 80 cm. Depth information beyond this range is ignored, so only hands placed within this range can be detected. All hand pixels within this range are projected onto a 2D space for subsequent analysis. For two hand pixels P1(x1, y1) and P2(x2, y2), we use the Euclidean distance to define the distance between them, as shown in Eq. 1:

d(P1, P2) = √((x1 − x2)² + (y1 − y2)²)      (1)
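The thresholding and distance computation described above can be sketched as follows (a minimal NumPy illustration; the 500-800 mm band corresponds to the paper's 50-80 cm range, and the function names are our own, not from the paper's implementation):

```python
import numpy as np

def segment_hand_pixels(depth_mm):
    """Keep only pixels whose depth falls in the 50-80 cm band.

    depth_mm: 2-D array of depth values in millimetres (Kinect-style frame).
    Returns an (N, 2) array of (x, y) coordinates of candidate hand pixels.
    """
    mask = (depth_mm >= 500) & (depth_mm <= 800)   # 50 cm .. 80 cm
    ys, xs = np.nonzero(mask)                      # project to the 2-D image plane
    return np.column_stack([xs, ys])

def pixel_distance(p1, p2):
    """Euclidean distance between two pixels (Eq. 1)."""
    return float(np.hypot(p1[0] - p2[0], p1[1] - p2[1]))
```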

The K-means clustering algorithm is a cluster analysis method used to partition n observations (x1, x2, x3, ..., xn) into k clusters (k ≤ n), C = {C1, C2, C3, ..., Ck}. Each observation belongs to the cluster with the nearest mean μi(x, y), which is calculated as the mean of the points in Ci. K-means minimizes the within-cluster sum of squares:

arg min_C Σi=1..k Σx∈Ci ‖x − μi‖²      (2)

In our system, we use the k-means algorithm to distinguish the pixels of the left and right hands, so the value of k is 2; according to the calculated distances, the pixels are divided into two groups. Since our system works in real time, the k-means result changes constantly. The input depth data is updated at 30 frames per second, and at the beginning of each frame we initialize each cluster with a random point as its mean. If the distance between the two clusters is less than a default value, the system judges that there is only one hand; therefore, users can use our system with one hand or with both hands. With this, the pixels belonging to each hand are clustered as shown in figure 3, and the hand detection stage is complete.

Fig. 3. The result of hand detection by the k-means algorithm.
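A per-frame two-means clustering of this kind might look like the sketch below. This is illustrative only: the one-hand distance threshold of 60 pixels is an assumed value, since the paper does not state its default.

```python
import random
import numpy as np

def two_means(points, iters=10, one_hand_threshold=60.0, seed=None):
    """Cluster hand pixels into (at most) two hands with k-means, k = 2.

    points: (N, 2) array-like of hand-pixel coordinates from detection.
    Returns a list of clusters (one per detected hand). If the two cluster
    centres end up closer than `one_hand_threshold`, all pixels are treated
    as a single hand, mirroring the paper's one-hand/two-hand rule.
    """
    rng = random.Random(seed)
    pts = np.asarray(points, dtype=float)
    # Initialise the two means with random points, as done once per frame.
    means = pts[rng.sample(range(len(pts)), 2)]
    for _ in range(iters):
        d = np.linalg.norm(pts[:, None, :] - means[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for i in range(2):
            if np.any(labels == i):
                means[i] = pts[labels == i].mean(axis=0)
    if np.linalg.norm(means[0] - means[1]) < one_hand_threshold:
        return [pts]                      # centres too close: one hand only
    return [pts[labels == 0], pts[labels == 1]]
```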

3.2 Finger Recognition

3.2.1 Hand Convex Hull and Contour

Due to the system's real-time requirement, it is difficult to design a finger detection algorithm that globally searches a 640×480 image. Instead, we find the convex hull and detect the hand contour. The convex hulls of the detected hands are computed with the Graham scan algorithm [17]. Contour tracing is generally carried out by finding the next pixel on a contour in the 4- or 8-neighborhood of the previous pixel. The Moore neighborhood of a pixel p is the set of 8 pixels that share a vertex or an edge with it. The basic idea is: when the current pixel p is black, the Moore neighborhood of p is examined in clockwise direction, starting with the pixel from which p was entered and advancing pixel by pixel until a new black pixel is encountered. The algorithm terminates when the start pixel is visited for the second time, and the black pixels walked over form the contour of the hand. We use this Moore-Neighbor Tracing algorithm to detect the hand contour.
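As an illustration of the convex-hull step, here is a compact monotone-chain variant of the Graham scan. The paper and the Candescent NUI library may implement the scan differently; this sketch only shows the underlying idea.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); > 0 means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Convex hull of 2-D points (Andrew's monotone-chain variant of the
    Graham scan), returned in counter-clockwise order."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:                         # build the lower hull
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):               # build the upper hull
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]        # endpoints shared between the two
```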

3.2.2 Fingertip Detection

The next step is to detect fingertips using the k-curvature algorithm, which finds curves on the hand contour and determines whether they are fingertips. The algorithm employs three parameters: a set of contour points C, a constant k, and an angle θ. The constant k was found by trial and error; we set k to 20. The angle θ was found by measuring fingertip angles in depth frames; θ is set to the range 90°-100°. In order to improve efficiency and reduce computational cost, we define C as all points that simultaneously belong to the convex hull and the hand contour. For each point in C, we take two vectors pointing from it to the contour points k positions away in the two directions along the contour. After the vectors are created, we find the angle between them; if this angle is in the θ range, we have a fingertip point. In addition, we create another vector that represents the pointing direction of the finger. We also use this method to find the finger valleys, by setting C to all points between two fingertips. Figure 4 shows an example.

Fig. 4. Fingertip detection.
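The k-curvature test described above can be sketched as follows (function and parameter names are our own; the test below uses a small k so the synthetic contour stays short, whereas the paper uses k = 20 on full-resolution contours):

```python
import math

def fingertip_angles(contour, candidates, k=20, angle_range=(90.0, 100.0)):
    """k-curvature test: a candidate contour point is a fingertip when the
    angle between the two vectors running k contour points away in either
    direction falls inside `angle_range` (degrees).

    contour:    list of (x, y) contour points in tracing order.
    candidates: indices into `contour` (here: points also on the convex hull).
    Returns the indices accepted as fingertips.
    """
    n = len(contour)
    tips = []
    for i in candidates:
        p = contour[i]
        a = contour[(i - k) % n]          # k points back along the contour
        b = contour[(i + k) % n]          # k points forward along the contour
        v1 = (a[0] - p[0], a[1] - p[1])
        v2 = (b[0] - p[0], b[1] - p[1])
        dot = v1[0] * v2[0] + v1[1] * v2[1]
        norm = math.hypot(*v1) * math.hypot(*v2)
        if norm == 0:
            continue
        angle = math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))
        if angle_range[0] <= angle <= angle_range[1]:
            tips.append(i)
    return tips
```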

3.2.3 Finger Labeling

After detecting the fingertips, we can label each of them by calculating the distances between every pair of fingertips; this stage therefore requires users to open their hand. The simplest step is to locate the thumb and index finger, since the distance between them is much larger than that between other adjacent fingers. After locating the thumb and index finger, the finger farthest from the thumb is recognized as the little finger, the finger nearest the index finger is recognized as the middle finger, and the last finger is recognized as the ring finger. The result of this stage is shown in figure 5.

Fig. 5. The result of the fingertip labeling process.
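The labeling heuristic can be sketched as below. This is an illustrative implementation that assumes the five fingertips arrive ordered along the hand contour with the thumb at one end; the paper does not specify a data layout, so the ordering assumption and names are ours.

```python
import math

def label_fingers(tips):
    """Label five fingertips of an open hand, following the paper's heuristic.

    tips: five (x, y) fingertip positions ordered along the hand contour,
          with the thumb at either end; the thumb-index gap is assumed to
          be the widest gap between adjacent fingertips.
    Returns a dict: finger name -> (x, y).
    """
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    gaps = [dist(tips[i], tips[i + 1]) for i in range(4)]
    if gaps.index(max(gaps)) == 0:       # thumb comes first in contour order
        thumb, index, rest = tips[0], tips[1], list(tips[2:])
    else:                                # thumb comes last in contour order
        thumb, index, rest = tips[4], tips[3], list(tips[:3])
    little = max(rest, key=lambda p: dist(p, thumb))   # farthest from thumb
    rest.remove(little)
    middle = min(rest, key=lambda p: dist(p, index))   # nearest to index
    rest.remove(middle)
    return {"thumb": thumb, "index": index, "middle": middle,
            "ring": rest[0], "little": little}
```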

As mentioned earlier, the proposed system runs at 30 frames per second, and finger labeling is performed every time a new frame is received. If the same finger still exists in the next frame, it inherits all properties from the previous frame.

3.3 Hand Gesture Recognition

After finger labeling, we are ready for hand gesture recognition, for which we designed a decision tree classifier. The decision tree is simple and probably the most widely used classification approach. Our classifier performs three stages of classification: by the number of fingers, by the names of the fingers, and by the angles between fingers.

We have two gesture sets. As shown in figure 6(a), one set contains the numbers 0-9 of American Sign Language (ASL); the other consists of 7 gestures with special meanings, which we call the general gesture set, shown in figure 6(b). Since some gestures exist in both sets, users need to choose a gesture set first.

Fig. 6. The gesture sets.

The decision tree classifier performs the classification as follows:

Stage 1: The system counts the number of fingers and makes the first classification accordingly. The result is sent to the corresponding second-layer classifier.

Stage 2: Identify the fingers from the finger labeling results and determine whether the gesture is unique among all gestures. If so, the meaning and picture of the gesture are shown; if not, the gesture is sent to the corresponding third-layer classifier.

Stage 3: Use Eq. 3 to calculate the angles between the recognized fingers and determine whether the gesture is a significant one by comparing the angles with the default values in the gesture set. If it is, its meaning and picture are shown.

θ = arccos( (d1 · d2) / (‖d1‖ ‖d2‖) )      (3)

where d1 and d2 are the direction vectors of two fingers.
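The finger-angle computation and the three-stage lookup can be sketched as below. The gesture-set data layout here is hypothetical, invented only to illustrate the cascade; the paper does not specify how gesture templates are stored.

```python
import math

def finger_angle(d1, d2):
    """Angle in degrees between two finger direction vectors (Eq. 3)."""
    dot = d1[0] * d2[0] + d1[1] * d2[1]
    norm = math.hypot(*d1) * math.hypot(*d2)
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def classify(finger_count, finger_names, angles, gesture_set):
    """Three-stage decision-tree lookup over a gesture set.

    gesture_set maps finger_count -> list of (gesture, required_names,
    (min_angle, max_angle) or None); a None range means the gesture is
    already unique at stage 2. Hypothetical layout for illustration.
    """
    for gesture, names, angle_range in gesture_set.get(finger_count, []):
        if set(names) != set(finger_names):          # stage 2: finger identity
            continue
        if angle_range is None:                      # unique at stage 2
            return gesture
        lo, hi = angle_range                         # stage 3: finger angles
        if all(lo <= a <= hi for a in angles):
            return gesture
    return None
```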

The process of hand gesture recognition is shown in figure 7.

Fig. 7. The process of the decision tree classifier. This figure shows the recognition process of the number 3 in the ASL gesture set.



4. Experimental Results

The experiments were executed in C# on a PC with an Intel Core i5 CPU @ 3.30 GHz and 4 GB of memory. Depth images were captured with a Microsoft Kinect at a rate of 30 frames per second.

In order to test the performance of our system, we conducted a test with six people, four male and two female, one of the males dark-skinned. The participants were asked to make every gesture 90 times: 30 times with the left hand, 30 times with the right hand, and the last 30 times with both hands (an example is shown in figure 8). Although the colors and shapes of the signers' hands differ, this had no significant influence on the recognition rate of our system, which verifies the correctness and generality of the algorithm. Moreover, since our system uses depth information to recognize hands, it also performs well in the dark or under changing background illumination.

Fig. 8. Experiments. (a) recognition by right hand; (b) recognition by left hand; (c) recognition by both hands.

The average accuracy of each gesture was calculated; the results are shown in tables 1 and 2. The average accuracy of the ASL number gesture set was 94.33%, and that of the general gesture set was 95.01%. The highest recognition rate of any single gesture was 100%, and the lowest was 87.47%. When both hands make the same gesture, the recognition rate improves, especially for gestures with a low recognition rate: the average improvement was 1.59% for the ASL number gesture set and 1.65% for the general gesture set. The recognition rate was mainly influenced by the results of Hand Detection and Fingertip Labeling. For example, in the ASL number gesture set, the gestures for the numbers 3, 6, 7, 8, and 9 are made with three fingers, so the finger recognition results were not stable, which made it more difficult for the decision tree classifier to classify them. The system of [10] reports a mean accuracy of 90.2% in real-time video, and the mean accuracy of [12] is 93.9%. Compared with these and other similar systems, our system shows a high recognition rate while running in real time.

Table 1. The average accuracy of the ASL number gesture set

Table 2. The average accuracy of the general gesture set



5. Conclusion

In this paper we proposed a method for hand gesture recognition using Kinect. Our system runs in real time and has a high recognition rate. We used the Kinect as an input device and separated the hands from the background using depth information. To obtain two clusters of hand pixels, we used the k-means algorithm. We then found the convex hull and detected the hand contour with the Graham scan and Moore-Neighbor algorithms, calculated the center of the palm, and detected the fingertips. After labeling each finger, we performed hand gesture recognition with a decision tree classifier designed for this purpose. We used two gesture sets: the ASL number gesture set and the general gesture set. In our experiments, the average accuracy of the ASL number gesture set was 94.33%, and that of the general gesture set was 95.01%.

Our system provides a new approach to hand gesture recognition at the level of individual fingers. It could be used in many kinds of applications, such as sign language recognition, game control, and human-robot interaction. In the future, we will improve the Hand Detection and Fingertip Labeling processing, since they greatly influence the recognition rate, and the gesture recognition algorithm will also be improved. We will expand the gesture sets so that our system can recognize more gestures, and different gestures made by the two hands will also be taken into consideration. In addition, we will attempt to incorporate the Support Vector Machine (SVM) and Hidden Markov Model (HMM) algorithms into our system.


References

  1. T. Starner and A. Pentland, "Real-time American Sign Language Recognition from Video using Hidden Markov Models," IEEE International Symposium on Computer Vision, pp. 265-270, 1995.
  2. F. Ullah, "American Sign Language Recognition System for Hearing Impaired People Using Cartesian Genetic Programming," Proc. the 5th International Conference on Automation, Robotics and Applications, pp. 96-99, 2011.
  3. R.Y. Wang and J. Popović, "Real-Time Hand-Tracking with a Color Glove," ACM Transactions on Graphics, Vol. 28, Issue 3, pp. 1-8, 2009.
  4. L. Brethes, P. Menezes, F. Lerasle, and J. Hayet, "Face Tracking and Hand Gesture Recognition for Human-Robot Interaction," International Conference on Robotics and Automation, Vol. 2, pp. 1901-1906, 2004.
  5. M. Van den Bergh and L. Van Gool, "Combining RGB and ToF Cameras for Real-time 3D Hand Gesture Interaction," Applications of Computer Vision, 2011 IEEE Workshop on, pp. 66-72, 2011.
  6. J.M. Rehg and T. Kanade, "Visual Tracking of High DOF Articulated Structures: An Application to Human Hand Tracking," European Conference on Computer Vision, Vol 801, pp. 35-46, 1994.
  7. J. Park and Y.L. Yoon, "LED-glove based Interactions in Multi-modal Displays for Teleconferencing," International Conference on Artificial Reality and Telexistence Workshops, pp. 395-399, 2006.
  8. X. Liu and K. Fujimura, "Hand Gesture Recognition using Depth Data," Proc. of the Sixth IEEE International Conference on Automatic Face and Gesture Recognition, pp. 529-534, 2004.
  9. H.J. An, J.S. Lee, and D.J. Kim, "Hand Gesture Recognition System using TOF Camera," HCI Korea, pp. 531-534, 2011.
  10. H. Li, L. Yang, X.Y. Wu, S.M. Xu, and Y.W. Wang, "Static Hand Gesture Recognition Based on HOG with Kinect," 4th International Conference on Intelligent Human-Machine Systems and Cybernetics, pp. 271-273, 2012.
  11. J.L. Raheja, A. Chaudhary, and K. Singal, "Tracking of Fingertips and Centers of Palm using KINECT," Third International Conference on Computational Intelligence Modelling Simulation, pp. 248-252, 2011.
  12. Z. Ren, J. Yuan, and Z. Zhang, "Robust Hand Gesture Recognition Based on Finger-Earth Mover's Distance with a Commodity Depth Camera," Proc. the 19th ACM International Conference on Multimedia, pp. 1093-1096, 2011.
  13. H. Zhou and T.S. Huang, "Tracking Articulated Hand Motion with Eigen Dynamics Analysis," Proc. International Conference on Computer Vision, Vol. 2, pp. 1102-1109, 2003.
  14. S.S. Rautaray and A. Agrawal, "Vision based Hand Gesture Recognition for Human Computer Interaction: A Survey," Artificial Intelligence Review, pp.1-54, 2012.
  15. G. Simion, V. Gui, and M. Otesteanu, "Vision Based Hand Gesture Recognition: A Review," International Journal of Circuits Systems and Signal Processing, Vol. 6, Issue 4, pp. 275-282, 2012.
  16. J. Suarez and R.R. Murphy, "Hand Gesture Recognition with Depth Images: A Review," The 21st International Symposium on Robot and Human Interactive Communication, pp. 411-417, 2012.
  17. R.L. Graham, "An Efficient Algorithm for Determining the Convex Hull of a Finite Planar Set," Information Processing Letters, Vol. 1, No. 4, pp. 132-133, 1972.
