1. Introduction
Image-based object tracking technologies have been widely used in application fields such as broadcasting, intelligent surveillance systems, traffic control systems, and military security systems. Among the various approaches, kernel-based mean-shift tracking[1] has attracted much attention as an effective target tracking method, since it requires lower computational cost than other tracking approaches such as particle filtering and optical flow.
In kernel-based mean-shift tracking, the target model is defined in the first frame and the target candidate is estimated in each subsequent frame. The target model is a probability density function (pdf) that characterizes the target object, such as its color pdf. In each subsequent frame, target candidates are represented by the color pdfs of the candidate regions. To keep the computational cost low, discrete densities, i.e., m-bin histograms, are frequently used. A similarity function between the target model and the target candidate is also defined. Local maxima of the similarity function in the image indicate the presence of objects in the subsequent frame that look similar to the target model defined in the first frame. To track the target object, we find the best target candidate in each frame by locating the local maximum of the similarity function.
A large body of research has sought to increase the accuracy and performance of tracking in image sequences of real-life situations. Such images contain various obstacles, such as cluttered backgrounds, illumination changes, and occlusions, that degrade tracking accuracy. To improve tracking accuracy, various approaches to target model construction[2-3] have been proposed that reduce the impact of background clutter by taking the background colors into account. Target model update schemes[4-5] have been proposed to reflect changes in the target object colors caused by illumination changes. In outdoor environments in particular, target model update has become more important because illumination conditions vary widely.
Wu et al.[4] presented a model update method based on probabilistic covariance tracking. Li et al.[5] proposed a model update method for kernel-based mean-shift tracking without a background-weighted histogram. In their approach, the target model is updated in every frame by a weighted average of the target model and the current target candidate. Approaches using a multi-mode kernel, or hybrid approaches combined with particle filtering, have been proposed for robust object tracking even under occlusion[6-7]. Tracking multiple objects[8] has also been studied as a major issue in image-based object tracking over the last decade.
This paper focuses on designing an effective target model update method that can be used in kernel-based mean-shift tracking with background-weighted histograms[2-3] to robustly track a target object in various real-life situations. When background-weighted histograms are applied in constructing the initial target model[1-3], the target model is a color pdf that assigns high probabilities to the unique colors of the target object. The target candidate, on the other hand, is a color pdf that assigns high probabilities not only to the unique colors of the target object but also to the background colors within the kernel. Therefore, the target model update method of Li et al.[5], which has been widely used in kernel-based object tracking, cannot be applied to kernel-based tracking with background-weighted histograms. To use the target candidate for the target model update, the unique colors of the target object must first be extracted from the target candidate in each frame. In this paper, we propose a novel target model update scheme for kernel-based tracking with background-weighted histograms that uses target candidates corrected on the basis of the back-projection weight.
The remainder of the paper is organized as follows. Section 2 introduces mean-shift object tracking with a background-weighted histogram, which has shown robust tracking results. Section 3 presents the proposed model update scheme for kernel-based mean-shift tracking with a background-weighted histogram. Section 4 gives experimental results for the proposed method, and Section 5 concludes the paper.
2. Mean-shift object tracking with background weighted histogram
2.1 Target model construction with background weighted histogram
Kernel-based mean-shift tracking is used to track an ellipsoidal region of a target object based on a kernel-weighted histogram of colors[2]. In the initial frame, a target model q={qu}u=1..m with background weights is defined by the probability of each feature u=1..m, computed as
where {xi}i=1..n are the normalized pixel locations in the region defined as the target model and b(xi) is the index of the histogram bin corresponding to the color of pixel xi. k(|xi|) is a convex and monotonically decreasing kernel profile that assigns smaller weights to locations farther from the center of the target. δ is the Kronecker delta function, and the constant C is expressed as
vu is a background weight for reducing background interference. It is computed from the background histogram and its smallest nonzero entry o∗. The background weights are computed in a region around the target, as shown in Fig. 1. We assume that the background region has three times the area of the target object. The weight vu is defined by
Fig. 1. The background region around the target
In the target model, the probabilities of the major background colors are suppressed while those of the unique colors of the target object are enhanced by the background weight vu. Fig. 2 shows the target models, i.e., the color histograms of the kernel region, before and after applying the background weight vu. It shows that the colors of the target object are emphasized after the background weight is applied.
Fig. 2. Target model before (left) and after (right) applying background weight vu
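The construction described in this subsection can be illustrated with a short Python sketch. It is a minimal sketch under stated assumptions, not the authors' implementation: the background weight is taken in the standard form of the cited works[1-2], vu = min(o∗/ou, 1), and qu is taken as proportional to vu times the kernel-weighted count of pixels in bin u. The Epanechnikov profile is assumed because the text does not name a kernel, and the function names and the (x, y, bin) pixel layout are hypothetical.

```python
def epanechnikov_profile(r2):
    """Assumed kernel profile k(r^2) = max(0, 1 - r^2); the paper names no kernel."""
    return max(0.0, 1.0 - r2)


def background_weights(bg_hist):
    """Assumed form v_u = min(o*/o_u, 1), with o* the smallest nonzero entry."""
    nonzero = [o for o in bg_hist if o > 0]
    if not nonzero:
        return [1.0] * len(bg_hist)
    o_star = min(nonzero)
    # Bins that never occur in the background keep full weight.
    return [min(o_star / o, 1.0) if o > 0 else 1.0 for o in bg_hist]


def target_model(pixels, v, m):
    """pixels: (x, y, bin) tuples, with (x, y) normalized so that the
    target region maps to the unit circle centered on the target."""
    q = [0.0] * m
    for x, y, u in pixels:
        q[u] += v[u] * epanechnikov_profile(x * x + y * y)
    total = sum(q)  # dividing by the total plays the role of the constant C
    return [val / total for val in q] if total > 0 else q


# Toy data: bins 0 and 1 dominate the background, bin 2 is rare in it.
bg_hist = [0.6, 0.3, 0.1, 0.0]
pixels = [(0.0, 0.0, 2), (0.5, 0.0, 2), (0.0, 0.5, 0), (-0.5, -0.5, 1)]
v = background_weights(bg_hist)
q = target_model(pixels, v, m=4)
print(v)
print(q)
```

In this toy example the bins that are frequent in the background receive small weights, so most of the probability mass of q ends up in bin 2, which mirrors the suppression effect described for Fig. 2.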
2.2 Target candidate representation
The target is tracked by comparing the similarity between the target model q and a target candidate p ={pu}u=1..m in a sequence of video frames. Let {xi}i=1..nh be the pixel locations of the target candidate, centered at y in the current frame. Using the same kernel profile k(x), but with bandwidth h, the probability of the feature u=1..m in the target candidate is given by
where
is the normalization constant.
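The target candidate histogram can be sketched in Python as follows. This is an illustrative sketch only: it assumes the standard form of Eq. (4) from [1], in which pu(y) is the normalized sum of k(‖(y − xi)/h‖²) over the pixels falling in bin u, and it again assumes the Epanechnikov profile. The function name and the (x, y, bin) pixel layout are hypothetical.

```python
def candidate_histogram(pixels, center, h, m):
    """Kernel-weighted m-bin histogram of the region centered at `center`.

    pixels: (x, y, bin) tuples in image coordinates.
    h: kernel bandwidth (a single scalar is assumed here).
    """
    cx, cy = center
    p = [0.0] * m
    norm = 0.0
    for x, y, u in pixels:
        r2 = ((x - cx) / h) ** 2 + ((y - cy) / h) ** 2
        k = max(0.0, 1.0 - r2)  # assumed Epanechnikov profile
        p[u] += k
        norm += k
    # 1 / norm plays the role of the normalization constant.
    return [val / norm for val in p] if norm > 0 else p


pixels = [(10.0, 10.0, 0), (12.0, 10.0, 1), (10.0, 13.0, 1), (14.0, 10.0, 2)]
p = candidate_histogram(pixels, center=(10.0, 10.0), h=4.0, m=3)
print(p)
```

The pixel at (14, 10) lies exactly on the kernel boundary and therefore contributes nothing, so bin 2 stays empty while the histogram still sums to one.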
2.3 Mean-shift algorithm
The search for the new target location y1 in the current frame starts at the estimated location y0 of the target in the previous frame, and y1 is computed iteratively by Eq. (6).
where g(x) is the negative derivative of the kernel profile k(x), i.e., g(x)=−k′(x), and wi is the back-projection weight given as
If ║y1−y0║<ε, y1 is taken as the center of the region most similar to the target object in the new frame. Otherwise, y0 is set to y1 and y1 is recomputed using Eq. (6).
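The iteration above can be sketched in Python as follows. The sketch assumes the standard forms from [1]: the back-projection weight of pixel xi is sqrt(qu/pu(y0)) for the bin u of that pixel, and y1 is the average of the pixel locations weighted by wi·g. With the assumed Epanechnikov profile, g is constant inside the kernel, so the update reduces to a weighted centroid. All function names, the pixel layout, and the default values of eps and max_iter are hypothetical.

```python
import math


def kernel_histogram(pixels, center, h, m):
    """Candidate histogram p(y) with an assumed Epanechnikov profile."""
    p, norm = [0.0] * m, 0.0
    for x, y, u in pixels:
        r2 = ((x - center[0]) / h) ** 2 + ((y - center[1]) / h) ** 2
        k = max(0.0, 1.0 - r2)
        p[u] += k
        norm += k
    return [val / norm for val in p] if norm > 0 else p


def mean_shift(pixels, q, start, h, m, eps=0.5, max_iter=20):
    """Iterate the location update until the shift is smaller than eps."""
    y0 = start
    for _ in range(max_iter):
        p = kernel_histogram(pixels, y0, h, m)
        num_x = num_y = den = 0.0
        for x, y, u in pixels:
            r2 = ((x - y0[0]) / h) ** 2 + ((y - y0[1]) / h) ** 2
            if r2 >= 1.0:
                continue  # g = -k' is zero outside the kernel
            # Assumed back-projection weight sqrt(q_u / p_u(y0)).
            w = math.sqrt(q[u] / p[u]) if p[u] > 0 else 0.0
            num_x += x * w  # g is constant inside the kernel and cancels
            num_y += y * w
            den += w
        if den == 0.0:
            return y0  # no pixel supports the model; keep the location
        y1 = (num_x / den, num_y / den)
        if math.hypot(y1[0] - y0[0], y1[1] - y0[1]) < eps:
            return y1
        y0 = y1
    return y0


# Toy frame: a 3x3 blob of bin-1 pixels centered at (12, 10) on a bin-0 background.
frame = [(float(x), float(y), 1 if abs(x - 12) <= 1 and abs(y - 10) <= 1 else 0)
         for x in range(20) for y in range(20)]
q = [0.0, 1.0]  # the model contains only the blob color
y_final = mean_shift(frame, q, start=(10.0, 10.0), h=4.0, m=2)
print(y_final)
```

Starting two pixels away from the blob, the first iteration moves the kernel onto the blob center and the second iteration confirms convergence.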
3. Target model update based on back-projection weights
3.1 Overall tracking procedure
The visual object tracking procedure consists of four processing steps, as shown in Fig. 3. In the initialization step, the target model is constructed as explained in Section 2.1. In the second step, the best target candidate is found in each video frame by repeatedly computing Eq. (4) and Eq. (6). In the third step, the change in size of the target object is estimated, and in the final step the target model is updated based on the best target candidate region determined by steps 2 and 3. For the scale estimation (step 3), we applied the scale estimation method of Jeyakar et al.[3]; this paper focuses on how to robustly update the target model in order to improve tracking correctness. The proposed target model update method is based on correcting the target candidate using the back-projection weight. The details of the algorithm are presented in the next section.
Fig. 3. Overview of the tracking procedure
3.2 Target candidate correction and target model update
The target object is tracked by comparing the similarity between the target model q and a target candidate p in a sequence of video frames. After the final location of the kernel for the target candidate is determined in each frame, the corrected target candidate p* is reconstructed by selecting the kernel pixels whose back-projection weight wi is larger than a threshold ε. The back-projection weight wi for each pixel xi in the kernel is defined by Eq. (7). Fig. 4 shows a sample back-projection image, in which white pixels have large weights wi. As shown in Fig. 4, the pixels with large back-projection weights lie within the target object region, so the weights can be used to separate the target object from the background.
Fig. 4. The kernel (left) and back-projection image (right) for the 298th frame of the Baseball2 test sequence.
The proposed target model update method is based on the fact that the target candidate in each frame assigns high probabilities both to the colors of the target object and to the background colors within the kernel. It is therefore necessary to attenuate the probabilities of the background colors in the target candidate before updating the target model by a weighted average of the target model and the target candidate. In the proposed scheme, we select only the pixels whose back-projection weight wi is larger than a threshold ε in order to exclude the background pixels in the kernel.
Algorithm 1 presents the procedure for correcting the target candidate. Each pixel in the target candidate is checked to determine whether its color corresponds to a major object color in the target model qu and whether its weight wi is greater than the threshold. Only the pixels that satisfy both conditions are used in correcting the target candidate (see line 10). In line 14, the corrected target candidate p∗ is normalized.
Finally, the target model q is updated by a weighted average with the corrected target candidate p*. That is, the updated target model qu* is computed by Eq. (8).
where the weight τ is determined heuristically.
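The correction and update steps described in this subsection can be sketched in Python as follows. This is a hedged illustration rather than a reproduction of Algorithm 1 or Eq. (8), neither of which is shown here. The assumptions are: a color counts as a "major object color" when qu exceeds a hypothetical threshold q_min; selected pixels are accumulated with their kernel weight; the back-projection weight is sqrt(qu/pu); and the update takes the form q* = (1 − τ)·q + τ·p*. All names and numeric values are hypothetical.

```python
import math


def corrected_candidate(pixels, q, p, center, h, eps_w, q_min):
    """Rebuild the candidate from pixels that pass both checks.

    A pixel is kept when its bin is a major model color (q[u] > q_min,
    an assumed criterion) and its back-projection weight exceeds eps_w.
    """
    p_star = [0.0] * len(q)
    for x, y, u in pixels:
        r2 = ((x - center[0]) / h) ** 2 + ((y - center[1]) / h) ** 2
        k = max(0.0, 1.0 - r2)  # assumed Epanechnikov profile
        if k == 0.0:
            continue  # pixel lies outside the kernel
        w = math.sqrt(q[u] / p[u]) if p[u] > 0 else 0.0
        if q[u] > q_min and w > eps_w:
            p_star[u] += k
    total = sum(p_star)
    if total == 0.0:
        return None  # nothing passed; the caller can skip the update
    return [val / total for val in p_star]  # normalization step


def update_model(q, p_star, tau):
    """Assumed update form: q* = (1 - tau) * q + tau * p*."""
    return [(1.0 - tau) * qu + tau * pu for qu, pu in zip(q, p_star)]


# Toy case: bin 1 is the object color, bin 0 is background inside the kernel.
q = [0.1, 0.9]
p = [0.6, 0.4]  # the raw candidate is polluted by background color
pixels = [(10.0, 10.0, 1), (11.0, 10.0, 1), (13.0, 10.0, 0), (10.0, 12.0, 0)]
p_star = corrected_candidate(pixels, q, p, center=(10.0, 10.0), h=4.0,
                             eps_w=1.0, q_min=0.5)
q_new = update_model(q, p_star, tau=0.1)
print(p_star, q_new)
```

In the toy case the background pixels have back-projection weights below the threshold and are dropped, so the corrected candidate contains only the object color and the update does not pull background color into the model. Averaging with the raw candidate p instead would raise the probability of bin 0.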
4. Experimental Results
In the experiments, we compared the tracking errors of our target model update scheme with those of the previous scheme using five test video sequences, shown in Fig. 5. Four video sequences (Baseball1, Baseball2, Cam24, Xeron2) were produced by the authors using a Sony CX560 camcorder and an AXIS PTZ camera. The Corridor test sequence is publicly available from the CAVIAR project site. Each test sequence has different features that could affect tracking performance. In the Baseball1 test sequence, the colors of the target object (a player) and the background are similar. Baseball2 includes a fast-moving target object, and the Cam24 sequence has a complex background. In the Xeron2 and Corridor test sequences, the size of the target object (a walking person) changes considerably.
Fig. 5. Test video sequences. From top to bottom row: Baseball1, Baseball2, Cam24, Corridor and Xeron2 sequences
To measure tracking errors, we manually segmented the target object region and extracted a tight bounding box for the target object in each video frame. Fig. 6 shows two bounding boxes: the red one is the bounding box of the target object extracted from the ground truth data, and the blue one is the tracking result kernel. Let GP1(xg1,yg1) and GP2(xg2,yg2) be the left-top and right-bottom points of the ground-truth bounding box, and TP1(xt1,yt1) and TP2(xt2,yt2) those of the tracking result, as shown in Fig. 6. The tracking error E in an image frame is defined by Eq. (9).
Fig. 6. Bounding box (in red) of a target object extracted from the ground truth data and that (in blue) determined by the tracking result.
Fig. 7 compares the tracking results by showing the tracking kernels and back-projection images obtained without target model update, with our scheme, and with Li’s scheme. In the tracking result kernel image (the leftmost) of each row, the blue, red, and green boxes represent the tracking results without target model update, with our method, and with Li’s method, respectively. As shown in Fig. 7, tracking with our target model update method followed the target more correctly and robustly. In the back-projection images, brighter pixels represent higher back-projection weights. After applying our method, white pixels in the background region were removed more clearly. These changes help our method track the given target object successfully until the end of the test sequence.
Fig. 7. Tracking result kernels and back-projection images after applying tracking methods without target model update, with ours and with Li’s scheme, respectively (from left to right column).
Fig. 8 depicts the tracking errors in all video frames of the five test sequences. The blue, red, and green lines represent the tracking errors of the methods without target model update, with our target model update, and with Li’s, respectively. Fig. 8 shows that our method significantly improves tracking accuracy and robustness for every test sequence.
Fig. 8. Tracking errors without target model update (blue), with Li’s target model update scheme (green), with ours (red).
Table 1 compares the tracking errors of test sequences, where each number represents the tracking error averaged over all the tested frames of each video. In Table 1, the accuracy improvement (AI) ratio was computed as follows:
where Eno and Eours are the average tracking errors before and after applying our model update scheme, respectively. The proposed target model update increased the tracking accuracy by 72.45% on average.
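The ratio can be illustrated with a short Python sketch. It assumes that AI is the relative reduction of the average tracking error, (Eno − Eours)/Eno × 100, which matches the description above but is an assumption about the exact form. The function name is hypothetical, and the input values are made-up illustrations, not results from Table 1.

```python
def accuracy_improvement(e_no, e_ours):
    """Assumed AI ratio in percent: relative reduction of the average error."""
    if e_no <= 0:
        raise ValueError("baseline error must be positive")
    return (e_no - e_ours) / e_no * 100.0


# Hypothetical errors, chosen only to illustrate the arithmetic.
ai = accuracy_improvement(e_no=40.0, e_ours=10.0)
print(ai)  # 75.0
```

Under this assumed definition, a positive ratio means the update reduced the average error, and a negative ratio would mean the update made tracking worse.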
Table 1. Comparison of tracking errors before and after applying the target model update
5. Conclusions
Kernel-based mean-shift tracking with a background-weighted histogram is known to be robust to various tracking obstacles such as background clutter and illumination changes. However, tracking still fails easily under rapid illumination changes and cluttered backgrounds, which occur frequently in outdoor environments. Overcoming these problems requires an accurate and robust target model update scheme. Therefore, in this paper, a novel target model update method was proposed for kernel-based mean-shift tracking with a background-weighted histogram.
Through comparative experiments, we validated that the proposed target model update enhances tracking accuracy and robustness. For the five test sequences, the experiments showed that the proposed method improved accuracy by 72.45% on average.
More accurate tracking also requires efficient and robust scale estimation. As future work, we will therefore analyze previous scale estimation schemes and propose an efficient and robust one.
References
- D. Comaniciu, V. Ramesh, and P. Meer, “Kernel-Based Object Tracking,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 25, no. 5, pp. 563-577, 2003. https://doi.org/10.1109/TPAMI.2003.1195991
- J. Ning, L. Zhang, D. Zhang, and C. Wu, “Robust mean-shift tracking with corrected background-weighted histogram,” IET Comput. Vis., vol. 6, no. 1, pp. 62-69, 2012. https://doi.org/10.1049/iet-cvi.2009.0075
- J. Jeyakar, R. V. Babu, and K. R. Ramakrishnan, “Robust object tracking with background-weighted local kernels,” Computer Vision and Image Understanding, vol. 112, no. 3, pp. 296-309, December 2008. https://doi.org/10.1016/j.cviu.2008.05.005
- Y. Wu, J. Cheng, J. Wang, H. Lu, et al., “Real-time Probabilistic Covariance Tracking with Efficient Model Update,” IEEE Trans. Image Process., vol. 21, no. 5, pp. 2824-2837, 2012. https://doi.org/10.1109/TIP.2011.2182521
- L. Li and Z. Feng, “An efficient object tracking method based on adaptive nonparametric approach,” Opto-electronics Review, vol. 13, no. 4, pp. 325-330, 2005.
- Z. Khan, I. Y.-H. Gu, and A. Backhouse, “Joint particle filters and multi-mode anisotropic mean shift for robust tracking of video objects with partitioned areas,” in Proc. of IEEE Int. Conf. Image Process., pp. 4077-4080, November 2009.
- Z. Khan, I. Y.-H. Gu, and A. G. Backhouse, “Robust Visual Object Tracking Using Multi-Mode Anisotropic Mean Shift and Particle Filters,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 21, no. 1, pp. 74-87, January 2011. https://doi.org/10.1109/TCSVT.2011.2106253
- H. Fang, J. W. Kim, and J. W. Jang, “A Fast Snake Algorithm for Tracking Multiple Objects,” Journal of Information Processing Systems, vol. 7, no. 3, pp. 519-530, September 2011. https://doi.org/10.3745/JIPS.2011.7.3.519