Multi-feature local sparse representation for infrared pedestrian tracking

  • Wang, Xin (College of Computer and Information, Hohai University) ;
  • Xu, Lingling (College of Computer and Information, Hohai University) ;
  • Ning, Chen (School of Physics and Technology, Nanjing Normal University)
  • Received : 2017.12.19
  • Accepted : 2018.10.09
  • Published : 2019.03.31

Abstract

Robust tracking of infrared (IR) pedestrian targets under various challenging conditions, e.g., appearance changes, illumination variations, and background disturbances, is a great challenge in the infrared image processing field. In this paper, we propose a new tracking method for IR pedestrian targets via multi-feature local sparse representation (SR), which consists of three important modules. In the first module, a multi-feature local SR model is constructed. Considering the characteristics of infrared pedestrian targets, gray and edge features are first extracted from all target templates and then fused into the model learning process. In the second module, an effective tracker is built upon the learned model. To improve computational efficiency, a sliding window mechanism with multiple scales is first used to scan the current frame and sample the target candidates. Then, the candidates are recognized via sparse reconstruction residual analysis. In the third module, an adaptive dictionary update approach is designed to further improve the tracking performance. Experimental results demonstrate that our method outperforms several classical methods for infrared pedestrian tracking.

1. Introduction

 Infrared (IR) pedestrian tracking is a vital problem in infrared image analysis, and is important for a great number of practical applications, e.g., human motion analysis, video surveillance and monitoring. However, infrared pedestrian image sequences usually have complex backgrounds, making the tracking task much more difficult [1].

 Decades of study on this issue have generated a series of approaches [2-13]. Among them, the particle filter (PF) has received particular attention for its capability of handling non-linear and non-Gaussian problems [2-4]. The Gaussian mixture model (GMM) has been exploited for extracting foreground candidates from the background [5]. Mean shift-based tracking has been put forward as an expeditious technique [6-10]. In [11], spatial-temporal filters have been designed to track infrared targets. Dense structural learning has been proposed to train a classifier with dense samples through Fourier techniques for infrared object tracking [12]. In [13], both generative and discriminative ideas have been adopted.

 Currently, sparse representation (SR) based tracking methods have gained substantial interest [14-16]. Their main idea is that, for the current frame, object candidates are sparsely represented over a template dictionary, and the candidate with the lowest reconstruction error is taken to be the real target [17-19]. Many works have shown the effectiveness of such methods, but two critical problems remain. (1) SR-based trackers usually treat the target as a holistic entity. Consequently, when faced with appearance changes, illumination variations, etc., they cannot guarantee the tracking performance and tend to fail. (2) At present, most SR-based methods rely only on a single widely used feature, i.e., the gray feature for infrared videos, since gray is considered the most salient cue for infrared targets. Nevertheless, this cue may fail when encountering interferences with similar gray values.

 In this paper, we solve the above challenges by proposing a novel infrared pedestrian tracking method. This method involves three important contributions. (1) Unlike most existing SR approaches, the proposed algorithm utilizes local sparse representation to model the target locally. Compared with a holistic description, local representation is more robust to variations. (2) Instead of using only the gray cue, our method also extracts the edge feature of infrared pedestrians to enhance the robustness of the target model. (3) For robust tracking, researchers have proposed various approaches to target model update, most of which update the model with the current frame's tracking result. However, if the result is contaminated, the updated model will be inaccurate and errors may be introduced into the tracking process. When such errors accumulate to a certain extent, serious drifting may occur. To prevent the drifting problem, an adaptive dictionary update approach is designed, which judges whether the current result is dirtied before the target feature set is updated. The feature set is updated only when the result is not dirtied. This scheme is very helpful for improving tracking robustness.

 The rest of this paper is organized as follows. SR theory is reviewed in Section 2. Our technique is introduced in Section 3. Section 4 gives the experimental results. Section 5 draws the conclusion.

2. Sparse Representation

 The aim of SR is to seek sparse representations for signals [20-24]. Given signals \(Y=\left[y_{1}, y_{2}, \cdots, y_{N}\right] \in R^{n \times N}\), a reconstructive dictionary \(D=\left[d_{1}, d_{2}, \cdots d_{K}\right] \in R^{n \times K}(K>n)\) is learned as follows [25]:

\(\langle D, X\rangle=\arg \min _{D, X}\|Y-D X\|_{2}^{2} \quad \text { s.t. } \quad \forall i,\left\|x_{i}\right\|_{0} \leq T\)       (1)

where \(X=\left[x_{1}, x_{2}, \cdots x_{N}\right] \in R^{K \times N}\) denotes the sparse codes, \(\|Y-D X\|_{2}^{2}\) is the reconstruction error, and T is a sparsity constraint. \(\|\cdot\|_{0}\) denotes the L0 norm [26].

 Supposing D is fixed, the sparse representation \(x_i\) of \(y_i\) can be calculated as [27-29]:

\(x_{i}=x^{*}\left(y_{i}, D\right) \equiv \arg \min _{x}\left\|y_{i}-D x\right\|_{2}^{2} \text { s.t. }\|x\|_{0} \leq T\)       (2)

 Then, X and D are updated iteratively; denoting the estimates at the i th iteration by \(\widetilde{X}_{i}\) and \(\widetilde{D}_{i}\), the reconstruction residual is:

\(E_{i}=Y-\widetilde{D}_{i} \widetilde{X}_{i}\)       (3)
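
 As a concrete illustration of the coding step in Eq. (2), the following is a minimal greedy orthogonal matching pursuit (OMP) sketch in Python/NumPy. It assumes the columns of D are L2-normalized; the function name omp and all variable names are ours, not from the paper.

```python
import numpy as np

def omp(y, D, T):
    """Greedy OMP sketch for Eq. (2): approximate y with at most T atoms of D.
    Assumes the columns (atoms) of D are L2-normalized."""
    residual = y.astype(np.float64).copy()
    support = []
    x = np.zeros(D.shape[1])
    for _ in range(T):
        # Pick the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # Least-squares refit of y on the selected atoms.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        x[:] = 0.0
        x[support] = coef
        residual = y - D @ x
    return x
```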

3. Presented Method

 The overall framework of our technique is shown in Fig. 1. The first step is to develop a multi-feature local SR model for the target to be tracked. Second, a tracker is developed based on this model. Furthermore, an adaptive dictionary update approach is designed to further improve robustness.

 

Fig. 1. Framework of our method.

3.1 Multi-feature Local Sparse Appearance Model

3.1.1 Training Samples Construction

 First, we sample a number of templates of the object to be tracked by using a patch-based scheme.

 As shown in Fig. 2, a sliding window of size m × n is used to sample N target templates \(T=\left[t_{1}, t_{2}, \cdots, t_{N}\right]\) from the first frame I of an IR video.

 

Fig. 2. Training samples construction.

3.1.2 Target Gray Feature Extraction

 The gray feature is the most widely used characteristic for infrared image sequences [6], since infrared targets often possess higher gray values than static background areas.

 For a target template \(t_i\), we utilize the gray histogram to describe its gray characteristics. Suppose the pixel locations in the target area are \(\left\{x_{i}\right\}_{i=1, \cdots, M}\); the gray histogram \(p=\{p(u)\}_{u=1}^{L_{G}}\) of the target can be calculated by:

\(p(u)=\sum_{i=1}^{M} \delta\left(b\left(x_{i}\right)-u\right), \quad u=1, \cdots, L_{G}\)       (4)

where \(b(x_i)\) is the gray mapping function of the pixel \(x_i\), and \(L_G\) is the number of gray mapping levels. Normally, the gray histogram is normalized so that \(\sum_{u=1}^{L_{G}} p(u)=1\).

 By observing Eq. (4), we find that the traditional gray histogram lacks the spatial position information of the pixels. In fact, different pixels in the target area contribute differently to the description of the target gray: pixels closer to the target center contribute more, yet the traditional gray histogram gives them no prominence. Therefore, we use a weighting function [4] to incorporate the spatial distribution of pixels into the histogram. The weighting function is defined as:

\(k(r)=\left\{\begin{array}{cc} 1-r^{2}, & r<1 \\ 0, & r \geq 1 \end{array}\right.\)       (5)

where r denotes the distance between the pixel and the center of target. By using such kernel function, we can obtain the modified gray histogram of the target template:

\(p(u)=C_{1} \sum_{i=1}^{M} k\left(\left\|\frac{x_{0}-x_{i}}{h}\right\|^{2}\right) \delta\left(b\left(x_{i}\right)-u\right), \quad u=1, \cdots, L_{G}\)       (6)

where \(x_i\) denotes the position of a pixel in the target area, \(x_0\) denotes the central location of the target area, h denotes the size of the target area, M represents the total number of pixels of the target, and \(C_1\) is a normalization constant.

 Thus, the probability density at each gray level of a target template can be computed by Eq. (6), and the corresponding gray feature vector of the template obtained. Then, the gray feature vectors of all target templates are quantized respectively, and a gray feature set with spatial location information is formed:

\(P_{G}=\left[p_{1}, p_{2}, \cdots, p_{N}\right]\)       (7)

where \(p_{j} \in R^{L_{G} \times 1} \quad(j=1,2, \cdots, N)\) denotes the gray feature vector of the j th target template.
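
 The following is a minimal sketch of the kernel-weighted gray histogram in Eqs. (5)-(6), written in Python/NumPy. The bin count L_G = 16 and the normalization of pixel distances by the half-size of the patch are our illustrative assumptions.

```python
import numpy as np

def weighted_gray_histogram(patch, L_G=16):
    """Kernel-weighted gray histogram of one m x n template, Eqs. (5)-(6).
    Pixels near the patch center get weight 1 - r^2; pixels with r >= 1 get 0."""
    m, n = patch.shape
    ys, xs = np.mgrid[0:m, 0:n]
    # Squared distance r^2 of every pixel from the patch center, normalized
    # by the half-size of the patch (our stand-in for ||(x0 - xi) / h||^2).
    r2 = (((ys - (m - 1) / 2) / (m / 2)) ** 2
          + ((xs - (n - 1) / 2) / (n / 2)) ** 2)
    k = np.where(r2 < 1.0, 1.0 - r2, 0.0)  # the kernel of Eq. (5)
    # Gray mapping b(x_i): quantize 8-bit gray values into L_G bins.
    bins = (patch.astype(np.float64) / 256.0 * L_G).astype(int).clip(0, L_G - 1)
    p = np.bincount(bins.ravel(), weights=k.ravel(), minlength=L_G)
    return p / p.sum()  # normalization constant C1
```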

3.1.3 Target Edge Feature Extraction

 Although the gray feature, which is insensitive to pedestrian translation, postural changes and partial occlusion, is effective for infrared target modeling, it depends strongly on illumination and is easily affected by background disturbances with similar gray values, which may bring about unsatisfactory results. Hence, the edge feature is also utilized here to model the object structure.

 For a template \(t_i\), we design the edge direction histogram to describe its edge characteristics. Suppose the gray value of the target is I(x), and let G(x) and α(x) denote the edge strength and direction, respectively. \(\alpha(x) \in\left[0,360^{\circ}\right]\) is used to define the edge direction histogram \(q=\{q(v)\}_{v=1}^{L_{E}}\) of the target template:

\(q(v)=\sum_{i=1}^{M} \delta\left(b^{*}\left(x_{i}\right)-v\right), v=1,2, \cdots, L_{E}\)       (8)

where \(b^{*}\left(x_{i}\right)\) denotes the edge direction mapping function, and \(L_E\) denotes the number of edge direction mapping levels [30].

 Similar to the modified gray histogram, the edge direction histogram is improved by using the kernel function in Eq. (5), so that its robustness to noise is improved. The modified edge direction histogram is described by:

\(q(v)=C_{2} \sum_{i=1}^{M} k\left(\left\|\frac{x_{0}-x_{i}}{h}\right\|^{2}\right) \delta\left(b^{*}\left(x_{i}\right)-v\right), v=1, \cdots, L_{E}\)       (9)

where C2 is a normalization constant.

 Consequently, we can extract the edge feature vector of each target template. Then, the edge feature vectors of all target templates are quantized respectively, and an edge feature set with spatial location information is formed:

\(Q_{E}=\left[q_{1}, q_{2}, \cdots, q_{N}\right]\)       (10)

where \(q_{j} \in R^{L_{E} \times 1} \quad(j=1,2, \cdots, N)\) denotes the edge feature vector of the j th target template.
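
 A companion sketch for the kernel-weighted edge direction histogram of Eqs. (8)-(9), under the same assumptions as the gray histogram sketch above. The gradient direction α(x) comes from finite differences; following Eq. (9), only the direction (not the strength G(x)) enters the histogram, and L_E = 8 is our illustrative bin count.

```python
import numpy as np

def weighted_edge_histogram(patch, L_E=8):
    """Kernel-weighted edge direction histogram of one template, Eqs. (8)-(9)."""
    gy, gx = np.gradient(patch.astype(np.float64))
    alpha = (np.degrees(np.arctan2(gy, gx)) + 360.0) % 360.0  # direction in [0, 360)
    m, n = patch.shape
    ys, xs = np.mgrid[0:m, 0:n]
    r2 = (((ys - (m - 1) / 2) / (m / 2)) ** 2
          + ((xs - (n - 1) / 2) / (n / 2)) ** 2)
    k = np.where(r2 < 1.0, 1.0 - r2, 0.0)  # same kernel as Eq. (5)
    # Edge direction mapping b*(x_i): quantize directions into L_E bins.
    bins = (alpha / 360.0 * L_E).astype(int).clip(0, L_E - 1)
    q = np.bincount(bins.ravel(), weights=k.ravel(), minlength=L_E)
    return q / q.sum()  # normalization constant C2
```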

3.1.4 Target Combined Feature Generation

 From Eq. (7) and Eq. (10), we can see that the j th column vectors of these two matrices represent the gray feature and edge feature of the same target template, respectively. The gray and edge feature vectors are therefore vertically concatenated, so that both features of the same target template are represented in one column vector. Ultimately, a combined feature set is formed:

\(\text {featset}=\left[\text {feature}_{1}, \cdots, \text {feature}_{N}\right]=\left[\begin{array}{l} P_{G} \\ Q_{E} \end{array}\right]=\left[\begin{array}{l} p_{1}, p_{2}, \cdots, p_{N} \\ q_{1}, q_{2}, \cdots, q_{N} \end{array}\right]\)       (11)

where \(\text { featset } \in R^{L \times N}, L=L_{G}+L_{E}\) denotes the combined feature set of all target templates, \(\text { feature }_{j} \in R^{L \times 1}\) denotes the combined gray and edge feature vector of the j th target template. It is worth pointing out that the combination scheme is simple but effective. In the one aspect, the computational load of simple arraying in the form of vertical rows is very light. In the other aspect, since after obtaining the fusion results, the following step is to use these results to learn a reconstructive dictionary. Ultimately, the learned dictionary can well represent the infrared pedestrian objects.

3.1.5 Target Dictionary Learning

 In this paper, we utilize the simple and efficient K-singular value decomposition (K-SVD) approach [26] to learn the target dictionary.

 The corresponding objective function is shown in Eq. (1), which is minimized by alternately updating the sparse codes and the dictionary. When the dictionary D is fixed, the OMP algorithm is used to calculate the sparse coefficients X of the feature set featset under the dictionary. When the sparse coefficients are fixed, the SVD is used to update the dictionary D column by column. The process is iterated until the number of iterations reaches a preset value. Finally, the learned reconstructive dictionary \(D \in R^{L \times S}\) is obtained.
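
 A minimal K-SVD sketch of this alternation, reusing the omp() function from Section 2. The dictionary size S, the sparsity T, the iteration count, and the initialization from random training columns (which assumes S ≤ N) are all our illustrative choices.

```python
import numpy as np

def ksvd(Y, S, T, n_iter=20):
    """Minimal K-SVD sketch for Eq. (1): learn D (L x S) from featset Y (L x N)."""
    rng = np.random.default_rng(0)
    # Initialize atoms from random training columns (assumes S <= N).
    D = Y[:, rng.choice(Y.shape[1], S, replace=False)].astype(np.float64)
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    for _ in range(n_iter):
        # Sparse coding stage: OMP column by column, Eq. (2).
        X = np.stack([omp(Y[:, i], D, T) for i in range(Y.shape[1])], axis=1)
        # Dictionary update stage: rank-1 SVD refit of each atom in turn.
        for k in range(S):
            users = np.nonzero(X[k, :])[0]  # signals that use atom k
            if users.size == 0:
                continue
            Xk = X[:, users].copy()
            Xk[k, :] = 0.0
            E = Y[:, users] - D @ Xk  # residual without atom k, cf. Eq. (3)
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, k] = U[:, 0]
            X[k, users] = s[0] * Vt[0, :]
    return D
```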

 Since the dictionary encodes both the gray and edge features of the target templates, it can effectively cope with difficulties such as posture changes, background noise, illumination variations and partial occlusion when modeling the tracked target.

3.2 Infrared Pedestrian Tracking

 Based on the multi-feature local sparse appearance model learned above, we subsequently propose an infrared pedestrian tracker. Robust tracking amounts to searching each frame for the image region with the highest similarity to the target; sparse reconstruction residual analysis is applied to measure this similarity.

 (1) Sample a series of candidates for the current frame.

 • First, suppose that Rfr denotes the target region located at position O* in the previous frame, and Rse is a region around that location. As shown in Fig. 3, the red point denotes O*, the red box indicates Rfr, and the yellow box indicates Rse. Note that for the first frame, the target to be tracked is labeled manually and its location is recorded as O*.

 • Second, sample a number of candidates from the search neighborhood Rse . To handle the target scale variation problem, a multi-scale window scheme is applied in this process. The scales are set as β ∈[0.8,1.2], in steps of 0.1, of the previous target size.

 • Finally, collect the h candidate targets obtained by the multi-scale window scheme into F:

\(F=\left[f_{1}, f_{2}, \cdots, f_{h}\right]\)       (12)

 where \(f_{g}(1 \leq g \leq h)\) denotes the g th candidate target.

Fig. 3. Candidates search region.

 (2) Extract gray and edge features of each candidate target.

 • First, normalize each candidate target to the same size µ × µ, so as to obtain a unified feature dimension.

 • Second, extract the gray feature vector and edge feature vector of each candidate target in the set F. The two feature vectors are quantized and vertically concatenated to form \(f e a_{g}=\left[\begin{array}{l} p_{g} \\ q_{g} \end{array}\right]\), where 1 ≤ g ≤ h, pg and qg are the gray and edge cues of the g th candidate, and feag is its combined feature vector.

 (3) Recognize the candidates using sparse reconstruction residual analysis.

 • First, calculate the sparse coding coefficients xg of feag under the dictionary D.

 • Second, calculate the reconstruction error of each candidate target by:

\(\varepsilon_{g}=\left\|f e a_{g}-D x_{g}\right\|_{2}^{2}\)       (13)

 where εg denotes the reconstruction residual of the g th candidate target.

 • Finally, compare the h reconstruction errors to screen out the minimum reconstruction error εm :

\(\varepsilon_{m}=\min \left[\varepsilon_{1}, \varepsilon_{2}, \cdots, \varepsilon_{h}\right]\)       (14)

 The candidate target corresponding to the minimum reconstruction error εm is then identified as the tracked target in the current frame.
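
 Putting steps (1)-(3) together, the following sketch performs one tracking step. The scale grid β ∈ [0.8, 1.2] in steps of 0.1 follows the text above; the search offsets are our illustrative choices, and the µ × µ resize is skipped because the histogram features above already have a fixed length.

```python
import numpy as np

def track_frame(frame, O_star, target_size, D, T):
    """One tracking step of Section 3.2: sample multi-scale candidates around the
    previous location O_star = (row, col), score each candidate by its sparse
    reconstruction residual under D (Eq. (13)), and keep the best one."""
    cy, cx = O_star
    best_eps, best_box = np.inf, None
    for beta in np.arange(0.8, 1.21, 0.1):  # scales of the previous target size
        h = int(round(target_size[0] * beta))
        w = int(round(target_size[1] * beta))
        for dy in range(-8, 9, 2):  # illustrative search grid inside R_se
            for dx in range(-8, 9, 2):
                y0, x0 = cy + dy - h // 2, cx + dx - w // 2
                if y0 < 0 or x0 < 0 or y0 + h > frame.shape[0] or x0 + w > frame.shape[1]:
                    continue
                cand = frame[y0:y0 + h, x0:x0 + w]
                # Combined gray/edge feature of the candidate (step 2).
                fea = np.concatenate([weighted_gray_histogram(cand),
                                      weighted_edge_histogram(cand)])
                x = omp(fea, D, T)
                eps = np.linalg.norm(fea - D @ x) ** 2  # residual, Eq. (13)
                if eps < best_eps:
                    best_eps, best_box = eps, (cy + dy, cx + dx, h, w)
    return best_box  # (row, col, height, width) of the tracked target
```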

3.3 Adaptive Dictionary Update

 In most tracking situations, the target to be tracked may not remain the same: it may undergo illumination or appearance changes during the tracking process. Therefore, it is essential to update the dictionary while tracking, which helps the tracker work steadily. In fact, researchers have proposed various approaches to target model update, most of which update the model by using the current frame's tracking result [31]. However, if the result is contaminated, the updated model will be inaccurate and errors may be introduced into the tracking process. When the errors accumulate to a certain extent, serious drifting may occur.

 To prevent the drifting problem, an adaptive dictionary update approach is designed. If the current result is not dirtied, the target feature set is updated with the tracking result of the current frame; otherwise, it is left unchanged.

 (1) Calculate the gray and edge features of the tracked target in the current frame.

 • First, extract the tracked target from the current frame.

 • Second, calculate the gray feature vector and edge feature vector of the tracked target, denoted by \(p_{c u r}=\left[p_{c u r}(u)\right]_{u=1}^{L_{G}}\) and \(q_{c u r}=\left[q_{c u r}(v)\right]_{v=1}^{L_{E}}\), respectively.

 (2) Judge whether the current target is dirtied.

 • First, in Sections 3.1.2 and 3.1.3, the gray feature set \(P_{G}=\left[p_{1}, p_{2}, \cdots, p_{N}\right] \in R^{L_{G} \times N}\) and the edge feature set \(Q_{E}=\left[q_{1}, q_{2}, \cdots, q_{N}\right] \in R^{L_{E} \times N}\) of the N target templates have been obtained, where \(p_{j}=\left[p_{j}(u)\right]_{u=1}^{L_{G}}(j=1,2, \cdots, N)\) denotes the gray feature vector of the j th target template, and \(q_{j}=\left[q_{j}(v)\right]_{v=1}^{L_{E}}(j=1,2, \cdots, N)\) denotes its edge feature vector.

 • Second, compute the Bhattacharyya coefficient [6] ρgray,j between pcur and pj, and the Bhattacharyya coefficient ρedge,j between qcur and qj:

\(\rho_{g r a y, j}=\rho_{g r a y, j}\left[p_{c u r}, p_{j}\right]=\sum_{u=1}^{L_{G}} \sqrt{p_{c u r}(u) p_{j}(u)}\)       (15)

\(\rho_{e d g e, j}=\rho_{e d g e, j}\left[q_{c u r}, q_{j}\right]=\sum_{v=1}^{L_{E}} \sqrt{q_{c u r}(v) q_{j}(v)}\)       (16)

where \(j=1,2, \cdots, N\). Note that the Bhattacharyya coefficient ρgray,j relates to the gray feature, while ρedge,j relates to the edge feature.

 • Third, in Eq. (15) and Eq. (16), the larger ρgray,j or ρedge,j is, the more likely the current target is to be the j th target template. However, in different scenes, the discrimination abilities of the gray and edge features may differ, so we combine them to determine the overall similarity:

\(\rho_{sum, j}=w_{gray, j} \rho_{gray, j}+w_{edge, j} \rho_{edge, j}\)       (17)

 where ρsum,j is the fused Bhattacharyya coefficient. wgray,j and wedge,j are the weights of ρgray,j and ρedge,j, respectively:

\(w_{g r a y, j}=\frac{\rho_{g r a y, j}}{\rho_{g r a y, j}+\rho_{e d g e, j}}\)       (18)

\(w_{e d g e, j}=\frac{\rho_{e d g e, j}}{\rho_{g r a y, j}+\rho_{e d g e, j}}\)       (19)

 After such processing, we can draw a more reliable conclusion: the higher ρsum,j is, the more likely the current target is to be the j th target template.

 • Fourth, according to the above steps, we can obtain N Bhattacharyya coefficients:

\({Sim}=\left[\rho_{{sum}, 1}, \cdots, \rho_{{sum}, N}\right]\)       (20)

 • Finally, seek the maximum value \(\rho_{s u m, m a}=\max \left(\rho_{s u m, 1}, \cdots, \rho_{s u m, N}\right)\) from Sim, and compare ρsum,ma with a preset threshold th ∈ [0,1]. If ρsum,ma < th, the current target is not similar to any template; in this case, the current result is considered dirtied and is not used for the update.

 (3) If the current result is not dirtied, the dictionary is updated.

 • First, seek the minimum value \(\rho_{s u m, m i}=\min \left(\rho_{s u m, 1}, \cdots, \rho_{s u m, N}\right)\) from Sim .

 • Second, replace the gray feature vector pmi and edge feature vector qmi of the mi th target template by pcur and qcur to get the updated feature set.

 • Finally, update the dictionary every γ frames with the updated feature set.
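
 The whole decision of steps (1)-(3) fits in a few lines. In this sketch, the threshold value th = 0.8 is illustrative, not the paper's tuned setting; P_G and Q_E are the feature sets from Section 3.1, and p_cur, q_cur are the current result's histograms.

```python
import numpy as np

def adaptive_update(P_G, Q_E, p_cur, q_cur, th=0.8):
    """Adaptive update check of Section 3.3. Returns True when the feature
    set was updated, False when the current result is deemed dirtied."""
    rho_gray = np.sqrt(p_cur[:, None] * P_G).sum(axis=0)  # Eq. (15), all j at once
    rho_edge = np.sqrt(q_cur[:, None] * Q_E).sum(axis=0)  # Eq. (16)
    w_gray = rho_gray / (rho_gray + rho_edge)             # Eq. (18)
    w_edge = rho_edge / (rho_gray + rho_edge)             # Eq. (19)
    rho_sum = w_gray * rho_gray + w_edge * rho_edge       # Eq. (17)
    if rho_sum.max() < th:
        return False                        # dirtied: keep the old feature set
    mi = int(np.argmin(rho_sum))            # least similar template
    P_G[:, mi], Q_E[:, mi] = p_cur, q_cur   # replace it with the current result
    return True
```

 The dictionary itself is then re-learned from the updated feature set every γ frames, as stated above.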

4. Experimental Results

4.1 Experimental Setup

 Experiments are conducted in MATLAB R2013b on an Intel dual-core 2.3 GHz laptop with 4 GB RAM. The proposed multi-feature local sparse representation algorithm is tested and compared with several classical tracking methods. The test infrared pedestrian sequences are obtained from the public OTCBVS database [32]. The size of each image is 120×160. This paper illustrates the experimental results on four representative infrared pedestrian video sequences that contain various challenging factors in video tracking, including illumination change, occlusion, background disturbance and posture change. The specific information of the four image sequences is shown in Table 1.

Table 1. Infrared sequences information.

 

 In our experiments, besides the qualitative evaluation, we also make a quantitative evaluation by using two criteria: the tracking success rate and the center location error [33, 34].

 First, the center location error for a frame i measures the distance between the centers of the ground truth and the tracking result (i.e., OG and O):

\(C L E_{i}=d_{i}\left(O, O_{G}\right)\)       (21)

where \(d_{i}\left(O, O_{G}\right)\) denotes the Euclidean distance between O and OG. For a whole image sequence, the average center location error is calculated by:

\(C L E=\frac{1}{U} \sum_{i=1}^{U} C L E_{i}\)       (22)

where U is the total number of frames in a video. From Eq. (22), we can see that a lower CLE means a higher tracking accuracy.

 Second, the tracking success rate, which describes the percentage of precisely tracked frames in a sequence, is defined as:

\(T S R=\frac{U_{s u}}{U}\)       (23)

where Usu denotes the number of frames processed successfully. A larger TSR means better performance. The following measure is utilized to judge whether tracking succeeds in a frame:

\(\frac{\Omega_{T} \cap \Omega_{G}}{\Omega_{T} \cup \Omega_{G}} \geq \eta\)       (24)

where ΩT is the tracked box and ΩG is the ground truth box. If Eq. (24) is met, the object is considered successfully tracked; η is a threshold controlling how strictly success is judged.
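
 Both criteria are straightforward to compute; the sketch below assumes boxes are given as (x, y, w, h) tuples and uses η = 0.5, a common but here assumed choice.

```python
import numpy as np

def cle(centers, gt_centers):
    """Average center location error over a sequence, Eqs. (21)-(22)."""
    d = np.linalg.norm(np.asarray(centers, float) - np.asarray(gt_centers, float), axis=1)
    return d.mean()

def tsr(boxes, gt_boxes, eta=0.5):
    """Tracking success rate, Eqs. (23)-(24): a frame succeeds when the
    overlap ratio of tracked and ground-truth boxes reaches eta."""
    ok = 0
    for (x1, y1, w1, h1), (x2, y2, w2, h2) in zip(boxes, gt_boxes):
        iw = max(0, min(x1 + w1, x2 + w2) - max(x1, x2))
        ih = max(0, min(y1 + h1, y2 + h2) - max(y1, y2))
        inter = iw * ih
        union = w1 * h1 + w2 * h2 - inter
        if union > 0 and inter / union >= eta:
            ok += 1
    return ok / len(boxes)
```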

4.2 Evaluation of Target Combined Feature Generation

 The effectiveness of the feature generation module is validated first. The experimental sequence is Sequence Q3. Fig. 4 gives the tracking results of frames 11, 48, 61, 87 and 167 by the proposed method with the gray, edge, and combined features, respectively. As can be seen, the combined feature enhances robustness compared to the gray or edge feature alone.

 

Fig. 4. Evaluation of combined feature generation.

4.3 Evaluation of Adaptive Dictionary Update

 The effectiveness of the adaptive dictionary update module is evaluated subsequently. Results with and without the update step are compared, as shown in Fig. 5. As can be seen, our technique produces better tracking results, while drifting occurs when the dictionary update step is removed.
 

Fig. 5. Evaluation of the adaptive dictionary update scheme.

4.4 Qualitative Evaluation

 Our approach is compared with two classical tracking algorithms in this section. The first is the sparse representation based method [15], referred to as SRT. The second is the classical particle filter based tracking algorithm [2], referred to as PFT. Both comparison methods rely on the gray feature for infrared target tracking.

 Fig. 6 shows some tracking results on Q1, in which a pedestrian target is walking outdoors on campus. As the target moves, it undergoes illumination variation. From Fig. 6 (c), we find that PFT drifts from the 27th frame and never recovers in the following tracking process. SRT tracks the target a little better, as shown in Fig. 6 (b), but drifting starts from the 70th frame, and the tracking then fails as the errors increase. In contrast, the proposed algorithm overcomes the influence of the illumination variation and presents satisfactory tracking performance for the infrared pedestrian target (Fig. 6 (a)).

 

Fig. 6. Comparison results of Q1 by different algorithms.

 Q2's results are given in Fig. 7. In this sequence, an infrared pedestrian target is walking against a forest background. During tracking, it is occluded by a tree. From Fig. 7 (a), it can be seen that our algorithm recovers from the occlusion, whereas SRT and PFT fail in the tracking process (Fig. 7 (b) and Fig. 7 (c)).

 

Fig. 7. Comparison results of Q2 by different algorithms.

 Fig. 8 gives the results of Q3, where the pedestrian is moving from the middle to the left. The target is first disturbed by another pedestrian with similar gray values and then occluded by a tree. Under such circumstances, both SRT and PFT can hardly handle the problems and lose the target (Fig. 8 (b) and Fig. 8 (c)), while our method tracks the target successfully (Fig. 8 (a)), since it fuses gray and edge features into the local sparse representation framework.

 

Fig. 8. Comparison results of Q3 by different algorithms.

 In sequence Q4, the target is moving from right to left against a dense forest background. As shown in Fig. 9, the target of interest meets a person coming from the left side, and then they separate, which adds difficulty to the tracking. Moreover, the target also undergoes size and posture variations during tracking. Our technique tracks the object accurately throughout the video (Fig. 9 (a)), while the traditional SRT and PFT methods lose the target completely (Fig. 9 (b) and Fig. 9 (c)).

 

Fig. 9. Comparison results of Q4 by different algorithms.

4.5 Quantitative Evaluation

 Quantitative results are also compared for the above four sequences in Table 2 and Table 3.

 First, from Table 2, we can see that the CLE values of our algorithm are lower than those of the other algorithms, which indicates that our method has higher tracking accuracy.

 Second, from Table 3, it can be seen that the TSR values of our algorithm are much higher than those of the other algorithms, which further reveals that our algorithm has a better tracking performance.

Table 2. Center location errors (pixels) of three tracking methods for four different image sequences.

 

Table 3. Tracking Success Rates (%) of three tracking methods for four different image sequences.

 

5. Conclusion

 A multi-feature local sparse representation scheme is proposed for infrared pedestrian tracking problems. First, we extract the gray and edge features of the tracked infrared pedestrian target and fuse them together to learn an effective multi-feature local sparse appearance model, which describes the characteristics of the tracked target well. Then, based on the learned model, a robust tracker with an adaptive dictionary update technique is presented to track the object over time. The results show that our algorithm works well for infrared pedestrian target tracking problems. Future research includes investigating alternative features and tracking multiple infrared targets.

References

  1. Masahiro Yasuno, Noboru Yasuda and Masayoshi Aoki, "Pedestrian Detection and Tracking in Far Infrared Images," in Proc. of IEEE Conference on Computer Vision and Pattern Recognition Workshop, Washington, pp. 125, June 27-31, 2004.
  2. Xin Wang and Zhenmin Tang, "Modified particle filter-based infrared pedestrian tracking", Infrared Physics & Technology, vol. 53, no. 4, pp. 280-287, July, 2010. https://doi.org/10.1016/j.infrared.2010.04.002
  3. M. Sanjeev Arulampalam, Simon Maskell, Neil Gordon and Tim Clapp, "A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking," IEEE Transactions on Signal Processing, vol. 50, no. 2, pp. 174-188, February, 2002. https://doi.org/10.1109/78.978374
  4. Katja Nummiaro, Esther Koller-Meier and Luc Van Gool, "An adaptive color-based particle filter," Image and Vision Computing, vol. 21, no. 1, pp. 99-110, January, 2003. https://doi.org/10.1016/S0262-8856(02)00129-4
  5. Jiangtao Wang, Debao Chen, Haiyan Chen and Jingyu Yang, "On pedestrian detection and tracking in infrared videos", Pattern Recognition Letters, vol. 33, no. 6, pp. 775-785, April, 2012. https://doi.org/10.1016/j.patrec.2011.12.011
  6. Xin Wang, Lei Liu and Zhenmin Tang, "Infrared human tracking with improved Mean Shift algorithm based on multi-cue fusion", Applied Optics, vol. 48, no. 21, pp. 4201-4212, July, 2009. https://doi.org/10.1364/AO.48.004201
  7. Dorin Comaniciu, Visvanathan Ramesh and Peter Meer, "Real-time tracking of non-rigid objects using mean shift," in Proc. of IEEE Conference on Computer Vision and Pattern Recognition, pp. 142-149, June 15, 2000.
  8. Changjiang Yang, R. Duraiswami and L. Davis, "Efficient mean-shift tracking via a new similarity measure," in Proc. of IEEE Conference on Computer Vision and Pattern Recognition, pp. 176-183, June 20-25, 2005.
  9. Fabrizio Lamberti, Andrea Sanna and Gianluca Paravati, "Improving robustness of infrared target tracking algorithms based on template matching", IEEE Transactions on Aerospace and Electronic Systems, vol. 47, no. 2, pp. 1467-1480, April, 2011. https://doi.org/10.1109/TAES.2011.5751271
  10. Suk Jin Lee, Gaurav Shah, Arka Aloke Bhattacharya and Yuichi Motai, "Human tracking with an infrared camera using a curve matching framework", EURASIP Journal on Advances in Signal Processing, vol. 2012, pp. 99, May, 2012. https://doi.org/10.1186/1687-6180-2012-99
  11. Xin Wang, Chen Ning and Lizhong Xu, "Spatiotemporal Difference-of-Gaussians filters for robust infrared small target tracking in various complex scenes," Applied Optics, vol. 54, no. 7, pp. 1573-1586, July, 2015. https://doi.org/10.1364/AO.54.001573
  12. Xianguo Yu, Qifeng Yu, Yang Shang and Hongliang Zhang, "Dense structural learning for infrared object tracking at 200+ Frames per Second," Pattern Recognition Letters, vol. 100, pp. 152-159, December, 2017. https://doi.org/10.1016/j.patrec.2017.10.026
  13. C. S. Asha and A. V. Narasimhadhan, "Robust infrared target tracking using discriminative and generative approaches," Infrared Physics & Technology, vol. 85, pp. 114-127, June, 2017. https://doi.org/10.1016/j.infrared.2017.05.022
  14. Xue Mei and Haibin Ling, "Robust visual tracking using l1 minimization," in Proc. of IEEE International Conference on Computer Vision, pp. 1436-1443, 2009.
  15. Xu Jia, Huchuan Lu and MingHsuan Yang, "Visual tracking via adaptive structural local sparse appearance model," in Proc. of IEEE Conference on Computer Vision and Pattern Recognition, pp. 1822-1829, June 16-21, 2012.
  16. John Wright, Yi Ma, Julien Mairal, Guillermo Sapiro, Thomas S. Huang and Shuicheng Yan, "Sparse Representation for Computer Vision and Pattern Recognition," Proceedings of the IEEE, vol. 98, no. 6, pp. 1031-1044, June, 2010. https://doi.org/10.1109/JPROC.2010.2044470
  17. Guang Han, Xingyue Wang, Jixin Liu, Ning Sun and Cailing Wang, "Robust object tracking based on local region sparse appearance model," Neurocomputing, vol. 184, pp. 145-167, April, 2016. https://doi.org/10.1016/j.neucom.2015.07.122
  18. Bohan Zhuang, Huchuan Lu, Ziyang Xiao and Dong Wang, "Visual tracking via discriminative sparse similarity map," IEEE Transactions on Image Processing, vol. 23, no. 4, pp. 1872-1881, April, 2014. https://doi.org/10.1109/TIP.2014.2308414
  19. Xue Mei and Haibin Ling, "Robust visual tracking and vehicle classification via sparse representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no.11, pp. 2259-2272, April, 2011. https://doi.org/10.1109/TPAMI.2011.66
  20. Michael Elad and Michal Aharon, "Image denoising via sparse and redundant representations over learned dictionaries," IEEE Transactions on Image Processing, vol. 15, no. 12, pp. 3736-3745, November, 2006. https://doi.org/10.1109/TIP.2006.881969
  21. Xin Wang, Siqiu Shen, Chen Ning, Mengxi Xu and Xijun Yan, "A sparse representation-based method for infrared dim target detection under sea-sky background," Infrared Physics & Technology, vol. 71, pp. 347-355, July, 2015. https://doi.org/10.1016/j.infrared.2015.05.014
  22. Tanaya Guha and Rabab K Ward, "Learning sparse representations for human action recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 8, pp. 1576-1588, August, 2012. https://doi.org/10.1109/TPAMI.2011.253
  23. Xin Wang, Siqiu Shen, Chen Ning, Fengchen Huang and Hongmin Gao, "Multi-class remote sensing object recognition based on discriminative sparse representation," Applied Optics, vol. 55, no. 6, pp. 1381-1394, 2016. https://doi.org/10.1364/AO.55.001381
  24. Jian Zhang, Debin Zhao and Wen Gao, "Group-based sparse representation for image restoration," IEEE Transactions on Image Processing, vol. 23, no. 8, pp. 3336-3351, May, 2014. https://doi.org/10.1109/TIP.2014.2323127
  25. Julien Mairal, Francis Bach and Jean Ponce, "Task-driven dictionary learning," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 4, pp. 791-804, 2012. https://doi.org/10.1109/TPAMI.2011.156
  26. Michal Aharon, Michael Elad and Alfred Bruckstein, "K-svd: An algorithm for designing overcomplete dictionaries for sparse representation," IEEE Transactions on Signal Processing, vol. 54, no. 11, pp. 4311-4322, November, 2006. https://doi.org/10.1109/TSP.2006.881199
  27. Stéphane G. Mallat and Zhifeng Zhang, "Matching pursuits with time-frequency dictionaries," IEEE Transactions on Signal Processing, vol. 41, no. 12, pp. 3397-3415, 1993. https://doi.org/10.1109/78.258082
  28. Joel A. Tropp and Anna C. Gilbert, "Signal recovery from random measurements via orthogonal matching pursuit," IEEE Transactions on Information Theory, vol. 53, no. 12, pp. 4655-4666, December, 2007. https://doi.org/10.1109/TIT.2007.909108
  29. Baiyang Liu, Junzhou Huang, Casimir Kulikowski and Lin Yang, "Robust visual tracking using local sparse appearance model and K-selection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 12, pp. 2968-2981, December, 2013. https://doi.org/10.1109/TPAMI.2012.215
  30. Paul Brasnett, Lyudmila Mihaylova, David Bull and Nishan Canagarajah, "Sequential Monte Carlo tracking by fusing multiple cues in video sequences," Image and Vision Computing, vol. 25, no. 8, pp. 1217-1227, August, 2007. https://doi.org/10.1016/j.imavis.2006.07.017
  31. David A. Ross, Jongwoo Lim, RueiSung Lin and MingHsuan Yang, "Incremental learning for robust visual tracking," International Journal of Computer Vision, vol. 77, no. 1-3, pp. 125-141, May, 2008. https://doi.org/10.1007/s11263-007-0075-7
  32. J. Davis and M. Keck, "A two-stage approach to person detection in thermal imagery," in Proc. of Workshop on Applications of Computer Vision, pp. 364-369, January, 2005.
  33. Dilip K. Prasad and Michael S. Brown, "Online tracking of deformable objects under occlusion using dominant points," Journal of the Optical Society of America A, vol. 30, no. 8, pp. 1484-1491, 2013. https://doi.org/10.1364/JOSAA.30.001484
  34. Xin Wang, Siqiu Shen, Chen Ning, Yuzhen Zhang and Guofang Lv, "Robust object tracking via local discriminative sparse representation," Journal of the Optical Society of America A, vol. 34, no. 4, pp. 533-544, 2017. https://doi.org/10.1364/josaa.34.000533