# 1. Introduction

Gait recognition is a technique for identifying people by the way they walk. The science of identifying people by their physical characteristics is called biometrics. Biometrics is a substantial field of science and technology for many reasons, not least the heightened need for security in ubiquitous applications. In the recent past, biometrics has been used in explosive device forensics and for identifying suspected insurgents and terrorists. One of the most important forms of biometrics is the recognition of people's gait and footprints. Biometric systems are designed in one of two key modes, unimodal or multimodal. The design of a biometric system must address issues such as accuracy and speed, the willingness of people to use the system, and the strength of the system in preventing fraudulent approaches. Moreover, the system must offer a reasonable degree of universality and novelty. It is now apparent that a single biometric is not sufficient to meet the requirements imposed by most large-scale authentication systems. Multibiometric systems were introduced to alleviate some of the drawbacks of unibiometric systems by combining the evidence generated by multiple biometric traits. Security and integrity are well preserved in multimodal biometrics, and the use of multiple traits also reduces the problem of inaccuracy [1].

Human gait is a spatio-temporal phenomenon that typifies the motion characteristics of an individual. A binary-silhouette-based gait representation using the Gait Flow Image (GFI) is efficient for human identification. The GFI, generated from an optical flow field, is more robust to differences in carrying condition and improves recognition rates compared with other representations such as the Gait Energy Image (GEI) [2]. Background subtraction techniques such as frame differencing, median filtering, the Kalman filter and the mixture of Gaussians are used to separate the background from foreground objects. The mixture of Gaussians performs well and is robust enough to handle noise, climatic conditions and the difficulties of segmenting moving objects [3]. The Radon Transform (RT) combined with Principal Component Analysis (PCA) has been applied to gait recognition: the RT detects features within an image, and PCA reduces the dimension of the images without much loss of information [4-6]. An Image-based Visual Hull (IBVH) was computed from a set of monocular views and used to render virtual views for tracking and recognition. A canonical viewpoint was derived by examining the 3-D structure, appearance and motion of the moving person. Multiple cameras rendered simultaneously, and the camera positions supported dynamic updating as the person moved through the workspace. Each canonical view generated an image sequence that was passed to an unmodified face or gait recognition algorithm [7]. A feature-level fusion scheme that integrates information from the side face and gait, for recognizing individuals at a distance in video, carries more discriminating power than the individual biometric features. Multiple Discriminant Analysis (MDA) is applied after the concatenation of face and gait features to generate discriminating features for improved recognition performance [8].
A gait recognition system for human identification was also employed that extracts two features, the width vector of the binary silhouette and MPEG-7 region-based shape descriptors. Humans and other moving objects were extracted by estimating the background with a Gaussian Mixture Model (GMM), and a median filtering operation was then performed to remove noise from the background-subtracted image [9]. Gait recognition approaches such as the wavelet descriptor with ICA and the Hough transform with PCA have also been compared and discussed [10]. A gait recognition algorithm was proposed that fuses two kinds of spatiotemporal templates computed from a sequence of silhouette images, motion silhouette contour templates (MSCTs) and static silhouette templates (SSTs), which capture the motion and static characteristics of gait respectively. The SOTON and USF data sets were used to evaluate it [11]. Another method detected the binary silhouette of a walking person in each frame and extracted features from it using image processing operations. A neural network was used for training and testing, with all experiments carried out on the CASIA gait database; a recognition rate of 96.32% was reported [12]. The gait energy image combines the frames of one gait cycle to enhance the relevance among them and to reduce noise interference. A Fourier transform was applied to the gait energy image, and the low-frequency components were used to perform multi-view gait identification. The algorithm was applied to the CASIA database, and the experimental results demonstrated a fast and effective response [13]. A footprint image is unique, easy to capture and invariant. Image enhancement using the Wiener filter provides the highest accuracy among the filtering techniques.
The algorithms used in footprint recognition are organized into three stages: image pre-processing, feature extraction and template matching. Footprint features are extracted using the Gabor filter and the Discrete Wavelet Transform (DWT). The features obtained from multiple feature extraction techniques are then classified, and the classification is found to be very accurate [14-16]. In recent years, footstep-based recognition has attracted many researchers because it enables unobtrusive user authentication. In one approach, ground reaction force data were first captured with a force plate and then processed with the discrete wavelet transform; finally, a histogram was used to identify individual footsteps [17]. Footprint images have been measured using a pressure-sensing mat and normalized in both direction and position for robust image matching [18]. Modified Haar wavelet features have been extracted from the pre-processed footprint image, and the high matching scores obtained improve the performance of the recognition system [19].

In this paper, the proposed method is shown to produce valuable and efficient results while preserving the security, integrity and accuracy of the extracted features. It makes a simple yet efficient attempt at gait recognition. For each video file, an improved background subtraction procedure based on the Gaussian Mixture Model (GMM) is first used to extract the spatial silhouettes of a walker from the background. Here the GMM is a parametric probability density function represented as a weighted sum of Gaussian component densities. The relevant features are then extracted from the silhouettes tracked in the video file using Principal Component Analysis (PCA). A Fisher Linear Discriminant Analysis (FLDA) classifier is applied to the dimensionally reduced images produced by PCA for gait recognition. The minutiae are extracted from the footprint and then fused with the silhouette image using the Discrete Stationary Wavelet Transform (DSWT). The experimental results show that the proposed fusion algorithm works well and attains better results than other fusion schemes.

# 2. Gait Recognition

Gait is the particular way a person walks and is a complex spatio-temporal behavioral biometric. Video-based gait recognition is mainly applicable in surveillance systems, for example in recognizing an offender from security camera video. The gait of an individual is known to differ from person to person and to be fairly stable, whereas intentional imitation of another person’s gait is difficult.

**Fig. 1.** Block diagram of proposed gait recognition system

## 2.1 Background subtraction

The input video file is pre-processed by isolating the foreground, or moving objects, from the background scene. The usual approach for discriminating moving objects from the background scene is background subtraction. Here the Gaussian Mixture Model (GMM) serves as the background model from which the silhouette images are extracted. The GMM is a probability density function composed of a weighted sum of Gaussians. Background modelling by Gaussian mixtures is a pixel-based approach: the distribution of each pixel is modelled as a mixture of N Gaussians. Let k be a random process that represents the value of a given pixel over time. Then the probability density function p(k) of a Gaussian mixture comprising N component densities is described as
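A standard way of writing such a mixture density, using the symbols of this section, is sketched below. The exact form is an assumption based on the usual mixture-of-Gaussians background model, with η denoting a single Gaussian density and the weights summing to one:

$$
p(k) = \sum_{n=1}^{N} w_n \, \eta(k;\, \mu_n, \sigma_n), \qquad \sum_{n=1}^{N} w_n = 1
$$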

Where wn denotes the weight of the n-th component and η(k; μn, σn) is the Gaussian probability density function with mean μn and covariance matrix σn. The mixture-of-Gaussians algorithm estimates these parameters over time to obtain a robust representation of the background. The parameters are initialized with wn = w0, μn = μ0 and σn = σ0. The matching of foreground and background objects is described as

Where the threshold value τ is positive (τ > 0), the parameters of the Gaussian mixture are updated as follows:

Mn(t) is equal to 1 for the matching component and 0 otherwise. If no component matches, the component with the lowest weight is re-initialized with wn = w0, μn = k and σn = σ0. The learning rate α is constant, and β is defined as:

The weights wn are then normalized so that they add up to 1. To find the set {1, ..., B} of Gaussians modelling the background, a threshold λ T is applied to the cumulative sum of weights, defined as:

GMM can cope with multimodal background distribution and it is able to filter noise during image differentiation which in turn provides a selective level of detail for the contour of the moving shapes.
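The update rules described in this section can be illustrated for a single gray-level pixel with the following minimal sketch. It is not the implementation used in the experiments: the class name `PixelGMM`, all parameter values, the choice β = α and the ranking of components by weight over standard deviation are assumptions made for illustration.

```python
import math


class PixelGMM:
    """Mixture of N Gaussians modelling the gray-level history of one pixel."""

    def __init__(self, n=3, w0=0.05, sigma0=30.0, alpha=0.05, tau=2.5,
                 bg_threshold=0.7, min_sigma=1.0):
        # All default values are illustrative assumptions.
        self.w0, self.sigma0, self.min_sigma = w0, sigma0, min_sigma
        self.alpha, self.tau, self.bg_threshold = alpha, tau, bg_threshold
        self.weights = [1.0 / n] * n
        self.means = [0.0] * n
        self.sigmas = [sigma0] * n

    def update(self, k):
        """Update the model with pixel value k; return True if k is foreground."""
        indices = range(len(self.weights))
        # A component matches when k lies within tau standard deviations of its mean.
        match = next((n for n in indices
                      if abs(k - self.means[n]) < self.tau * self.sigmas[n]), None)
        if match is None:
            # No match: re-initialise the lowest-weight component around k.
            low = min(indices, key=lambda n: self.weights[n])
            self.weights[low] = self.w0
            self.means[low] = float(k)
            self.sigmas[low] = self.sigma0
        else:
            # w_n <- (1 - alpha) * w_n + alpha * M_n, with M_n = 1 only for the match.
            self.weights = [(1 - self.alpha) * w + (self.alpha if n == match else 0.0)
                            for n, w in enumerate(self.weights)]
            beta = self.alpha  # simplification: second learning rate set equal to alpha
            self.means[match] += beta * (k - self.means[match])
            var = ((1 - beta) * self.sigmas[match] ** 2
                   + beta * (k - self.means[match]) ** 2)
            self.sigmas[match] = max(math.sqrt(var), self.min_sigma)
        # Normalise the weights so that they add up to 1.
        total = sum(self.weights)
        self.weights = [w / total for w in self.weights]
        # Background: top-ranked components whose cumulative weight exceeds the threshold.
        order = sorted(indices, key=lambda n: self.weights[n] / self.sigmas[n],
                       reverse=True)
        background, cumulative = set(), 0.0
        for n in order:
            background.add(n)
            cumulative += self.weights[n]
            if cumulative > self.bg_threshold:
                break
        return match is None or match not in background
```

After the model has seen a stable value for some frames, that value is classified as background, while a sudden large change is flagged as foreground. Applying one such model per pixel yields the binary silhouette mask.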

## 2.2 Feature extraction using PCA

Principal Component Analysis (PCA) is an appearance-based statistical method used for feature extraction, data compression, redundancy removal and prediction. PCA computes a linear transformation that maps data from a high-dimensional space to a lower-dimensional space. It is used to identify patterns in data and to express the data in a way that highlights their similarities and differences. Because the patterns lie in a high-dimensional space, the data cannot be represented graphically; PCA simplifies the data structure while accounting for as much of the total variation in the original data as possible. The empirical mean is given by

Where the mean vector u has dimension M×1. The deviations from the mean are calculated by subtracting the empirical mean vector u from each column of the data matrix, which is described as X = K − uh, where K is the M×N data matrix and h is a 1×N row vector of ones.

The goal of PCA is to represent most of the variation of the original variables using only a few “principal components”. To calculate the PCA, a set of feature vectors is created by placing all the original data for a given configuration in a single vector, from which the correlation matrix C is formed. C is a symmetric matrix, which helps to reduce the computation required to calculate the eigenvectors and eigenvalues.

The mean value of the vector is defined as

A set of special vectors is then used to represent the correlation matrix, which satisfies the following equation:

These vectors are the eigenvectors, and each eigenvector ex has an associated eigenvalue λx. The eigenvectors with the largest eigenvalues of the correlation matrix represent the largest inherent variation in the original data set and therefore tell us the most about the original data. At the end of the process, if required, the eigenvectors can be rearranged back into the form from which they were derived. A limitation of PCA is that, because it is an unsupervised algorithm, it may eliminate the dimension that is best for discriminating positive from negative cases.
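The PCA steps described in this section can be sketched with NumPy as follows. This is a minimal illustration rather than the implementation used in the experiments: the function name `pca` and the use of the sample covariance matrix (instead of a normalized correlation matrix) are assumptions.

```python
import numpy as np


def pca(K, num_components):
    """Project the columns of K (M features x N observations) onto principal axes.

    Returns the mean vector u, the eigenvector matrix E, the selected
    eigenvalues, and the projected data Y (num_components x N).
    """
    K = np.asarray(K, dtype=float)
    u = K.mean(axis=1, keepdims=True)        # empirical mean, M x 1
    X = K - u                                # deviations from the mean (X = K - u h)
    C = (X @ X.T) / (K.shape[1] - 1)         # symmetric M x M covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)     # eigh exploits the symmetry of C
    order = np.argsort(eigvals)[::-1][:num_components]  # largest eigenvalues first
    E = eigvecs[:, order]
    return u, E, eigvals[order], E.T @ X
```

For silhouette images, each column of `K` would hold one flattened image. The variance of each projected component equals the corresponding eigenvalue, which is why the largest eigenvalues identify the directions carrying most of the variation.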

## 2.3 Classification

FLDA tries to find a mapping from the high-dimensional space to a low-dimensional space in which the most discriminant features are preserved. It achieves this by minimizing the variation within the same class and maximizing the variation between classes [20]. The between-class scatter matrix is expressed in mathematical terms as follows:

Consider that each pattern in the learning set belongs to one of n patterns (m1, m2, ..., mn). From the patterns given above, the within-class scatter matrix is defined as:

Where, Sw is the mean of class i, m is the number of cases and the superscript T indicates a transpose action. The objective of FLDA is then to find Uopt maximizing the ratio of the between-class scatter to the within-class scatter

Maximizing this ratio directly could be tricky, but the solution is known to be relatively simple to obtain: Uopt is the solution of the following conventional eigenvalue problem:

Where Q is a diagonal matrix whose elements are the eigenvalues. The column vectors of the matrix U are the eigenvectors corresponding to these eigenvalues.
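The scatter matrices and the eigenvalue problem described in this section can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name `flda`, the row-per-sample data layout and the use of a pseudo-inverse to guard against a singular within-class scatter matrix are all choices made for the example.

```python
import numpy as np


def flda(X, labels, num_components):
    """Fisher linear discriminant projection.

    X holds one sample per row; labels gives the class of each row.
    Returns the projection matrix U (features x num_components) and
    the corresponding eigenvalues.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    overall_mean = X.mean(axis=0)
    d = X.shape[1]
    S_w = np.zeros((d, d))   # within-class scatter
    S_b = np.zeros((d, d))   # between-class scatter
    for c in np.unique(labels):
        Xc = X[labels == c]
        class_mean = Xc.mean(axis=0)
        centered = Xc - class_mean
        S_w += centered.T @ centered
        diff = (class_mean - overall_mean).reshape(-1, 1)
        S_b += Xc.shape[0] * (diff @ diff.T)
    # Solve S_w^{-1} S_b U = U Q; pinv is used in case S_w is singular.
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_w) @ S_b)
    order = np.argsort(eigvals.real)[::-1][:num_components]
    return eigvecs[:, order].real, eigvals[order].real
```

In the pipeline described here, `X` would contain the PCA-reduced silhouette features, and a new sample would be classified after projecting it with `U`, for instance by the nearest class mean.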

# 3. Footprint Recognition

Footprint recognition can be carried out using either of two important kinds of feature, static or dynamic. In the proposed method, the footprint images are acquired using a web camera. Dynamic footprint images can also be obtained using a mat-type pressure sensor.

A static feature requires the subject to adopt a standing posture at a fixed position every time, whereas a dynamic feature deals with walking behavior. The constraint on user posture can be relaxed when the dynamic feature is taken into account. Fig. 2 shows the workflow of the footprint recognition system.

**Fig. 2.** Flow diagram of footprint recognition

## 3.1 Pre-processing of footprint image

Several pre-processing steps must first be applied to the original footprint image to produce consistent results in the classic minutiae extraction procedure. The region of interest (ROI) of the footprint image is obtained by square-based segmentation. The footprint images are oriented and translated before the segmentation process. The algorithm used to obtain the square-based segmentation is given below.

1) Remove noise from the footprint image using a median filter.
2) Apply thresholding to convert the image to a binary image.
3) Using a contour tracing algorithm, obtain the contour of the footprint image and locate the two valley points.
4) Join the points using the two-point form of a line.
5) Plot the points on the contour of the footprint at angles of 45° and 55°, with the straight lines L1 and L2 as reference. The orientation can be varied to extract the center point at different dimensions.

**Fig. 3.** Square based segmented footprint image

**Fig. 4.** Binarized and skeletonized footprint image

Obtain the midpoint by taking the mean of the coordinate point and the reference point on both sides. Using the strel command in MATLAB, obtain the region of interest of the footprint and crop the center part of the footprint image.
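The first two pre-processing steps (median filtering and thresholding) can be sketched in plain Python as follows. The function names, the 3×3 window, the border handling and the polarity of the threshold (values above the threshold become 1) are assumptions for illustration; the contour tracing and cropping steps are not covered.

```python
from statistics import median


def median_filter3(img):
    """Apply a 3x3 median filter to a 2-D list of gray levels.

    Border pixels are left unchanged (an assumption made for brevity).
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            window = [img[r + dr][c + dc]
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            out[r][c] = median(window)
    return out


def binarize(img, threshold):
    """Convert a gray-level image to binary: 1 above the threshold, 0 otherwise."""
    return [[1 if v > threshold else 0 for v in row] for row in img]
```

An isolated noisy pixel is replaced by the median of its neighbourhood before thresholding, so it no longer produces a spurious hole or speck in the binary image.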

## 3.2 Minutiae extraction and matching

After the image pre-processing is complete, the minutiae information can be extracted and stored. This information consists of the location within the image, the orientation angle and the type (termination or bifurcation). Ridge thinning is used to eliminate the redundant pixels of the ridges until the ridges are about one pixel wide. An iterative parallel thinning algorithm is used. In each scan of the footprint image, the algorithm marks redundant pixels in each small image window (3×3), and after several scans it removes all the marked pixels. After the footprint ridges have been thinned, the minutiae points are marked using the concept of the Crossing Number (CN) [21]. The minutiae are then checked for false detections to preserve accuracy. The matching process compares one set of minutiae data with another. Most often, this process compares an input data set with a previously stored data set of known identity, referred to as a template.
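The Crossing Number test can be illustrated with the short sketch below. It assumes a binary skeleton stored as a 2-D list in which ridge pixels are 1, and it uses the common convention that CN = 1 marks a termination and CN = 3 a bifurcation; the function names are hypothetical, and orientation estimation and false-minutiae removal are omitted.

```python
# Offsets of the 8 neighbours, visited in circular order around the centre pixel.
NEIGHBOURS = [(0, 1), (-1, 1), (-1, 0), (-1, -1),
              (0, -1), (1, -1), (1, 0), (1, 1)]


def crossing_number(skel, r, c):
    """Half the number of 0/1 transitions met when walking around pixel (r, c)."""
    p = [skel[r + dr][c + dc] for dr, dc in NEIGHBOURS]
    return sum(abs(p[i] - p[(i + 1) % 8]) for i in range(8)) // 2


def extract_minutiae(skel):
    """Return (row, col, type) for every termination and bifurcation."""
    minutiae = []
    for r in range(1, len(skel) - 1):
        for c in range(1, len(skel[0]) - 1):
            if skel[r][c] != 1:
                continue
            cn = crossing_number(skel, r, c)
            if cn == 1:
                minutiae.append((r, c, "termination"))
            elif cn == 3:
                minutiae.append((r, c, "bifurcation"))
    return minutiae
```

On a Y-shaped skeleton, the junction pixel has three separate ridge branches around it (CN = 3), each free end has one (CN = 1), and pixels along a ridge have two and are ignored.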

# 4. Image Fusion using DSWT

Feature-level fusion is the process of combining two feature vectors, here by means of the Discrete Stationary Wavelet Transform (DSWT). It fuses two multi-focused images into a single image. Wavelet transforms are easy to use, with each transform producing a symbolic transform representation that makes it easy to access, manipulate and visualize the transform coefficients. Wavelet transforms provide a scheme in which a signal is decomposed into levels, where every level corresponds to a coarser resolution or lower frequency band. The Stationary Wavelet Transform (SWT) is a wavelet transform algorithm designed to compensate for the lack of translation invariance of the DWT. Shift invariance is important for correlating and combining wavelet coefficient images. Without it, slight shifts in the input signal cause variations in the wavelet coefficients that might introduce artifacts in the reconstructed image. The decimation process causes shift variance, which can be resolved by employing the undecimated algorithm. The SWT is a redundant scheme, as the output of each level contains the same number of samples as the input; hence for a decomposition of N levels there is a redundancy of N in the wavelet coefficients. Fig. 5 shows the working of the fusion algorithm in a systematic way. In this paper both images undergo three levels of transformation, and four coefficients are produced at each level, so that after three levels of decomposition every image generates twelve coefficients. The decomposed coefficients of the two images are fused by various arithmetic operations to obtain the fused coefficients required to construct the final result. Finally, the multilevel 2-D stationary wavelet reconstruction is performed using either a specific orthogonal wavelet or specific reconstruction filters.
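The idea of undecimated decomposition, coefficient fusion and reconstruction can be illustrated with the simplified one-dimensional sketch below. It is not the 2-D transform used in this paper: it assumes an unnormalized Haar filter pair with periodic extension, and the fusion rule (average of the approximations, larger-magnitude detail coefficient) is only one possible choice among the arithmetic operations mentioned above. All function names are hypothetical.

```python
def swt_haar(signal, levels):
    """Undecimated Haar decomposition with periodic extension.

    Returns (approximation, [detail_1, ..., detail_levels]); every
    array keeps the same number of samples as the input.
    """
    n = len(signal)
    approx = [float(v) for v in signal]
    details = []
    for level in range(levels):
        step = 2 ** level  # the filter is dilated instead of decimating the signal
        shifted = [approx[(i + step) % n] for i in range(n)]
        details.append([(a - b) / 2 for a, b in zip(approx, shifted)])
        approx = [(a + b) / 2 for a, b in zip(approx, shifted)]
    return approx, details


def iswt_haar(approx, details):
    """Invert swt_haar: each level satisfies previous = approximation + detail."""
    out = list(approx)
    for d in reversed(details):
        out = [a + b for a, b in zip(out, d)]
    return out


def fuse(sig_a, sig_b, levels=3):
    """Fuse two equal-length signals in the undecimated Haar domain."""
    approx_a, det_a = swt_haar(sig_a, levels)
    approx_b, det_b = swt_haar(sig_b, levels)
    approx = [(a + b) / 2 for a, b in zip(approx_a, approx_b)]
    details = [[x if abs(x) >= abs(y) else y for x, y in zip(da, db)]
               for da, db in zip(det_a, det_b)]
    return iswt_haar(approx, details)
```

Because no samples are discarded, circularly shifting the input simply shifts every coefficient array by the same amount, which is the shift-invariance property that motivates using the SWT for fusion.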

**Fig. 5.** Fusion of gait and footprint biometric

# 5. Results and Discussion

Experimental simulations were conducted to evaluate the performance of the proposed system using our own database. A high-resolution digital video camera was used to capture the gait sequences, and the footprint images were acquired using a web camera. The input video stream has a frame width and height of about 320 and 240 pixels respectively, and the frame rate is 30 frames per second. GMM is used to extract the binary silhouettes from the gait sequence. Fig. 6 shows the foreground objects isolated from the input video file with the help of the Gaussian parameters. PCA is then used to extract the silhouette image features, transforming the high-dimensional data space into a lower-dimensional one. The FLDA classifier combined with PCA provides optimal linear dimensionality reduction and results in minimal processing time.

**Fig. 6.** Samples of silhouette images extracted

The ridge endings and bifurcations are extracted from the skeleton image by examining the local neighbourhood of each ridge pixel using a 3×3 window. Fig. 7 shows the extracted features of the footprint, representing the sole of the foot. The matching of minutiae is then performed by detecting the edges and centroids of bifurcations. The image is thresholded at a particular gray level to obtain the binary image; the threshold value used here is 95. An iterative parallel thinning algorithm is then applied until all lines are one pixel wide.

**Fig. 7.** Minutiae of footprint image

**Fig. 8.** Fusion of gait and footprint using DSWT

The proposed wavelet-based fusion scheme is compared with score-level fusion and attains the better result, about 97.6%, as shown in Table 1.

**Table 1.** Performance comparison of multimodal system

The classification rate measured with the Cumulative Match Score (CMS) exceeds 96%, as shown in Fig. 9. The feature-level fusion is compared with the matching-score-level fusion schemes based on the Sum rule and the Max rule [8]. The comparison of different recognition systems shows that the proposed method delivers the maximum accuracy, as shown in Fig. 10.
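For readers unfamiliar with the metric, a CMS curve can be computed as in the sketch below. The function name, the similarity-score convention (higher means more similar) and the toy identities used in the usage note are assumptions for illustration and are unrelated to the data reported here.

```python
def cumulative_match_scores(score_matrix, true_ids, gallery_ids, max_rank):
    """Fraction of probes whose true identity appears within the top r matches.

    score_matrix[p][g] is the similarity of probe p to gallery entry g.
    Returns a list whose element r - 1 is the CMS at rank r.
    """
    hits = [0] * max_rank
    for scores, true_id in zip(score_matrix, true_ids):
        ranked = sorted(range(len(gallery_ids)),
                        key=lambda g: scores[g], reverse=True)
        # Zero-based position of the first gallery entry with the true identity.
        rank = next((i for i, g in enumerate(ranked)
                     if gallery_ids[g] == true_id), None)
        if rank is None:
            continue  # identity not enrolled in the gallery: never a hit
        for r in range(rank, max_rank):
            hits[r] += 1
    return [h / len(true_ids) for h in hits]
```

With a three-entry gallery `["A", "B", "C"]` and four probes of which two are matched at rank 1, one at rank 2 and one at rank 3, the function returns `[0.5, 0.75, 1.0]`.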

**Fig. 9.** The CMS curve of gait and footprint biometric

**Fig. 10.** Comparison on different recognition systems

# 6. Conclusion

The fusion of gait and footprint is promising for real-world applications because of the individual characteristics of the two traits. Compared with gait, footprint images are readily interpretable by humans, which allows people to confirm whether a biometric system is functioning correctly. The difficulty is that the minutiae of the footprint must be extracted accurately. Nevertheless, the fusion system is more robust than a system that employs only one biometric parameter. In this paper, the Discrete Stationary Wavelet Transform (DSWT) is introduced to integrate the information from footprint and gait. The DSWT also overcomes the lack of translation invariance of the Discrete Wavelet Transform. The experimental results show that the DSWT generates efficient results in a multimodal biometric system, given the advantages that a multimodal biometric system offers. The efficiency of this work could be improved by increasing the number of 3-D output arrays in the fusion algorithm or by adding other biometrics in future.