A Model Stacking Algorithm for Indoor Positioning System using WiFi Fingerprinting

  • JinQuan Wang (School of Electronic and Information Engineering, Changchun University of Science and Technology) ;
  • YiJun Wang (School of Electronic and Information Engineering, Changchun University of Science and Technology) ;
  • GuangWen Liu (School of Electronic and Information Engineering, Changchun University of Science and Technology) ;
  • GuiFen Chen (School of Electronic and Information Engineering, Changchun University of Science and Technology)
  • Received : 2023.01.04
  • Accepted : 2023.03.23
  • Published : 2023.04.30

Abstract

With the development of the IoT and artificial intelligence, location-based services are receiving more and more attention. To address the large positioning error and poor generalization of current indoor positioning methods, this paper proposes a model stacking algorithm for an indoor positioning system using WiFi fingerprinting. First, we adopt a model stacking method based on Bayesian optimization to predict the location of indoor targets, which improves indoor localization accuracy and model generalization. Second, the position predicted by model stacking is taken as the observation of a particle filter, realizing collaborative particle filter localization based on the model stacking algorithm. The experimental results show that the algorithm keeps the mean position error within 2 m, which is superior to KNN, GBDT, XGBoost, LightGBM, and RF. Fusing the particle filter improves the location accuracy by a further 31%, and the predicted trajectory is close to the real trajectory. The algorithm also adapts to application scenarios with fewer wireless access points.

Keywords

1. Introduction

In this era of rapid development of the Internet of Things, artificial intelligence, and data mining technology, location-based services are becoming more and more important. At the same time, the wide application of robots and unmanned aerial vehicles also increases the demand for location information. Outdoors, global navigation satellite systems perform well and can support location-based services. Inside buildings, however, the satellite signal is severely blocked, which degrades continuous and accurate position measurement [1]. With the continuous evolution of Wireless Local Area Network (WLAN) technology, WiFi-based fingerprint positioning, which is low in cost and high in accuracy, has gradually become the main technology in indoor positioning systems. Fingerprint positioning is divided into an offline phase and an online phase [2]. In the offline phase, a fingerprint database that maps signal strength values to location coordinates is established. In the online phase, signal strength values collected in real time are matched against the fingerprint database [3]. When WLAN covers the whole room, location information can be obtained by software alone, and no additional hardware needs to be deployed. Nevertheless, the signal is susceptible to non-line-of-sight interference and multipath effects, which can make the relationship between some fingerprints and locations in the database non-linear and reduce the positioning accuracy. Therefore, machine learning algorithms such as KNN [4] and Random Forest [5] are widely used in fingerprint matching, where they can effectively improve positioning accuracy and robustness.

In [6], Zhang et al. fused the features extracted by an auto-encoder with the features extracted by a gradient boosting decision tree (GBDT); the resulting hybrid model improves the robustness of the system. In [7], Own et al. used a support vector machine model to distinguish between NLOS and LOS environments, and improved the positioning accuracy by predicting the user's location from 2.4 GHz and 5 GHz WiFi signals with a capsule network. In [8], Luo et al. combined the KNN algorithm with linear discriminant analysis to obtain the target location, which reduces the system complexity. In [9], Ma et al. proposed an improved weighted fusion WiFi indoor localization algorithm. The algorithm builds on traditional location fingerprinting and includes two phases: offline acquisition and online positioning. The offline acquisition phase selects the optimal parameters for signal acquisition and forms a fingerprint database through classification and processing. The location is then obtained by KNN.

Although machine learning provides a new way to improve indoor positioning systems, current methods that treat positioning as database matching are still limited in localization accuracy and generalization ability. In addition, errors accumulate when the position of a moving indoor target has to be tracked. Based on the above analysis, to improve the positioning accuracy and to solve the error accumulation problem for dynamic targets, this paper proposes a fusion method. In the offline phase, an improved stacking (model stacking) algorithm is trained on the fingerprint database. In the online prediction phase, the stacking algorithm is used to obtain the location of the target. Finally, the stacking algorithm cooperates with particle filtering to achieve precise localization.

2. WIFI-Based Location Fingerprinting System

The correspondence between location and fingerprint is usually established in the offline phase. In Fig. 1, the geographical area is covered by a rectangular grid, here a 4×8 grid. Two wireless access points (APs) are deployed, and the signal strength values from both are collected at each grid point [10]. In the location acquisition phase, a mobile device is somewhere in the middle of the area; its exact location is unknown, and it need not lie on a grid point. The mobile device measures the signal strength from the two APs and transmits the measurements to the network. To determine the location of the mobile device, the fingerprint in the database that best matches the measured signal strength vector is found, and the location of the mobile device is estimated as the location of that best-matching grid point.

Fig. 1. Fingerprint localization method based on WIFI signal strength
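
The matching step described in this section can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the AP coordinates, the 1 m grid spacing, and the log-distance path-loss constants (`p0`, `n`) are assumptions chosen only to generate plausible signal strengths, and all function names are hypothetical.

```python
import math

# Hypothetical layout: an 8-column by 4-row grid with 1 m spacing and two APs
# placed just outside the grid. Positions and path-loss constants are assumptions.
APS = [(-1.0, -1.0), (8.0, -1.0)]
GRID = [(float(x), float(y)) for y in range(4) for x in range(8)]


def rss(pos, ap, p0=-40.0, n=2.0):
    """Simulated signal strength (dBm) from a log-distance path-loss model."""
    d = math.dist(pos, ap)
    return p0 - 10.0 * n * math.log10(d)


def fingerprint(pos):
    """Signal-strength vector observed at pos, one entry per AP."""
    return [rss(pos, ap) for ap in APS]


# Offline phase: store one fingerprint per grid point.
DATABASE = [(fingerprint(p), p) for p in GRID]


def locate(measured):
    """Online phase: return the grid point whose fingerprint is closest to measured."""
    def signal_distance(entry):
        stored, _ = entry
        return sum((a - b) ** 2 for a, b in zip(stored, measured))
    return min(DATABASE, key=signal_distance)[1]


# A device that is not on a grid point is mapped to the best-matching grid point.
estimate = locate(fingerprint((3.2, 1.9)))
```

Because the estimate is always a grid point, the error of this plain matching scheme is bounded below by the grid spacing, which is one motivation for learning a regression model over the fingerprints instead.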

3. Indoor Positioning Algorithm Based on Model Stacking

3.1 Positioning Framework

The indoor positioning framework of this paper is shown in Fig. 2. First, an offline fingerprint database is collected in the target scenario. The model stacking method is then trained on this database to learn the mapping between the collected fingerprints (signal strength values) and the locations. In the localization phase, the signal strength values received in real time are fed to the trained model, which outputs the location predicted by stacking. Finally, this prediction is refined by the particle filtering algorithm.

Fig. 2. Stacking-pf algorithm-based positioning framework

3.2 Bayesian Optimization Based Model Stacking

Stacking first trains the primary learners on the original dataset [11]. A new dataset is then generated for training the secondary learner: the outputs of the primary learners are used as its features, while the labels of the original dataset are kept as its labels. The stacking framework thus combines the outputs of multiple models to obtain an overall improvement in prediction accuracy. The stacking implementation framework is shown in Fig. 3.

Fig. 3. Model Stack Implementation Framework

In the training stage, the secondary training set is generated by the primary learners. If the same data used to train the primary learners were also used to generate the secondary training set, the model would risk overfitting. Therefore, the original dataset is partitioned by cross-validation, so that each sample's secondary features come from primary models that were not trained on that sample. Based on extensive experiments, the primary models in this paper are RF (Random Forest), GBDT (Gradient Boosting Decision Tree), and XGBoost (eXtreme Gradient Boosting), and the secondary model is LightGBM (Light Gradient Boosting Machine). The secondary model is tuned with a Bayesian optimizer. In the online phase, the trained stacking model predicts the position from the fingerprint data collected online. The position predicted by model stacking is used as the observation, which is input to the particle filter to obtain the final location. The fusion framework is shown in Fig. 4.

Fig. 4. Model stacking approach to fusion particle filtering overall framework
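
The cross-validated construction of the secondary training set can be sketched as follows. To keep the example self-contained, k-nearest-neighbour regressors stand in for the RF, GBDT, XGBoost, and LightGBM models used in the paper; the room size, AP positions, path-loss formula, fold count, and all function names are likewise assumptions made for illustration.

```python
import math


def knn_regressor(k):
    """Stand-in learner: returns a fit(X, Y) function for k-NN regression."""
    def fit(X, Y):
        def predict(x):
            order = sorted(range(len(X)), key=lambda i: math.dist(X[i], x))
            nearest = order[:k]
            return [sum(Y[i][d] for i in nearest) / len(nearest) for d in range(2)]
        return predict
    return fit


def out_of_fold_features(X, Y, learners, n_folds=4):
    """Secondary training features: each row holds the primary predictions for a
    sample, produced by models that never saw that sample during training."""
    n = len(X)
    features = [[] for _ in range(n)]
    folds = [list(range(start, n, n_folds)) for start in range(n_folds)]
    for fit in learners:
        for held_out in folds:
            held = set(held_out)
            train = [i for i in range(n) if i not in held]
            predict = fit([X[i] for i in train], [Y[i] for i in train])
            for i in held_out:
                features[i].extend(predict(X[i]))
    return features


def fit_stacking(X, Y, learners, secondary):
    """Train the primaries and the secondary learner; return a predict(x) function."""
    meta_model = secondary(out_of_fold_features(X, Y, learners), Y)
    full_models = [fit(X, Y) for fit in learners]  # refit primaries on all data
    def predict(x):
        row = [value for model in full_models for value in model(x)]
        return meta_model(row)
    return predict


# Toy fingerprint set (assumed): 6 APs on the walls of a 10 m x 10 m room.
APS = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 0), (5, 10)]
POSITIONS = [(x + 0.5, y + 0.5) for x in range(10) for y in range(10)]
RSS = [[-40 - 20 * math.log10(math.dist(p, ap)) for ap in APS] for p in POSITIONS]

primaries = [knn_regressor(1), knn_regressor(3), knn_regressor(5)]
model = fit_stacking(RSS, POSITIONS, primaries, secondary=knn_regressor(3))
predicted = model(RSS[42])  # [x, y] estimate for one stored fingerprint
```

The key design point is in `out_of_fold_features`: a 1-NN primary evaluated on its own training data would simply return the true label, and the secondary model would learn to trust it blindly. Holding each fold out removes that leak.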

3.2.1 Primary Modeling

The collected offline fingerprint dataset is \(D=\{(s_i, T_i)\}\) with \(|D| = N\), \(s_i \in \mathbb{R}^{M}\), and \(T_i \in \mathbb{R}^{2}\), where \(s_i=[s_{i1}, s_{i2}, \ldots, s_{iM}]\) is the vector of M RSS measurements that forms the RSS fingerprint of the i-th reference point (RP), and \(T_i=[x_i, y_i]\) is the physical coordinate of the i-th RP. The fingerprint dataset is first input into the primary models according to the cross-validation split. The primary models are as follows:

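1) RF

The Random Forest (RF) model is an ensemble of decision trees [12]. It uses the bagging method: each base learner is trained on a set obtained by sampling the training dataset N times with replacement [13]. The RF model is represented as:

\(\begin{aligned}f\left(s_{t}\right)=\operatorname{majority}\left\{T_{i}\left(s_{t}\right)\right\}_{i=1}^{N_{\text{tree}}}\end{aligned}\)       (1)

where \(T_i\) here denotes the i-th tree of the forest (not the coordinate label defined above), \(i = 1, 2, \ldots, N_{\text{tree}}\), and \(s_t\) is the sample to be tested. The majority vote applies to class labels; when the dependent variable is continuous, as with position coordinates, the tree outputs are averaged instead.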

2) GBDT

GBDT is an ensemble machine learning algorithm that combines the Boosting framework with decision trees as base learners, and it can be used for both regression and classification [14]. With a suitable loss function, the algorithm can handle data containing outliers, and it requires little parameter tuning [15]. Using regression trees, the model is initialized as:

\(\begin{aligned}F_{0}(s)=\underset{\tau}{\operatorname{argmin}} \sum_{i=1}^{N} L\left(T_{i}, \tau\right)\end{aligned}\)       (2)

where \(\tau\) is the predicted value of the position coordinates, \(T_i\) is the coordinate of the i-th reference point, and L is the loss function of the model.
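
As a worked instance of Eq. (2), assume the squared-error loss; the paper does not state which loss was used, so this choice and the learning-rate symbol \(\nu\) are assumptions made only for illustration. Setting the derivative of the summed loss to zero shows that the initial model is the mean coordinate of the reference points, and each later stage fits a regression tree \(h_m\) to the current residuals:

\(\begin{aligned}
L\left(T_{i}, \tau\right) &= \tfrac{1}{2}\left\|T_{i}-\tau\right\|^{2}, \\
\frac{\partial}{\partial \tau} \sum_{i=1}^{N} L\left(T_{i}, \tau\right) &= \sum_{i=1}^{N}\left(\tau-T_{i}\right)=0 \;\Rightarrow\; F_{0}=\frac{1}{N} \sum_{i=1}^{N} T_{i}, \\
F_{m}(s) &= F_{m-1}(s)+\nu\, h_{m}(s), \quad h_{m} \text{ fitted to } T_{i}-F_{m-1}\left(s_{i}\right).
\end{aligned}\)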

3) XGBoost

XGBoost (eXtreme Gradient Boosting) is a GBDT-based algorithm [16-17]. The predicted value for the i-th position after M trees can be expressed as:

\(\begin{aligned}\widehat{P}_{i}^{(M)}=\sum_{k=1}^{M} f_{k}\left(s_{i}\right), \quad f_{k} \in F\end{aligned}\)       (3)

where \(f_k(s_i)\) is the predicted value of the k-th tree for sample \(s_i\) and F is the space of regression trees. The objective function can be modeled as:

\(\begin{aligned}Y=\sum_{i=1}^{N} L\left(P_{i}, \widehat{P}_{i}^{(M)}\right)+\sum_{k=1}^{M} \Omega\left(f_{k}\right)\end{aligned}\)       (4)

where L denotes the loss function, i.e., the training error of a sample, which indicates how well the model fits the training set, and \(\Omega(f_k)\) controls the complexity of the model to prevent overfitting.

3.2.2 Secondary Model Modeling

1) Sub-model based on Bayesian optimization

Under cross-validation, the three primary models each produce a set of predictions. These predictions are combined with the location labels of the original dataset to form a new dataset T = {(hi, pi)}, where hi = (result1, result2, result3) holds the predictions of the primary models for the i-th sample and pi is its location label. Finally, the secondary model LightGBM is trained on the new dataset T [18].

The specific process is as follows:

Step 1: The objective function of this model combines the loss function with the complexity of the tree. A second-order Taylor expansion is applied to the objective function, which yields its first-order and second-order derivatives with respect to the current prediction for each sample.

Step 2: Based on the label values and the leaf node scores of the previously built decision trees, the first-order and second-order derivatives are calculated for each sample point.

Step 3: The feature values of all samples are used as candidate split points of each node in the current layer. The score of each candidate is calculated, and the candidate with the largest score is chosen as the optimal split point of that layer.

Step 4: Step 3 is repeated and the score of each leaf node is calculated; when the preset value is reached, the construction of this decision tree is complete.

Step 5: The leaf node scores of the previously built decision trees are accumulated to update the predicted values, and Steps 2 to 5 are repeated to construct the next tree.

Step 6: The construction of the model stops when a preset minimum precision is reached. The prediction for a sample is the sum of the scores of the leaf nodes it falls into across all decision trees.
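
Steps 1 to 3 can be made concrete with the standard second-order split score used by XGBoost-style boosting. The paper does not print these formulas, so the sketch below is an assumption about what "score" means: it uses squared-error loss (gradient = prediction minus target, Hessian = 1), and the regularization names `lam` and `gamma` are hypothetical. LightGBM additionally bins feature values into histograms, which is omitted here.

```python
def leaf_weight(g_sum, h_sum, lam=1.0):
    """Optimal leaf score from the second-order expansion: w* = -G / (H + lambda)."""
    return -g_sum / (h_sum + lam)


def split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=0.0):
    """Reduction of the expanded objective when a leaf is split in two."""
    def score(g, h):
        return g * g / (h + lam)
    parent = score(g_left + g_right, h_left + h_right)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right) - parent) - gamma


def best_split(values, targets, predictions, lam=1.0, gamma=0.0):
    """Scan every threshold of one feature and return (gain, threshold).

    Assumes squared-error loss, so g_i = prediction - target and h_i = 1.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    grads = [predictions[i] - targets[i] for i in order]
    g_total, h_total = sum(grads), float(len(grads))
    best = (float("-inf"), None)
    g_left = h_left = 0.0
    for rank in range(len(order) - 1):
        g_left += grads[rank]
        h_left += 1.0
        lo, hi = values[order[rank]], values[order[rank + 1]]
        if lo == hi:
            continue  # no threshold separates identical feature values
        gain = split_gain(g_left, h_left, g_total - g_left, h_total - h_left, lam, gamma)
        if gain > best[0]:
            best = (gain, (lo + hi) / 2.0)
    return best


# Toy data (assumed): one AP's signal strength per sample and the x coordinate to predict.
rss_values = [-70.0, -68.0, -55.0, -52.0]
x_coords = [1.0, 1.0, 5.0, 5.0]
gain, threshold = best_split(rss_values, x_coords, [0.0] * 4, lam=0.0)
```

With these four samples the best threshold falls between -68 and -55 dBm, separating the two groups of coordinates, and `leaf_weight` then assigns each side the mean of its targets.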

Because the three primary models are independent of each other, the distributions of their predictions differ considerably, and it is hard to find optimal parameters for the secondary model. Therefore, we use Bayesian optimization to tune the secondary model LightGBM. In LightGBM, the relationship between the hyperparameters (input) and the resulting output can be represented by a function. Suppose there is a function f: X → R; the goal is to find, for x ∈ X:

\(\begin{aligned}x^{*}=\underset{x \in X}{\operatorname{argmin}} f(x)\end{aligned}\)       (5)

where x denotes a hyperparameter combination of the LightGBM model.

The pseudocode for Bayesian optimization of the secondary model is as follows:

Algorithm: Bayesian optimization procedure

Input: f, X, S, M

D ← InitSamples(f, X)

for i ← |D| to T do

    p(y|x, D) ← FitModel(M, D)

    x_i ← argmax_{x∈X} S(x, p(y|x, D))

    y_i ← f(x_i)    ▷ expensive step

    D ← D ∪ {(x_i, y_i)}

end for

Here f denotes the LightGBM model treated as a black box: a set of hyperparameters is input and an output value is obtained. X is the hyperparameter space. D is the data set of pairs (x, y), where x is a hyperparameter combination and y is the corresponding output. S is the acquisition function, which is used to pick the next x. M is the probabilistic model fitted to D; in this paper, a Gaussian model is chosen. D ← InitSamples(f, X) initializes D with sampled inputs and their outputs. T is a fixed number of evaluations, which prevents excessive computation and resource consumption.

Bayesian optimization has two major elements: a probabilistic model and an acquisition function. The probabilistic model is used to approximate the objective function, and the acquisition function is used to select the evaluation points. The acquisition function is computed from the posterior probability given the data set and steers the search toward the optimal hyperparameter combination. Expected improvement, an improvement-based strategy, is chosen as the acquisition function: the point at which the function value is expected to fall furthest below the best value observed so far is used as the next evaluation point. The LightGBM parameters obtained by Bayesian optimization are shown in Table 1.

Table 1. Bayesian optimization based sub-model LightGBM parameters
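
The expected-improvement acquisition function has a closed form when the surrogate's prediction at a point is Gaussian. The sketch below implements that textbook formula for minimization; it is not taken from the paper, and the function names, the optional margin `xi`, and the three posterior values in the usage lines are hypothetical.

```python
import math


def expected_improvement(mu, sigma, best, xi=0.0):
    """Expected improvement for minimisation under a Gaussian posterior N(mu, sigma^2).

    best is the lowest objective value observed so far; xi is an optional
    exploration margin (an assumed extra parameter, 0 by default).
    """
    improvement = best - mu - xi
    if sigma <= 0.0:
        return max(improvement, 0.0)
    z = improvement / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return improvement * cdf + sigma * pdf


def next_candidate(candidates, posterior, best):
    """Pick the hyperparameter setting with the largest expected improvement.

    posterior(x) must return (mean, std) of the surrogate model at x.
    """
    return max(candidates, key=lambda x: expected_improvement(*posterior(x), best))


# Hypothetical surrogate output: learning rate -> (predicted mean error, std) in metres.
surrogate = {0.01: (2.10, 0.05), 0.05: (2.00, 0.30), 0.10: (1.95, 0.01)}
chosen = next_candidate(list(surrogate), surrogate.get, best=1.99)
```

In this toy case the setting with the lowest predicted mean (0.10) is not selected; the more uncertain setting (0.05) has a larger expected improvement, which illustrates how the acquisition function balances exploitation against exploration.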

2) Collaborative Positioning

When the target is moving indoors, it generates cumulative errors due to its irregular motion. Therefore, we use particle filtering to further optimize the model stacking method.

The particle filtering algorithm approximates the probability density function with a set of random samples that propagate through the state space. The minimum variance estimate of the system state is obtained by replacing the integration operation with the sample mean. This removes the constraint that random quantities must follow a Gaussian distribution when solving nonlinear filtering problems. Particle filtering can express a wider range of distributions than the Gaussian model and is better able to model the nonlinear behavior of the variable parameters. It can therefore express the posterior probability distribution given the observations and control inputs more accurately, which makes it a good solution to the nonlinear problem in indoor positioning.

First, while the target is moving, model stacking predicts its position from the signal strength values received in real time. Second, the position predicted by model stacking is used as the position observation for particle filtering. Particle filtering is an approximate Bayesian recursive filtering algorithm based on Monte Carlo simulation [19]. Its core idea is to approximate the probability density function of the system's random variables with a set of discrete random samples and to replace the integration with the sample mean, which yields the minimum variance estimate of the state. Particle filtering is highly adaptive: it can handle not only linear systems with Gaussian noise but also nonlinear systems with non-Gaussian noise. The flow chart of model stacking fused with particle filtering is shown in Fig. 5.

Fig. 5. Model stacking fusion particle filtering algorithm flow chart
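
One predict, update, and resample cycle of such a filter can be sketched as below. The paper does not specify the motion model, noise levels, or particle count, so the random-walk motion model, `motion_std`, `obs_std`, the 500 particles, and the simulated noisy observations (which stand in for the output of the stacking model) are all assumptions.

```python
import math
import random


def particle_filter_step(particles, observation, motion_std=0.3, obs_std=2.0, rng=random):
    """One predict/update/resample cycle; returns (new_particles, estimate).

    observation is the (x, y) position predicted by the stacking model.
    """
    # Predict: random-walk motion model (an assumption; the paper gives no model).
    moved = [(x + rng.gauss(0.0, motion_std), y + rng.gauss(0.0, motion_std))
             for x, y in particles]
    # Update: Gaussian likelihood of the observed position given each particle.
    weights = [math.exp(-((x - observation[0]) ** 2 + (y - observation[1]) ** 2)
                        / (2.0 * obs_std ** 2)) for x, y in moved]
    total = sum(weights)
    if total == 0.0:  # all particles far from the observation: fall back to uniform
        weights = [1.0] * len(moved)
        total = float(len(moved))
    weights = [w / total for w in weights]
    # Estimate: weighted mean of the particles (the sample mean replaces the integral).
    estimate = (sum(w * x for w, (x, _) in zip(weights, moved)),
                sum(w * y for w, (_, y) in zip(weights, moved)))
    # Resample: draw particles in proportion to their weights.
    resampled = rng.choices(moved, weights=weights, k=len(moved))
    return resampled, estimate


# Simulated run: a slow straight walk observed with 2 m Gaussian noise per axis.
rng = random.Random(0)
truth = [(1.0 + 0.1 * t, 2.0 + 0.05 * t) for t in range(200)]
observed = [(x + rng.gauss(0.0, 2.0), y + rng.gauss(0.0, 2.0)) for x, y in truth]
particles = [(rng.uniform(0.0, 25.0), rng.uniform(0.0, 15.0)) for _ in range(500)]
filtered = []
for z in observed:
    particles, est = particle_filter_step(particles, z, rng=rng)
    filtered.append(est)


def mean_error(points):
    """Average Euclidean distance between a track and the true trajectory."""
    return sum(math.dist(p, q) for p, q in zip(points, truth)) / len(truth)
```

Comparing `mean_error(filtered)` with `mean_error(observed)` shows the smoothing effect: the filter averages over consecutive observations, so isolated large errors in the stacking output are damped rather than passed straight to the trajectory.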

4. Experiments and Discussion

4.1 Experimental Environment

We simulate a room with six wireless access points. The parameters of the experimental room are shown in Fig. 6. The fingerprint dataset consists of the signal strength values of the six wireless access points collected at all grid points in the room. In the positioning phase, the moving target moves randomly inside the room.

Fig. 6. Simulation of positioning experimental scenes

4.2 Results Analysis

Fig. 7 compares the cumulative distribution function (CDF) of the estimation error of the proposed stacking method with those of the KNN, Random Forest, GBDT, XGBoost, and LightGBM localization algorithms [20-22]. All algorithms are tuned by Bayesian optimization; the resulting parameters are shown in Table 2. Fig. 7 shows that the stacking localization algorithm obtains the best results: more than 80% of its estimates have a positioning error within 2 m, whereas the proportion is below 80% for KNN, Random Forest, GBDT, XGBoost, and LightGBM. The Random Forest-based localization algorithm is the least effective. The average localization error of each algorithm is given in Table 3. The mean positioning error of the stacking method is 1.99 m, which is 17.4%, 26.6%, 13.9%, 13.1%, and 13.5% lower than the 2.41 m of KNN, 2.71 m of Random Forest, 2.31 m of GBDT, 2.29 m of XGBoost, and 2.30 m of LightGBM, respectively. These results indicate that the model stacking method works well.

Fig. 7. CDF comparison of model stacking, KNN, random forest, GBDT, XGBoost, and LightGBM localization algorithms

Table 2. Parameters after Bayesian optimization

Table 3. Average accuracy comparison of model stacking, KNN, random forest, GBDT, Xgboost, and LightGBM localization algorithms
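
The relative improvements quoted in this section can be reproduced from the stated mean errors, and a single point of a CDF curve is simply the fraction of errors below a threshold. The short sketch below shows both calculations; the function names are illustrative, and only the mean errors themselves come from the text.

```python
def relative_reduction(baseline_error, new_error):
    """Percentage by which new_error is lower than baseline_error."""
    return 100.0 * (baseline_error - new_error) / baseline_error


def empirical_cdf(errors, threshold):
    """Fraction of position errors that do not exceed threshold (one CDF point)."""
    return sum(e <= threshold for e in errors) / len(errors)


# Mean errors in metres as quoted in the text.
stacking = 1.99
baselines = {"KNN": 2.41, "Random Forest": 2.71, "GBDT": 2.31,
             "XGBoost": 2.29, "LightGBM": 2.30}
improvement = {name: round(relative_reduction(err, stacking), 1)
               for name, err in baselines.items()}
# improvement == {"KNN": 17.4, "Random Forest": 26.6, "GBDT": 13.9,
#                 "XGBoost": 13.1, "LightGBM": 13.5}
```

For example, `empirical_cdf([0.5, 1.2, 1.9, 2.4, 3.0], 2.0)` returns 0.6 for that made-up list of five errors, meaning 60% of the estimates are within 2 m.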

Fig. 8 compares the CDF of the estimation error of the stacking collaborative particle filter localization algorithm with that of the stacking algorithm alone. Stacking-pf (Model Stacking Particle Filtering) keeps the positioning error within 2 m for more than 90% of the estimates, which is better than the stacking positioning algorithm, and its error never exceeds 3 m during positioning. The average positioning accuracy of Stacking-pf and the stacking method is compared in Table 4. The average localization accuracy of the Stacking-pf algorithm is 31% higher than that of the stacking algorithm.

Fig. 8. CDF comparison of stacking-pf and model stacking methods

Table 4. Comparison of average accuracy of algorithms

Fig. 9 compares the algorithm proposed in this paper with KNN, Random Forest, GBDT, XGBoost, and LightGBM [23]. The analysis shows that, for the Stacking-pf method and the stacking method, the error falls within 1 m with the highest probability, which is better than the other localization algorithms. The main reason is that the model stacking algorithm predicts the position accurately, while the particle filter can process data from non-Gaussian, nonlinear systems.

Fig. 9. Comparison graph of probability density functions

Comparing the predicted trajectory with the real trajectory shows the advantage of the algorithm more clearly. Four randomly selected trajectory segments are therefore shown in Fig. 10, together with the trajectories estimated by the stacking method and the Stacking-pf method. The trajectory plots show that the Stacking-pf method proposed in this paper is more consistent with the real trajectory. They also suggest that the Stacking-pf algorithm performs well in nonlinear, non-Gaussian systems, which gives it a wide range of applications.

Fig. 10. Positioning track comparison chart

The number of wireless access points is also an important factor in positioning accuracy. Therefore, we further test the localization performance with the number of wireless access points reduced to three. The access points are again arranged on the walls around the room, and a new fingerprint dataset is created. Fig. 11 compares the CDF of the estimation error of the Stacking-pf method with those of KNN, Random Forest, GBDT, XGBoost, and LightGBM. The proposed method retains good localization performance when only a few wireless access points are available.

Fig. 11. Location performance comparison chart based on three wireless access points

5. Conclusion

In this paper, we use fusion algorithms to improve the indoor localization accuracy and the generalization capability of localization algorithms. Experiments show that the proposed algorithm effectively improves the localization accuracy and is robust, so it can meet the requirements of accurate positioning in complex and changing indoor environments. The fusion algorithm combines the individual models, so that in a new positioning environment each model can play to its own strengths: the primary models view the new data from different angles and complement each other to improve the positioning accuracy. Future work will extend model stacking to a classification algorithm for localization in large multi-floor buildings.

References

  1. Liu F, Liu J, Yin Y, et al., "Survey on WiFi-based indoor positioning techniques," IET communications, vol. 14, no. 9, pp. 1372-1383, 2020.
  2. Pu Q, Zhou M, Zhang F, et al., "Group power constraint based Wi-Fi access point optimization for indoor positioning," KSII Transactions on Internet and Information Systems (TIIS), vol. 12, no. 5, pp. 1951-1972, 2018.
  3. Ninh D B, He J, Trung V T, et al., "An effective random statistical method for Indoor Positioning System using WiFi fingerprinting," Future Generation Computer Systems, vol. 109, pp. 238-248, 2020. https://doi.org/10.1016/j.future.2020.03.043
  4. Zhang G, Sun X, Ren J, et al., "Research on improved indoor positioning algorithm based on WiFi-pedestrian dead reckoning," International Journal of Distributed Sensor Networks, vol. 15, no. 5, pp. 1-10, 2019. https://doi.org/10.1177/1550147719851932
  5. Ninh D B, He J, Trung V T, et al., "An effective random statistical method for Indoor Positioning System using WiFi fingerprinting," Future Generation Computer Systems, vol. 109, pp. 238-248, 2020. https://doi.org/10.1016/j.future.2020.03.043
  6. Zhang H, Hu B, Xu S, et al., "Feature fusion using stacked denoising auto-encoder and GBDT for Wi-Fi fingerprint-based indoor positioning," IEEE Access, vol. 8, pp. 114741-114751, 2020.  https://doi.org/10.1109/ACCESS.2020.3004039
  7. Own C M, Hou J, Tao W, "Signal fuse learning method with dual bands WiFi signal measurements in indoor positioning," IEEE Access, vol. 7, pp. 131805-131817, 2019. https://doi.org/10.1109/ACCESS.2019.2940054
  8. Luo J, Zhang Z, Wang C, et al., "Indoor multifloor localization method based on WiFi fingerprints and LDA," IEEE Transactions on Industrial Informatics, vol. 15, no. 9, pp. 5225-5234, 2019.  https://doi.org/10.1109/TII.2019.2912055
  9. Ma R, Guo Q, Hu C, et al., "An improved WiFi indoor positioning algorithm by weighted fusion," Sensors, vol. 15, no. 9, pp. 21824-21843, 2015. https://doi.org/10.3390/s150921824
  10. Karimi, Hassan A., "Advanced location-based technologies and services," Taylor & Francis, 2013.
  11. Wolpert D H, "Stacked generalization," Boston: Springer, 2017, pp. 6-10.
  12. Wang Y, Xiu C, Zhang X, et al., "WiFi indoor localization with CSI fingerprinting-based random forest," Sensors, vol. 18, no. 9, pp. 2869, 2018.
  13. Lee S, Kim J, Moon N, "Random forest and WiFi fingerprint-based indoor location recognition system using smart watch," Human-centric computing and information sciences, vol. 9, no. 1, pp. 1-14, 2019. https://doi.org/10.1186/s13673-018-0162-5
  14. Zhang C, Zhang Y, Shi X, et al., "On incremental learning for gradient boosting decision trees," Neural Processing Letters, vol. 50, no. 1, pp. 957-987, 2019. https://doi.org/10.1007/s11063-019-09999-3
  15. Zhang Z, Jung C, "GBDT-MO: gradient-boosted decision trees for multiple outputs," IEEE transactions on neural networks and learning systems, vol. 32, no. 7, pp. 3156-3167, 2021. https://doi.org/10.1109/TNNLS.2020.3009776
  16. Barnwal A, Cho H, Hocking T, "Survival regression with accelerated failure time model in XGBoost," Journal of Computational and Graphical Statistics, vol. 31, no. 4, pp. 1292-1302, 2022.  https://doi.org/10.1080/10618600.2022.2067548
  17. Ben Jabeur, S., Stef, N., Carmona, "Bankruptcy prediction using the XGBoost algorithm and variable importance feature engineering," Computational Economics, 61, pp. 715-741, 2023.  https://doi.org/10.1007/s10614-021-10227-1
  18. Tian L, Feng L, Yang L, "Stock price prediction based on LSTM and LightGBM hybrid model," The Journal of Supercomputing, vol. 78, no. 9, pp. 11768-11793, 2022. https://doi.org/10.1007/s11227-022-04326-5
  19. Lu X, Yang K, Liu J, "Indoor collaborative positioning with adaptive particle-pair filtering based on dynamic user pairing," IEEE Access, vol. 7, pp. 5795-5807, 2018.
  20. Hao Z, Dang J, Cai W, "A multi-floor location method based on multi-sensor and WiFi fingerprint fusion," IEEE Access, vol. 8, pp. 223765-223781, 2020. https://doi.org/10.1109/ACCESS.2020.3039394
  21. Cao H, Wang Y, Bi J, "Indoor positioning method using WiFi RTT based on LOS identification and range calibration," ISPRS International Journal of Geo-Information, vol. 9, no. 11, pp. 627, 2020.
  22. Zhang Z, Liu J, Wang L, "An enhanced smartphone indoor positioning scheme with outlier removal using machine learning," Remote Sensing, vol. 13, no. 6, pp. 1106, 2021. 
  23. Zhao Z, Lou Z, Wang R, "I-WKNN: Fast-speed and high-accuracy WIFI positioning for intelligent sports stadiums," Computers & Electrical Engineering, vol. 98, pp. 107619, 2022.