Time Series Data Cleaning Method Based on Optimized ELM Prediction Constraints

  • Guohui Ding (School of Computer Science, Shenyang Aerospace University)
  • Yueyi Zhu (School of Computer Science, Shenyang Aerospace University)
  • Chenyang Li (School of Computer Science, Shenyang Aerospace University)
  • Jinwei Wang (Beijing Aerospace Ares Equipment Installation Co. Ltd.)
  • Ru Wei (School of Computer Science, Shenyang Aerospace University)
  • Zhaoyu Liu (School of Computer Science, Shenyang Aerospace University)
  • Received: 2021.08.02
  • Accepted: 2022.08.07
  • Published: 2023.04.30

Abstract

Time series data collected by sensors often contain errors caused by external factors. Traditional cleaning methods that constrain the rate of change (speed) of the data perform well, but because their constraint rules are fixed, they are limited to data whose speed of change is stable. In practice, data with an uneven speed of change are common. To solve this problem, this paper proposes an online cleaning algorithm for time series data based on dynamic speed constraints. Since time series data usually change periodically, we use an extreme learning machine (ELM) to learn the pattern of speed changes from past data and to predict time-varying speed ranges, which are then used to detect abnormal points. To enable online repair, a dual-window mechanism is proposed that turns the global optimization problem into a local one, and the traditional minimum change principle and the median theorem are applied when selecting the repair strategy. A repair method based on the minimum change principle alone cannot correct consecutive abnormal points; a quantitative analysis of this problem indicates that the repair value should lie on the boundary of the repair candidate set. Experimental results on the dataset show that the proposed method achieves a better repair effect.
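The idea of detecting and repairing points with time-varying speed ranges can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it assumes the per-step speed ranges are already available (in the paper they are predicted by the ELM), it omits the dual-window mechanism and the median-based selection, and the function name `repair_with_speed_bounds` and its parameters are hypothetical. Each point is checked against the range reachable from the previous repaired point; a violating point is moved to the nearest boundary of that range.

```python
from typing import List, Sequence, Tuple


def repair_with_speed_bounds(
    times: Sequence[float],
    values: Sequence[float],
    bounds: Sequence[Tuple[float, float]],
) -> List[float]:
    """Greedy repair under per-step speed constraints (illustrative only).

    bounds[i] = (s_min, s_max) is the assumed allowed speed between
    point i-1 and point i; bounds[0] is ignored.
    """
    if not values:
        return []
    repaired = [values[0]]  # assumption: the first point is trusted
    for i in range(1, len(values)):
        dt = times[i] - times[i - 1]
        s_min, s_max = bounds[i]
        # Candidate range reachable from the previous repaired value.
        lo = repaired[-1] + s_min * dt
        hi = repaired[-1] + s_max * dt
        # A point inside the range is kept; a violating point is moved
        # to the nearest boundary of the candidate range.
        repaired.append(min(max(values[i], lo), hi))
    return repaired


times = [0.0, 1.0, 2.0, 3.0, 4.0]
values = [10.0, 11.0, 30.0, 13.0, 14.0]   # 30.0 is an injected spike
bounds = [(-2.0, 2.0)] * len(values)       # constant here; may vary per step
print(repair_with_speed_bounds(times, values, bounds))
# [10.0, 11.0, 13.0, 13.0, 14.0]
```

Because each candidate range is computed from the previous repaired value rather than the raw one, a single spike does not cause the valid points after it to be flagged. Passing a different `(s_min, s_max)` pair per step is what makes the constraint dynamic.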
