Anomaly Detection in Sensor Data

  • Kim, Jong-Min (Statistics Discipline, Division of Science and Mathematics, University of Minnesota at Morris)
  • Baik, Jaiwook (Department of Information Statistics, Korea National Open University)
  • Received : 2017.10.12
  • Accepted : 2018.01.09
  • Published : 2018.03.25

Abstract

Purpose: The purpose of this study is to establish anomaly detection criteria for sensor data collected from a motorcycle.

Methods: Five sensor values (accelerator pedal, engine rpm, transmission rpm, gear, and speed) are recorded every 0.02 seconds from a motorcycle. Exploratory data analysis is used to look for patterns in the data. Traditional process control methods, such as the X control chart, and time series models are fitted to detect anomalous behavior in the data. Finally, an unsupervised learning algorithm, k-means clustering, is used to locate anomalous spots in the sensor data.

Results: According to the exploratory data analysis, the distribution of accelerator pedal sensor values is strongly skewed to the left. The motorcycle appears to have been driven in a city at speeds below 45 kilometers per hour. Traditional process control charts such as the X control chart fail because of severe autocorrelation in each sensor series. However, a control chart based on an ARIMA model identified three abnormal points lying beyond the 2-sigma limits. We also applied a copula-based Markov chain to perform statistical process control for correlated observations. The copula-based Markov model found anomalous behavior in places similar to those found by the ARIMA model. Under the unsupervised learning algorithm, large sensor values are subdivided into two, three, and four disjoint regions, so extreme sensor values are the ones that need to be tracked for any sign of anomalous behavior.

Conclusion: Exploratory data analysis is useful for finding patterns in the sensor data. Process control charts using ARIMA and Joe's copula-based Markov model give warnings at similar places in the data. The unsupervised learning algorithm shows that extreme sensor values are the ones that need to be tracked for any sign of anomalous behavior.
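The residual-based control chart idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it swaps in a plain AR(1) model fitted by least squares as a simplified stand-in for ARIMA, runs on simulated data rather than the motorcycle recordings, and the names `fit_ar1`, `residual_chart`, and `simulate` are hypothetical. The point is only that charting the one-step-ahead residuals, rather than the raw autocorrelated series, gives limits that a sudden jump will cross.

```python
import random
import statistics


def fit_ar1(x):
    """Least-squares fit of x[t] = c + phi * x[t-1] + e[t] (AR(1) stand-in for ARIMA)."""
    prev, curr = x[:-1], x[1:]
    mean_prev = statistics.fmean(prev)
    mean_curr = statistics.fmean(curr)
    sxx = sum((p - mean_prev) ** 2 for p in prev)
    sxy = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    phi = sxy / sxx
    c = mean_curr - phi * mean_prev
    return c, phi


def residual_chart(x, k=2.0):
    """Flag time indices whose one-step-ahead residual lies beyond k-sigma limits."""
    c, phi = fit_ar1(x)
    # Residual at time t is the observation minus its AR(1) prediction.
    resid = [x[t] - (c + phi * x[t - 1]) for t in range(1, len(x))]
    center = statistics.fmean(resid)
    sigma = statistics.stdev(resid)
    lower, upper = center - k * sigma, center + k * sigma
    flagged = [t for t, e in enumerate(resid, start=1) if e < lower or e > upper]
    return flagged, (lower, upper)


def simulate(n=2000, phi=0.95, spike_at=1200, spike=8.0, seed=0):
    """Simulated autocorrelated series with one injected jump (assumed, not real data)."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(1, n):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    x[spike_at] += spike  # additive outlier at a known position
    return x


if __name__ == "__main__":
    series = simulate()
    flagged, limits = residual_chart(series, k=2.0)
    print("2-sigma residual limits:", limits)
    print("injected jump flagged:", 1200 in flagged)
    print("number of flagged points:", len(flagged))
```

With 2-sigma limits, a few percent of ordinary points are expected to fall outside the limits by chance, so flagged points are better read as warnings to inspect than as confirmed anomalies.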
