A Study on Atmospheric Data Anomaly Detection Algorithm based on Unsupervised Learning Using Adversarial Generative Neural Network

  • Yang, Ho-Jun (Department of Electrical and Computer Engineering, Inha University) ;
  • Lee, Seon-Woo (Department of Electrical and Computer Engineering, Inha University) ;
  • Lee, Mun-Hyung (Department of Electrical and Computer Engineering, Inha University) ;
  • Kim, Jong-Gu (Department of Electrical and Computer Engineering, Inha University) ;
  • Choi, Jung-Mu (Department of Computer Engineering, Inha University) ;
  • Shin, Yu-Mi (Department of Computer Engineering, Inha University) ;
  • Lee, Seok-Chae (Department of Public Administration, Inha University) ;
  • Kwon, Jang-Woo (Department of Computer Engineering, Inha University) ;
  • Park, Ji-Hoon (Air Quality Research Division, National Institute of Environmental Research) ;
  • Jung, Dong-Hee (Air Quality Research Division, National Institute of Environmental Research) ;
  • Shin, Hye-Jung (Air Quality Research Division, National Institute of Environmental Research)
  • Received : 2022.02.20
  • Reviewed : 2022.04.20
  • Published : 2022.04.28

Abstract

This paper proposes a deep neural network model that automates anomaly detection in data from the national air pollution monitoring network, a task previously carried out by experts. Training data were built by analyzing missing values and outliers in meteorological data provided by the National Institute of Environmental Research. Starting from BeatGAN, an unsupervised learning model, we changed the kernel structure and added convolutional and transposed convolutional filter layers to improve anomaly detection performance. We also implemented and applied a retraining algorithm that uses the generative capability of the proposed model to produce new data and feeds them back into training. With this algorithm, the proposed model outperformed the original BeatGAN model and two other unsupervised methods, Isolation Forest (Iforest) and One-Class SVM. The study thus offers a way to raise the anomaly detection performance of the proposed model, while avoiding overfitting and at no additional cost, when training data are scarce in real industrial settings because of sensor faults, inspections, and similar factors.
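The two ideas in the abstract, scoring anomalies by how poorly a generator reconstructs a window and retraining on data the model itself generates, can be illustrated with a minimal sketch. It is not the authors' implementation: a PCA-style linear autoencoder stands in for the BeatGAN-based generator, the data are synthetic sinusoidal windows, and the names `fit_linear_autoencoder`, `retrain_with_generated`, the window length of 24, and the 99% threshold are all assumptions made for illustration.

```python
import numpy as np

def fit_linear_autoencoder(windows, n_components):
    """Fit a PCA-style encoder/decoder (a stand-in for the GAN generator)."""
    mean = windows.mean(axis=0)
    # Rows of vt are the principal directions of the centered windows.
    _, _, vt = np.linalg.svd(windows - mean, full_matrices=False)
    return mean, vt[:n_components]

def reconstruct(windows, model):
    """Encode each window to a low-dimensional code, then decode it."""
    mean, components = model
    codes = (windows - mean) @ components.T
    return codes @ components + mean

def anomaly_scores(windows, model):
    """Squared reconstruction error per window; larger means more anomalous."""
    return np.sum((windows - reconstruct(windows, model)) ** 2, axis=1)

def retrain_with_generated(windows, n_components, rounds=2):
    """Hypothetical retraining loop: refit on real plus model-generated windows."""
    model = fit_linear_autoencoder(windows, n_components)
    for _ in range(rounds):
        generated = reconstruct(windows, model)
        model = fit_linear_autoencoder(np.vstack([windows, generated]), n_components)
    return model

# Synthetic "normal" windows: 24-point sinusoids with varying amplitude plus noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 24)
amplitudes = rng.normal(1.0, 0.2, size=(200, 2))
normal = np.array([a * np.sin(t) + b * np.cos(t) for a, b in amplitudes])
normal = normal + rng.normal(0.0, 0.05, size=normal.shape)

model = retrain_with_generated(normal, n_components=2)
# Assumed rule: flag windows whose score exceeds the 99th percentile on normal data.
threshold = np.quantile(anomaly_scores(normal, model), 0.99)

# A window with an injected spike, as a faulty sensor reading might produce.
spike = normal[0].copy()
spike[10] += 3.0
test = np.vstack([normal[1], spike])
flags = anomaly_scores(test, model) > threshold
print(flags)
```

In this sketch only normal windows are used for fitting, which mirrors the unsupervised setting described above; the threshold choice and the number of retraining rounds would need to be tuned on real measurement data.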

Funding Information

This work was supported by a grant from the National Institute of Environmental Research (NIER), funded by the Ministry of Environment (MOE) of the Republic of Korea. This research was also supported by the BK21 Four Program funded by the Ministry of Education (MOE, Korea) and the National Research Foundation of Korea (NRF).
