Research on Outlier and Missing Value Correction Methods to Improve Smart Farm Data Quality

  • Sung-Jae Lee (Dept. of Smart Agriculture, Sunchon National University) ;
  • Hyun Sim (Dept. of Smart Agriculture, Sunchon National University)
  • Received : 2024.08.27
  • Accepted : 2024.10.12
  • Published : 2024.10.31

Abstract

This study addresses outliers and missing values in AI-based smart farming in order to improve data quality and the accuracy of agricultural prediction. Using real data provided by the Rural Development Administration (RDA) and the Korea Agency of Education, Promotion, and Information Service in Food, Agriculture, Forestry, and Fisheries (EPIS), we applied outlier detection and missing value imputation techniques to collect and manage high-quality data. Successful smart farm operation requires an IoT-based AI model for automatic growth measurement, which in turn depends on stable data preprocessing that achieves a high data quality index. We applied several methods for correcting outliers and imputing missing values in growth data, and validated the proposed preprocessing strategies using machine learning performance evaluation indices. Applying the corrections significantly improved model performance, and high predictive accuracy was observed on evaluation metrics such as the ROC curve and AUC.
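
The abstract does not say which correction methods were used, so the following is only a minimal sketch of the general workflow it describes, under stated assumptions: outliers are flagged with Tukey's IQR fences, flagged and missing readings are filled by linear interpolation, and a model is scored with a rank-based AUC. The function names, the threshold k = 1.5, and the sample readings are all hypothetical and are not taken from the paper.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return (low, high) Tukey fences computed from the non-missing values."""
    observed = [v for v in values if v is not None]
    q1, _, q3 = quantiles(observed, n=4, method="inclusive")
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread


def mask_outliers(values, k=1.5):
    """Replace readings outside the fences with None so they are imputed later."""
    low, high = iqr_bounds(values, k)
    return [v if v is not None and low <= v <= high else None for v in values]


def interpolate_missing(values):
    """Fill None gaps linearly; gaps at either edge copy the nearest reading."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        return list(values)
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((j for j in known if j < i), default=None)
        right = min((j for j in known if j > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical hourly greenhouse temperatures (deg C): one gap, one sensor spike.
readings = [21.0, 21.5, None, 22.5, 95.0, 23.0, 23.5]
cleaned = interpolate_missing(mask_outliers(readings))

# Hypothetical classifier output, used only to exercise the AUC helper.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Masking outliers before imputing keeps the spike from distorting the interpolated values. A comparison like the one the abstract reports would train the same model on raw and on cleaned data and compare the resulting AUC.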

Acknowledgement

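This work was supported by the Innovative Human Resource Development for Local Intellectualization program through the Institute of Information & Communications Technology Planning & Evaluation (IITP), funded by the Korean government (Ministry of Science and ICT) (IITP-2024-RS-2020-II201489).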
