A Study on the Prediction Model for Analysis of Water Quality in Gwangju Stream using Machine Learning Algorithm

  • 정유정 (Honam University, College of AI Liberal Arts)
  • 이정재 (Songwon University, Department of Computer Information)
  • Received : 2024.04.18
  • Accepted : 2024.06.12
  • Published : 2024.06.30

Abstract

While the importance of the aquatic environment is increasingly emphasized, water quality indicators for improving the urban rivers of Gwangju Metropolitan City are an important factor affecting the aquatic ecosystem and require accurate prediction. In this paper, the XGBoost and LightGBM machine learning algorithms were used to predict water quality at two important points of Gwangju Stream: the downstream Pyeongchon Bridge (PyeongchonBr) and the upstream Banghak Bridge (BanghakBr_Gwangjucheon1) water systems. Among the water quality inspection items, the three indicators found significant by statistical verification, total nitrogen (TN), nitrate (NO3), and ammonia (NH3), were predicted, and the performance of the predictive models was evaluated with RMSE, a regression evaluation metric. Individual models were implemented for each water system and their performance was compared after cross-validation; the XGBoost model showed excellent predictive ability.
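For reference, RMSE over n test observations, with measured values y_i and model predictions ŷ_i, is defined below, together with one common way of summarizing k-fold cross-validation: averaging the per-fold RMSE. The abstract does not state the number of folds or how fold scores were combined, so the second expression is an assumption, not the authors' stated procedure.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2},
\qquad
\overline{\mathrm{RMSE}}_{\mathrm{CV}} = \frac{1}{k}\sum_{j=1}^{k}\mathrm{RMSE}_j
```

Here RMSE_j is computed on fold j using a model trained on the remaining k − 1 folds. RMSE is expressed in the same units as the predicted indicator (if TN is recorded in mg/L, its RMSE is also in mg/L), and lower values indicate better predictions.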

수질 환경의 중요성이 강조되고 있는 가운데 광주광역시 도시 하천의 수질개선을 위한 수질 지표는 수생 생태계에 영향을 미치는 중요한 요소로 정확한 예측이 필요하다. 본 연구에서는 XGBoost와 LightGBM 머신러닝 알고리즘을 활용하여 광주천의 중요한 지점인 하류 평촌교(PyeongchonBr)와 상류 방학교(BangHakBr_Gwangjucheon1) 수계의 수질 검사 항목 중 통계적 검증 결과 유의미한 항목인 질소(TN), 질산염(NO3), 암모니아 양(NH3) 세 가지 수질 지표를 예측하는 연구를 수행하였고, 회귀 모델 평가 지표인 RMSE를 이용하여 예측 모델의 성능을 평가하였다. 수계별 개별적인 모델을 구현하여 교차 검증 후 성능을 비교한 결과, XGBoost 모델이 뛰어난 예측 능력을 보였다

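The per-station evaluation described in the abstract (a separate model for each water system and each indicator, scored by cross-validated RMSE) can be sketched in standard-library Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper names (`rmse`, `k_fold_indices`, `cross_val_rmse`, `evaluate_per_station`), the shuffled k-fold split, the data layout, and the toy values are all hypothetical, and `MeanBaseline` is a placeholder that predicts the training mean so the example runs without XGBoost or LightGBM installed.

```python
import math
import random
from statistics import mean


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y - y_hat) ** 2))."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


class MeanBaseline:
    """Hypothetical placeholder for an XGBoost/LightGBM regressor.

    It ignores the features and predicts the mean of the training targets.
    """

    def fit(self, X, y):
        self.mean_ = mean(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def cross_val_rmse(X, y, make_model, k=5, seed=0):
    """Train a fresh model on k-1 folds, score RMSE on the held-out fold, average."""
    scores = []
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = make_model().fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = model.predict([X[i] for i in test_idx])
        scores.append(rmse([y[i] for i in test_idx], preds))
    return mean(scores)


def evaluate_per_station(datasets, make_model, k=5):
    """Cross-validate a separate model for every (station, indicator) pair.

    Hypothetical layout: datasets = {station: (X, {indicator: y})}.
    """
    return {
        (station, indicator): cross_val_rmse(X, y, make_model, k)
        for station, (X, targets) in datasets.items()
        for indicator, y in targets.items()
    }


if __name__ == "__main__":
    # Synthetic toy values for illustration only; NOT measurements from the study.
    X = [[float(i)] for i in range(12)]
    toy = {
        "PyeongchonBr": (X, {"TN": [float(i % 3) for i in range(12)]}),
        "BanghakBr_Gwangjucheon1": (X, {"TN": [1.5] * 12, "NH3": [0.2] * 12}),
    }
    for key, score in evaluate_per_station(toy, MeanBaseline, k=4).items():
        print(key, round(score, 3))
```

Because `cross_val_rmse` only calls `fit(X, y)` and `predict(X)`, a scikit-learn-style regressor such as `xgboost.XGBRegressor` or `lightgbm.LGBMRegressor` could in principle be passed as `make_model`; that substitution and any hyperparameter choices are not exercised here. Building one model per (station, indicator) pair mirrors the abstract's individual models for each water system.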

Acknowledgement

This paper was supported by the 2024 academic research grant program of Songwon University (Project No. A-2014-49).