WQI Class Prediction of Sihwa Lake Using Machine Learning-Based Models

  • KIM, SOO BIN (Marine Environmental Research Center, Korea Institute of Ocean Science & Technology (KIOST))
  • LEE, JAE SEONG (Marine Environmental Research Center, Korea Institute of Ocean Science & Technology (KIOST))
  • KIM, KYUNG TAE (Marine Environmental Research Center, Korea Institute of Ocean Science & Technology (KIOST))
  • Received: 2022.05.03
  • Reviewed: 2022.05.16
  • Published: 2022.05.31

Abstract

The water quality index (WQI) is widely used to evaluate marine water quality quantitatively. In Korea, the WQI is divided into five classes under the marine environmental standards notified by the Ministry of Oceans and Fisheries. However, calculating the WQI for large water quality survey datasets is complex and time-consuming. This study therefore proposes machine learning (ML) based models that predict the WQI class from existing water quality survey data. Sihwa Lake, one of the specially managed coastal zones, was selected as the modeling site. Adaptive boosting (AdaBoost) and tree-based pipeline optimization (TPOT) algorithms were used to train the models, and model performance was evaluated with classification metrics (accuracy, precision, F1, and log loss). Before training, feature importance and sensitivity analyses were conducted to find the best input combination for each algorithm. The results showed that bottom dissolved oxygen (DO_Bot) was the most important variable affecting model performance, whereas surface dissolved inorganic nitrogen (DIN_Sur) and surface dissolved inorganic phosphorus (DIP_Sur) had comparatively weak effects on WQI class prediction. A comparison of the spatio-temporal and class-wise sensitivities of the best models further showed that prediction performance differed among survey stations, survey periods, and WQI classes. In conclusion, TPOT outperformed AdaBoost for every input combination, and a best model trained on sufficient data is expected to classify the WQI class of new water quality survey data accurately.
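The four evaluation metrics named in the abstract can be illustrated with a minimal, standard-library-only sketch. This is not the authors' code: the integer encoding of the five WQI classes, the macro averaging of precision and F1, and the sample labels and probabilities below are all assumptions made up for illustration.

```python
import math

# WQI classes I-V encoded as integers 1-5 (an encoding assumed for this sketch).
CLASSES = [1, 2, 3, 4, 5]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_precision_f1(y_true, y_pred, classes=CLASSES):
    """Per-class precision and F1, averaged with equal weight per class.

    Macro averaging is an assumption; the abstract does not state which
    averaging scheme was used.
    """
    pairs = list(zip(y_true, y_pred))
    precisions, f1_scores = [], []
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        f1_scores.append(f1)
    return sum(precisions) / len(classes), sum(f1_scores) / len(classes)


def log_loss(y_true, y_proba, classes=CLASSES, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for t, row in zip(y_true, y_proba):
        # Clip to avoid log(0) when a model assigns zero probability.
        p = min(max(row[classes.index(t)], eps), 1 - eps)
        total -= math.log(p)
    return total / len(y_true)


# Made-up true classes and predicted class probabilities for six samples.
y_true = [1, 2, 3, 3, 4, 5]
y_proba = [
    [0.8, 0.1, 0.1, 0.0, 0.0],
    [0.1, 0.7, 0.2, 0.0, 0.0],
    [0.0, 0.2, 0.6, 0.2, 0.0],
    [0.0, 0.0, 0.4, 0.5, 0.1],
    [0.0, 0.0, 0.1, 0.8, 0.1],
    [0.0, 0.0, 0.0, 0.3, 0.7],
]
# The predicted class is the one with the highest probability in each row.
y_pred = [CLASSES[max(range(len(row)), key=row.__getitem__)] for row in y_proba]

precision, f1 = macro_precision_f1(y_true, y_pred)
print("accuracy :", accuracy(y_true, y_pred))
print("precision:", precision)
print("F1       :", f1)
print("log loss :", log_loss(y_true, y_proba))
```

Accuracy, precision, and F1 score only the predicted label, while log loss also penalizes a model that is confident in a wrong class, which is presumably why the study reports both kinds of metric.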

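Project Information

This study was supported by the Korea Institute of Ocean Science & Technology (KIOST) project "2022 Study on Marine Environment Improvement of Sihwa Lake (PG52961)". We thank the two reviewers whose careful review helped improve this paper.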
