Comparison of Chlorophyll-a Prediction and Analysis of Influential Factors in Yeongsan River Using Machine Learning and Deep Learning


  • Sun-Hee Shim (Department of Environmental Science and Engineering, Ewha Womans University)
  • Yu-Heun Kim (Department of Environmental Science and Engineering, Ewha Womans University)
  • Hye Won Lee (Department of Environmental Science and Engineering, Ewha Womans University)
  • Min Kim (Severe Storm Research Center, Ewha Womans University)
  • Jung Hyun Choi (Department of Environmental Science and Engineering, Ewha Womans University)
  • Received : 2022.09.13
  • Accepted : 2022.11.02
  • Published : 2022.11.30

Abstract

The Yeongsan River, one of the four largest rivers in South Korea, has long been difficult to manage for water quality because of algal blooms. The problem has worsened since two weirs were built on the river's mainstream. Predicting the Chlorophyll-a (Chl-a) concentration and identifying the factors that drive it are therefore needed for effective water quality management. In this study, Chl-a prediction models were developed with machine learning and deep learning methods, namely Deep Neural Network (DNN), Random Forest (RF), and eXtreme Gradient Boosting (XGBoost), and their performance was evaluated. In addition, the results of correlation analysis and feature importance were compared to identify the major factors affecting Chl-a concentration. All models showed high prediction performance, with R² values of 0.9 or higher. XGBoost in particular showed the highest prediction accuracy, 0.95, on the test data. The feature importance results indicated that ammonia (NH3-N) and phosphate (PO4-P) were major factors common to all three models and are therefore relevant to managing Chl-a concentration. These results confirm that DNN, RF, and XGBoost are powerful methods for predicting water quality parameters. Comparing feature importance with correlation analysis would also give a more accurate assessment of the major factors.
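The two quantities the abstract relies on, R² for prediction performance and the correlation coefficient for factor analysis, can be illustrated with a minimal sketch. It assumes that R² is the usual coefficient of determination and that the correlation analysis uses Pearson's r; the abstract states neither. The function names and all numbers are hypothetical toy values, not data or code from the study.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy values (assumed for illustration only).
chl_a_observed = [10.0, 20.0, 30.0, 40.0, 50.0]
chl_a_predicted = [12.0, 18.0, 33.0, 39.0, 48.0]
po4_p = [0.01, 0.03, 0.02, 0.05, 0.04]

# How well the (hypothetical) model output matches the observations.
r2 = r_squared(chl_a_observed, chl_a_predicted)

# Linear association between one candidate factor and Chl-a.
r = pearson_r(po4_p, chl_a_observed)

print(f"R2 = {r2:.3f}, Pearson r = {r:.3f}")
```

A correlation coefficient captures only the linear association between one factor and Chl-a, whereas feature importance reflects how much a fitted model depends on that factor, which is why comparing the two can be informative.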

Acknowledgement

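This paper is a basic research project carried out with support from the National Research Foundation of Korea, funded by the Korean government (Ministry of Education) in 2018 (2018R1A6A1A08025520).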
