Comparative characteristic of ensemble machine learning and deep learning models for turbidity prediction in a river

  • Park, Jungsu (Department of Civil and Environmental Engineering, Hanbat National University)
  • Received : 2020.12.07
  • Accepted : 2021.01.06
  • Published : 2021.02.15

Abstract

Increased turbidity in rivers during flood events affects many aspects of water environmental management, including drinking water supply systems, so the prediction of turbid water is essential. Advanced machine learning algorithms are increasingly used in water environmental management. Ensemble machine learning algorithms such as random forest (RF) and gradient boosting decision tree (GBDT) are among the most popular, along with deep learning algorithms such as recurrent neural networks. In this study, GBDT, an ensemble machine learning algorithm, and the gated recurrent unit (GRU), a recurrent neural network algorithm, are used to develop models that predict turbidity in a river. The observation frequencies of the input data used for the models were 2, 4, 8, 24, 48, 120, and 168 h. The ratio of the root-mean-square error to the standard deviation of the observations (RSR) ranges from 0.182 to 0.766 for GRU and from 0.400 to 0.683 for GBDT. Both models show similar prediction accuracy, with an RSR of 0.682 for GRU and 0.683 for GBDT. GRU shows better prediction accuracy when the observation frequency is relatively short (i.e., 2, 4, and 8 h), whereas GBDT shows better prediction accuracy when the observation frequency is relatively long (i.e., 48, 120, and 168 h). The results suggest that the characteristics of the input data should be considered when developing a model to predict turbidity.
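
The RSR reported above is the RMSE normalized by the standard deviation of the observations, so a value of 0 indicates a perfect fit and a value of 1 corresponds to a model that always predicts the observed mean. The following is a minimal sketch of that calculation in Python with NumPy; the function name `rsr` and the sample turbidity values are illustrative assumptions and are not taken from the study.

```python
import numpy as np


def rsr(observed, predicted):
    """Return RMSE divided by the standard deviation of the observations."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    if obs.shape != pred.shape:
        raise ValueError("observed and predicted must have the same shape")

    # Root-mean-square error between observations and predictions.
    rmse = np.sqrt(np.mean((obs - pred) ** 2))

    # Population standard deviation (ddof=0), so the sample size cancels
    # in the ratio and RSR reduces to a ratio of root sums of squares.
    std_obs = np.std(obs)
    if std_obs == 0:
        raise ValueError("RSR is undefined when the observations are constant")

    return rmse / std_obs


# Hypothetical turbidity values, for illustration only.
observed = [2.0, 4.0, 6.0, 8.0]

# A model that reproduces every observation exactly.
perfect = rsr(observed, observed)

# A model that always predicts the observed mean (5.0).
mean_only = rsr(observed, [5.0, 5.0, 5.0, 5.0])
```

Because the metric is scaled by the variability of the observations, RSR values can be compared across input data sets with different turbidity ranges, which is what allows the comparison across observation frequencies described above.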

Acknowledgement

This work was supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (Ministry of Science and ICT) (NRF-2020R1G1A1008377).
