Short-term Predictive Models for Influenza-like Illness in Korea: Using Weekly ILI Surveillance Data and Web Search Queries

  • Jung, Jae Un (Department of Management Information Systems, Dong-A University)
  • Received : 2018.08.01
  • Accepted : 2018.09.20
  • Published : 2018.09.28

Abstract

Since Google launched a prediction service for influenza-like illness (ILI), studies on ILI prediction based on web search data have proliferated worldwide. Against this background, this study builds short-term predictive models for ILI in Korea from ILI surveillance data and web search data, and measures their performance. To make the models specific to Korea, ILI surveillance data from the Korea CDC and Korean-language search data from Google and Naver were used with the ARIMA model. Model 1 used only ILI data. Models 2 and 3 added Google and Naver search data, respectively, to the data of Model 1. Model 4 added a query common to Models 2 and 3 to the data of Model 1. In the training period, all predictive models achieved a goodness of fit ($R^2$) above 95%. In predictive periods 1 and 2, Model 1 yielded the best predictions (99.98% and 96.94%, respectively). Models 3(a), 4(b), and 4(c) maintained predictive accuracy above 90% in all predictive periods, but did not outperform Model 1. The models that yielded accurate and stable predictions can be applied to early warning systems for influenza epidemics in Korea, given supplementary studies on improving their performance.
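The abstract reports goodness of fit and predictive accuracy as $R^2$ percentages. As a minimal sketch, the snippet below computes $R^2$ in its coefficient-of-determination form, $1 - SS_{res}/SS_{tot}$. This form is an assumption, since the abstract does not state which definition was used. The function name `r_squared` and the weekly values are hypothetical illustrations, not data or code from the study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumed definition; the abstract does not specify which R^2 variant it reports.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equally long")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: squared error of the model's predictions.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: squared deviation of observations from their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical weekly ILI rates and one-step-ahead predictions (not study data).
observed = [5.0, 7.5, 12.0, 20.0, 31.0, 24.0]
predicted = [5.5, 7.0, 13.0, 19.0, 30.0, 25.0]

score = r_squared(observed, predicted)
print(f"R^2 = {score:.2%}")
```

Under this definition a perfect prediction scores 1, and a model that always predicts the observed mean scores 0, so a percentage near 100% means the predictions track the weekly series closely.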
