Predicting the Number of Confirmed COVID-19 Cases Using Deep Learning Models with Search Term Frequency Data


  • 정성욱 (Institute of Communication Research, Seoul National University; Sungkyunkwan University)
  • Received : 2023.05.09
  • Accepted : 2023.08.04
  • Published : 2023.09.30

Abstract

The COVID-19 outbreak has significantly changed how people live. Because COVID-19 spreads through the air as well as through droplets and aerosols, people were advised to avoid face-to-face contact and crowded indoor places as much as possible. A person who has been in contact with a COVID-19 patient, or who visited a place where a case occurred, and who worries about having been infected can reasonably be expected to search Google for COVID-19 symptoms. In this study, we revisited Google Trends, which once played a major role in influenza surveillance and management, and combined its data with the number of confirmed COVID-19 cases in an exploratory data analysis using deep learning models (DNN and LSTM) to see whether future confirmed case counts could be predicted. The search term frequency data used in this study are publicly available and do not invade privacy. With the deep neural network (DNN) model, forecasts for Seoul (9.6 million), the most populous city in South Korea, and Busan (3.4 million), the second most populous, had lower error rates when search term frequency data were included. These results suggest that search term frequency data can play an important role in cities with populations above a certain size. We hope that such predictions can serve as evidence for policy decisions, such as relaxing or strengthening preventive measures.
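
The comparison described in the abstract (forecasting with and without search term frequency data, then comparing error rates) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function names make_windows and mape, the three-day window, the use of mean absolute percentage error as the error rate, and the toy numbers. None of it is taken from the study, and the DNN or LSTM that would be trained on these inputs is omitted.

```python
from typing import List, Sequence, Tuple


def make_windows(
    cases: Sequence[float],
    search: Sequence[float],
    window: int,
    use_search: bool,
) -> Tuple[List[List[float]], List[float]]:
    """Turn two aligned daily series into (features, target) pairs.

    Each sample holds the previous `window` days of case counts and,
    optionally, the same days of search term frequency. The target is
    the case count on the following day. (Hypothetical helper.)
    """
    if len(cases) != len(search):
        raise ValueError("series must be aligned and equally long")
    features: List[List[float]] = []
    targets: List[float] = []
    for t in range(window, len(cases)):
        row = list(cases[t - window:t])
        if use_search:
            row += list(search[t - window:t])
        features.append(row)
        targets.append(cases[t])
    return features, targets


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error in percent, skipping zero actuals."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("no non-zero actual values to score")
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)


# Made-up toy numbers, for illustration only (not data from the study).
cases = [10, 12, 15, 20, 26, 30]
search = [40, 45, 55, 70, 80, 85]

x_plain, y = make_windows(cases, search, window=3, use_search=False)
x_with, _ = make_windows(cases, search, window=3, use_search=True)
print(len(x_plain[0]), len(x_with[0]))  # 3 features vs. 6 features
print(round(mape([100, 200], [90, 220]), 1))  # 10.0
```

In such a setup, the same model would be fitted once on x_plain and once on x_with, and the two error rates on held-out days would be compared city by city.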


Acknowledgement

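This research was supported in 2018 by the Ministry of Education of the Republic of Korea, the National Research Foundation of Korea, and the Institute of Communication Research at Seoul National University (NRF-2018S1A5B8070398).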
