Exploring COVID-19 in mainland China during the lockdown of Wuhan via functional data analysis

  • Li, Xing (Department of Statistics and Finance, University of Science and Technology of China) ;
  • Zhang, Panpan (Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania) ;
  • Feng, Qunqiang (Department of Statistics and Finance, University of Science and Technology of China)
  • Received: 2021.07.16
  • Accepted: 2021.12.14
  • Published: 2022.01.31

Abstract

In this paper, we analyze the time series of confirmed case and death counts of COVID-19, which broke out in China in December 2019. The study period covers the lockdown of Wuhan. We apply functional data analysis (FDA) methods to the collected time series, and the analysis has three parts. First, we conduct functional principal component analysis (FPCA) to investigate the modes of variation. Second, we carry out functional canonical correlation analysis (FCCA) to explore the relationship between confirmed cases and deaths. Finally, we apply a clustering method based on the Expectation-Maximization (EM) algorithm to the counts of confirmed cases, choosing the number of clusters by cross-validation. We also compare the clustering results with publicly available migration data.
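The functional principal component analysis (FPCA) and EM-based clustering steps can be sketched in NumPy as follows. This is a minimal illustration under simplifying assumptions, not the paper's implementation. The curves are assumed to share a common, equally spaced daily grid, so FPCA reduces to an eigendecomposition of the sample covariance matrix with quadrature weights ignored. The mixture is assumed to be a diagonal-covariance Gaussian mixture fitted to the leading FPC scores, and the number of clusters is assumed to be chosen by maximizing the total held-out log-likelihood over K folds. The function names (`fpca`, `gmm_em`, `cv_loglik`), the rank-based initialization, and the synthetic two-group data are all hypothetical.

```python
import numpy as np


def fpca(curves, n_components):
    """Discretized FPCA for curves on a common, equally spaced grid.

    curves: array of shape (n_curves, n_timepoints). Quadrature weights are
    ignored, so eigenvalues and scores match the integral-based definitions
    only up to a constant scale factor.
    """
    mean = curves.mean(axis=0)
    centered = curves - mean
    cov = centered.T @ centered / (curves.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)            # eigh returns ascending order
    order = np.argsort(eigvals)[::-1][:n_components]  # keep the largest ones
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = centered @ eigvecs                       # FPC scores, one row per curve
    return mean, eigvals, eigvecs, scores


def log_joint(X, weights, means, variances):
    """log(pi_k) + log N(x_i | mu_k, diag(var_k)) for every point i and component k."""
    diff = X[:, None, :] - means[None, :, :]
    log_dens = -0.5 * np.sum(np.log(2 * np.pi * variances)[None, :, :]
                             + diff ** 2 / variances[None, :, :], axis=2)
    return np.log(weights)[None, :] + log_dens


def row_logsumexp(a):
    """Numerically stable log(sum(exp(a))) along each row."""
    m = a.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True))).ravel()


def gmm_em(X, k, n_iter=200, var_floor=1e-6):
    """EM for a diagonal-covariance Gaussian mixture with k components."""
    n, _ = X.shape
    # Hypothetical deterministic start: means at evenly spaced ranks of the first score.
    idx = np.argsort(X[:, 0])[np.linspace(0, n - 1, k).astype(int)]
    means = X[idx].copy()
    variances = np.tile(X.var(axis=0) + var_floor, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior membership probabilities (responsibilities).
        lj = log_joint(X, weights, means, variances)
        resp = np.exp(lj - row_logsumexp(lj)[:, None])
        # M-step: update mixing weights, means, and floored variances.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / n
        means = resp.T @ X / nk[:, None]
        variances = np.maximum(resp.T @ X ** 2 / nk[:, None] - means ** 2, var_floor)
    return weights, means, variances


def cv_loglik(X, k, n_folds=5, seed=0):
    """Total held-out log-likelihood of a k-component mixture over n_folds folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    total = 0.0
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(X)), test_idx)
        params = gmm_em(X[train_idx], k)
        total += row_logsumexp(log_joint(X[test_idx], *params)).sum()
    return total


# Hypothetical synthetic data: two groups of 20 noisy curves peaking early or late.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 50)
peaks = np.repeat([0.3, 0.7], 20)
amps = 10.0 * (1.0 + 0.1 * rng.standard_normal(40))
curves = amps[:, None] * np.exp(-((t[None, :] - peaks[:, None]) / 0.1) ** 2)
curves += 0.5 * rng.standard_normal(curves.shape)

mean, eigvals, eigvecs, scores = fpca(curves, n_components=2)
cv_scores = {k: cv_loglik(scores, k) for k in (1, 2, 3)}
best_k = max(cv_scores, key=cv_scores.get)
weights, means, variances = gmm_em(scores, 2)
labels = np.argmax(log_joint(scores, weights, means, variances), axis=1)
print(best_k, labels)
```

For simplicity, FPCA is fitted once on all curves and cross-validation is applied only to the mixture model; a stricter scheme would refit FPCA inside each fold. Working with a few FPC scores instead of the raw daily counts keeps the mixture low-dimensional, which is what makes the EM fit stable with only a few dozen curves.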

Acknowledgement

We thank two anonymous reviewers for their thoughtful comments and suggestions that have improved the overall quality of the manuscript.
