Time series analysis for Korean COVID-19 confirmed cases: HAR-TP-T model approach

  • Yu, SeongMin (Department of Applied Statistics, Gachon University) ;
  • Hwang, Eunju (Department of Applied Statistics, Gachon University)
  • Received : 2020.11.17
  • Accepted : 2021.01.05
  • Published : 2021.04.30

Abstract

This paper studies estimation and forecasting for the time series of Korean COVID-19 confirmed cases, based on a heterogeneous autoregressive (HAR) model with two-piece t (TP-T) distributed errors. We consider HAR-TP-T time series models and propose a step-by-step method to estimate the HAR coefficients as well as the TP-T distribution parameters. In the proposed method, ordinary least squares (OLS) is used to estimate the HAR coefficients, while maximum likelihood estimation (MLE) is used to estimate the TP-T error parameters. A simulation study shows that the step-by-step method performs well. In the empirical analysis of the Korean COVID-19 confirmed cases, estimates for HAR-TP-T models of order p = 2, 3, 4 are computed for several selected sets of lags, including the optimal lags chosen by minimizing the mean squared error of the model. The estimates from the proposed method are compared with those from MLE alone under several criteria. The proposed step-by-step method outperforms MLE alone in two respects: the mean squared error of the HAR model and the mean squared difference between the TP-T residuals and their estimated densities. Moreover, forecasting of the Korean COVID-19 confirmed cases is discussed using the optimally selected HAR-TP-T model. The mean absolute percentage error (MAPE) of the one-step-ahead out-of-sample forecasts is 0.0953% for the proposed model. We conclude that the proposed HAR-TP-T time series model with optimally selected lags, together with its step-by-step estimation, provides accurate forecasts of the Korean COVID-19 confirmed cases.
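
The two-stage idea in the abstract (OLS for the HAR coefficients, then MLE for the TP-T error parameters) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each HAR regressor is assumed to be the average of the previous h observations, the two-piece t density uses one common parameterization (location mu, separate scales s1 and s2 on either side of mu, shared degrees of freedom nu), and all function names are hypothetical.

```python
import numpy as np
from scipy import optimize, stats


def har_design(y, lags):
    """Return (X, target): an intercept plus, for each lag h, the mean of
    the h observations preceding time t (assumed HAR regressor form)."""
    y = np.asarray(y, dtype=float)
    start = max(lags)
    rows = range(start, len(y))
    cols = [np.ones(len(y) - start)]
    cols += [np.array([y[t - h:t].mean() for t in rows]) for h in lags]
    return np.column_stack(cols), y[start:]


def tpt_logpdf(x, mu, s1, s2, nu):
    """Log-density of a two-piece t: scale s1 left of mu, s2 right of mu,
    common degrees of freedom nu (an assumed parameterization)."""
    scale = np.where(x < mu, s1, s2)
    return np.log(2.0 / (s1 + s2)) + stats.t.logpdf((x - mu) / scale, nu)


def fit_step_by_step(y, lags):
    """Two-stage fit: OLS for the HAR part, then MLE on the residuals."""
    # Step 1: ordinary least squares for the HAR coefficients.
    X, target = har_design(y, lags)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta

    # Step 2: maximum likelihood for (mu, s1, s2, nu) on the OLS residuals.
    def neg_loglik(theta):
        return -np.sum(tpt_logpdf(resid, *theta))

    s0 = resid.std() + 1e-8
    bounds = [(None, None), (1e-6, None), (1e-6, None), (1.0, 200.0)]
    res = optimize.minimize(neg_loglik, x0=[0.0, s0, s0, 5.0],
                            method="L-BFGS-B", bounds=bounds)
    return beta, res.x, resid


def forecast_one_step(y, lags, beta):
    """Point forecast of the next value from the fitted HAR coefficients."""
    y = np.asarray(y, dtype=float)
    x_next = np.concatenate(([1.0], [y[-h:].mean() for h in lags]))
    return float(x_next @ beta)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100.0 * np.mean(np.abs((actual - predicted) / actual))


if __name__ == "__main__":
    # Synthetic series for illustration only; not the COVID-19 data.
    rng = np.random.default_rng(1)
    y = np.full(400, 5.0)
    for t in range(7, len(y)):
        y[t] = 1.0 + 0.5 * y[t - 1] + 0.3 * y[t - 7:t].mean() + rng.standard_t(5)
    beta, theta, _ = fit_step_by_step(y, lags=[1, 7])
    print("HAR coefficients:", beta)
    print("TP-T parameters (mu, s1, s2, nu):", theta)
    print("next-step forecast:", forecast_one_step(y, [1, 7], beta))
```

Fitting the regression first keeps the coefficient estimates in closed form and leaves only a four-parameter likelihood to optimize numerically. Choosing lags by minimizing the model's mean squared error, as the abstract describes, would amount to repeating step 1 over candidate lag sets and keeping the set with the smallest mean of `resid ** 2`.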

Acknowledgement

This research was supported by a Gachon University research grant (GCU-202003640001).
