Classification of Time-Series Data Based on Several Lag Windows

  • Received : 2010.02
  • Accepted : 2010.04
  • Published : 2010.05.31

Abstract

In time-series analysis, it is often more convenient to work in the frequency domain than in the time domain. The spectral density, which describes the autocorrelation structure of a time-series process, is the core of frequency-domain analysis. It can be estimated by computing a periodogram or by averaging the periodogram over nearby frequencies with equal or unequal weights. Such estimates offer an attractive tool for measuring the similarity between time-series processes. We employ the metrics based on a smoothed periodogram proposed by Park and Kim (2008) to classify different classes of time-series processes. Instead of the modified Daniell window used in Park and Kim (2008), we consider several lag windows with unequal weights. We evaluate their performance under various simulation scenarios. The simulation results show that the metrics used in this study split the time series into the preassigned clusters better than the raw-periodogram-based metrics proposed by Caiado et al. (2006). We also apply our metrics to an economic time-series dataset.
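As a rough illustration of the idea summarized above, the following Python sketch computes a lag-window spectral estimate and a distance between two series of equal length. It is not the metric of Park and Kim (2008), whose exact form is not given here. The Parzen and Bartlett windows, the truncation point `max_lag`, the variance normalization, the Euclidean distance, and all function names are assumptions made for this example.

```python
import numpy as np


def sample_autocov(x, max_lag):
    """Biased sample autocovariances for lags 0..max_lag (divisor n)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xc = x - x.mean()
    return np.array([np.dot(xc[: n - k], xc[k:]) / n for k in range(max_lag + 1)])


def parzen_window(u):
    """Parzen lag window evaluated at u = k / max_lag (assumed choice)."""
    u = np.abs(u)
    inner = 1.0 - 6.0 * u**2 + 6.0 * u**3
    outer = 2.0 * (1.0 - u) ** 3
    return np.where(u <= 0.5, inner, np.where(u <= 1.0, outer, 0.0))


def bartlett_window(u):
    """Bartlett (triangular) lag window evaluated at u = k / max_lag."""
    return np.clip(1.0 - np.abs(u), 0.0, None)


def lag_window_spectrum(x, max_lag, window=parzen_window):
    """Lag-window spectral estimate at the Fourier frequencies in (0, pi]."""
    n = len(x)
    gamma = sample_autocov(x, max_lag)
    k = np.arange(max_lag + 1)
    w = window(k / max_lag)
    freqs = 2.0 * np.pi * np.arange(1, n // 2 + 1) / n
    # f(omega) = (1 / 2 pi) * [w_0 g_0 + 2 * sum_{k>=1} w_k g_k cos(k omega)]
    cos_terms = np.cos(np.outer(freqs, k[1:]))
    f = (w[0] * gamma[0] + 2.0 * cos_terms @ (w[1:] * gamma[1:])) / (2.0 * np.pi)
    return freqs, f


def spectral_distance(x, y, max_lag, window=parzen_window):
    """Euclidean distance between variance-normalized spectral estimates.

    Assumed metric for illustration; x and y must have the same length.
    """
    _, fx = lag_window_spectrum(x, max_lag, window)
    _, fy = lag_window_spectrum(y, max_lag, window)
    fx = fx / np.var(x)
    fy = fy / np.var(y)
    return float(np.sqrt(np.sum((fx - fy) ** 2)))


def simulate_ar1(phi, n, seed):
    """Simulate an AR(1) series x_t = phi * x_{t-1} + e_t with N(0, 1) noise."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal(n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]
    return x


if __name__ == "__main__":
    a = simulate_ar1(0.8, 200, seed=1)
    b = simulate_ar1(0.8, 200, seed=2)
    c = simulate_ar1(-0.8, 200, seed=3)
    print("same class:     ", spectral_distance(a, b, max_lag=20))
    print("different class:", spectral_distance(a, c, max_lag=20))
```

Pairwise distances computed this way could then be collected into a distance matrix and passed to a hierarchical clustering routine. Swapping `parzen_window` for `bartlett_window` shows how the choice of lag window changes the estimate.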

References

  1. Baker, F. B. and Hubert, L. J. (1975). Measuring the power of hierarchical cluster analysis, Journal of the American Statistical Association, 70, 31-38. https://doi.org/10.2307/2285371
  2. Bohte, Z., Cepar, D. and Kosmelj, K. (1980). Clustering of time series, In Proceedings of COMPSTAT, 587-593.
  3. Brillinger, D. (1981). Time Series: Data Analysis and Theory, Holden-Day, San Francisco.
  4. Brockwell, P. J. and Davis, R. A. (1991). Time Series: Theory and Methods, Springer-Verlag, New York.
  5. Caiado, J., Crato, N. and Pena, D. (2006). A periodogram-based metric for time series classification, Computational Statistics and Data Analysis, 50, 2668-2684. https://doi.org/10.1016/j.csda.2005.04.012
  6. Chatfield, C. (1975). The Analysis of Time Series: Theory and Practice, Chapman & Hall, London.
  7. Chen, G., Abraham, B. and Peiris, S. (1994). Lag window estimation of the degree of differencing in fractionally integrated time series models, Journal of Time Series Analysis, 15, 473-487. https://doi.org/10.1111/j.1467-9892.1994.tb00205.x
  8. Corduas, M. and Piccolo, D. (2008). Time series clustering and classification by the autoregressive metric, Computational Statistics and Data Analysis, 52, 1860-1872. https://doi.org/10.1016/j.csda.2007.06.001
  9. Cowpertwait, P. S. P. and Cox, T. F. (1992). Clustering population means under heterogeneity of variance with an application to a rainfall time series problem, The Statistician, 41, 113-121. https://doi.org/10.2307/2348642
  10. Galeano, P. and Pena, D. (2000). Multivariate analysis in vector time series, Resenhas, 4, 383-403.
  11. Golay, X., Kollias, S., Stoll, G., Meier, D., Valavanis, A. and Boesiger, P. (1998). A new correlation-based fuzzy logic clustering algorithm for fMRI, Magnetic Resonance in Medicine, 40, 249-260. https://doi.org/10.1002/mrm.1910400211
  12. Goutte, C., Toft, P., Rostrup, E., Nielsen, F. A. and Hansen, L. K. (1999). On clustering fMRI time series, Neuroimage, 9, 298-310. https://doi.org/10.1006/nimg.1998.0391
  13. Kakizawa, Y., Shumway, R. H. and Taniguchi, M. (1998). Discrimination and clustering for multivariate time series, Journal of the American Statistical Association, 93, 328-340. https://doi.org/10.2307/2669629
  14. Kovacic, Z. J. (1996). Classification of time series with applications to the leading indicator selection, In Proceedings of the Fifth Conference of IFCS, 2, 204-207.
  15. Kullback, S. (1978). Information Theory and Statistics, Peter Smith, Gloucester, Massachusetts.
  16. Kullback, S. and Leibler, R. A. (1951). On information and sufficiency, Annals of Mathematical Statistics, 22, 79-86. https://doi.org/10.1214/aoms/1177729694
  17. Macchiato, M., La Rotonda, L., Lapenna, V. and Ragosta, M. (1995). Time modelling and spatial clustering of daily ambient temperature: An application in Southern Italy, Environmetrics, 6, 31-53. https://doi.org/10.1002/env.3170060105
  18. Maharaj, E. A. (2000). Clusters of time series, Journal of Classification, 17, 297-314. https://doi.org/10.1007/s003570000023
  19. Park, M. S. and Kim, H.-Y. (2008). Classification of precipitation data based on smoothed periodogram, The Korean Journal of Applied Statistics, 21, 547-560. https://doi.org/10.5351/KJAS.2008.21.3.547
  20. Pena, D. and Poncela, P. (2006). Nonstationary dynamic factor models, Journal of Statistical Planning and Inference, 136, 1237-1257. https://doi.org/10.1016/j.jspi.2004.08.020
  21. Piccolo, D. (1990). A distance measure for classifying ARIMA models, Journal of Time Series Analysis, 11, 153-164. https://doi.org/10.1111/j.1467-9892.1990.tb00048.x
  22. Priestley, M. B. (1981). Spectral Analysis and Time Series, Academic Press, New York.
  23. R Development Core Team (2006). R: A Language and Environment for Statistical Computing, Vienna, Austria: R Foundation for Statistical Computing. ISBN 3-900051-07-0.
  24. Shumway, R. H. (2003). Time-frequency clustering and discriminant analysis, Statistics and Probability Letters, 63, 307-314. https://doi.org/10.1016/S0167-7152(03)00095-6
  25. Wismuller, A., Lange, O., Dersch, D. R., Leinsinger, G. L., Hahn, K., Putz, B. and Auer, D. (2002). Cluster analysis of biomedical image time-series, International Journal of Computer Vision, 46, 103-128. https://doi.org/10.1023/A:1013550313321