Subsequence Matching Under Time Warping in Time-Series Databases: Observation, Optimization, and Performance Results

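  • Man-Soon Kim (Department of Computer, Information, and Communications Engineering, Kangwon National University);
  • Sang-Wook Kim (School of Information and Communications, Hanyang University)
  • Published: 2004.12.01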

Abstract

This paper discusses how to process subsequence matching under time warping effectively in time-series databases. Time warping is a transformation that makes it possible to find sequences with similar patterns even when they are of different lengths. Through a preliminary experiment, we first point out that the performance bottleneck of Naive-Scan, the basic method for processing subsequence matching under time warping, lies in its CPU processing step. We then propose a novel method that optimizes the CPU processing step of Naive-Scan. The proposed method maximizes CPU performance by eliminating in advance all the redundant calculations that occur in computing the time warping distances between the query sequence and the data subsequences. We formally prove that the proposed method does not incur false dismissals and that it is the optimal one for processing Naive-Scan. We also discuss how to apply the proposed method to the post-processing step of LB-Scan and ST-Filter, the previous methods for processing subsequence matching under time warping. We then quantitatively verify the performance improvement obtained by the proposed method via extensive experiments. The results show that the performance of all three previous methods improves when they employ the proposed method. In particular, Naive-Scan, which shows the worst performance before the optimization is applied, performs much better than both LB-Scan and ST-Filter in all cases once it employs the proposed method for CPU processing. This result is significant in that it shows a performance inversion among Naive-Scan, LB-Scan, and ST-Filter, brought about by optimizing the CPU processing step, which is their performance bottleneck.
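The abstract does not give the distance definition or the details of the optimization, so the following is only a minimal sketch under stated assumptions. It assumes the classic dynamic-programming form of the time warping distance with an absolute-difference base distance summed along the warping path, and a Naive-Scan that evaluates every subsequence independently. The function names, the `eps` tolerance, and the `max_len` bound are hypothetical. `prefix_sharing_scan` illustrates one kind of redundancy a scan can avoid (subsequences with the same start offset share rows of the cumulative distance table); it should not be read as the method proposed in the paper.

```python
from typing import List, Sequence, Tuple

INF = float("inf")


def time_warping_distance(s: Sequence[float], q: Sequence[float]) -> float:
    """Time warping distance via dynamic programming.

    Assumption: base distance |a - b|, accumulated by summation.
    """
    m = len(q)
    prev = [0.0] + [INF] * m          # row for the empty prefix of s
    for x in s:
        cur = [INF] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(x - q[j - 1])
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[m]


def naive_scan(data: Sequence[float], query: Sequence[float],
               eps: float, max_len: int) -> List[Tuple[int, int]]:
    """Evaluate every subsequence data[start:end] independently."""
    matches = []
    for start in range(len(data)):
        last = min(len(data), start + max_len)
        for end in range(start + 1, last + 1):
            if time_warping_distance(data[start:end], query) <= eps:
                matches.append((start, end))
    return matches


def prefix_sharing_scan(data: Sequence[float], query: Sequence[float],
                        eps: float, max_len: int) -> List[Tuple[int, int]]:
    """Hypothetical illustration: reuse table rows across shared prefixes.

    For a fixed start offset, the table for data[start:end + 1] extends the
    table for data[start:end] by one row, so each row is computed only once.
    """
    m = len(query)
    matches = []
    for start in range(len(data)):
        prev = [0.0] + [INF] * m
        last = min(len(data), start + max_len)
        for end in range(start + 1, last + 1):
            x = data[end - 1]
            cur = [INF] * (m + 1)
            for j in range(1, m + 1):
                cost = abs(x - query[j - 1])
                cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
            if cur[m] <= eps:         # distance of data[start:end] to query
                matches.append((start, end))
            prev = cur
    return matches
```

Both scans return the same list of `(start, end)` pairs for the same inputs, since the second performs the same table updates in the same order and merely avoids recomputing rows. Under these assumptions the independent scan recomputes a full table for each candidate length, which is the sort of CPU cost the abstract identifies as the bottleneck.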
