• Title/Summary/Keyword: 매칭인덱스 (matching index)

Hybrid Lower-Dimensional Transformation for Similar Sequence Matching (유사 시퀀스 매칭을 위한 하이브리드 저차원 변환)

  • Moon, Yang-Sae;Kim, Jin-Ho
    • The KIPS Transactions:PartD
    • /
    • v.15D no.1
    • /
    • pp.31-40
    • /
    • 2008
  • In similar sequence matching, lower-dimensional transformations are generally used to convert high-dimensional sequences into low-dimensional points. These traditional transformations, however, show different indexing performance depending on the type of time-series data. This means that the choice of lower-dimensional transformation has a significant influence on indexing performance in similar sequence matching. To solve this problem, we propose a hybrid approach that integrates multiple transformations and uses them in a single multidimensional index. We first propose a new notion of hybrid lower-dimensional transformation that exploits different lower-dimensional transformations for a sequence. We next define the hybrid distance to compute the distance between the transformed sequences. We then formally prove that the hybrid approach performs similar sequence matching correctly. We also present the index building and similar sequence matching algorithms that use the hybrid approach. Experimental results for various time-series data sets show that our hybrid approach outperforms the single transformation-based approach. These results indicate that the hybrid approach can be widely used for time-series data with different characteristics.

An Index Data Structure for String Search in External Memory (외부 메모리에서 문자열을 효율적으로 탐색하기 위한 인덱스 자료 구조)

  • Na, Joong-Chae;Park, Kun-Soo
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.32 no.11_12
    • /
    • pp.598-607
    • /
    • 2005
  • We propose a new external-memory index data structure, the Suffix B-tree. The Suffix B-tree is a B-tree in which the key is a string, like the String B-tree. While a node in the String B-tree is implemented with a Patricia trie, a node in the Suffix B-tree is implemented with an array. The Suffix B-tree is therefore simpler and easier to implement than the String B-tree. Nevertheless, the branching algorithm of the Suffix B-tree is as efficient as that of the String B-tree. Consequently, the Suffix B-tree requires the same number of worst-case disk accesses as the String B-tree to solve the string matching problem, which is fundamental and important in the area of string algorithms.

Noise Control Boundary Image Matching Using Time-Series Moving Average Transform (시계열 이동평균 변환을 이용한 노이즈 제어 윤곽선 이미지 매칭)

  • Kim, Bum-Soo;Moon, Yang-Sae;Kim, Jin-Ho
    • Journal of KIISE:Databases
    • /
    • v.36 no.4
    • /
    • pp.327-340
    • /
    • 2009
  • To achieve the noise reduction effect in boundary image matching, we use the moving average transform of time-series matching. Our motivation is based on an intuition that using the moving average transform we may exploit the noise reduction effect in boundary image matching as in time-series matching. To confirm this simple intuition, we first propose $\kappa$-order image matching, which applies the moving average transform to boundary image matching. A boundary image can be represented as a sequence in the time-series domain, and our $\kappa$-order image matching identifies similar images in this time-series domain by comparing the $\kappa$-moving average transformed sequences. Next, we propose an index-based matching method that efficiently performs $\kappa$-order image matching on a large volume of image databases, and formally prove the correctness of the index-based method. Moreover, we formally analyze the relationship between an order $\kappa$ and its matching result, and present a systematic way of controlling the noise reduction effect by changing the order $\kappa$. Experimental results show that our $\kappa$-order image matching exploits the noise reduction effect, and our index-based matching method outperforms the sequential scan by one or two orders of magnitude.
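
The idea in the abstract above can be illustrated with a minimal sketch: apply a moving average of order k to two sequences and compare the transformed sequences. This is not the authors' implementation. The function names, the use of Euclidean distance, the tolerance `eps`, and the non-wrapping window are all assumptions made for illustration, and the index-based method is not covered.

```python
def moving_average(seq, k):
    """Return the k-moving average of seq (length len(seq) - k + 1).

    Assumption: the window does not wrap around the end of the sequence.
    """
    if not 1 <= k <= len(seq):
        raise ValueError("k must be between 1 and len(seq)")
    window_sum = sum(seq[:k])
    result = [window_sum / k]
    for i in range(k, len(seq)):
        # Slide the window: add the new value, drop the oldest one.
        window_sum += seq[i] - seq[i - k]
        result.append(window_sum / k)
    return result


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def k_order_match(query, candidate, k, eps):
    """True if the k-moving averages of the two sequences are within eps."""
    return euclidean(moving_average(query, k), moving_average(candidate, k)) <= eps


clean = [0, 0, 0, 0, 0, 0]
noisy = [1, -1, 1, -1, 1, -1]   # same shape plus alternating noise
print(k_order_match(clean, noisy, 1, 1.0))  # order 1 is the raw comparison
print(k_order_match(clean, noisy, 2, 1.0))  # order 2 smooths the noise away
```

In this toy example a larger order smooths more aggressively, which matches the abstract's point that the order controls the noise reduction effect.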

Window-Join: An Optimal Way to Process Duality-Based Subsequence Matching (윈도우-초인: 이원성 기반 서브시퀸스 매칭을 위한 최적의 방법)

  • 김상욱;박대현;이헌길;김만순;박정일
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2001.10a
    • /
    • pp.184-186
    • /
    • 2001
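  • This paper discusses how to process subsequence matching effectively in time-series databases. We first point out the performance problems of the existing duality-based subsequence matching method and present ways to resolve them. The proposed method starts by reinterpreting the index search required for subsequence matching as a window-join, a kind of spatial join problem. To process the window-join efficiently, the proposed method builds an R*-tree for the query window points on the fly in main memory. We also propose a new algorithm that efficiently joins the disk-resident R*-tree for the data window points with the main-memory R*-tree for the query window points. The proposed method is optimal for duality-based subsequence matching in that it accesses each R*-tree page from disk exactly once, without false alarms.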

Visualization Tool for Scaling-Invariant Boundary Image Matching (스케일링-불변 윤곽선 이미지 매칭의 시각화 도구)

  • Moon, Seongwoo;Lee, Sanghun;Kim, Bum-Soo;Moon, Yang-Sae
    • Annual Conference of KIPS
    • /
    • 2015.04a
    • /
    • pp.683-686
    • /
    • 2015
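  • This paper proposes a visualization tool for scaling-invariant boundary image matching. When boundary images are represented as time series, time-series matching techniques can be used to perform large-scale boundary image matching faster. In such boundary image matching, support for scaling invariance is an important factor in retrieving scaled similar images. We implement a scaling-invariant boundary image matching system based on the client-server model. First, the client converts a query image into a time series, sends it to the server together with a scaling-factor range and a tolerance, and visualizes the images returned as the matching result in chart form. Next, the server uses a multidimensional index to perform fast time-series matching over a large volume of boundary time-series data. As a result, the proposed visualization tool makes it possible to compare and analyze the query image and the scaling-invariant result images intuitively through three kinds of charts.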

Effectiveness Evaluations of Subsequence Matching Methods Using KOSPI Data (한국 주식 데이터를 이용한 서브시퀀스 매칭 방법의 효과성 평가)

  • Yoo Seung Keun;Lee Sang Ho
    • The KIPS Transactions:PartD
    • /
    • v.12D no.3 s.99
    • /
    • pp.355-364
    • /
    • 2005
  • Previous research on subsequence matching has focused on how to build indexes that speed up matching and has not taken the effectiveness of subsequence matching methods into account. This paper considers the effectiveness of subsequence matching methods and proposes two metrics for evaluating the effectiveness of subsequence matching algorithms. We have applied the proposed metrics to Korean stock data and five known matching algorithms. The analysis of the empirical data shows that two methods (i.e., the method supporting normalization, and the method supporting scaling and shifting) outperform the others in terms of the effectiveness of subsequence matching.

An Efficient Algorithm for Streaming Time-Series Matching that Supports Normalization Transform (정규화 변환을 지원하는 스트리밍 시계열 매칭 알고리즘)

  • Loh, Woong-Kee;Moon, Yang-Sae;Kim, Young-Kuk
    • Journal of KIISE:Databases
    • /
    • v.33 no.6
    • /
    • pp.600-619
    • /
    • 2006
  • With recent technical advances in sensors and mobile devices, processing the data streams generated by such devices is becoming an important research issue. A data stream of real values obtained at continuous time points is called a streaming time-series. Because streaming time-series have unique features that differ from those of traditional time-series, the similarity matching problem on streaming time-series should be solved in a new way. In this paper, we propose an efficient algorithm for the streaming time-series matching problem that supports the normalization transform. While the existing algorithms compare streaming time-series without any transform, the proposed algorithm compares them after they are normalization-transformed. The normalization transform is useful for finding time-series that have similar fluctuation trends even though they consist of distant element values. The major contributions of this paper are as follows. (1) By using a theorem presented in the context of subsequence matching that supports normalization transform[4], we propose a simple algorithm for solving the problem. (2) To improve search performance, we extend the simple algorithm to use $k\ (\geq 1)$ indexes. (3) For a given $k$, to achieve optimal search performance of the extended algorithm, we present an approximation method for choosing $k$ window sizes to construct $k$ indexes. (4) Based on the notion of continuity[8] on streaming time-series, we further extend our algorithm so that it can simultaneously obtain the search results for $m\ (\geq 1)$ time points, from the present $t_0$ to a time point $(t_0+m-1)$ in the near future, by retrieving the index only once. (5) Through a series of experiments, we compare the search performance of the algorithms proposed in this paper and show their performance trends according to the values of $k$ and $m$. To the best of our knowledge, no existing algorithm solves the problem presented in this paper, so we compare the search performance of our algorithms with that of the sequential scan algorithm. The experimental results show that our algorithms outperform the sequential scan algorithm by up to 13.2 times. The performance of our algorithms is expected to improve further as $k$ increases.
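
To make the normalization transform concrete, the sketch below shows only the sequential scan baseline that the abstract compares against, not the index-based algorithm the paper proposes. It assumes z-score normalization (subtract the mean, divide by the standard deviation) and Euclidean distance. The names `normalize` and `stream_matches` and the tolerance `eps` are hypothetical.

```python
import math
from collections import deque


def normalize(seq):
    """Z-score normalization (assumed form of the normalization transform)."""
    n = len(seq)
    mean = sum(seq) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / n)
    if std == 0:
        # A constant window has no fluctuation; map it to all zeros.
        return [0.0] * n
    return [(x - mean) / std for x in seq]


def stream_matches(stream, query, eps):
    """Yield each time index t at which the latest len(query) stream values,
    after normalization, lie within eps of the normalized query."""
    q = normalize(query)
    window = deque(maxlen=len(query))
    for t, value in enumerate(stream):
        window.append(value)          # the oldest value drops out automatically
        if len(window) == len(query):
            w = normalize(window)
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(w, q)))
            if dist <= eps:
                yield t


# The window [10, 20, 30] has the same trend as the query [1, 2, 3],
# although the element values are far apart.
print(list(stream_matches([10, 20, 30, 5, 5, 5], [1, 2, 3], 0.01)))
```

Every arriving value triggers a full renormalization and comparison here, which is the per-step cost an index-based approach aims to avoid.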

Sparse Signal Recovery with Parallel Orthogonal Matching Pursuit and Its Performances (병렬OMP 기법을 통한 성긴신호 복원과 그 성능)

  • Park, Jeonghong;Jung, Bang Chul;Kim, Jong Min;Ban, Tae Won
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.17 no.8
    • /
    • pp.1784-1789
    • /
    • 2013
  • In this paper, parallel orthogonal matching pursuit (POMP) is proposed to supplement orthogonal matching pursuit (OMP), which has been widely used as a greedy algorithm for sparse signal recovery. The process of POMP is simple but effective: (1) multiple indexes maximally correlated with the observation vector are chosen at the first iteration, (2) the conventional OMP process is carried out in parallel for each selected index, (3) the index set which yields the minimum residual is selected for reconstructing the original sparse signal. Empirical simulations show that POMP outperforms the existing sparse signal recovery algorithms in terms of the exact recovery ratio (ERR) for the sparse pattern and the mean-squared error (MSE) between the estimated signal and the original signal.
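
The three steps listed in the abstract above can be sketched in plain Python as follows. This is an illustrative reading of the abstract, not the authors' code. The sensing matrix is assumed to be given as a list of columns, the sparsity and the number of starting indexes (`num_paths`) are assumed known, the chosen columns are assumed linearly independent, and the "parallel" runs are executed one after another for simplicity.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(G, b):
    """Solve the small system G x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(G)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit(columns, y, support):
    """Least-squares fit of y on the chosen columns: (coefficients, residual)."""
    chosen = [columns[j] for j in support]
    G = [[dot(u, v) for v in chosen] for u in chosen]   # normal equations
    coef = solve(G, [dot(u, y) for u in chosen])
    approx = [sum(c * col[i] for c, col in zip(coef, chosen)) for i in range(len(y))]
    return coef, [yi - ai for yi, ai in zip(y, approx)]


def omp_from(columns, y, sparsity, first_index):
    """Conventional OMP, except that the first index is fixed by the caller."""
    support = [first_index]
    coef, res = fit(columns, y, support)
    while len(support) < sparsity:
        rest = [j for j in range(len(columns)) if j not in support]
        support.append(max(rest, key=lambda j: abs(dot(columns[j], res))))
        coef, res = fit(columns, y, support)
    return support, coef, res


def pomp(columns, y, sparsity, num_paths):
    # Step 1: pick the num_paths indexes most correlated with the observation.
    ranked = sorted(range(len(columns)), key=lambda j: -abs(dot(columns[j], y)))
    # Step 2: run OMP once per starting index (sequentially in this sketch).
    runs = [omp_from(columns, y, sparsity, j) for j in ranked[:num_paths]]
    # Step 3: keep the index set with the smallest residual energy.
    support, coef, _ = min(runs, key=lambda run: dot(run[2], run[2]))
    x = [0.0] * len(columns)
    for j, c in zip(support, coef):
        x[j] = c
    return x


columns = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0.6, 0.8, 0]]
y = [2, 0, -1]   # equals 2 * columns[0] - 1 * columns[2]
print(pomp(columns, y, sparsity=2, num_paths=2))
```

Solving the normal equations directly keeps the sketch dependency-free; a numerical library's least-squares routine would be the more robust choice for larger or ill-conditioned problems.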

Searching for Variants Using Trie-Index (트라이 인덱스를 이용한 이형태 검색)

  • Park, In-Cheol
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.10 no.8
    • /
    • pp.1986-1992
    • /
    • 2009
  • A user often searches for data by entering a variant, such as an abbreviation or substring of a word, or a misspelled word. A simple approach to searching for variants is to build a variant dictionary. However, this entails enormous cost and time and cannot handle variants caused by misspelling. Approximate searching, that is, searching by approximate string matching, is a good approach to this problem, but it cannot handle variants formed by abbreviation. This paper proposes a method for searching for various variants, including abbreviations and misspelled words, by using trie indexing. First, this paper presents a variant matching method based on the calculation of a path weighted-metric. In addition, it provides a variant searching algorithm that reduces the search time.

Extension of the Prefix-Querying Method for Efficient Time-Series Subsequence Matching Under Time Warping (타임 워핑 하의 효율적인 시계열 서브시퀀스 매칭을 위한 접두어 질의 기법의 확장)

  • Chang, Byoung-Chol;Kim, Sang-Wook;Cha, Jae-Hyuk
    • Annual Conference of KIPS
    • /
    • 2005.11a
    • /
    • pp.121-124
    • /
    • 2005
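  • This paper discusses how to process time-series subsequence matching under time warping. Time warping is a transformation that makes it possible to find sequences with similar patterns even when the sequences have different lengths. The prefix-querying method is the first index-based approach that processes time-series subsequence matching under time warping without false dismissal. This method uses $L_{\infty}$ as the base distance function so that users can formulate queries conveniently. In this paper, we extend prefix-querying so that $L_1$, the base distance function most widely used in time-series subsequence matching under time warping, can be applied instead of $L_{\infty}$. We also prove theoretically that no false dismissal occurs when time-series subsequence matching under time warping is performed with the proposed method. We verify the superiority of the proposed method through performance evaluation with various experiments. According to the experimental results, the proposed method shows a very large performance improvement over the best-performing existing method.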
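
As a rough illustration of how the two base distance functions differ, the sketch below computes a time warping distance with the standard dynamic-programming recurrence, accumulating element costs by summation for $L_1$ and by maximum for $L_{\infty}$. The recurrence and the function name are assumptions based on the common textbook formulation. The sketch shows only the distance itself, not the prefix-querying method or its index.

```python
def time_warping_distance(s, q, base="L1"):
    """Time warping distance between sequences s and q.

    base="L1":   path cost is the sum of |s_i - q_j| along the warping path.
    base="Linf": path cost is the maximum |s_i - q_j| along the warping path.
    """
    if base not in ("L1", "Linf"):
        raise ValueError("base must be 'L1' or 'Linf'")
    inf = float("inf")
    n, m = len(s), len(q)
    # d[i][j] is the distance between the first i elements of s
    # and the first j elements of q.
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - q[j - 1])
            best_prev = min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
            d[i][j] = cost + best_prev if base == "L1" else max(cost, best_prev)
    return d[n][m]


# Sequences of different lengths with the same shape warp onto each other.
print(time_warping_distance([1, 2, 2, 3], [1, 2, 3], base="L1"))
print(time_warping_distance([1, 2, 3], [2, 2, 5], base="L1"))
print(time_warping_distance([1, 2, 3], [2, 2, 5], base="Linf"))
```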
