Search | Korea Science

Kim, Do-Han;Park, Hee-Jin;Paek, Eun-Ok
- Journal of KIISE:Computer Systems and Theory / v.32 no.9 / pp.475-482 / 2005
String data containing wildcard characters may represent certain patterns in texts. A subsumption relation between two patterns can be defined by a subset relation between the sets of strings that match those patterns. Checking the subsumption relation is therefore important for determining whether each pattern represents a set of strings that does not overlap with that of another pattern. In this paper, we propose an effective algorithm that determines the subsumption relation between strings with wildcard characters. First, we consider a simple extension of the suffix tree algorithm so that it can include wildcard characters, and then we propose another method that checks the subsumption relation by dividing a suffix tree structure at each location of the string data.
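The subset-based definition of subsumption can be illustrated with a minimal sketch. It is not the suffix-tree method of the paper: it assumes a hypothetical single-character wildcard `?` that matches exactly one character, an alphabet with at least two symbols, and a simple position-by-position comparison. The name `subsumes` is chosen for illustration only.

```python
WILDCARD = "?"  # assumed wildcard symbol: matches exactly one character


def subsumes(general: str, specific: str) -> bool:
    """Return True if every string matched by `specific` is also matched by `general`.

    Under the single-character wildcard assumption, patterns of different
    lengths match disjoint sets of strings, so neither can subsume the other.
    """
    if len(general) != len(specific):
        return False
    # Each position of `general` must be a wildcard or agree with `specific`.
    # A literal in `general` facing a wildcard in `specific` fails, because
    # `specific` then matches strings that `general` rejects.
    return all(g == WILDCARD or g == s for g, s in zip(general, specific))


if __name__ == "__main__":
    print(subsumes("a?c", "abc"))
    print(subsumes("abc", "a?c"))
```

Here `a?c` subsumes `abc`, but not the other way around, since `a?c` also matches strings such as `axc`.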

Kim Woo-Cheol;Park Sang-Hyun;Won Jung-Im;Kim Sang-Wook;Yoon Jee-Hee
- Journal of KIISE:Databases / v.32 no.3 / pp.263-275 / 2005
In a large DNA database, indexing techniques are widely used for rapid approximate sequence searching. However, most indexing techniques require more space than the original database and are difficult to integrate seamlessly with a DBMS. In this paper, we suggest a space-efficient, disk-based indexing and query processing algorithm for approximate DNA sequence searching, specifically for exact match queries, wildcard match queries, and k-mismatch queries. Our indexing method places a sliding window at every possible location of a DNA sequence and extracts its signature by considering the occurrence frequency of each nucleotide. It then stores the set of signatures in a multi-dimensional index, such as an R*-tree. In particular, by assigning a weight to each position of a window, it prevents signatures from being concentrated around a few spots in the index space. Our query processing algorithm converts a query sequence into a multi-dimensional rectangle and searches the index for the signatures that overlap the rectangle. Experiments with real biological data sets revealed that the proposed method is at least three times, twice, and several orders of magnitude faster than the suffix-tree-based method in exact match, wildcard match, and k-mismatch queries, respectively.
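The signature idea described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the weight values, the wildcard symbol `N`, and the function names `signature`, `window_signatures`, and `query_rectangle` are all hypothetical, and a plain list stands in for the R*-tree.

```python
from typing import List, Sequence, Tuple

NUCLEOTIDES = "ACGT"  # one signature dimension per nucleotide


def signature(window: str, weights: Sequence[float]) -> Tuple[float, ...]:
    """Weighted occurrence count of each nucleotide in one window."""
    sig = [0.0] * len(NUCLEOTIDES)
    for ch, w in zip(window, weights):
        sig[NUCLEOTIDES.index(ch)] += w
    return tuple(sig)


def window_signatures(seq: str, weights: Sequence[float]) -> List[Tuple[float, ...]]:
    """Slide a window of len(weights) over every position of the sequence."""
    size = len(weights)
    return [signature(seq[i:i + size], weights) for i in range(len(seq) - size + 1)]


def query_rectangle(query: str, weights: Sequence[float], wildcard: str = "N"):
    """Bounding rectangle for a query of window length with wildcards.

    Fixed positions contribute to the lower bound; the weight of each
    wildcard position may land in any dimension, so it widens every upper bound.
    """
    low = [0.0] * len(NUCLEOTIDES)
    slack = 0.0
    for ch, w in zip(query, weights):
        if ch == wildcard:
            slack += w
        else:
            low[NUCLEOTIDES.index(ch)] += w
    high = [v + slack for v in low]
    return tuple(low), tuple(high)


def inside(sig, rect) -> bool:
    """True if a signature point lies within the rectangle."""
    low, high = rect
    return all(lo <= s <= hi for s, lo, hi in zip(sig, low, high))
```

A signature falling inside the rectangle only marks a candidate window; each candidate would still need to be verified against the actual sequence, since different windows can share a signature.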

Yun, Gi-tae;Kim, Sung-Tan;Lee, Sang-Won
- Proceedings of the Korea Information Processing Society Conference / 2009.11a / pp.795-796 / 2009
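The SQL standard used by DBMSs specifies only LIKE for search. LIKE uses two kinds of wildcard characters, but these two alone can hardly meet users' diverse search needs. As a solution, we propose regular expression search as a feature that complements LIKE, and we summarize the points to consider when implementing it additionally in a DBMS.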
https://doi.org/10.3745/PKIPS.y2009m11a.795
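To make the relationship between LIKE and regular expressions concrete, the sketch below translates a LIKE pattern into an equivalent Python regular expression. It assumes the two standard LIKE wildcards (`%` for any run of zero or more characters, `_` for exactly one character) and ignores the ESCAPE clause; the helper names `like_to_regex` and `like_match` are hypothetical and do not come from the paper.

```python
import re


def like_to_regex(pattern: str) -> str:
    """Translate a SQL LIKE pattern into a regular expression string."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # any sequence of characters, possibly empty
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))  # literal; regex metacharacters are escaped
    return "".join(parts)


def like_match(value: str, pattern: str) -> bool:
    """Emulate `value LIKE pattern` by matching the whole value."""
    return re.fullmatch(like_to_regex(pattern), value, flags=re.DOTALL) is not None


if __name__ == "__main__":
    print(like_match("database", "data%"))
    print(like_match("abc", "a.c"))
```

Every LIKE pattern maps to a regular expression in this way, while constructs such as alternation or character classes have no LIKE counterpart, which is the gap the abstract points to.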