• Title/Summary/Keyword: 와일드카드 문자 (wildcard characters)

Search Result 3

An Effective Algorithm for Checking Subsumption Relation on String Data Containing Wildcard Characters (와일드카드 문자를 포함하는 스트링 데이터 사이의 포함관계 확인을 위한 효율적인 알고리즘)

  • Kim, Do-Han; Park, Hee-Jin; Paek, Eun-Ok
    • Journal of KIISE:Computer Systems and Theory / v.32 no.9 / pp.475-482 / 2005
  • String data containing wildcard characters can represent patterns in text. A subsumption relation between two patterns can be defined as a subset relation between the sets of strings that match those patterns. Checking subsumption is therefore important for determining whether each pattern represents a set of strings that does not overlap with that of another pattern. In this paper, we propose an effective algorithm that determines the subsumption relation between strings with wildcard characters. First, we consider a simple extension of the suffix tree algorithm so that it may include wildcard characters, and then we propose another method that checks the subsumption relation by dividing a suffix tree structure at each location of string data.
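
The subset definition of subsumption in the abstract above can be illustrated with a minimal sketch. This is not the suffix-tree method the paper proposes. It assumes a simplified setting that the abstract does not state: whole-string matching, patterns of equal length, an alphabet of at least two symbols, and a hypothetical wildcard symbol `?` that matches exactly one arbitrary character.

```python
WILDCARD = "?"  # assumed symbol: matches exactly one arbitrary character


def subsumes(general: str, specific: str) -> bool:
    """Return True if every string matching `specific` also matches `general`.

    Under the assumptions above, this holds exactly when the patterns have
    the same length and, at each position, `general` is either a wildcard
    or carries the same symbol as `specific`.
    """
    if len(general) != len(specific):
        # The two patterns match strings of different lengths only.
        return False
    return all(g == WILDCARD or g == s for g, s in zip(general, specific))


if __name__ == "__main__":
    print(subsumes("A?C?", "ABC?"))  # the first pattern is more general
    print(subsumes("ABC?", "A?C?"))  # the reverse direction does not hold
```

Two patterns are equivalent in this setting when each subsumes the other, which for this wildcard model means they are identical.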

A DNA Index Structure using Frequency and Position Information of Genetic Alphabet (염기문자의 빈도와 위치정보를 이용한 DNA 인덱스구조)

  • Kim Woo-Cheol; Park Sang-Hyun; Won Jung-Im; Kim Sang-Wook; Yoon Jee-Hee
    • Journal of KIISE:Databases / v.32 no.3 / pp.263-275 / 2005
  • In a large DNA database, indexing techniques are widely used for rapid approximate sequence searching. However, most indexing techniques require more space than the original database and are difficult to integrate seamlessly with a DBMS. In this paper, we suggest a space-efficient, disk-based indexing and query processing algorithm for approximate DNA sequence searching, specifically exact match queries, wildcard match queries, and k-mismatch queries. Our indexing method places a sliding window at every possible location of a DNA sequence and extracts its signature by considering the occurrence frequency of each nucleotide. It then stores the set of signatures in a multi-dimensional index, such as an R*-tree. In particular, by assigning a weight to each position of a window, it prevents signatures from being concentrated around a few spots in the index space. Our query processing algorithm converts a query sequence into a multi-dimensional rectangle and searches the index for the signatures that overlap the rectangle. Experiments with real biological data sets revealed that the proposed method is at least three times, twice, and several orders of magnitude faster than the suffix-tree-based method for exact match, wildcard match, and k-mismatch queries, respectively.
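
The signature-and-rectangle idea in the abstract above can be sketched roughly as follows. This is a simplified illustration, not the authors' method: it uses unweighted nucleotide counts (the per-position weights are omitted), a linear scan in place of the R*-tree, and a hypothetical wildcard symbol `N` in the query. All function names are made up for this example.

```python
from collections import Counter

ALPHABET = "ACGT"


def window_signatures(seq: str, w: int):
    """Yield (offset, signature) for every length-w window of seq.

    The signature is the 4-dimensional vector of A, C, G, T counts.
    """
    for i in range(len(seq) - w + 1):
        counts = Counter(seq[i:i + w])
        yield i, tuple(counts[c] for c in ALPHABET)


def query_rectangle(query: str, wildcard: str = "N"):
    """Convert a query into per-nucleotide lower and upper count bounds.

    Each wildcard may stand for any nucleotide, so it widens every
    upper bound by one.
    """
    counts = Counter(query)
    wild = counts[wildcard]
    low = tuple(counts[c] for c in ALPHABET)
    high = tuple(counts[c] + wild for c in ALPHABET)
    return low, high


def candidates(seq: str, query: str, wildcard: str = "N"):
    """Return offsets whose signature falls inside the query rectangle.

    This is only a filter: equal counts do not imply equal order, so each
    candidate still has to be verified against the actual sequence.
    """
    low, high = query_rectangle(query, wildcard)
    return [
        i
        for i, sig in window_signatures(seq, len(query))
        if all(l <= s <= h for l, s, h in zip(low, sig, high))
    ]


if __name__ == "__main__":
    print(candidates("AAAACGTT", "ACNT"))
```

In a real index the rectangle test would be answered by a multi-dimensional range query rather than by scanning every window.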

Implementation of Regular Expression Searching in DBMS (DBMS에서의 정규표현식 검색기능 구현)

  • Yun, Gi-tae; Kim, Sung-Tan; Lee, Sang-Won
    • Proceedings of the Korea Information Processing Society Conference / 2009.11a / pp.795-796 / 2009
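  • For searching, the SQL standard used in DBMSs specifies only LIKE. LIKE uses two kinds of wildcard characters. However, these two alone make it difficult to meet users' diverse search requirements. As a solution, we propose regular expression searching, which can complement LIKE, and we summarize the points to consider when additionally implementing it in a DBMS.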
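
To make the relationship between LIKE and regular expressions concrete, here is a minimal sketch that translates a LIKE pattern into an equivalent regular expression. It relies on the two standard SQL LIKE wildcards, `%` (any sequence of characters, including none) and `_` (exactly one character). It is an illustration only, not the implementation described in the paper; the function names are hypothetical and the `ESCAPE` clause is not handled.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a SQL LIKE pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")  # any sequence of characters, possibly empty
        elif ch == "_":
            parts.append(".")   # exactly one character
        else:
            parts.append(re.escape(ch))  # literal; regex metacharacters escaped
    # DOTALL lets the wildcards also match newline characters.
    return re.compile("".join(parts), re.DOTALL)


def like(value: str, pattern: str) -> bool:
    """Return True if the whole value matches the LIKE pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


if __name__ == "__main__":
    print(like("wildcard", "wild%"))
    print(like("cart", "c_t"))
```

Every LIKE pattern can be expressed this way, while constructs such as alternation or character classes have no LIKE counterpart, which is the gap the abstract points to.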