• Title/Summary/Keyword: string pattern

Search Result 112, Processing Time 0.019 seconds

An Analysis System for Whole Genomic Sequence Using String B-Tree (스트링 B-트리를 이용한 게놈 서열 분석 시스템)

  • Choe, Jeong-Hyeon;Jo, Hwan-Gyu
    • The KIPS Transactions:PartA
    • /
    • v.8A no.4
    • /
    • pp.509-516
    • /
    • 2001
  • As results of many genome projects, genomic sequences of many organisms are revealed. Various methods such as global alignment, local alignment are used to analyze the sequences of the organisms, and k -mer analysis is one of the methods for analyzing the genomic sequences. The k -mer analysis explores the frequencies of all k-mers or the symmetry of them where the k -mer is the sequenced base with the length of k. However, existing on-memory algorithms are not applicable to the k -mer analysis because a whole genomic sequence is usually a large text. Therefore, efficient data structures and algorithms are needed. String B-tree is a good data structure that supports external memory and fits into pattern matching. In this paper, we improve the string B-tree in order to efficiently apply the data structure to k -mer analysis, and the results of k -mer analysis for C. elegans and other 30 genomic sequences are shown. We present a visualization system which enables users to investigate the distribution and symmetry of the frequencies of all k -mers using CGR (Chaotic Game Representation). We also describe the method to find the signature which is the part of the sequence that is similar to the whole genomic sequence.

  • PDF

A Study on Woosu Theory of Tasan Chung, Yak-yong (다산의 악수설에 관한 연구)

  • 임영자;순남숙
    • Journal of the Korean Society of Costume
    • /
    • v.50 no.2
    • /
    • pp.5-13
    • /
    • 2000
  • Woosu is the last process in the preparation of the deceased for burial. There are tow hypotheses on the purpose of Woosu. One is that it is to cover hands. The other is that it connects both arms of th body. The pattern of Woosu is 1 cheok 2 chon in length and 5 chon in width. The length is divided intc 3 parts equally with 4 chon. And then, it will be cutted 1 chon both sides of the middle part in Woosu. So, the width of middle part become 3 chon. Regarding the string used in soosu, two applications have been hypothesised . One is that each string is at both sides, the other is that each string ist at 4 edge respectively. Also there are two hypotheses about number of strings in Woosu. One is that total number of Woosu is one. The other is that the total number of Woosu is two. During the period of the Chosun dynasty, it is acknowledged that Sagye Kim, Jang-Saeng insisted that string of woosu is at one by one at both arms and hands respectively to be useful to cover hands of the body. But again the great scholar Tasan Chung, Yak-young pointed out that there are some problems on the explanatory notes and annotations of Sasangrye which ar quoted by Sagye Kim. Jang-Saeng. He insisted that the purpose of Woosu is to connect both arms using one string. Therefore, I will explain the intrinsic nature of Woosu by researching Tasan Chung , Yak-yong's Woosu theory.

  • PDF

The Consensus String Problem based on Radius is NP-complete (거리반경기반 대표문자열 문제의 NP-완전)

  • Na, Joong-Chae;Sim, Jeong-Seop
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.36 no.3
    • /
    • pp.135-139
    • /
    • 2009
  • The problems to compute the distances or similarities of multiple strings have been vigorously studied in such diverse fields as pattern matching, web searching, bioinformatics, computer security, etc. One well-known method to compare multiple strings in the given set is finding a consensus string which is a representative of the given set. There are two objective functions that are frequently used to find a consensus string, one is the radius and the other is the consensus error. The radius of a string x with respect to a set S of strings is the smallest number r such that the distance between the string x and each string in S is at most r. A consensus string based on radius is a string that minimizes the radius with respect to a given set. The consensus error of a string with respect to a given set S is the sum of the distances between x and all the strings in S. A consensus string of S based on consensus error is a string that minimizes the consensus error with respect to S. In this paper, we show that the problem of finding a consensus string based on radius is NP-complete when the distance function is a metric.

Fast Matching Method for DNA Sequences (DNA 서열을 위한 빠른 매칭 기법)

  • Kim, Jin-Wook;Kim, Eun-Sang;Ahn, Yoong-Ki;Park, Kun-Soo
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.36 no.4
    • /
    • pp.231-238
    • /
    • 2009
  • DNA sequences are the fundamental information for each species and a comparison between DNA sequences of different species is an important task. Since DNA sequences are very long and there exist many species, not only fast matching but also efficient storage is an important factor for DNA sequences. Thus, a fast string matching method suitable for encoded DNA sequences is needed. In this paper, we present a fast string matching method for encoded DNA sequences which does not decode DNA sequences while matching. We use four-characters-to-one-byte encoding and combine a suffix approach and a multi-pattern matching approach. Experimental results show that our method is about 5 times faster than AGREP and the fastest among known algorithms.

An Effective Algorithm for Checking Subsumption Relation on String Data Containing Wildcard Characters (와일드카드 문자를 포함하는 스트링 데이터 사이의 포함관계 확인을 위한 효율적인 알고리즘)

  • Kim, Do-Han;Park, Hee-Jin;Paek, Eun-Ok
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.32 no.9
    • /
    • pp.475-482
    • /
    • 2005
  • String data containing wildcard characters may represent certain patterns in texts. A subsumption relation between two patterns can be defined by a subset relation between sets of strings that match those patterns. Thus, the subsumption relation check is important to determine whether each pattern represents a set of strings without any overlap with another pattern. In this paper, we propose an effective algorithm that can determine subsumption relation between strings with wildcard characters. First, we consider a simple extension of the suffix tree algorithm so that it nay include wildcard characters and then we propose another method that checks the subsumption relation by dividing a suffix tree structure at each location of string data.

A Study on 1-D Bit-Serial Array Processor Design for Code-String Matching Using a MWLD Algorithm (MWLD 알고리즘을 이용한 문자열정합 1차원 Bit-Serial 어레이 프로세서의 설계)

  • 박종진;김은원;조원경
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.29B no.2
    • /
    • pp.1-8
    • /
    • 1992
  • This paper is proposed a Modified WLD (Weighted Levenshtein Distance) algorithm for processor desihn of code-string matching. A proposed MWLD (Modified Weighted Levenshtein Distance) algorithm is consist of 1-dimension bit-serial array processor to pattern matching using a Hamming Distance. The proposed processor is applied to recognition of character with real time input. The recognition rate of Hangul strokes is resulted to 98.65$\%$

  • PDF

Kinematic Optimization and Experiment on Power Train for Flapping Wing Micro Air Vehicle (날갯짓 초소형 비행체의 끈을 이용한 동력 전달 장치에 대한 기구학적 최적화 및 실험)

  • Gong, Du-Hyun;Shin, Sang-Joon;Kim, Sang-Yong
    • Journal of the Computational Structural Engineering Institute of Korea
    • /
    • v.30 no.4
    • /
    • pp.289-296
    • /
    • 2017
  • In this paper, geometrical optimization for newly designed flapping mechanism for insect-like micro air vehicle is presented. The mechanism uses strings to convert rotation of motor to reciprocating wing motion to reduce the total weight and inertial force. The governing algorithm of movement of the mechanism is established considering the characteristic of string that only tensile force can be acted by string, to optimize the kinematics. Modified pattern search method which is complemented to avoid converging into local optimum is adopted to the geometrical optimization of the mechanism. Then, prototype of the optimized geometry is produced and experimented to check the feasibility of the mechanism and the optimization method. The results from optimization and experiment shows good agreement in flapping amplitude and other wing kinematics. Further research will be conducted on dynamic analysis of the mechanism and detailed specification of the prototype.

A String Reconstruction Algorithm and Its Application to Exponentiation Problems (문자열 재구성 알고리즘 및 멱승문제 응용)

  • Sim, Jeong-Seop;Lee, Mun-Kyu;Kim, Dong-Kyue
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.35 no.9_10
    • /
    • pp.476-484
    • /
    • 2008
  • Most string problems and their solutions are relevant to diverse applications such as pattern matching, data compression, recently bioinformatics, and so on. However, there have been few works on the relations between string problems and cryptographic problems. In this paper, we consider the following string reconstruction problems and show how these problems can be applied to cryptography. Given a string x of length n over a constant-sized alphabet ${\sum}$ and a set W of strings of lengths at most an integer $k({\leq}n)$, the first problem is to find the sequence of strings in W that reconstruct x by the minimum number of concatenations. We propose an O(kn+L)-time algorithm for this problem, where L is the sum of all lengths of strings in a given set, using suffix trees and a shortest path algorithm for directed acyclic graphs. The other is a dynamic version of the first problem and we propose an $O(k^3n+L)$-time algorithm. Finally, we show that exponentiation problems that arise in cryptography can be successfully reduced to these problems and propose a new solution for exponentiation.

Parallel Approximate String Matching with k-Mismatches for Multiple Fixed-Length Patterns in DNA Sequences on Graphics Processing Units (GPU을 이용한 다중 고정 길이 패턴을 갖는 DNA 시퀀스에 대한 k-Mismatches에 의한 근사적 병열 스트링 매칭)

  • Ho, ThienLuan;Kim, HyunJin;Oh, SeungRohk
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.66 no.6
    • /
    • pp.955-961
    • /
    • 2017
  • In this paper, we propose a parallel approximate string matching algorithm with k-mismatches for multiple fixed-length patterns (PMASM) in DNA sequences. PMASM is developed from parallel single pattern approximate string matching algorithms to effectively calculate the Hamming distances for multiple patterns with a fixed-length. In the preprocessing phase of PMASM, all target patterns are binary encoded and stored into a look-up memory. With each input character from the input string, the Hamming distances between a substring and all patterns can be updated at the same time based on the binary encoding information in the look-up memory. Moreover, PMASM adopts graphics processing units (GPUs) to process the data computations in parallel. This paper presents three kinds of PMASM implementation methods in GPUs: thread PMASM, block-thread PMASM, and shared-mem PMASM methods. The shared-mem PMASM method gives an example to effectively make use of the GPU parallel capacity. Moreover, it also exploits special features of the CUDA (Compute Unified Device Architecture) memory structure to optimize the performance. In the experiments with DNA sequences, the proposed PMASM on GPU is 385, 77, and 64 times faster than the traditional naive algorithm, the shift-add algorithm and the single thread PMASM implementation on CPU. With the same NVIDIA GPU model, the performance of the proposed approach is enhanced up to 44% and 21%, compared with the naive, and the shift-add algorithms.

A Fast String Matching Scheme without using Buffer for Linux Netfilter based Internet Worm Detection (리눅스 넷필터 기반의 인터넷 웜 탐지에서 버퍼를 이용하지 않는 빠른 스트링 매칭 방법)

  • Kwak, Hu-Keun;Chung, Kyu-Sik
    • The KIPS Transactions:PartC
    • /
    • v.13C no.7 s.110
    • /
    • pp.821-830
    • /
    • 2006
  • As internet worms are spread out worldwide, the detection and filtering of worms becomes one of hot issues in the internet security. As one of implementation methods to detect worms, the Linux Netfilter kernel module can be used. Its basic operation for worm detection is a string matching where coming packet(s) on the network is/are compared with predefined worm signatures(patterns). A worm can appear in a packet or in two (or more) succeeding packets where some part of worm is in the first packet and its remaining part is in its succeeding packet(s). Assuming that the maximum length of a worm pattern is less than 1024 bytes, we need to perform a string matching up to two succeeding packets of 2048 bytes. To do so, Linux Netfilter keeps the previous packet in buffer and performs matching with a combined 2048 byte string of the buffered packet and current packet. As the number of concurrent connections to be handled in the worm detection system increases, the total size of buffer (memory) increases and string matching speed becomes low In this paper, to reduce the memory buffer size and get higher speed of string matching, we propose a string matching scheme without using buffer. The proposed scheme keeps the partial matching result of the previous packet with signatures and has no buffering for previous packet. The partial matching information is used to detect a worm in the two succeeding packets. We implemented the proposed scheme by modifying the Linux Netfilter. Then we compared the modified Linux Netfilter module with the original Linux Netfilter module. Experimental results show that the proposed scheme has 25% lower memory usage and 54% higher speed compared to the original scheme.