• Title/Summary/Keyword: string matching

Search Result 102, Processing Time 0.025 seconds

Efficient Construction of Generalized Suffix Arrays by Merging Suffix Arrays (써픽스 배열 합병을 이용한 일반화된 써픽스 배열의 효율적인 구축 알고리즘)

  • Jeon, Jeong-Eun;Park, Heejin;Kim, Dong-Kyue
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.32 no.6
    • /
    • pp.268-278
    • /
    • 2005
  • We consider constructing the generalized suffix way of strings A and B when the suffix arrays of A and B are given, j.e., merging two suffix arrays of A and B. There are efficient algorithms to merge some special suffix arrays such as the odd array and the even array. However, for the general case that A and B are arbitrary strings, no efficient merging algorithms have been developed. Thus, one had to construct the generalized suffix arrays of A and B by constructing the suffix array of A$\#$B$\$$ from scratch, even though the suffix ways of A and B are given. In this paper, we Present efficient merging algorithms for the suffix arrays of two arbitrary strings A and B drawn from constant and integer alphabets. The experimental results show that merging two suffix ways of A and B are about 5 times faster than constructing the suffix way of A$\#$B$\$$ from scratch for constant alphabets. Our algorithms include searching all suffixes of string B in the suffix array of A. To do this, we use suffix links in suffix ways and we developed efficient algorithms for computing the suffix links. Efficient computation of suffix links is another contribution of this paper because it can be used to solve other problems occurred in bioinformatics that should search all suffixes of a given string in the suffix array of another string such as computing matching statistics, finding longest common substrings, and so on. The experimental results show that our methods for computing suffix links is about 3-4 times faster than the previous fastest methods.

Development of polygon object set matching algorithm between heterogeneous digital maps - using the genetic algorithm based on the shape similarities (형상 유사도 기반의 유전 알고리즘을 활용한 이종 수치지도 간의 면 객체 집합 정합 알고리즘 개발)

  • Huh, Yong;Lee, Jeabin
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography
    • /
    • v.31 no.1
    • /
    • pp.1-9
    • /
    • 2013
  • This paper proposes a matching algorithm to find corresponding polygon feature sets between heterogeneous digital maps. The algorithm finds corresponding sets in terms of optimizing their shape similarities based on the assumption that the feature sets describing the same entities in the real world are represented in similar shapes. Then, by using a binary code, it is represented that a polygon feature is chosen for constituting a corresponding set or not. These codes are combined into a binary string as a candidate solution of the matching problem. Starting from initial candidate solutions, a genetic algorithm iteratively optimizes the candidate solutions until it meets a termination condition. Finally, it presents the solution with the highest similarity. The proposed method is applied for the topographical and cadastral maps of an urban region in Suwon, Korea to find corresponding polygon feature sets for block areas, and the results show its feasibility. The results were assessed with manual detection results, and showed overall accuracy of 0.946.

Video Index Generation and Search using Trie Structure (Trie 구조를 이용한 비디오 인덱스 생성 및 검색)

  • 현기호;김정엽;박상현
    • Journal of KIISE:Software and Applications
    • /
    • v.30 no.7_8
    • /
    • pp.610-617
    • /
    • 2003
  • Similarity matching in video database is of growing importance in many new applications such as video clustering and digital video libraries. In order to provide efficient access to relevant data in large databases, there have been many research efforts in video indexing with diverse spatial and temporal features. however, most of the previous works relied on sequential matching methods or memory-based inverted file techniques, thus making them unsuitable for a large volume of video databases. In order to resolve this problem, this paper proposes an effective and scalable indexing technique using a trie, originally proposed for string matching, as an index structure. For building an index, we convert each frame into a symbol sequence using a window order heuristic and build a disk-resident trie from a set of symbol sequences. For query processing, we perform a depth-first search on the trie and execute a temporal segmentation. To verify the superiority of our approach, we perform several experiments with real and synthetic data sets. The results reveal that our approach consistently outperforms the sequential scan method, and the performance gain is maintained even with a large volume of video databases.

Code Optimization Using Pattern Table (패턴 테이블을 이용한 코드 최적화)

  • Yun Sung-Lim;Oh Se-Man
    • Journal of Korea Multimedia Society
    • /
    • v.8 no.11
    • /
    • pp.1556-1564
    • /
    • 2005
  • Various optimization techniques are deployed in the compilation process of a source program for improving the program's execution speed and reducing the size of the source code. Of the optimization pattern matching techniques, the string pattern matching technique involves finding an optimal pattern that corresponds to the intermediate code. However, it is deemed inefficient due to excessive time required for optimized pattern search. The tree matching pattern technique can result in many redundant comparisons for pattern determination, and there is also the disadvantage of high cost involved in constructing a code tree. The objective of this paper is to propose a table-driven code optimizer using the DFA(Deterministic Finite Automata) optimization table to overcome the shortcomings of existing optimization techniques. Unlike other techniques, this is an efficient method of implementing an optimizer that is constructed with the deterministic automata, which determines the final pattern, refuting the pattern selection cost and expediting the pattern search process.

  • PDF

δ-approximate Periods and γ-approximate Periods of Strings over Integer Alphabets (정수문자집합에 대한 문자열의 δ-근사주기와 γ-근사주기)

  • Kim, Youngho;Sim, Jeong Seop
    • Journal of KIISE
    • /
    • v.43 no.10
    • /
    • pp.1073-1078
    • /
    • 2016
  • (${\delta}$, ${\gamma}$)-matching for strings over integer alphabets can be applied to such fields as musical melody and share prices on stock markets. In this paper, we define ${\delta}$-approximate periods and ${\gamma}$-approximate periods of strings over integer alphabets. We also present two $O(n^2)$-time algorithms, each of which finds minimum ${\delta}$-approximate periods and minimum ${\gamma}$-approximate periods, respectively. Then, we provide the experimental results of execution times of both algorithms.

Effective Scheme for File Search Engine in Mobile Environments (모바일 환경에서 파일 검색 엔진을 위한 효과적인 방식)

  • Cho, Jong-Keun;Ha, Sang-Eun
    • The Journal of the Korea Contents Association
    • /
    • v.8 no.11
    • /
    • pp.41-48
    • /
    • 2008
  • This study focuses on the modeling file search engine and suggesting modified file search schema based on weight value using file contents in order to improve the performance in terms of search accuracy and matching time. Most of the file search engines have used string matching algorithms like KMP(Knuth.Morris.Pratt), which may limit portability and fast searching time. However, this kind of algorithms don't find exactly the files what you want. Hence, the file search engine based on weight value using file contents is proposed here in order to optimize the performance for mobile environments. The Comparison with previous research shows that the proposed schema provides better.

Modern Methods of Text Analysis as an Effective Way to Combat Plagiarism

  • Myronenko, Serhii;Myronenko, Yelyzaveta
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.8
    • /
    • pp.242-248
    • /
    • 2022
  • The article presents the analysis of modern methods of automatic comparison of original and unoriginal text to detect textual plagiarism. The study covers two types of plagiarism - literal, when plagiarists directly make exact copying of the text without changing anything, and intelligent, using more sophisticated techniques, which are harder to detect due to the text manipulation, like words and signs replacement. Standard techniques related to extrinsic detection are string-based, vector space and semantic-based. The first, most common and most successful target models for detecting literal plagiarism - N-gram and Vector Space are analyzed, and their advantages and disadvantages are evaluated. The most effective target models that allow detecting intelligent plagiarism, particularly identifying paraphrases by measuring the semantic similarity of short components of the text, are investigated. Models using neural network architecture and based on natural language sentence matching approaches such as Densely Interactive Inference Network (DIIN), Bilateral Multi-Perspective Matching (BiMPM) and Bidirectional Encoder Representations from Transformers (BERT) and its family of models are considered. The progress in improving plagiarism detection systems, techniques and related models is summarized. Relevant and urgent problems that remain unresolved in detecting intelligent plagiarism - effective recognition of unoriginal ideas and qualitatively paraphrased text - are outlined.

Design and Implementation of High-Speed Pattern Matcher Using Multi-Entry Simultaneous Comparator in Network Intrusion Detection System (네트워크 침입 탐지 시스템에서 다중 엔트리 동시 비교기를 이용한 고속패턴 매칭기의 설계 및 구현)

  • Jeon, Myung-Jae;Hwang, Sun-Young
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.40 no.11
    • /
    • pp.2169-2177
    • /
    • 2015
  • This paper proposes a new pattern matching module to overcome the increased runtime of previous algorithm using RAM, which was designed to overcome cost limitation of hash-based algorithm using CAM (Content Addressable Memory). By adopting Merge FSM algorithm to reduce the number of state, the proposed module contains state block and entry block to use in RAM. In the proposed module, one input string is compared with multiple entry strings simultaneously using entry block. The effectiveness of the proposed pattern matching unit is verified by executing Snort 2.9 rule set. Experimental results show that the number of memory reads has decreased by 15.8%, throughput has increased by 47.1%, while memory usage has increased by 2.6%, when compared to previous methods.

Semantic Image Retrieval Using Color Distribution and Similarity Measurement in WordNet (컬러 분포와 WordNet상의 유사도 측정을 이용한 의미적 이미지 검색)

  • Choi, Jun-Ho;Cho, Mi-Young;Kim, Pan-Koo
    • The KIPS Transactions:PartB
    • /
    • v.11B no.4
    • /
    • pp.509-516
    • /
    • 2004
  • Semantic interpretation of image is incomplete without some mechanism for understanding semantic content that is not directly visible. For this reason, human assisted content-annotation through natural language is an attachment of textual description to image. However, keyword-based retrieval is in the level of syntactic pattern matching. In other words, dissimilarity computation among terms is usually done by using string matching not concept matching. In this paper, we propose a method for computerized semantic similarity calculation In WordNet space. We consider the edge, depth, link type and density as well as existence of common ancestors. Also, we have introduced method that applied similarity measurement on semantic image retrieval. To combine wi#h the low level features, we use the spatial color distribution model. When tested on a image set of Microsoft's 'Design Gallery Line', proposed method outperforms other approach.

String matching for Network Intrusion Detection System using FPGA (FPGA를 사용한 네트워크 침입탐지 시스템의 문자열 비교)

  • Lee, Jang-Haeng;Hwang, Sung-Ho;Park, Neung-Soo
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2005.11a
    • /
    • pp.886-888
    • /
    • 2005
  • Network Intrusion Detection System(NIDS)는 네트워크를 통해 들어오는 패킷들을 모니터링 하고 분석하여 내부 시스템에 유해한 내용을 담고 있는 패킷을 탐지 하는 시스템이다. 이 시스템은 네트워크의 안에서 돌아다니는 패킷을 놓치지 않고 분석할 수 있어야 하며, 예측 불허의 공격 방법들에 대해서는 새로운 법칙을 적용하여 방어할 수 있어야 한다. 본 연구에서 NDIS에 snort를 이용한 소프트웨어적인 패턴매칭을 FPGA를 이용하여 하드웨어적 패턴매칭으로 구현하였으며, 새로운 법칙에 따라서 유연하게 적응할 수 있도록 패턴매칭을 정규 표현식(Regular Expression)으로 나타내어 FPGA에 재구성할 수 있도록 하였다.

  • PDF