• Title/Summary/Keyword: Exact string matching

Search Result 6, Processing Time 0.017 seconds

String Matching Algorithm on Multi-byte Character Set Texts (다중바이트 문자집합 텍스트에서의 문자열 검색 알고리즘)

  • Kim, Eun-Sang;Kim, Jin-Wook;Park, Kun-Soo
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.16 no.10
    • /
    • pp.1015-1019
    • /
    • 2010
  • An extensive research on exact string matching has been done, but there have been few researches on the matching in multi-byte character set texts such as EUC~KR. This paper shows that false matches may occur in multi-byte character set texts such as EUC-KR when using KMP algorithm, and presents a refined KMP algorithm without false matches applying a character-based prefix function. And also, Experimental results show that our algorithm is faster than string matching algorithms of widely used editors, Vim and Emacs, and the existing automata-based algorithm.

A Novel Scalable and Storage-Efficient Architecture for High Speed Exact String Matching

  • Peiravi, Ali;Rahimzadeh, Mohammad Javad
    • ETRI Journal
    • /
    • v.31 no.5
    • /
    • pp.545-553
    • /
    • 2009
  • String matching is a fundamental element of an important category of modern packet processing applications which involve scanning the content flowing through a network for thousands of strings at the line rate. To keep pace with high network speeds, specialized hardware-based solutions are needed which should be efficient enough to maintain scalability in terms of speed and the number of strings. In this paper, a novel architecture based upon a recently proposed data structure called the Bloomier filter is proposed which can successfully support scalability. The Bloomier filter is a compact data structure for encoding arbitrary functions, and it supports approximate evaluation queries. By eliminating the Bloomier filter's false positives in a space efficient way, a simple yet powerful exact string matching architecture is proposed that can handle several thousand strings at high rates and is amenable to on-chip realization. The proposed scheme is implemented in reconfigurable hardware and we compare it with existing solutions. The results show that the proposed approach achieves better performance compared to other existing architectures measured in terms of throughput per logic cells per character as a metric.

Design and Implementation of High-Speed Pattern Matcher in Network Intrusion Detection System (네트워크 침입 탐지 시스템에서 고속 패턴 매칭기의 설계 및 구현)

  • Yoon, Yeo-Chan;Hwang, Sun-Young
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.33 no.11B
    • /
    • pp.1020-1029
    • /
    • 2008
  • This paper proposes an high speed pattern matching algorithm and its implementation. The pattern matcher is used to check patterns from realtime input packet. The proposed algorithm can find exact string, range of string values, and combination of string values from input packet at high speed. Given string and rule set are modelled as a state transition graph which can find overlapped strings simultaneously, and the state transition graph is partitioned according to input implicants to reduce implementation complexity. The pattern matcher scheme uses the transformed state transition graph and input packet as an input. The pattern matcher was modelled and implemented in VHDL language. Experimental results show the proprieties of the proposed approach.

A Novel Cryptosystem Based on Steganography and Automata Technique for Searchable Encryption

  • Truong, Nguyen Huy
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.14 no.5
    • /
    • pp.2258-2274
    • /
    • 2020
  • In this paper we first propose a new cryptosystem based on our data hiding scheme (2,9,8) introduced in 2019 with high security, where encrypting and hiding are done at once, the ciphertext does not depend on the input image size as existing hybrid techniques of cryptography and steganography. We then exploit our automata approach presented in 2019 to design two algorithms for exact and approximate pattern matching on secret data encrypted by our cryptosystem. Theoretical analyses remark that these algorithms both have O(n) time complexity in the worst case, where for the approximate algorithm, we assume that it uses ⌈(1-ε)m)⌉ processors, where ε, m and n are the error of our string similarity measure and lengths of the pattern and secret data, respectively. In searchable encryption, our cryptosystem is used by users and our pattern matching algorithms are performed by cloud providers.

A Study on the Method and System for Organization's Name Authorization of Korean Science and Technology Contents (국내 과학기술콘텐츠 전거데이터 구축을 위한 소속기관명 식별 방법과 시스템에 관한 연구)

  • Kim, Jinyoung;Lee, Seok-Hyong;Suh, Dongjun;Kim, Kwang-Young
    • Journal of Digital Contents Society
    • /
    • v.17 no.6
    • /
    • pp.555-563
    • /
    • 2016
  • Science and technology contents (research papers, patents, reports) are the most common reference material for researchers involved in research and development in the fields of science and technology. Based on various search elements (title, abstract, keyword, year of publication, name of journal, name of author, publisher, etc.), many services are available for users to search science and technology contents and bibliographic information owned by libraries. Authority data on organization name can be useful as an element for author identification and as an element to search for results produced by specific organizations. However, organization name is not taken into account by current search services for domestic academic information and bibliographic records. This study analyzes organization name data contained in the metadata of science and technology contents, which are the basis of the establishment of authority data, and proposes a method and system based on string containment and exact string matching.

Modern Methods of Text Analysis as an Effective Way to Combat Plagiarism

  • Myronenko, Serhii;Myronenko, Yelyzaveta
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.8
    • /
    • pp.242-248
    • /
    • 2022
  • The article presents the analysis of modern methods of automatic comparison of original and unoriginal text to detect textual plagiarism. The study covers two types of plagiarism - literal, when plagiarists directly make exact copying of the text without changing anything, and intelligent, using more sophisticated techniques, which are harder to detect due to the text manipulation, like words and signs replacement. Standard techniques related to extrinsic detection are string-based, vector space and semantic-based. The first, most common and most successful target models for detecting literal plagiarism - N-gram and Vector Space are analyzed, and their advantages and disadvantages are evaluated. The most effective target models that allow detecting intelligent plagiarism, particularly identifying paraphrases by measuring the semantic similarity of short components of the text, are investigated. Models using neural network architecture and based on natural language sentence matching approaches such as Densely Interactive Inference Network (DIIN), Bilateral Multi-Perspective Matching (BiMPM) and Bidirectional Encoder Representations from Transformers (BERT) and its family of models are considered. The progress in improving plagiarism detection systems, techniques and related models is summarized. Relevant and urgent problems that remain unresolved in detecting intelligent plagiarism - effective recognition of unoriginal ideas and qualitatively paraphrased text - are outlined.