• Title/Summary/Keyword: generating string


An Efficient String Similarity Search Technique based on Generating Inverted Lists of Variable-Length Grams (가변길이 그램의 역리스트 생성을 이용한 효율적인 유사 문자열 검색 기법)

  • Kim, Jongik
    • Journal of KIISE / v.43 no.11 / pp.1275-1280 / 2016
  • Existing techniques for string similarity search first generate a set of candidate strings and then verify the candidates, so their efficiency depends heavily on the candidate generation method. State-of-the-art techniques select fixed-length q-grams from a query string and generate candidates using the inverted lists of the selected q-grams. In this paper, we propose a technique that generates candidates using variable-length grams of a query string, and we develop a dynamic programming algorithm that selects an optimal combination of variable-length grams from the query string. Experimental results show that the proposed technique improves the performance of string similarity search compared with existing techniques.
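
The generate-then-verify pipeline described in this abstract can be illustrated with a minimal sketch of the fixed-length q-gram baseline. This is not the paper's variable-length method or its dynamic programming selection. It assumes the standard count filter, under which a string within edit distance k of the query must share at least (|query| - q + 1) - k*q q-grams with it. The names `qgrams`, `build_index`, and `candidates` are hypothetical.

```python
from collections import defaultdict


def qgrams(s, q):
    """Return all overlapping substrings of length q, in order."""
    return [s[i:i + q] for i in range(len(s) - q + 1)]


def build_index(strings, q):
    """Inverted lists: map each q-gram to the ids of strings containing it."""
    index = defaultdict(set)
    for sid, s in enumerate(strings):
        for g in qgrams(s, q):
            index[g].add(sid)
    return index


def candidates(query, strings, index, q, k):
    """Candidate generation with the count filter (assumed baseline).

    One edit operation destroys at most q of the query's q-grams, so a
    string within edit distance k must still contain at least
    threshold = (len(query) - q + 1) - k * q of them.
    """
    grams = qgrams(query, q)
    threshold = len(grams) - k * q
    if threshold <= 0:
        # The filter cannot prune anything; every string is a candidate.
        return set(range(len(strings)))
    counts = defaultdict(int)
    for g in grams:
        for sid in index.get(g, ()):
            counts[sid] += 1
    return {sid for sid, c in counts.items() if c >= threshold}


strings = ["similarity", "similarly", "singularity", "string"]
index = build_index(strings, q=2)
# Candidates are a superset of the true answers and still need verification.
print(candidates("similarity", strings, index, q=2, k=2))
```

The candidate set may contain false positives (here "singularity"), which is why a verification step with the actual edit distance follows candidate generation.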

Puppet Control System Optimized in the Number of Motors and the Size (구동기 수와 크기에서 최적화된 줄 인형 제어 시스템)

  • Kim, Byeong-Yeol;Han, Young-Jun;Hahn, Hun-Soo
    • The Journal of Korea Robotics Society / v.5 no.4 / pp.318-325 / 2010
  • This paper proposes a new string controller for puppets that is optimized in terms of the number of motors and its size. To reduce the number of motors needed to generate the essential motions of a puppet, the bending of a leg is implemented with one string, and the walking motion of both legs is implemented with one motor. To minimize the space the controller needs when generating these motions, cylindrical and articulated joints are used in the controller. The proposed controller was implemented and used to perform various puppet shows; it proved small enough for two puppets to stand close enough to shake hands and fast enough to simulate fast dance motions.

Developing JSequitur to Study the Hierarchical Structure of Biological Sequences in a Grammatical Inference Framework of String Compression Algorithms

  • Galbadrakh, Bulgan;Lee, Kyung-Eun;Park, Hyun-Seok
    • Genomics & Informatics / v.10 no.4 / pp.266-270 / 2012
  • Grammatical inference methods are expected to find grammatical structures hidden in biological sequences. One hopes that studies of grammar serve as an appropriate tool for theory formation. Thus, we have developed JSequitur for automatically generating the grammatical structure of biological sequences in an inference framework of string compression algorithms. Our original motivation was to find any grammatical traits of several cancer genes that can be detected by string compression algorithms. Through this research, we have not yet found any meaningful unique traits of the cancer genes, but we observed some interesting traits regarding the relationship among gene length, sequence similarity, the patterns of the generated grammar, and compression rate.
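
To make the idea of inferring a grammar through string compression concrete, here is a simplified sketch. It is not JSequitur and not the online Sequitur algorithm. It is an offline digram-replacement loop, in the spirit of Re-Pair, that repeatedly replaces the most frequent adjacent pair with a new rule. The names `infer_grammar` and `expand` and the sample sequence are hypothetical.

```python
from collections import Counter


def infer_grammar(seq):
    """Replace the most frequent repeated digram with a rule until none repeats.

    Returns (start, rules): start is the compressed symbol list and
    rules maps a rule name such as "R1" to the pair of symbols it expands to.
    Input symbols are single characters, so rule names cannot collide with them.
    """
    symbols = list(seq)
    rules = {}
    while True:
        pairs = Counter(zip(symbols, symbols[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break
        name = f"R{len(rules) + 1}"
        rules[name] = pair
        # Left-to-right, non-overlapping replacement of the chosen digram.
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(name)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
    return symbols, rules


def expand(symbols, rules):
    """Recursively expand rule names back into the original string."""
    parts = []
    for s in symbols:
        parts.append(expand(rules[s], rules) if s in rules else s)
    return "".join(parts)


start, rules = infer_grammar("GATTACAGATTACA")
print(start, rules)
print(expand(start, rules))  # reproduces the input sequence
```

The hierarchy of rules is the "grammatical structure" in this simplified setting, and the ratio between the grammar size and the input length gives a rough compression rate.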

Language-generating-Power of HRNCE Grammars (HRNCE 문법의 언어 생성력)

  • Jeong, Tae-Ui;Park, Dong-Seon
    • The Transactions of the Korea Information Processing Society / v.3 no.7 / pp.1659-1668 / 1996
  • Graph grammars generate graph languages, while string grammars generate string languages, which are a subset of graph languages. One of the most successful graph grammar models is the NLC grammar, which generates graphs by replacing a node with a graph through node labels. For grammars generating hypergraphs, which are a superset of graphs, there are CFHG grammars, which replace a hyperedge with a hypergraph through their preidentified gluing points; HH grammars, an extension of CFHG grammars, which replace a handle with a hypergraph through a rewriting mechanism that can also duplicate or delete the hyperedges surrounding the replaced handle; and finally HRNCE grammars, which replace a handle with a hypergraph through an eNCE way of rewriting. In this paper, we compare the language-generating power of HRNCE grammars with that of the graph grammars mentioned above by comparing the graph languages generated by each of them.


Ambiguity Types of the Homonymic & Heterographic Units for Improving Korean Voice Recognition System - a Preliminary Research (한국어 음성인식 시스템 향상을 위한 동음이철 단위의 중의성 유형 분류)

  • Yoon, Ae-Sun;Kang, Mi-Young
    • Speech Sciences / v.15 no.4 / pp.67-81 / 2008
  • The accuracy of P2G (Phoneme-to-Grapheme) conversion is one of the important factors determining the quality of unlimited voice recognition (VR) systems. Few studies have, however, been conducted to reduce the ambiguities of a phoneme string, which can be segmented into a variety of different linguistic units (i.e., morphemes, words, eo-jeols) and thus transformed into more than one grapheme string. This paper is a preliminary study toward building a large knowledge base of such homonymic & heterographic units (HHUs), which will provide unlimited Korean VR systems with more accurate P2G information. It analyzes two main factors that generate HHUs: (1) the boundary determination of the prosodic unit and (2) its segmentation into linguistic units. Linguistic characteristics determining the variable boundaries of a prosodic unit are investigated, and the ambiguity types of HHUs are classified according to their morphological and syntactic structures as well as the phonological rules governing them.
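
The segmentation ambiguity described in this abstract can be illustrated with a small sketch that enumerates every way to split a symbol string into units from a lexicon. The function `segmentations` and the toy lexicon are hypothetical and use placeholder letters, not real Korean phoneme data or the paper's classification.

```python
def segmentations(s, lexicon):
    """Return every way to split s into consecutive units found in lexicon."""
    if not s:
        return [[]]
    results = []
    for end in range(1, len(s) + 1):
        head = s[:end]
        if head in lexicon:
            # Keep this unit and segment the remainder recursively.
            for rest in segmentations(s[end:], lexicon):
                results.append([head] + rest)
    return results


# Toy lexicon of placeholder units (an assumption for illustration only).
lexicon = {"a", "b", "c", "ab", "bc"}
for seg in segmentations("abc", lexicon):
    print(seg)
```

Each distinct segmentation corresponds to a different sequence of linguistic units, which is how one phoneme string can map to more than one grapheme string.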


Efficient Hyperplane Generation Techniques for Human Activity Classification in Multiple-Event Sensors Based Smart Home (다중 이벤트 센서 기반 스마트 홈에서 사람 행동 분류를 위한 효율적 의사결정평면 생성기법)

  • Chang, Juneseo;Kim, Boguk;Mun, Changil;Lee, Dohyun;Kwak, Junho;Park, Daejin;Jeong, Yoosoo
    • IEMEK Journal of Embedded Systems and Applications / v.14 no.5 / pp.277-286 / 2019
  • In this paper, we propose an efficient hyperplane generation technique to classify human activity from combinations of events and sequence information obtained from multiple-event sensors. By generating hyperplanes efficiently, our machine learning algorithm classifies with less memory and run time than the LSVM (Linear Support Vector Machine) on embedded systems. Because lightweight, high-speed algorithms are one of the most critical issues in the IoT, the study can be applied to smart homes to predict human activity and provide related services. Our approach is based on reducing the number of hyperplanes and utilizing a robust string comparison algorithm. The proposed method reduces memory consumption compared with conventional ML (Machine Learning) algorithms, by a factor of 252 relative to LSVM and 34,033 relative to LSTM (Long Short-Term Memory), although accuracy decreases slightly. Thus our method showed outstanding performance in accuracy per hyperplane: 240 times that of LSVM and 30,520 times that of LSTM. The binarized image is then divided into groups, and each group is converted to a binary number in order to reduce the number of comparisons done at runtime. The binary numbers are then converted to a string. The test data is evaluated by converting it to a string and measuring its similarity to the hyperplanes using the Levenshtein algorithm, a robust dynamic string comparison algorithm. This technique reduces runtime and makes the proposed algorithm 27% faster than LSVM and 90% faster than LSTM.
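
The Levenshtein comparison mentioned in this abstract can be sketched as follows. The `levenshtein` function is the standard dynamic programming edit distance. The `nearest_label` helper and the reference strings are hypothetical stand-ins for comparing a test string against stored per-class strings, and they are not taken from the paper.

```python
def levenshtein(a, b):
    """Edit distance between a and b, keeping only two rows of the DP table."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def nearest_label(sample, references):
    """Return the label whose reference string is closest to sample.

    references is a hypothetical dict mapping an activity label to a string.
    """
    return min(references, key=lambda label: levenshtein(sample, references[label]))


# Hypothetical reference strings for two activities.
references = {"walk": "10111", "sleep": "00000"}
print(levenshtein("kitten", "sitting"))    # 3
print(nearest_label("10110", references))  # walk
```

Keeping only two rows of the table bounds memory by the length of the shorter dimension, which matters on the embedded targets the abstract describes.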