• Title/Summary/Keyword: string matching

A Mark Automatic Checking System to Inspect Character String on Chip (칩의 문자들을 검사하기 위한 마크 자동 검사 시스템)

  • Kim, Eun-Seok
    • Journal of the Korea Institute of Information and Communication Engineering / v.11 no.3 / pp.577-583 / 2007
  • The character strings on chips and components are so tiny and numerous that inspecting them manually is very difficult. In this paper, we propose an automatic mark checking system that determines whether a chip is wrongly marked by recognizing the characters on it. Several fault-detection conditions and template matching methods are used to inspect faulty marks. Fault detection classifies the conditions into five kinds: darkness, matching, area, broken, and branch. A series of experiments shows that the proposed method offers an effective way to detect wrong marks on chips.

Suffix Tree Constructing Algorithm for Large DNA Sequences Analysis (대용량 DNA서열 처리를 위한 서픽스 트리 생성 알고리즘의 개발)

  • Choi, Hae-Won
    • Journal of Korea Society of Industrial Information Systems / v.15 no.1 / pp.37-46 / 2010
  • A suffix tree is an efficient data structure that exposes the internal structure of a string and allows efficient solutions to a wide range of complex string problems, particularly in computational biology. However, as the volume of biological information explodes, it becomes impossible to construct suffix trees in main memory, so an efficient technique for constructing them in secondary storage is needed. In this paper, we present a method for constructing a suffix tree on disk for a large set of DNA strings using a new index scheme. We also show a typical application of the on-disk suffix tree.

A Proposal of a Shape Matching and Geo-referencing method for Building Features in Construction CAD Data to Digital Map using a Vertex Attributed String Matching algorithm (VASM 알고리즘을 이용한 건축물 CAD 자료의 수치지도 건물 객체와의 형상 정합 및 지도좌표 부여 방법의 제안)

  • Huh, Yong;Yu, Ki-Yun;Kim, Hyung-Tae
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.26 no.4 / pp.387-396 / 2008
  • Integrating construction CAD data with GIS data requires geo-referencing the CAD data, whose coordinate systems are either native to each drawing or unknown. Generally, this process relies on manually detected conjugate vertices. In this study, we propose a semi-automated method for detecting conjugate vertices of building features between construction CAD data and a digital map using a vertex attributed string matching algorithm. A geo-referencing function for the construction CAD data, based on the similarity transform, can then be derived from those conjugate vertices. Using the proposed method, we overlaid geo-referenced CAD data on a digital map of the College of Engineering, Seoul National University, and evaluated the method.
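
The abstract above mentions deriving a similarity transform from conjugate vertices. The following is a minimal sketch of one common way to do that, a least-squares fit over matched 2D points represented as complex numbers. It is not taken from the paper: the function name `fit_similarity` and the sample coordinates are hypothetical, and the paper's own estimation procedure may differ.

```python
def fit_similarity(src, dst):
    """Least-squares similarity transform dst ~= a * src + b.

    Points are complex numbers (x + yj). abs(a) is the scale and the
    argument of a is the rotation angle; b is the translation.
    """
    if len(src) != len(dst) or len(src) < 2:
        raise ValueError("need at least two matched vertex pairs")
    n = len(src)
    src_mean = sum(src) / n
    dst_mean = sum(dst) / n
    # Complex linear regression on the centered point sets.
    num = sum((p - src_mean).conjugate() * (q - dst_mean)
              for p, q in zip(src, dst))
    den = sum(abs(p - src_mean) ** 2 for p in src)
    if den == 0:
        raise ValueError("source vertices must not all coincide")
    a = num / den
    b = dst_mean - a * src_mean
    return a, b


# Hypothetical CAD vertices and their map counterparts, generated from a
# known transform (scale 2, rotation 90 degrees, shift (3, 4)).
cad = [complex(0, 0), complex(1, 0), complex(0, 1)]
map_pts = [complex(0, 2) * p + complex(3, 4) for p in cad]
a, b = fit_similarity(cad, map_pts)
```

With exact correspondences the fit recovers the generating transform; with noisy vertices it returns the transform that minimizes the summed squared residuals.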

Segmentation Algorithm for Wafer ID using Active Multiple Templates Model

  • Ahn, In-Mo;Kang, Dong-Joong;Chung, Yoon-Tack
    • Institute of Control, Robotics and Systems: Conference Proceedings (제어로봇시스템학회:학술대회논문집) / 2003.10a / pp.839-844 / 2003
  • This paper presents a method for segmenting wafer ID marks in poor-quality images taken under the uncontrolled lighting conditions of the semiconductor process, where general OCR algorithms cannot recognize the characters. An active multiple templates matching method is proposed to search for ID areas on wafers and segment them into meaningful regions. The active template model is designed by applying the snake model used for active contour tracking. The active multiple template model searches character areas and segments them optimally into single characters, tracking each character, which can vary flexibly according to the string configuration. The snake energy is optimized with a greedy algorithm, which maximizes efficiency by automatically controlling the gap between templates; these gaps vary with the configuration of the character string. Experimental results using wafer images from a real FA environment are presented.

Implementation of k-mer Analysis System for DNA Sequence Using String B-Tree (스트링 B-트리를 이용한 염기 서열의 k-mer 분석 시스템 구현)

  • Choe, Jeong-Hyeon;Jin, Hui-Jeong;Jo, Hwan-Gyu
    • Proceedings of the Korean Information Science Society Conference / 2001.04a / pp.748-750 / 2001
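  • The Human Genome Project (HGP) recently released a draft of the human genome sequence. There are many ways to analyze the base sequences of organisms, and one of them is k-mer analysis. A k-mer is a contiguous subsequence of length k within a gene's base sequence. k-mer analysis examines properties such as the frequency distribution and symmetry of the k-mers in a sequence. However, because genomic sequences are very large texts, existing in-memory algorithms cannot process them when k is large, so efficient data structures and algorithms are needed. In this paper, we present a k-mer analysis method based on the string B-tree, which is well suited to pattern matching and supports external memory, describe its implementation, and report several experimental results.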
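
To make the definition of a k-mer concrete, here is a small in-memory sketch that counts every length-k substring of a sequence. It only illustrates the definition; it is not the string B-tree method of the paper, which targets sequences too large for main memory. The function name `kmer_counts` and the sample sequence are hypothetical.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count all contiguous substrings of length k in seq."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A sequence of length n has n - k + 1 k-mers (none if k > n).
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


counts = kmer_counts("ACGTACG", 3)
# The 3-mers in order are ACG, CGT, GTA, TAC, ACG.
```

The frequency distribution held in `counts` is the kind of statistic a k-mer analysis inspects; for genome-scale input the table no longer fits in memory as k grows, which is the motivation for an external-memory index.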

Parallel Computation For The Edit Distance Based On The Four-Russians' Algorithm (4-러시안 알고리즘 기반의 편집거리 병렬계산)

  • Kim, Young Ho;Jeong, Ju-Hui;Kang, Dae Woong;Sim, Jeong Seop
    • KIPS Transactions on Computer and Communication Systems / v.2 no.2 / pp.67-74 / 2013
  • Approximate string matching problems have been studied in diverse fields. Recently, fast approximate string matching algorithms have been used to reduce the time and cost of next-generation sequencing. To measure the number of errors between two strings, we use a distance function such as the edit distance. Given two strings X (|X| = m) and Y (|Y| = n) over an alphabet $\Sigma$, the edit distance between X and Y is the minimum number of edit operations needed to convert X into Y. The edit distance can be computed with the well-known dynamic programming technique in O(mn) time and space. It can also be computed with the Four-Russians' algorithm, whose preprocessing step runs in $O((3|\Sigma|)^{2t} t^2)$ time and $O((3|\Sigma|)^{2t} t)$ space and whose computation step runs in O(mn/t) time and O(mn) space, where t is the block size. In this paper, we present a parallelized version of the computation step of the Four-Russians' algorithm. Our algorithm computes the edit distance between X and Y in O(m+n) time using m/t threads. We implemented both the sequential version and our parallelized version of the Four-Russians' algorithm using CUDA and compared their execution times. When t = 1 and t = 2, our algorithm runs about 10 times and 3 times faster than the sequential algorithm, respectively.
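
As a reference point for the O(mn) baseline mentioned in the abstract, the sketch below fills the standard dynamic programming table. It assumes unit-cost insertion, deletion, and substitution, which the abstract does not spell out, and it is neither the Four-Russians' algorithm nor the authors' parallel version. The function name `edit_distance` is hypothetical.

```python
def edit_distance(x: str, y: str) -> int:
    """Edit distance via the full (m+1) x (n+1) table: O(mn) time and space."""
    m, n = len(x), len(y)
    # d[i][j] is the edit distance between x[:i] and y[:j].
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all i characters of x[:i]
    for j in range(n + 1):
        d[0][j] = j  # insert all j characters of y[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if x[i - 1] == y[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
    return d[m][n]


distance = edit_distance("kitten", "sitting")
```

Each cell depends only on its left, upper, and upper-left neighbors, so cells on the same anti-diagonal are mutually independent; this is the general property that parallel formulations of the table computation exploit.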

Development of Workbench for Analysis and Visualization of Whole Genome Sequence (전유전체(Whole genome) 서열 분석과 가시화를 위한 워크벤치 개발)

  • Choe, Jeong-Hyeon;Jin, Hui-Jeong;Kim, Cheol-Min;Jang, Cheol-Hun;Jo, Hwan-Gyu
    • The KIPS Transactions:PartA / v.9A no.3 / pp.387-398 / 2002
  • As the whole genome sequences of many organisms have been revealed by small-scale genome projects, intensive research on individual genes and their functions has been performed. However, in-memory algorithms are inefficient for analyzing whole genome sequences, since the size of an individual genome ranges from several million to hundreds of billions of base pairs. To manipulate such huge sequence data effectively, an indexed data structure for external memory is necessary. In this paper, we introduce a workbench system for the analysis and visualization of whole genome sequences using the string B-tree, which is suitable for analyzing huge data. The system consists of two parts: an analysis query part and a visualization part. The query system supports various queries such as sequence search, k-occurrence, and k-mer analysis. The visualization system helps biologists easily understand the overall structure and specific features of a genome through many kinds of visualization, such as whole genome sequence, annotation, CGR (Chaos Game Representation), k-mer, and RWP (Random Walk Plot). Using our workbench, one can find relations among organisms, predict the genes in a genome, and study the function of junk DNA.

Personalized Service Based on Context Awareness through User Emotional Perception in Mobile Environment (모바일 환경에서의 상황인식 기반 사용자 감성인지를 통한 개인화 서비스)

  • Kwon, Il-Kyoung;Lee, Sang-Yong
    • Journal of Digital Convergence / v.10 no.2 / pp.287-292 / 2012
  • This paper studies personalized user services based on emotion perception. To build and preprocess users' emotion data in the V-A emotion model, it examines preprocessing techniques for location-based sensing data and for emotion data. For this purpose, a granular context tree and string-matching-based emotion pattern matching techniques are used. In addition, a context-aware personalized recommendation service technique using probabilistic reasoning is studied.

N-gram Based Robust Spoken Document Retrievals for Phoneme Recognition Errors (음소인식 오류에 강인한 N-gram 기반 음성 문서 검색)

  • Lee, Su-Jang;Park, Kyung-Mi;Oh, Yung-Hwan
    • MALSORI / no.67 / pp.149-166 / 2008
  • In spoken document retrieval (SDR), subword indexing terms (typically phonemes) are used to avoid the out-of-vocabulary (OOV) problem. This makes the indexing and retrieval process independent of any vocabulary and requires only a small corpus to train the acoustic model. However, the subword indexing approach has a major drawback: it shows higher word error rates than a large vocabulary continuous speech recognition (LVCSR) system. In this paper, we propose a probabilistic slot detection and n-gram based string matching method for phone-based spoken document retrieval to overcome the high error rates of the phone recognizer. Experimental results show a 9.25% relative improvement in mean average precision (mAP) and a 1.7-fold speedup compared with the baseline system.
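
To illustrate why n-gram matching tolerates isolated recognition errors, the sketch below scores two phoneme sequences by the Dice coefficient of their n-gram multisets. This is a generic similarity measure chosen for illustration, not the probabilistic slot detection method of the paper; the function names and the phoneme labels are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of contiguous n-grams over a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_dice(a, b, n=2):
    """Dice coefficient between the n-gram multisets of a and b, in [0, 1]."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:
        return 0.0  # both sequences are shorter than n
    # Counter intersection keeps the minimum count of each shared n-gram.
    shared = sum((grams_a & grams_b).values())
    return 2.0 * shared / total


query = ["k", "ae", "t"]
recognized = ["k", "ae", "p"]  # last phoneme misrecognized
score = ngram_dice(query, recognized)
```

A single substituted phoneme disturbs only the n-grams that contain it, so the score degrades gradually instead of dropping to zero as an exact match would.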
