• Title/Summary/Keyword: Hangul text

A Hangul Document Classification System using Case-based Reasoning (사례기반 추론을 이용한 한글 문서분류 시스템)

  • Lee, Jae-Sik;Lee, Jong-Woon
    • Asia pacific journal of information systems / v.12 no.2 / pp.179-195 / 2002
  • In this research, we developed an efficient Hangul document classification system for text mining. By 'efficient' we mean maintaining acceptable classification performance while reducing computing time. In our system, given a query document, k documents are first retrieved from the document case base using the k-nearest neighbor technique, the main algorithm of case-based reasoning. Then the TFIDF method, the traditional vector model in information retrieval, is applied to the query document and the k retrieved documents to classify the query document. We call this procedure the 'CB_TFIDF' method. Our results showed that the classification accuracy of CB_TFIDF was similar to that of the traditional TFIDF method, while the average time to classify one document decreased remarkably.
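
The two-stage procedure described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say which features the k-nearest neighbor retrieval uses or how the final class is chosen, so the Jaccard term overlap for retrieval, the smoothed IDF formula, the cosine similarity, and the summed-similarity vote are all assumptions, as are the function names.

```python
import math
from collections import Counter, defaultdict

def jaccard(a, b):
    """Cheap term-overlap similarity used for the retrieval stage (assumed)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

def tfidf_vectors(docs):
    """TF-IDF weights computed only over the small set of documents passed in."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF (assumed) so terms shared by all documents keep some weight.
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
                        for t, c in tf.items()})
    return vectors

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def classify(query, case_base, k=3):
    """case_base is a list of (tokens, label) pairs; query is a token list."""
    # Stage 1: retrieve the k nearest cases from the case base.
    neighbors = sorted(case_base, key=lambda case: jaccard(query, case[0]),
                       reverse=True)[:k]
    # Stage 2: apply TF-IDF to the query and the k retrieved documents only.
    vectors = tfidf_vectors([query] + [tokens for tokens, _ in neighbors])
    scores = defaultdict(float)
    for vec, (_, label) in zip(vectors[1:], neighbors):
        scores[label] += cosine(vectors[0], vec)
    return max(scores, key=scores.get)

case_base = [
    ("stock market price rise".split(), "economy"),
    ("market interest rate bank".split(), "economy"),
    ("soccer team win match".split(), "sports"),
    ("baseball team pitcher game".split(), "sports"),
    ("bank loan interest".split(), "economy"),
]
print(classify("bank interest rate".split(), case_base, k=3))
```

Because the TF-IDF weights are computed over only k + 1 documents rather than the whole collection, the second stage stays cheap, which is consistent with the reduced classification time the abstract reports.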

Matching Algorithm for Hangul Recognition Based on PDA

  • Kim Hyeong-Gyun;Choi Gwang-Mi
    • Journal of information and communication convergence engineering / v.2 no.3 / pp.161-166 / 2004
  • Electronic ink is data stored in the form of handwritten text or script, without conversion to ASCII by handwriting recognition, on pen-based computers and Personal Digital Assistants (PDAs) to support natural and convenient data input. One of the most important issues is how to search electronic ink in order to use it. We propose and implement a script matching algorithm for electronic ink. The proposed algorithm separates the input stroke into a set of primitive strokes using the curvature of the stroke curve. After determining the type of each separated stroke, it produces a stroke feature vector. It then calculates the distance between the stroke feature vector of the input strokes and that of strokes in the database using dynamic programming.
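
The final matching step can be illustrated with a dynamic-programming distance between two sequences of stroke feature vectors. The abstract does not give the recurrence or the feature layout, so this sketch assumes a dynamic time warping (DTW) alignment with Euclidean local cost; `dtw_distance`, `best_match`, and the two-element feature vectors are hypothetical.

```python
import math

def dtw_distance(seq_a, seq_b):
    """DTW distance between two sequences of equal-length feature vectors."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i and first j vectors.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # skip in seq_b
                                     cost[i][j - 1],      # skip in seq_a
                                     cost[i - 1][j - 1])  # align both
    return cost[n][m]

def best_match(query, database):
    """Return the key of the stored script closest to the query."""
    return min(database, key=lambda name: dtw_distance(query, database[name]))

database = {
    "note_a": [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)],
    "note_b": [(5.0, 5.0), (6.0, 4.0)],
}
print(best_match([(0.1, 0.0), (1.0, 0.9), (1.0, 1.0), (2.0, 0.1)], database))
```

An alignment-based distance tolerates inputs that split into a different number of primitive strokes than the stored script, which is one plausible reason to use dynamic programming here.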

The Study on Lossy and Lossless Compression of Binary Hangul Textual Images by Pattern Matching (패턴매칭에 의한 이진 한글문서의 유.무손실 압축에 관한 연구)

  • 김영태;고형화
    • The Journal of Korean Institute of Communications and Information Sciences / v.22 no.4 / pp.726-736 / 1997
  • Textual image compression by pattern matching is a coding scheme that exploits the correlations between patterns. When Hangul (Korean character) text is compressed by pattern matching, the correlations between patterns may decrease because of random contacts between phonemes. In this paper, we therefore separate connected phonemes so that matches are induced and the correlation between patterns is exploited effectively. In the separation process, we decide whether a pattern has a vowel component, and then separate vowels that are connected to consonants. Compared with existing algorithms, the proposed algorithm increases the compression ratio by 1.3%-3.0% over PMS[5] in lossy mode and by 3.4%-9.1% in lossless mode over SPM[7], which was submitted to the standards committee for the second-generation binary compression algorithm.

A Study of Automatic Indexing Technique based on Logical Structure of SGML Hangul Document (SGML 한글문서의 논리적 구조에 근거한 색인기법에 관한 연구)

  • 유석종
    • Journal of the Korean Society for information Management / v.12 no.2 / pp.85-101 / 1995
  • Conventional indexing systems support only full-text indexing for electronic documents and do not use the logical structure of documents in retrieval. Most electronic documents are in different formats depending on the system that produced them, and these formats indicate only the physical style of the document without considering its logical structure. Thus, in an effort to standardize document exchange, ISO developed SGML (Standard Generalized Markup Language), which contains information about the logical structure of documents. In this paper, to overcome the disadvantages of full-text indexing and to use a standard document format, an indexing system for SGML documents is designed and implemented. In this system, the user can assign indexing domains to elements, so the logical structure of the document is reflected in information retrieval. Various retrieval methods can be implemented by using the structural information of the document. In addition, the system supports automatic indexing of SGML Hangul documents.

A Study on the Integrated Coding of Image and Document Data (영상과 문자정보의 통합 부호화에 관한 연구)

  • Lee, Huen-Joo;Park, Goo-Man;Park, Kyu-Tae
    • Journal of the Korean Institute of Telematics and Electronics / v.26 no.7 / pp.42-49 / 1989
  • This study proposes a new integrated coding method for embedding text information, including Hangul, into an image. A monochrome analog image can be quantized to a digital image with a few levels and displayed on bi-level output devices by using halftone processing techniques. Text data are embedded in each micro pattern. Based on this concept, the encoding and decoding algorithms are implemented and experiments are performed. As a result, the average amount of embedded text information is more than 8 bpp (bits per pixel) in the halftone-processed image converted from a $64 \times 64$ image, i.e., corresponding to 2000 characters in Hangul or 4000 alphanumeric characters. Using this algorithm, an integrated personal record management system is implemented.
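
The capacity figures quoted in this abstract can be checked with a short calculation. The abstract does not state the character encodings or which pixel grid the 8 bpp refers to, so this sketch assumes 8 embedded bits per pixel of the 64 × 64 source image, a 2-byte code per Hangul character, and 1 byte per alphanumeric character.

```python
# Assumptions (not stated in the abstract): 8 bits are embedded per pixel of
# the 64x64 source image, Hangul uses a 2-byte code, alphanumerics use 1 byte.
WIDTH, HEIGHT = 64, 64
BITS_PER_PIXEL = 8

capacity_bits = WIDTH * HEIGHT * BITS_PER_PIXEL   # total embedded bits
capacity_bytes = capacity_bits // 8               # same capacity in bytes
hangul_chars = capacity_bytes // 2                # 2 bytes per Hangul character
alnum_chars = capacity_bytes // 1                 # 1 byte per alphanumeric

print(capacity_bits, capacity_bytes, hangul_chars, alnum_chars)
```

Under these assumptions the capacity is 4096 bytes, that is, 2048 Hangul or 4096 alphanumeric characters, which is in line with the rounded 2000 and 4000 reported above.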

Recognition of Printed Hangul Text Using Circular Pattern Vectors (원형 패턴 벡터를 이용한 인쇄체 한글 인식)

  • Jeong, Ji-Ho;Choe, Tae-Yeong
    • Journal of the Institute of Electronics Engineers of Korea SP / v.38 no.3 / pp.269-281 / 2001
  • This paper presents a novel font-dependent Hangul recognition algorithm that uses circular pattern vectors and is invariant to translation, scaling, and rotation. The proposed algorithm removes noise from input letters using binary morphology and generates the circular pattern vectors, which represent the spatial distributions on several concentric circles around the center of gravity of a given letter. The algorithm then selects the letter that minimizes the distance between the reference vectors and the generated circular pattern vectors. To evaluate the performance of the proposed algorithm, the 2,350 completed-form (Wansung) Hangul letters in the Batang font were used as test images with scaling and rotational transformations. Experimental results show that the proposed algorithm achieves better recognition rates than the conventional algorithm based on ring projection for Hangul letters with scaling and rotational transformations.

Knowledge based Text to Facial Sequence Image System for Interaction of Lecturer and Learner in Cyber Universities (가상대학에서 교수자와 학습자간 상호작용을 위한 지식기반형 문자-얼굴동영상 변환 시스템)

  • Kim, Hyoung-Geun;Park, Chul-Ha
    • The KIPS Transactions:PartB / v.15B no.3 / pp.179-188 / 2008
  • In this paper, a knowledge-based text-to-facial-sequence-image system for interaction between lecturer and learner in cyber universities is studied. The system synthesizes a facial image sequence whose lip movement is synchronized with the text information, based on the grammatical characteristics of Hangul. To implement the system, we propose a method for transforming text information into phoneme codes, deformation rules for the mouth shape that change according to the phoneme codes, and a method for synthesizing the facial image sequence using these deformation rules. In the proposed method, all Hangul syllables are represented by 10 principal mouth shapes and 78 compound mouth shapes, according to the pronunciation characteristics of the basic consonants and vowels and the characteristics of the articulation rules, respectively. To synthesize the facial image sequence in real time on a PC, a database storing the 88 mouth shapes is used instead of synthesizing the mouth shape in each frame. To verify the validity of the proposed method, various facial image sequences were synthesized from text information, and a system that runs on a PC was implemented using the proposed method.
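
The first step, turning text into phoneme-level units, can be illustrated with the standard Unicode arithmetic for precomposed Hangul syllables. This sketch only splits each syllable into its initial consonant, vowel, and optional final consonant; the paper's actual phoneme codes and its mapping to the 88 mouth shapes are not given in the abstract and are not reproduced here. The function name `decompose` is hypothetical.

```python
# Jamo tables in Unicode order for precomposed syllables (U+AC00..U+D7A3).
LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"        # 19 initial consonants
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"   # 21 vowels
TAILS = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ",
         "ㄻ", "ㄼ", "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ",
         "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]  # 28 finals, "" = none

SYLLABLE_BASE = 0xAC00
SYLLABLE_LAST = 0xD7A3

def decompose(text):
    """Split each Hangul syllable into (initial, vowel, final) jamo.

    Characters outside the precomposed syllable block are passed through
    unchanged as 1-tuples.
    """
    result = []
    for ch in text:
        code = ord(ch)
        if SYLLABLE_BASE <= code <= SYLLABLE_LAST:
            index = code - SYLLABLE_BASE
            lead = index // (21 * 28)
            vowel = (index % (21 * 28)) // 28
            tail = index % 28
            result.append((LEADS[lead], VOWELS[vowel], TAILS[tail]))
        else:
            result.append((ch,))
    return result

print(decompose("한글"))
```

A mouth-shape lookup would then be keyed on these units, for example on the vowel and the final consonant, but that mapping is specific to the paper.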

Design of the Signature File Method for Hangul Text (한글 텍스트를 위한 요약 화일 기법의 설계)

  • Chang, Jae-Woo
    • Annual Conference on Human and Language Technology / 1991.10a / pp.247-256 / 1991
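  • To efficiently support new database applications that use text, various text retrieval techniques have been studied, and among them the signature file method has been proposed as an efficient retrieval technique. However, all of these studies target English text, and there has been almost no research on signature file techniques for Hangul text. This paper therefore designs a signature file technique suited to the characteristics of Hangul and examines the practicality and validity of the proposed technique.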
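
For readers unfamiliar with signature files, the general idea can be sketched with superimposed coding: each word sets a few bits in a fixed-width bit string, and a text block's signature is the bitwise OR of its word signatures. This illustrates the generic technique only; the Hangul-specific design proposed in the paper is not described in the abstract. The signature width, the number of bits per word, and the SHA-256-based hashing are assumptions chosen for the example.

```python
import hashlib

SIGNATURE_BITS = 64   # width of each signature (assumed)
BITS_PER_WORD = 3     # bits set per word (assumed)

def word_signature(word):
    """Set BITS_PER_WORD hash-selected bits for one word."""
    sig = 0
    for i in range(BITS_PER_WORD):
        digest = hashlib.sha256(f"{i}:{word}".encode("utf-8")).digest()
        position = int.from_bytes(digest[:4], "big") % SIGNATURE_BITS
        sig |= 1 << position
    return sig

def block_signature(words):
    """Superimpose (bitwise OR) the signatures of all words in a text block."""
    sig = 0
    for word in words:
        sig |= word_signature(word)
    return sig

def may_contain(block_sig, word):
    """True if the block possibly contains the word.

    A True answer can be a false drop and must be verified against the
    text; a False answer is definite.
    """
    query = word_signature(word)
    return (block_sig & query) == query

block = block_signature(["한글", "텍스트", "검색"])
print(may_contain(block, "한글"))
```

Searching scans the compact signatures first and reads only the blocks whose signature matches, which is why the method is attractive for large text collections.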

DES Algorithm and its Implementation in School Mathematics Education (DES를 이용한 암호의 이해와 활용 및 DES에서 한글 구현)

  • 정상조;박중수
    • Journal of the Korean School Mathematics Society / v.6 no.2 / pp.101-115 / 2003
  • DES is a very simple cryptosystem that uses only permutation in mathematics. Recently, AES was standardized based on DES. In this paper, we introduce DES and its implementation. In particular, we attempt to process Hangul in DES. This paper may be used in school mathematics education.

Hangul Text Detection using Text Corner Edge Feature Analysis in Natural Scene Images (자연영상에서 코너 에지 특징 분석방법을 이용한 한글 텍스트 검출기법에 관한 연구)

  • Park Jong-Cheon;Kwon Kyo-Hyun;Jun Byung-Min
    • Proceedings of the Korea Contents Association Conference / 2005.11a / pp.379-383 / 2005
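  • This study proposes a Hangul text detection method that uses the edge corner features of Hangul text in natural images. Edges are detected in the natural image, and an edge map with 20 types of edge structure components is generated from the detected edges. From this edge map, features characteristic of Hangul text are combined to extract a total of eight candidate features for text regions. The extracted text region features are scanned horizontally and vertically to detect the start and end lines of the text, which gives the horizontal coordinates of the text region. The final text regions are then determined from the extracted candidate regions. The proposed method performed well at detecting text regions in a variety of natural images.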
