• Title/Summary/Keyword: Hangul Document Information

Automatic translation system for hangul's romanization Based on the World Wide Web (웹 기반하의 국어의 로마자 전사 표기 자동 변환 시스템)

  • 김홍섭
    • Journal of the Korea Society of Computer and Information
    • /
    • v.7 no.4
    • /
    • pp.108-114
    • /
    • 2002
  • This Web-based automatic romanization system for Hangul converts Korean words, sentences, and documents into transliterated letters by applying an algorithm based on phonological principles, even if the user does not know the basic rules of Korean romanization. It refers to a character correspondence table that follows the official standard currently adopted for Korean romanization, and it can also convert the result to a machine-readable code. It provides fonts for toggling between Korean-English mode and insert-edit mode by assigning ASCII and Unicode codes that are rarely used. The program can be implemented in the C++ programming language with Unified Modeling Language, and supports various fonts, font expansion and condensing, and alternative printing.

  • PDF
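The table-driven conversion described in this abstract can be illustrated with a minimal Python sketch. It is not the author's C++ system: the three tables are an assumed, simplified letter-by-letter mapping loosely following the Revised Romanization of Korean, and the function name `romanize` is hypothetical. Sound changes across syllable boundaries, which the paper's phonological rules would presumably handle, are deliberately ignored.

```python
# Simplified romanization tables in Unicode jamo order (an assumed mapping,
# loosely following the Revised Romanization of Korean).
INITIALS = ["g", "kk", "n", "d", "tt", "r", "m", "b", "pp", "s", "ss", "",
            "j", "jj", "ch", "k", "t", "p", "h"]
MEDIALS = ["a", "ae", "ya", "yae", "eo", "e", "yeo", "ye", "o", "wa", "wae",
           "oe", "yo", "u", "wo", "we", "wi", "yu", "eu", "ui", "i"]
FINALS = ["", "k", "k", "k", "n", "n", "n", "t", "l", "k", "m", "l", "l",
          "l", "p", "l", "m", "p", "p", "t", "t", "ng", "t", "t", "k", "t",
          "p", "t"]

HANGUL_BASE = 0xAC00   # first precomposed syllable
HANGUL_LAST = 0xD7A3   # last precomposed syllable


def romanize(text):
    """Romanize precomposed Hangul syllables; pass other characters through."""
    out = []
    for ch in text:
        code = ord(ch)
        if HANGUL_BASE <= code <= HANGUL_LAST:
            index = code - HANGUL_BASE
            initial = index // (21 * 28)      # 21 medials x 28 finals per initial
            medial = (index % (21 * 28)) // 28
            final = index % 28
            out.append(INITIALS[initial] + MEDIALS[medial] + FINALS[final])
        else:
            out.append(ch)
    return "".join(out)


print(romanize("한글"))  # hangeul
```

Because each syllable is looked up independently, assimilation between adjacent syllables is not reflected in the output; a rule layer applied before the table lookup would be needed for that.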

Hangeul Stem Extraction Algorithm for Text Mining Based on Natural Language Processing (자연어 처리 기반 텍스트 마이닝을 위한 한글 어간 추출 알고리즘)

  • Choi, Ki-won;Choi, Seong-hun;Jo, Sang-hyeon;Kim, Hee-cheol
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2017.05a
    • /
    • pp.718-721
    • /
    • 2017
  • Natural language processing, which is the basis of text mining, differs depending on the language. In particular, Hangeul, which has relatively high freedom of expression compared to other languages, has words that take various forms depending on the endings used. The part that does not change across these forms is called the stem. For effective text mining, it is essential to extract words and unify their various forms. Therefore, this paper proposes a stem extraction algorithm for Hangul words for effective text mining of Hangul documents.

  • PDF
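The idea of stripping a varying ending to recover an unchanging stem can be sketched as follows. This is only an illustration under stated assumptions, not the algorithm proposed in the paper: the ending list `ENDINGS` is a small hypothetical sample, and the function name `extract_stem` is invented for this example.

```python
# Hypothetical sample of verb endings, longest first so that the longest
# matching ending is removed.
ENDINGS = sorted(
    ["습니다", "었다", "았다", "는다", "어서", "아서", "지만", "다", "고", "면"],
    key=len,
    reverse=True,
)


def extract_stem(word):
    """Strip the longest known ending, keeping at least one syllable of stem."""
    for ending in ENDINGS:
        if word.endswith(ending) and len(word) > len(ending):
            return word[: -len(ending)]
    return word


for form in ["먹었다", "먹습니다", "먹고", "먹으면"]:
    print(form, "->", extract_stem(form))
```

Plain suffix stripping fails on contracted and irregular forms (for example, a form in which the ending has fused with the final syllable of the stem), which is where a morphological analysis step would be required.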

Design of automatic translation system for hangul's romanization Based on the World Wide Web (웹 기반하의 국어의 로마자 표기 자동 변환 시스템 설계)

  • 김홍섭
    • Journal of the Korea Society of Computer and Information
    • /
    • v.6 no.4
    • /
    • pp.6-11
    • /
    • 2001
  • This Web-based automatic romanization system for Hangul converts Korean words, sentences, and documents into transliterated letters by applying an algorithm based on phonological principles, even if the user does not know the basic rules of Korean romanization. It refers to a character correspondence table that follows the official standard currently adopted for Korean romanization, and it can also convert the result to a machine-readable code. It provides fonts for toggling between Korean-English mode and insert-edit mode by assigning ASCII codes that are rarely used. The program can be implemented in the C++ programming language with Unified Modeling Language, and supports various fonts, font expansion and condensing, and alternative printing.

  • PDF

Keyword Spotting on Hangul Document Images Using Character Feature Models (문자 별 특징 모델을 이용한 한글 문서 영상에서 키워드 검색)

  • Park, Sang-Cheol;Kim, Soo-Hyung;Choi, Deok-Jai
    • The KIPS Transactions:PartB
    • /
    • v.12B no.5 s.101
    • /
    • pp.521-526
    • /
    • 2005
  • In this paper, we propose a keyword spotting system as an alternative search system for poor-quality Korean document images and compare the proposed system with an OCR-based document retrieval system. The system consists of character segmentation, feature extraction for the query keyword, and word-to-word matching. In the character segmentation step, we propose an effective method for removing the connectivity between adjacent characters and a character segmentation method that minimizes the variance of character widths. In the query creation step, the feature vector for the query is constructed by combining per-typeface character models. In the matching step, word-to-word matching is applied based on character-to-character matching. We demonstrate that the proposed keyword spotting system is more efficient than the OCR-based one for searching for a keyword in Korean document images, especially when document quality is poor and the point size is small.

An Image-based Word Matching Method for Large volume Printed Hangul Document Retrieval (대용량 인쇄 한글 문서 검색을 위한 영상 기반 단어 매칭 방법)

  • 진영범;오일석
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2000.10b
    • /
    • pp.461-463
    • /
    • 2000
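  • Searching for keywords in machine-printed document images is a core technology essential to many applications, but converting the images to text manually or with OCR software is limited by its high cost. Because original documents are now often stored as images, this paper adopts a retrieval method based on image-based matching. Features are the most important factor in character or word matching. This paper proposes a fast retrieval method that uses four-directional profile features, which are relatively simple to extract and whose dimensionality is easy to adjust, so that the method can be used for large-volume Hangul document retrieval, as in digital libraries, where the number of words to be matched ranges from tens of millions to billions. Experiments showed that meaningful retrieval performance can be obtained even with simple features of about 8 dimensions.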

  • PDF
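A four-directional profile feature of the kind mentioned in this abstract might be computed as in the sketch below. The details are assumptions made for illustration and are not taken from the paper: the word image is a binary grid (1 = ink), each profile is the distance from one border to the first ink pixel, and each profile is averaged into `bins_per_side` bins, so that two bins per side give an 8-dimensional vector. All function names are hypothetical.

```python
def first_ink_distance(lines):
    """For each line of pixels, distance from its start to the first ink pixel
    (the full line length if the line is blank)."""
    distances = []
    for line in lines:
        d = len(line)
        for i, value in enumerate(line):
            if value:
                d = i
                break
        distances.append(d)
    return distances


def bin_means(values, bins):
    """Average a profile into `bins` equal-width bins (requires len(values) >= bins)."""
    n = len(values)
    means = []
    for b in range(bins):
        chunk = values[b * n // bins:(b + 1) * n // bins]
        means.append(sum(chunk) / len(chunk))
    return means


def four_profile_features(img, bins_per_side=2):
    """Return left, right, top, bottom profile features, scaled to [0, 1]."""
    height, width = len(img), len(img[0])
    rows = img
    cols = [list(c) for c in zip(*img)]
    profiles = [
        (first_ink_distance(rows), width),                       # left
        (first_ink_distance([r[::-1] for r in rows]), width),    # right
        (first_ink_distance(cols), height),                      # top
        (first_ink_distance([c[::-1] for c in cols]), height),   # bottom
    ]
    features = []
    for profile, span in profiles:
        features.extend(m / span for m in bin_means(profile, bins_per_side))
    return features


word_image = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(four_profile_features(word_image))
```

Changing `bins_per_side` is the knob that adjusts dimensionality; two feature vectors can then be compared with any simple distance, which keeps per-word matching cheap for very large collections.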

A Study on the Integrated Coding of Image and Document Data (영상과 문자정보의 통합 부호화에 관한 연구)

  • Lee, Huen-Joo;Park, Goo-Man;Park, Kyu-Tae
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.26 no.7
    • /
    • pp.42-49
    • /
    • 1989
  • A new integrated coding method is proposed in this study for embedding text information, including Hangul, into an image. A monochrome analog image may be quantized to a digital image with a few levels and displayed on bi-level output devices using halftone processing techniques. Text data are embedded in each micro pattern. Based on this concept, the encoding and decoding algorithms are implemented and experiments are performed. As a result, the average amount of embedded text information is more than 8 bpp (bits per pixel) in the halftone-processed image converted from a $64{\times}64$ image, i.e., corresponding to 2000 characters in Hangul or 4000 alphanumeric characters. Using this algorithm, an integrated personal record management system is implemented.

  • PDF
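The capacity figures in this abstract can be checked with a short calculation. The sketch assumes, since the abstract does not say so explicitly, that the 8 bpp figure is counted per pixel of the $64{\times}64$ source image, that a Hangul character occupies 16 bits, and that an alphanumeric character occupies 8 bits.

```latex
\begin{align*}
\text{capacity} &\ge 64 \times 64 \times 8 = 32768 \text{ bits} = 4096 \text{ bytes} \\
\text{2000 Hangul characters} &: 2000 \times 16 = 32000 \text{ bits} \le 32768 \\
\text{4000 alphanumeric characters} &: 4000 \times 8 = 32000 \text{ bits} \le 32768
\end{align*}
```

Under these assumptions both character counts fit within the stated capacity, which is consistent with the abstract's "more than 8 bpp".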

Research on Designing Korean Emotional Dictionary using Intelligent Natural Language Crawling System in SNS (SNS대상의 지능형 자연어 수집, 처리 시스템 구현을 통한 한국형 감성사전 구축에 관한 연구)

  • Lee, Jong-Hwa
    • The Journal of Information Systems
    • /
    • v.29 no.3
    • /
    • pp.237-251
    • /
    • 2020
  • Purpose: This research studies a hierarchical Hangul emotion index by organizing the emotions that SNS users have in mind. In a preliminary study, the researcher reinterpreted Plutchik's (1980) English-based emotion standard in Korean and studied hashtags with implicit meaning on SNS. To build a multidimensional emotion dictionary and classify three-dimensional emotions, an emotion seed was selected for each of seven emotion sets, and an emotion word dictionary was constructed by collecting SNS hashtags derived from each emotion seed. The study also explores the priority of each Hangul emotion index. Design/methodology/approach: The words constituting each sentence were vectorized into a matrix, with weights extracted using TF-IDF (Term Frequency-Inverse Document Frequency), and the NMF (Nonnegative Matrix Factorization) algorithm was used to reduce the dimension of the matrix for each emotion set. The emotional dimension was determined using the characteristic values of the emotion words. Cosine distance was used to measure the distance between vectors, and thus the similarity of emotion words within an emotion set. Findings: Analyzing customer needs requires the ability to read changes in emotion, and research on Korean emotion words addresses that need. In addition, the ranking of emotion words within an emotion set can serve as a criterion for reading the depth of an emotion. By providing companies with effective information for emotional marketing, the sentiment index in this study is expected to expand and add value to new business opportunities. In addition, if the emotion dictionary is eventually connected to the emotional DNA of a product, it will be possible to define the "emotional DNA", the set of emotions that the product should have.
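Two of the techniques named in this abstract, TF-IDF weighting and cosine similarity, can be sketched in plain Python as follows. This is an illustrative sketch only: the toy hashtag lists are invented, the function names are hypothetical, the TF-IDF variant (relative term frequency times log(N/df)) is one common choice that the abstract does not specify, and the NMF dimension-reduction step is not covered.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per token list."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented toy data: each list stands for the hashtags of one post.
posts = [
    ["happy", "joy", "smile"],
    ["happy", "joy", "laugh"],
    ["sad", "tears", "gloom"],
]
vectors = tf_idf(posts)
print(cosine_similarity(vectors[0], vectors[1]))  # shares two terms
print(cosine_similarity(vectors[0], vectors[2]))  # shares no terms
```

Cosine distance, as used in the abstract, is one minus this similarity, so posts that share weighted terms end up closer together than posts that share none.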

Hangul/English Automatic Toggle System for Document Input (문서 입력을 위한 한/영 자동 토글 시스템)

  • Choi, Y.W.;Seo, D.R.;Kim, Y.S.;Park, Y.H.;Lee, K.S.;Kim, C.H.
    • Annual Conference on Human and Language Technology
    • /
    • 1992.10a
    • /
    • pp.321-328
    • /
    • 1992
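  • As computers become more widespread and more widely used, the need to write documents that mix Hangul and English keeps growing. Research on "user interfaces" that maximize user convenience and efficiency in computer hardware and software is also increasing, so a more convenient way of writing mixed Hangul/English documents is needed. This paper describes the development of an automatic toggle system that, when mixed Hangul/English text is entered through the keyboard, analyzes the entered text based on an input automaton and dictionary information and automatically converts it to Hangul or English. It also describes the problems the current system has not solved and directions for improvement.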

  • PDF

A Development of the Test Set for Estimating the Retrieval Performance of an Automatic Indexer (자동색인기 성능시험을 위한 Test Set 개발)

  • 김성혁;서은경;이원규;김명철;김영환;김재군
    • Journal of the Korean Society for Information Management
    • /
    • v.11 no.1
    • /
    • pp.81-102
    • /
    • 1994
  • With the development of various information retrieval systems suitable for Korean databases, many researchers have realized the need for a test collection that can be readily used for evaluating a retrieval system. Therefore, this study developed a test set that helps objectively evaluate the retrieval performance of a Hangul automatic indexer or a Korean information retrieval system. The developed test set has four files: 1) Korean Document Set (*.all); 2) Natural Language Query Set (KTsetnq1); 3) Boolean Query Set (Ktset.bq1); 4) Query-Relevance Judgment Set (KTsetrel).

  • PDF
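How a query-relevance judgment set is typically used to score a retrieval system can be sketched as below. The example is generic and does not read the test set's actual files, whose format is not described here; the document IDs and the function name `precision_recall` are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for one query.

    retrieved: iterable of document IDs returned by the system
    relevant:  set of document IDs judged relevant for the query
    """
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical IDs: the system returned four documents, three are relevant.
p, r = precision_recall(["d1", "d2", "d3", "d4"], {"d1", "d3", "d5"})
print(f"precision={p:.2f} recall={r:.2f}")
```

Averaging these per-query scores over all queries in the collection gives a single figure for comparing indexers or retrieval systems on the same documents.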

Construction of a Balanced Test Collection for Evaluation of Information Retrieval System (정보 검색 시스템 평가를 위한 균형 테스트 컬렉션 구축)

  • 맹성현;이석훈;이준호;이응봉;송사광
    • Journal of the Korean Society for Information Management
    • /
    • v.16 no.2
    • /
    • pp.135-148
    • /
    • 1999
  • There has been some research in Korea on test collections for evaluation of information retrieval (IR) systems. The test collections constructed as an outcome of that research have provided a starting point and opportunities to test Korean IR systems in an objective manner. However, they fall well short of standard practice in the broader IR community in that they are small and usually unbalanced in terms of the characteristics of the documents and the queries (such as the subject domains). In this article, we describe our research effort to alleviate this problem and the resulting test collection, called HANTEC (Hangul TEst Collection). HANTEC is balanced in terms of subject domains, document lengths, and user types, and currently consists of 120,000 documents divided into three groups: general area, social science area, and science/technology area. The 30 queries in the collection are grouped into the same three areas in one dimension and into three distinct user groups in the other dimension.

  • PDF