Rule Based Document Conversion and Information Extraction on the Word Document (워드문서 콘텐츠의 사용자 XML 콘텐츠로의 변환 및 저장 시스템 개발)

  • Joo, Won-Kyun;Yang, Myung-Seok;Kim, Tae-Hyun;Lee, Min-Ho;Choi, Ki-Seok
    • Proceedings of the Korea Contents Association Conference
    • 2006.11a
    • pp.555-559
    • 2006
  • This paper aims to extract and store various forms of information of interest to users by using user-defined structural rules and XML-based word document conversion techniques. The system, named PPE, consists of three essential elements: a converting element, which converts word documents such as HWP and DOC into XML documents; an extracting element, which prepares structural rules and uses them to extract the relevant information from the XML document; and a storing element, which generates the final XML document or stores it in a database system. For word document conversion, we developed an OCX-based conversion daemon. To help users extract information, we developed a script language, extended from XSLT, with a native function and variable processing engine. The system can be used to build databases of word document content or to provide various information services based on raw word documents. We applied it to a project management system and a project result management system.
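
The extracting and storing elements described in this abstract can be illustrated with a minimal Python sketch. It is not the PPE script language, which the abstract says extends XSLT; ElementTree path expressions are swapped in as a stand-in. The raw XML layout, the `RULES` table, and the `extract` function are all hypothetical names introduced for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical raw XML, standing in for the output of the converting element.
RAW_XML = """
<document>
  <section title="Project">
    <table>
      <row><cell>Name</cell><cell>PPE</cell></row>
      <row><cell>Year</cell><cell>2006</cell></row>
    </table>
  </section>
</document>
"""

# Hypothetical structural rules: output field -> ElementTree path expression.
RULES = {
    "name": "./section/table/row[1]/cell[2]",
    "year": "./section/table/row[2]/cell[2]",
}


def extract(raw_xml, rules):
    """Apply each rule to the raw XML and build the final XML record."""
    root = ET.fromstring(raw_xml)
    record = ET.Element("record")
    for field, path in rules.items():
        node = root.find(path)
        child = ET.SubElement(record, field)
        # A rule that matches nothing yields an empty field.
        child.text = node.text if node is not None else ""
    return record


record = extract(RAW_XML, RULES)
print(ET.tostring(record, encoding="unicode"))
```

Keeping the rules in a plain table, separate from the extraction code, mirrors the abstract's idea that users write the rules while the engine stays fixed.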

Composite Document Object Retrieval and Searching System-[IN2] DOR (복합문서 개체 검색 시스템- [IN2] DOR)

  • Ahn, Tae-Sung;Yim, Joong-Su;Kim, Myung-Hoon;Ahn, Woo-Ram;Lee, Kyung-Il
    • Annual Conference on Human and Language Technology
    • 2003.10d
    • pp.113-118
    • 2003
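  • Conventional document retrieval systems simply extract the text from a document and then index and search that text. This paper proposes a method for developing a system that analyzes, indexes, and retrieves document objects such as text, tables, images, charts, and video from documents in various formats, including MS Word, Excel, and HWP. The proposed system converts a document's internal data structure into CDML (Composite Document Markup Language) and then indexes and stores it, effectively overcoming the limitations of existing full-text search systems. It also improves user convenience by implementing a technique that automatically navigates to the matching object within the document and highlights it. In a performance evaluation, the system showed average CDML conversion and object retrieval success rates of over 97% across various document formats, and because objects are extracted directly from the binary files, very high analysis and indexing speeds were achieved. The new paradigm of document retrieval introduced in this paper is expected to have a range of technical and commercial effects.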
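
The idea of indexing document objects by type, rather than indexing flat text, can be sketched in Python as follows. The CDML schema is not given in the abstract, so the object records, `build_index`, and `search` below are hypothetical and only illustrate the general technique of an inverted index keyed by object type and term.

```python
from collections import defaultdict

# Hypothetical document objects, standing in for entries parsed from CDML.
OBJECTS = [
    {"id": 0, "type": "text", "content": "Quarterly sales summary"},
    {"id": 1, "type": "table", "content": "Sales by region"},
    {"id": 2, "type": "chart", "content": "Sales trend"},
]


def build_index(objects):
    """Map (object type, term) to the ids of the objects containing the term."""
    index = defaultdict(set)
    for obj in objects:
        for term in obj["content"].lower().split():
            index[(obj["type"], term)].add(obj["id"])
    return index


def search(index, term, obj_type):
    """Return the ids of objects of the given type that contain the term."""
    return sorted(index.get((obj_type, term.lower()), set()))


index = build_index(OBJECTS)
print(search(index, "sales", "table"))
```

Because each hit is an object id rather than a whole document, a viewer could use it to jump to and highlight the matching object, which is the behavior the abstract describes.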

Automatic Reading System for On-off Type DNA Chip

  • Ryu, Mun-Ho;Kim, Jong-Dae;Kim, Jong-Won
    • Journal of Information Processing Systems
    • v.2 no.3 s.4
    • pp.189-193
    • 2006
  • In this study, we propose an automatic reading system for diagnostic DNA chips. We define a general specification for an automatic reading system and propose a possible implementation method. The proposed system performs the whole reading process automatically without any user intervention, covering image acquisition, image analysis, and report generation. We applied the system to the automatic report generation of a commercialized DNA chip for cervical cancer detection. The fluorescence image of the hybridization result was acquired with a $GenePix^{TM}$ scanner using its library running in HTML pages. The processing of the acquired image and the report generation were executed by a component object module programmed with Microsoft Visual C++ 6.0. To generate the report document, we made an HWP 2002 document template with marker strings that are searched for and replaced with the corresponding information, such as patient information and diagnosis results. The proposed system generates the report document by reading the template and replacing the marker strings with the resulting content. The system is expected to facilitate the use of diagnostic DNA chips for mass screening by automating the conventional manual reading process, shortening its processing time, and quantifying the reading criteria.
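
The template-based report generation described in this abstract amounts to search-and-replace on marker strings. The sketch below shows the idea in Python on plain text; the original system works on an HWP 2002 template through a component written in Visual C++, and the marker syntax, field names, and sample values here are assumptions made for illustration.

```python
# Hypothetical marker strings; the paper does not list the actual markers.
TEMPLATE = (
    "Patient: {{PATIENT_NAME}}\n"
    "Chip ID: {{CHIP_ID}}\n"
    "Result: {{DIAGNOSIS}}\n"
)


def fill_template(template, values):
    """Replace every marker string in the template with its value."""
    report = template
    for marker, value in values.items():
        report = report.replace(marker, value)
    return report


# Illustrative sample values, not data from the paper.
report = fill_template(
    TEMPLATE,
    {
        "{{PATIENT_NAME}}": "Patient A",
        "{{CHIP_ID}}": "chip-001",
        "{{DIAGNOSIS}}": "negative",
    },
)
print(report)
```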

Validation and the Format of the Electronic Record Digital Component Technology Research (전자기록 디지털컴포넌트의 포맷과 유효성 검증 기술 연구)

  • Lee, Jae-Young;Choi, Joo-Ho
    • Journal of Korean Society of Archives and Records Management
    • v.12 no.3
    • pp.29-46
    • 2012
  • Without knowledge of their content file formats, electronic records are merely series of bits. There are numerous formats, and any of them may become obsolete. For long-term preservation, it is essential to understand and manage formats. In addition to managing the format itself, accurate information about the format needs to be stored with electronic records. In this study, we developed a tool that extracts header information from various types of electronic files without visual inspection, together with a file extension validation tool that compares formats and validates digital components.
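
Comparing a file's header against its extension can be sketched as below. This is not the authors' tool: the `SIGNATURES` table and the function name are hypothetical, and only a few widely documented file signatures (PDF, PNG, ZIP-based DOCX, OLE2-based DOC) are included.

```python
import os

# Leading bytes ("magic numbers") for a few common formats.
SIGNATURES = {
    ".pdf": b"%PDF",
    ".png": b"\x89PNG\r\n\x1a\n",
    ".docx": b"PK\x03\x04",  # DOCX is a ZIP container
    ".doc": b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1",  # OLE2 compound file
}


def header_matches_extension(filename, header):
    """Return True if the header bytes start with the signature for the extension."""
    ext = os.path.splitext(filename)[1].lower()
    signature = SIGNATURES.get(ext)
    if signature is None:
        raise ValueError(f"no known signature for extension {ext!r}")
    return header.startswith(signature)


# A PDF header under a .pdf name is consistent; a ZIP header is not.
print(header_matches_extension("report.pdf", b"%PDF-1.4\n"))
print(header_matches_extension("report.pdf", b"PK\x03\x04rest"))
```

In practice the header bytes would be read from the first few bytes of the file; passing them in as an argument keeps the check independent of file access.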

Spectral Characteristics of Multiwavelength-Switchable First-Order Fiber Flexible Filter based on Polarization-Diversity Loop (편광상이 고리 형태의 다파장 스위칭 가능한 1차 광섬유 유연 필터의 스펙트럼 특성)

  • Park, Kyoungsoo;Kim, Youngho;Lee, Yong Wook
    • Journal of the Korean Institute of Illuminating and Electrical Installation Engineers
    • v.28 no.8
    • pp.6-13
    • 2014
  • In this paper, we propose a multiwavelength-switchable first-order fiber flexible filter based on a polarization-diversity loop. The proposed filter consists of a polarization beam splitter, three half-wave plates (HWPs), and two high birefringent fibers (HBFs). Inserting an HWP between the two HBFs gives the filter good flexibility in adjusting the relative angular difference between their principal axes. First-order flat-top or narrow-band transmission spectra and zeroth-order transmission spectra, each with a channel spacing of ~0.8 nm, could be obtained by controlling the three HWPs, and each of them could also be interleaved. In addition, zeroth-order transmission spectra with a channel spacing of ~0.8 nm could be flexibly converted into those with a channel spacing of ~0.4 nm through control of the three HWPs, and could also be interleaved. The transmission characteristics of the proposed filter were theoretically analyzed and experimentally verified.
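
As background on why an HWP can adjust the relative angle between fiber axes, the standard Jones-matrix description of an ideal half-wave plate is given below. It is textbook polarization optics rather than a formula from the paper; the symbols $\theta$ (orientation of the plate's fast axis) and $\alpha$ (input linear polarization angle) are introduced here for illustration, and a global phase factor is dropped.

```latex
J_{\mathrm{HWP}}(\theta) =
\begin{pmatrix}
\cos 2\theta & \sin 2\theta \\
\sin 2\theta & -\cos 2\theta
\end{pmatrix},
\qquad
J_{\mathrm{HWP}}(\theta)
\begin{pmatrix} \cos\alpha \\ \sin\alpha \end{pmatrix}
=
\begin{pmatrix} \cos(2\theta - \alpha) \\ \sin(2\theta - \alpha) \end{pmatrix}
```

Linear polarization at angle $\alpha$ therefore leaves the plate at angle $2\theta - \alpha$, so rotating the plate by $\theta$ rotates the output polarization by $2\theta$.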

Data Input and Output of Unstructured Data of Large Capacity (대용량 비정형 데이터 자료 입력 및 출력)

  • Sim, Kyu-Cheol;Kang, Byung-Jun;Kim, Kyung-Hwan;Jung, Hoe-Kyung
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • 2013.05a
    • pp.613-615
    • 2013
  • Requests for services based on XML versions of word-processor files have recently been increasing. This paper presents a system that converts input word-processor files (HWP, MS-Office) into XML files and stores their data in a database by extracting it directly according to an XML mapping file that the user creates. The required data can also be retrieved from the database and placed into previously created word-processor forms, so that an application program can generate a word-processing document.
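
The mapping-file approach in this abstract can be sketched in Python with the standard `sqlite3` and `xml.etree.ElementTree` modules. The mapping format, table name, and document layout below are hypothetical; the abstract does not specify them, and the conversion from HWP or MS-Office to XML is assumed to have already happened.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML produced from a word-processor file.
DOC_XML = "<form><applicant>Kim</applicant><amount>1200</amount></form>"

# Hypothetical user-written mapping file: XML path -> database column.
MAPPING_XML = """
<mapping table="requests">
  <field column="applicant" path="./applicant"/>
  <field column="amount" path="./amount"/>
</mapping>
"""


def store(conn, doc_xml, mapping_xml):
    """Extract the mapped fields from the document and insert them as one row."""
    doc = ET.fromstring(doc_xml)
    mapping = ET.fromstring(mapping_xml)
    fields = mapping.findall("field")
    table = mapping.get("table")
    columns = [f.get("column") for f in fields]
    values = [doc.findtext(f.get("path"), default="") for f in fields]
    # Table and column names come from the mapping file, which is assumed trusted.
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(
        f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})",
        values,
    )


conn = sqlite3.connect(":memory:")
store(conn, DOC_XML, MAPPING_XML)
row = conn.execute("SELECT applicant, amount FROM requests").fetchone()
print(row)
```

The retrieved row could then be written into a prepared form to regenerate a document, which is the output direction the abstract describes.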

Design and Implementation of Standard Document Management System (XML을 적용한 표준 문서 관리 시스템의 설계 및 구현)

  • 이준섭;유정연;권석훈;나재열;이규철;구경철;박기식;박치항
    • Journal of the Korean Society for Library and Information Science
  • Demand for information exchange is increasing with rapid advances in science and technology. However, differing system environments cause many problems for information exchange. XML-based information exchange is one solution to this problem. It is effective in standard document management applications, where many researchers cooperate to produce standard documents. This paper presents the design and implementation of a system model for efficiently exchanging, storing, searching, and managing XML-based documents during the process of establishing standard documents.

A Study of Document Format for Effective Transmission on the Internet Environments (인터넷환경하에서 효율적 전송을 위한 문서형식에 관한 연구)

  • Cho, Hyun-Yang;Choi, Hung-Sik
    • Journal of the Korean Society for Library and Information Science
    • v.34 no.1
    • pp.229-242
    • 2000
  • Today, we are confronted with huge amounts of data containing complex documents, images, and multimedia content. A new method is therefore needed to analyze and manage mathematical expressions and to extract new information from them. It is increasingly important to manage document files containing mathematical expressions generated by general-purpose word processors. Three major word processors, HWP, TeX, and MS Word, together account for over 90% of the domestic market. With the progress of the Internet and digital libraries, it is necessary to develop a system to manage document files containing mathematical expressions over the Web.

Rule Based Document Conversion and Information Extraction on the Word Document (전자문서의 XML 문서로의 변환 및 저장 시스템)

  • Joo, Won-Kyun;Yang, Myung-Seok;Kim, Tae-Hyun;Lee, Min-Ho;Choi, Ki-Seok
    • Proceedings of the Korean Information Science Society Conference
    • 2006.06c
    • pp.106-108
    • 2006
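  • This paper concerns a method for effectively extracting (converting) and storing information in various forms (tables, lists, etc.) from the areas of interest to the user in electronic documents such as HWP and DOC, using structural rules provided by the user and XML-based electronic document conversion techniques. The proposed system consists of three key elements: 1) a method for converting an electronic document into a raw XML document; 2) XML-based structural rules and a method for extracting (converting) information from the raw XML document using the rules written; and 3) a method for generating the final XML from the extracted information or storing it in a DB. For electronic document conversion, we developed an independently running OCX-based electronic document conversion daemon, and to assist the user's information extraction (conversion) process, we developed a script language that extends XSLT. The script language has a relatively simple syntax and uses its own defined functions and variables for data processing. The extracted information can be generated in the desired data format or stored in a DB. The system is useful for building databases of, and providing services on, the original text of electronic documents, or for providing various status statistics from the resulting databases. It was applied in practice to a research project management system and a research outcome information system, which demonstrated its effectiveness.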

An Optical Configuration for the Normally Black Twisted Nematic Liquid Crystal Cell (꼬인 네마틱 액정 셀의 Normally Black 모드 광학설계)

  • Kim, Ki-Han;Baek, Jong-In;Kim, Jae-Chang;Yoon, Tae-Hoon
    • Korean Journal of Optics and Photonics
    • v.19 no.1
    • pp.48-53
    • 2008
  • We propose an optical configuration to compensate for the dispersion characteristics in the dark state of a normally black twisted nematic (NB-TN) liquid crystal display (LCD). We employed a half-wave plate (HWP) and a +A plate to achieve a superior dark state. Using the parameter space diagram (PSD) method, we obtained the optimum parameter values and a high contrast ratio of over 500:1. Furthermore, excellent dispersion characteristics were also obtained in the bright state. We confirmed the performance of the proposed structure through both numerical calculations and experiments.