• Title/Abstract/Keywords: Document Order

Search results: 777 items (processing time 0.028 s)

문서 영상의 영역 분류와 회전각 검출 (A Block Classification and Rotation Angle Extraction for Document Image)

  • 모문정;김욱현
    • 정보처리학회논문지B / Vol. 9B, No. 4 / pp.509-516 / 2002
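  • This paper proposes an efficient algorithm for recognizing document images that contain diverse information such as pictures, text, tables, and lines. The system consists of a rotation angle detection stage that corrects the skew of the document image, a stage that removes unnecessary background regions, and a classification stage that detects each component contained in the document image. To minimize errors caused by document skew, the algorithm performs a preprocessing step that detects the rotation angle and corrects the document based on the detected angle. The rotation angle is detected using only the horizontal and vertical components of the input document image, and computation time is minimized by removing unnecessary background regions during component detection. The algorithm then classifies the various components contained in the image, such as picture, text, table, and line regions. To evaluate the proposed document recognition system, the method was applied to a variety of document images and produced successful results.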

상호대차에 의한 원문복사서비스의 도서관 면책에 관한 연구 (A Study on Library Exemption of Document Delivery Service by Interlibrary Loan)

  • 홍재현
    • 정보관리학회지 / Vol. 22, No. 1 / pp.21-45 / 2005
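  • Document delivery service through interlibrary loan is an advanced form of service that meets users' information needs through the shared use of information. At present, interpretations differ on whether the library exemption applies to document delivery through interlibrary loan. The purpose of this study is to present a legal solution to the copyright issues raised by document delivery using Fax and Ariel systems. To this end, international trends in applying exemptions to document delivery through interlibrary loan were reviewed. Interpretations of how the exemption applies to document delivery under the current Korean Copyright Act were analyzed, and their problems were identified. Based on this analysis, the study concretely proposes legal amendments, including draft provisions, to resolve the problems of the library exemption provisions in the current Copyright Act that relate to document delivery service. The proposed amendments may therefore serve as basic material for revising the library exemption provisions in 2005 or later.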

전자정부내 의미기반 기술 도입에 따른 기능 및 정책 연구 (Research on Function and Policy for e-Government System using Semantic Technology)

  • 고광섭;장영철;이창훈
    • 한국디지털정책학회: Conference Proceedings / 한국디지털정책학회 2007 Spring Conference / pp.79-87 / 2007
  • This paper aims to offer a solution based on semantic document classification to improve e-Government utilization and efficiency for people using their own information retrieval system and linguistic expression. Generally, a semantic document classification method classifies documents based on the diverse relationships between keywords in a document without fully describing the hierarchical concepts between keywords. Our approach considers the deep meanings within the context of the document and radically enhances information retrieval performance. The Concept Weight Document Classification (CoWDC) method, which goes beyond existing keyword and simple thesaurus/ontology methods by fully considering the concept hierarchy of various concepts, is proposed, tested, and evaluated. To verify the superiority of the semantic retrieval technology through the CoWDC test results and to integrate it efficiently into the e-Government, we recognized the need for creation of a thesaurus, management of the operating system, expansion of the knowledge base, and improvements in search service and accuracy at the national level.

Document Classification Model Using Web Documents for Balancing Training Corpus Size per Category

  • Park, So-Young;Chang, Juno;Kihl, Taesuk
    • Journal of information and communication convergence engineering / Vol. 11, No. 4 / pp.268-273 / 2013
  • In this paper, we propose a document classification model using Web documents as a part of the training corpus in order to resolve the imbalance of the training corpus size per category. For the purpose of retrieving the Web documents closely related to each category, the proposed document classification model calculates the matching score between word features and each category, and generates a Web search query by combining the higher-ranked word features and the category title. Then, the proposed document classification model sends each combined query to the open application programming interface of the Web search engine, and receives the snippet results retrieved from the Web search engine. Finally, the proposed document classification model adds these snippet results as Web documents to the training corpus. Experimental results show that the method that considers the balance of the training corpus size per category exhibits better performance in some categories with small training sets.
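The query-generation step described in this abstract can be illustrated with a minimal sketch. The abstract does not define the matching score, so the score used here (in-category count weighted by the share of the word's occurrences that fall in that category) is an assumption, as are the function name `build_queries` and the `top_n` parameter. The call to the Web search engine's open API is left out.

```python
from collections import Counter, defaultdict


def build_queries(labeled_docs, top_n=2):
    """labeled_docs: iterable of (category_title, text). Returns {category: query}."""
    per_category = defaultdict(Counter)
    overall = Counter()
    for category, text in labeled_docs:
        words = text.lower().split()
        per_category[category].update(words)
        overall.update(words)

    queries = {}
    for category, counts in per_category.items():
        # Assumed matching score: count in this category times the fraction
        # of the word's total occurrences that fall in this category.
        # Ties are broken alphabetically so the result is deterministic.
        ranked = sorted(counts, key=lambda w: (-counts[w] * counts[w] / overall[w], w))
        # Combine the category title with the higher-ranked word features.
        queries[category] = " ".join([category] + ranked[:top_n])
    return queries
```

Each returned string would then be sent to a search API, and the returned snippets added to the training corpus of the corresponding category.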

조선총독부 공문서(公文書) 제도 -기안(起案)에서 성책(成冊)까지의 과정을 중심으로- (The Chosun Governor General Office's Administration regarding Official Documents)

  • 이승일
    • 기록학연구 / No. 9 / pp.3-40 / 2004
  • This article examines the elements usually included in official documents issued by the Chosun Governor General office, the process by which a document was put together and legally authorized, and its path of circulation and preservation. To create an official document of the Governor General office with legal authorization, a draft of a bill had to go through several discussions and a subsequent agreement before it was finally approved. Personnel involved in the discussion stage had the authority to ask for modifications and retouching of the draft, and the modifications were all recorded to make clear who was responsible for a certain change or who objected to what at any given stage of the process. The approved version of an official document was called the 'Completed one' (成案), and it was issued after the contents were turned into a fair copy by the office that originated the draft. With the original finalized version left in the custody of that office, the fair copy was handed over to the Document department, which was responsible for issuing outgoing documents. After the document was issued and the orders it contained were carried out, the originally involved offices began to classify the documents according to their own standards and measures for safekeeping, but it was the Document department that was mainly responsible for document preservation. The Document department classified the documents according to related offices, nature of the documents (편찬류별), and most suitable preservation methods (보존종별). The documents were made into books, and documents to be destroyed were handed over to the Account office, where they would be disposed of. The document processing practice of the Chosun Governor General office was in fact a modified version of that of the Japanese government. Modifications were made so that the process would be more suitable to the situations and environment of Chosun society. The office's management process was inherited by the Chosun government after the Liberation and had a significant impact on the document management practices of the Korean authorities. The official document administration of the Chosun Governor General office marked both the beginning of colonial document administration and the beginning of a modernized document management system.

XML Repository System Using DBMS and IRS

  • Kang, Hyung-Il;Yoo, Jae-Soo;Lee, Byoung-Yup
    • International Journal of Contents / Vol. 3, No. 3 / pp.6-14 / 2007
  • In this paper, we design and implement an XML Repository System (XRS) that exploits the advantages of DBMSs and IRSs. Our scheme uses BRS to support full-text indexing and content-based queries efficiently, and ORACLE to store XML documents, multimedia data, DTDs, and structure information. We design databases to manage XML documents that include audio, video, and images as well as text. We employ the non-composition model when storing XML documents in ORACLE. We represent structure information as ETID (Element Type Id), SORD (Sibling ORDer), and SSORD (Same Sibling ORDer). ETID is a unique value assigned to each element of the DTD. SORD represents the order among sibling nodes, and SSORD represents the order among sibling nodes with the same element. To show the superiority of our XRS, we perform various experiments on document loading time, document extraction time, and content retrieval time. The experiments show that our XRS outperforms the existing XML document management systems. We also show through performance experiments that it supports various types of queries.
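As a rough illustration of the three structure values named in this abstract, the sketch below annotates every element of a small XML document with an ETID, SORD, and SSORD. It is not the paper's storage scheme: the function name `annotate` is hypothetical, and ETIDs are assumed to be numbered by first appearance of each element type rather than read from a DTD.

```python
import xml.etree.ElementTree as ET


def annotate(xml_text):
    """Return (path, ETID, SORD, SSORD) for every element, in document order."""
    root = ET.fromstring(xml_text)
    etids = {}  # element type name -> ETID (assumed: numbered by first appearance)
    rows = []

    def visit(elem, path, sord, ssord):
        etid = etids.setdefault(elem.tag, len(etids) + 1)
        rows.append((path, etid, sord, ssord))
        same_tag_count = {}
        for position, child in enumerate(elem, start=1):
            # SORD: position among all siblings.
            # SSORD: position among siblings with the same element type.
            same_tag_count[child.tag] = same_tag_count.get(child.tag, 0) + 1
            visit(child, f"{path}/{child.tag}", position, same_tag_count[child.tag])

    visit(root, "/" + root.tag, 1, 1)
    return rows
```

For `<book><title/><chapter/><chapter><title/></chapter></book>`, the second `chapter` gets SORD 3 (third child of `book`) and SSORD 2 (second `chapter` among its siblings), while both `title` elements share one ETID.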

기계 조립품 정보의 표현을 위한 XML 기반 공용문서 구조 개발 (Development of Common Document Structure based on XML for Representing Mechanical Part Assembly Information)

  • 정태형;박승현;윤성원
    • 한국공작기계학회: Conference Proceedings / 한국공작기계학회 2002 Fall Conference Proceedings / pp.359-364 / 2002
  • In an engineering design environment, it is hard to link design data and systems because their types are disparate. Therefore, the importance of metadata has increased. Some research has been conducted to develop metadata, but the results cannot interact with other metadata and are difficult to extend. The purpose of this paper is to develop a common metadata structure that represents the general information of a mechanical part assembly using XML, and to use it as a base document for integrating design data and systems. It is composed of part and assembly documents. A part document represents the information of a part independently of part type. An assembly document represents the location of the part documents that compose an assembly. Common documents can be used as a broker between design data and systems and improve the interpretability and reusability of documents. We applied the developed common document structure to a 2-stage spur gear drive.

Deep Learning Document Analysis System Based on Keyword Frequency and Section Centrality Analysis

  • Lee, Jongwon;Wu, Guanchen;Jung, Hoekyung
    • Journal of information and communication convergence engineering / Vol. 19, No. 1 / pp.48-53 / 2021
  • Herein, we propose a document analysis system that analyzes papers or reports transformed into XML(Extensible Markup Language) format. It reads the document specified by the user, extracts keywords from the document, and compares the frequency of keywords to extract the top-three keywords. It maintains the order of the paragraphs containing the keywords and removes duplicated paragraphs. The frequency of the top-three keywords in the extracted paragraphs is re-verified, and the paragraphs are partitioned into 10 sections. Subsequently, the importance of the relevant areas is calculated and compared. By notifying the user of areas with the highest frequency and areas with higher importance than the average frequency, the user can read only the main content without reading all the contents. In addition, the number of paragraphs extracted through the deep learning model and the number of paragraphs in a section of high importance are predicted.
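The frequency-based part of this pipeline can be sketched as follows. This is only an approximation under stated assumptions: keywords are taken to be lowercase word tokens outside a tiny hand-picked stopword list, sections are contiguous runs of paragraphs, and the helper names are hypothetical. The importance (centrality) calculation and the deep learning model are not specified in the abstract and are not covered.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on"}  # assumed


def tokenize(text):
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def top_keywords(paragraphs, k=3):
    """Most frequent k keywords across all paragraphs."""
    counts = Counter(w for p in paragraphs for w in tokenize(p))
    return [w for w, _ in counts.most_common(k)]


def keyword_paragraphs(paragraphs, keywords):
    """Keep paragraphs containing a keyword, in order, dropping duplicates."""
    seen, kept = set(), []
    for p in paragraphs:
        if p not in seen and any(k in tokenize(p) for k in keywords):
            seen.add(p)
            kept.append(p)
    return kept


def section_frequencies(paragraphs, keywords, n_sections=10):
    """Split paragraphs into up to n_sections contiguous sections and
    count keyword occurrences in each section."""
    n = len(paragraphs)
    k = min(n_sections, n)
    sections = [paragraphs[i * n // k:(i + 1) * n // k] for i in range(k)]
    return [sum(tokenize(p).count(w) for p in s for w in keywords) for s in sections]
```

With fewer than ten extracted paragraphs, this sketch yields one section per paragraph rather than ten sections.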

Feature Extraction Method for the Character Recognition of the Low Resolution Document

  • Kim, Dae-Hak;Cheong, Hyoung-Chul
    • Journal of the Korean Data and Information Science Society / Vol. 14, No. 3 / pp.525-533 / 2003
  • In this paper we introduce some existing preprocessing algorithms for character recognition and consider feature extraction methods for recognizing low-resolution documents. Low-resolution document images, including fax images, are frequently misclassified due to blurring, slope effects, noise, and so on. To overcome these difficulties in character recognition, we consider mesh feature extraction and the contour direction code feature. A system for automatic character recognition is suggested.
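A mesh feature is commonly computed by laying a grid over the binarized character image and taking the ink density of each cell. The sketch below shows that general idea only; the grid size, the 0/1 image representation, and the function name `mesh_features` are assumptions, not details from the paper, and the contour direction code feature is not covered.

```python
def mesh_features(image, rows=4, cols=4):
    """image: list of equal-length rows of 0/1 ints (1 = ink).
    Returns rows * cols ink densities, scanned row by row."""
    height, width = len(image), len(image[0])
    features = []
    for r in range(rows):
        top, bottom = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            cell = [image[y][x] for y in range(top, bottom) for x in range(left, right)]
            # Density of ink pixels in this cell; empty cells count as 0.0.
            features.append(sum(cell) / len(cell) if cell else 0.0)
    return features
```

Because each value is an average over a cell, isolated noise pixels and slight blurring change the feature vector less than they change the raw pixels.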

구조적 문서의 효율적인 구조 질의 처리 및 검색을 위한 알고리즘의 설계 (Design of Algorithm for Efficient Retrieve Pure Structure-Based Query Processing and Retrieve in Structured Document)

  • 김현주
    • 한국컴퓨터산업학회논문지 / Vol. 2, No. 8 / pp.1089-1098 / 2001
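  • The structural information in a structured document can be used to represent various access paths into the document. To exploit this structural information, the structure of the document must be indexed. Because structural information is stored for each document in addition to the content index, the space needed for indexing grows. It must therefore be possible to process pure structure queries, which are based on the pure structure of a document such as containment relationships and ordering among elements, while minimizing index space overhead. This paper presents a structure index and the GDIT data structure, which can efficiently process several types of structure-related queries while minimizing index space overhead. The proposed structure index indexes only the lowest-level elements in a document and is not affected by the number of documents in which the searched element occurs. Based on this index structure, we show how pure structure queries are processed and evaluate performance with respect to index space. The proposed index structure is based on the GDIT concept [2] and uses a GDIT-based indexing technique.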
