• Title/Summary/Keyword: document classification

Search Result 443, Processing Time 0.029 seconds

A Study on the Theory and Historical Development of Official Document Classification Scheme in Korea - Since Chosun Dynasty to Current Korea Government - (문서분류의 이론과 변천에 관한 연구 - 조선조이후 현행 '정부공문서분류'까지 -)

  • Choe, Jung-Tai;Lee, Ju-Yeon
    • Journal of Korean Society of Archives and Records Management
    • /
    • v.3 no.2
    • /
    • pp.1-33
    • /
    • 2003
  • This study is to aim on the theory of document classification system and historical development of official document classification scheme since Chosun dynasty to Republic of Korea. We have been new version of classification scheme 'Document Classification Standard' is scheduled in 2004, though there are many fundamental problems in governmental agencies and record centers. Thus new 'Document Classification Standard' should be make discussion and inquire.

A Document Classification System Using Modified ECCD and Category Weight for each Document (Modified ECCD 및 문서별 범주 가중치를 이용한 문서 분류 시스템)

  • Han, Chung-Seok;Park, Sang-Yong;Lee, Soo-Won
    • The KIPS Transactions:PartB
    • /
    • v.19B no.4
    • /
    • pp.237-242
    • /
    • 2012
  • Web information service needs a document classification system for efficient management and conveniently searches. Existing document classification systems have a problem of low accuracy in classification, if a few number of feature words is selected in documents or if the number of documents that belong to a specific category is excessively large. To solve this problem, we propose a document classification system using 'Modified ECCD' feature selection method and 'Category Weight for each Document'. Experimental results show that the 'Modified ECCD' feature selection method has higher accuracy in classification than ${\chi}^2$ and the ECCD method. Moreover, combining the 'Category Weight for each Document' feature value and 'Modified ECCD' feature selection method results better accuracy in classification.

Korean Document Classification using Characteristics of Word Information

  • Kim, Seok-Ki;Han, Kyung-Soo;Ahn, Jeong-Yong
    • Journal of the Korean Data and Information Science Society
    • /
    • v.14 no.2
    • /
    • pp.167-175
    • /
    • 2003
  • In document classification, target of analysis is not document itself but words appeared in the document. Word information, therefore, is a significant factor in document classification. In this study, we are dealing with the classification of Korean document based on words and feature vectors. First, we present the performance of document classification using nouns and keywords. Second, we compare to the results for the size of feature vectors.

  • PDF

Document Classification Model Using Web Documents for Balancing Training Corpus Size per Category

  • Park, So-Young;Chang, Juno;Kihl, Taesuk
    • Journal of information and communication convergence engineering
    • /
    • v.11 no.4
    • /
    • pp.268-273
    • /
    • 2013
  • In this paper, we propose a document classification model using Web documents as a part of the training corpus in order to resolve the imbalance of the training corpus size per category. For the purpose of retrieving the Web documents closely related to each category, the proposed document classification model calculates the matching score between word features and each category, and generates a Web search query by combining the higher-ranked word features and the category title. Then, the proposed document classification model sends each combined query to the open application programming interface of the Web search engine, and receives the snippet results retrieved from the Web search engine. Finally, the proposed document classification model adds these snippet results as Web documents to the training corpus. Experimental results show that the method that considers the balance of the training corpus size per category exhibits better performance in some categories with small training sets.

Document Image Segmentation and Classification using Texture Features and Structural Information (텍스쳐 특징과 구조적인 정보를 이용한 문서 영상의 분할 및 분류)

  • Park, Kun-Hye;Kim, Bo-Ram;Kim, Wook-Hyun
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.11 no.3
    • /
    • pp.215-220
    • /
    • 2010
  • In this paper, we propose a new texture-based page segmentation and classification method in which table region, background region, image region and text region in a given document image are automatically identified. The proposed method for document images consists of two stages, document segmentation and contents classification. In the first stage, we segment the document image, and then, we classify contents of document in the second stage. The proposed classification method is based on a texture analysis. Each contents in the document are considered as regions with different textures. Thus the problem of classification contents of document can be posed as a texture segmentation and analysis problem. Two-dimensional Gabor filters are used to extract texture features for each of these regions. Our method does not assume any a priori knowledge about content or language of the document. As we can see experiment results, our method gives good performance in document segmentation and contents classification. The proposed system is expected to apply such as multimedia data searching, real-time image processing.

A Three-Step Preprocessing Algorithm for Enhanced Classification of E-Mail Recommendation System (이메일 추천 시스템의 분류 향상을 위한 3단계 전처리 알고리즘)

  • Jeong Ok-Ran;Cho Dong-Sub
    • The Transactions of the Korean Institute of Electrical Engineers D
    • /
    • v.54 no.4
    • /
    • pp.251-258
    • /
    • 2005
  • Automatic document classification may differ significantly according to the characteristics of documents that are subject to classification, as well as classifier's performance. This research identifies e-mail document's characteristics to apply a three-step preprocessing algorithm that can minimize e-mail document's atypical characteristics. In the first 5go, uncertain based sampling algorithm that used Mean Absolute Deviation(MAD), is used to address the question of selection learning document for the rule generation at the time of classification. In the subsequent stage, Weighted vlaue assigning method by attribute is applied to increase the discriminating capability of the terms that appear on the title on the e-mail document characteristic level. in the third and last stage, accuracy level during classification by each category is increased by using Naive Bayesian Presumptive Algorithm's Dynamic Threshold. And, we implemented an E-Mail Recommendtion System using a three-step preprocessing algorithm the enable users for direct and optimal classification with the recommendation of the applicable category when a mail arrives.

A Hangul Document Classification System using Case-based Reasoning (사례기반 추론을 이용한 한글 문서분류 시스템)

  • Lee, Jae-Sik;Lee, Jong-Woon
    • Asia pacific journal of information systems
    • /
    • v.12 no.2
    • /
    • pp.179-195
    • /
    • 2002
  • In this research, we developed an efficient Hangul document classification system for text mining. We mean 'efficient' by maintaining an acceptable classification performance while taking shorter computing time. In our system, given a query document, k documents are first retrieved from the document case base using the k-nearest neighbor technique, which is the main algorithm of case-based reasoning. Then, TFIDF method, which is the traditional vector model in information retrieval technique, is applied to the query document and the k retrieved documents to classify the query document. We call this procedure 'CB_TFIDF' method. The result of our research showed that the classification accuracy of CB_TFIDF was similar to that of traditional TFIDF method. However, the average time for classifying one document decreased remarkably.

An Automatic Classification System of Korean Documents Using Weight for Keywords of Document and Word Cluster (문서의 주제어별 가중치 부여와 단어 군집을 이용한 한국어 문서 자동 분류 시스템)

  • Hur, Jun-Hui;Choi, Jun-Hyeog;Lee, Jung-Hyun;Kim, Joong-Bae;Rim, Kee-Wook
    • The KIPS Transactions:PartB
    • /
    • v.8B no.5
    • /
    • pp.447-454
    • /
    • 2001
  • The automatic document classification is a method that assigns unlabeled documents to the existing classes. The automatic document classification can be applied to a classification of news group articles, a classification of web documents, showing more precise results of Information Retrieval using a learning of users. In this paper, we use the weighted Bayesian classifier that weights with keywords of a document to improve the classification accuracy. If the system cant classify a document properly because of the lack of the number of words as the feature of a document, it uses relevance word cluster to supplement the feature of a document. The clusters are made by the automatic word clustering from the corpus. As the result, the proposed system outperformed existing classification system in the classification accuracy on Korean documents.

  • PDF

The Reorganization and Institutional Characteristics of National Records Management System during the 1980s to the 1990s (1980~90년대 국가기록관리체제의 개편과 제도적 특징)

  • Lee, Seung-Il
    • Journal of Korean Society of Archives and Records Management
    • /
    • v.8 no.2
    • /
    • pp.5-38
    • /
    • 2008
  • Under the changes of administrative systems and office automation, 'the national record management system' had been reformed until 'Record Management Act' was enacted in 1999. Between 1984 and 1992, the national record management system was reformed in process of overcoming national crises and carrying out office automation. As a result, the system was absorbed into 'Governmental Document Regulations', 'Official Document Management Regulations' and 'Governmental Document Regulations'. In addition, 'government document classification scheme' and 'Record schedule' were unified into 'Official Document Classification and Record schedule'.

Classification Techniques for XML Document Using Text Mining (텍스트 마이닝을 이용한 XML 문서 분류 기술)

  • Kim Cheon-Shik;Hong You-Sik
    • Journal of the Korea Society of Computer and Information
    • /
    • v.11 no.2 s.40
    • /
    • pp.15-23
    • /
    • 2006
  • Millions of documents are already on the Internet, and new documents are being formed all the time. This poses a very important problem in the management and querying of documents to classify them on the Internet by the most suitable means. However, most users have been using the document classification method based on a keyword. This method does not classify documents efficiently, and there is a weakness in the category of document that includes meaning. Document classification by a person can be very correct sometimes and often times is required. Therefore, in this paper, We wish to classify documents by using a neural network algorithm and C4.5 algorithms. We used resume data forming by XML for a document classification experiment. The result showed excellent possibilities in the document category. Therefore, We expect an applicable solution for various document classification problems.

  • PDF