• Title/Summary/Keyword: Automatic Categorization

Normalized Term Frequency Weighting Method in Automatic Text Categorization (자동 문서분류에서의 정규화 용어빈도 가중치방법)

  • 김수진;박혁로
    • Proceedings of the IEEK Conference / 2003.11b / pp.255-258 / 2003
  • This paper defines a Normalized Term Frequency Weighting method for automatic text categorization based on the Box-Cox transformation and applies it to automatic text categorization. The Box-Cox transformation is a statistical transformation that brings data closer to a normal distribution; this paper applies it to propose a new term frequency weighting method. Unlike existing term frequency weighting methods, the normalized term frequency is transformed differently for each term, so it is more general than fixed weighting schemes such as the logarithm or the square root. Experiments on 8,000 newspaper articles divided into 4 groups confirmed the validity of the normalized term frequency weighting method, which achieved high categorization accuracy in all cases.
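The idea of replacing a fixed log or square-root transform of term frequency with a per-term Box-Cox transform can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' procedure: it assumes each term's parameter λ is chosen by maximizing the standard Box-Cox profile log-likelihood over a grid of candidate values, using the positive raw frequencies that term has in the training documents. The function names (`box_cox`, `estimate_lambda`, `fit_term_lambdas`, `normalized_tf`) and the fallback to λ = 0 are hypothetical choices.

```python
import math

def box_cox(x, lam):
    """Box-Cox transform of a positive value x with parameter lam."""
    if x <= 0:
        raise ValueError("Box-Cox requires positive input")
    if abs(lam) < 1e-12:
        return math.log(x)  # the lam -> 0 limit is the log transform
    return (x ** lam - 1.0) / lam

def box_cox_log_likelihood(values, lam):
    """Profile log-likelihood of lam under a normal model for the transformed values."""
    n = len(values)
    y = [box_cox(v, lam) for v in values]
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / n  # maximum-likelihood variance
    if var == 0:
        return float("-inf")
    return (lam - 1.0) * sum(math.log(v) for v in values) - 0.5 * n * math.log(var)

def estimate_lambda(values, grid=None):
    """Pick lam from a grid (default -2.00..2.00, step 0.01) by maximum likelihood."""
    if len(set(values)) < 2:
        return 0.0  # assumption: fall back to log when a term's TFs do not vary
    if grid is None:
        grid = [i / 100 for i in range(-200, 201)]
    return max(grid, key=lambda lam: box_cox_log_likelihood(values, lam))

def fit_term_lambdas(term_tfs):
    """term_tfs maps each term to the positive raw TFs it has in training documents."""
    return {term: estimate_lambda(tfs) for term, tfs in term_tfs.items()}

def normalized_tf(term, tf, lambdas):
    """Weight a raw TF with the term's own Box-Cox parameter (log for unseen terms)."""
    return box_cox(tf, lambdas.get(term, 0.0))
```

With λ = 0 the transform is log(tf), and with λ = 0.5 it equals 2(√tf − 1), an affine function of the square root; this is the sense in which fixed log or root weighting appear as special cases of a per-term λ.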

Improving the Performance of Statistical Automatic Text Categorization by using Phrasal Patterns and Keyword Sets (구문 패턴과 키워드 집합을 이용한 통계적 자동 문서 분류의 성능 향상)

  • Han, Jeong-Gi;Park, Min-Gyu;Jo, Gwang-Je;Kim, Jun-Tae
    • The Transactions of the Korea Information Processing Society / v.7 no.4 / pp.1150-1159 / 2000
  • This paper presents an automatic text categorization model that improves accuracy by combining statistical and knowledge-based categorization methods. In our model, the knowledge-based method is applied first, and the statistical method is then applied to the texts that the knowledge-based method could not categorize. This combination improves categorization accuracy while still assigning every text to a category. For statistical categorization, a vector model with Inverted Category Frequency (ICF) weighting is used. For knowledge-based categorization, Phrasal Patterns and Keyword Sets are introduced to represent sentence patterns, and pattern matching is then performed. Experimental results on news articles show that combining the two categorization methods improves categorization accuracy.
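The rule-first, statistics-second combination can be sketched as follows. This is a simplified illustration, not the authors' system: it assumes ICF(t) = log(|C| / cf(t)), where cf(t) is the number of categories whose training documents contain t, builds one TF×ICF vector per category, and compares by cosine similarity. Only Keyword Sets are modeled (a set matches when all its words occur in the text); the richer Phrasal Patterns are not. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def icf_weights(category_docs):
    """ICF(t) = log(|C| / cf(t)); cf(t) counts categories whose training docs contain t."""
    n_categories = len(category_docs)
    cf = Counter()
    for docs in category_docs.values():
        cf.update({t for doc in docs for t in doc})  # each category counted once per term
    return {t: math.log(n_categories / c) for t, c in cf.items()}

def weight(tokens, icf):
    """TF x ICF vector for a token list; terms unseen in training get weight 0."""
    return {t: f * icf.get(t, 0.0) for t, f in Counter(tokens).items()}

def build_category_vectors(category_docs, icf):
    """One vector per category, built from all of its training tokens."""
    return {cat: weight([t for doc in docs for t in doc], icf)
            for cat, docs in category_docs.items()}

def cosine(a, b):
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knowledge_classify(tokens, keyword_sets):
    """Return the first category with a keyword set fully present in the text, else None."""
    present = set(tokens)
    for cat, sets in keyword_sets.items():
        if any(ks <= present for ks in sets):
            return cat
    return None

def hybrid_classify(tokens, keyword_sets, category_vectors, icf):
    """Knowledge-based rules first; the statistical fallback always returns a category."""
    cat = knowledge_classify(tokens, keyword_sets)
    if cat is not None:
        return cat
    query = weight(tokens, icf)
    return max(category_vectors, key=lambda c: cosine(query, category_vectors[c]))
```

Note that a term occurring in every category gets ICF = 0 and so contributes nothing to the statistical decision, which is the intended effect of category-frequency weighting.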

Automatic Summarization of French Scientific Articles by a Discourse Annotation Method using the EXCOM System

  • Antoine, Blais
    • Language and Information / v.13 no.1 / pp.1-20 / 2009
  • Summarization is a complex cognitive task, and simulating it is very difficult for machines. This paper presents an automatic summarization strategy based on a discourse categorization of textual information. This categorization is carried out by automatically identifying discourse markers in texts. We argue here for the use of discourse methods in automatic summarization. Two evaluations of the summarization strategy are presented, in which the summaries produced by our strategy are compared with summaries produced by humans and by other applications. These two evaluations demonstrate the ability of our EXCOM-based application to produce summaries comparable to those of other applications.
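The general idea of categorizing sentences by the discourse markers they contain, then extracting the sentences of selected categories, can be sketched as below. This is only a toy illustration and is not how EXCOM works internally: the marker lists, category names, and functions are hypothetical, and real discourse annotation uses far richer linguistic rules (and, in this paper, French texts).

```python
import re

# Hypothetical marker lists; real discourse-annotation resources are far richer.
DISCOURSE_MARKERS = {
    "objective": ["this paper presents", "we propose", "the aim of"],
    "result": ["results show", "we found", "it appears that"],
    "conclusion": ["in conclusion", "we conclude", "to summarize"],
}

def split_sentences(text):
    """Naive sentence splitter on terminal punctuation followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def annotate(sentences, markers=DISCOURSE_MARKERS):
    """Map each sentence index to the discourse categories whose markers it contains."""
    labels = {}
    for i, sentence in enumerate(sentences):
        low = sentence.lower()
        cats = [cat for cat, ms in markers.items() if any(m in low for m in ms)]
        if cats:
            labels[i] = cats
    return labels

def summarize(text, wanted=("objective", "result", "conclusion")):
    """Keep, in document order, the sentences annotated with a wanted category."""
    sentences = split_sentences(text)
    labels = annotate(sentences)
    return [s for i, s in enumerate(sentences)
            if any(c in wanted for c in labels.get(i, []))]
```

Changing `wanted` changes the kind of summary produced (for example, only conclusions), which is one attraction of annotating by discourse category rather than scoring sentences by word statistics.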

The Study on the Effective Automatic Classification of Internet Document Using the Machine Learning (기계학습을 기반으로 한 인터넷 학술문서의 효과적 자동분류에 관한 연구)

  • 노영희
    • Journal of Korean Library and Information Science Society / v.32 no.3 / pp.307-330 / 2001
  • This study evaluated the performance of categorization methods using the kNN classifier. Most example-based automatic text categorization techniques, such as the kNN classifier, reduce the feature set of the training documents, and we sought to find out which percentage of feature-set reduction yields high performance. In addition, the kNN classifier must find, among the training documents, the k documents most similar to each test document; we sought to determine the most appropriate value of k through experiments.
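The two parameters studied here, the percentage of features kept and the value of k, can be seen in a small kNN sketch. The abstract does not say how features were ranked, so this sketch assumes ranking by document frequency; it also assumes raw term-count vectors, cosine similarity, and a similarity-weighted vote among the k neighbors. Function names and data are hypothetical.

```python
import math
from collections import Counter

def select_features(train_docs, percent):
    """Keep the top `percent`% of terms ranked by document frequency (ties: alphabetical)."""
    df = Counter(t for doc in train_docs for t in set(doc))
    ranked = sorted(df, key=lambda t: (-df[t], t))
    keep = max(1, round(len(ranked) * percent / 100))
    return set(ranked[:keep])

def vectorize(doc, features):
    """Term-count vector restricted to the selected features."""
    return Counter(t for t in doc if t in features)

def cosine(a, b):
    dot = sum(v * b[t] for t, v in a.items() if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_classify(doc, train_docs, labels, features, k):
    """Similarity-weighted vote among the k training documents most similar to doc."""
    query = vectorize(doc, features)
    scored = sorted(
        ((cosine(query, vectorize(d, features)), lab) for d, lab in zip(train_docs, labels)),
        key=lambda pair: -pair[0],
    )
    votes = Counter()
    for sim, lab in scored[:k]:
        votes[lab] += sim
    return max(votes, key=votes.get)
```

Running `knn_classify` over a held-out set for several `percent` and `k` values is the kind of grid the study describes; a smaller feature set makes each similarity computation cheaper but can discard discriminative terms.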

A Study on Automatic Text Categorization of Web-Based Query Using Synonymy List (유사어 사전을 이용한 웹기반 질의문의 자동 범주화에 관한 연구)

  • Nam, Young-Joon;Kim, Gyu-Hwan
    • Journal of Information Management / v.35 no.4 / pp.81-105 / 2004
  • In this study, automatic text categorization of web-based queries was implemented. The χ² (chi-square) method and a Support Vector Machine (SVM) were used to test the efficiency of text categorization on queries, using a model that incorporates a Synonymy List; 713 synonyms were extracted manually from the test documents. Comparing results with and without synonym assignment, precision decreased by 0.01% while recall increased by 8.53%, and the F1 measure increased by 4.58%. The standard deviation between recall and precision improved by 18.39%.
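How a synonym list can affect χ² term scoring is easiest to see on a toy example. The sketch below computes the standard χ² statistic for a term and a category from document counts, χ²(t, c) = N(AD − CB)² / ((A+C)(B+D)(A+B)(C+D)), where A counts documents in c containing t, B documents outside c containing t, C documents in c without t, and D the rest. The synonym mapping, function names, and data are hypothetical, and the SVM training step is not shown.

```python
def normalize(tokens, synonyms):
    """Replace each token by its canonical form from a (hypothetical) synonym list."""
    return [synonyms.get(t, t) for t in tokens]

def chi_square(term, category, docs, labels):
    """chi^2(t, c) = N(AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D)) from document counts."""
    a = b = c = d = 0
    for doc, lab in zip(docs, labels):
        has_term = term in doc
        in_cat = lab == category
        if has_term and in_cat:
            a += 1
        elif has_term:
            b += 1
        elif in_cat:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0
```

Mapping "automobile" to "car" merges two weak indicators into one strong one: in the toy data checked below, the χ² score of "car" for the category "autos" rises from 4/3 to 4.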

A Study on Development of Automatic Categorization System for Internet Documents (인터넷 문서 자동 분류 시스템 개발에 관한 연구)

  • Han, Kwang-Rok;Sun, B.K.;Han, Sang-Tae;Rim, Kee-Wook
    • The Transactions of the Korea Information Processing Society / v.7 no.9 / pp.2867-2875 / 2000
  • In this paper, we discuss the implementation of an automatic Internet text categorization system. A categorization algorithm is designed, and the system is implemented using a back-propagation learning model. Internet documents are collected according to the established categories, the chi-square ($\chi^2$) test is applied to the training documents, and the category features are extracted. These features are used to produce the sets of training and classification vectors. Experimental evaluation shows that this system achieves better automatic categorization performance than the nearest neighbor method.
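A back-propagation learner of the kind described can be sketched as a network with one hidden layer of sigmoid units trained on squared error. This is a generic illustration, not the authors' architecture: it assumes each document is already a binary vector over selected category features (for instance, features chosen by χ²), and the class name, layer sizes, learning rate, and toy data are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class BackpropClassifier:
    """One-hidden-layer network trained by back-propagation of squared error."""

    def __init__(self, n_in, n_hidden, n_out, lr=0.5, seed=0):
        rng = random.Random(seed)
        # The extra weight in each row is a bias term (its input is fixed at 1.0).
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]
        self.lr = lr

    def _forward(self, x):
        xb = list(x) + [1.0]
        h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        hb = h + [1.0]
        out = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in self.w2]
        return xb, hb, out

    def _train_step(self, x, target):
        xb, hb, out = self._forward(x)
        # Output deltas: negative gradient of 0.5 * (t - o)^2 w.r.t. each output's net input.
        d_out = [(t - o) * o * (1.0 - o) for t, o in zip(target, out)]
        # Hidden deltas use the output weights *before* they are updated (bias unit excluded).
        d_hid = [
            hb[j] * (1.0 - hb[j]) * sum(d_out[k] * self.w2[k][j] for k in range(len(d_out)))
            for j in range(len(hb) - 1)
        ]
        for k, row in enumerate(self.w2):
            for j in range(len(row)):
                row[j] += self.lr * d_out[k] * hb[j]
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] += self.lr * d_hid[j] * xb[i]

    def fit(self, X, y, epochs=2000):
        """Online training; y holds integer category indices."""
        n_out = len(self.w2)
        for _ in range(epochs):
            for x, label in zip(X, y):
                target = [1.0 if c == label else 0.0 for c in range(n_out)]
                self._train_step(x, target)
        return self

    def predict(self, x):
        out = self._forward(x)[2]
        return max(range(len(out)), key=lambda k: out[k])
```

Each output unit corresponds to one category, and a document is assigned to the category whose output unit is most active.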

An Automatic Text Categorization Theories and Techniques for Text Management (문서관리를 위한 자동문서범주화에 대한 이론 및 기법)

  • Ko, Young-Joong;Seo, Jung-Yun
    • Journal of Information Management / v.33 no.2 / pp.19-32 / 2002
  • With the growth of digital libraries and of Internet use, the amount of online text information has increased rapidly, and so has the need for efficient data management and retrieval techniques. An automatic text categorization system assigns text documents to predefined categories, reducing the manual labor of categorizing texts. To classify text documents, good features must be selected from the documents, and the documents must then be indexed with those features. In this paper, each step of text categorization and several techniques used in each step are introduced.
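The feature-selection and indexing steps mentioned above can be sketched with two common choices: document-frequency thresholding for selection and TF-IDF weighting for indexing. These are illustrative choices, not necessarily the techniques the paper surveys in detail; the stop-word list, function names, and texts are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration.
STOPWORDS = {"a", "an", "and", "in", "of", "the", "to"}

def tokenize(text):
    """Lower-case, keep alphabetic runs, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def select_by_df(docs, min_df):
    """Feature selection: keep terms appearing in at least min_df documents."""
    df = Counter(t for doc in docs for t in set(doc))
    return {t for t, n in df.items() if n >= min_df}, df

def tfidf_index(docs, features, df):
    """Index each document as a sparse {term: tf * log(N / df)} vector over the features."""
    n_docs = len(docs)
    index = []
    for doc in docs:
        tf = Counter(t for t in doc if t in features)
        index.append({t: f * math.log(n_docs / df[t]) for t, f in tf.items()})
    return index
```

The resulting sparse vectors are what a classifier such as kNN or an SVM would then be trained on; a term that appears in every document receives an IDF of zero under this weighting.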

Impact of Instance Selection on kNN-Based Text Categorization

  • Barigou, Fatiha
    • Journal of Information Processing Systems / v.14 no.2 / pp.418-434 / 2018
  • With the increasing use of the Internet and electronic documents, automatic text categorization has become imperative. Several machine learning algorithms have been proposed for text categorization. The k-nearest neighbor algorithm (kNN) is known to be one of the best state-of-the-art classifiers for text categorization. However, kNN suffers from limitations such as a high computational cost when classifying new instances. Instance selection techniques have emerged as highly competitive methods for improving kNN through data reduction. However, previous works have evaluated these approaches only on structured datasets, and their performance has not been examined in the text categorization domain, where both the dimensionality and the size of the dataset are very high. Motivated by these observations, this paper investigates and analyzes the impact of instance selection on kNN-based text categorization in terms of classification accuracy, classification efficiency, and data reduction.
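As a concrete example of instance selection for kNN, the sketch below implements Hart's Condensed Nearest Neighbor (CNN) rule, a classic technique that keeps only a subset of training instances on which 1-NN still classifies the whole training set correctly. It is named here as an illustration and is not necessarily one of the techniques evaluated in the paper. In text categorization the points would be document vectors such as TF-IDF; toy 2-D points are used here for brevity, and the function names are hypothetical.

```python
import math

def nearest_label(x, store):
    """Label of the stored instance closest to x (Euclidean distance)."""
    return min(store, key=lambda inst: math.dist(x, inst[0]))[1]

def condensed_nearest_neighbor(instances):
    """Hart's CNN: grow a subset until 1-NN on it classifies every training instance correctly."""
    store = [instances[0]]
    in_store = {0}
    changed = True
    while changed:
        changed = False
        for i, (x, y) in enumerate(instances):
            if i in in_store:
                continue
            if nearest_label(x, store) != y:  # misclassified: absorb it into the subset
                store.append((x, y))
                in_store.add(i)
                changed = True
    return store
```

Classifying a new document then requires distance computations only against the retained subset, which is the efficiency gain that motivates instance selection; the trade-off is that CNN's result depends on instance order and can keep noisy points near class boundaries.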

Automatic categorization of chloride migration into concrete modified with CFBC ash

  • Marks, Maria;Jozwiak-Niedzwiedzka, Daria;Glinicki, Michal A.
    • Computers and Concrete / v.9 no.5 / pp.375-387 / 2012
  • The objective of this investigation was to develop rules for the automatic categorization of concrete quality using selected artificial intelligence methods based on machine learning. The tested materials included concrete containing a new waste material, the solid residue from coal combustion in fluidized bed boilers (CFBC fly ash), used as an additive. The rapid chloride permeability test (Nordtest method NT BUILD 492) was used to determine chloride ion penetration in concrete. The chloride migration results obtained in the experimental tests provided data for training and testing the rules discovered by machine learning techniques. Machine learning was found to be a tool that can be applied to determine concrete durability. The rules generated by the computer programs AQ21 and WEKA (using the J48 algorithm) provided an adequate means of categorizing plain concrete and concrete modified with CFBC fly ash as materials with good or acceptable resistance to chloride penetration.

Automatic Categorization of Clusters in Unsupervised Classification

  • Jeon, Dong-Keun
    • The Journal of the Acoustical Society of Korea / v.15 no.1E / pp.29-33 / 1996
  • Categorizing clusters is necessary when unsupervised classification is used for remote sensing image classification. This should preferably be done automatically, because manual categorization is a highly time-consuming process. In this paper, four automatic categorization methods were proposed and evaluated: (a) the maximum number method, which assigns the target cluster to the category that occupies the largest area of that cluster; (b) the maximum percentage method, which assigns the target cluster to the category having the largest percentage of its own area within that cluster; (c) the minimum distance method, which assigns the target cluster to the category with the minimum distance to that cluster; and (d) the element ratio matching method, which assigns local regions to the category having the most similar element ratio for that region. The experiments confirmed that the minimum distance method produced results almost identical to those of a human operator.
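Methods (a) to (c) can be sketched as follows, assuming each pixel has a cluster id from the unsupervised step, a feature vector, and, where reference data exist, a known category (`None` otherwise). Two interpretations are assumptions of this sketch: method (b) is read as comparing, for each category, the fraction of that category's reference pixels falling inside the cluster, and method (c) uses Euclidean distance between the cluster mean and each category mean. Method (d) is omitted because it works on local regions whose definition the abstract does not give. All function names and data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def _counts(cluster_ids, reference):
    """pair_counts[k][c]: number of reference pixels of category c inside cluster k."""
    pair_counts = defaultdict(Counter)
    for k, c in zip(cluster_ids, reference):
        if c is not None:  # None marks pixels without reference data
            pair_counts[k][c] += 1
    return pair_counts

def maximum_number(cluster_ids, reference):
    """(a) Category occupying the largest area of each cluster."""
    return {k: cnt.most_common(1)[0][0] for k, cnt in _counts(cluster_ids, reference).items()}

def maximum_percentage(cluster_ids, reference):
    """(b) Category with the largest share of its own reference pixels inside each cluster."""
    pair_counts = _counts(cluster_ids, reference)
    totals = Counter(c for c in reference if c is not None)
    return {k: max(cnt, key=lambda c: cnt[c] / totals[c]) for k, cnt in pair_counts.items()}

def _mean(vectors):
    return tuple(sum(col) / len(vectors) for col in zip(*vectors))

def minimum_distance(cluster_ids, features, reference):
    """(c) Category whose mean feature vector is closest to each cluster's mean."""
    by_cluster, by_category = defaultdict(list), defaultdict(list)
    for k, x, c in zip(cluster_ids, features, reference):
        by_cluster[k].append(x)
        if c is not None:
            by_category[c].append(x)
    cat_means = {c: _mean(v) for c, v in by_category.items()}
    return {k: min(cat_means, key=lambda c: math.dist(_mean(v), cat_means[c]))
            for k, v in by_cluster.items()}
```

Methods (a) and (b) can disagree when categories differ greatly in size: a small category concentrated in one cluster wins under (b) even if a large category covers more of that cluster's area.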
