• Title/Summary/Keyword: text categorization

An Efficient Algorithm for NaiveBayes with Matrix Transposition (행렬 전치를 이용한 효율적인 NaiveBayes 알고리즘)

  • Lee, Jae-Moon
    • The KIPS Transactions:PartB / v.11B no.1 / pp.117-124 / 2004
  • This paper proposes an efficient NaiveBayes algorithm that does not sacrifice accuracy. The proposed method transposes the category vectors to minimize the probability computations that NaiveBayes requires. It was implemented on an existing text categorization framework, AI::Categorizer, and compared with conventional NaiveBayes on the well-known Reuters-21578 data set. The comparison shows that the proposed method runs about twice as fast as conventional NaiveBayes.
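One possible reading of "transposition of category vectors" is a term-major layout: instead of storing one term vector per category, store for each term the categories in which it occurs. The sketch below illustrates that reading only. It is an assumption based on the abstract, not the paper's implementation or AI::Categorizer code, and the names `train` and `classify` are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (tokens, category) pairs. Returns a term-major model."""
    cat_docs = Counter()
    cat_term = defaultdict(Counter)
    vocab = set()
    for tokens, cat in docs:
        cat_docs[cat] += 1
        cat_term[cat].update(tokens)
        vocab.update(tokens)
    total = sum(cat_docs.values())
    log_prior = {c: math.log(n / total) for c, n in cat_docs.items()}
    # Laplace smoothing: denominator is the category's token count plus |vocab|.
    denom = {c: sum(cat_term[c].values()) + len(vocab) for c in cat_docs}
    # Log-probability of a known term that never occurred in category c.
    default = {c: math.log(1 / denom[c]) for c in cat_docs}
    # Transposed layout: term -> {category: correction over the default}.
    by_term = defaultdict(dict)
    for c, counts in cat_term.items():
        for t, n in counts.items():
            by_term[t][c] = math.log((n + 1) / denom[c]) - default[c]
    return log_prior, default, by_term, vocab

def classify(tokens, model):
    log_prior, default, by_term, vocab = model
    known = [t for t in tokens if t in vocab]
    # Start every category at prior + default for each known token ...
    scores = {c: log_prior[c] + len(known) * default[c] for c in log_prior}
    # ... then visit only the categories in which each term actually occurs.
    for t in known:
        for c, delta in by_term[t].items():
            scores[c] += delta
    return max(scores, key=scores.get)
```

The scores equal those of ordinary multinomial NaiveBayes with Laplace smoothing; only the iteration order changes, so sparse terms touch few categories.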

A Study on Compilations of 『Duchanggyeongheombang』 and its Practical Application (『두창경험방(痘瘡經驗方)』의 편집본과 그 활용에 대한 연구)

  • Kim, Sanghyun
    • Journal of Korean Medical classics / v.33 no.1 / pp.81-88 / 2020
  • Objectives : To investigate how 『Duchanggyeongheombang』 was adopted and edited in practical texts such as 『Gosachwalyo』, 『Sallimgyeongje』, and 『Gosasinseo』. Methods : Paragraphs of 『Duchanggyeongheombang』 were broken down into individual passages, and on that basis the 「Duchanggyeongheombang」 contents in 『Gosachwalyo』, 『Sallimgyeongje』, and 『Gosasinseo』 were compared and examined. Results : 『Gosachwalyo』 directly summarized and quoted the contents of 『Duchanggyeongheombang』 written by Park, Jinhee, while the contents in 『Sallimgyeongje』 and 『Gosasinseo』 are mostly similar to each other, summarizing and quoting from 『Gosachwalyo』. Conclusions : From the perspective of text categorization, the professional and specialized contents of 『Duchanggyeongheombang』 were excluded, and the text was edited in ways that increased its practicality. As these texts were widely distributed to the public, we can conclude that 『Duchanggyeongheombang』 was very influential in the treatment of douchang(痘瘡, smallpox) among the public.

Fuzzy based Intelligent Expert Search for Knowledge Management Systems

  • Yang, Kun-Woo;Huh, Soon-Young
    • Journal of Intelligence and Information Systems / v.9 no.2 / pp.87-100 / 2003
  • In managing organizational tacit knowledge, recent research has shown that it is in many ways more practical to provide expert search mechanisms in a KMS that pinpoint the experts in the organization who hold the sought expertise. In this paper, we propose an intelligent expert search framework that provides search capabilities for experts in similar or related fields according to the user's information needs. To enable intelligent expert search, the Fuzzy Abstraction Hierarchy (FAH) framework is adopted, which makes it possible to find experts with similar or related expertise according to the subject field hierarchy defined in the system. To improve FAH, a text categorization approach called the Vector Space Model is used. To test the applicability and practicality of the proposed framework, a prototype system, "Knowledge Portal for Researchers in Science and Technology", sponsored by the Ministry of Science and Technology (MOST) of Korea, was developed.

Automation of Expert Classification in Knowledge Management Systems Using Text Categorization Technique (문서 범주화를 이용한 지식관리시스템에서의 전문가 분류 자동화)

  • Yang, Kun-Woo;Huh, Soon-Young
    • Asia pacific journal of information systems / v.14 no.2 / pp.115-130 / 2004
  • This paper proposes a way to build an expert profile database in a KMS, which records the expertise that each expert in the organization possesses. For managing tacit knowledge in a knowledge management system, recent research has shown that it is in many ways more practical to provide expert search mechanisms that pinpoint the experts in the organization who hold the sought expertise, so that users can contact them for help. In this paper, we develop a framework that automates expert classification using a text categorization technique called the Vector Space Model, through which an expert database composed of all the compiled profile information is built. This approach minimizes the maintenance cost of manual expert profiling and reduces the incorrectness and obsolescence that result from subjective manual processing. We also define the structure of expertise so that the expert classification framework can be implemented to build an expert database in a KMS. The developed prototype system, "Knowledge Portal for Researchers in Science and Technology," is introduced to show the applicability of the proposed framework.
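As a rough illustration of Vector Space Model classification in general, the sketch below assigns a document to the subject field whose term vector is closest by cosine similarity. It is a minimal sketch under assumptions: raw term counts serve as weights, and `classify_expert` and the field profiles are hypothetical names, not taken from the paper or its prototype.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def classify_expert(doc_tokens, field_profiles):
    """Return the field whose profile vector is most similar to the document."""
    doc_vec = Counter(doc_tokens)  # raw term frequency as the weight (assumption)
    return max(field_profiles, key=lambda f: cosine(doc_vec, field_profiles[f]))

# Hypothetical subject-field profiles built from example texts.
profiles = {
    "databases": Counter(["query", "index", "sql", "query"]),
    "networks": Counter(["packet", "routing", "latency"]),
}
field = classify_expert(["sql", "query", "tuning"], profiles)
```

A production system would more likely use TF-IDF weights than raw counts; counts keep the example short.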

Fuzzy-based Intelligent Expert Search for Knowledge Management Systems

  • Yang, Kun-woo;Huh, Soon-young
    • Proceedings of the KAIS Fall Conference / 2003.11a / pp.73-79 / 2003
  • In managing organizational tacit knowledge, recent research has shown that it is in many ways more practical to provide expert search mechanisms in a KMS that pinpoint the experts in the organization who hold the sought expertise. In this paper, we propose an intelligent expert search framework that provides search capabilities for experts in similar or related fields according to the user's information needs. To enable intelligent expert search, the Fuzzy Abstraction Hierarchy (FAH) framework is adopted, which makes it possible to find experts with similar or related expertise according to the subject field hierarchy defined in the system. To improve FAH, a text categorization approach called the Vector Space Model is used. To test the applicability and practicality of the proposed framework, a prototype system, "Knowledge Portal for Researchers in Science and Technology", sponsored by the Ministry of Science and Technology (MOST) of Korea, was developed.

Performance Improvement by a Virtual Documents Technique in Text Categorization (문서분류에서 가상문서기법을 이용한 성능 향상)

  • Lee, Kyung-Soon;An, Dong-Un
    • The KIPS Transactions:PartB / v.11B no.4 / pp.501-508 / 2004
  • This paper proposes a virtual relevant document technique for the learning phase of text categorization. The method applies a simple transformation to relevant documents: it creates virtual documents by combining pairs of documents in the training set. A virtual document produced this way has an enriched term vector space, with greater weights for the terms that co-occur in the two relevant documents. The experimental results showed a significant improvement over the baseline, demonstrating the usefulness of the proposed method: a 71% improvement on the TREC-11 filtering test collection and an 11% improvement on the Reuters-21578 test set, in micro-averaged F1, for topics with fewer than 100 relevant documents. Analysis of the results indicates that adding virtual relevant documents contributes to a steady improvement in performance.
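The pairing step can be sketched as follows, assuming that "combining" two documents means summing their term-count vectors, so that a term present in both receives a larger weight. The abstract does not give the paper's exact weighting, and `virtual_documents` is a hypothetical name.

```python
from collections import Counter
from itertools import combinations

def virtual_documents(relevant_docs):
    """Build one virtual document per pair of relevant training documents.

    relevant_docs: list of token lists for a single topic.
    Returns a list of Counters; terms shared by both documents in a pair
    end up with the sum of their counts (assumed combination rule).
    """
    vectors = [Counter(tokens) for tokens in relevant_docs]
    return [a + b for a, b in combinations(vectors, 2)]

docs = [["rate", "bank", "rate"], ["bank", "loan"], ["loan", "rate"]]
virtual = virtual_documents(docs)
```

With n relevant documents this yields n(n-1)/2 virtual documents, which may be why such a technique matters most for topics with few relevant examples.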

A Semantic-Based Feature Expansion Approach for Improving the Effectiveness of Text Categorization by Using WordNet (문서범주화 성능 향상을 위한 의미기반 자질확장에 관한 연구)

  • Chung, Eun-Kyung
    • Journal of the Korean Society for information Management / v.26 no.3 / pp.261-278 / 2009
  • Identifying optimal feature sets in Text Categorization (TC) is crucial for improving effectiveness. In this study, experiments on feature expansion were conducted using author-provided keyword sets and article titles from typical scientific journal articles. The tool used for expanding feature sets is WordNet, a lexical database of English words. Given this data set and lexical tool, the study showed that feature expansion with synonym relationships was significantly effective in improving the results of TC. The experimental results indicated that when synonym-based expansion was applied to classifier names, the effectiveness of TC improved considerably regardless of word sense disambiguation.
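Synonym-based feature expansion can be sketched as below. A small hand-written synonym table stands in for WordNet here, so the table contents and the name `expand_features` are illustrative assumptions, not the study's data or procedure.

```python
def expand_features(tokens, synonyms):
    """Append synonyms of each token, keeping original order and avoiding repeats.

    tokens: list of feature terms (e.g. from keywords and titles).
    synonyms: dict mapping a term to an iterable of synonyms
              (a toy stand-in for a WordNet lookup).
    """
    expanded = list(tokens)
    seen = set(tokens)
    for term in tokens:
        for syn in synonyms.get(term, ()):
            if syn not in seen:
                seen.add(syn)
                expanded.append(syn)
    return expanded

# Hypothetical synonym table for illustration only.
toy_synonyms = {"car": ["automobile", "auto"], "fast": ["quick"]}
features = expand_features(["fast", "car", "car"], toy_synonyms)
```

No word sense disambiguation is attempted: every listed synonym is added, whichever sense it belongs to.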

A Categorization Scheme of Tag-based Folksonomy Images for Efficient Image Retrieval (효과적인 이미지 검색을 위한 태그 기반의 폭소노미 이미지 카테고리화 기법)

  • Ha, Eunji;Kim, Yongsung;Hwang, Eenjun
    • KIISE Transactions on Computing Practices / v.22 no.6 / pp.290-295 / 2016
  • Recently, folksonomy-based image-sharing sites, where users cooperatively create and use tags to annotate images, have been gaining popularity. Typically, these sites retrieve images for a user request using simple text-based matching and display the retrieved images in the form of a photo stream. However, the tags are personal and subjective and the images are not categorized, which results in poor retrieval accuracy and low user satisfaction. In this paper, we propose a categorization scheme for folksonomy images that can improve retrieval accuracy in tag-based image retrieval systems. In this scheme, images are classified by semantic similarity using the text information and image information generated in the folksonomy. To evaluate the performance of the proposed scheme, we collect folksonomy images and categorize them using text features and image features. We then compare its retrieval accuracy with that of existing systems.

Classification Performance Analysis of Cross-Language Text Categorization using Machine Translation (기계번역을 이용한 교차언어 문서 범주화의 분류 성능 분석)

  • Lee, Yong-Gu
    • Journal of the Korean Society for Library and Information Science / v.43 no.1 / pp.313-332 / 2009
  • Cross-language text categorization (CLTC) can classify documents automatically using a training set in another language. In this study, collections appropriate for CLTC were extracted from KTSET. The classification performance of various CLTC methods was compared using an SVM classifier and machine translation. Results showed that classification performance ranked, from highest to lowest, in the order of the poly-lingual training method, training-set translation, and test-set translation. However, training-set translation can be regarded as the most useful CLTC method, because it uses machine translation efficiently and adapts easily to general environments. On the other hand, low performance was shown to result from feature reduction, or from features with no subject characteristics, arising in the machine translation step of CLTC.

Reinforcement Method for Automated Text Classification using Post-processing and Training with Definition Criteria (학습방법개선과 후처리 분석을 이용한 자동문서분류의 성능향상 방법)

  • Choi, Yun-Jeong;Park, Seung-Soo
    • The KIPS Transactions:PartB / v.12B no.7 s.103 / pp.811-822 / 2005
  • Automated text categorization classifies free-text documents into predefined categories automatically; its main goal is to reduce the considerable manual work the task requires. Research on improving text categorization performance (efficiency) in recent years has focused on enhancing existing classification models and algorithms themselves, but its scope has been limited by feature-based statistical methodology. In this paper, we propose the RTPost system, which differs from any traditional method in taking a fault-tolerant system approach and a data mining strategy. The two important parts of the RTPost system are reinforcement training and post-processing. First, the training method deals with the problem of defining the categories to be classified before selecting training sample documents. The post-processing method deals with the problem of assigning categories, not with the performance of classification algorithms. In experiments, we applied our system to documents that lay near a decision boundary and therefore had low classification accuracy. The experiments show that our system has high accuracy and stability under realistic conditions. It did not depend at all on variables that strongly influence classification power, such as the number of training documents, the selection problem, and the performance of classification algorithms. In addition, we can expect a self-learning effect that decreases the training cost and increases the training power by exploiting the advantages of active learning.