The Study on the Effective Automatic Classification of Internet Document Using the Machine Learning

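기계학습을 기반으로 한 인터넷 학술문서의 효과적 자동분류에 관한 연구 (A Study on the Effective Automatic Classification of Internet Scholarly Documents Based on Machine Learning)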

  • Published : 2001.09.01

Abstract

This study evaluated the performance of text categorization with the kNN classifier. Most example-based automatic categorization techniques, such as the kNN classifier, reduce the feature set of the training document collection. We sought to determine what percentage of feature reduction yields high performance. In addition, the kNN classifier must find the k training documents most similar to each test document. We sought to verify the most appropriate value of k through experiments.
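
The two parameters the abstract names, the share of features kept and the number of neighbors k, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the feature selection criterion, the similarity measure, or the voting rule, so ranking terms by document frequency, cosine similarity over raw term counts, and similarity-weighted voting are all assumptions made here. The function names and the toy data are hypothetical as well.

```python
import math
from collections import Counter, defaultdict


def select_features(train_docs, keep_ratio):
    """Keep the top `keep_ratio` share of terms, ranked by document frequency.

    Document frequency is an assumed criterion; the abstract does not name one.
    """
    df = Counter()
    for tokens, _label in train_docs:
        df.update(set(tokens))
    n_keep = max(1, int(len(df) * keep_ratio))
    # Descending document frequency, then alphabetical order for determinism.
    ranked = sorted(df, key=lambda term: (-df[term], term))
    return set(ranked[:n_keep])


def vectorize(tokens, vocab):
    """Raw term counts restricted to the reduced feature set."""
    return Counter(t for t in tokens if t in vocab)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(test_tokens, train_docs, vocab, k):
    """Assign the category with the largest summed similarity among the k nearest training documents."""
    query = vectorize(test_tokens, vocab)
    scored = [
        (cosine(query, vectorize(tokens, vocab)), label)
        for tokens, label in train_docs
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = defaultdict(float)
    for sim, label in scored[:k]:
        votes[label] += sim  # similarity-weighted vote (an assumption)
    return max(votes, key=votes.get)


# Hypothetical toy collection: (tokens, category) pairs.
train = [
    (["neural", "network", "learning"], "cs"),
    (["network", "protocol", "router"], "cs"),
    (["quantum", "particle", "energy"], "physics"),
    (["particle", "energy", "field"], "physics"),
]

full_vocab = select_features(train, keep_ratio=1.0)
half_vocab = select_features(train, keep_ratio=0.5)

print(knn_classify(["particle", "field", "energy"], train, full_vocab, k=3))
print(knn_classify(["network", "router"], train, half_vocab, k=1))
```

An experiment of the kind the abstract describes would repeat the classification over a grid of `keep_ratio` and `k` values on held-out test documents and compare the resulting scores.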
