An Improvement Of Efficiency For kNN By Using A Heuristic

  • Jae-Moon Lee (School of Information and Computer Engineering, Hansung University)
  • Published: 2003.10.01

Abstract

This paper proposes a heuristic that speeds up kNN without any loss of accuracy. The heuristic minimizes the number of document-to-document similarity computations, which are the dominant cost in kNN. To this end, the paper proposes a method for calculating an upper bound on the similarity and a method for sorting the training documents. The heuristic was implemented in AI::Categorizer, an existing text categorization framework, and compared with conventional kNN on the well-known Reuters-21578 data set. The comparisons show that the proposed heuristic improves execution speed over conventional kNN by about 30–40%.
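
The abstract does not say which upper bound the paper uses or what key the training documents are sorted by, so the following is a hypothetical sketch of the general idea, not the author's method. It assumes nonnegative term weights (e.g. tf-idf) normalized to unit length, cosine similarity computed as a sparse dot product, and the Hölder bound sim(q, d) ≤ ||q||_1 · max_t d[t]. Because this bound depends on a training document only through its largest weight, the documents can be sorted once, in descending order of that weight. A query then scans them in that order and stops as soon as the bound falls below the current k-th best similarity. Since no skipped document could have entered the top k, the neighbors found are the same as in brute-force kNN. The names `PrunedKNN`, `neighbors`, and `classify` are illustrative only.

```python
import heapq
import math
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]  # sparse vector: term -> nonnegative weight


def normalize(vec: Vector) -> Vector:
    """Scale to unit L2 length so cosine similarity is a plain dot product."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm > 0 else dict(vec)


def dot(a: Vector, b: Vector) -> float:
    """Exact similarity: the costly step the pruning tries to skip."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(w * b.get(t, 0.0) for t, w in a.items())


class PrunedKNN:
    """Hypothetical exact kNN with upper-bound pruning (not the paper's code).

    Assumed bound (Hölder's inequality, nonnegative weights):
        dot(q, d) <= ||q||_1 * max_t d[t]
    """

    def __init__(self, docs: List[Vector], labels: List[str]) -> None:
        vecs = [normalize(d) for d in docs]
        # Sort training documents once, by descending largest weight, so the
        # bound never increases as the scan proceeds.
        order = sorted(range(len(vecs)),
                       key=lambda i: max(vecs[i].values(), default=0.0),
                       reverse=True)
        self.vecs = [vecs[i] for i in order]
        self.labels = [labels[i] for i in order]
        self.max_w = [max(v.values(), default=0.0) for v in self.vecs]

    def neighbors(self, query: Vector, k: int) -> Tuple[List[Tuple[float, str]], int]:
        """Return the top-k (similarity, label) pairs and how many dot products ran."""
        if k < 1:
            raise ValueError("k must be positive")
        q = normalize(query)
        q_l1 = sum(q.values())
        heap: List[Tuple[float, int]] = []  # min-heap of (similarity, index)
        computed = 0
        for i, v in enumerate(self.vecs):
            bound = min(1.0, q_l1 * self.max_w[i])  # cosine never exceeds 1
            if len(heap) == k and bound < heap[0][0]:
                break  # every remaining document has a bound this low or lower
            sim = dot(q, v)
            computed += 1
            if len(heap) < k:
                heapq.heappush(heap, (sim, i))
            elif sim > heap[0][0]:
                heapq.heapreplace(heap, (sim, i))
        top = sorted(heap, reverse=True)
        return [(s, self.labels[i]) for s, i in top], computed

    def classify(self, query: Vector, k: int) -> str:
        """Similarity-weighted vote among the k nearest training documents."""
        votes: Counter = Counter()
        for sim, label in self.neighbors(query, k)[0]:
            votes[label] += sim
        return votes.most_common(1)[0][0]


docs = [{"oil": 3, "price": 1}, {"wheat": 2, "grain": 2},
        {"oil": 1, "crude": 2}, {"grain": 1, "corn": 3}]
labels = ["crude", "grain", "crude", "grain"]
knn = PrunedKNN(docs, labels)
hits, computed = knn.neighbors({"oil": 1}, k=1)
print(hits[0][1], computed)  # crude 2
```

In this toy run only 2 of the 4 similarities are computed. Sorting by the largest weight is just one cheap way to make the bound non-increasing along the scan. A tighter bound would prune more documents, but it would also cost more to evaluate per document.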

Cited by

  1. A Study on the Documents's Automatic Classification Using Machine Learning, vol. 39, no. 4, 2008, https://doi.org/10.1633/JIM.2008.39.4.047