An Efficient Algorithm for NaiveBayes with Matrix Transposition

  • 이재문 (School of Computer Engineering, Hansung University)
  • Published: 2004.02.01

Abstract

This paper proposes an efficient NaiveBayes algorithm that incurs no loss of accuracy. The proposed method transposes the matrix of category vectors to minimize the number of probability computations that NaiveBayes performs. The method was implemented on AI::Categorizer, an existing text categorization framework, and compared with conventional NaiveBayes on the well-known Reuters-21578 data set. The comparison shows that the proposed method runs about twice as fast as conventional NaiveBayes.
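The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way a transposed (term-major) layout can cut the probability computations of NaiveBayes without changing its scores. It assumes a multinomial model with add-one smoothing. All function names, the data layout, and the toy documents are hypothetical illustrations, not the paper's method and not the AI::Categorizer API.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """Build a category-major model from (tokens, category) pairs."""
    term_counts = defaultdict(Counter)
    doc_counts = Counter()
    for tokens, cat in docs:
        term_counts[cat].update(tokens)
        doc_counts[cat] += 1
    vocab = {t for counts in term_counts.values() for t in counts}
    n_docs = sum(doc_counts.values())
    model = {}
    for cat, counts in term_counts.items():
        model[cat] = {
            "log_prior": math.log(doc_counts[cat] / n_docs),
            # log P(term | cat) for a term never seen in cat (add-one smoothing)
            "log_unseen": -math.log(sum(counts.values()) + len(vocab)),
            "counts": counts,
        }
    return model


def score_category_major(model, tokens):
    """Conventional loop: every category visits every document term."""
    tf = Counter(tokens)
    scores = {}
    for cat, m in model.items():
        s = m["log_prior"]
        for term, n in tf.items():
            s += n * (math.log(m["counts"].get(term, 0) + 1) + m["log_unseen"])
        scores[cat] = s
    return scores


def transpose(model):
    """Turn category -> term counts into term -> [(category, log(count + 1))]."""
    postings = defaultdict(list)
    for cat, m in model.items():
        for term, count in m["counts"].items():
            postings[term].append((cat, math.log(count + 1)))
    return dict(postings)


def score_term_major(model, postings, tokens):
    """Start every category at its all-unseen score, then add a correction
    only for the (term, category) pairs that actually co-occur."""
    tf = Counter(tokens)
    n_tokens = sum(tf.values())
    scores = {
        cat: m["log_prior"] + n_tokens * m["log_unseen"]
        for cat, m in model.items()
    }
    for term, n in tf.items():
        for cat, weight in postings.get(term, ()):
            scores[cat] += n * weight
    return scores


# Toy data, invented for illustration only.
docs = [
    (["wheat", "grain", "export"], "grain"),
    (["oil", "crude", "barrel"], "crude"),
    (["grain", "corn", "harvest"], "grain"),
    (["oil", "price", "opec"], "crude"),
]
model = train(docs)
postings = transpose(model)
query = ["wheat", "harvest", "price"]
print(score_category_major(model, query))
print(score_term_major(model, postings, query))
```

Both functions compute the same log scores up to floating-point rounding. The term-major version skips every (term, category) pair whose count is zero, which is where a saving would come from when most terms occur in only a few categories. Whether this matches the source of the roughly twofold speedup reported in the abstract is not something the abstract confirms.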
