Interplay of Text Mining and Data Mining for Classifying Web Contents

A Study on a Method for Integrating Text Mining and Data Mining to Classify Web Contents

  • 최윤정 (Department of Computer Science, Ewha Womans University);
  • 박승수 (Department of Computer Science, Ewha Womans University)
  • Published : 2002.09.01

Abstract

Recently, unstructured data such as website logs, texts, and tables have been flooding the Internet. Among these are potentially very useful data, such as bulletin-board posts and e-mails used for customer service, and the output of search engines. Various text mining tools have been introduced to handle such data, but most of them are less accurate than traditional data mining tools that work on structured data. Hence, ways of applying data mining techniques to text data have been sought. In this paper, we propose a text mining system that can incorporate existing data mining methods. We use text mining as a preprocessing tool to generate formatted data that serve as input to the data mining system. The output of the data mining system is then fed back to the text mining component to guide further categorization. This feedback cycle can enhance the accuracy of text mining. We apply this method to categorize web sites containing adult content as well as illegal content. The results show improved categorization performance on previously ambiguous data.
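
The feedback cycle described in the abstract can be illustrated with a minimal Python sketch. It is not the system proposed in the paper: the function names, the toy vocabulary, the mean-difference term weighting that stands in for the data mining step, and the `margin` threshold are all assumptions made for illustration. The text mining step turns free text into fixed-length count vectors, the stand-in data mining step learns term weights from those vectors, and confidently categorized documents are fed back as training data so that previously ambiguous documents can be resolved in a later round.

```python
from collections import Counter


def extract_features(text, vocabulary):
    """Text mining step: turn free text into a fixed-length count vector."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]


def learn_term_weights(rows, labels):
    """Stand-in data mining step (an assumption, not the paper's method):
    weight each term by the difference between its mean count in
    positive (1) and negative (0) documents. Both classes must be present."""
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    weights = []
    for j in range(len(rows[0])):
        mean_pos = sum(r[j] for r in pos) / len(pos)
        mean_neg = sum(r[j] for r in neg) / len(neg)
        weights.append(mean_pos - mean_neg)
    return weights


def categorize(row, weights, margin=0.5):
    """Return 1 or 0 when the score is clear, or None when it is ambiguous."""
    score = sum(w * x for w, x in zip(weights, row))
    if score > margin:
        return 1
    if score < -margin:
        return 0
    return None


def feedback_cycle(labeled, unlabeled, vocabulary, rounds=3):
    """Alternate the two steps; confident results become new training data."""
    train_rows = [extract_features(text, vocabulary) for text, _ in labeled]
    train_labels = [label for _, label in labeled]
    pending = list(unlabeled)
    results = {}
    for _ in range(rounds):
        weights = learn_term_weights(train_rows, train_labels)
        still_pending = []
        for text in pending:
            row = extract_features(text, vocabulary)
            label = categorize(row, weights)
            if label is None:
                still_pending.append(text)
            else:
                results[text] = label
                # Feedback: the categorized document guides later rounds.
                train_rows.append(row)
                train_labels.append(label)
        if len(still_pending) == len(pending):
            break  # no progress in this round, so stop early
        pending = still_pending
    for text in pending:
        results[text] = None  # still ambiguous after all rounds
    return results


# Hypothetical toy data: 1 = category of interest, 0 = other.
vocabulary = ["casino", "bet", "jackpot", "school", "lesson", "poker"]
labeled = [
    ("casino bet win", 1),
    ("bet jackpot casino", 1),
    ("school lesson homework", 0),
    ("lesson teacher school", 0),
]
unlabeled = ["poker", "casino poker poker", "school lesson"]
results = feedback_cycle(labeled, unlabeled, vocabulary)
```

In this toy run the document "poker" is ambiguous in the first round because no labeled document mentions that term. Once "casino poker poker" has been categorized and fed back, the term gains weight and "poker" is resolved in the second round.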
