Evaluation of User Profile Construction Method by Fuzzy Inference

  • Published: 2008.09.01


To construct user profiles automatically, a method for extracting representative keywords from a set of documents is needed. In our previous work, we proposed such a method and showed its usefulness. Here, we apply it to the classification problem and observe how much it contributes to performance improvement. With few modifications, the method can be used as a linear document classifier, so we first evaluate its performance in that setting. The method is also applicable to some non-linear classification methods such as GIS (Generalized Instance Set). In the GIS algorithm, generalized instances are built from training documents by a generalization function, and the K-NN algorithm is then applied to those instances; our method can serve as the generalization function. For comparison, two well-known linear classification methods, the Rocchio and Widrow-Hoff algorithms, are also used. Experimental results show that our method outperforms the others when only positive documents are considered, but not when negative documents are also taken into account.
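As a rough illustration of the linear setting described above, the following sketch builds a Rocchio-style profile, one of the baselines named in the abstract, once from positive documents only and once with negative documents included. It is not the fuzzy inference method of the paper. The toy vocabulary, the term-frequency vectors, the weights `beta` and `gamma`, and the cosine threshold are all assumptions chosen for illustration.

```python
from math import sqrt


def rocchio_profile(pos, neg=(), beta=1.0, gamma=0.0):
    """Return beta * mean(pos) - gamma * mean(neg), with negative weights clipped to 0.

    pos and neg are lists of equal-length term-weight vectors (assumed input format).
    """
    dim = len(pos[0])
    w = [0.0] * dim
    for d in pos:
        for i, x in enumerate(d):
            w[i] += beta * x / len(pos)
    for d in neg:
        for i, x in enumerate(d):
            w[i] -= gamma * x / len(neg)
    return [max(0.0, x) for x in w]


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def classify(profile, doc, threshold=0.5):
    """Linear classifier: accept doc if it is similar enough to the profile."""
    return cosine(profile, doc) >= threshold


# Hypothetical vocabulary: ["fuzzy", "profile", "soccer"]
pos_docs = [[2, 1, 0], [1, 2, 0]]
neg_docs = [[0, 0, 3], [1, 0, 2]]

profile_pos = rocchio_profile(pos_docs)                       # positive documents only
profile_both = rocchio_profile(pos_docs, neg_docs, gamma=0.5)  # positives and negatives

on_topic = classify(profile_pos, [1, 1, 0])
off_topic = classify(profile_pos, [0, 0, 1])
```

In a GIS-style setup, a function like `rocchio_profile` would play the role of the generalization function that produces generalized instances, with K-NN then applied over those instances rather than over the raw training documents.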


