A Hybrid Collaborative Filtering Using a Low-dimensional Linear Model

저차원 선형 모델을 이용한 하이브리드 협력적 여과

  • 고수정 (Department of Computer Software, Induk College)
  • Published: 2009.10.15

Abstract

Collaborative filtering is a technique for predicting whether a particular user will like a particular item. User-based and item-based collaborative filtering techniques have been used extensively in commercial recommender systems. This paper proposes a hybrid collaborative filtering method that combines the user-based and item-based approaches using a low-dimensional linear model. Among low-dimensional linear models, the proposed method uses non-negative matrix factorization (NMF) to address the problems of sparsity and large databases. In collaborative filtering systems, NMF-based methods are useful for expressing users in terms of semantic relations. However, they are model-based and computationally complex, so they cannot recommend items dynamically. To compensate for these shortcomings, the proposed method clusters users into groups using NMF and selects the features of each group using TF-IDF. Mutual information is then used to compute similarities between items. Clustering and group feature extraction are performed offline; online, the extracted group features are used to determine the most suitable group for an active user. As a result, the proposed method reduces the time required to classify an active user into a group and, by combining user-based and item-based collaborative filtering, outperforms previous methods.
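The offline clustering step can be illustrated with a minimal sketch. It assumes the standard multiplicative-update rules for NMF and assigns each user to the latent factor with the largest weight; the abstract does not state the exact update rule or assignment rule, so both are assumptions. The toy `ratings` matrix, the function names, and the treatment of unrated items as zeros are hypothetical simplifications for illustration only.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

def nmf(v, k, iters=500, seed=0, eps=1e-9):
    """Approximate v (users x items) as w * h with non-negative factors.

    Uses multiplicative updates for the squared-error objective
    (an assumed choice, not confirmed by the abstract).
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # Update h: h <- h * (w^T v) / (w^T w h)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # Update w: w <- w * (v h^T) / (w h h^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return w, h

def assign_groups(w):
    """Assign each user to the latent factor (group) with the largest weight."""
    return [max(range(len(row)), key=row.__getitem__) for row in w]

def frobenius_error(v, w, h):
    """Frobenius norm of the reconstruction residual v - w * h."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh)
               for x, y in zip(rv, rw)) ** 0.5

# Hypothetical toy data: rows are users, columns are items, 0 = unrated.
ratings = [
    [5, 4, 0, 0],
    [4, 5, 0, 0],
    [0, 0, 5, 4],
    [0, 0, 4, 5],
]
w, h = nmf(ratings, k=2)
groups = assign_groups(w)
error = frobenius_error(ratings, w, h)
```

In this sketch the factorization is the expensive offline part. The online step described in the abstract would compare an active user against precomputed group features rather than refactoring the matrix.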

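Collaborative filtering is a technique used to predict a user's preference for a particular item. Collaborative filtering techniques can be divided into user-based and item-based approaches, and they are used extensively in many commercial recommender systems. This paper proposes a hybrid collaborative filtering method that integrates the user-based and item-based approaches using a low-dimensional linear model. Among low-dimensional linear models, the proposed method uses non-negative matrix factorization (NMF) to address sparsity and large data volume, which are problems of existing collaborative filtering systems. In collaborative filtering systems, NMF-based methods are useful for representing users in terms of semantic relations, but their accuracy can drop depending on the rating values in the user-item matrix, and because they are model-based, their computation is complex and dynamic recommendation is not possible. To compensate for these shortcomings, the proposed method uses TF-IDF to extract group features from the groups clustered by NMF. In addition, mutual information is used to compute the similarity between items in the item-based approach. The users in the training set are clustered and the group features are extracted offline; online, the extracted group features are used to classify a new user into the most suitable group. This shortens the time needed to classify a user and makes dynamic recommendation possible, and merging the user-based and item-based approaches yields higher accuracy than existing methods.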
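Mutual information as an item-to-item similarity can be sketched as follows. The abstract does not specify how ratings are turned into random variables, so this example assumes each item is represented by a binary "liked" indicator per user; the item vectors and the function name are hypothetical.

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Mutual information (in bits) between two equal-length discrete sequences.

    I(X; Y) = sum over (a, b) of p(a, b) * log2(p(a, b) / (p(a) * p(b))),
    with probabilities estimated from observed counts.
    """
    if len(x) != len(y) or not x:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p(a, b) / (p(a) * p(b)) simplifies to count * n / (px[a] * py[b]).
        mi += p_ab * math.log2(count * n / (px[a] * py[b]))
    return mi

# Hypothetical data: one entry per user, 1 = user liked the item, 0 = did not.
item_a = [1, 1, 0, 0]
item_b = [1, 1, 0, 0]  # liked by exactly the same users as item_a
item_c = [1, 0, 1, 0]  # statistically independent of item_a in this sample

sim_ab = mutual_information(item_a, item_b)
sim_ac = mutual_information(item_a, item_c)
```

Here `sim_ab` is 1 bit and `sim_ac` is 0 bits. Note that mutual information measures dependence rather than agreement, so two items liked by exactly opposite sets of users would also score high; a practical system would need to account for that.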
