A Reinforcement Learning Approach to Collaborative Filtering Considering Time-sequence of Ratings

Reinforcement Learning-Based Collaborative Filtering Considering the Temporal Order of Ratings

  • Received : 2011.05.18
  • Accepted : 2011.07.31
  • Published : 2012.02.29

Abstract

In recent years, there has been increasing interest in recommender systems, which provide users with personalized suggestions for products or services. In particular, research on collaborative filtering, which analyzes relations between users and items, has become more active because of the Netflix Prize competition. This paper presents a reinforcement learning approach to collaborative filtering. By applying reinforcement learning techniques to movie rating data, we examined the connection between the time sequence of a user's past ratings and that user's current ratings. To do so, we first formulated the collaborative filtering problem as a Markov Decision Process. We then used Q-learning to train a model that reflects the connection between the time sequence of past ratings and current ratings. The experimental results indicate that the time sequence of past ratings has a significant effect on current ratings.

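Interest has recently been growing in recommender systems, which recommend items or services that match a user's interests. The recently concluded Netflix Prize competition stimulated research in this field, and collaborative filtering in particular is being actively studied because of its generality: it can be applied regardless of the type of item. This paper proposes a method for solving the collaborative filtering problem of recommender systems using reinforcement learning. Through reinforcement learning, we aimed to learn, from movie rating data, the relationships among ratings that arise from the order in which each user rated items. To this end, we modeled the collaborative filtering problem mathematically as a Markov Decision Process and used Q-learning, the most representative reinforcement learning algorithm, to learn the relationships implied by the order of ratings. We then verified experimentally that the order of ratings does affect the ratings themselves.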
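
The abstract does not give the paper's state, action, or reward definitions, so the following is only a minimal sketch of how rating prediction could be cast as a Markov Decision Process and trained with tabular Q-learning. Everything specific here is an assumption made for illustration: the state is taken to be a user's previous `window` ratings, the action is the rating predicted for the next item, the reward is the negative absolute prediction error, and the names `train_q`, `predict`, and `RATINGS` are hypothetical.

```python
import random
from collections import defaultdict

RATINGS = (1, 2, 3, 4, 5)  # assumed 5-star scale


def train_q(sequences, window=2, alpha=0.1, gamma=0.9,
            epsilon=0.1, epochs=50, seed=0):
    """Tabular Q-learning over per-user, time-ordered rating sequences.

    Assumed formulation (not taken from the paper):
      state  = tuple of the user's previous `window` ratings
      action = rating predicted for the next item
      reward = negative absolute error of that prediction
    """
    rng = random.Random(seed)
    q = defaultdict(float)  # (state, action) -> value, 0.0 by default
    for _ in range(epochs):
        for seq in sequences:
            for t in range(window, len(seq)):
                state = tuple(seq[t - window:t])
                # Epsilon-greedy choice between exploring and exploiting.
                if rng.random() < epsilon:
                    action = rng.choice(RATINGS)
                else:
                    action = max(RATINGS, key=lambda a: q[(state, a)])
                reward = -abs(action - seq[t])
                # The next state slides the window over the true rating.
                next_state = tuple(seq[t - window + 1:t + 1])
                if t + 1 < len(seq):
                    best_next = max(q[(next_state, a)] for a in RATINGS)
                else:
                    best_next = 0.0  # last rating in the sequence
                # Standard Q-learning update.
                td_error = reward + gamma * best_next - q[(state, action)]
                q[(state, action)] += alpha * td_error
    return q


def predict(q, recent):
    """Return the greedy rating for the given recent-rating window."""
    state = tuple(recent)
    return max(RATINGS, key=lambda a: q[(state, a)])
```

A usage example on a toy sequence in which ratings strictly alternate, so the next rating is fully determined by the previous two:

```python
q = train_q([[1, 5, 1, 5, 1, 5, 1, 5]])
print(predict(q, (5, 1)), predict(q, (1, 5)))
```

In this sketch the transition depends only on the observed rating, not on the chosen action, which keeps the table small; a real dataset would likely need a richer state (for example, item or user features) than a raw window of ratings.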
