Resolving the 'Gray sheep' Problem Using Social Network Analysis (SNA) in Collaborative Filtering (CF) Recommender Systems

  • Kim, Minsung (School of Business, Yonsei University)
  • Im, Il (School of Business, Yonsei University)
  • Received : 2014.06.15
  • Accepted : 2014.06.22
  • Published : 2014.06.30

Abstract

Recommender systems have become one of the most important technologies in e-commerce. For many consumers, the ultimate reason to shop online is to reduce the effort of searching for information and making purchases, and recommender systems are a key technology for serving this need. Many past studies of recommender systems have been devoted to developing and improving recommendation algorithms, and collaborative filtering (CF) is known to be the most successful of them. Despite its success, however, CF has several shortcomings, such as the cold-start, sparsity, and gray sheep problems. To generate recommendations, ordinary CF algorithms require evaluations or preference information directly from users. CF therefore cannot produce recommendations for new users who have no evaluations or preference information (cold-start problem). As the numbers of products and customers increase, the user-item data grow rapidly and most of the data cells are empty. Such a sparse dataset makes computing recommendations extremely hard (sparsity problem). Since CF is based on the assumption that there are groups of users sharing common preferences or tastes, CF becomes inaccurate if there are many users with rare and unique tastes (gray sheep problem). This study proposes a new algorithm that uses Social Network Analysis (SNA) techniques to resolve the gray sheep problem. We use 'degree centrality' in SNA to identify users with unique preferences (gray sheep). Degree centrality refers to the number of direct links to and from a node. In a network of users who are connected through common preferences or tastes, those with unique tastes have fewer links to other users (nodes) and are isolated from them. Gray sheep can therefore be identified by calculating the degree centrality of each node. We divide the dataset into two, gray sheep and others, based on the degree centrality of the users.
Then, different similarity measures and recommendation methods are applied to these two datasets. The algorithm in more detail is as follows. Step 1: Convert the initial data, which form a two-mode network (user to item), into a one-mode network (user to user). Step 2: Calculate the degree centrality of each node and separate out the nodes whose degree centrality is lower than a pre-set threshold. The threshold value is determined by simulations such that the accuracy of CF for the remaining dataset is maximized. Step 3: Apply an ordinary CF algorithm to the remaining dataset. Step 4: Since the separated dataset consists of users with unique tastes, an ordinary CF algorithm cannot generate recommendations for them, so a 'popular item' method is used to generate recommendations for these users. The F measures of the two datasets are weighted by the numbers of nodes and summed to be used as the final performance metric. To test the performance improvement from this new algorithm, an empirical study was conducted using a publicly available dataset, the MovieLens data from the GroupLens research team. We used 100,000 evaluations by 943 users on 1,682 movies. The proposed algorithm was compared with an ordinary CF algorithm using the 'Best-N-neighbors' and 'Cosine' similarity methods. The empirical results show that the F measure improved by about 11% on average when the proposed algorithm was used.

Past studies to improve CF performance typically used additional information other than users' evaluations, such as demographic data. Some studies applied SNA techniques as a new similarity metric. This study is novel in that it uses SNA to separate the dataset. It shows that the performance of CF can be improved, without any additional information, when SNA techniques are used as proposed. The study has several theoretical and practical implications. It empirically shows that the characteristics of a dataset can affect the performance of CF recommender systems, which helps researchers understand the factors affecting CF performance. It also opens the door for future studies that apply SNA to CF to analyze the characteristics of datasets. In practice, the study provides guidelines for improving the performance of CF recommender systems with a simple modification.
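
Steps 1, 2, and 4 of the algorithm, together with the node-weighted F measure, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how users are linked in the one-mode network, so the rule used here (two users are linked when they share at least `min_common` items) is an assumption. The function names, the toy data, and the normalization of the weighted F measure by the total number of nodes are likewise hypothetical.

```python
from collections import Counter
from itertools import combinations


def project_to_user_network(ratings, min_common=1):
    """Step 1: turn two-mode (user, item) pairs into a one-mode user network.

    Assumed linking rule: two users are connected when they have at least
    `min_common` items in common.
    """
    items_by_user = {}
    for user, item in ratings:
        items_by_user.setdefault(user, set()).add(item)
    neighbors = {user: set() for user in items_by_user}
    for u, v in combinations(items_by_user, 2):
        if len(items_by_user[u] & items_by_user[v]) >= min_common:
            neighbors[u].add(v)
            neighbors[v].add(u)
    return items_by_user, neighbors


def split_gray_sheep(neighbors, threshold):
    """Step 2: degree centrality is the number of direct links of a node.

    Users whose degree is lower than `threshold` are treated as gray sheep.
    """
    gray = {u for u, linked in neighbors.items() if len(linked) < threshold}
    return gray, set(neighbors) - gray


def popular_items(items_by_user, user, top_n=2):
    """Step 4: recommend the most frequently chosen items the user lacks."""
    counts = Counter(i for items in items_by_user.values() for i in items)
    seen = items_by_user[user]
    return [i for i, _ in counts.most_common() if i not in seen][:top_n]


def weighted_f(f_gray, n_gray, f_rest, n_rest):
    """Combine the two F measures, weighted by node counts.

    Dividing by the total node count is an assumption made for this sketch.
    """
    return (f_gray * n_gray + f_rest * n_rest) / (n_gray + n_rest)


# Toy data (hypothetical): user "d" shares no item with anyone else.
ratings = [("a", "m1"), ("a", "m2"), ("b", "m1"), ("b", "m2"),
           ("c", "m2"), ("c", "m3"), ("d", "m9")]
items_by_user, neighbors = project_to_user_network(ratings)
gray, rest = split_gray_sheep(neighbors, threshold=1)
recommendations = {u: popular_items(items_by_user, u) for u in gray}
```

In this toy network, user "d" has no links and is the only one separated out as a gray sheep. Step 3 (ordinary CF on the remaining users) is omitted, and in the study the threshold is tuned by simulation rather than fixed in advance.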

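As positive perceptions of the advantages of online shopping spread, such as shorter product search times and reduced shopping effort, the importance of e-commerce is growing. E-commerce companies carry out a variety of Internet customer relationship management (eCRM) activities to acquire customers, and providing personalized recommendation services is one of them. Because building an accurate recommender system is an important factor in the performance of an e-commerce company, various algorithms have been studied to improve the accuracy of recommendation services. Collaborative filtering (CF) in particular is known as the most successful recommendation technique. However, because it bases future recommendations on past e-commerce records of customers' purchases, it has many shortcomings. For new customers, it is hard to find other customers with similar purchasing tendencies (cold-start problem), and when purchase records are scarce relative to the number of products, the data from which correlations can be derived become sparse (sparsity) and recommendation performance declines. The degradation of recommendation performance caused by 'gray sheep', meaning users with unusual tastes, is another such shortcoming. Based on this understanding of the problem, this study presents a method that combines Social Network Analysis (SNA) with collaborative filtering to resolve the gray sheep problem in a dataset. The purchase data of customers with unusual tastes are separated from the full dataset using SNA measures. Different similarity methods and training sets are then applied to the two datasets, the separated data and the remaining data. To verify the improvement in recommendation performance from this method, we used the MovieLens data (http://movielens.org) collected by the GroupLens research team at the University of Minnesota in the United States. The results confirm that collaborative filtering using this technique achieves better recommendation performance than an ordinary collaborative filtering recommender system.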
