A Multimodal Profile Ensemble Approach to Development of Recommender Systems Using Big Data

  • Kim, Minjeong (Department of Data Science, Kookmin University) ;
  • Cho, Yoonho (School of Business Administration, Kookmin University)
  • Received : 2015.11.25
  • Accepted : 2015.12.14
  • Published : 2015.12.30

Abstract

A recommender system recommends products to customers who are likely to be interested in them. Various recommender systems have been developed on the basis of automated information filtering technology. Collaborative filtering (CF), one of the most successful recommendation algorithms, has been applied in many domains, such as recommending Web pages, books, movies, music, and products. However, CF is known to have a critical shortcoming. CF finds neighbors whose preferences resemble those of the target customer and recommends the products those neighbors liked most. It therefore works well only when customers have rated a sufficient number of products in common. When ratings are scarce, neighborhood formation becomes inaccurate, which results in poor recommendations. To improve the performance of CF-based recommender systems, most related studies have focused on developing novel algorithms under the assumption of a single profile, created from users' item ratings, purchase transactions, or Web access logs. With the advent of big data, companies can collect larger volumes and a wider variety of data, and many of them regard the use of big data as important because it can improve their competitiveness and create new value. In particular, interest is growing in the use of personal big data in recommender systems, because personal big data enable more accurate identification of users' preferences and behaviors. The proposed recommendation methodology works as follows. First, multimodal user profiles are created from personal big data to capture users' preferences and behaviors from various viewpoints. We derive five user profiles from personal information: ratings, site preferences, demographics, Internet usage, and topics in text.
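The neighbor-based CF described above (find the users most similar to the target, then recommend what those neighbors liked most) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation (which, per the abstract, was written in R): it assumes cosine similarity over a user-by-item rating matrix, a similarity-weighted sum of the neighbors' ratings as the item score, and the hypothetical function names `cosine_similarity_matrix` and `recommend_top_n`. The abstract does not specify any of these choices.

```python
import numpy as np


def cosine_similarity_matrix(profiles: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows (users) of a profile matrix."""
    norms = np.linalg.norm(profiles, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # an all-zero profile gets similarity 0 to everyone
    unit = profiles / norms
    return unit @ unit.T


def recommend_top_n(ratings: np.ndarray, sim: np.ndarray, target: int,
                    k: int = 2, n: int = 2) -> list[int]:
    """Recommend up to n items the target has not rated, scored by its k nearest neighbors."""
    scores = sim[target].copy()
    scores[target] = -np.inf  # the target is never its own neighbor
    neighbors = np.argsort(scores)[::-1][:k]
    # Item score = similarity-weighted sum of the neighbors' ratings (an assumption).
    item_scores = sim[target, neighbors] @ ratings[neighbors]
    item_scores[ratings[target] > 0] = -np.inf  # exclude items already rated
    ranked = np.argsort(item_scores)[::-1]
    return [int(i) for i in ranked if np.isfinite(item_scores[i])][:n]


# Toy user-by-item rating matrix (0 = not rated); user 0 is the target.
toy_ratings = np.array([
    [5, 4, 0, 0, 1],
    [5, 5, 4, 0, 1],
    [4, 4, 0, 5, 0],
    [0, 1, 5, 4, 5],
], dtype=float)

print(recommend_top_n(toy_ratings, cosine_similarity_matrix(toy_ratings), target=0))
# [3, 2]
```

Because the dot product only accumulates over items both users rated, a target who shares few ratings with other users receives small, unreliable similarity scores. This is the neighborhood-formation weakness under sparse ratings that the abstract points to.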
Second, similarity between users is calculated from these profiles, and each user's neighbors are identified from the results. Similarity is calculated with one of three ensemble approaches, which use, respectively, the similarity of the combined profile, the average of the per-profile similarities, and the weighted average of the per-profile similarities. Finally, the products most preferred by the neighbors are recommended to the target user. For the experiments, we used demographic data and a very large volume of Web log transactions for 5,000 panel users of a company that specializes in ranking Web sites. R was used to implement the proposed recommender system, and SAS E-miner was used to conduct topic analysis of keyword search data. To evaluate recommendation performance, we used 60% of the data for training and 40% for testing, and 5-fold cross-validation was also conducted to improve the reliability of the experiments. Performance was measured with the F1 metric, a widely used combined metric that gives equal weight to recall and precision. The proposed methodology achieved a significant improvement over the single-profile CF algorithm, and the ensemble approach using weighted average similarity performed best: F1 improved by 16.9 percent with the weighted-average-similarity ensemble and by 8.1 percent with the average-similarity ensemble. From these results, we conclude that the multimodal profile ensemble approach is a viable solution to the problems that arise when customer ratings are scarce. This study is significant in suggesting what kinds of information can be used to create profiles in a big data environment and how they can be combined and used effectively.
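The three ensemble approaches and the F1 metric can be sketched in Python as follows. The function names, the use of cosine similarity for every profile, plain column concatenation for the combined profile, and the `weights` argument are all assumptions for illustration; the abstract does not say how the authors measured per-profile similarity or how the weights were chosen.

```python
import numpy as np


def cosine_sim(profile: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between users (rows) of one profile matrix."""
    norms = np.linalg.norm(profile, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty profiles
    unit = profile / norms
    return unit @ unit.T


def combined_profile_similarity(profiles: list[np.ndarray]) -> np.ndarray:
    """Approach 1: concatenate each user's profiles, then compute one similarity."""
    return cosine_sim(np.hstack(profiles))


def average_similarity(profiles: list[np.ndarray]) -> np.ndarray:
    """Approach 2: compute one similarity matrix per profile and average them."""
    return np.mean([cosine_sim(p) for p in profiles], axis=0)


def weighted_average_similarity(profiles: list[np.ndarray],
                                weights: list[float]) -> np.ndarray:
    """Approach 3: weighted average of the per-profile similarity matrices."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so the weights sum to 1
    return sum(wi * cosine_sim(p) for wi, p in zip(w, profiles))


def f1_score(recommended: list[int], relevant: list[int]) -> float:
    """F1 of a recommendation list: harmonic mean of precision and recall."""
    hits = len(set(recommended) & set(relevant))
    if hits == 0:
        return 0.0
    precision = hits / len(recommended)
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)


# Two toy profiles for three users: a rating profile (users 0 and 1 agree)
# and a demographic profile (users 1 and 2 agree).
rating_profile = np.array([[1, 0], [1, 0], [0, 1]], dtype=float)
demo_profile = np.array([[1, 0], [0, 1], [0, 1]], dtype=float)
toy_profiles = [rating_profile, demo_profile]

print(average_similarity(toy_profiles)[0, 1])                   # 0.5
print(weighted_average_similarity(toy_profiles, [3, 1])[0, 1])  # 0.75
print(f1_score([3, 2], [3, 4]))                                 # 0.5
```

Any of the three matrices can then drive the neighbor search described above. One design caveat: with plain concatenation, a profile with more columns or larger values can dominate the combined similarity, so in practice each profile would likely need scaling first, whereas the averaging approaches give each profile an explicit share.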
However, the methodology needs further study before it can be applied in the real world. In particular, the proposed method should be applied to different recommendation algorithms to compare the resulting recommendation accuracy and to identify which combination performs best.

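Prior research on collaborative filtering recommender systems has mainly created a single profile from customers' product ratings or purchase data and, on that basis, developed new algorithms to improve recommendation performance. However, as the big data environment has made the customer data that companies can collect richer and more diverse, it has become possible to identify customers' preferences and behaviors more accurately, and there is a growing need for research that uses such data, namely personal big data, in recommender systems. Drawing on market segmentation theory from marketing, this study develops five multimodal profiles from personal big data that represent customers' preferences and behaviors from various perspectives, and uses them to improve the performance of a collaborative filtering recommender system. The five proposed profiles are applied to the neighborhood search step of collaborative filtering through three ensemble techniques: integrated-profile similarity, the average of individual profile similarities, and the weighted average of individual profile similarities. Applying the proposed methodology to real personal big data improved recommendation performance considerably over a collaborative filtering algorithm that uses a single profile, and among the ensemble methods, the weighted average of individual profile similarities performed best. This study is significant in that it is the first to propose, for developing recommender systems in a big data environment, what kinds of data should be used to build profiles that characterize customers and how those profiles can be combined effectively.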
