Response Modeling for the Marketing Promotion with Weighted Case Based Reasoning Under Imbalanced Data Distribution

Korean title: Customer Response Prediction Based on Case-Based Reasoning with Feature Weights in an Imbalanced Data Environment

  • Received : 2014.11.07
  • Accepted : 2014.12.28
  • Published : 2015.03.31

Abstract

Response modeling is a well-studied problem for researchers seeking better performance in predicting customers' responses to marketing promotions. A customer response model reduces marketing cost by identifying prospective customers in a very large customer database and predicting the purchase intention of the selected customers, whereas a promotion derived from an undifferentiated marketing strategy incurs unnecessary cost. In addition, the big data environment has accelerated the development of response models with data mining techniques such as case-based reasoning (CBR), neural networks, and support vector machines. CBR is one of the major tools in business because it is known to be simple and robust when applied to response models. It remains an attractive technique for business data mining applications even though it has not shown high performance compared to other machine learning techniques. Thus, many studies have tried to improve CBR for business data mining with enhanced algorithms or with the support of other techniques such as genetic algorithms, decision trees, and AHP (Analytic Hierarchy Process). Ahn and Kim (2008) used logit, neural networks, and CBR to predict which customers would purchase the items promoted by the marketing department, and tried to optimize the number k of nearest neighbors with a genetic algorithm to improve the performance of the integrated model. Hong and Park (2009) noted that an approach integrating logit, neural networks, and Support Vector Machine (SVM) through CBR predicted customers' responses to marketing promotions better than each of the individual models. This paper presents an approach to predicting customers' responses to marketing promotions with CBR. The proposed model was developed by applying a different weight to each feature.
We fitted a logit model to a database containing promotion and purchase data for bath soap, and then used its coefficients to assign a different weight to each feature in CBR. We empirically compared the proposed weighted CBR model with neural networks and a pure CBR model and found that the weighted CBR model outperformed the pure CBR model. Imbalanced data is a common problem when building data mining models that classify real-world data, as in bankruptcy prediction, intrusion detection, fraud detection, churn management, and response modeling. Imbalanced data means that the number of instances in one class is remarkably small or large compared to the number of instances in the other classes. A classification model such as a response model has difficulty learning patterns from such data because it tends to ignore the minority class while classifying the majority class correctly. Sampling is one of the most representative approaches to resolving the problem caused by imbalanced data distribution, and sampling methods can be categorized into under-sampling and over-sampling. However, CBR is not sensitive to data distribution because, unlike machine learning algorithms, it does not learn from data. In this study, we investigated the robustness of the proposed model while changing the ratio of response customers to nonresponse customers, because in the real world the customers who respond to a promotion are always a small fraction of those who do not. We ran the proposed model 100 times with different ratios of response customers to nonresponse customers to validate its robustness under imbalanced data distribution. Finally, we found that the proposed CBR-based model outperformed the compared models on the imbalanced data sets.
Our study is expected to improve the performance of CBR-based response models for promotion programs under the imbalanced data distributions found in the real world.
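
The weighting idea described in the abstract can be illustrated with a minimal sketch. It assumes that the feature weights are the normalized absolute values of the logit coefficients and that case retrieval uses a weighted Euclidean distance with majority voting; the abstract does not specify either choice. The coefficient values, the two features, the toy cases, and all function names are hypothetical.

```python
import math
from collections import Counter


def weights_from_coefficients(coefs):
    """Map logit coefficients to non-negative weights summing to 1.

    Assumption: importance is the absolute coefficient value; the paper's
    exact mapping from coefficients to weights is not given in the abstract.
    """
    magnitudes = [abs(c) for c in coefs]
    total = sum(magnitudes)
    if total == 0:
        # Fall back to equal weights (the "pure CBR" setting).
        return [1.0 / len(coefs)] * len(coefs)
    return [m / total for m in magnitudes]


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two cases."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def predict(case_base, labels, query, weights, k=3):
    """Retrieve the k most similar cases and return the majority label."""
    ranked = sorted(
        range(len(case_base)),
        key=lambda i: weighted_distance(case_base[i], query, weights),
    )
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two scaled features per customer, label 1 = responded.
case_base = [[0.8, 0.1], [0.9, 0.2], [0.3, 0.6], [0.2, 0.7], [0.4, 0.5]]
labels = [1, 1, 0, 0, 0]
query = [0.7, 0.6]

# Hypothetical logit coefficients: the first feature matters far more.
weights = weights_from_coefficients([2.7, -0.3])

print("weighted CBR:", predict(case_base, labels, query, weights, k=3))
print("equal weights:", predict(case_base, labels, query, [0.5, 0.5], k=3))
```

On this toy data the weighted vote predicts a response (1) while the equal-weight vote does not (0), because the query is close to the responders on the heavily weighted feature only. This says nothing about the paper's actual results; it only shows how feature weights change which cases are retrieved.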

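A customer response prediction model makes it possible to select target customers for a marketing promotion effectively, thereby maximizing the effect of the promotion. In today's big data environment, such models are built with data mining techniques, and this study presents a customer response prediction model based on case-based reasoning (CBR). CBR-based prediction models are generally known to perform worse than other artificial intelligence techniques, but their predictive performance can be improved by applying different weights according to the importance of each input variable. In this study, we computed and applied input variable weights according to how strongly each variable influences whether a customer responds to the promotion, and compared the resulting performance with that of a model using equal weights. We developed the model using bath soap sales data and derived the weights from the coefficients of a logit model. The empirical analysis showed that the model with importance-based weights achieved higher predictive performance than the model with equal weights. In addition, in real-world classification problems such as customer response prediction, the data are mostly imbalanced, with a marked difference in the number of instances belonging to the two classes. This imbalance degrades the performance of machine learning algorithms, so we verified whether the proposed Weighted CBR can be applied stably in an imbalanced setting. Comparing predictive performance over 100 repetitions in an imbalanced setting, each drawing 100 data points at random from the full dataset, the proposed Weighted CBR showed consistently superior performance.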
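
The repeated-sampling robustness check described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the responder ratio of 0.1, the customer ID lists, the `evaluate` callback, and all function names are hypothetical, and only the sample size of 100 and the 100 repetitions come from the abstract.

```python
import random


def draw_imbalanced_sample(responders, nonresponders, sample_size,
                           responder_ratio, rng):
    """Draw one sample with a fixed share of responders (without replacement)."""
    n_resp = round(sample_size * responder_ratio)
    n_non = sample_size - n_resp
    return rng.sample(responders, n_resp), rng.sample(nonresponders, n_non)


def repeat_experiment(responders, nonresponders, evaluate, sample_size=100,
                      responder_ratio=0.1, repeats=100, seed=0):
    """Repeat the draw and collect one score per repetition.

    `evaluate` is a hypothetical callback that takes the sampled responders
    and nonresponders and returns a performance score for a model.
    """
    rng = random.Random(seed)  # fixed seed so the runs are reproducible
    scores = []
    for _ in range(repeats):
        resp, non = draw_imbalanced_sample(
            responders, nonresponders, sample_size, responder_ratio, rng
        )
        scores.append(evaluate(resp, non))
    return scores


# Hypothetical customer IDs standing in for the real records.
responders = list(range(200))
nonresponders = list(range(200, 2000))


def responder_share(resp, non):
    """Placeholder score: the share of responders in the drawn sample."""
    return len(resp) / (len(resp) + len(non))


scores = repeat_experiment(responders, nonresponders, responder_share)
print(len(scores), min(scores), max(scores))
```

In a real comparison, `evaluate` would fit or query each model on the drawn sample and return its hit ratio, and the spread of the scores across repetitions and ratios would indicate how stable each model is.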

References

  1. Ahn, H. and K.-j. Kim, "Using genetic algorithms to optimize nearest neighbors for data mining," Annals of Operations Research, Vol. 163, No. 1(2008), 5-18. https://doi.org/10.1007/s10479-008-0325-2
  2. Ahn, H., K.-j. Kim, and I. Han, "Purchase Prediction Model using the Support Vector Machine," Journal of Intelligence and Information Systems, Vol. 11, No. 3(2005), 69-81.
  3. Allen, B. P., "Case-based reasoning: Business applications," Communications of the ACM, Vol. 37, No. 3(1994), 40-42. https://doi.org/10.1145/175247.175250
  4. Barandela, R., J. S. Sánchez, V. García, and E. Rangel, "Strategies for Learning in Class Imbalance Problems," Pattern Recognition, Vol. 36, No. 3(2003), 849-851. https://doi.org/10.1016/S0031-3203(02)00257-1
  5. Chan, S. L., W. H. Ip, and V. Cho, "A Model for Predicting Customer Value from Perspectives of Product Attractiveness and Marketing Strategy," Expert Systems with Applications, Vol. 37, No. 2(2010), 1207-1215. https://doi.org/10.1016/j.eswa.2009.06.030
  6. Chang, C.-C. and C.-J. Lin, LIBSVM: A Library for Support Vector Machines, 2001. Available at http://www.csie.ntu.edu.tw/~cjlin/libsvm (Accessed 6 November, 2014).
  7. Cheung, K.-W., J. T. Kwok, M. H. Law, and K.-C. Tsui, "Mining Customer Product Ratings for Personalized Marketing," Decision Support Systems, Vol. 35, No. 2(2003), 231-243. https://doi.org/10.1016/S0167-9236(02)00108-2
  8. Chiu, C., "A Case-based Customer Classification Approach for Direct Marketing," Expert Systems with Applications, Vol. 22, No. 2(2002), 163-168. https://doi.org/10.1016/S0957-4174(01)00052-5
  9. Chiu, C., P.-C. Chang, and N.-H. Chiu, "A case-based expert support system for due-date assignment in a wafer fabrication factory," Journal of Intelligent Manufacturing, Vol. 14, No. 3-4(2003), 287-296. https://doi.org/10.1023/A:1024693524603
  10. Chung, S. and Y. Suh, "Development of a Medical Care Cost Prediction Model for Cancer Patients Using Case-Based Reasoning," Asia Pacific Journal of Information Systems, Vol. 16, No. 2(2006), 69-84.
  11. Coenen, F., G. Swinnen, K. Vanhoof, and G. Wets, "The Improvement of Response Modeling: Combining Rule-induction and Case-based Reasoning," Expert Systems with Applications, Vol. 18, No. 4(2000), 307-313. https://doi.org/10.1016/S0957-4174(00)00012-9
  12. Conul, F. F., B. D. Kim, and M. Shi, "Mailing Smarter to Catalog Customer," Journal of Interactive Marketing, Vol. 14, No. 2(2000), 2-16. https://doi.org/10.1002/(SICI)1520-6653(200021)14:2<2::AID-DIR1>3.0.CO;2-N
  13. Cui, G., M. L. Wong, and H.-K. Lui, "Machine Learning for Direct Marketing Response Models: Bayesian Networks with Evolutionary Programming," Management Science, Vol. 52, No. 4(2006), 597-612. https://doi.org/10.1287/mnsc.1060.0514
  14. Ha, K., S. Cho, and D. MacLachlan, "Response Models based on Bagging Neural Networks," Journal of Interactive Marketing, Vol. 19, No. 1(2005), 17-30. https://doi.org/10.1002/dir.20028
  15. Hong, H. and S. Cho, "Case-Based Reasoning Approaches by Considering Variable Covariance Structure and Variable Weight: Corporate Bankruptcy Prediction," Korean Management Review, Vol. 38, No. 5(2009), 1165-1184.
  16. Hong, T. and J. Park, "Integrating the Customer Response Model in Direct Marketing Using Case-Based Reasoning," The Journal of Information Systems, Vol. 18, No. 3(2009), 375-399.
  17. Hwang, S. and S. Cho, "Clustering-based Reference Set Reduction for k-nearest Neighbor," Lecture Notes in Computer Science, Vol. 4492(2007), 880-888.
  18. Jang, Y. S., J. W. Kim, and J. Hur, "Combined Application of Data Imbalance Reduction Techniques Using Genetic Algorithm," Journal of Intelligence and Information Systems, Vol. 14, No. 3(2008), 133-154.
  19. Jarmulak, J., S. Craw, and R. Rowe, "Self-optimizing CBR Retrieval," Proceedings of the Twelfth IEEE International Conference on Tools with Artificial Intelligence, Vancouver, Canada, (2000), 376-383.
  20. Jo, T. and N. Japkowicz, "Class Imbalances versus Small Disjuncts," SIGKDD Explorations Newsletter, Vol. 6, No. 1(2004), 40-49. https://doi.org/10.1145/1007730.1007737
  21. Kang, P., H.-j. Lee, and S. Cho, "SVM Ensemble Techniques for Class Imbalance Problem," Proceedings of Korea Information Science Society Conference, Vol. 31, No. 2(2004), 706-708.
  22. Kim, D., H.-j. Lee, and S. Cho, "Response Modeling with Support Vector Regression," Expert Systems with Applications, Vol. 34, No. 2(2008), 1102-1108. https://doi.org/10.1016/j.eswa.2006.12.019
  23. Kim, K.-J., "Toward global optimization of case-based reasoning systems for financial forecasting," Applied Intelligence, Vol. 21, No. 3(2004), 239-249. https://doi.org/10.1023/B:APIN.0000043557.93085.72
  24. Kim, Y. and W. N. Street, "An Intelligent System for Customer Targeting: A Data Mining Approach," Decision Support Systems, Vol. 37, No. 2(2004), 215-228. https://doi.org/10.1016/S0167-9236(03)00008-3
  25. Kolodner, J., Case-Based Reasoning, Morgan Kaufmann Publishers, 1993.
  26. Lee, H.-j. and S. Cho, "Focusing on Non-respondents: Response Modeling with Novelty Detectors," Expert Systems with Applications, Vol. 33, No. 2(2007), 522-530. https://doi.org/10.1016/j.eswa.2006.05.016
  27. Lee, J. S. and J. G. Kwon, "A Hybrid SVM Classifier for Imbalanced Data Sets," Journal of Intelligence and Information Systems, Vol. 19, No. 2(2013), 125-140. https://doi.org/10.13088/jiis.2013.19.2.125
  28. Liu, Y., A. An, and X. Huang, "Boosting Prediction Accuracy on Imbalanced Datasets with SVM Ensembles," Lecture Notes in Computer Science, Vol. 3918(2006), 107-118.
  29. Park, C.-S. and I. Han, "A Case-Based Reasoning with the Feature Weights Derived by Analytic Hierarchy Process for Bankruptcy Prediction," Expert Systems with Applications, Vol. 23, No. 3(2002), 255-264. https://doi.org/10.1016/S0957-4174(02)00045-3
  30. Roh, T.-H., M.-H. Yoo, and I.-G. Han, "Integrating rough set theory and case-based reasoning for the corporate credit evaluation," The Journal of Information Systems, Vol. 14, No. 1(2005), 41-65.
  31. Rumelhart, D. E. and J. L. McClelland, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1, MIT Press, Cambridge, MA, 1986.
  32. Shin, H. and S. Cho, "Response Modeling with Support Vector Machine," Expert Systems with Applications, Vol. 30, No. 4(2006), 746-760. https://doi.org/10.1016/j.eswa.2005.07.037
  33. Shin, K.-s., and I. Han, "Case-based reasoning supported by genetic algorithms for corporate bond rating," Expert Systems with Applications, Vol. 16, No. 2(1999), 85-95. https://doi.org/10.1016/S0957-4174(98)00063-3
  34. Shmueli, G., N. R. Patel, and P. C. Bruce, Data Mining for Business Intelligence, Wiley, 2007.
  35. Vapnik, V., The Nature of Statistical Learning Theory, Springer, 1995.
  36. Weiss, G. M. and F. Provost, "The effect of class distribution on classifier learning," Technical Report, Department of Computer Science, Rutgers University, 2001.
  37. Yu, E. and S. Cho, "Constructing Response Model using Ensemble based on Feature Subset Selection," Expert Systems with Applications, Vol. 30, No. 2(2006), 352-360. https://doi.org/10.1016/j.eswa.2005.07.026