Comparative Study of Dimension Reduction Methods for Highly Imbalanced Overlapping Churn Data

  • Lee, Sujee (Department of Industrial Engineering, Seoul National University) ;
  • Koo, Bonhyo (Department of Industrial Engineering, Seoul National University) ;
  • Jung, Kyu-Hwan (Samsung Advanced Institute of Technology, Samsung Electronics Co. Ltd.)
  • Received : 2014.11.16
  • Accepted : 2014.12.01
  • Published : 2014.12.30

Abstract

Retention of customers who are likely to churn is one of the most important issues in customer relationship management, so companies try to predict churners from their large-scale, high-dimensional data. This study focuses on handling such large data sets by reducing their dimensionality. Using six dimension reduction methods, namely principal component analysis (PCA), factor analysis (FA), locally linear embedding (LLE), local tangent space alignment (LTSA), locality preserving projections (LPP), and a deep auto-encoder, our experiments apply each method to the training data, build a classification model on the mapped data, and then measure performance by hit rate to compare the methods. In the results, PCA performs well despite its simplicity, and the deep auto-encoder gives the best overall performance. These results can be explained by the characteristics of the churn prediction data, which are highly correlated and overlap heavily across the classes. We also propose a simple out-of-sample extension for the nonlinear dimension reduction methods LLE and LTSA that exploits this characteristic of the data.
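As a concrete illustration of the evaluation pipeline the abstract describes, the following is a minimal sketch in Python. It assumes scikit-learn, an RBF-kernel SVM as the classifier (the paper cites LIBSVM), a synthetic imbalanced data set standing in for the real churn records, and a top-5% ranking cutoff for the hit rate; these specifics are illustrative assumptions, not details taken from the paper.

```python
# Sketch of the pipeline: fit dimension reduction on training data,
# classify in the reduced space, score hit rate on the top-ranked customers.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic imbalanced two-class data in place of the churn records (assumed).
X, y = make_classification(n_samples=5000, n_features=50, n_informative=10,
                           weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.3, random_state=0)

# 1) Fit the dimension reduction method on the training data only.
pca = PCA(n_components=10).fit(X_train)

# 2) Build the classifier on the mapped (low-dimensional) data.
clf = SVC(kernel="rbf", probability=True).fit(pca.transform(X_train), y_train)

# 3) Measure hit rate: here, the fraction of true churners among the
#    top-k customers ranked by predicted churn score (one common definition).
scores = clf.predict_proba(pca.transform(X_test))[:, 1]
k = int(0.05 * len(y_test))              # top 5% cutoff (assumed)
top_k = np.argsort(scores)[::-1][:k]
hit_rate = y_test[top_k].mean()
print(f"hit rate in top {k}: {hit_rate:.3f}")
```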
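The abstract does not spell out the proposed out-of-sample extension for LLE and LTSA, so the sketch below shows only the standard reconstruction-weight approach (cf. Bengio et al., reference 2): a new point is expressed as an affine combination of its nearest training neighbours in the input space, and the same weights are applied to those neighbours' embedded coordinates. The function name, neighbourhood size, and regularization constant are illustrative assumptions, not the authors' construction.

```python
# Standard nearest-neighbour out-of-sample extension for an LLE/LTSA-style
# embedding; a sketch only -- not the paper's specific method.
# X_train: original training data; Y_train: its precomputed embedding.
import numpy as np

def embed_new_point(x_new, X_train, Y_train, k=10, reg=1e-3):
    # Find the k nearest training points of the new sample.
    dists = np.linalg.norm(X_train - x_new, axis=1)
    nbrs = np.argsort(dists)[:k]

    # Solve for weights reconstructing x_new as an affine combination
    # of its neighbours (regularized local Gram system, as in LLE).
    Z = X_train[nbrs] - x_new            # neighbours shifted to the origin
    G = Z @ Z.T                          # local Gram matrix (k x k)
    G += reg * np.trace(G) * np.eye(k)   # regularization for stability
    w = np.linalg.solve(G, np.ones(k))
    w /= w.sum()                         # enforce weights summing to one

    # Apply the same weights to the neighbours' embedded coordinates.
    return w @ Y_train[nbrs]
```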

References

  1. Bengio, Y. (2007), Learning deep architectures for AI, Technical Report 1312, Université de Montréal, Canada.
  2. Bengio, Y., Paiement, J. F., Vincent, P., Delalleau, O., Le Roux, N., and Ouimet, M. (2004), Out-of-sample extensions for LLE, Isomap, MDS, eigenmaps, and spectral clustering, Advances in Neural Information Processing Systems, 16, 177-184.
  3. Bhattacharya, C. B. (1998), When customers are members: customer retention in paid membership contexts, Journal of the Academy of Marketing Science, 26(1), 31-44. https://doi.org/10.1177/0092070398261004
  4. Chang, C. C. and Lin, C. J. (2011), LIBSVM: a library for support vector machines, ACM Transactions on Intelligent Systems and Technology, 2(3), 27.
  5. Ghahramani, Z. and Hinton, G. E. (1996), The EM algorithm for mixtures of factor analyzers, Technical Report CRG-TR-96-1, University of Toronto, Canada.
  6. He, X. and Niyogi, P. (2004), Locality preserving projections, Advances in Neural Information Processing Systems, 16, 153-160.
  7. Hinton, G. E. and Salakhutdinov, R. R. (2006), Reducing the dimensionality of data with neural networks, Science, 313(5786), 504-507. https://doi.org/10.1126/science.1127647
  8. Hotelling, H. (1933), Analysis of a complex of statistical variables into principal components, Journal of Educational Psychology, 24(6), 417-441. https://doi.org/10.1037/h0071325
  9. Hsu, C. W., Chang, C. C., and Lin, C. J. (2003), A practical guide to support vector classification, Technical Report, Department of Computer Science, National Taiwan University, Taiwan.
  10. Kaiser, H. F. (1960), The application of electronic computers to factor analysis, Educational and Psychological Measurement, 20, 141-151. https://doi.org/10.1177/001316446002000116
  11. Kim, K. and Lee, J. (2012), Sequential manifold learning for efficient churn prediction, Expert Systems with Applications, 39(18), 13328-13337. https://doi.org/10.1016/j.eswa.2012.05.069
  12. Kim, N., Jung, K. H., Kim, Y. S., and Lee, J. (2012), Uniformly subsampled ensemble (USE) for churn management: theory and implementation, Expert Systems with Applications, 39(15), 11839-11845. https://doi.org/10.1016/j.eswa.2012.01.203
  13. Kim, Y. (2006), Toward a successful CRM: variable selection, sampling, and ensemble, Decision Support Systems, 41(2), 542-553. https://doi.org/10.1016/j.dss.2004.09.008
  14. Lee, H., Lee, Y., Cho, H., Im, K., and Kim, Y. S. (2011), Mining churning behaviors and developing retention strategies based on a partial least squares (PLS) model, Decision Support Systems, 52(1), 207-216. https://doi.org/10.1016/j.dss.2011.07.005
  15. Levina, E. and Bickel, P. J. (2004), Maximum likelihood estimation of intrinsic dimension, Advances in Neural Information Processing Systems, 17, 777-784.
  16. Pearson, K. (1901), On lines and planes of closest fit to systems of points in space, Philosophical Magazine Series 6, 2(11), 559-572. https://doi.org/10.1080/14786440109462720
  17. Reinartz, W., Krafft, M., and Hoyer, W. D. (2004), The customer relationship management process: its measurement and impact on performance, Journal of Marketing Research, 41(3), 293-305. https://doi.org/10.1509/jmkr.41.3.293.35991
  18. Rosset, S., Neumann, E., Eick, U., Vatnik, N., and Idan, I. (2001), Evaluation of prediction models for marketing campaigns, Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, 456-461.
  19. Rossi, P. E., McCulloch, R., and Allenby, G. (1996), The value of household information in target marketing, Marketing Science, 15(4), 321-340. https://doi.org/10.1287/mksc.15.4.321
  20. Roweis, S. T. and Saul, L. K. (2000), Nonlinear dimensionality reduction by locally linear embedding, Science, 290(5500), 2323-2326. https://doi.org/10.1126/science.290.5500.2323
  21. Spearman, C. (1904), 'General intelligence,' objectively determined and measured, American Journal of Psychology, 15(2), 201-292. https://doi.org/10.2307/1412107
  22. van der Maaten, L. J., Postma, E. O., and van den Herik, H. J. (2009), Dimensionality reduction: a comparative review, Journal of Machine Learning Research, 10, 66-71.
  23. Zhang, Z. and Zha, H. (2004), Principal manifolds and nonlinear dimensionality reduction via tangent space alignment, SIAM Journal on Scientific Computing, 26(1), 313-338.