• Title/Summary/Keyword: Ensemble

A Methodology of Customer Churn Prediction based on Two-Dimensional Loyalty Segmentation (이차원 고객충성도 세그먼트 기반의 고객이탈예측 방법론)

  • Kim, Hyung Su;Hong, Seung Woo
    • Journal of Intelligence and Information Systems / v.26 no.4 / pp.111-126 / 2020
  • Exposed to increasingly competitive environments, most industries have recently become aware of the importance of customer lifetime value. As a result, preventing customer churn is becoming a more important business issue than acquiring new customers. Retaining existing customers is far more economical than acquiring new ones; the acquisition cost of a new customer is known to be five to six times higher than the cost of retaining an existing one. Companies that effectively prevent churn and improve retention rates are also known to benefit not only from higher profitability but also from a better brand image through improved customer satisfaction. Customer churn prediction, long treated as a sub-area of CRM research, has recently grown in importance as a big-data-based performance marketing theme, driven by advances in machine learning technology for business. Until now, research on churn prediction has been most active in sectors such as mobile telecommunications, finance, distribution, and games, which are highly competitive and where churn management is urgent. These studies focused on improving the performance of the prediction model itself, for example by comparing the performance of various models, exploring features that are effective in forecasting churn, or developing new ensemble techniques. Their practical usefulness was limited because most of them treated the entire customer base as a single group when developing a predictive model. In short, existing research mainly aimed to improve the predictive model itself, and comparatively little work addressed improving the overall churn prediction process. 
In practice, customers have different behavioral characteristics because of heterogeneous transaction patterns, and their churn rates differ accordingly, so it is unreasonable to treat the entire customer base as a single group. To predict churn effectively in heterogeneous industries, it is therefore desirable to segment customers according to classification criteria such as loyalty and to operate an appropriate churn prediction model for each segment. Some studies do subdivide customers with clustering techniques and apply a churn prediction model to each resulting group. Although this approach can produce better predictions than a single model for the entire customer population, there is still room for improvement: clustering is a mechanical, exploratory grouping technique that calculates distances from the inputs and does not reflect an organization's strategic intent, such as loyalty. This study proposes a segment-based churn prediction process built on two-dimensional customer loyalty (CCP/2DL: Customer Churn Prediction based on Two-Dimensional Loyalty segmentation), on the assumption that successful churn management is better achieved by improving the overall process than by improving the performance of the model alone. CCP/2DL is a sequence of steps that segments customers along two loyalty dimensions, quantitative and qualitative, performs a secondary grouping of the customer segments according to their churn patterns, and then independently applies heterogeneous churn prediction models to each churn pattern group. To assess the relative merit of the proposed process, its performance was compared with two of the most commonly applied alternatives, the General churn prediction process and the Clustering-based churn prediction process. 
The General churn prediction process used in this study is the most common approach: a single machine learning model predicts churn for the entire target customer group. The Clustering-based churn prediction process first segments customers with clustering techniques and then builds a churn prediction model for each resulting group. In an evaluation carried out in cooperation with a global NGO, the proposed CCP/2DL outperformed the other methodologies in predicting churn. The proposed process is not only effective in predicting churn but can also serve as a strategic basis for gaining a variety of customer insights and carrying out other related performance marketing activities.
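
The first two steps of a segment-based process of this kind (loyalty segmentation, then secondary grouping by churn pattern) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the 2x2 grid, the 0.5 cut-offs, and the toy data are all assumptions made for illustration, and the abstract does not specify how loyalty is scored or how churn patterns are compared.

```python
from collections import defaultdict


def loyalty_segment(quantitative, qualitative, q_cut=0.5, l_cut=0.5):
    """Place a customer in one cell of a 2x2 loyalty grid.

    Both scores are assumed to lie in [0, 1]; the cut-offs are hypothetical.
    """
    q = "high" if quantitative >= q_cut else "low"
    l = "high" if qualitative >= l_cut else "low"
    return (q, l)


def churn_rate_by_segment(customers):
    """customers: iterable of (quantitative, qualitative, churned) tuples."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [churned, total]
    for quant, qual, churned in customers:
        seg = loyalty_segment(quant, qual)
        counts[seg][0] += int(churned)
        counts[seg][1] += 1
    return {seg: c / n for seg, (c, n) in counts.items()}


def group_segments(rates, cut=0.5):
    """Secondary grouping: merge segments whose churn rates fall on the
    same side of an (assumed) cut-off into one churn pattern group."""
    groups = defaultdict(list)
    for seg, rate in sorted(rates.items()):
        groups["high_churn" if rate >= cut else "low_churn"].append(seg)
    return dict(groups)


# Toy data: (quantitative loyalty, qualitative loyalty, churned?)
customers = [
    (0.9, 0.8, False), (0.8, 0.9, False),
    (0.9, 0.2, False), (0.7, 0.1, True),
    (0.2, 0.9, True), (0.1, 0.7, True),
    (0.1, 0.2, True), (0.3, 0.1, True),
]
rates = churn_rate_by_segment(customers)
groups = group_segments(rates)
print(rates)
print(groups)
```

In a fuller pipeline, a separate churn prediction model would then be trained for each group in `groups`; that step is omitted here because the abstract does not name the models used.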

The Prediction of Export Credit Guarantee Accident using Machine Learning (기계학습을 이용한 수출신용보증 사고예측)

  • Cho, Jaeyoung;Joo, Jihwan;Han, Ingoo
    • Journal of Intelligence and Information Systems / v.27 no.1 / pp.83-102 / 2021
  • The government recently announced various policies for developing the big-data and artificial intelligence fields, giving the public a great opportunity through the disclosure of high-quality data held by public institutions. KSURE (Korea Trade Insurance Corporation) is a major public institution for financial policy in Korea and is strongly committed to backing export companies through various systems. Nevertheless, there are still few cases of business models realized on the basis of big-data analyses. Against this background, this paper aims to develop a new business model that can be applied to ex-ante prediction of the likelihood of a credit guarantee insurance accident. We use internal data from KSURE, which supports export companies in Korea, and apply machine learning models. We then compare the performance of predictive models including Logistic Regression, Random Forest, XGBoost, LightGBM, and DNN (Deep Neural Network). For decades, many researchers have sought better models for predicting bankruptcy, since ex-ante prediction is crucial for corporate managers, investors, creditors, and other stakeholders. Research on predicting financial distress or bankruptcy originated with Smith (1930), Fitzpatrick (1932), and Merwin (1942). One of the most famous models is Altman's Z-score model (Altman, 1968), which is based on multiple discriminant analysis and is still widely used in both research and practice. Altman proposes a score model that uses five key financial ratios to predict the probability of bankruptcy within the next two years. Ohlson (1980) introduces a logit model to address some limitations of previous models. Furthermore, Elmer and Borowski (1988) develop and examine a rule-based, automated system that conducts financial analysis of savings and loans. 
Researchers in Korea began to study the prediction of financial distress or bankruptcy in the 1980s. Kim (1987) analyzes financial ratios and develops a prediction model. Han et al. (1995, 1996, 1997, 2003, 2005, 2006) construct prediction models using various techniques, including artificial neural networks. Yang (1996) introduces multiple discriminant analysis and a logit model. Kim and Kim (2001) use artificial neural network techniques for ex-ante prediction of insolvent enterprises. Since then, many scholars have tried to predict financial distress or bankruptcy more precisely with diverse models such as Random Forest or SVM. One major distinction between our research and previous work is that we examine the predicted probability of default for each sample case, rather than only investigating the classification accuracy of each model over the entire sample. Most predictive models in this paper achieve a classification accuracy of about 70% on the entire sample. Specifically, the LightGBM model shows the highest accuracy, 71.1%, and the Logit model the lowest, 69%. However, these results are open to multiple interpretations. In this business context, more emphasis has to be placed on minimizing type 2 error, which causes more harmful operating losses for the guarantee company. We therefore also compare classification accuracy after splitting the predicted probability of default into ten equal intervals. Examined by interval, the Logit model has the highest accuracy, 100%, for predicted default probabilities of 0~10%, but a relatively low accuracy of 61.5% for predicted default probabilities of 90~100%. 
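
The interval-wise comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the 0.5 decision threshold, and the toy probabilities and labels are hypothetical and are not taken from the paper or from KSURE data.

```python
def accuracy_by_interval(probs, labels, n_bins=10, threshold=0.5):
    """Classification accuracy within equal-width intervals of the
    predicted default probability.

    probs  : predicted probabilities of default, each in [0, 1]
    labels : true outcomes (1 = default, 0 = no default)
    Returns a list of n_bins accuracies; None marks an empty interval.
    """
    bins = [[0, 0] for _ in range(n_bins)]  # [correct, total] per interval
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        pred = 1 if p >= threshold else 0       # assumed decision rule
        bins[idx][0] += int(pred == y)
        bins[idx][1] += 1
    return [(c / n if n else None) for c, n in bins]


# Toy example (made-up values)
probs = [0.05, 0.02, 0.95, 0.92, 0.55, 0.45]
labels = [0, 0, 1, 0, 0, 0]
acc = accuracy_by_interval(probs, labels)
print(acc)
```

Reading the result by interval rather than as one overall figure shows where a model is reliable: a model can be accurate in the extreme intervals and still be close to a coin flip near 50%.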
In contrast, Random Forest, XGBoost, LightGBM, and DNN give more desirable results: they achieve higher accuracy in both the 0~10% and 90~100% intervals of predicted default probability, although their accuracy is lower around a predicted default probability of 50%. As for the distribution of samples across the predicted probability intervals, both LightGBM and XGBoost place a relatively large number of samples in the 0~10% and 90~100% intervals. Although the Random Forest model has an advantage in classification accuracy on a small number of cases, LightGBM or XGBoost could be the more desirable model because they classify a large number of cases into the two extreme intervals, even allowing for their relatively low classification accuracy. Considering both the importance of type 2 error and total prediction accuracy, XGBoost and DNN show superior performance, followed by Random Forest and LightGBM, while logistic regression performs worst. However, each predictive model has a comparative advantage under different evaluation standards. For instance, the Random Forest model shows almost 100% accuracy for samples expected to have a high probability of default. Taken together, these results suggest that more comprehensive ensemble models, which combine multiple machine learning classifiers and use majority voting, can be constructed to maximize overall performance.
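
Majority voting over several classifiers, and the type 2 error of the result, can be illustrated with a minimal Python sketch. The model outputs below are made up, the function names are hypothetical, and treating "default" as the positive class (so that a type 2 error is a missed default) is an assumption about how the abstract uses the term.

```python
def majority_vote(predictions):
    """Combine per-model 0/1 predictions by strict majority.

    predictions: list of equal-length label lists, one per model.
    With an even number of models, a tie is resolved to 0.
    """
    n_models = len(predictions)
    return [1 if 2 * sum(votes) > n_models else 0 for votes in zip(*predictions)]


def false_negative_rate(y_true, y_pred):
    """Share of actual positives (defaults) predicted as negative,
    i.e. the type 2 error rate under the assumption stated above."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not preds_for_positives:
        return None
    return preds_for_positives.count(0) / len(preds_for_positives)


# Toy outputs of three hypothetical classifiers on five cases
y_true = [1, 1, 1, 0, 0]
model_a = [1, 0, 1, 0, 0]
model_b = [1, 1, 0, 0, 1]
model_c = [0, 1, 1, 0, 0]

ensemble = majority_vote([model_a, model_b, model_c])
print(ensemble)
print(false_negative_rate(y_true, model_a))
print(false_negative_rate(y_true, ensemble))
```

In this toy case each model misses a different default, so the vote recovers all three; with correlated errors the gain would be smaller.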

A Study of the Time-Space and Appreciation for the Performance Culture of Gwanseo Region in Late Joseon Period: Focusing on Analysis of Terminology (조선후기 관서지방의 공연 시공간과 향유에 관한 연구)

  • Song, Hye-jin
    • (The) Research of the performance art and culture / no.22 / pp.287-325 / 2011
  • This paper studies the time-space and appreciation of the performance culture of the Gwanseo region, which is considered to have formed a distinctive culture in the late Joseon period. For this purpose, 4 gasa written in hangeul (the Korean alphabet), 4 yeonhaeng gasa, and 108 articles of Gwanseoakbu were examined. In addition, the passages describing performance traits in 9 types of yeonhaengrok (Documents of Performance culture) written in Chinese characters were analyzed. A 'main list of terminology' was then derived by categorizing the material according to the following points: 1) subjects of performance and appreciation, 2) time and period of performance, 3) space of performance, 4) contents of performance, 5) background and motive for performance, and 6) method of performance. Through this process, various 'nouns' and 'predicate verbs' related to performance culture emerged and were systematized according to types of performance elements and categories. Major terminology includes predicate verbs and symbolic verbs such as 'nokuihongsang,' 'baekdaehongjang,' 'jeolsaekgeumga,' 'cheonga,' 'hwaryu,' 'gamuja,' and 'tongsoja,' as well as already-known terms such as gisaeng, iwon, yangbang, akgong, and jeonak, which refer to musicians and dancers. Subjects of performance were divided into performers and listeners, and performances were categorized into concert, music, and dance according to performance form. Music was divided into instrumental or vocal, and solo or accompanied (byeongju, self-accompaniment). In vocal music, the inclusion of singing by professional artists (called gwangdae or uchang) was noteworthy. The record of 23 names of popular artists from the Gwanseo region, with mention of each person's special talents, reflects the vitality and artistic level of the province. 
Depending on the appreciating patrons, the audience was referred to by terms including 'yugaek (party guest),' 'jwasang,' 'on jwaseok,' and 'sonnim (guests).' The appraisal of a given performance seems to have been strongly affected by the tastes, views, and disposition of the appreciating patrons. It is therefore interesting to observe the differing comparative reviews of concerts in different regions given by literary figures, who offer varied criticism of the same performance. Performance space was divided into natural or architectural space, taking account of special performance sites such as famous pavilions or on-the-boat performances. Specific terms related to the scale and brightness of the stage, as well as stage props and cast, were found in descriptions of performance space. Performance spaces including the famous pavilions Yeongwangjeong, Bubyeokru, Baeksangru, Wolparu, and Uigeomjeong, all well-known tourist sites of Gwanseo province, were often visited by viceroys, governors, and envoys during a tour or trip. This, together with the fact that full-scale performances were regularly held there and that more than 15 different kinds of boats used for boat concerts are mentioned, confirms the general popularity of boat concerts at the time. Performance time was categorized by season and time of day (am/pm/night) and analyzed in terms of time of occurrence and duration; there was no special limitation on when a performance could be held. Most morning concerts were held as part of official duties for the envoys, after their meeting session, whereas evening concerts lasted longer and drew larger audiences. In the case of boat concerts, examples include day-time concerts and performances that began during the day and lasted until later in the evening. 
Major terminology related to performance time and season includes descriptions of time of day (morning, evening, night) and mentions of sunset, twilight, moonlight, stars, candles, and lamps. Such terms, which reflect the flow of time, contributed to making a concert more lively. Terminology for the contents of performance consisted mostly of words like 'instrumental,' 'pungak,' or 'pungnyu.' In addition, contextual expressions gave hints as to whether there was dance, singing, ensemble, solo, or duet. Words for dance and singing used in Gwanseo province were almost identical to those used for gasa and jeongjae in the capital, Hanyang. However, many sentences reveal that performances of 'hangjangmu' of hongmunyeon, sword dance, and baettaragi were of top quality. Moreover, chants in hanmun (Chinese characters) and folk songs, which are characteristic of this region, show unique features of local musical performance. Understanding the purpose and background of a performance is judged to be important in grasping the foundation and continuity of local culture. Concerts were usually either related to official protocol for 'greeting,' 'sending-off,' 'reports,' and 'patrols' or held for private enjoyment. The rituals of Gwanseo province characteristically feature a river-crossing ceremony on the Daedong river, which has been closely documented by many. What is more, the Gwanseo region saw continual comings and goings of Pyeongan envoys and local officers, as well as ambassadors traveling to and from China, which required organized, full-scale performances of music and dance. The method of performance varied from large-scale official events, which required female entertainers and a great banquet in addition to musicians, to more intimate private gatherings. A performance might take the form of 'taking turns' or 'a competition,' reflecting the dynamic nature of the musical culture at the time. 
This study, which derives terminology related to the time-space and appreciation culture of musical performances in the Gwanseo region in the late Joseon period, should be expanded in the future into research on 'the performance culture unique to Gwanseo region' in relation to the financial and administrative aspects of the province, as well as everyday lifestyle. Furthermore, it could lead to more intensive research through comparative study with related literary documents and pictorial data, which could serve as the foundation for understanding the use of space and stage, as well as the performance format characteristic of Korean traditional performing arts.