Search | Korea Science

Hybrid Learning Architectures for Advanced Data Mining: An Application to Binary Classification for Fraud Management (개선된 데이터마이닝을 위한 혼합 학습구조의 제시)

Kim, Steven H.;Shin, Sung-Woo
- Journal of Information Technology Application, v.1, pp.173-211, 1999
The task of classification permeates all walks of life, from business and economics to science and public policy. In this context, nonlinear techniques from artificial intelligence have often proven to be more effective than the methods of classical statistics. The objective of knowledge discovery and data mining is to support decision making through the effective use of information. The automated approach to knowledge discovery is especially useful when dealing with large data sets or complex relationships. For many applications, automated software may find subtle patterns which escape the notice of manual analysis, or whose complexity exceeds the cognitive capabilities of humans. This paper explores the utility of a collaborative learning approach involving integrated models in the preprocessing and postprocessing stages. For instance, a genetic algorithm performs feature-weight optimization in a preprocessing module. Moreover, an inductive tree, artificial neural network (ANN), and k-nearest neighbor (kNN) techniques serve as postprocessing modules. More specifically, the postprocessors act as second-order classifiers which determine the best first-order classifier on a case-by-case basis. In addition to the second-order models, a voting scheme is investigated as a simple but efficient postprocessing model. The first-order models consist of statistical and machine learning models such as logistic regression (logit), multivariate discriminant analysis (MDA), ANN, and kNN. The genetic algorithm, inductive decision tree, and voting scheme act as kernel modules for collaborative learning. These ideas are explored against the background of a practical application relating to financial fraud management, which exemplifies a binary classification problem.
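The voting scheme mentioned in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the classifier names, the 0/1 label encoding, and the tie-breaking rule are assumptions made here for illustration only.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label chosen by most first-order classifiers.

    `predictions` maps a classifier name to its 0/1 label for one case.
    Ties are broken in favor of label 1 (an assumption of this sketch,
    so that a possibly fraudulent case is flagged rather than missed).
    """
    counts = Counter(predictions.values())
    return 1 if counts[1] >= counts[0] else 0


# Hypothetical outputs of four first-order models for a single case.
case = {"logit": 1, "mda": 0, "ann": 1, "knn": 1}
print(majority_vote(case))  # prints 1
```

A second-order classifier, as described in the abstract, would instead learn which first-order model to trust for each case; plain voting is the simpler baseline.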

Electricity Price Prediction Based on Semi-Supervised Learning and Neural Network Algorithms (준지도 학습 및 신경망 알고리즘을 이용한 전기가격 예측)

Kim, Hang Seok;Shin, Hyun Jung
- Journal of Korean Institute of Industrial Engineers, v.39 no.1, pp.30-45, 2013
Predicting the monthly electricity price is a significant factor in decision-making for plant resource management, fuel purchase plans, plant planning, operating budgets, and so on. In this paper, we propose a prediction model that is sophisticated in both its modeling technique and the variety of variables collected. The proposed model hybridizes semi-supervised learning and artificial neural network algorithms. The former is one of the most recent and most prominent approaches in data mining and machine learning, and the latter is one of the well-established algorithms in those fields. Diverse economic and financial indexes such as crude oil prices, LNG prices, exchange rates, and composite indexes of representative global stock markets are collected and used for the semi-supervised learning, which predicts the up-down movement of the price, whereas various climatic indexes such as temperature, rainfall, sunlight, and air pressure are used for the artificial neural network, which predicts the real value of the price. The resulting values are hybridized in the proposed model. The effectiveness of the model was empirically verified with monthly electricity price data provided by the Korea Energy Economics Institute.
https://doi.org/10.7232/JKIIE.2013.39.1.030

A Securities Company's Customer Churn Prediction Model and Causal Inference with SHAP Value (증권 금융 상품 거래 고객의 이탈 예측 및 원인 추론)

Na, Kwangtek;Lee, Jinyoung;Kim, Eunchan;Lee, Hyochan
- The Journal of Bigdata, v.5 no.2, pp.215-229, 2020
Interest in machine learning is growing in all industries, but it is difficult to apply to real-world tasks because its results are hard to explain. This paper presents a case of developing a customer churn prediction model for a securities company and reports the results of an attempt to build an explainable machine learning model using the SHAP Value methodology. In this study, a total of six customer churn models are compared and analyzed, and the cause of customer churn is inferred through classification and data analysis of SHAP Values and the type of customer asset change. The results of this study could serve as a basis for comprehensive judgment: for example, marketing managers could use churn prediction results, together with the inferred causes, in actual customer marketing and in establishing a targeted marketing strategy for each customer.
https://doi.org/10.36498/kbigdt.2020.5.2.215

Study on Prediction of Attendance Using Machine Learning (머신러닝을 이용한 관중 수요 예측에 관한 연구)

Yoo, Ji-Hyun
- Journal of IKEEE, v.23 no.4, pp.1243-1249, 2019
People who gather to enjoy a specific event or content are called audiences or spectators, and they show various propensities according to the characteristics of the crowd. Despite such differences, attendance is in general directly related to the business side, since it enables stable financial operation through various sources of income such as admission fees and the use of other facilities. Therefore, attendance prediction can be used as a major factor in marketing and budgeting strategies. In this study, we review several existing models for predicting attendance and propose an efficient machine learning model. In addition, we study daily attendance prediction and abnormal attendance prediction using a combined DNN (Deep Neural Network) and RF (Random Forest) model.
https://doi.org/10.7471/ikeee.2019.23.4.1243

Work life balance practices and the link to innovation and productivity: A comprehensive literature review

Hatcher, Ryan;Hwang, Yo-Sung
- The Journal of Economics, Marketing and Management, v.7 no.1, pp.26-38, 2019
Purpose - This paper reviews recent literature on work-life balance, thoroughly investigating its limitations and implications for future research, with a focus on the linkages between work-life balance practices, machine learning and emotional intelligence, and work-life conflict; the correlations between work-life enrichment and work-life balance practices; the relationships between employee job satisfaction and work-life balance; and the links between work-life balance and managerial support. Research design, data, and methodology - The paper further details the linkages between work-life balance and the organizational performance outcomes of productivity and innovation. Previous literature has paid attention to the link between HR practices and organizational outcomes such as productivity, flexibility, and financial performance, but this understanding needs to be extended to include innovation performance. Dealing with employees' emotions using different machine learning techniques is a prominent research topic today. Here, we examine how far employees are conscious of their own selves and identify the ideas and views individuals hold about themselves and others. Without proper knowledge of their own personality, it is very difficult for individuals to manage their emotions. This study also aims to assess individuals' abilities to manage their emotions in order to perform well. Conclusions - A theoretical conceptual framework has been built by integrating the existing literature to explain a number of factors which are closely associated with work-life balance. The conceptual model illustrates how work-life balance interplays with performance and interrelates with the aforementioned factors.
https://doi.org/10.20482/jemm.2019.7.1.26.

Examining the Effects of Vocabulary on Crowdfunding Success: A Comparison of Cultural and Commercial Campaigns

Xiang Gao;Weige Huang;Bin, Li;Sunghan Ryu
- Asia Pacific Journal of Information Systems, v.32 no.2, pp.275-306, 2022
Crowdfunding has emerged as an important financing source for diverse cultural projects and commercial ventures in the early stages. Unlike traditional investment evaluation, where structured financial data is critical, such information is typically unavailable for crowdfunding campaigns. Instead, campaign creators prepare pitches containing essential information about themselves and the campaigns, which are crucial in attracting and persuading contributors. Prior literature has examined the effects of different aspects of campaign pitches, but a comprehensive understanding of the theme is lacking. This study aims to fill this gap by identifying the lexicon of frequently used vocabulary in campaign pitches and examining how it is associated with crowdfunding success. Moreover, we examine how the association differs between cultural and commercial crowdfunding campaigns. We randomly collected 50,000 campaigns from the cultural and commercial categories on Kickstarter and extracted the 100 most used verbs in the campaign pitches. Based on a machine learning approach combined with principal component analysis, we constructed sets of verbal factors that are statistically significant in predicting crowdfunding success. The findings also show that cultural and commercial campaigns consist of different verbal components with different effects on crowdfunding success.
https://doi.org/10.14329/apjis.2022.32.2.275

A Comparative Study of Phishing Websites Classification Based on Classifier Ensemble

Tama, Bayu Adhi;Rhee, Kyung-Hyune
- Journal of Korea Multimedia Society, v.21 no.5, pp.617-625, 2018
Phishing websites have become a crucial concern in cyber security applications. Phishing is performed by fraudulently deceiving users with the aim of obtaining sensitive information such as bank account information, credit card numbers, usernames, and passwords. The threat has led to huge losses for online retailers, e-business platforms, and financial institutions, to name but a few. One way to build an anti-phishing detection mechanism is to construct a classification algorithm based on machine learning techniques. The objective of this paper is to compare different classifier ensemble approaches, i.e. random forest, rotation forest, gradient boosted machine, and extreme gradient boosting, against single classifiers, i.e. decision tree, classification and regression tree, and credal decision tree, in the case of website phishing. Area under the ROC curve (AUC) is employed as a performance metric, whilst statistical tests are used as a baseline indicator for evaluating the significance of differences among classifiers. The paper contributes to the existing literature by providing a benchmark of classifier ensembles for web phishing detection.
https://doi.org/10.9717/kmms.2018.21.5.617
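The AUC metric named in the abstract above can be computed directly from classifier scores. The following is a minimal sketch using the pairwise (rank-based) definition, not the paper's evaluation code; the function name and the toy scores are assumptions made here for illustration.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives
    a higher score than a randomly chosen negative case, with ties
    counted as one half. Labels are assumed to be 0 (legitimate) or
    1 (phishing).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy scores: one phishing site is ranked below one legitimate site.
print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # prints 0.75
```

The pairwise form is quadratic in the number of cases, which is acceptable for a small illustration; library implementations typically use a sort-based computation instead.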

A Comparative Study of Phishing Websites Classification Based on Classifier Ensembles

Tama, Bayu Adhi;Rhee, Kyung-Hyune
- Journal of Multimedia Information System, v.5 no.2, pp.99-104, 2018
Phishing websites have become a crucial concern in cyber security applications. Phishing is performed by fraudulently deceiving users with the aim of obtaining sensitive information such as bank account information, credit card numbers, usernames, and passwords. The threat has led to huge losses for online retailers, e-business platforms, and financial institutions, to name but a few. One way to build an anti-phishing detection mechanism is to construct a classification algorithm based on machine learning techniques. The objective of this paper is to compare different classifier ensemble approaches, i.e. random forest, rotation forest, gradient boosted machine, and extreme gradient boosting, against single classifiers, i.e. decision tree, classification and regression tree, and credal decision tree, in the case of website phishing. Area under the ROC curve (AUC) is employed as a performance metric, whilst statistical tests are used as a baseline indicator for evaluating the significance of differences among classifiers. The paper contributes to the existing literature by providing a benchmark of classifier ensembles for web phishing detection.
https://doi.org/10.9717/JMIS.2018.5.2.99

Machine Learning Based Stock Price Fluctuation Prediction Models of KOSDAQ-listed Companies Using Online News, Macroeconomic Indicators, Financial Market Indicators, Technical Indicators, and Social Interest Indicators (온라인 뉴스와 거시경제 지표, 금융 지표, 기술적 지표, 관심도 지표를 이용한 코스닥 상장 기업의 기계학습 기반 주가 변동 예측)

Kim, Hwa Ryun;Hong, Seung Hye;Hong, Helen
- Journal of Korea Multimedia Society, v.24 no.3, pp.448-459, 2021
In this paper, we propose a method of predicting the next-day stock price fluctuations of 10 KOSDAQ-listed companies in the 5G, autonomous driving, and electricity sectors by training SVM, XGBoost, and LightGBM models on macroeconomic and financial market indicators, technical indicators, social interest indicators, and daily positive indices extracted from online news. In three experiments on the usefulness of the social interest indicators and daily positive indices, the average accuracy improved when each indicator and index was added to the models. In addition, when feature selection was performed to analyze the superiority of the extracted features, the average importance rankings of the social interest indicator and the daily positive index were 5.45 and 1.08, respectively, showing higher importance than the macroeconomic and financial market indicators and the technical indicators. With the results of these experiments, we confirmed the effectiveness of the social interest indicators as alternative data and of the daily positive index for predicting stock price fluctuation.
https://doi.org/10.9717/kmms.2020.24.3.448

The Prediction of Export Credit Guarantee Accident using Machine Learning (기계학습을 이용한 수출신용보증 사고예측)

Cho, Jaeyoung;Joo, Jihwan;Han, Ingoo
- Journal of Intelligence and Information Systems, v.27 no.1, pp.83-102, 2021
The government recently announced various policies for developing the big-data and artificial intelligence fields, giving the public a great opportunity through the disclosure of high-quality data held by public institutions. KSURE (Korea Trade Insurance Corporation) is a major public institution for financial policy in Korea and is strongly committed to backing export companies through various systems. Nevertheless, there are still few cases of business models realized on the basis of big-data analyses. In this situation, this paper aims to develop a new business model that can be applied to ex-ante prediction of the likelihood of a credit guarantee insurance accident. We utilize internal data from KSURE, which supports export companies in Korea, and apply machine learning models. We then compare the performance of predictive models including Logistic Regression, Random Forest, XGBoost, LightGBM, and DNN (Deep Neural Network). For decades, many researchers have tried to find better models for predicting bankruptcy, since ex-ante prediction is crucial for corporate managers, investors, creditors, and other stakeholders. The prediction of financial distress or bankruptcy originated with Smith (1930), Fitzpatrick (1932), and Merwin (1942). One of the most famous models is Altman's Z-score model (Altman, 1968), which is based on multiple discriminant analysis and is still widely used in both research and practice. Altman proposes a score model that uses five key financial ratios to predict the probability of bankruptcy within the next two years. Ohlson (1980) introduces a logit model to address some limitations of previous models. Furthermore, Elmer and Borowski (1988) develop and examine a rule-based, automated system that conducts financial analysis of savings and loans.
Since the 1980s, researchers in Korea have studied the prediction of financial distress or bankruptcy. Kim (1987) analyzes financial ratios and develops a prediction model. Han et al. (1995, 1996, 1997, 2003, 2005, 2006) construct prediction models using various techniques including artificial neural networks. Yang (1996) introduces multiple discriminant analysis and a logit model. Kim and Kim (2001) utilize artificial neural network techniques for ex-ante prediction of insolvent enterprises. Since then, many scholars have tried to predict financial distress or bankruptcy more precisely using diverse models such as Random Forest or SVM. One major distinction of our research from previous research is that we examine the predicted probability of default for each sample case, rather than only the classification accuracy of each model over the entire sample. Most predictive models in this paper show a classification accuracy of about 70% on the entire sample. Specifically, the LightGBM model shows the highest accuracy of 71.1% and the Logit model the lowest accuracy of 69%. However, we confirm that these results are open to multiple interpretations. In the business context, more emphasis must be placed on minimizing type 2 error, which causes more harmful operating losses for the guarantee company. Thus, we also compare classification accuracy after splitting the predicted probability of default into ten equal intervals. When we examine the classification accuracy for each interval, the Logit model has the highest accuracy of 100% for the 0~10% interval of the predicted probability of default but a relatively low accuracy of 61.5% for the 90~100% interval.
On the other hand, Random Forest, XGBoost, LightGBM, and DNN give more desirable results, since they show a higher level of accuracy for both the 0~10% and 90~100% intervals of the predicted probability of default, but a lower level of accuracy around 50%. Regarding the distribution of samples across intervals, both LightGBM and XGBoost place a relatively large number of samples in the 0~10% and 90~100% intervals. Although the Random Forest model has an advantage in classification accuracy with a small number of cases, LightGBM or XGBoost could be a more desirable model since they classify a large number of cases into the two extreme intervals, even allowing for their relatively low classification accuracy. Considering the importance of type 2 error and total prediction accuracy, XGBoost and DNN show superior performance. Random Forest and LightGBM follow with good results, while logistic regression shows the worst performance. However, each predictive model has a comparative advantage under different evaluation standards. For instance, the Random Forest model shows almost 100% accuracy for samples expected to have a high probability of default. Collectively, we can construct more comprehensive ensemble models that combine multiple machine learning classification models and conduct majority voting to maximize overall performance.
https://doi.org/10.13088/jiis.2021.27.1.083
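The interval-based comparison described in the abstract above (splitting the predicted probability of default into ten equal intervals and measuring accuracy within each) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function name, the 0.5 decision threshold, and the toy data are hypothetical.

```python
def accuracy_by_interval(probs, labels, n_bins=10, threshold=0.5):
    """Classification accuracy within equal-width bins of predicted probability.

    Returns one tuple (lower, upper, n_cases, accuracy) per bin;
    accuracy is None when a bin holds no cases. A case is predicted as
    a default (label 1) when its probability is at least `threshold`
    (an assumption of this sketch).
    """
    hits = [0] * n_bins
    totals = [0] * n_bins
    for p, y in zip(probs, labels):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        totals[b] += 1
        hits[b] += int((p >= threshold) == bool(y))
    return [
        (b / n_bins, (b + 1) / n_bins, totals[b],
         hits[b] / totals[b] if totals[b] else None)
        for b in range(n_bins)
    ]


# Toy data: predicted default probabilities and true outcomes (1 = default).
probs = [0.05, 0.02, 0.95, 0.97, 0.55]
labels = [0, 0, 1, 0, 1]
for lower, upper, n, acc in accuracy_by_interval(probs, labels):
    if n:
        print(f"{lower:.1f}-{upper:.1f}: n={n}, accuracy={acc}")
```

Reporting the case count next to each accuracy matters for the comparison the abstract draws: a model can be highly accurate in an extreme interval while placing very few cases there.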
