• Title/Summary/Keyword: Decision making tree

Search Result 200, Processing Time 0.029 seconds

Data Mining Tool for Stock Investors' Decision Support (주식 투자자의 의사결정 지원을 위한 데이터마이닝 도구)

  • Kim, Sung-Dong
    • The Journal of the Korea Contents Association
    • /
    • v.12 no.2
    • /
    • pp.472-482
    • /
    • 2012
  • There are many investors in the stock market, and more and more people get interested in the stock investment. In order to avoid risks and make profit in the stock investment, we have to determine several aspects using various information. That is, we have to select profitable stocks and determine appropriate buying/selling prices and holding period. This paper proposes a data mining tool for the investors' decision support. The data mining tool makes stock investors apply machine learning techniques and generate stock price prediction model. Also it helps determine buying/selling prices and holding period. It supports individual investor's own decision making using past data. Using the proposed tool, users can manage stock data, generate their own stock price prediction models, and establish trading policy via investment simulation. Users can select technical indicators which they think affect future stock price. Then they can generate stock price prediction models using the indicators and test the models. They also perform investment simulation using proper models to find appropriate trading policy consisting of buying/selling prices and holding period. Using the proposed data mining tool, stock investors can expect more profit with the help of stock price prediction model and trading policy validated on past data, instead of with an emotional decision.

A Comparative Analysis of Ensemble Learning-Based Classification Models for Explainable Term Deposit Subscription Forecasting (설명 가능한 정기예금 가입 여부 예측을 위한 앙상블 학습 기반 분류 모델들의 비교 분석)

  • Shin, Zian;Moon, Jihoon;Rho, Seungmin
    • The Journal of Society for e-Business Studies
    • /
    • v.26 no.3
    • /
    • pp.97-117
    • /
    • 2021
  • Predicting term deposit subscriptions is one of representative financial marketing in banks, and banks can build a prediction model using various customer information. In order to improve the classification accuracy for term deposit subscriptions, many studies have been conducted based on machine learning techniques. However, even if these models can achieve satisfactory performance, utilizing them is not an easy task in the industry when their decision-making process is not adequately explained. To address this issue, this paper proposes an explainable scheme for term deposit subscription forecasting. For this, we first construct several classification models using decision tree-based ensemble learning methods, which yield excellent performance in tabular data, such as random forest, gradient boosting machine (GBM), extreme gradient boosting (XGB), and light gradient boosting machine (LightGBM). We then analyze their classification performance in depth through 10-fold cross-validation. After that, we provide the rationale for interpreting the influence of customer information and the decision-making process by applying Shapley additive explanation (SHAP), an explainable artificial intelligence technique, to the best classification model. To verify the practicality and validity of our scheme, experiments were conducted with the bank marketing dataset provided by Kaggle; we applied the SHAP to the GBM and LightGBM models, respectively, according to different dataset configurations and then performed their analysis and visualization for explainable term deposit subscriptions.

Interpreting Bounded Rationality in Business and Industrial Marketing Contexts: Executive Training Case Studies (집행관배훈안례연구(阐述工商业背景下的有限合理性):집행관배훈안례연구(执行官培训案例研究))

  • Woodside, Arch G.;Lai, Wen-Hsiang;Kim, Kyung-Hoon;Jung, Deuk-Keyo
    • Journal of Global Scholars of Marketing Science
    • /
    • v.19 no.3
    • /
    • pp.49-61
    • /
    • 2009
  • This article provides training exercises for executives into interpreting subroutine maps of executives' thinking in processing business and industrial marketing problems and opportunities. This study builds on premises that Schank proposes about learning and teaching including (1) learning occurs by experiencing and the best instruction offers learners opportunities to distill their knowledge and skills from interactive stories in the form of goal.based scenarios, team projects, and understanding stories from experts. Also, (2) telling does not lead to learning because learning requires action-training environments should emphasize active engagement with stories, cases, and projects. Each training case study includes executive exposure to decision system analysis (DSA). The training case requires the executive to write a "Briefing Report" of a DSA map. Instructions to the executive trainee in writing the briefing report include coverage in the briefing report of (1) details of the essence of the DSA map and (2) a statement of warnings and opportunities that the executive map reader interprets within the DSA map. The length maximum for a briefing report is 500 words-an arbitrary rule that works well in executive training programs. Following this introduction, section two of the article briefly summarizes relevant literature on how humans think within contexts in response to problems and opportunities. Section three illustrates the creation and interpreting of DSA maps using a training exercise in pricing a chemical product to different OEM (original equipment manufacturer) customers. Section four presents a training exercise in pricing decisions by a petroleum manufacturing firm. Section five presents a training exercise in marketing strategies by an office furniture distributer along with buying strategies by business customers. Each of the three training exercises is based on research into information processing and decision making of executives operating in marketing contexts. Section six concludes the article with suggestions for use of this training case and for developing additional training cases for honing executives' decision-making skills. Todd and Gigerenzer propose that humans use simple heuristics because they enable adaptive behavior by exploiting the structure of information in natural decision environments. "Simplicity is a virtue, rather than a curse". Bounded rationality theorists emphasize the centrality of Simon's proposition, "Human rational behavior is shaped by a scissors whose blades are the structure of the task environments and the computational capabilities of the actor". Gigerenzer's view is relevant to Simon's environmental blade and to the environmental structures in the three cases in this article, "The term environment, here, does not refer to a description of the total physical and biological environment, but only to that part important to an organism, given its needs and goals." The present article directs attention to research that combines reports on the structure of task environments with the use of adaptive toolbox heuristics of actors. The DSA mapping approach here concerns the match between strategy and an environment-the development and understanding of ecological rationality theory. Aspiration adaptation theory is central to this approach. Aspiration adaptation theory models decision making as a multi-goal problem without aggregation of the goals into a complete preference order over all decision alternatives. The three case studies in this article permit the learner to apply propositions in aspiration level rules in reaching a decision. Aspiration adaptation takes the form of a sequence of adjustment steps. An adjustment step shifts the current aspiration level to a neighboring point on an aspiration grid by a change in only one goal variable. An upward adjustment step is an increase and a downward adjustment step is a decrease of a goal variable. Creating and using aspiration adaptation levels is integral to bounded rationality theory. The present article increases understanding and expertise of both aspiration adaptation and bounded rationality theories by providing learner experiences and practice in using propositions in both theories. Practice in ranking CTSs and writing TOP gists from DSA maps serves to clarify and deepen Selten's view, "Clearly, aspiration adaptation must enter the picture as an integrated part of the search for a solution." The body of "direct research" by Mintzberg, Gladwin's ethnographic decision tree modeling, and Huff's work on mapping strategic thought are suggestions on where to look for research that considers both the structure of the environment and the computational capabilities of the actors making decisions in these environments. Such research on bounded rationality permits both further development of theory in how and why decisions are made in real life and the development of learning exercises in the use of heuristics occurring in natural environments. The exercises in the present article encourage learning skills and principles of using fast and frugal heuristics in contexts of their intended use. The exercises respond to Schank's wisdom, "In a deep sense, education isn't about knowledge or getting students to know what has happened. It is about getting them to feel what has happened. This is not easy to do. Education, as it is in schools today, is emotionless. This is a huge problem." The three cases and accompanying set of exercise questions adhere to Schank's view, "Processes are best taught by actually engaging in them, which can often mean, for mental processing, active discussion."

  • PDF

Evaluation of Classification Algorithm Performance of Sentiment Analysis Using Entropy Score (엔트로피 점수를 이용한 감성분석 분류알고리즘의 수행도 평가)

  • Park, Man-Hee
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.22 no.9
    • /
    • pp.1153-1158
    • /
    • 2018
  • Online customer evaluations and social media information among a variety of information sources are critical for businesses as it influences the customer's decision making. There are limitations on the time and money that the survey will ask to identify a variety of customers' needs and complaints. The customer review data at online shopping malls provide the ideal data sources for analyzing customer sentiment about their products. In this study, we collected product reviews data on the smartphone of Samsung and Apple from Amazon. We applied five classification algorithms which are used as representative sentiment analysis techniques in previous studies. The five algorithms are based on support vector machines, bagging, random forest, classification or regression tree and maximum entropy. In this study, we proposed entropy score which can comprehensively evaluate the performance of classification algorithm. As a result of evaluating five algorithms using an entropy score, the SVMs algorithm's entropy score was ranked highest.

Union and Division using Technique in Fingerprint Recognition Identification System

  • Park, Byung-Jun;Park, Jong-Min;Lee, Jung-Oh
    • Journal of information and communication convergence engineering
    • /
    • v.5 no.2
    • /
    • pp.140-143
    • /
    • 2007
  • Fingerprint Recognition System is made up of Off-line treatment and On-line treatment; the one is registering all the information of there trieving features which are retrieved in the digitalized fingerprint getting out of the analog fingerprint through the fingerprint acquisition device and the other is the treatment making the decision whether the users are approved to be accessed to the system or not with matching them with the fingerprint features which are retrieved and database from the input fingerprint when the users are approaching the system to use. In matching between On-line and Off-line treatment, the most important thing is which features we are going to use as the standard. Therefore, we have been using "Delta" and "Core" as this standard until now, but there might have been some deficits not to exist in every person when we set them up as the standards. In order to handle the users who do not have those features, we are still using the matching method which enables us to make up of the spanning tree or the triangulation with the relations of the spanned feature. However, there are some overheads of the time on these methods and it is not sure whether they make the correct matching or not. In this paper, introduces a new data structure, called Union and Division, representing binary fingerprint image. Minutiae detecting procedure using Union and Division takes, on the average, 32% of the consuming time taken by a minutiae detecting procedure without using Union and Division.

Predicting Factors on Performance Confidence of Cardiopulmonary Resuscitation in Community Members (지역사회 주민의 심폐소생술 수행 자신감 예측요인)

  • Lee, Su-Jin
    • Journal of Digital Contents Society
    • /
    • v.19 no.9
    • /
    • pp.1699-1705
    • /
    • 2018
  • This study is a descriptive investigatory study that secondarily analyzes the community health survey in order to identify the characteristics of confidence regarding the execution of cardiopulmonary resuscitation among community members of Korea. Study subjects included 357,176 people who were aware of cardiopulmonary resuscitation based on 2014 and 2016 community health survey. The collected data were analyzed for composite sample frequency and decision-making tree using SPSS WIN 25.0 program. According to the results of this study, a confidence regarding execution of cardiopulmonary resuscitation of the community members was higher if the subject has experienced cardiopulmonary resuscitation education, trained on mannequin within the past 2 years, received cardiopulmonary resuscitation education within the past 2 years, is of male sex, and is 41.5 years of age or younger.

A Method of Predicting Service Time Based on Voice of Customer Data (고객의 소리(VOC) 데이터를 활용한 서비스 처리 시간 예측방법)

  • Kim, Jeonghun;Kwon, Ohbyung
    • Journal of Information Technology Services
    • /
    • v.15 no.1
    • /
    • pp.197-210
    • /
    • 2016
  • With the advent of text analytics, VOC (Voice of Customer) data become an important resource which provides the managers and marketing practitioners with consumer's veiled opinion and requirements. In other words, making relevant use of VOC data potentially improves the customer responsiveness and satisfaction, each of which eventually improves business performance. However, unstructured data set such as customers' complaints in VOC data have seldom used in marketing practices such as predicting service time as an index of service quality. Because the VOC data which contains unstructured data is too complicated form. Also that needs convert unstructured data from structure data which difficult process. Hence, this study aims to propose a prediction model to improve the estimation accuracy of the level of customer satisfaction by combining unstructured from textmining with structured data features in VOC. Also the relationship between the unstructured, structured data and service processing time through the regression analysis. Text mining techniques, sentiment analysis, keyword extraction, classification algorithms, decision tree and multiple regression are considered and compared. For the experiment, we used actual VOC data in a company.

A study on integration of semantic topic based Knowledge model (의미적 토픽 기반 지식모델의 통합에 관한 연구)

  • Chun, Seung-Su;Lee, Sang-Jin;Bae, Sang-Tea
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2012.06b
    • /
    • pp.181-183
    • /
    • 2012
  • 최근 자연어 및 정형언어 처리, 인공지능 알고리즘 등을 활용한 효율적인 의미 기반 지식모델의 생성과 분석 방법이 제시되고 있다. 이러한 의미 기반 지식모델은 효율적 의사결정트리(Decision Making Tree)와 특정 상황에 대한 체계적인 문제해결(Problem Solving) 경로 분석에 활용된다. 특히 다양한 복잡계 및 사회 연계망 분석에 있어 정적 지표 생성과 회귀 분석, 행위적 모델을 통한 추이분석, 거시예측을 지원하는 모의실험(Simulation) 모형의 기반이 된다. 본 연구에서는 이러한 의미 기반 지식모델을 통합에 있어 텍스트 마이닝을 통해 도출된 토픽(Topic) 모델 간 통합 방법과 정형적 알고리즘을 제시한다. 이를 위해 먼저, 텍스트 마이닝을 통해 도출되는 키워드 맵을 동치적 지식맵으로 변환하고 이를 의미적 지식모델로 통합하는 방법을 설명한다. 또한 키워드 맵으로부터 유의미한 토픽 맵을 투영하는 방법과 의미적 동치 모델을 유도하는 알고리즘을 제안한다. 통합된 의미 기반 지식모델은 토픽 간의 구조적 규칙과 정도 중심성, 근접 중심성, 매개 중심성 등 관계적 의미분석이 가능하며 대규모 비정형 문서의 의미 분석과 활용에 실질적인 기반 연구가 될 수 있다.

Study of Machine-Learning Classifier and Feature Set Selection for Intent Classification of Korean Tweets about Food Safety

  • Yeom, Ha-Neul;Hwang, Myunggwon;Hwang, Mi-Nyeong;Jung, Hanmin
    • Journal of Information Science Theory and Practice
    • /
    • v.2 no.3
    • /
    • pp.29-39
    • /
    • 2014
  • In recent years, several studies have proposed making use of the Twitter micro-blogging service to track various trends in online media and discussion. In this study, we specifically examine the use of Twitter to track discussions of food safety in the Korean language. Given the irregularity of keyword use in most tweets, we focus on optimistic machine-learning and feature set selection to classify collected tweets. We build the classifier model using Naive Bayes & Naive Bayes Multinomial, Support Vector Machine, and Decision Tree Algorithms, all of which show good performance. To select an optimum feature set, we construct a basic feature set as a standard for performance comparison, so that further test feature sets can be evaluated. Experiments show that precision and F-measure performance are best when using a Naive Bayes Multinomial classifier model with a test feature set defined by extracting Substantive, Predicate, Modifier, and Interjection parts of speech.

The Effect of Input Variables Clustering on the Characteristics of Ensemble Machine Learning Model for Water Quality Prediction (입력자료 군집화에 따른 앙상블 머신러닝 모형의 수질예측 특성 연구)

  • Park, Jungsu
    • Journal of Korean Society on Water Environment
    • /
    • v.37 no.5
    • /
    • pp.335-343
    • /
    • 2021
  • Water quality prediction is essential for the proper management of water supply systems. Increased suspended sediment concentration (SSC) has various effects on water supply systems such as increased treatment cost and consequently, there have been various efforts to develop a model for predicting SSC. However, SSC is affected by both the natural and anthropogenic environment, making it challenging to predict SSC. Recently, advanced machine learning models have increasingly been used for water quality prediction. This study developed an ensemble machine learning model to predict SSC using the XGBoost (XGB) algorithm. The observed discharge (Q) and SSC in two fields monitoring stations were used to develop the model. The input variables were clustered in two groups with low and high ranges of Q using the k-means clustering algorithm. Then each group of data was separately used to optimize XGB (Model 1). The model performance was compared with that of the XGB model using the entire data (Model 2). The models were evaluated by mean squared error-ob servation standard deviation ratio (RSR) and root mean squared error. The RSR were 0.51 and 0.57 in the two monitoring stations for Model 2, respectively, while the model performance improved to RSR 0.46 and 0.55, respectively, for Model 1.