• Title/Summary/Keyword: Decision Tree Regression

A Development of PM10 Forecasting System (미세먼지 예보시스템 개발)

  • Koo, Youn-Seo;Yun, Hui-Young;Kwon, Hee-Yong;Yu, Suk-Hyun
    • Journal of Korean Society for Atmospheric Environment / v.26 no.6 / pp.666-682 / 2010
  • A forecasting system for today's and tomorrow's PM10 was developed based on statistical models. Forecasts were issued at 9 AM to predict today's 24-hour average PM10 concentration and at 5 PM to predict tomorrow's 24-hour average PM10. Today's forecasting model was run on measured air quality and meteorological data, while tomorrow's model used monitored data together with meteorological data calculated by a weather forecasting model such as MM5 (Mesoscale Meteorological Model version 5). The air quality data observed at ambient air quality monitoring stations, as well as the measured and forecasted meteorological data, were examined by regression analysis to find their relationship with the target PM10 concentrations. The PM concentration, wind speed, precipitation rate, mixing height, and dew-point deficit temperature were the major variables determining the level of PM10, and the wind direction at the 500 hPa height was also a good indicator of the influence of long-range transport from other countries. A neural network, a regression model, and a decision tree were used as forecasting models to predict the class of a comprehensive air quality index, and the final forecast index was the most frequent index among the three models' predictions. The accuracy, false alarm rate, and probability of detection were 72.4%, 0.0%, and 42.9% for tomorrow's model and 80.8%, 12.5%, and 77.8% for today's model, respectively. The statistical model was limited in predicting rapidly changing PM10 concentrations caused by long-range transport from outside Korea; in such cases, a chemical transport model would be an alternative method.
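
The voting step and the three verification scores named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The class labels, the tie-breaking rule (fall back to the first model listed), the contingency counts, and the score definitions (probability of detection as hits / (hits + misses), false alarm rate as false alarms / (hits + false alarms)) are all assumptions; the abstract does not state which definitions the paper uses.

```python
def majority_vote(predictions):
    """Return the most frequent predicted class.

    Assumption: on a tie, the class predicted by the earliest model in the
    list wins (max() returns the first maximal element it encounters).
    """
    return max(predictions, key=predictions.count)


def verification_scores(hits, misses, false_alarms, correct_negatives):
    """Compute accuracy, probability of detection, and false alarm rate
    from a 2x2 contingency table of high-concentration events.

    Assumption: the false alarm rate is taken as the share of forecast
    events that did not occur; other definitions exist.
    """
    total = hits + misses + false_alarms + correct_negatives
    accuracy = (hits + correct_negatives) / total
    pod = hits / (hits + misses)
    far = false_alarms / (hits + false_alarms)
    return accuracy, pod, far


# Hypothetical index classes from the neural network, regression model,
# and decision tree for one forecast day.
final_index = majority_vote(["moderate", "bad", "bad"])

# Hypothetical counts, not taken from the paper.
accuracy, pod, far = verification_scores(
    hits=6, misses=2, false_alarms=2, correct_negatives=10
)
```

With three voters and more than two classes, all three models can disagree, so some tie-breaking rule is needed; the one used here is only a placeholder.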

A Study of Factors Associated with Software Developers Job Turnover (데이터마이닝을 활용한 소프트웨어 개발인력의 업무 지속수행의도 결정요인 분석)

  • Jeon, In-Ho;Park, Sun W.;Park, Yoon-Joo
    • Journal of Intelligence and Information Systems / v.21 no.2 / pp.191-204 / 2015
  • According to the '2013 Performance Assessment Report on the Financial Program' from the National Assembly Budget Office, the unfilled recruitment ratio of software (SW) developers in South Korea was 25% in the 2012 fiscal year. Moreover, the unfilled recruitment ratio of highly qualified SW developers reaches almost 80%. This phenomenon is more pronounced in small and medium enterprises with fewer than 300 employees. Young job-seekers in South Korea increasingly avoid becoming SW developers, and even current SW developers want to change careers, which hinders the national development of IT industries. The Korean government has recently recognized the problem and implemented policies to foster young SW developers. As a result, it has become easier to find entry-level SW developers. However, many IT companies still find it hard to recruit highly qualified SW developers, because long-term experience is essential to becoming an SW development expert. Thus, improving the job continuity intentions of current SW developers is more important than fostering new SW developers. This study therefore surveyed the job continuity intentions of SW developers and analyzed the factors associated with them. We carried out a survey from September 2014 to October 2014, targeting 130 SW developers working in IT industries in South Korea. We gathered the demographic information and characteristics of the respondents, the work environments of the SW industry, and the social positions of SW developers. A regression analysis and a decision tree method were then applied to the data. These two methods are widely used data mining techniques that offer explanatory ability and complement each other. We first performed a linear regression to find the important factors associated with the job continuity intention of SW developers. The result showed that the 'expected age' up to which a respondent expects to work as a SW developer was the most significant factor associated with the job continuity intention. We supposed that the major cause of this phenomenon is the structural problem of IT industries in South Korea, which requires SW developers to move from development to management as they are promoted. The 'motivation' to become a SW developer and the 'personality (introverted tendency)' of a SW developer were also highly important factors associated with the job continuity intention. Next, the decision tree method was used to extract the characteristics of highly motivated and less motivated developers. We used the well-known C4.5 algorithm for the decision tree analysis. The results showed that 'motivation', 'personality', and 'expected age' were also important factors influencing the job continuity intention, which was similar to the results of the regression analysis. In addition, the 'ability to learn' new technology was a crucial factor in the decision rules for job continuity. In other words, a person with a high ability to learn new technology tends to work as a SW developer for a longer period of time. The decision rules also showed that the 'social position' of SW developers and the 'prospect' of the SW industry were minor factors influencing job continuity intentions. On the other hand, 'type of employment (regular position/non-regular position)' and 'type of company (ordering company/service providing company)' did not affect the job continuity intention in either method. In this research, we examined the job continuity intentions of SW developers actually working at IT companies in South Korea and analyzed the factors associated with them. These results can be used for human resource management in IT companies when recruiting or fostering highly qualified SW experts. They can also help to build SW developer fostering policies and to solve the problem of unfilled recruitment of SW developers in South Korea.

Design of Contact Scheduling System(CSS) for Customer Retention (고객유지를 위한 접촉스케줄링시스템의 설계)

  • Lee, Jee-Sik;Cho, You-Jung
    • Journal of Intelligence and Information Systems / v.11 no.3 / pp.83-101 / 2005
  • Customer retention is one of the major issues in the life insurance industry, in which competition is increasingly fierce. Life insurers must do many things to retain their customers. One of them is to keep in touch with all customers. When an insurance planner resigns, his or her customers must be taken care of by planner-assistants. This article outlines the design of a Contact Scheduling System (CSS) that supports planner-assistants in contacting the customers. Planner-assistants are unable to share the resigned insurance planner's experience and knowledge regarding customer relationship management. The CSS, developed by employing both the Classification And Regression Tree (CART) technique and the Sequential Pattern Mining (SPM) technique, has a two-stage process. In the first stage, it segments the customers into eight groups using a CART model. It then generates contact scheduling information consisting of contact-purpose, contact-interval, and contact-channel, according to the segment's typical contact pattern. The contact-purpose is schedule-driven, event-driven, or business-rule-driven. Schedule-driven contact is determined by the SPM model. In operation under realistic conditions, the CSS proved practical in supporting planner-assistants to keep in touch with the customers efficiently and effectively.

Development of a Soil Moisture Estimation Model Using Artificial Neural Networks and Classification and Regression Tree(CART) (의사결정나무 분류와 인공신경망을 이용한 토양수분 산정모형 개발)

  • Kim, Gwangseob;Park, Jung-A
    • KSCE Journal of Civil and Environmental Engineering Research / v.31 no.2B / pp.155-163 / 2011
  • In this study, a soil moisture estimation model was developed using a decision tree model, an artificial neural network (ANN) model, remotely sensed data, and ground network data of daily precipitation, soil moisture, and surface temperature. Soil moisture data from five sites in the Yongdam dam basin were used for model training and validation. Satellite remote sensing data, geographical data, and meteorological data were used in the classification and regression tree (CART) model for data classification, and the ANN model was applied to the clustered data to estimate soil moisture. Soil moisture data from the Jucheon, Bugui, Sangjeon, and Ahncheon sites were used for training; the correlation coefficient between soil moisture estimates and observations was between 0.92 and 0.96, the root mean square error was between 1.00 and 1.88%, and the mean absolute error was between 0.75 and 1.45%. The Cheoncheon2 site was used for validation. Test statistics showed that the correlation coefficient, the root mean square error, and the mean absolute error were 0.91, 3.19%, and 2.72%, respectively. The results demonstrated that the developed soil moisture model using CART and ANN can be applied to the estimation of soil moisture distribution.
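
The three evaluation statistics reported in this abstract (correlation coefficient, root mean square error, mean absolute error) can be computed as in the following sketch. The function names and the sample values are hypothetical, and the Pearson form of the correlation coefficient is an assumption; the abstract does not say which form was used.

```python
import math


def pearson_r(observed, estimated):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_e = sum(estimated) / n
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(observed, estimated))
    spread_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed))
    spread_e = math.sqrt(sum((e - mean_e) ** 2 for e in estimated))
    return cov / (spread_o * spread_e)


def rmse(observed, estimated):
    """Root mean square error of the estimates."""
    n = len(observed)
    return math.sqrt(sum((e - o) ** 2 for o, e in zip(observed, estimated)) / n)


def mae(observed, estimated):
    """Mean absolute error of the estimates."""
    n = len(observed)
    return sum(abs(e - o) for o, e in zip(observed, estimated)) / n


# Hypothetical soil moisture values in percent, not taken from the paper.
observed = [10.0, 20.0, 30.0]
estimated = [11.0, 21.0, 31.0]
scores = (pearson_r(observed, estimated),
          rmse(observed, estimated),
          mae(observed, estimated))
```

A constant bias, as in the sample values, leaves the correlation at 1 while RMSE and MAE both equal the bias, which is why the abstract reports all three statistics rather than the correlation alone.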

The Analysis of Factors which Affect Business Survey Index Using Regression Trees (회귀나무를 이용한 기업경기실사지수의 영향요인 분석)

  • Chang, Young-Jae
    • The Korean Journal of Applied Statistics / v.23 no.1 / pp.63-71 / 2010
  • Business entrepreneurs reflect their views of domestic and foreign economic activity in their operations in order to grow their businesses. The decisions, forecasts, and plans based on their economic sentiment affect business operations such as production, investment, and hiring, and consequently affect the condition of the national economy. The business survey index (BSI) is compiled to capture business entrepreneurs' economic sentiment for the analysis of business conditions. BSI has been used as an important variable in short-term forecasting models for business cycle analysis, especially during periods of extreme business fluctuation. The recent financial crisis has caused extreme business fluctuations similar to those caused by the currency crisis at the end of 1997, and has restored the importance of BSI as a variable for economic forecasting. In this paper, the meaning of BSI as an economic sentiment index is reviewed and a GUIDE regression tree is constructed to find the factors that affect BSI. The result shows that variables related to the stability of the financial market, such as the KOSPI (Korea Composite Stock Price Index) and the exchange rate, as well as the manufacturing operation ratio and consumer goods sales, are the main factors affecting business entrepreneurs' economic sentiment.

Video Ranking Model: a Data-Mining Solution with the Understood User Engagement

  • Chen, Yongyu;Chen, Jianxin;Zhou, Liang;Yan, Ying;Huang, Ruochen;Zhang, Wei
    • Journal of Multimedia Information System / v.1 no.1 / pp.67-75 / 2014
  • As video services grow rapidly, it is important for service providers to offer customized services. Video ranking plays a key role in attracting subscribers. In this paper we propose a weekly video ranking mechanism based on quantified user engagement. The traditional QoE ranking mechanism is relatively subjective and is usually accomplished by grading, while QoS is relatively objective and is accomplished by analyzing quality metrics. The goal of this paper is to establish a ranking mechanism that combines the advantages of both QoS and QoE, using data from a third-party data collection platform. We use data mining methods to classify and analyze the collected data. To make the approach applicable in practice, we first group the videos and then use the regression tree and the decision tree (CART) to narrow their number down to a reasonable scale. After that we introduce the analytic hierarchy process (AHP) model and use the Elo rating system to improve the fairness of our system. Questionnaire results verify that the proposed solution not only simplifies the computation but also increases the credibility of the system.
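
For readers unfamiliar with the Elo rating system mentioned in this abstract, the standard update rule can be sketched as below. How the paper maps video comparisons onto "games" is not stated in the abstract; the K-factor of 32, the starting rating of 1500, and the idea of one video "winning" a pairwise comparison are illustrative assumptions only.

```python
def expected_score(rating_a, rating_b):
    """Expected score of A against B under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a, rating_b, score_a, k=32.0):
    """Return the updated ratings (A, B) after one comparison.

    score_a is 1.0 if A wins, 0.5 for a draw, and 0.0 if A loses.
    The K-factor k is an assumed value, not one taken from the paper.
    """
    expected_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return new_a, new_b


# Two hypothetical videos start with equal ratings; video A wins the comparison.
video_a, video_b = elo_update(1500.0, 1500.0, score_a=1.0)
```

Because the update is zero-sum, the points gained by one video equal the points lost by the other, and an upset against a higher-rated video moves the ratings more than an expected result does.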

Soil Moisture Estimation Using CART Algorithm and Ancillary Data (CART기법과 보조자료를 이용한 토양수분 추정)

  • Kim, Gwang-Seob;Park, Han-Gyun
    • Journal of Korea Water Resources Association / v.43 no.7 / pp.597-608 / 2010
  • In this study, a soil moisture estimation method was proposed to obtain a nationwide soil moisture distribution map using on-site soil moisture observations, rainfall, surface temperature, NDVI, land cover, effective soil depth, and the CART (Classification And Regression Tree) algorithm. The method was applied to the Yong-dam dam basin because the soil moisture data from its four sites were reliable. Soil moisture observations from three sites (Bu-gui, San-jeon, Cheon-cheon2) were used to train the algorithm and one site (Gye-buk2) was used for validation. The correlation coefficient between the observed and estimated soil moisture at the validation site is about 0.737. The results show that, despite the lack of reliable soil moisture observations for various land uses, soil types, and topographic conditions, soil moisture estimation using ancillary data and the CART algorithm is a reasonable approach, since the algorithm provided a fairly good estimate of the soil moisture distribution for the study area.
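
The core step of a CART regression tree is choosing the split that most reduces the squared error of the target within the resulting groups. The sketch below shows that step for a single predictor. It is a generic illustration rather than the authors' model: the variable names and data are hypothetical, and a full tree would repeat this search over all predictors and recurse on each side.

```python
def best_split(xs, ys):
    """Find the threshold on one predictor that minimizes the total
    sum of squared errors (SSE) of the target on the two sides.

    Returns (threshold, sse); threshold is None if no split improves
    on leaving the data unsplit.
    """
    def sse(values):
        if not values:
            return 0.0
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    pairs = sorted(zip(xs, ys))
    best = (None, sse(list(ys)))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical predictor values cannot be separated
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


# Hypothetical predictor (e.g. a rainfall index) and target values.
threshold, error = best_split([1, 2, 3, 4], [10.0, 10.0, 20.0, 20.0])
```

Each leaf of the finished tree predicts the mean target value of the training samples that fall into it, which is why the split criterion is the within-group squared error.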

Exploring the Management Component of Rural Small Business in the 6th Industry at Each Stage of Growth (6차산업 경영체 성장단계별 핵심경영요소 탐색)

  • Kim, Jung-Tae
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship / v.12 no.6 / pp.123-138 / 2017
  • This study aims to identify the characteristic variables of businesses that affect the choice of their type in the 6th industry and to analyze how they work. To this end, this study analyzed data on 752 businesses certified as belonging to the 6th industry in 2015 using the classification and regression tree (CART) algorithm for decision tree analysis. The analysis showed that the type of agricultural product processing shaped the type of 6th-industry business at the early stage of growth; the type of agricultural product processing, the type of service, region, and sales volume did so at the growth stage; and service strategy and the type of agricultural product processing did so at the maturity stage. These findings empirically identify key business factors that could support businesses in the 6th industry at each stage of growth and suggest a direction for support of the 6th industry.

Determination of Optimum Investment level for Safety Management by Process Risk Assessment at Gas Governor Station (가스공급기지에서 공정 위험성 평가에 의한 최적 안전관리 투자수준 결정)

  • Kim Tae-Ok;Jang Seo-Il
    • Journal of the Korean Institute of Gas / v.7 no.3 s.20 / pp.1-6 / 2003
  • This study suggests a decision method that determines the optimum investment level for safety management through process risk assessment at a gas governor station. A hazard and operability study (HAZOP), fault tree analysis (FTA), and consequence analysis (CA) were carried out, and the potential accident cost and the benefit of safety management were estimated. As a result, the trend of safety cost and benefit was identified by a nonlinear regression method, and the optimum investment level for safety management was determined from an analysis of safety management cost and potential accident cost.

A Machine Learning-based Customer Classification Model for Effective Online Free Sample Promotions (온라인 무료 샘플 판촉의 효과적 활용을 위한 기계학습 기반 고객분류예측 모형)

  • Won, Ha-Ram;Kim, Moo-Jeon;Ahn, Hyunchul
    • The Journal of Information Systems / v.27 no.3 / pp.63-80 / 2018
  • Purpose: The purpose of this study is to build a machine learning-based customer classification model that enhances the customer expansion effect of free sample promotions. Specifically, the proposed model identifies potential target customers who are expected to purchase the products included in a free sample promotion after receiving the free samples. Design/methodology/approach: This study proposes a customer classification model for determining which customers are suitable recipients of free samples, using various machine learning techniques such as logistic regression, multiple discriminant analysis, case-based reasoning, decision tree, artificial neural network, and support vector machine. To validate the usefulness of the proposed model, we apply it to a real-world free sample-based target marketing case of a major Korean cosmetics retail company. Findings: Experimental results show that the machine learning-based customer classification model achieves satisfactory accuracy ranging from 70% to 75%. In particular, the support vector machine is found to be the most effective machine learning technique for the free sample-based target marketing model. Our study sheds light on customer relationship management strategies using free sample promotions.