• Title/Summary/Keyword: Decision Tree Regression

A Study on Strategy for success of tourism e-marketplace

  • Hong, Ji-Whan;Kim, Keun-Hyung
    • Proceedings of the Korea Contents Association Conference / 2006.11a / pp.333-336 / 2006
  • E-marketplace is a kind of B2B e-Business system that supports business transactions among companies. If the e-marketplace is revitalized, we expect not only the development of related industries but also a decrease in transaction costs among companies. From this point of view, it is necessary to introduce and revitalize e-marketplaces in the tourism industry. Participants in a tour e-marketplace are tour-related companies (travel agencies, lodging enterprises, shipping enterprises, etc.). Tourists also want to search for a variety of tour products and contents, so a tour e-marketplace has the characteristics of both B2B and B2C e-Business systems at once. The purpose of this study is to classify the success factors that determine the characteristics of a tour e-marketplace through a statistical survey of e-marketplace factors on tourism-related websites. First, we analyze the success factors of B2B and B2C e-marketplaces. Then we set up the influence factors of a tour e-marketplace and conduct a survey about its success factors. Through logistic regression and decision tree analysis of the source data, we expect to identify the attributes that contribute to tour e-marketplace success.
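
Decision tree analysis of the kind mentioned in this abstract chooses splits that make the resulting groups as pure as possible. The minimal sketch below is not the authors' procedure: it finds the best threshold on a single numeric survey variable using Gini impurity, and the data and variable meaning are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 = perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(xs, ys):
    """Return (threshold, weighted_gini) of the best binary split x <= t.

    The threshold is None if no split improves on the unsplit impurity.
    """
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best = (None, gini(ys))  # baseline: do not split at all
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot place a threshold between equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical survey scores (e.g. a site-usability rating) and outcomes (1 = successful).
scores = [2, 3, 4, 6, 7, 8]
success = [0, 0, 0, 1, 1, 1]
print(best_split(scores, success))  # → (5.0, 0.0)
```

A full tree applies this search recursively to each resulting group and across all candidate variables; the regression-tree variant replaces Gini impurity with within-group variance.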

Forecasting of Customer's Purchasing Intention Using Support Vector Machine

  • Kim, Jin-Hwa;Nam, Ki-Chan;Lee, Sang-Jong
    • Information Systems Review / v.10 no.2 / pp.137-158 / 2008
  • The rapid development of various information technologies creates new opportunities in online and offline markets. In this changing market environment, customers have various demands for new products and services, so their power and influence on the markets grow stronger each year. Companies have paid great attention to customer relationship management (CRM). In particular, personalized product recommendation systems, which recommend products and services based on customers' private information or purchasing behavior in stores, are an important asset to most companies. CRM is one of the important business processes in which reliable information is mined from customer databases. Data mining techniques, including artificial intelligence methods, are popular tools for extracting useful information and knowledge from these customer databases. In this research, we propose a recommendation system that predicts customers' purchase intention: a customer's purchasing intention for a specific product is predicted by applying data mining techniques to a receipt data set. The performance of the proposed method is compared with that of other data mining techniques.

Comparison of Hospital Standardized Mortality Ratio Using National Hospital Discharge Injury Data

  • Park, Jong-Ho;Kim, Yoo-Mi;Kim, Sung-Soo;Kim, Won-Joong;Kang, Sung-Hong
    • Journal of the Korea Academia-Industrial cooperation Society / v.13 no.4 / pp.1739-1750 / 2012
  • This study aimed to develop an assessment of medical service outcomes using administrative data by comparing hospital standardized mortality ratios (HSMR) across hospitals. We analyzed 63,664 cases from the 2007 and 2008 Hospital Discharge Injury Data provided by the Korea Centers for Disease Control and Prevention. Using data mining techniques, we compared decision tree and logistic regression models for developing a risk-adjustment model of in-hospital mortality. Our analysis shows that gender, length of stay, Elixhauser comorbidity index, hospitalization path, and primary diagnosis are the main variables influencing the mortality ratio. Comparing HSMRs computed with the standardized variables, we found concrete differences in HSMR among hospitals, ranging from 55.6 to 201.6. This proves that there are gaps in the quality of medical service among hospitals. These results should be used more widely to improve the quality of medical service.
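
An HSMR is conventionally the ratio of observed to expected in-hospital deaths, scaled to 100, where the expected count is the sum of per-patient death probabilities from the risk-adjustment model. The minimal sketch below assumes those probabilities have already been produced by a fitted model; the numbers are hypothetical and not from the study.

```python
from typing import Sequence


def hsmr(observed_deaths: int, predicted_probs: Sequence[float]) -> float:
    """Hospital standardized mortality ratio = 100 * observed / expected deaths.

    predicted_probs holds each patient's predicted probability of in-hospital
    death from a risk-adjustment model (e.g. logistic regression); their sum
    is the hospital's expected number of deaths.
    """
    expected = sum(predicted_probs)
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return 100.0 * observed_deaths / expected


# Hypothetical hospital: 4 patients, each with a 0.5 predicted risk, 3 observed deaths.
print(hsmr(3, [0.5, 0.5, 0.5, 0.5]))  # → 150.0
```

An HSMR above 100 means more deaths than the hospital's case mix predicts; below 100 means fewer.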

A Hybrid Under-sampling Approach for Better Bankruptcy Prediction

  • Kim, Taehoon;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.21 no.2 / pp.173-190 / 2015
  • The purpose of this study is to improve bankruptcy prediction models by using a novel hybrid under-sampling approach. Most prior studies have tried to enhance the accuracy of bankruptcy prediction models by improving the classification methods involved. In contrast, we focus on appropriate data preprocessing as a means of enhancing accuracy. In particular, we aim to develop an effective sampling approach for bankruptcy prediction, since most prediction models suffer from class imbalance problems. The approach proposed in this study is a hybrid under-sampling method that combines the k-Reverse Nearest Neighbor (k-RNN) and one-class support vector machine (OCSVM) approaches. k-RNN can effectively eliminate outliers, while OCSVM contributes to the selection of informative training samples from the majority class data. To validate our proposed approach, we applied it to data from H Bank's non-external auditing companies in Korea and compared the performance of classifiers trained on the proposed under-sampled data with that of classifiers trained on randomly sampled data. The empirical results show that the proposed under-sampling approach generally improves the accuracy of classifiers such as logistic regression, discriminant analysis, decision trees, and support vector machines. They also show that the proposed approach reduces the risk of false negative errors, which incur higher misclassification costs.
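
The k-RNN step can be illustrated by counting, for each point, how many other points include it among their k nearest neighbours; a point that no one counts as a neighbour is a natural outlier candidate. This is a minimal sketch under that assumption, not the authors' implementation: the `min_count` cutoff and the toy data are hypothetical, and the OCSVM selection step is not shown.

```python
import math


def k_rnn_counts(points, k):
    """For each point, count how many other points list it among their k nearest neighbours."""
    n = len(points)
    counts = [0] * n
    for i in range(n):
        # k nearest neighbours of point i, excluding i itself (ties keep index order)
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            counts[j] += 1
    return counts


def remove_outliers(points, k, min_count=1):
    """Keep only points with at least `min_count` reverse nearest neighbours."""
    counts = k_rnn_counts(points, k)
    return [p for p, c in zip(points, counts) if c >= min_count]


# Hypothetical majority-class samples: a tight cluster plus one distant point.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(k_rnn_counts(pts, 2))     # → [2, 3, 2, 3, 0]
print(remove_outliers(pts, 2))  # → [(0, 0), (0, 1), (1, 0), (1, 1)]
```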

Determinants of employee's wage using hierarchical linear model

  • Park, Sungik;Cho, Jangsik
    • Journal of the Korean Data and Information Science Society / v.26 no.1 / pp.65-75 / 2015
  • This paper analyzes the determinants of wages for college and university graduates, using both individual-level and industry-level variables. Wage determination has a multi-level structure: an individual's wage is influenced by individual-level (level-1) and industry-level (level-2) variables. This violates the assumption of classical regression that individual wages are independent, so this paper uses the hierarchical linear model (HLM). The major results are as follows. First, multiple correspondence analysis including level-1 and level-2 variables reveals that both levels affect individual wages, since the values of level-1 and level-2 variables differ across individual wage groups. Second, decision tree analysis including level-1 and level-2 variables shows that the most influential variable in wage determination is the industry-level wage, followed in decreasing order by industry-level working hours, age, and sex. This suggests that the HLM is appropriate, since industry characteristics are important in determining individual wages. Third, the HLM performs best compared with other models that do not take level-1 and level-2 variables into account simultaneously.

Designing of the Statistical Models for Imprinting Patterns of Quantitative Traits Loci (QTL) in Swine

  • Yoon D. H.;Kong H. S.;Cho Y. M.;Lee J. W.;Choi I. S.;Lee H. K.;Jeon G. J.;Oh S. J.;Cheong I. C.
    • Journal of Embryo Transfer / v.19 no.3 / pp.291-299 / 2004
  • Quantitative trait loci (QTL) were characterized in an experimental cross population between the Berkshire and Yorkshire breeds. A total of 512 F2 offspring from 65 matings of F1 parents were phenotyped for carcass traits including average daily gain (ADG), average backfat thickness (ABF), tenth rib backfat thickness (TRF), loin eye area (LEA), and last rib backfat thickness (LRF). All animals were genotyped for 125 markers across the genome. Marker linkage maps were derived and used in QTL analysis based on line-cross least squares regression interval mapping. A decision tree to identify QTL with imprinting effects was developed based on tests against the Mendelian mode of QTL expression. To establish evidence of QTL presence, empirical significance thresholds were derived at chromosome-wise and genome-wise levels using specialized permutation strategies. The permutation-derived thresholds were validated on data simulated with a pedigree and data structure similar to the Berkshire-Yorkshire population. The genome scan revealed significant evidence for 13 imprinted QTLs affecting growth and body composition, nine of which were identified as QTL with a paternally expressed mode of inheritance. Four QTLs for loin eye area (LEA) and tenth rib backfat thickness (TRF) were maternally expressed and were found on chromosomes 10 and 12. These results support the usefulness of the statistical models for analyzing imprinting of QTLs related to carcass traits.
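
Permutation thresholds of the kind described here are typically obtained by shuffling phenotypes across animals (breaking any marker-trait link), recording the maximum test statistic over the scanned markers for each shuffle, and taking the (1 - alpha) quantile of those maxima. The minimal sketch below assumes a squared-correlation single-marker statistic in place of the study's interval-mapping F statistic, and the genotype codes and phenotypes are hypothetical.

```python
import random
from typing import List, Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation: a stand-in single-marker test statistic."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def permutation_threshold(markers: List[Sequence[float]], phenotype: Sequence[float],
                          n_perm: int = 1000, alpha: float = 0.05, seed: int = 0) -> float:
    """Empirical (1 - alpha) threshold for the maximum statistic over `markers`.

    Pass every marker for a genome-wise threshold, or only one chromosome's
    markers for a chromosome-wise threshold.
    """
    rng = random.Random(seed)
    y = list(phenotype)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(y)  # shuffled phenotypes carry no true marker association
        maxima.append(max(r_squared(m, y) for m in markers))
    maxima.sort()
    return maxima[min(round((1 - alpha) * n_perm), n_perm - 1)]


# Hypothetical genotypes coded 0/1/2 at two markers, and a trait for 10 animals.
markers = [[0, 1, 2, 0, 1, 2, 0, 1, 2, 0], [0, 0, 1, 1, 2, 2, 0, 0, 1, 1]]
trait = [1.0, 2.1, 2.9, 1.2, 2.0, 3.1, 0.9, 1.8, 3.0, 1.1]
threshold = permutation_threshold(markers, trait, n_perm=200)
observed = max(r_squared(m, trait) for m in markers)
print(observed > threshold)  # True means the best marker exceeds the empirical threshold
```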

Key Food Selection for Assessment of Oral Health Related Quality of Life among Some Korean Elderly

  • Hwang, Soo-Jeong
    • Journal of dental hygiene science / v.16 no.5 / pp.361-369 / 2016
  • Oral health can influence the diversity of food intake, and food intake affects oral health-related quality of life. The aim of this study was to select key foods that can represent oral health-related quality of life in Korea. We used data from 503 older Koreans who participated in an oral health promotion programme in 2009. First, among the 30 food intake ability (FIA) foods, those with low consumption or low intake according to the criteria of the 2012 National Nutrition Statistics were eliminated. Decision tree modeling, correlation analysis, factor analysis, and internal reliability testing were used to select the oral health-related quality of life (OHRQoL) key foods. We selected 13 foods (hard persimmon, dried peanut, pickled radish, caramel, rib of pork, glutinous rice cake, cabbage kimchi, apple, yellow melon, boiled chicken meat, boiled fish, mandarin, and noodles) as the OHRQoL Key Foods 13. The 30 FIA foods and the OHRQoL Key Foods 13 displayed the same pattern of variation among sociodemographic groups. In a regression model, both the 30 FIA foods and the OHRQoL Key Foods 13 influenced the Oral Health Impact Profile-14 (OHIP-14). The findings suggest that the OHRQoL Key Foods 13 have good reliability and validity and can be used in oral health surveys.
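
The abstract mentions an internal reliability test without naming the statistic. Cronbach's alpha is a common choice and is assumed here: alpha = k/(k - 1) * (1 - sum of item variances / variance of total scores), for k items. A minimal sketch with hypothetical item scores:

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two items")
    items = list(zip(*responses))             # one tuple of scores per item
    totals = [sum(row) for row in responses]  # each respondent's total score
    item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


# Hypothetical ratings of two foods by three respondents.
print(cronbach_alpha([[1, 2], [2, 1], [3, 3]]))  # about 0.667
```

Sample variances are used for both the items and the totals; any consistent choice gives the same ratio.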

A Study on Self-sufficiency for Hospital Injury Inpatients in Korea

  • Lee, Hee-Won;Park, Jong-Ho;Kang, Sung-Hong;Kim, Won-Joong
    • Journal of the Korea Academia-Industrial cooperation Society / v.12 no.12 / pp.5779-5788 / 2011
  • This study was conducted to assess the current status of regional self-sufficiency for hospital injury inpatients and, based on this, to propose measures for improving self-sufficiency. For this purpose, the 2005 and 2008 Patient Survey data, regional medical utilization data of the National Health Insurance Corporation, the yearbook of the Central Emergency Medical Center, and evaluation results of emergency medical institutions were obtained. Frequency analysis, cross-tabulation, decision tree, and logistic regression techniques were used to analyze the data. At the 'metropolitan city/Do' level, self-sufficiency was lowest in Chungcheongnam-do in both 2005 and 2008, followed by Gyeongsangbuk-do, Gyeonggi-do, and Jeollanam-do. At the 'Si/Gun/Gu' level, with regard to local medical supply, self-sufficiency in both 2005 and 2008 was higher when general hospitals, district emergency medical centers, regional emergency medical centers, and regional emergency medical institutions existed in the residential area. It was also found that the higher the quality level of local emergency medical institutions, the higher the self-sufficiency. These results confirm that, when promoting national policy for injury patients, priority should be placed on 'Do' areas where the level of emergency medical supply is low, and that enhancing the quality of emergency medical institutions helps improve self-sufficiency.
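
A self-sufficiency rate is commonly defined as the share of a region's resident inpatients who are treated at medical institutions located in that same region. The abstract does not spell out the formula, so this definition is an assumption, and the region labels below are hypothetical placeholders.

```python
from collections import Counter


def self_sufficiency(admissions):
    """Percentage of each region's resident inpatients treated in their own region.

    admissions: iterable of (residence_region, hospital_region) pairs, one per inpatient.
    """
    residents = Counter()
    treated_locally = Counter()
    for home, hospital in admissions:
        residents[home] += 1
        if home == hospital:
            treated_locally[home] += 1
    return {region: 100.0 * treated_locally[region] / n for region, n in residents.items()}


# Hypothetical admissions for two placeholder regions "A" and "B".
records = [("A", "A"), ("A", "B"), ("A", "A"), ("A", "A"), ("B", "B"), ("B", "A")]
print(self_sufficiency(records))  # → {'A': 75.0, 'B': 50.0}
```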

Group Classification on Management Behavior of Diabetic Mellitus

  • Kang, Sung-Hong;Choi, Soon-Ho
    • Journal of the Korea Academia-Industrial cooperation Society / v.12 no.2 / pp.765-774 / 2011
  • The purpose of this study is to provide informative statistics that can be used for effective diabetes management programs. We collected and analyzed the data of 666 diabetic people who had participated in the Korean National Health and Nutrition Examination Survey in 2007 and 2008. Groups were classified by diabetes mellitus management behavior using the K-means clustering method. The decision tree method and multiple regression analysis were used to study the factors in diabetes mellitus management behavior. Diabetic people were largely classified into three categories: the Health Behavior Program Group, the Focused Management Program Group, and the Complication Test Program Group. First, in the Health Behavior Program Group, drug therapy and complication tests are well performed, but people still need to improve their health behavior, for example by exercising regularly and avoiding drinking and smoking. Second, the Focused Management Program Group shows an uncooperative attitude toward treatment and complication tests and is also passive about improving its health behavior. Third, the Complication Test Program Group takes a positive attitude toward treatment and improving health behavior but pays no attention to the complication tests that detect acute and chronic disease early. The main factor for group classification was whether or not a person had hyperlipidemia. Group membership also varied widely with an individual's gender, income, age, occupation, and self-rated health. To improve the rate of diabetes management, specialized programs should be applied according to each group's characteristics.
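
K-means grouping of the kind described here alternates between assigning each person to the nearest cluster centre and recomputing each centre as the mean of its members (Lloyd's algorithm). The minimal sketch below assumes a hypothetical numeric encoding of management behavior and a naive initialisation (the first k rows as starting centres); neither is the study's setting.

```python
import math


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm; returns (labels, centroids)."""
    centroids = [list(p) for p in points[:k]]  # naive init: first k points (assumption)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical 2-D behavior scores for six people.
people = [(0, 0), (10, 10), (20, 0), (0, 1), (10, 11), (21, 0)]
labels, centres = kmeans(people, 3)
print(labels)   # → [0, 1, 2, 0, 1, 2]
print(centres)  # → [[0.0, 0.5], [10.0, 10.5], [20.5, 0.0]]
```

In practice, several random initialisations are usually tried, because Lloyd's algorithm only finds a local optimum.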

Exploring Feature Selection Methods for Effective Emotion Mining

  • Eo, Kyun Sun;Lee, Kun Chang
    • Journal of Digital Convergence / v.17 no.3 / pp.107-117 / 2019
  • In the era of SNS, many people rely on it to express their emotions about various kinds of products and services. Therefore, for companies eager to learn how their products and services are perceived in the market, emotion mining on SNS datasets has become more important than ever. Basically, emotion mining is a branch of sentiment analysis based on BOW (bag-of-words) and TF-IDF. However, few studies on emotion mining adopt feature selection (FS) methods to find an optimal set of features that ensures better results. This study therefore proposes FS methods for conducting emotion mining tasks more effectively and with better outcomes. We use the Twitter and SemEval2007 datasets for the emotion mining experiments and apply three FS methods: CFS (Correlation-based FS), IG (Information Gain), and ReliefF. Emotion mining results were obtained by applying the selected features to nine classifiers. When DT (decision tree) is applied to the Twitter dataset, accuracy increases with the CFS, IG, and ReliefF methods. When LR (logistic regression) is applied to the SemEval2007 dataset, accuracy increases with the ReliefF method.
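
Information Gain ranks a feature by how much knowing it reduces the entropy of the class label: IG = H(class) - sum over feature values v of p(v) * H(class | v). The minimal sketch below assumes binary bag-of-words features (word present or absent); the toy documents and emotion labels are hypothetical, and CFS and ReliefF are not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their expected entropy after splitting on the feature."""
    n = len(labels)
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_top_k(docs, labels, vocabulary, k):
    """Rank binary bag-of-words features (word present or not) by IG and keep the top k."""
    scores = {w: information_gain([w in d for d in docs], labels) for w in vocabulary}
    return sorted(vocabulary, key=lambda w: scores[w], reverse=True)[:k]


# Hypothetical tokenised posts (as sets of words) with emotion labels.
docs = [{"happy", "day"}, {"happy", "news"}, {"sad", "day"}, {"sad", "news"}]
emotions = ["joy", "joy", "sadness", "sadness"]
print(select_top_k(docs, emotions, ["day", "happy", "news", "sad"], 2))  # → ['happy', 'sad']
```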