• Title/Summary/Keyword: decision tree regression

Determinants of employee's wage using hierarchical linear model (위계적 선형모형을 이용한 대졸 신규취업자 임금 결정요인 분석)

  • Park, Sungik;Cho, Jangsik
    • Journal of the Korean Data and Information Science Society / v.26 no.1 / pp.65-75 / 2015
  • This paper analyzes the determinants of wages for college and university graduates using both individual-level and industry-level variables. Wage determination has a multi-level structure in the sense that an individual's wage is influenced by individual-level (level-1) variables and industry-level (level-2) variables. Consequently, the classical regression assumption that individual wages are independent is violated, and this paper therefore uses the hierarchical linear model (HLM). The major results are as follows. First, a multiple correspondence analysis including level-1 and level-2 variables reveals that both kinds of variables affect individual wages, judging from the fact that their values differ across individual wage groups. Second, a decision tree analysis including level-1 and level-2 variables shows that the most influential variable in wage determination is the industry-level wage, followed by industry-level working hours, age, and sex, in declining order of influence. This suggests that using the HLM is appropriate, since industry characteristics are important in determining individual wages. Third, the HLM is shown to be the best model compared with other models that do not take level-1 and level-2 variables into account simultaneously.

Designing of the Statistical Models for Imprinting Patterns of Quantitative Traits Loci (QTL) in Swine (돼지에 있어서 양적 형질 유전자좌(QTL) 발현 특성 분석을 위한 통계적 검정 모형 설정)

  • Yoon D. H.;Kong H. S.;Cho Y. M.;Lee J. W.;Choi I. S.;Lee H. K.;Jeon G. J.;Oh S. J.;Cheong I. C.
    • Journal of Embryo Transfer / v.19 no.3 / pp.291-299 / 2004
  • Quantitative trait loci (QTL) were characterized in an experimental cross population between the Berkshire and Yorkshire breeds. A total of 512 F$_2$ offspring from 65 matings of F$_1$ parents were phenotyped for carcass traits including average daily gain (ADG), average backfat thickness (ABF), tenth rib backfat thickness (TRF), loin eye area (LEA), and last rib backfat thickness (LRF). All animals were genotyped for 125 markers across the genome. Marker linkage maps were derived and used in QTL analysis based on line cross least squares regression interval mapping. A decision tree to identify QTL with imprinting effects was developed based on tests against the Mendelian mode of QTL expression. To establish evidence of QTL presence, empirical significance thresholds were derived at the chromosome-wise and genome-wise levels using specialized permutation strategies. Significance thresholds derived by the permutation test were validated by simulating a pedigree and data structure similar to the Berkshire-Yorkshire population. The genome scan revealed significant evidence for 13 imprinted QTLs affecting growth and body composition, of which nine were identified as QTL with a paternally expressed mode of inheritance. Four maternally expressed QTLs affecting loin eye area (LEA) and tenth rib backfat thickness (TRF) were found on chromosomes 10 and 12. These results support the usefulness of the statistical models for analysing imprinting of QTLs related to carcass traits.

Key Food Selection for Assessement of Oral Health Related Quality of Life among Some Korean Elderly (일부 한국 노인 구강건강 관련 삶의 질 평가를 위한 핵심 음식 선택)

  • Hwang, Soo-Jeong
    • Journal of dental hygiene science / v.16 no.5 / pp.361-369 / 2016
  • Oral health can influence the intake of diverse foods, and food intake affects oral health related quality of life. The aim of this study was to select key foods that can represent oral health related quality of life in Korea. We used data from 503 older Koreans who participated in an oral health promotion programme in 2009. First, foods with low consumption or low intake according to the criteria of the 2012 National Nutrition Statistics were eliminated from the 30 foods of the food intake ability (FIA) measure. A decision tree model, correlation analysis, factor analysis, and an internal reliability test were then used to select key foods for oral health related quality of life (OHRQoL). We selected 13 foods (hard persimmon, dried peanut, pickled radish, caramel, rib of pork, glutinous rice cake, cabbage kimchi, apple, yellow melon, boiled chicken meat, boiled fish, mandarin, and noodles) as OHRQoL Key Foods 13. The 30 foods of FIA and OHRQoL Key Foods 13 displayed the same pattern of variation among sociodemographic groups. In a regression model, both the 30 foods of FIA and OHRQoL Key Foods 13 influenced the Oral Health Impact Profile-14. The findings suggest that OHRQoL Key Foods 13 has good reliability and validity and can be used in oral health surveys.

A Study on Self-sufficiency for Hospital Injury Inpatients in Korea (우리나라 의료기관 입원손상환자의 자체충족도에 관한 연구)

  • Lee, Hee-Won;Park, Jong-Ho;Kang, Sung-Hong;Kim, Won-Joong
    • Journal of the Korea Academia-Industrial cooperation Society / v.12 no.12 / pp.5779-5788 / 2011
  • This study was conducted to examine the current status of regional self-sufficiency for hospital injury inpatients and, on that basis, to prepare measures for improving it. For this purpose, we obtained the 2005 and 2008 Patient Survey data, regional medical utilization data from the National Health Insurance Corporation, the yearbook of the Central Emergency Medical Center, and evaluation results for emergency medical institutions. Frequency analysis, cross-tabulation, decision tree, and logistic regression techniques were used to analyze the data. At the 'metropolitan city/Do' level, self-sufficiency was lowest in Chungcheongnam-do in both 2005 and 2008, followed by Gyeongsangbuk-do, Gyeonggi-do, and Jeollanam-do. At the 'Si/Gun/Gu' level, with regard to local medical supply, self-sufficiency in both 2005 and 2008 was higher when a general hospital, district emergency medical center, regional emergency medical center, and regional emergency medical institution existed in the residential area. It was also found that the higher the quality level of the local emergency medical institution, the higher the self-sufficiency. The results confirm that national policy for injury patients should give priority to 'Do' areas where the level of emergency medical supply is low, and that enhancing the quality of emergency medical institutions helps improve self-sufficiency.

Group Classification on Management Behavior of Diabetic Mellitus (당뇨 환자의 관리행태에 대한 군집 분류)

  • Kang, Sung-Hong;Choi, Soon-Ho
    • Journal of the Korea Academia-Industrial cooperation Society / v.12 no.2 / pp.765-774 / 2011
  • The purpose of this study is to provide informative statistics that can be used for effective diabetes management programs. We collected and analyzed data on 666 people with diabetes who had participated in the Korean National Health and Nutrition Examination Survey in 2007 and 2008. Group classification of diabetes mellitus management behavior was based on the K-means clustering method. The decision tree method and multiple regression analysis were used to study the factors behind management behavior. People with diabetes were broadly classified into three categories: the Health Behavior Program Group, the Focused Management Program Group, and the Complication Test Program Group. First, the Health Behavior Program Group consists of people who follow drug therapy and complication testing well but still need to improve their health behavior, such as exercising regularly and avoiding drinking and smoking. Second, the Focused Management Program Group consists of people who show an uncooperative attitude toward treatment and complication testing and are also passive about improving their health behavior. Third, the Complication Test Program Group consists of people who take a positive attitude toward treatment and improving their health behavior but pay no attention to the complication tests that detect acute and chronic disease early. The main factor for group classification was whether the person had hyperlipidemia. This varied widely with an individual's gender, income, age, occupation, and self-rated health. To improve the rate of diabetes management, specialized management programs should be applied according to each group's characteristics.

Exploring Feature Selection Methods for Effective Emotion Mining (효과적 이모션마이닝을 위한 속성선택 방법에 관한 연구)

  • Eo, Kyun Sun;Lee, Kun Chang
    • Journal of Digital Convergence / v.17 no.3 / pp.107-117 / 2019
  • In the era of SNS, many people rely on it to express their emotions about various kinds of products and services. Therefore, for companies eager to investigate how their products and services are perceived in the market, emotion mining tasks using datasets from SNSs have become more important than ever. Emotion mining is a branch of sentiment analysis based on BOW (bag-of-words) and TF-IDF. However, few emotion mining studies adopt feature selection (FS) methods to look for an optimal set of features that ensures better results. This study therefore proposes FS methods for conducting emotion mining tasks more effectively. It uses Twitter and SemEval2007 datasets for the emotion mining experiments. We applied three FS methods: CFS (Correlation-based FS), IG (Information Gain), and ReliefF. Emotion mining results were obtained by applying the selected features to nine classifiers. When DT (decision tree) is applied to the Twitter dataset, accuracy increases with the CFS, IG, and ReliefF methods. When LR (logistic regression) is applied to the SemEval2007 dataset, accuracy increases with the ReliefF method.
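As a rough illustration of the IG (Information Gain) criterion named in the abstract above, the following is a minimal sketch, not the authors' implementation. It assumes discrete feature values (for example, whether a word is present in a text) and uses the standard entropy-based definition; the function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy data: one emotion label per text, one binary word feature each.
emotions = ["joy", "joy", "anger", "anger"]
has_word_a = [1, 1, 0, 0]  # separates the two emotions perfectly
has_word_b = [1, 0, 1, 0]  # carries no information about the emotion

print(information_gain(has_word_a, emotions))
print(information_gain(has_word_b, emotions))
```

A feature selection step based on this score would rank all features by information gain and keep the highest-scoring ones before training a classifier.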

Development of prediction model identifying high-risk older persons in need of long-term care (장기요양 필요 발생의 고위험 대상자 발굴을 위한 예측모형 개발)

  • Song, Mi Kyung;Park, Yeongwoo;Han, Eun-Jeong
    • The Korean Journal of Applied Statistics / v.35 no.4 / pp.457-468 / 2022
  • In an aged society, it is important to prevent older people from developing disabilities that require long-term care. The purpose of this study is to develop a prediction model to discover high-risk groups who are likely to become beneficiaries of Long-Term Care Insurance. This is a retrospective study using previously collected data on the study subjects from the National Health Insurance Service (NHIS) database. The study subjects are the 7,724,101 people over 65 years of age registered for medical insurance. To develop the prediction model, we used logistic regression, decision tree, random forest, and multi-layer perceptron neural network. Random forest was finally selected as the prediction model based on model performance in internal and external validation. Random forest could predict about 90% of the older people in need of long-term care using the DB alone, without any information from the assessment of eligibility for long-term care. The findings might be useful in evidence-based health management for prevention services and can contribute to preemptively discovering older people who need preventive services.

Metabolic Diseases Classification Models according to Food Consumption using Machine Learning (머신러닝을 활용한 식품소비에 따른 대사성 질환 분류 모델)

  • Hong, Jun Ho;Lee, Kyung Hee;Lee, Hye Rim;Cheong, Hwan Suk;Cho, Wan-Sup
    • The Journal of the Korea Contents Association / v.22 no.3 / pp.354-360 / 2022
  • Metabolic disease has a prevalence of 26% among Koreans and is characterized by having three of the following five conditions at the same time: abdominal obesity, hypertension, impaired fasting glucose, high triglycerides, and low HDL cholesterol. This paper links the consumer panel data of the Rural Development Agency (RDA) with the medical care data of the National Health Insurance Service (NHIS) to build a classification model that separates a metabolic disease group from a control group based on food consumption characteristics, and compares the differences between them. Many existing domestic and foreign studies of metabolic diseases and food consumption characteristics examine the disease correlation of specific food groups or specific ingredients, whereas this paper considers all food groups included in the general diet. We created a classification model using logistic regression, a decision tree-based classification model, and a classification model using XGBoost. Of the three models, the XGBoost classification model was the most accurate, but its accuracy was still not high, at less than 0.7. Future work should extend the observation period for food consumption in the patient group to more than 5 years and study the metabolic disease classification model after converting the food consumed into nutritional characteristics.

Prediction Model for unfavorable Outcome in Spontaneous Intracerebral Hemorrhage Based on Machine Learning

  • Shengli Li;Jianan Zhang;Xiaoqun Hou;Yongyi Wang;Tong Li;Zhiming Xu;Feng Chen;Yong Zhou;Weimin Wang;Mingxing Liu
    • Journal of Korean Neurosurgical Society / v.67 no.1 / pp.94-102 / 2024
  • Objective: Spontaneous intracerebral hemorrhage (ICH) remains a significant cause of mortality and morbidity throughout the world. The purpose of this retrospective study is to develop multiple models for predicting ICH outcomes using machine learning (ML). Methods: Between January 2014 and October 2021, we included ICH patients identified by computed tomography or magnetic resonance imaging and treated with surgery. At the 6-month check-up, outcomes were assessed using the modified Rankin Scale. Four ML models (Support Vector Machine (SVM), Decision Tree C5.0, Artificial Neural Network, and Logistic Regression) were used to build ICH prediction models. To evaluate the reliability of the ML models, we calculated the area under the receiver operating characteristic curve (AUC), specificity, sensitivity, accuracy, positive likelihood ratio (PLR), negative likelihood ratio (NLR), and diagnostic odds ratio (DOR). Results: We identified 71 patients who had favorable outcomes and 156 who had unfavorable outcomes. The SVM model achieved the best overall prediction performance. For the SVM model, the AUC, accuracy, specificity, sensitivity, PLR, NLR, and DOR were 0.91, 0.92, 0.92, 0.93, 11.63, 0.076, and 153.03, respectively. For the SVM model, the importance value of time to operating room (TOR) was significantly higher than that of the other variables. Conclusion: The analysis of clinical reliability showed that the SVM model achieved the best overall prediction performance and that the importance value of TOR was significantly higher than that of the other variables.
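The likelihood ratios and diagnostic odds ratio listed in the abstract above follow from sensitivity and specificity by their standard definitions: PLR = sensitivity / (1 - specificity), NLR = (1 - sensitivity) / specificity, and DOR = PLR / NLR. The sketch below applies these definitions to the rounded SVM values reported in the abstract (0.93 and 0.92); the function name is hypothetical and this is not the authors' code. Because the inputs are rounded, the computed DOR comes out near 152.8 rather than exactly the reported 153.03, which was presumably derived from unrounded values.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (PLR, NLR, DOR) from sensitivity and specificity."""
    plr = sensitivity / (1.0 - specificity)   # positive likelihood ratio
    nlr = (1.0 - sensitivity) / specificity   # negative likelihood ratio
    dor = plr / nlr                           # diagnostic odds ratio
    return plr, nlr, dor


# Rounded SVM values taken from the abstract above.
plr, nlr, dor = likelihood_ratios(sensitivity=0.93, specificity=0.92)
print(f"PLR={plr:.3f} NLR={nlr:.4f} DOR={dor:.1f}")
```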

VKOSPI Forecasting and Option Trading Application Using SVM (SVM을 이용한 VKOSPI 일 중 변화 예측과 실제 옵션 매매에의 적용)

  • Ra, Yun Seon;Choi, Heung Sik;Kim, Sun Woong
    • Journal of Intelligence and Information Systems / v.22 no.4 / pp.177-192 / 2016
  • Machine learning is a field of artificial intelligence. It refers to an area of computer science concerned with giving machines the ability to perform their own data analysis, decision making, and forecasting. One representative machine learning model is the artificial neural network, a statistical learning algorithm inspired by the structure of biological neural networks. Other machine learning models include the decision tree model, the naive Bayes model, and the SVM (support vector machine) model. Among these, we use the SVM model in this study because it is mainly used for classification and regression analysis, which fits our study well. The core principle of SVM is to find a reasonable hyperplane that distinguishes different groups in the data space. Given information about the data in any two groups, the SVM model judges which group new data belongs to based on the hyperplane obtained from the given data set. Thus, the larger the amount of meaningful data, the better the learning ability. In recent years, many financial experts have focused on machine learning, seeing the possibility of combining machine learning with the financial field, where vast amounts of financial data exist. Machine learning techniques have proved to be powerful in describing non-stationary and chaotic stock price dynamics. Many studies have successfully forecast stock prices using machine learning algorithms. Recently, financial companies have begun to provide Robo-Advisor services (a compound of Robot and Advisor), which can perform various financial tasks through advanced algorithms using huge amounts of rapidly changing data. A Robo-Advisor's main task is to advise investors according to their personal investment propensity and to provide a service that manages the portfolio automatically.
In this study, we propose a method of forecasting the Korean volatility index, VKOSPI, using the SVM model, one of the machine learning methods, and applying it to real option trading to increase trading performance. VKOSPI is a measure of the future volatility of the KOSPI 200 index based on KOSPI 200 index option prices. It is similar to the VIX index, which is based on S&P 500 option prices in the United States. The Korea Exchange (KRX) calculates and announces the VKOSPI index in real time. VKOSPI behaves like ordinary volatility and affects option prices. The direction of VKOSPI and option prices are positively related regardless of the option type (call and put options with various strike prices). If volatility increases, the premiums of all call and put options increase because the probability of the options being exercised increases. The investor can follow in real time how much the option price rises with respect to a rise in volatility through Vega, a Black-Scholes measure of an option's sensitivity to changes in volatility. Accurate forecasting of VKOSPI movements is therefore one of the important factors that can generate profit in option trading. In this study, we verified with real option data that an accurate forecast of VKOSPI can produce a large profit in real option trading. To the best of our knowledge, there have been no studies that predict the direction of VKOSPI with machine learning and apply the prediction to actual option trading. We predicted daily VKOSPI changes with the SVM model and then took an intraday option strangle position, which profits as option prices fall, only when VKOSPI was expected to decline during the daytime. We analyzed the results and tested whether the approach is applicable to real option trading based on the SVM's prediction.
The results showed that the prediction accuracy for VKOSPI was 57.83% on average and that the number of position entries was 43.2, which is less than half of the benchmark (100). A small number of trades is an indicator of trading efficiency. In addition, the experiment showed that the trading performance was significantly higher than the benchmark.