• Title/Summary/Keyword: decision tree regression

Convergence-based analysis on geographical variations of the smoking rates (융복합 기반의 지역간 흡연율의 변이 분석)

  • Lim, Ji-Hye;Kang, Sung-Hong
    • Journal of Digital Convergence / v.13 no.8 / pp.375-385 / 2015
  • This study aims to identify geographical variations in smoking rates and the factors that affect them. The data were collected from the Community Health Survey conducted between 2009 and 2011 by the Korea Centers for Disease Control and Prevention and other government organizations. Correlation and multiple regression analyses were used to examine the factors influencing smoking rates, and a decision tree model was employed to investigate regional variations. The significant factors associated with geographical variations in smoking rates were the rate of hazardous drinking, the completion rate of hypertension education, the experience rate of anti-smoking campaigns, stress awareness rate, hypertension prevalence, health insurance cost, diabetes prevalence, obesity rate, and strength training rate. Convergence-based analysis of geographical variations in smoking rates is highly important when regionally customized healthcare programs are implemented. In the future, effective programs and customized approaches need to be developed for regions with high smoking rates. Our study is expected to provide meaningful data for the design and assessment of effective health care programs, leading to effective anti-smoking programs.

A Convergence Study in the Severity-adjusted Mortality Ratio on inpatients with multiple chronic conditions (복합만성질환 입원환자의 중증도 보정 사망비에 대한 융복합 연구)

  • Seo, Young-Suk;Kang, Sung-Hong
    • Journal of Digital Convergence / v.13 no.12 / pp.245-257 / 2015
  • This study aimed to develop a predictive model for severity-adjusted mortality of inpatients with multiple chronic conditions and to analyze the factors behind the variation in the hospital standardized mortality ratio (HSMR), in order to propose a plan to reduce that variation. We collected data from the "Korean National Hospital Discharge In-depth Injury Survey" for 2008 to 2010 and selected 110,700 final study subjects who had a chronic disease as the principal diagnosis and who were over the age of 30 with more than 2 chronic diseases, including the principal diagnosis. We designed severity-adjusted mortality predictive models using data-mining methods (logistic regression analysis, decision tree, and neural network). In this study, we used the decision tree model based on the Elixhauser comorbidity index to predict the severity-adjusted mortality ratio. The HSMR of inpatients with multiple chronic conditions differed statistically significantly by insurance type, number of hospital beds, and hospital location. Based on these results, methods should be found to manage the mortality ratio of inpatients with multiple chronic conditions efficiently at the national level. Efforts should also be made to improve the quality of medical treatment for inpatients with multiple chronic diseases and to reduce growing medical expenses.
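
As a hedged illustration of the HSMR mentioned in the abstract above (not the authors' implementation): the ratio is conventionally computed as observed deaths divided by expected deaths, multiplied by 100, where expected deaths are the sum of per-patient death probabilities from a severity-adjustment model. The function names and sample numbers below are hypothetical.

```python
from typing import Sequence


def expected_deaths(predicted_probabilities: Sequence[float]) -> float:
    """Sum per-patient predicted death probabilities (hypothetical model output)."""
    return float(sum(predicted_probabilities))


def hsmr(observed_deaths: int, expected: float) -> float:
    """Hospital standardized mortality ratio: 100 * observed / expected."""
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return 100.0 * observed_deaths / expected


# Invented example: 4 patients with model-predicted death risks, 1 observed death.
risks = [0.25, 0.25, 0.25, 0.25]
ratio = hsmr(observed_deaths=1, expected=expected_deaths(risks))
print(ratio)  # a value above 100 would mean more deaths than the model expects
```

A value of 100 means observed mortality matches the severity-adjusted expectation; comparing this ratio across hospital groups is what the abstract describes.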

A Prediction Model for the Development of Cataract Using Random Forests (Random Forests 기법을 이용한 백내장 예측모형 - 일개 대학병원 건강검진 수검자료에서 -)

  • Han, Eun-Jeong;Song, Ki-Jun;Kim, Dong-Geon
    • The Korean Journal of Applied Statistics / v.22 no.4 / pp.771-780 / 2009
  • Cataract is the main cause of blindness and visual impairment; in particular, age-related cataract accounts for about half of the 32 million cases of blindness worldwide. As life expectancy rises and the elderly population expands, the number of cataract cases increases as well, causing serious economic and social problems throughout the country. However, the incidence of cataract can be reduced dramatically through early diagnosis and prevention. In this study, we developed a prediction model of cataract for early diagnosis using hospital data of 3,237 subjects who first received a screening test and later visited the medical center for cataract check-ups between 1994 and 2005. To develop the prediction model, we used random forests and compared its predictive performance with that of other common discriminant models such as logistic regression, discriminant model, decision tree, and naive Bayes, and with two popular ensemble models, bagging and arcing. The accuracy of random forests was 67.16% and the sensitivity was 72.28%; the main factors included in this model were age, diabetes, WBC, platelet, triglyceride, BMI, and so on. The results showed that the model could predict about 70% of cataract cases from the screening test alone, without any information from a direct eye examination by an ophthalmologist. We expect that our model may contribute to diagnosing cataract and help prevent it in its early stages.
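
For readers unfamiliar with the two metrics quoted in the abstract above, here is a minimal sketch of how accuracy and sensitivity are derived from confusion-matrix counts. The counts are invented for illustration and are not the study's data; the function names are hypothetical.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Share of all cases classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def sensitivity(tp: int, fn: int) -> float:
    """Share of true positive cases (e.g. actual cataract) that the model detects."""
    return tp / (tp + fn)


# Invented counts: 10 actual positives, 10 actual negatives.
tp, fn, tn, fp = 8, 2, 6, 4
print(accuracy(tp, fp, tn, fn))  # overall correctness
print(sensitivity(tp, fn))       # detection rate among actual positives
```

Sensitivity can exceed accuracy, as in the abstract, when a model catches most positive cases at the cost of more false alarms among negatives.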

Prediction of Carcass Yield by Ultrasound in Hanwoo (초음파 측정에 의한 한우의 도체육량 예측)

  • Rhee, Y. J.;Jeon, K. J.;Choi, S. B.;Seok, H. K.;Kim, S. J.;Lee, S. K.;Song, Y. H.
    • Journal of Animal Science and Technology / v.45 no.2 / pp.335-342 / 2003
  • This study was conducted to predict carcass yield traits using ultrasound before slaughter and to enhance the prediction accuracy of carcass yield grade by applying various strategies. For this experiment, 573 Hanwoo steers of 24 months of age were used. The differences between the ultrasound results and the carcass measures of BFT and LMA were 0.6 $\pm$ 1.65 mm and 0.7 $\pm$ 5.56 cm$^2$, respectively. The correlation coefficients between the ultrasound results and the carcass measures of BFT and LMA were 0.86 and 0.82, respectively (p<0.001). The yield grade prediction results of four methods (the Korean yield grade index equation, fat depth alone, regression, and decision tree) were 80.3%, 81.3%, 80.1%, and 81.8%, respectively. We conclude that the decision tree method can easily predict yield grade and is also useful for increasing the prediction accuracy rate.

Study on Detection for Cochlodinium polykrikoides Red Tide using the GOCI image and Machine Learning Technique (GOCI 영상과 기계학습 기법을 이용한 Cochlodinium polykrikoides 적조 탐지 기법 연구)

  • Unuzaya, Enkhjargal;Bak, Su-Ho;Hwang, Do-Hyun;Jeong, Min-Ji;Kim, Na-Kyeong;Yoon, Hong-Joo
    • The Journal of the Korea institute of electronic communication sciences / v.15 no.6 / pp.1089-1098 / 2020
  • In this study, we propose a method to detect Cochlodinium polykrikoides red tide using machine learning and geostationary marine satellite images. To train the machine learning models, GOCI Level 2 data and the red tide location data of the National Fisheries Research and Development Institute were used. The machine learning models were a logistic regression model, a decision tree model, and a random forest model. In the performance evaluation, accuracy improved by about 13~22%p (88~98%) compared to the traditional GOCI image-based red tide detection algorithm without machine learning (Son et al., 2012) (75%). In addition, a comparison of detection performance among the machine learning models showed that the random forest model (98%) had the highest detection accuracy. This machine learning-based red tide detection algorithm is expected to be usable for detecting red tide early and for tracking and monitoring its movement and spread.

The Study on Hypertension Cure Rate Management Centering around Wellness Local Community : With GwangJu as a Central Figure (웰니스 지역사회 중심의 고혈압 치료율 관리 방안에 관한 연구 : 광주광역시 중심으로)

  • Yang, Yu-Jeong;Park, Jong-Ho
    • Journal of Korea Entertainment Industry Association / v.15 no.8 / pp.351-361 / 2021
  • This study was conducted to identify the factors of hypertension treatment in Gwangju and to establish a hypertension cure rate management plan centered on the wellness local community, using local community health surveys. The research drew 13,714 Gwangju records from a total of 685,820 local community health survey records of the KDCA (Korea Disease Control and Prevention Agency) from 2017 to 2019. Among these, 2,941 subjects aged over 30 with diagnosed hypertension were selected and analyzed using SAS 9.4 and SAS Enterprise Miner 15.1. The results are as follows. By socioeconomic characteristics, the hypertension cure rate in Gwangju differed by gender, age, marital status, level of educational attainment, economic activity status, and monthly income. By health behavior characteristics, significant differences in the hypertension cure rate were found for current smoking, monthly alcohol consumption, high-risk drinking, breakfast, recognition of good health level, diabetes and treatment, annual unmet medical needs, and annual health center use. Logistic regression analysis and interactive decision tree analysis identified age, marital status, diabetes and treatment, and annual unmet medical needs as the factors affecting hypertension treatment. Accordingly, public health education efforts with an effective preparation plan are needed in Gwangju to raise awareness of the importance of hypertension treatment among younger people and to prevent complications.

A Study on the Prediction of Mortality Rate after Lung Cancer Diagnosis for Men and Women in 80s, 90s, and 100s Based on Deep Learning (딥러닝 기반 80대·90대·100대 남녀 대상 폐암 진단 후 사망률 예측에 관한 연구)

  • Kyung-Keun Byun;Doeg-Gyu Lee;Se-Young Lee
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.16 no.2 / pp.87-96 / 2023
  • Recently, research on predicting the treatment outcomes of diseases using deep learning technology has become active in the medical community. However, such studies have typically used small patient datasets and specific deep learning algorithms, showing meaningful results only under specific conditions. In this study, in order to generalize the research results, the patient groups were expanded and subdivided to predict mortality after lung cancer diagnosis for men and women in their 80s, 90s, and 100s. Using large-scale medical information from the Health Insurance Review and Assessment Service and AutoML, which provides various deep learning algorithms, models based on five algorithms (Decision Tree, Random Forest, Gradient Boosting, XGBoost, and Logistic Regression) were created to predict mortality rates for 84 months after lung cancer diagnosis. The results showed that men in their 80s and 90s had a higher mortality prediction rate than women, while women in their 100s had a higher mortality prediction rate than men. The factor with the greatest influence on the mortality rate was the treatment period.

Matching prediction on Korean professional volleyball league (한국 프로배구 연맹의 경기 예측 및 영향요인 분석)

  • Heesook Kim;Nakyung Lee;Jiyoon Lee;Jongwoo Song
    • The Korean Journal of Applied Statistics / v.37 no.3 / pp.323-338 / 2024
  • This study analyzes the Korean professional volleyball league and predicts match outcomes using popular machine learning classification methods. Match data from the 2012/2013 to 2022/2023 seasons, including match details, were collected for both the male and female leagues. Two different data structures were applied to the models: one separating match results by team, and one using performance differentials between the home and away teams. These two data structures were used to construct a total of four predictive models covering both the male and female leagues. Because the specific variable values used in the models are unavailable before a match ends, the results of the most recent 3 to 4 matches, up until just before the match being predicted, were preprocessed and used as variables. Logistic Regression, Decision Tree, Bagging, Random Forest, XGBoost, AdaBoost, and LightGBM were employed for classification, and the model employing Random Forest showed the highest predictive performance. The results indicated that while the significant variables varied by gender and data structure, set success rate, blocking points scored, and the number of faults were consistently crucial. Notably, the distinctiveness of our win-loss prediction model lies in its ability to provide pre-match forecasts rather than post-event predictions.

Analysis of Survivability for Combatants during Offensive Operations at the Tactical Level (전술제대 공격작전간 전투원 생존성에 관한 연구)

  • Kim, Jaeoh;Cho, HyungJun;Kim, GakGyu
    • The Korean Journal of Applied Statistics / v.28 no.5 / pp.921-932 / 2015
  • This study analyzed military personnel survivability during offensive operations using the scientific military training data of a reinforced infantry battalion. Scientific battle training was conducted at the Korea Combat Training Center (KCTC) training facility and used scientific military training equipment that included MILES and the main exercise control system. The training audience freely engaged an OPFOR that is expert in tactics and weapon systems. The study provides a statistical analysis of state-of-the-art military training data, since the scientific battle training system saves all training zone data and uses it for analysis and after action review, and also offers training control during the training period. The methodologies used were Cox PH modeling (which does not require parametric distribution assumptions) and decision tree modeling for survival data, such as CART, GUIDE, and CTREE, for richer and easier interpretation. The variables that violated the PH assumption were stratified and analyzed. Since the effect of the period of service was not easy to interpret from the Cox PH model result, additional interpretation was attempted through univariate local regression. CART, GUIDE, and CTREE formed different tree models, which allow for various interpretations.

Monetary policy synchronization of Korea and United States reflected in the statements (통화정책 결정문에 나타난 한미 통화정책 동조화 현상 분석)

  • Chang, Youngjae
    • The Korean Journal of Applied Statistics / v.34 no.1 / pp.115-126 / 2021
  • Central banks communicate with the market through statements on the direction of monetary policy while implementing that policy. The rapid contraction of the global economy due to the recent COVID-19 pandemic can be compared to the situation during the 2008 global financial crisis. In this paper, we analyzed text data from the monetary policy statements of the Bank of Korea and the Fed, which reflect their monetary policy directions, focusing on how the statements were affected in the face of a global crisis. For the analysis, we collected the text of the two countries' monetary policy direction reports published from October 1999 to September 2020. We examined their semantic features using word clouds and word embedding, and analyzed the trend of the similarity between the two countries' documents through a piecewise regression tree model. The visualization results show that both the Bank of Korea and the US Fed have published statements with refined words of clear meaning for transparent and effective communication with the market. The analysis of the dissimilarity trend of the two countries' documents also shows a degree of synchronization between them, as rapid changes in the global economic environment affect monetary policy.