• Title/Summary/Keyword: Tree mining

Developing the high-risk drinking predictive model in Korea using the data mining technique (데이터마이닝 기법을 활용한 한국인의 고위험 음주 예측모형 개발 연구)

  • Park, Il-Su;Han, Jun-Tae
    • Journal of the Korean Data and Information Science Society
    • /
    • v.28 no.6
    • /
    • pp.1337-1348
    • /
    • 2017
  • In this paper, we develop a predictive model for high-risk drinking in Korea using cross-sectional data from the Korea Community Health Survey (2014). We apply three data mining techniques: logistic regression, decision tree analysis, and neural network analysis. The logistic regression results showed that men in their forties were at high risk and that the risk was high for office workers and sales workers. In particular, current smokers had a higher risk of high-risk drinking. Among the three models, neural network analysis and logistic regression performed best in terms of AUROC (area under the receiver operating characteristic curve). The predictive model developed in this study and the method for selecting the high-risk drinking group can serve as a basis for providing more effective health care services, such as hazardous drinking prevention education and improved drinking programs.
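
The abstract above compares models by AUROC. As a minimal sketch of what that metric measures, the following computes AUROC as the probability that a randomly chosen positive case is scored above a randomly chosen negative one. The function name `auroc`, the labels, and the scores are all hypothetical toy values, not data or code from the study.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison: P(score of positive > score of negative).

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = high-risk drinker, 0 = not high-risk.
labels = [1, 0, 1, 0, 1, 0]
model_scores = {
    "logistic_regression": [0.9, 0.2, 0.7, 0.4, 0.6, 0.3],
    "decision_tree": [0.8, 0.8, 0.8, 0.2, 0.2, 0.2],
}

for name, scores in model_scores.items():
    print(name, round(auroc(labels, scores), 3))
```

The pairwise form is quadratic in the number of cases, which is fine for illustration; a rank-based computation would be the usual choice for survey-sized data.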

Effect of Mothers' Oral Health Knowledge and Behaviour on Dental Caries in Their Preschool Children (데이터마이닝을 이용한 유치치아우식증 관련요인 분석)

  • Kim, Jin-Soo;Kim, Hyo-Jin;Jorn, Hong-Suk
    • Journal of Korean society of Dental Hygiene
    • /
    • v.5 no.2
    • /
    • pp.171-184
    • /
    • 2005
  • To investigate the correlation between mothers' dental care for their children and the children's dental caries, this study used the dental examination records of 365 children, drawn from children aged three to six and their mothers in Yeoncheon, Gyeonggi Province, in June 2004, for whom both a dental examination and a questionnaire completed by the mother were available. We estimated the frequency and percentage of the subjects' general characteristics and of mothers' oral health care behaviors for their children by research item; carried out cross-tabulation analysis (chi-square test) and correlation analysis for the presence of dental caries in deciduous teeth and oral health care behaviors; and applied decision tree analysis, a data mining technique, to the factors associated with the presence of dental caries in deciduous teeth. The conclusions were as follows. 1. Regarding mothers' oral health care behaviors and attitudes toward their children, 225 mothers (61.6%) confirmed their children's tooth-brushing; 278 (76.2%) used no fluoride; 286 (78.6%) observed their children's teeth; 322 (88.2%) instructed their children in tooth-brushing; 268 (73.4%) provided dental care; 232 (63.7%) had their children's cavities treated; 290 (79.4%) believed that their children had good dental conditions; and 294 (80.5%) answered that they began to provide dental care when their children had deciduous teeth. 2. For the presence of dental caries in deciduous teeth and dental health care behaviors, there were statistically significant differences by employment, confirmation after tooth-brushing, teeth observation, instruction in time for tooth-brushing, use of fluoride, cavity treatment, time for dental care, and perception of dental conditions (p<0.05). 3. Regarding the correlation between dental caries in deciduous teeth and oral health care behaviors, children were more likely to have dental caries in deciduous teeth when their mothers worked, believed that their children did not have good dental conditions, or thought that dental care needed to begin only with permanent teeth. The same held for mothers who did not confirm tooth-brushing, used no fluoride, did not observe their children's teeth, or gave no instruction in time for tooth-brushing. 4. The variables that determined the presence of dental caries in deciduous teeth were cavity treatment, mother's employment, time for dental care, and observation of children's teeth. The first node was cavity treatment; the next classification criteria after cavity treatment were mother's employment and time for dental care. For children with no cavity, the criteria were mother's employment and teeth observation.

Big Data Analysis for Strategic Use of Urban Brands: Case Study Seoul city brand "I SEOUL U" (도시 브랜드의 전략적 활용을 위한 빅데이터 분석 : 서울시 도시 브랜드 "I SEOUL U" 사례)

  • Lim, Haewen
    • The Journal of the Korea Contents Association
    • /
    • v.22 no.1
    • /
    • pp.197-213
    • /
    • 2022
  • In this study, text mining analysis was performed on online big data to examine recognition and assessment of the urban brand I Seoul U. To this end, TEXTOM, a program for data acquisition and analysis, was used, and 'I SEOUL U' was selected as the analysis keyword. Keyword analysis shows the keywords associated with I Seoul U to be as follows: First, as business and marketing terms, keywords include pop-up store, gallery, co-branding, (festival, etc.), commodities, private companies and online. Second, as event-related terms, keywords include Han River, tree-planting day, tree planting, Hongdae, Christmas, Mapo, Jung-gu, Sejong University, and festival. Third, as promotional terms, keywords include robotics engineer Dr. Dennis Hong, Government, Art and Korea. In the N-gram analysis, I Seoul U, the city brand of Seoul created in the public interest, was found to contribute to the commercial activities of private companies. In connection-oriented analysis, business and marketing, events, and promotions were derived as categories. In matrix analysis, it was found that pop-up store products were mainly being developed, along with products in the form of co-branding. In the topic modeling, a total of 10 topics were extracted, and most reflected needs for commercial utilization and for information on events and festivals.

A Study of Influencing Factors on World Handball Win-Loss using the Decision Tree Analysis (의사결정나무 분석을 통한 세계핸드볼 승패결정요인 분석)

  • Kim, Hyunchul
    • Journal of Digital Convergence
    • /
    • v.19 no.5
    • /
    • pp.461-468
    • /
    • 2021
  • The purpose of this study is to use official records from the 2019 Men's and Women's Handball World Championships to identify the shooting variables that determine whether a team wins or loses. After collecting records of 192 games played by men's and women's national teams from 24 countries and testing for differences in match records between the winning and losing groups, we applied the decision tree method, one of the data mining techniques. According to the analysis, the 9m shooting success rate and the Near shooting success rate were the most important factors for both men and women. Men won 83.3% of games when the 9m shooting success rate was 32.5% or higher and the Near shooting success rate was 67.5%, and women won 75% when the 9m shooting success rate was 75% or higher and the Near shooting success rate was 51%. Women's yellow cards were also found to be an important variable in determining victory or defeat. In conclusion, the shooting factors that decide wins and losses were identified for both men and women, but follow-up studies are needed that consider the relative nature of the various record variables and performance in handball.

Analyzing vocational outcomes of people with hearing impairments : A data mining approach (청각장애인의 취업결정요인 분석 연구 -데이터마이닝 기법(Exhaustive CHAID)의 적용)

  • Shin, Hyun-Uk
    • Journal of Digital Convergence
    • /
    • v.13 no.11
    • /
    • pp.449-459
    • /
    • 2015
  • The purpose of this study was to examine demographic, human capital and service factors affecting the employment outcomes of people with hearing impairments. Data on a total of 422 individuals with hearing impairments (aged 20 to 65 years) were drawn from the Panel Survey of Employment for the Disabled conducted by the Korea Employment Agency for the Disabled. The dependent variable is employment outcome. The predictor variables include a set of personal history, human capital and rehabilitation service variables. The chi-squared automatic interaction detector (CHAID) analysis revealed that national basic livelihood security status played a determining role in predicting the employment of people with hearing impairments. It was also found that three factors (national basic livelihood security status, need for help with activities of daily living, and licenses and employment services) created a larger synergy effect when they complemented one another.
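
CHAID chooses splits by testing each predictor against the outcome with a chi-squared test. A minimal sketch of the underlying Pearson chi-squared statistic follows. The function name `chi_square_statistic`, the predictor names, and the counts are hypothetical illustrations, not figures from the study, and the sketch compares raw statistics only, whereas CHAID proper merges categories and compares adjusted p-values.

```python
def chi_square_statistic(table):
    """Pearson chi-squared statistic for a contingency table (list of rows).

    Assumes every row total and column total is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical 2x2 tables: rows = predictor category, columns = (employed, not employed).
candidate_predictors = {
    "livelihood_security_recipient": [[10, 20], [20, 10]],
    "holds_license": [[10, 10], [20, 20]],
}

# With equal degrees of freedom, the larger statistic indicates the stronger association.
best = max(candidate_predictors, key=lambda k: chi_square_statistic(candidate_predictors[k]))
print(best)
```

Comparing raw statistics is only meaningful here because both toy tables have the same shape; predictors with different numbers of categories need p-values.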

Spatial Information Data Construction and Data Mining Analysis for Topography Investigation of Land Characteristics (토지특성 고저조사를 위한 공간정보 데이터 구축과 데이터 마이닝 분석)

  • Choi, Jin Ho;Kim, Jun Hyun
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography
    • /
    • v.37 no.6
    • /
    • pp.507-516
    • /
    • 2019
  • The investigation of land characteristics is an important task for calculating official land prices and the standard comparison table of land prices, so it should be carried out objectively and consistently. However, the current investigation system relies mainly on the investigator's subjective judgment, so its objectivity and consistency are not guaranteed. In this study, we first defined the problem by analyzing the current land topography investigation method. Then, to investigate land topography, the geometry of each parcel was quantified using spatial information and applied to a decision tree based method (C4.5) to produce the final result. This study aimed to extract topographic parcel characteristics data using spatial information and to apply that information to C4.5, thereby suggesting a method for addressing these problems. The findings showed approximately 93.5% agreement for the topography classification results estimated with rules learned by C4.5.
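
C4.5 selects split attributes by gain ratio, that is, information gain normalized by the entropy of the split itself. The following is a minimal sketch for a categorical attribute only. The function names, the attribute values ("steep", "gentle") and the class labels are hypothetical and are not the parcel features used in the study.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of splitting `labels` by the categorical attribute `values`."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Weighted entropy of the class labels after the split.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(values)  # penalizes attributes with many small branches
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical parcels: a slope category and a topography class per parcel.
slope = ["steep", "steep", "gentle", "gentle"]
topography = ["highland", "highland", "flat", "flat"]
print(gain_ratio(slope, topography))
```

For continuous measurements such as quantified parcel geometry, C4.5 additionally searches over candidate thresholds, which this sketch omits.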

The effective management of length of stay for patients with acute myocardial infarction in the era of digital hospital (디지털 병원시대의 급성심근경색증 환자 재원일수의 효율적 관리 방안)

  • Choi, Hee-Sun;Lim, Ji-Hye;Kim, Won-Joong;Kang, Sung-Hong
    • Journal of Digital Convergence
    • /
    • v.10 no.1
    • /
    • pp.413-422
    • /
    • 2012
  • In this study, we developed a severity-adjusted length of stay (LOS) model for acute myocardial infarction (AMI) patients using data from the hospital discharge survey, and proposed approaches to medical quality management and policy development. The dataset comprised 2,309 records from the hospital discharge survey database for 2004 to 2006. The severity-adjusted LOS model for AMI patients was developed by data mining analysis. According to the decision tree model, the main factors affecting the LOS of AMI patients were CABG and comorbidity. We compared the severity-adjusted LOS from the ensemble model with the actual LOS and confirmed that insurance type and hospital location were statistically associated with LOS. In conclusion, hospitals should develop severity-adjusted LOS models for frequent diseases to manage LOS variations efficiently and apply them in their medical information systems.

The Variation Factors of Severity-Adjusted Length of Stay in CABG (관상동맥우회술 시행환자의 중증도 보정 재원일수 변이에 관한 연구)

  • Kim, Sun-Ja;Kang, Sung-Hong;Kim, Won-Joong;Kim, Yoo-Mi
    • Journal of Korean Society for Quality Management
    • /
    • v.39 no.3
    • /
    • pp.391-399
    • /
    • 2011
  • Our study was carried out to analyze the variation factors of severity-adjusted length of stay (LOS) in coronary artery bypass graft (CABG). The subjects were 932 CABG inpatients from the Korean National Hospital Discharge In-depth Injury Survey from 2004 through 2008. The data were analyzed using the $\chi^2$ test, and the severity-adjusted model was developed using a data mining technique. The results of the study were as follows: patients who were male (71.1%), older than 61 years of age (61.6%), treated in hospitals with more than 500 beds (92.8%), and admitted via ambulatory care (70.0%) accounted for higher proportions than their counterparts. In-hospital mortality of CABG inpatients was 2.8%. In addition, 46.4% of the patients received their care outside their area of residence. Angina pectoris (45.6%) was the most common principal diagnosis, followed by chronic ischemic heart disease (36.9%) and acute myocardial infarction (12.0%). We developed the severity-adjusted LOS model using variables such as gender, age and comorbidity. Comparison of adjusted values in predicted LOS revealed significant variations in LOS by hospital location, bed size, and whether patients received care in their area of residence. The variations in LOS can be interpreted as an indirect indicator of quality variation in the medical process. We suggest that the severity-adjusted LOS model developed in this study be used as a benchmarking tool in hospitals and that a national standard clinical practice guideline be developed.

A Target Selection Model for the Counseling Services in Long-Term Care Insurance (노인장기요양보험 이용지원 상담 대상자 선정모형 개발)

  • Han, Eun-Jeong;Kim, Dong-Geon
    • The Korean Journal of Applied Statistics
    • /
    • v.28 no.6
    • /
    • pp.1063-1073
    • /
    • 2015
  • In the long-term care insurance (LTCI) system, the National Health Insurance Service (NHIS) provides counseling services for beneficiaries and their family caregivers to help them use LTC services appropriately. The purpose of this study was to develop a Target Selection Model for the Counseling Services based on the needs of beneficiaries and their family caregivers. To develop the models, we used a data set of 2,000 beneficiaries and family caregivers who used long-term care services in their homes in March 2013 and completed questionnaires. The Target Selection Model was established through various data mining models such as logistic regression, gradient boosting, Lasso, decision tree, ensemble, and neural network models. The Lasso model was selected as the final model because of its stability, high performance and availability. Our results might improve the satisfaction with and efficiency of the NHIS counseling services.

Early Detection of Lung Cancer Risk Using Data Mining

  • Ahmed, Kawsar;Abdullah-Al-Emran, Abdullah-Al-Emran;Jesmin, Tasnuba;Mukti, Roushney Fatima;Rahman, Md. Zamilur;Ahmed, Farzana
    • Asian Pacific Journal of Cancer Prevention
    • /
    • v.14 no.1
    • /
    • pp.595-598
    • /
    • 2013
  • Background: Lung cancer is the leading cause of cancer death worldwide. Therefore, identification of genetic as well as environmental factors is very important in developing novel methods of lung cancer prevention. However, this is a multi-layered problem. Therefore, a lung cancer risk prediction system is proposed here which is easy, cost effective and time saving. Materials and Methods: Initially, data from 400 cancer and non-cancer patients were collected from different diagnostic centres, pre-processed and clustered using a K-means clustering algorithm to identify relevant and non-relevant data. Next, significant frequent patterns were discovered using AprioriTid and a decision tree algorithm. Results: Finally, using the significant patterns, prediction tools for a lung cancer prediction system were developed. This lung cancer risk prediction system should prove helpful in detecting a person's predisposition for lung cancer. Conclusions: Most people in Bangladesh do not even know they have lung cancer, and the majority of cases are diagnosed at late stages when cure is impossible. Therefore, early prediction of lung cancer should play a pivotal role in the diagnosis process and in an effective preventive strategy.
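
The frequent-pattern step mentioned above can be illustrated with a plain level-wise Apriori-style search. Note that this is a sketch of basic Apriori, not of the AprioriTid variant named in the abstract, which avoids rescanning the raw transactions. The function name `frequent_itemsets`, the risk-factor item names, and the records are hypothetical toy values, not patient data from the study.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every single item that occurs anywhere.
    current = {frozenset([item]) for t in transactions for item in t}
    result = {}
    k = 1
    while current:
        # Count support of each candidate by scanning all transactions.
        frequent = {}
        for candidate in current:
            support = sum(1 for t in transactions if candidate <= t) / n
            if support >= min_support:
                frequent[candidate] = support
        result.update(frequent)
        # Join frequent k-itemsets into (k+1)-candidates and prune any
        # candidate that has an infrequent k-subset (the Apriori property).
        k += 1
        current = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in frequent for s in combinations(union, k - 1)
            ):
                current.add(union)
    return result


# Hypothetical records: each set lists risk factors observed for one person.
records = [
    {"smoking", "cough"},
    {"smoking", "cough", "fatigue"},
    {"smoking"},
    {"cough", "fatigue"},
]
patterns = frequent_itemsets(records, min_support=0.5)
for itemset, support in sorted(patterns.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

The pruning step is what keeps the search tractable: an itemset is only counted if all of its smaller subsets were already frequent.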