• Title/Summary/Keyword: decision tree model (의사결정 나무모형)

The big data method for flash flood warning (돌발홍수 예보를 위한 빅데이터 분석방법)

  • Park, Dain;Yoon, Sanghoo
    • Journal of Digital Convergence / v.15 no.11 / pp.245-250 / 2017
  • A flash flood is defined as flooding caused by intense rainfall over a relatively small area that flows rapidly through rivers and valleys in a short time with no advance warning, so it can cause property damage and casualties. This study aims to establish a flash-flood warning system using 38 accident records reported by the National Disaster Information Center and output from a land surface model (TOPLATS) between 2009 and 2012. Three variables from the land surface model were used: precipitation, soil moisture, and surface runoff. The three variables for the 6 hours preceding each flash flood were reduced to 3 factors through factor analysis. Decision tree, random forest, Naive Bayes, support vector machine, and logistic regression models were considered as big data methods. Prediction performance was evaluated by comparing Accuracy, Kappa, TP Rate, FP Rate, and F-Measure. The best method was suggested based on a reproducibility evaluation at each point of flash flood occurrence and on predicted versus actual counts using 4 years of data.
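
The evaluation measures named in this abstract can all be derived from a binary confusion matrix. The sketch below is an illustration only: the function name `binary_metrics` and the counts in the usage example are hypothetical and are not taken from the paper, and Kappa is assumed to mean Cohen's kappa.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute common classification measures from confusion-matrix counts.

    tp/fp/fn/tn are the true-positive, false-positive, false-negative and
    true-negative counts. All names here are illustrative assumptions.
    """
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    tp_rate = tp / (tp + fn)      # also called recall or sensitivity
    fp_rate = fp / (fp + tn)      # 1 - specificity
    precision = tp / (tp + fp)
    f_measure = 2 * precision * tp_rate / (precision + tp_rate)

    # Cohen's kappa: agreement beyond what chance alone would produce.
    p_yes = ((tp + fp) / n) * ((tp + fn) / n)
    p_no = ((fn + tn) / n) * ((fp + tn) / n)
    p_expected = p_yes + p_no
    kappa = (accuracy - p_expected) / (1 - p_expected)

    return {
        "accuracy": accuracy,
        "kappa": kappa,
        "tp_rate": tp_rate,
        "fp_rate": fp_rate,
        "f_measure": f_measure,
    }


if __name__ == "__main__":
    # Made-up counts, used only to exercise the function.
    for name, value in binary_metrics(tp=40, fp=10, fn=20, tn=30).items():
        print(f"{name}: {value:.4f}")
```

Kappa is useful alongside accuracy when events are rare, because a classifier that never predicts a flash flood can still score a high accuracy while its kappa stays near zero.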

A Study on analysis of severity-adjustment length of stay in hospital for community-acquired pneumonia (지역사회획득 폐렴 환자의 중증도 보정 재원일수 분석)

  • Kim, Yoo-Mi;Choi, Yun-Kyoung;Kang, Sung-Hong;Kim, Won-Joong
    • Journal of the Korea Academia-Industrial cooperation Society / v.12 no.3 / pp.1234-1243 / 2011
  • Our study was carried out to develop a severity-adjustment model for hospital length of stay (LOS) for community-acquired pneumonia and to analyze the factors behind variation in LOS. The subjects were 5,353 community-acquired pneumonia inpatients from the Korean National Hospital Discharge In-depth Injury Survey data from 2004 through 2006. The data were analyzed using t-tests and ANOVA, and the severity-adjustment model was developed using a data mining technique. LOS differed by gender, age, type of insurance, and type of admission, but not by whether patients died in hospital. After computing the standardized difference between crude and expected length of stay, we analyzed the variation in LOS for community-acquired pneumonia. LOS varied by region and insurance type, but not by whether patients received care in their region of residence. Variation in LOS that remains after controlling for case mix or severity of illness can be explained by provider factors. These supply-side factors in LOS variation should be studied further with respect to individual practice style or patient management practices and healthcare resources or environment. We expect that severity-adjustment models based on administrative databases will be applied more widely to other diseases in practice.

An Empirical Study of Profiling Model for the SMEs with High Demand for Standards Using Data Mining (데이터마이닝을 이용한 표준정책 수요 중소기업의 프로파일링 연구: R&D 동기와 사업화 지원 정책을 중심으로)

  • Jun, Seung-pyo;Jung, JaeOong;Choi, San
    • Journal of Korea Technology Innovation Society / v.19 no.3 / pp.511-544 / 2016
  • Standards boost technological innovation by promoting information sharing, compatibility, stability and quality. Identifying groups of companies that particularly benefit from these functions of standards in their technological innovation and commercialization helps to customize the planning and implementation of standards-related policies for demand groups. For this purpose, this study profiles SMEs whose R&D objective is to respond to standards as well as those that need to implement a standards system for technological commercialization. It then suggests a prediction model that can distinguish such companies from others. To this end, decision tree analysis is conducted to profile the characteristics of subject SMEs through data mining. Subject SMEs include (1) those that engage in R&D to respond to standards (Group 1) and (2) those in need of product standard or technological certification policies for commercialization purposes (Group 2). The study then proposes a prediction model that can distinguish Groups 1 and 2 from others based on several variables by adopting discriminant analysis. The practicality of the discriminant formula is statistically verified. The study suggests that Group 1 companies are distinguished by variables such as time spent on R&D planning, Korean Standard Industry Classification (KSIC) category, number of employees and novelty of technologies. The profiling result for Group 2 companies suggests that they are differentiated by variables such as KSIC category, major clients of the companies, time spent on R&D and ability to test and verify their technologies. The prediction model proposed herein is designed based on the outcomes of profiling and discriminant analysis. Its purpose is to serve the planning or implementation processes of standards-related policies by providing objective information on companies in need of relevant support, and thereby to enhance the overall success rate of standards-related projects.

Artificial Intelligence Techniques for Predicting Online Peer-to-Peer(P2P) Loan Default (인공지능기법을 이용한 온라인 P2P 대출거래의 채무불이행 예측에 관한 실증연구)

  • Bae, Jae Kwon;Lee, Seung Yeon;Seo, Hee Jin
    • The Journal of Society for e-Business Studies / v.23 no.3 / pp.207-224 / 2018
  • In this article, an empirical study was conducted using a public dataset from Lending Club Corporation, the largest online peer-to-peer (P2P) lending platform in the world. We explored predictor variables related to P2P lending default and found that housing situation, length of employment, average current balance, debt-to-income ratio, loan amount, loan purpose, interest rate, public records, number of finance trades, total credit/credit limit, number of delinquent accounts, number of mortgage accounts, and number of bank card accounts are significant factors for loans funded successfully on the Lending Club platform. We developed online P2P lending default prediction models using discriminant analysis, logistic regression, neural networks, and decision trees (i.e., CART and C5.0) in order to predict P2P loan default. To verify the feasibility and effectiveness of the P2P lending default prediction models, borrower loan data and credit data were used in this study. Empirical results indicated that neural networks consistently outperformed the other classifiers (discriminant analysis, logistic regression, CART, and C5.0) in P2P loan default prediction.

Selection of the principal genotype with genetic algorithm (유전자 알고리즘에 의한 우수 유전자형 선별)

  • Lee, Jae-Young;Goh, Jin-Young
    • Journal of the Korean Data and Information Science Society / v.20 no.4 / pp.639-647 / 2009
  • With the development of computer science, the genetic algorithm has been applied in many fields to search and optimization problems, such as non-linear problems involving many variables. In the data mining field in particular, there are methods that use the genetic algorithm to select the best input variables for model accuracy and to merge various prediction models. Meanwhile, to improve and preserve the quality of Hanwoo (Korean cattle), which represents Korea's agricultural industry, we need to identify the outstanding economic traits of Hanwoo that carry specific genotypes of single nucleotide polymorphisms (SNPs) inherited by the next generation. Accordingly, this research proposes a selection method that uses the genetic algorithm to find the genotypes of SNP markers that affect the economic traits of Hanwoo. We then selected the best genotypes of the principal SNP markers by applying the method to real Hanwoo genetic data.

A Study on Injury Severity Prediction for Car-to-Car Traffic Accidents (차대차 교통사고에 대한 상해 심각도 예측 연구)

  • Ko, Changwan;Kim, Hyeonmin;Jeong, Young-Seon;Kim, Jaehee
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.19 no.4 / pp.13-29 / 2020
  • Automobiles have long been an essential part of daily life, but the social costs of car traffic accidents exceed 9% of the national budget of Korea. Hence, it is necessary to establish a prevention and response system for car traffic accidents. To present a model that can classify and predict the degree of injury in car traffic accidents, we used the big data analysis techniques of K-nearest neighbor, logistic regression, naive Bayes classifier, decision tree, and ensemble algorithms. The performance of the models was analyzed using data on nationwide traffic accidents over the past three years. In particular, considering the difference in the number of records among the injury severity levels, we applied down-sampling to the groups with a large number of samples to enhance the classification accuracy of the models, and then verified the statistical significance of the models using ANOVA.
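
The down-sampling step mentioned in this abstract can be sketched as follows. The abstract does not describe the exact procedure, so this is an assumption: random undersampling of every class to the size of the smallest class. The function name `downsample`, the `label_of` accessor, and the severity labels in the usage example are hypothetical.

```python
import random
from collections import Counter, defaultdict


def downsample(records, label_of, seed=0):
    """Randomly undersample each class to the size of the smallest class.

    records  : list of arbitrary items (rows, tuples, dicts, ...)
    label_of : function returning the class label of one record
    seed     : fixed seed so the sketch is reproducible
    """
    rng = random.Random(seed)

    # Group the records by class label.
    groups = defaultdict(list)
    for record in records:
        groups[label_of(record)].append(record)

    # Every class is cut down to the size of the rarest class.
    target = min(len(group) for group in groups.values())

    balanced = []
    for group in groups.values():
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)
    return balanced


if __name__ == "__main__":
    # Made-up records: (record id, injury severity label).
    data = [(i, "minor") for i in range(10)]
    data += [(i, "severe") for i in range(10, 15)]
    data += [(i, "fatal") for i in range(15, 18)]
    balanced = downsample(data, label_of=lambda r: r[1])
    print(Counter(label for _, label in balanced))
```

Down-sampling should be applied to the training split only, so that the test set keeps the class proportions the model will face in practice.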

Data Mining Analysis of Educational and Research Achievements of Korean Universities Using Public Open Data Services (정보공시 자료를 이용한 교육/연구성과 영향요인 추출 및 대학의 군집 분석)

  • Shin, Sun Mi;Kim, Hyeon Cheol
    • The Journal of Korean Association of Computer Education / v.17 no.1 / pp.117-130 / 2014
  • The purpose of this study is to provide useful knowledge for improving indicators that represent the competitiveness and educational competency of universities by deriving new patterns or meaningful results from university information disclosure data using statistical analysis and data mining techniques. To achieve this, a decision tree model was built, and various factors that affect education/research performance, such as employment rate, number of technology transfers, and number of papers per full-time faculty member, were explored. In addition, a cluster analysis of universities was conducted using attributes related to university evaluation. According to the analysis, the common factors affecting higher education/research performance are the following indicators: incoming student recruitment rate, enrollment rate, and number of students per full-time faculty member. In the cluster analysis, performed on all universities and separately by university size and location, clusters were mainly formed by well-known universities; arts, physical education, non-science-and-engineering, and religious leader training universities; and others. The main factors influencing these clusters are higher education/research performance indicators such as employment rate and number of technology transfers.

Prediction Models of Mild Cognitive Impairment Using the Korea Longitudinal Study of Ageing (고령화연구패널조사를 이용한 경도인지장애 예측모형)

  • Park, Hyojin;Ha, Juyoung
    • Journal of Korean Academy of Nursing / v.50 no.2 / pp.191-199 / 2020
  • Purpose: The purpose of this study was to compare the sociodemographic characteristics of a normal cognitive group and a mild cognitive impairment group, and to establish prediction models of Mild Cognitive Impairment (MCI). Methods: This study was a secondary data analysis using data from "the 4th Korea Longitudinal Study of Ageing" of the Korea Employment Information Service. A total of 6,405 individuals, including 1,329 individuals with MCI and 5,076 individuals with normal cognitive abilities, were part of the study. Based on the panel survey items, the research used 28 variables. The methods of analysis included a χ2-test, logistic regression analysis, decision tree analysis, predicted error rate, and an ROC curve, calculated using SPSS 23.0 and SAS 13.2. Results: In the MCI group, the mean age was 71.4 and 65.8% of the participants were women. There were statistically significant differences in gender, age, and education between the two groups. Predictors of MCI determined by logistic regression analysis were gender, age, education, instrumental activity of daily living (IADL), perceived health status, participation group, cultural activities, and life satisfaction. Decision tree analysis identified education, age, life satisfaction, and IADL as predictors of MCI. Conclusion: The accuracy of the logistic regression model for MCI is slightly higher than that of the decision tree model. The prediction model for MCI established in this study may be used to identify middle-aged and elderly people at risk of MCI. Therefore, this study may contribute to the prevention and reduction of dementia.

Breast Cancer Diagnosis using Naive Bayes Analysis Techniques (Naive Bayes 분석기법을 이용한 유방암 진단)

  • Park, Na-Young;Kim, Jang-Il;Jung, Yong-Gyu
    • Journal of Service Research and Studies / v.3 no.1 / pp.87-93 / 2013
  • Breast cancer is known as a disease that is common in developed countries. In recent years, however, its incidence among Korean women has increased steadily. Breast cancer usually occurs in women over 50, but in Korea the incidence among younger women in their 40s has been increasing steadily compared with the West. Therefore, it is an urgent task to build guidelines for the accurate diagnosis of breast cancer in adult women in Korea. In this paper, we show how to use data mining techniques to predict breast cancer. Data mining refers to the process of finding regular patterns or relationships among variables within a database. Through sophisticated analysis using such models, useful information can be uncovered. In this paper, Decision Tree and Naive Bayes analysis techniques for diagnosing breast cancer were compared through experiments. The two algorithms were analyzed by applying the C4.5 algorithm. The classification accuracy of the Decision Tree was fairly good. The Naive Bayes classification method showed better accuracy than the Decision Tree method.

Machine Learning Model for Predicting the Residual Useful Lifetime of the CNC Milling Insert (공작기계의 절삭용 인서트의 잔여 유효 수명 예측 모형)

  • Won-Gun Choi;Heungseob Kim;Bong Jin Ko
    • Journal of Advanced Navigation Technology / v.27 no.1 / pp.111-118 / 2023
  • For the implementation of a smart factory, it is necessary to collect data by connecting various sensors and devices in the manufacturing environment and to diagnose or predict failures in production facilities through data analysis. In this paper, to predict the residual useful lifetime of the milling insert used for machining products in a CNC machine, the weighted k-NN algorithm, Decision Tree, SVR, XGBoost, Random Forest, 1D-CNN, and the frequency spectrum of the vibration signal are investigated. The results show that the frequency spectrum does not provide a reliable criterion for accurately predicting the residual useful lifetime of an insert. The weighted k-nearest neighbor algorithm performed best, with an MAE of 0.0013, MSE of 0.004, and RMSE of 0.0192. This corresponds to an error of 0.001 seconds in the remaining useful lifetime of the insert predicted by the weighted k-nearest neighbor algorithm, which is considered a level that can be applied to actual industrial sites.
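
For reference, the three error measures reported in this abstract are conventionally defined as below. This is a minimal sketch with made-up numbers, not the paper's data or code, and the function name `regression_errors` is hypothetical.

```python
from math import sqrt


def regression_errors(y_true, y_pred):
    """Return (MAE, MSE, RMSE) for two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)   # mean absolute error
    mse = sum(e * e for e in errors) / len(errors)    # mean squared error
    rmse = sqrt(mse)                                  # root of the MSE
    return mae, mse, rmse


if __name__ == "__main__":
    # Made-up remaining-lifetime values, used only to exercise the function.
    mae, mse, rmse = regression_errors([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
    print(f"MAE={mae:.4f} MSE={mse:.4f} RMSE={rmse:.4f}")
```

Because RMSE is the square root of MSE and is never smaller than MAE, these identities give a quick consistency check on reported values.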