• Title/Summary/Keyword: Tree mining

Disease Prediction of Depression and Heart Trouble using Data Mining Techniques and Factor Analysis (데이터마이닝 기법 및 요인분석을 이용한우울증 및 심장병 질환 예측)

  • Yousik Hong;Hyunsook Lee;Sang-Suk Lee
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.23 no.4 / pp.127-135 / 2023
  • The number of patients committing suicide due to depression and stress is rapidly increasing. In addition, prolonged stress and depression are risk factors for heart disease, brain disease, and high blood pressure. However, despite advances in modern medicine, depression and heart disease remain very difficult to treat without special drugs or therapies. Therefore, studies are being actively conducted in many countries to identify patients at risk of depression and suicide at an early stage using electrocardiogram, oxygen saturation, and brain wave analysis. In this paper, to analyze these problems, a computer simulation was performed to identify patients at risk of heart disease by constructing hypothetical heart disease data. In particular, a simulation using fuzzy inference was performed with the aim of improving the prediction rate for heart disease by more than 10%.

Analysis of Leaf Node Ranking Methods for Spatial Event Prediction (의사결정트리에서 공간사건 예측을 위한 리프노드 등급 결정 방법 분석)

  • Yeon, Young-Kwang
    • Journal of the Korean Association of Geographic Information Studies / v.17 no.4 / pp.101-111 / 2014
  • Spatial events can be predicted using data mining classification algorithms. Decision trees are one of the representative classification algorithms and have normally been used in classification tasks with class labels. With rule ranking methods, however, decision trees have also been applied to spatial prediction problems. This paper compares rule ranking methods for spatial prediction using a decision tree. For the comparison experiment, the C4.5 decision tree algorithm and the rule ranking methods Laplace, M-estimate, and m-branch were implemented. As a spatial prediction case study, landslides, a representative spatial event occurring in the natural environment, were used. In the accuracy evaluation, m-branch showed better accuracy than the other rule ranking methods. However, m-branch and M-estimate required an additional time-consuming procedure to search for optimal parameter values, so the methods can be used selectively depending on the application area. A decision tree can be used not only for spatial prediction but also for causal analysis of the locations where a specific event occurs.
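
The Laplace and M-estimate rankings named in the abstract above can be illustrated with a minimal sketch. It uses the standard textbook forms of the two estimates, not the paper's implementation; the leaf counts, the prior, and the value of `m` are hypothetical, and m-branch is not sketched here.

```python
def laplace_estimate(k, n, num_classes=2):
    """Laplace-corrected leaf score: (k + 1) / (n + C).

    k: event (positive) samples in the leaf, n: all samples in the leaf,
    C: number of classes.
    """
    return (k + 1) / (n + num_classes)


def m_estimate(k, n, prior, m=2.0):
    """M-estimate leaf score: (k + m * prior) / (n + m).

    prior: overall event rate; m: smoothing parameter that has to be tuned.
    """
    return (k + m * prior) / (n + m)


# Hypothetical leaves: name -> (event samples, total samples)
leaves = {"leaf_a": (3, 3), "leaf_b": (40, 50), "leaf_c": (1, 20)}
prior = 0.3  # assumed overall event rate

ranked_laplace = sorted(
    leaves, key=lambda name: laplace_estimate(*leaves[name]), reverse=True
)
ranked_m = sorted(
    leaves, key=lambda name: m_estimate(*leaves[name], prior=prior), reverse=True
)

print(ranked_laplace)
print(ranked_m)
```

With these made-up counts the two estimates order the small pure leaf `leaf_a` and the large leaf `leaf_b` differently, which is why the choice of ranking method and its parameter matters.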

A Study on Factors of Internet Overdependence for Adults Using the Decision Tree Analysis Model (성인층의 인터넷 과의존 영향요인: 의사결정나무분석을 활용하여)

  • Seo, Hyung-Jun;Shin, Ji-Woong
    • Informatization Policy / v.25 no.2 / pp.20-45 / 2018
  • This study aims to find the factors of Internet overdependence in adults through a decision tree analysis model, a data mining method, using raw data from the National Information Society Agency's 2016 survey on Internet overdependence. The decision tree analysis identified a total of 16 nodes of Internet overdependence risk groups. The main predictor variables, in descending order of impact on the risk groups, were: amount of time spent per smart media use on weekdays; amount of time spent per smart media use on weekends; experience of purchasing cash items; percentage of smart media use for leisure; negative personality; percentage of smart media use for information search and utilization; and awareness of the good functions of the Internet. Users in the highest risk node used smart media for more than 5 minutes per use and less than 5~10 minutes on weekdays, had experience of purchasing cash items, and had a lower level of awareness of the good functions of the Internet. The analysis led to the following recommendations. First, even short-time use has a higher chance of causing Internet overdependence, so guidelines need to be developed based on research into usage behavior rather than usage time. Second, self-regulation is required because factors that affect overindulgence in games, such as cash items, increase Internet overdependence. Third, using the Internet for leisure carries a higher risk of overdependence, so other means of leisure should be recommended.

Top-down Hierarchical Clustering using Multidimensional Indexes (다차원 색인을 이용한 하향식 계층 클러스터링)

  • Hwang, Jae-Jun;Mun, Yang-Se;Hwang, Gyu-Yeong
    • Journal of KIISE:Databases / v.29 no.5 / pp.367-380 / 2002
  • Due to the recent increase in applications requiring huge amounts of data, such as spatial data analysis and image analysis, clustering on large databases has been actively studied. In a hierarchical clustering method, a tree representing a hierarchical decomposition of the database is first created and then used for efficient clustering. Existing hierarchical clustering methods have mainly adopted the bottom-up approach, which creates the tree from the bottom to the topmost level of the hierarchy. These bottom-up methods require at least one scan over the entire database to build the tree and need to search most nodes of the tree, since the clustering algorithm starts from the leaf level. In this paper, we propose a novel top-down hierarchical clustering method that uses multidimensional indexes already maintained in most database applications. Generally, multidimensional indexes have the clustering property of storing similar objects in the same (or adjacent) data pages. Using this property, we can find adjacent objects without calculating distances among them. We first formally define a cluster based on the density of objects. For the definition, we propose the concept of the region contrast partition, based on the density of the region. To speed up the clustering algorithm, we use a branch-and-bound algorithm; we propose the bounds and formally prove their correctness. Experimental results show that the proposed method is at least as effective in clustering quality as BIRCH, a bottom-up hierarchical clustering method, while reducing the number of page accesses by up to 26~187 times depending on the size of the database. As a result, we believe that the proposed method significantly improves clustering performance in large databases and is practically usable in various database applications.

Development of Healthcare Data Quality Control Algorithm Using Interactive Decision Tree: Focusing on Hypertension in Diabetes Mellitus Patients (대화식 의사결정나무를 이용한 보건의료 데이터 질 관리 알고리즘 개발: 당뇨환자의 고혈압 동반을 중심으로)

  • Hwang, Kyu-Yeon;Lee, Eun-Sook;Kim, Go-Won;Hong, Seong-Ok;Park, Jung-Sun;Kwak, Mi-Sook;Lee, Ye-Jin;Lim, Chae-Hyeok;Park, Tae-Hyun;Park, Jong-Ho;Kang, Sung-Hong
    • The Korean Journal of Health Service Management / v.10 no.3 / pp.63-74 / 2016
  • Objectives: There is a need to develop a data quality management algorithm to improve the quality of healthcare data using a data quality management system. In this study, we developed a data quality control algorithm for hypertension-related diseases in patients with diabetes mellitus. Methods: To build the data quality algorithm, we extracted the 2011 and 2012 discharge damage survey data of diabetes mellitus patients. Derived variables were created using the primary diagnosis, diagnostic unit, primary surgery and treatment, and minor surgery and treatment items. Results: Significant factors in diabetes mellitus patients with hypertension were sex, age, ischemic heart disease, and diagnostic ultrasound of the heart. Based on the decision tree results, we found four groups with extreme values among diabetes patients with accompanying hypertension. Conclusions: There is a need to check the actual data contained in the outlier (extreme value) groups to improve the quality of the data.

Location Generalization of Moving Objects for the Extraction of Significant Patterns (의미 패턴 추출을 위한 이동 객체의 위치 일반화)

  • Lee, Yon-Sik;Ko, Hyun
    • Journal of the Korea Academia-Industrial cooperation Society / v.12 no.1 / pp.451-458 / 2011
  • To provide optimal location-based services such as optimal moving path search or scheduling pattern prediction, it is essential to extract significant moving patterns that take into account the temporal and spatial properties of the location history data of moving objects. In this paper, we propose a location generalization method that translates the location attributes of a moving object into spatial scope information based on the $R^*$-tree, in order to pattern the continuous changes in the location of moving objects more efficiently and to index the 2-dimensional spatial scopes. The proposed method uses the generalized spatial data to generate moving sequences that satisfy constraints on the time interval between spatial scopes, and extracts significant moving patterns from them. It can serve as an efficient method for temporal pattern mining or for analyzing the movement transitions of moving objects in order to provide optimal location-based services.

A study on the variation of severity adjusted LOS on Injry inpatient in Korea (손상입원환자의 중증도 보정 재원일수의 변이에 관한 연구)

  • Kim, Sung-Soo;Kim, Won-Joong;Kang, Sung-Hong
    • Journal of the Korea Academia-Industrial cooperation Society / v.12 no.6 / pp.2668-2676 / 2011
  • To analyze the variation in length of stay (LOS) of injury inpatients, we developed a severity-adjusted LOS model using the Korean National Discharge In-depth Injury Survey data of the Center for Disease Control. Applying this model, we calculated predicted values and, after standardizing LOS using their differences from the actual values, analyzed the variation in LOS. The major factors affecting the severity-adjusted LOS of injury inpatients were severity, surgery (or no surgery), age, injury mechanism, and channel of hospitalization. Analysis of the differences between the actual values and the values predicted by the decision tree model showed statistically significant differences by hospital size (number of beds), type of insurance, and location of institution. To reduce the variation in LOS, efforts should be made to develop a nationwide treatment protocol, to induce medical institutions to use it, and to evaluate it systematically so that the variation continues to decrease.
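
The comparison of actual and predicted LOS described above can be sketched as follows. The abstract does not state how the differences were standardized, so the z-score used here is an assumption, and the records and predicted values are hypothetical numbers rather than output from the paper's decision tree model.

```python
from statistics import mean, pstdev

# Hypothetical records: (hospital size group, actual LOS, model-predicted LOS)
records = [
    ("large", 12.0, 10.0),
    ("large", 15.0, 11.0),
    ("small", 8.0, 10.0),
    ("small", 7.0, 11.0),
]


def standardized_differences(rows):
    """Z-score of (actual - predicted) LOS; the z-score is an assumed choice."""
    diffs = [actual - predicted for _, actual, predicted in rows]
    mu, sigma = mean(diffs), pstdev(diffs)
    return [(d - mu) / sigma for d in diffs]


def mean_by_group(rows, scores):
    """Average standardized difference per group, e.g. per hospital size."""
    groups = {}
    for (group, _, _), score in zip(rows, scores):
        groups.setdefault(group, []).append(score)
    return {group: mean(values) for group, values in groups.items()}


z_scores = standardized_differences(records)
group_means = mean_by_group(records, z_scores)
print(group_means)
```

A positive group mean indicates stays longer than the severity-adjusted prediction; testing whether the group differences are statistically significant would be a separate step.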

Classification of False Alarms based on the Decision Tree for Improving the Performance of Intrusion Detection Systems (침입탐지시스템의 성능향상을 위한 결정트리 기반 오경보 분류)

  • Shin, Moon-Sun;Ryu, Keun-Ho
    • Journal of KIISE:Databases / v.34 no.6 / pp.473-482 / 2007
  • A network-based IDS (Intrusion Detection System) gathers network packet data and classifies it as attack or normal, raising an alarm when a possible intrusion occurs. However, such systems often output a large amount of low-level, incomplete alert information. This information can be unmanageable and mixed with false alerts, which can prevent intrusion response systems and security administrators from adequately understanding and analyzing the state of network security and from initiating an appropriate response in a timely fashion. It is therefore important for the security administrator to reduce the redundancy of alerts, integrate and correlate security alerts, construct attack scenarios, and present high-level aggregated information. The false alarm rate is the ratio of the number of normal connections misclassified as attacks to the total number of normal connections. In this paper, we propose a false alarm classification model that reduces the false alarm rate using classification analysis, a data mining technique. The proposed model classifies the alarms from intrusion detection systems as false alerts or true attacks. Our approach is useful for reducing false alerts and improving the detection rate of network-based intrusion detection systems.
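
The false alarm rate defined in the abstract above can be computed as in this minimal sketch. The labels and alert flags are hypothetical sample data, and the function name is chosen only for illustration.

```python
def false_alarm_rate(labels, alerts):
    """Share of normal connections that the IDS flagged as attacks.

    labels: ground truth per connection, "normal" or "attack"
    alerts: True where the IDS raised an alarm for that connection
    """
    normal_total = sum(1 for label in labels if label == "normal")
    if normal_total == 0:
        raise ValueError("no normal connections in the sample")
    false_alarms = sum(
        1 for label, alert in zip(labels, alerts) if label == "normal" and alert
    )
    return false_alarms / normal_total


# Hypothetical sample: four normal connections, two of them flagged
labels = ["normal", "normal", "attack", "normal", "attack", "normal"]
alerts = [True, False, True, False, False, True]

rate = false_alarm_rate(labels, alerts)
print(rate)  # 0.5
```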

The Variation of Factors of severity-adjusted length of stay(LOS) in acute stroke patients (급성 뇌졸중 환자의 중증도 보정 재원일수 변이에 관한 연구)

  • Kang, Sung-Hong;Seok, Hyang-Sook;Kim, Won-Joong
    • Journal of Digital Convergence / v.11 no.6 / pp.221-233 / 2013
  • This study aims to develop a severity-adjusted length of stay (LOS) model for acute stroke patients using data from the hospital discharge survey, and to propose how LOS for acute stroke patients can be managed and used in hospital management. The dataset consisted of 23,134 cases taken from the hospital discharge survey database from 2004 to 2009. The severity-adjusted LOS model for acute stroke patients was developed by data mining analysis. According to the decision tree model, the main factor affecting the LOS of acute stroke patients was the acute stroke type. The severity-adjusted LOS from the decision tree model was compared with the actual LOS, and the difference was confirmed to be statistically associated with insurance type, number of hospital beds, and hospital location. In conclusion, hospitals should manage the LOS of acute stroke patients by applying the model in their medical information systems.

Study on the effectiveness of english-medium class (영어강의의 효과성에 대한 연구)

  • Cho, Jang Sik
    • Journal of the Korean Data and Information Science Society / v.23 no.6 / pp.1137-1144 / 2012
  • Many universities increasingly stress the importance of English-medium classes in order to improve their international competitiveness and internationalization. In this paper, we compare English-medium classes with Korean classes using course evaluation scores. We also analyze the factors that affect the effectiveness of English-medium classes as measured by the course evaluation score. First, logistic regression analysis is used to examine the main effects of subject and individual characteristics. Then, decision tree analysis is used to examine the interaction effects of subject and individual characteristics. The results are as follows. Grade, department category, class size, GPA, and screening method affect the effectiveness of English-medium classes. The group with the highest effectiveness consists of freshmen in humanities departments. The group with the second-highest effectiveness consists of freshmen with high GPAs in natural science and arts departments.