• Title/Summary/Keyword: Tree mining

A Study on Management of Student Retention Rate Using Association Rule Mining (연관관계 규칙을 이용한 학생 유지율 관리 방안 연구)

  • Kim, Jong-Man;Lee, Dong-Cheol
    • Journal of Korea Society of Industrial Information Systems / v.23 no.6 / pp.67-77 / 2018
  • The decline in the school-age population is causing many problems. Moreover, Korea has the largest number of universities relative to its population, and its university enrollment rate is the highest in the world. As a result, the minimum student retention rate that each university needs to survive is becoming increasingly important. This study examines the effects of the reduction in the number of graduates and of a social climate that prioritizes employment. It also seeks a basic direction for managing the student retention rate from admission to graduation and determines the optimal input variables. Based on these input variables, we perform association analysis with the Apriori algorithm to collect the training data best suited to retention rate management, and use it as the base data for developing an efficient Deep Learning module. The accuracy of the Deep Learning module was 75%, and graduation was also measured using decision trees. In the decision tree, the factors that determine graduation are as follows: students who graduated from a general high school, are female, and live in an urban area have a high probability of graduating. As a result, the Deep Learning module, rather than the decision tree, was identified as the model that evaluates student graduation more efficiently.
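The association analysis step named in this abstract can be illustrated with a minimal Apriori sketch. The abstract does not give the study's variables or thresholds, so the records, attribute labels, and `min_support` value below are made up for illustration only, and the helper names `apriori` and `confidence` are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}

    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into candidates of size k
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent from stored supports."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


# Made-up student records (not data from the study)
records = [
    {"general_high_school", "female", "urban", "graduated"},
    {"general_high_school", "female", "urban", "graduated"},
    {"general_high_school", "male", "urban", "graduated"},
    {"vocational_high_school", "male", "rural", "dropped_out"},
    {"general_high_school", "female", "rural", "graduated"},
]
frequent = apriori(records, min_support=0.4)
rule_conf = confidence(frequent, {"female"}, {"graduated"})
```

Rules whose support and confidence pass chosen thresholds could then be used to pick candidate input variables; how the study itself did this is not stated in the abstract.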

Monetary policy synchronization of Korea and United States reflected in the statements (통화정책 결정문에 나타난 한미 통화정책 동조화 현상 분석)

  • Chang, Youngjae
    • The Korean Journal of Applied Statistics / v.34 no.1 / pp.115-126 / 2021
  • Central banks communicate with the market through statements on the direction of monetary policy while implementing that policy. The rapid contraction of the global economy caused by the recent COVID-19 pandemic is comparable to the crisis during the 2008 global financial crisis. In this paper, we analyzed text data from the monetary policy statements of the Bank of Korea and the Fed, which reflect their monetary policy directions, focusing on how the statements were affected in the face of a global crisis. For the analysis, we collected the text of the two countries' monetary policy direction reports published from October 1999 to September 2020. We examined the semantic features using word clouds and word embedding, and analyzed the trend of the similarity between the two countries' documents through a piecewise regression tree model. The visualization result shows that both the Bank of Korea and the US Fed have published statements with refined words of clear meaning for transparent and effective communication with the market. The analysis of the dissimilarity trend of the documents in both countries also shows that there is a sense of synchronization between them as rapid changes in the global economic environment affect monetary policy.

Forensic Image Classification using Data Mining Decision Tree (데이터 마이닝 결정나무를 이용한 포렌식 영상의 분류)

  • RHEE, Kang Hyeon
    • Journal of the Institute of Electronics and Information Engineers / v.53 no.7 / pp.49-55 / 2016
  • Digital forensic images are distributed in a wide variety of image types, which poses a serious problem. To address it, this paper proposes an algorithm for classifying forensic image types. The proposed algorithm extracts a 21-dimensional feature vector from the contrast and energy of the GLCM (Gray Level Co-occurrence Matrix) and the entropy of each image type. The classification test of the forensic images is performed with an exhaustive combination of the image types. In the experiments, TP (True Positive) and FN (False Negative) are detected for each. The proposed algorithm is rated 'Excellent (A)' because its AUROC (Area Under Receiver Operating Characteristic Curve), obtained from the sensitivity and 1-specificity, is 0.9980. The minimum average decision error is 0.1349. Moreover, at a minimum average decision error of 0.0179, all of the forensic image types are involved, so the classification effectiveness is high.
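The GLCM contrast and energy features mentioned in this abstract can be sketched in plain Python as follows. This is only an illustration of the two standard texture statistics, not the paper's 21-dimensional feature extractor: the offset, the number of gray levels, the toy image, and the function names are assumptions, and "energy" is taken here as the sum of squared GLCM entries (some toolkits report its square root instead).

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix for the pixel offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                # Count the pair (gray level here, gray level at the offset)
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def contrast(p):
    """Sum of p(i, j) * (i - j)^2: large when neighboring pixels differ strongly."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


def energy(p):
    """Sum of squared entries: large when few gray-level pairs dominate."""
    return sum(v * v for row in p for v in row)


# Toy 4x4 image with 4 gray levels (illustrative values only)
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(image, levels=4)           # horizontal neighbor, one pixel to the right
features = (contrast(p), energy(p))
```

For this toy image there are 12 horizontal pixel pairs, giving a contrast of 7/12 and an energy of 1/6.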

Automated Development of Rank-Based Concept Hierarchical Structures using Wikipedia Links (위키피디아 링크를 이용한 랭크 기반 개념 계층구조의 자동 구축)

  • Lee, Ga-hee;Kim, Han-joon
    • The Journal of Society for e-Business Studies / v.20 no.4 / pp.61-76 / 2015
  • In general, the hierarchical concept tree has been used as a crucial data structure for indexing huge amounts of textual data. This paper proposes a generality rank-based method that can automatically develop hierarchical concept structures from Wikipedia data. The goal of the method is to regard each Wikipedia article as a concept and to generate hierarchical relationships among concepts. To estimate the generality of concepts, we have devised a special ranking function that mainly uses the number of hyperlinks among Wikipedia articles. The ranking function is used to compute the probabilistic subsumption among concepts, which allows relatively more stable hierarchical structures to be generated. Finally, a set of concept pairs with hierarchical relationships is visualized as a DAG (directed acyclic graph). Through an empirical analysis using the concept hierarchy of the Open Directory Project, we show that the proposed method outperforms a representative baseline method and can automatically extract concept hierarchies with high accuracy.

A Context Recognition System for Various Food Intake using Mobile and Wearable Sensor Data (모바일 및 웨어러블 센서 데이터를 이용한 다양한 식사상황 인식 시스템)

  • Kim, Kee-Hoon;Cho, Sung-Bae
    • Journal of KIISE / v.43 no.5 / pp.531-540 / 2016
  • The development of various sensors attached to mobile and wearable devices has increased interest in services based on the user's current context. In this study, we propose a probabilistic model for recognizing a user's food intake context, which can occur in a great variety of situations. The model uses low-level sensor data from mobile and wrist-wearable devices that are widely available in daily life. To cope with the innate complexity and fuzziness of high-level activities like food intake, a context model systematically represents the relevant contexts based on the 4 components of activity theory and the 5 W's, and a tree-structured Bayesian network recognizes the probabilistic state. To verify the proposed method, we collected 383 minutes of data from 4 people over a week and found that the proposed method outperforms conventional machine learning methods in accuracy (93.21%). We also conducted a scenario-based test and investigated the contribution of individual components to recognition.

Bounds of PIM-based similarity measures with partially marginal proportion (부분적 주변 비율에 의한 확률적 흥미도 측도 기반 유사성 측도의 상한 및 하한의 설정)

  • Park, Hee Chang
    • Journal of the Korean Data and Information Science Society / v.26 no.4 / pp.857-864 / 2015
  • According to Wikipedia, data mining is the computational process of discovering patterns in huge data sets, involving methods at the intersection of association rules, decision trees, clustering, artificial intelligence, and machine learning. Clustering, or cluster analysis, is the task of grouping a set of objects so that objects in the same group are more similar to each other than to those in other groups. The similarity measures used in clustering may be classified into various types depending on the characteristics of the data. In this paper, we computed bounds for similarity measures based on the probabilistic interestingness measure with partially marginal probability, such as the Peirce I, Peirce II, Cole I, Cole II, Loevinger, Park I, and Park II measures. We confirmed that the absolute value of the Loevinger measure was the upper limit of the absolute value of any other existing measure. The ordering of the other measures is determined by the size of the concurrence proportion, non-simultaneous occurrence proportion, and mismatch proportion.

Effectiveness of Repeated Examination to Diagnose Enterobiasis in Nursery School Groups

  • Remm, Mare;Remm, Kalle
    • Parasites, Hosts and Diseases / v.47 no.3 / pp.235-241 / 2009
  • The aim of this study was to estimate the benefit from repeated examinations in the diagnosis of enterobiasis in nursery school groups, and to test the effectiveness of individual-based risk predictions using different methods. A total of 604 children were examined using double, and 96 using triple, anal swab examinations. The questionnaires for parents, structured observations, and interviews with supervisors were used to identify factors of possible infection risk. In order to model the risk of enterobiasis at individual level, a similarity-based machine learning and prediction software Constud was compared with data mining methods in the Statistica 8 Data Miner software package. Prevalence according to a single examination was 22.5%; the increase as a result of double examinations was 8.2%. Single swabs resulted in an estimated prevalence of 20.1% among children examined 3 times; double swabs increased this by 10.1%, and triple swabs by 7.3%. Random forest classification, boosting classification trees, and Constud correctly predicted about 2/3 of the results of the second examination. Constud estimated a mean prevalence of 31.5% in groups. Constud was able to yield the highest overall fit of individual-based predictions while boosting classification tree and random forest models were more effective in recognizing Enterobius positive persons. As a rule, the actual prevalence of enterobiasis is higher than indicated by a single examination. We suggest using either the values of the mean increase in prevalence after double examinations compared to single examinations or group estimations deduced from individual-level modelled risk predictions.

Selection of the principal genotype with genetic algorithm (유전자 알고리즘에 의한 우수 유전자형 선별)

  • Lee, Jae-Young;Goh, Jin-Young
    • Journal of the Korean Data and Information Science Society / v.20 no.4 / pp.639-647 / 2009
  • With the development of computer science, genetic algorithms have been applied in many fields to search and optimization problems, such as non-linear problems involving many variables. In the data mining field in particular, genetic algorithms are used to select the best input variables for model accuracy and to merge various prediction models. Meanwhile, to improve and preserve the quality of Hanwoo (Korean cattle), which represents the agricultural industry of Korea, we need to identify the outstanding economic traits of Hanwoo that carry specific genotypes of single nucleotide polymorphisms (SNPs), which are inherited by the next generation. Accordingly, this research proposes a selection method that uses the genetic algorithm to find the genotypes of SNP markers that affect the economic traits of Hanwoo. We then selected the best genotypes of the principal SNP markers by applying the method to real Hanwoo genetic data.
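As a rough illustration of how a genetic algorithm can search over subsets of markers, the sketch below evolves a 0/1 mask with tournament selection, single-point crossover, bit-flip mutation, and elitism. It is a generic sketch, not the method of the paper: the fitness function, the "informative" marker indices, and all parameter values are hypothetical.

```python
import random


def genetic_select(n_features, fitness, pop_size=20, generations=30,
                   mutation_rate=0.05, seed=0):
    """Evolve a 0/1 mask over n_features (n_features >= 2) that maximizes fitness."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same result
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    def pick():
        # Tournament selection of size 2 from the current population
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randint(1, n_features - 1)  # single-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation applied independently to each position
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
        best = max(population, key=fitness)
    return best


# Hypothetical toy setup: markers 0, 3 and 5 are the ones that matter
INFORMATIVE = {0, 3, 5}


def toy_fitness(mask):
    hits = sum(mask[i] for i in INFORMATIVE)   # informative markers selected
    extras = sum(mask) - hits                  # uninformative markers selected
    return hits - 0.5 * extras


best_mask = genetic_select(n_features=8, fitness=toy_fitness)
```

In a real application the fitness would come from a model fitted on the selected markers, for example a cross-validated prediction score of the trait of interest.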

Signed Hellinger measure for directional association (연관성 방향을 고려한 부호 헬링거 측도의 제안)

  • Park, Hee Chang
    • Journal of the Korean Data and Information Science Society / v.27 no.2 / pp.353-362 / 2016
  • According to Wikipedia, data mining is the process of discovering patterns in a big data set, involving methods at the intersection of association rules, decision trees, clustering, artificial intelligence, machine learning, and database systems. Association rule mining is a method for discovering interesting relations between items in large transaction sets by means of interestingness measures. Association rule interestingness measures play a major role in the knowledge discovery process in databases and have been developed by many researchers. Among them, the Hellinger measure is a good association threshold that considers the information content and the generality of a rule. Its drawback is that it cannot determine the direction of the association. In this paper, we proposed a signed Hellinger measure that can be interpreted operationally, and we checked the three conditions of an association threshold. Furthermore, we investigated some of its aspects through a few examples. The results showed that the signed Hellinger measure was better than the Hellinger measure because the signed one was able to estimate the right direction of association.

Sulfur Isotopic Ratios in Precipitation around Chonju-city, Korea and Its Availability as a Tracer of the Source of Atmospheric Pollutants (전주지역 강수의 황동위원소비와 대기오염원의 추적자로서 그 유용성)

  • Na, Choon-Ki;Kim, Seon-Young;Jeon, Sir-Ryeong;Lee, Mu-Seong;Chung, Jae-Il
    • Economic and Environmental Geology / v.28 no.3 / pp.243-249 / 1995
  • To investigate the origin of sulfate in rain water and to evaluate the feasibility of using the sulfur isotope method as a tracer of atmospheric pollutants, the sulfur isotopic ratio of sulfate in rain water collected in Chonju city from October 1994 to March 1995 was monitored and compared with those of possible sources proposed in previous works. The pH of the rain water shows an intermediate acidic range from 4.45 to 6.88, and its daily variation appears to be well correlated with the amount of precipitation. The sulfur isotopic ratios of sulfate in the rain water show a highly restricted range from 0.0 to +1.8‰. The $\delta^{34}S$ values are similar to those of soil and pine trees surrounding Chonju city, but deviate largely from those of China. The d-parameter ($d = \delta D - 8\delta^{18}O$) of the rain water varies from 9.4 to 28.8. These values indicate that the rain water in Chonju city originates from the rainy front of the Chinese continent. All data obtained in this study suggest that the sulfate in the rain water collected in Chonju city was mainly derived from sulfur dioxide gas emitted by petroleum combustion. Therefore, the sulfur isotopic study of precipitation provides an excellent tool for environmental assessment in this region and for tracing the source of atmospheric pollutants.
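The d-parameter defined in this abstract is a one-line calculation, sketched below for concreteness. The function name and the sample isotope values are hypothetical and are not measurements from the study; both inputs are assumed to be in per mil.

```python
def d_parameter(delta_d, delta_18o):
    """d = deltaD - 8 * delta18O, with both inputs given in per mil."""
    return delta_d - 8.0 * delta_18o


# Illustrative values only: deltaD = -50.0 per mil, delta18O = -7.5 per mil
example_d = d_parameter(-50.0, -7.5)   # -50.0 - 8 * (-7.5) = 10.0
```

A value of 10.0 for this made-up sample would fall within the 9.4 to 28.8 range reported in the abstract.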
