• Title/Summary/Keyword: Tree mining


The Hybrid Model using SVM and Decision Tree for Intrusion Detection (SVM과 의사결정트리를 이용한 혼합형 침입탐지 모델)

  • Um, Nam-Kyoung;Woo, Sung-Hee;Lee, Sang-Ho
    • The KIPS Transactions:PartC
    • /
    • v.14C no.1 s.111
    • /
    • pp.1-6
    • /
    • 2007
  • In order to operate a secure network, it is very important to raise the true detection rate while lowering the false detection rate, thereby reducing the damage from network intrusion. By applying SVM to the intrusion detection field, we expect to improve real-time detection of intrusion data. However, because SVM classifies by computing values after expressing the input data in a vector space, continuous-type data cannot be used directly as input. We therefore present a hybrid model combining SVM with the decision tree method to make up for this weakness. As a result, the intrusion detection rate, false-positive error rate, and false-negative error rate improve by 5.6%, 0.16%, and 0.82%, respectively.
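The hybrid idea above can be sketched roughly as follows: symbolic fields go to a decision-tree-style rule set, continuous traffic statistics go to an SVM-style numeric scorer, and the two votes are averaged. This is an illustrative stand-in, not the authors' model; both component scorers, the field names, and all weights are hypothetical.

```python
import math

def tree_score(record):
    # Stand-in for the decision tree: hand-written rules over symbolic fields.
    score = 0.0
    if record["protocol"] == "tcp":
        score += 0.3
    if record["flag"] == "S0":          # half-open connection: suspicious
        score += 0.6
    return min(score, 1.0)

def svm_score(record):
    # Stand-in for the SVM: a fixed linear scorer over continuous features,
    # squashed to (0, 1).  A real model would learn these weights.
    z = 0.8 * record["error_rate"] + 0.002 * record["count"] - 0.5
    return 1.0 / (1.0 + math.exp(-z))

def hybrid_detect(record, threshold=0.5):
    # Average the two detectors' votes and compare against a threshold.
    return (tree_score(record) + svm_score(record)) / 2 >= threshold

attack = {"protocol": "tcp", "flag": "S0", "error_rate": 0.9, "count": 300}
normal = {"protocol": "udp", "flag": "SF", "error_rate": 0.0, "count": 3}
```

The split of work lets the tree handle feature types the vector-space classifier cannot express directly, which is the weakness the paper's hybrid addresses.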

Improvement of DHP Association Rules Algorithm for Perfect Hashing (완전해싱을 위한 DHP 연관 규칙 탐사 알고리즘의 개선 방안)

  • 이형봉
    • Journal of KIISE:Databases
    • /
    • v.31 no.2
    • /
    • pp.91-98
    • /
    • 2004
  • The DHP association-rule mining algorithm maintains an independent direct hash table to reduce the size of the hash tree containing the frequency counts of the candidate large itemsets. It performs pruning with the direct hash table while the hash tree is constructed. The larger the direct hash table becomes, the greater the effect of pruning. In particular, the effect of pruning in phase 2, which generates the 2-large itemsets, is so strong that it dominates the overall performance of the DHP algorithm. Following the rapid trend toward VLM (Very Large Memory) systems, extreme increases in the direct hash table size have been attempted, one of which is a perfect hash table in phase 2. When a perfect hash table is used in phase 2, we found that some rearrangement of the DHP algorithm yields about a 20% performance improvement over a DHP algorithm in which only |H2| is reconfigured. In this paper, we examine the feasibility of a perfect hash table in phase 2 and propose the PHP algorithm, a rearranged DHP algorithm that fully exploits the characteristics of a perfect hash table, and then analyze the results in an experimental environment.
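DHP's phase-2 pruning can be illustrated with a direct hash table over 2-itemsets. In the sketch below (a toy illustration, not the paper's PHP algorithm itself), a perfect hash gives each pair its own bucket, so bucket counts equal the true pair counts and pruning immediately yields the large 2-itemsets:

```python
from itertools import combinations

# Toy transaction database and minimum support threshold.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
min_support = 3

# Hash every 2-itemset of every transaction into the direct hash table.
# A Python dict keyed by the pair itself plays the role of a *perfect*
# hash table: no two pairs share a bucket, so no collisions inflate counts.
bucket = {}
for t in transactions:
    for pair in combinations(sorted(t), 2):
        bucket[pair] = bucket.get(pair, 0) + 1

# With a perfect hash, the surviving buckets *are* the large 2-itemsets;
# with an ordinary (colliding) hash they would only be an upper bound
# used for pruning candidates.
large_2 = {pair for pair, n in bucket.items() if n >= min_support}
```

The 20% gain the paper reports comes from rearranging DHP to exploit exactly this property: once the phase-2 bucket counts are exact, the separate candidate-counting pass over a hash tree becomes redundant.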

Tolerance Computation for Process Parameter Considering Loss Cost : In Case of the Larger is better Characteristics (손실 비용을 고려한 공정 파라미터 허용차 산출 : 망대 특성치의 경우)

  • Kim, Yong-Jun;Kim, Geun-Sik;Park, Hyung-Geun
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.40 no.2
    • /
    • pp.129-136
    • /
    • 2017
  • With the information technology and automation that have rapidly developed in the manufacturing industries in recent years, tens of thousands of quality variables are recorded and categorized in databases every day. Existing statistical methods, and variable selection and interpretation by experts, place limits on proper judgment. Accordingly, various data mining methods, including decision tree analysis, have been developed. CART and C5.0 are representative algorithms for decision tree analysis, but they have limits in defining the tolerance of continuous explanatory variables, and their target variables are restricted to information that indicates only product quality, such as the rate of defective products. It is therefore essential to develop an algorithm that improves upon CART and C5.0 and gives access to new quality information such as loss cost. In this study, a new algorithm was developed not only to find the major variables that minimize the target variable, loss cost, but also to overcome the limits of CART and C5.0. The new algorithm defines the tolerance of variables systematically by splitting each continuous explanatory variable into 3 categories. The larger-the-better characteristic was assumed in an R programming environment to compare the performance of the new algorithm against the existing ones, and 10 simulations were performed with 1,000 data sets for each variable. The performance of the new algorithm was verified through a mean test of loss cost. The verification showed that the tolerances found by the new algorithm for the continuous explanatory variables lowered the loss cost more than the existing algorithms did for the larger-the-better characteristic. In conclusion, the new algorithm can be used to find the tolerance of continuous explanatory variables that minimizes the loss in the process, taking the loss cost of the products into account.
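For the larger-the-better characteristic, the loss cost is conventionally modeled as L(y) = k / y²: the larger the response, the smaller the loss. The sketch below (toy data, hypothetical cost coefficient k) shows the three-way split idea: bin a continuous explanatory variable into 3 categories and pick the category with the smallest mean loss cost as its tolerance region.

```python
import statistics

def loss(y, k=100.0):
    # Taguchi larger-the-better loss; k is a hypothetical cost coefficient.
    return k / y ** 2

# Toy data: one continuous explanatory variable x and a response y.
x = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
y = [2.0, 2.2, 2.5, 3.0, 3.5, 4.0, 5.0, 5.5, 6.0]

# Split x into 3 equal-width categories (a simple stand-in for the
# algorithm's three-way split of a continuous explanatory variable).
lo, hi = min(x), max(x)
width = (hi - lo) / 3

def category(v):
    return min(int((v - lo) / width), 2)

by_cat = {0: [], 1: [], 2: []}
for xi, yi in zip(x, y):
    by_cat[category(xi)].append(loss(yi))

# Mean loss cost per category; the minimizing category defines the
# tolerance region for x.
mean_loss = {c: statistics.mean(v) for c, v in by_cat.items()}
best = min(mean_loss, key=mean_loss.get)
```

In this toy, larger x yields larger y and hence lower loss, so the highest category wins; the paper's algorithm performs this comparison while growing the tree, splitting on the variable whose best category reduces loss cost the most.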

A Study of Extraction of Variables Affecting the Adolescents' Computer Use Type with Decision Tree (의사결정트리 기반의 분석을 통한 청소년의 컴퓨터 사용 유형별 관련 변수 추출)

  • Lee, Hye-Joo;Jung, Eui-Hyun
    • The Journal of Korean Association of Computer Education
    • /
    • v.15 no.2
    • /
    • pp.9-18
    • /
    • 2012
  • This study investigated an extraction algorithm for the variables related to adolescents' computer use types, with a sample from the KYPS data (3,409 second-year junior high school students; 1,704 boys and 1,705 girls). The results of the decision tree model revealed that: (1) gender, computer use time, delinquent friends, parental supervision, others' approval of misdeeds, parents' expectations for study, self-control, teacher attachment, and sibling relations were significant for the entertainment type; (2) gender, cyberclub membership, computer use time, self-belief, and online misdeeds were significant for the relation type; (3) study enthusiasm, personal study time, optimistic disposition, study and spare time, cyberclub membership, self-belief, and criticism of other people were significant for the information type. These results suggest that adolescents' diverse conditions should be considered to help them use computers more effectively.


A method of searching the optimum performance of a classifier by testing only the significant events (중요한 이벤트만을 검색함으로써 분류기의 최적 성능을 찾는 방법)

  • Kim, Dong-Hui;Lee, Won Don
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.18 no.6
    • /
    • pp.1275-1282
    • /
    • 2014
  • Too much information exists in a ubiquitous environment, and therefore it is not easy to obtain appropriately classified information from the available data set. The decision tree algorithm is useful in the fields of data mining and machine learning, as it is fast and produces good results on classification problems. Sometimes, however, a decision tree may have leaf nodes consisting of only a few events or noisy data. Decisions made by such weak leaves are not effective and should therefore be excluded from the decision process. This paper proposes a method using a classifier, UChoo, for solving a classification problem, and suggests an effective decision process that involves only the important leaves and thereby excludes the noisy ones. The experiment shows that this method is effective, reduces erroneous decisions, and can be applied when only important decisions should be made.
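The idea of deciding only at important leaves can be illustrated as follows (a toy sketch, not UChoo itself): each leaf carries its training support, and decisions from leaves below a support threshold are treated as unreliable and abstained from.

```python
# Hypothetical leaf statistics from a trained decision tree:
# leaf id -> (predicted class, number of training events in the leaf).
leaves = {
    "A": ("attack", 120),
    "B": ("normal", 95),
    "C": ("attack", 2),    # noise leaf: only 2 training events
    "D": ("normal", 1),    # noise leaf
}

def classify(leaf_id, min_support=10):
    # Decide only at leaves with enough support; otherwise abstain (None),
    # deferring the event to a fallback or a human analyst.
    label, support = leaves[leaf_id]
    return label if support >= min_support else None

decisions = {lid: classify(lid) for lid in leaves}
```

Filtering at prediction time, rather than pruning the tree, keeps the weak leaves available for inspection while preventing them from driving erroneous decisions.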

Mining Search Keywords for Improving the Accuracy of Entity Search (엔터티 검색의 정확성을 높이기 위한 검색 키워드 마이닝)

  • Lee, Sun Ku;On, Byung-Won;Jung, Soo-Mok
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.5 no.9
    • /
    • pp.451-464
    • /
    • 2016
  • Nowadays, entity search engines such as Google Product Search and Yahoo Pipes have been in the spotlight. Entity search engines are used to retrieve web pages relevant to a particular entity. However, if an entity (e.g., the Chinatown movie) has various meanings (e.g., Chinatown movies, Chinatown restaurants, and Incheon Chinatown), the accuracy of the search result decreases significantly. To address this problem, in this article we propose a novel method that quantifies the importance of search queries and then offers the best query for the entity search, based on a Frequent Pattern (FP)-Tree, considering the correlation between entity relevance and the frequency of web pages. According to the experimental results presented in this paper, the proposed method (59% average precision) improved the accuracy roughly fivefold compared to the traditional query terms (less than 10% average precision).
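The frequent-pattern scoring can be illustrated without building a full FP-Tree by counting keyword itemsets directly (a hand-rolled sketch over made-up pages): keywords that frequently co-occur with the target entity in relevant pages are promoted into the disambiguating query.

```python
from collections import Counter
from itertools import combinations

# Hypothetical keyword sets extracted from pages about the target entity.
pages = [
    {"chinatown", "movie", "polanski"},
    {"chinatown", "movie", "noir"},
    {"chinatown", "restaurant", "menu"},
    {"chinatown", "movie", "polanski"},
    {"chinatown", "incheon", "travel"},
]
min_support = 2

# Count 1- and 2-itemsets across pages (an FP-Tree would compute the same
# counts in a single compressed pass; direct counting is fine at this scale).
counts = Counter()
for page in pages:
    for size in (1, 2):
        for itemset in combinations(sorted(page), size):
            counts[itemset] += 1

frequent = {s: n for s, n in counts.items() if n >= min_support}

# The best disambiguating query pairs the entity with its most frequent
# co-occurring keyword.
best_pair = max((s for s in frequent if len(s) == 2 and "chinatown" in s),
                key=frequent.get)
```

Here "chinatown movie" outscores "chinatown restaurant", so the expanded query steers the engine toward the intended sense of the entity.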

SOHO Bankruptcy Prediction Using Modified Bagging Predictors (Modified Bagging Predictors를 이용한 SOHO 부도 예측)

  • Kim, Seung-Hyuk;Kim, Jong-Woo
    • Journal of Intelligence and Information Systems
    • /
    • v.13 no.2
    • /
    • pp.15-26
    • /
    • 2007
  • In this study, a SOHO (Small Office Home Office) bankruptcy prediction model is proposed using Modified Bagging Predictors, a modification of traditional Bagging Predictors. There have been several studies on bankruptcy prediction for large and mid-sized companies, but few for SOHOs. In commercial banks, loan approval processes for SOHOs are usually less structured than those for large and mid-sized companies, and depend largely on partial information such as credit scores. In this study, we use a real SOHO loan approval data set from a Korean bank. First, decision tree induction techniques and artificial neural networks are applied to the data set, with unsatisfactory results. Then Bagging Predictors, which had not previously been applied to bankruptcy prediction, and the Modified Bagging Predictors proposed in this paper are applied to the data set. The experimental results show that Modified Bagging Predictors provide better performance than decision tree induction techniques, artificial neural networks, and Bagging Predictors.
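Plain Bagging Predictors can be sketched as follows; the paper's "Modified" variant changes details the abstract does not specify, so only standard bagging is shown, with a one-dimensional decision stump as a hypothetical base learner: train many stumps on bootstrap samples and aggregate by majority vote.

```python
import random

random.seed(0)  # deterministic bootstrap sampling for the demo

def train_stump(data):
    # Pick the threshold and direction on the single feature that
    # minimizes training error on this bootstrap sample.
    best = (None, None, len(data) + 1)
    for t in sorted({x for x, _ in data}):
        for sign in (1, -1):
            err = sum((1 if sign * (x - t) >= 0 else 0) != y for x, y in data)
            if err < best[2]:
                best = (t, sign, err)
    t, sign, _ = best
    return lambda x: 1 if sign * (x - t) >= 0 else 0

# Toy 1-D data set: label 1 (e.g., "bankrupt") iff the feature is large.
data = [(i / 10, 1 if i >= 5 else 0) for i in range(10)]

# Bagging: train each stump on a bootstrap resample of the data.
stumps = []
for _ in range(25):
    boot = [random.choice(data) for _ in data]
    stumps.append(train_stump(boot))

def bagged_predict(x):
    # Majority vote over the ensemble.
    votes = sum(s(x) for s in stumps)
    return 1 if votes > len(stumps) / 2 else 0
```

Averaging over bootstrap-trained learners reduces the variance of an unstable base learner, which is why bagging can outperform a single decision tree on noisy credit data.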


Exploring The Career Attitude Prediction Model Of Multicultural Youth Using Decision Tree Analysis (다문화청소년의 진로태도 예측모형 탐색)

  • Oh, Jung-A;Lee, Young-Joo;Kim, Pyeong-Hwa
    • Journal of the Korea Convergence Society
    • /
    • v.12 no.6
    • /
    • pp.99-105
    • /
    • 2021
  • This study investigates the factors predicting the career attitudes of multicultural youth, with the aim of providing evidence-based data for career guidance and policy development. A survey of 1,335 multicultural youths was conducted, and the data were analyzed with a data-mining decision tree in SPSS 23.0. The main findings are as follows. First, for female students, life satisfaction, self-esteem, and mothers' support for their career were important. Second, for boys, self-esteem was the most important factor. Based on these results, suggestions for the career development of multicultural youth are presented.

Empirical Analysis of Influential Factors Affecting Domestic Workers' Turnover Intention: Emphasis on Public Database and Decision Tree Method (근로자들의 이직 의도에 영향을 주는 요인에 관한 실증연구: 공공 데이터베이스와 의사결정나무 기법을 중심으로)

  • Geo Nu Ko;Hyun Jin Jo;Kun Chang Lee
    • Information Systems Review
    • /
    • v.22 no.4
    • /
    • pp.41-58
    • /
    • 2020
  • This study addresses the issue of which factors lead domestic workers to have turnover intention. To pursue this research issue, we utilized a public database, the "2017 Occupational Migration Path Survey", administered by the Korea Employment Information Service (KEIS). The decision tree method was applied to extract the crucial factors influencing workers' turnover intention. These include 'the degree of matching between the level of education and the level of work', 'the possibility of individual development', 'job-related education and training', 'the promotion system', 'wage and income', 'social reputation of the work', and 'the stability of employment'.

Prediction of the Gold-silver Deposits from Geochemical Maps - Applications to the Bayesian Geostatistics and Decision Tree Techniques (지화학자료를 이용한 금·은 광산의 배태 예상지역 추정-베이시안 지구통계학과 의사나무 결정기법의 활용)

  • Hwang, Sang-Gi;Lee, Pyeong-Koo
    • Economic and Environmental Geology
    • /
    • v.38 no.6 s.175
    • /
    • pp.663-673
    • /
    • 2005
  • This study investigates the relationship between geochemical maps and gold-silver deposit locations. Geochemical maps of 21 elements published by KIGAM, the locations of gold-silver deposits, and the 1:1,000,000-scale geological map of Korea are utilized for this investigation. The pixel size of the basic geochemical maps is 250 m, and these data are resampled at 1 km spacing for the statistical analyses. The relationship between the mine locations and the geochemical data is investigated using Bayesian statistics and decision tree algorithms. For the Bayesian statistics, each geochemical map is reclassified into percentile divisions at the 5, 25, 50, 75, 95, and 100% levels. The number of mine locations falling in each division is counted and the corresponding probabilities are calculated. The posterior probability of each pixel is then calculated from the probabilities of the 21 geochemical maps and the geological map, and a prediction map of the mining locations is made by plotting the posterior probability. The input parameters for the decision tree construction are the 21 geochemical elements and lithology, and the output parameters are 5 types of mines (Ag/Au, Cu, Fe, Pb/Zn, W) and the absence of a mine. The locations for the absence of a mine are selected by resampling the overall area at 1 km spacing and eliminating any resampled points within 750 m of a mine location. A prediction map for each mine type is produced by applying the decision tree to every pixel. The prediction by the Bayesian method is slightly better than that of the decision tree; however, both prediction maps show a reasonable match with the input mine locations. We interpret such a match to indicate that the rules produced by both methods are reasonable, and therefore that the geochemical data have a strong relation with the mine locations. This implies that the geochemical rules could serve as background values for mine locations and thus be used for the evaluation of mine contamination. The Bayesian statistics indicated that the probability of an Au/Ag deposit increases as CaO, Cu, MgO, MnO, Pb, and Li increase, and as Zr decreases.
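The percentile-bin Bayesian scoring can be sketched in a naive-Bayes style (all likelihood-ratio numbers below are hypothetical, chosen only to mirror the reported Cu/Pb enrichment and Zr anti-correlation): each pixel's posterior odds of hosting a deposit is the prior odds multiplied by the likelihood ratio of the percentile bin its element value falls in, assuming independence across element maps.

```python
prior = 0.01  # hypothetical prior probability that a pixel hosts a deposit

# P(bin | deposit) / P(bin | no deposit), per element and percentile bin.
# Illustrative values only: high Cu and Pb bins are enriched near Au/Ag
# mines, while high Zr argues against a deposit.
likelihood_ratio = {
    "Cu": {"0-50%": 0.6, "50-95%": 1.2, "95-100%": 4.0},
    "Pb": {"0-50%": 0.7, "50-95%": 1.1, "95-100%": 3.0},
    "Zr": {"0-50%": 1.5, "50-95%": 1.0, "95-100%": 0.4},
}

def posterior(pixel_bins):
    # Multiply prior odds by each element's likelihood ratio, then convert
    # the posterior odds back to a probability.
    odds = prior / (1 - prior)
    for element, b in pixel_bins.items():
        odds *= likelihood_ratio[element][b]
    return odds / (1 + odds)

# A geochemically anomalous pixel versus a background pixel.
anomalous = posterior({"Cu": "95-100%", "Pb": "95-100%", "Zr": "0-50%"})
background = posterior({"Cu": "0-50%", "Pb": "0-50%", "Zr": "95-100%"})
```

Plotting this posterior over every resampled pixel yields the prediction map described in the abstract; pixels whose bins match the deposit signature stand out against the background.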