• Title/Summary/Keyword: tree classification method


Classification Performance Improvement of UNSW-NB15 Dataset Based on Feature Selection (특징선택 기법에 기반한 UNSW-NB15 데이터셋의 분류 성능 개선)

  • Lee, Dae-Bum;Seo, Jae-Hyun
    • Journal of the Korea Convergence Society / v.10 no.5 / pp.35-42 / 2019
  • Recently, with the emergence of the Internet and various wearable devices, Internet technology has made it more convenient to obtain information and conduct business. However, as the Internet is used in more areas, the attack surface exposed to attacks is growing, and attempts to intrude into networks for unfair gain, such as cyber terrorism, are also increasing. In this paper, we propose a feature selection method that improves classification performance for the classes used to detect abnormal behavior in network traffic. The UNSW-NB15 dataset has a class imbalance problem in which rare classes have relatively few instances compared with other classes, and an undersampling method is used to address it. We use the SVM, k-NN, and decision tree algorithms and, through training and validation, extract the feature subset combinations with superior detection accuracy and RMSE. In the wrapper-based experiments, the selected subsets achieve recall values above 98%, and DT_PSO showed the best performance.
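The abstract does not say which undersampling variant was used. As a minimal sketch, assuming plain random undersampling (every class is cut down to the size of the smallest class), the balancing step could look like this; `random_undersample` and its arguments are hypothetical names, not the authors' code.

```python
import random
from collections import defaultdict


def random_undersample(X, y, seed=0):
    """Plain random undersampling (assumed variant): randomly keep only as
    many rows of each class as the smallest class has."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    target = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        # sample without replacement down to the minority-class size
        for row in rng.sample(rows, target):
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out
```

The balanced subset would then be handed to the wrapper-based feature selection with SVM, k-NN, or a decision tree; that search is not shown here.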

An Analysis of Choice Behavior for Tour Type of Commercial Vehicle using Decision Tree (의사결정나무를 이용한 화물자동차 투어유형 선택행태 분석)

  • Kim, Han-Su;Park, Dong-Ju;Kim, Chan-Seong;Choe, Chang-Ho;Kim, Gyeong-Su
    • Journal of Korean Society of Transportation / v.28 no.6 / pp.43-54 / 2010
  • In recent years, there have been studies on tour-based approaches to freight travel demand modeling. The purpose of this paper is to analyze the tour type choice behavior of commercial vehicles, whose tours are divided into round trips and chained tours. The study is based on a decision tree and a logit model. The results indicate that the explanatory variables for classifying the tour types of commercial vehicles are loading factor, average goods quantity, and total goods quantity. The results of the decision tree method are similar to those of the logit model. In addition, the explanatory variables for the tour type classification of small trucks do not differ from those of medium trucks, implying that the most important factor in vehicle tour planning is how goods are loaded, as reflected in shipment size and total quantity.
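For comparison with the tree, a binary logit model gives the probability of choosing a chained tour over a round trip from a linear utility difference. The sketch below assumes a simple binary specification; the function name, the feature order (loading factor, average goods quantity, total goods quantity), and any coefficients passed to it are illustrative, not the estimates reported in the paper.

```python
import math


def chained_tour_probability(x, beta, intercept):
    """Binary logit: P(chained) = 1 / (1 + exp(-V)) with
    V = intercept + sum_k beta_k * x_k; P(round trip) = 1 - P(chained)."""
    v = intercept + sum(b * xk for b, xk in zip(beta, x))
    # numerically stable form of the logistic function
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    ez = math.exp(v)
    return ez / (1.0 + ez)
```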

Analysis of the Characteristics of the Older Adults with Depression Using Data Mining Decision Tree Analysis (의사결정나무 분석법을 활용한 우울 노인의 특성 분석)

  • Park, Myonghwa;Choi, Sora;Shin, A Mi;Koo, Chul Hoi
    • Journal of Korean Academy of Nursing / v.43 no.1 / pp.1-10 / 2013
  • Purpose: The purpose of this study was to develop a prediction model for the characteristics of older adults with depression using the decision tree method. Methods: Data on 14,970 elderly people from the large 2008 Korean Elderly Survey dataset were analyzed. The target variable was depression, and the 53 input variables covered general characteristics, family and social relationships, economic status, health status, health behavior, functional status, leisure and social activity, quality of life, and living environment. Data were analyzed by decision tree analysis, a data mining technique, using SPSS for Windows 19.0 and Clementine 12.0. Results: The decision tree analysis produced five distinct rules defining the characteristics of older adults with depression. Among the data mining models, Classification & Regression Tree (C&RT) showed the best prediction, with an accuracy of 80.81%. The factors in the rules were life satisfaction, nutritional status, difficulty in daily activities due to pain, functional limitation in basic or instrumental activities of daily living, number of chronic diseases, and difficulty in daily activities due to disease. Conclusion: The rules identified by the decision tree model in this study are expected to serve as baseline data for discovering informative knowledge and developing interventions tailored to these individual characteristics.
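C&RT grows a classification tree by choosing, at each node, the split that most reduces impurity; Gini impurity is the usual criterion for classification trees. The following minimal sketch searches a single numeric variable for the best binary threshold. It does not reproduce the Clementine 12.0 settings used in the study, and `gini` and `best_threshold` are hypothetical helper names.

```python
def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the split x <= t that minimises
    the size-weighted Gini impurity of the two child nodes."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))  # no split: impurity of the parent node
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot place a threshold between identical values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, score)
    return best
```

Applied recursively to each child node, with stopping and pruning rules, this split search is what yields rule paths such as those reported above.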

Data-driven approach to machine condition prognosis using least square regression trees

  • Tran, Van Tung;Yang, Bo-Suk;Oh, Myung-Suck
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference / 2007.11a / pp.886-890 / 2007
  • Machine fault prognosis techniques have received considerable attention recently because they help reduce unexpected faults and unscheduled maintenance. With these techniques, the working conditions of components, the trend of fault propagation, and the time-to-failure are forecast accurately before the failure thresholds are reached. In this work, we propose a Least Square Regression Tree (LSRT) approach, an extension of the Classification and Regression Tree (CART), combined with one-step-ahead time-series forecasting to predict the future condition of machines. In this technique, the number of past observations used as inputs (the embedding dimension) is first determined using Cao's method, and LSRT is then employed as the prognosis system. The proposed approach is evaluated on real data from a low methane compressor. Furthermore, the predicted results of CART and LSRT are compared to demonstrate accuracy. The predicted results show that LSRT has potential for machine condition prognosis.
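Two ingredients of this approach can be sketched under stated assumptions: turning a condition signal into one-step-ahead training pairs with an embedding dimension `d` (the quantity Cao's method is used to choose; here it is simply passed in), and the least-squares split criterion used by CART regression trees. The LSRT-specific extensions are not reproduced, and `embed`, `sse`, and `best_ls_split` are hypothetical helper names.

```python
def embed(series, d):
    """One-step-ahead training pairs: inputs (x[t-d+1], ..., x[t]) and
    target x[t+1], for every t where both are available."""
    X, y = [], []
    for t in range(d - 1, len(series) - 1):
        X.append(series[t - d + 1 : t + 1])
        y.append(series[t + 1])
    return X, y


def sse(ys):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((v - m) ** 2 for v in ys)


def best_ls_split(X, y):
    """Exhaustive least-squares split: (feature, threshold, total SSE)."""
    best = (None, None, sse(y))  # no split
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X})[:-1]:
            left = [t for row, t in zip(X, y) if row[j] <= thr]
            right = [t for row, t in zip(X, y) if row[j] > thr]
            total = sse(left) + sse(right)
            if total < best[2]:
                best = (j, thr, total)
    return best
```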


A Study on Development of Diagnostic Index for Measure of Rural Villages Landscapes Level (농촌마을단위 경관진단지표 개발에 관한 연구)

  • Song, Hee-Jung;Kim, Dae-Sik
    • Journal of Korean Society of Rural Planning / v.19 no.3 / pp.107-116 / 2013
  • This study provides a diagnostic index for rural landscape formation. To develop the index, this study first analyzed documents and papers on landscape formation. Landscape types were then classified by function, and the landscape index was developed using the AHP method. The classification system has three levels: 2 items at the first level, 10 items at the second level, and 20 items (criteria) at the third level. In the survey of weights using the AHP method, the first-level analysis showed that the rural village landscape is approximately 20% more important than the landscape around the village. At the second level, residence was rated the most important, followed by village tree planting, and then farmland around the rural village, greenery, and water environment. At the third level, feng shui was rated the most important, followed by tree planting, village forest, culture, and history, while vehicle maintenance, village alleys, and pedestrian facilities were rated lower. Among the indices for the area around the village, farmland and skyline have the highest weights, while species richness, water quality, and water resources were rated relatively low. In the future, the rural landscape diagnostic index will be applied to measure the landscape level of rural villages, and it is expected to inform policy support for landscape formation.
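AHP derives weights from a reciprocal matrix of pairwise comparisons. A minimal sketch, assuming Saaty's principal-eigenvector method computed by power iteration, returns the normalized weights, an estimate of lambda_max, and the consistency index CI = (lambda_max - n) / (n - 1). The function name and any comparison matrix fed to it are hypothetical; the study's actual judgments are not given in the abstract.

```python
def ahp_weights(A, iters=100):
    """Priority weights of a positive reciprocal matrix A via power iteration
    on its principal eigenvector, plus lambda_max and the consistency index."""
    n = len(A)
    w = [1.0 / n] * n
    for _ in range(iters):
        Aw = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
        s = sum(Aw)
        w = [v / s for v in Aw]  # renormalise so the weights sum to 1
    Aw = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
    lam = sum(Aw[i] / w[i] for i in range(n)) / n  # estimate of lambda_max
    ci = (lam - n) / (n - 1) if n > 1 else 0.0
    return w, lam, ci
```

For a perfectly consistent matrix (A[i][j] = w_i / w_j), lambda_max equals n and CI is 0; larger CI values signal inconsistent judgments.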

An application of datamining approach to CQI using the discharge summary (퇴원요약 데이터베이스를 이용한 데이터마이닝 기법의 CQI 활동에의 활용 방안)

  • 선미옥;채영문;이해종;이선희;강성홍;호승희
    • Proceedings of the Korea Intelligent Information System Society Conference / 2000.11a / pp.289-299 / 2000
  • This study presents an application of a data mining approach to CQI (Continuous Quality Improvement) using the discharge summary. First, we identified process variation in the hospital infection rate using the SPC (Statistical Process Control) technique. Second, the importance of factors influencing hospital infection was inferred through decision tree analysis, a classification method in data mining. The most important factor was surgery, followed by comorbidity and length of operation. Comorbidity was further split by age and principal diagnosis, and length of operation was further split by age and chief complaint. The decision tree analysis generated 24 rules for hospital infection. Of these, 9 rules with predictive power greater than 50% were suggested as guidelines for hospital infection control. The optimum range of the target group for hospital infection control was identified through the information gain summary. Association rule mining, another data mining method, was performed to analyze the relationship between principal diagnosis and comorbidity. The confidence score, which measures the degree of association, was highest between urinary tract infection and causal bacillus, followed by the score between postoperative wound disruption and postoperative wound infection. This study demonstrated how a data mining approach can provide information to support prospective surveillance of hospital infection. The data mining technique can also be applied to various areas of CQI using other hospital databases.
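The confidence score mentioned above is the standard association-rule measure conf(A → B) = support(A ∪ B) / support(A). A minimal sketch over records represented as sets of diagnosis codes; the function names, item codes, and records are invented for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """conf(A -> B) = support(A | B) / support(A); 0.0 if A never occurs."""
    s_a = support(transactions, antecedent)
    if s_a == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / s_a
```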


A study on removal of unnecessary input variables using multiple external association rule (다중외적연관성규칙을 이용한 불필요한 입력변수 제거에 관한 연구)

  • Cho, Kwang-Hyun;Park, Hee-Chang
    • Journal of the Korean Data and Information Science Society / v.22 no.5 / pp.877-884 / 2011
  • The decision tree is a representative data mining algorithm used in many domains, such as retail target marketing, fraud detection, data reduction, variable screening, and category merging. This method is most useful for classification problems and for making predictions for a target group after dividing it into several smaller groups. When a decision tree model is built with a large number of input variables, the resulting tree is complex, which makes the model difficult to explore and analyze. In addition, associations between input variables often appear because of external variables, even when the input variables have no intrinsic association. In this paper, we study a method for removing unnecessary input variables using multiple external association rules, and we apply it to actual data to demonstrate its efficiency.

A data extension technique to handle incomplete data (불완전한 데이터를 처리하기 위한 데이터 확장기법)

  • Lee, Jong Chan
    • Journal of the Korea Convergence Society / v.12 no.2 / pp.7-13 / 2021
  • This paper introduces an algorithm that handles incomplete training data containing missing values by converting the data into a format that can represent probabilities and then compensating for the missing values. In the previous method using this data conversion, incomplete data were processed by assigning equal probability to every value that a missing variable can take. This method was applied to many problems with good results, but it was pointed out that it loses information, because all information remaining in the missing variable is ignored and a new value is assigned. In the newly proposed method, by contrast, only complete information without missing values is input to the well-known classification algorithm C4.5, and a decision tree is constructed during learning. The probability of the missing value is then obtained from this decision tree and assigned as the estimated value of the missing variable. That is, some of the lost information is recovered by using the large amount of information in the incomplete training data that was not lost.
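The data conversion can be sketched as follows, assuming categorical attributes and `None` marking a missing value. Without an estimate, a missing attribute receives the equal probability 1/k over its k possible values, as in the previous method. The optional `estimates` argument is a hypothetical hook where a distribution read from a C4.5 tree trained on complete records would be supplied; building that tree is not shown, and `extend_record` is an illustrative name.

```python
def extend_record(record, domains, estimates=None):
    """Convert a record of categorical values into per-attribute probability
    vectors. A known value becomes a one-hot vector; a missing value (None)
    gets the supplied estimate if any, otherwise 1/k for each of k values."""
    estimates = estimates or {}
    out = {}
    for attr, values in domains.items():
        v = record.get(attr)
        if v is None:
            if attr in estimates:
                # hypothetical: distribution taken from a tree on complete data
                dist = estimates[attr]
                out[attr] = [dist.get(val, 0.0) for val in values]
            else:
                # previous method: equal probability over all possible values
                out[attr] = [1.0 / len(values)] * len(values)
        else:
            out[attr] = [1.0 if val == v else 0.0 for val in values]
    return out
```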

Machine Learning-Based Rapid Prediction Method of Failure Mode for Reinforced Concrete Column (기계학습 기반 철근콘크리트 기둥에 대한 신속 파괴유형 예측 모델 개발 연구)

  • Kim, Subin;Oh, Keunyeong;Shin, Jiuk
    • Journal of the Earthquake Engineering Society of Korea / v.28 no.2 / pp.113-119 / 2024
  • In existing reinforced concrete buildings with seismically deficient column details, the overall behavior of the building depends on the failure mode of the columns. This study aims to develop and validate a machine learning-based model for predicting column failure modes (shear, flexure-shear, and flexure). For this purpose, artificial neural network (ANN), k-nearest neighbor (KNN), decision tree (DT), and random forest (RF) models were built using previously collected experimental data. Using these four machine learning methods, we developed classification models that predict the column failure mode from six input variables: concrete compressive strength, steel yield strength, axial load ratio, height-to-depth aspect ratio, longitudinal reinforcement ratio, and transverse reinforcement ratio. The performance of each model was compared and verified using accuracy, precision, recall, F1-score, and ROC. Based on these performance measures, the RF model achieves the highest average performance among the considered learning methods, and it predicts the shear failure mode conservatively. Thus, the RF model can rapidly predict column failure modes from simple column details.
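The accuracy, precision, recall, and F1-score used to compare the models can be computed one-vs-rest for the three failure modes. A minimal stdlib sketch; the function name and labels are illustrative, and ROC analysis, which needs predicted scores rather than hard labels, is omitted.

```python
def per_class_metrics(y_true, y_pred, classes):
    """Overall accuracy plus one-vs-rest (precision, recall, F1) per class."""
    n = len(y_true)
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / n
    report = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        report[c] = (prec, rec, f1)
    return acc, report
```

Averaging the per-class values (a macro average) gives a single figure per model, which is one way to rank models by "average" performance as the abstract does.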

Phytosociological Community Classification for Forest Vegetation around Maruguem (Ridge Line) from Misiryeong to Danmokryeong of Baekdudaegan (백두대간 미시령-단목령 구간의 마루금 주변 산림식생에 대한 식물사회학적 군락유형분류)

  • Chae, Seung-Beom;Yun, Chung-Weon
    • Journal of Korean Society of Forest Science / v.108 no.3 / pp.277-289 / 2019
  • This study was designed to classify vegetation units using a phytosociological method and to identify the ecological characteristics of each vegetation unit for the forest vegetation between Misiryeong and Danmokryeong of Baekdudaegan, where a total of 150 plots were surveyed from May to October 2016. In the phytosociological community classification, a Quercus mongolica community group was identified at the top level of the vegetation hierarchy and was divided into an Abies koreana community and a Carpinus cordata community. The A. koreana community was divided into a Thuja koraiensis group and an A. koreana typical group. The T. koraiensis group was subdivided into a Pinus pumila subgroup and a Betula chinensis subgroup. The C. cordata community was divided into a Sasa borealis group and a C. cordata typical group. Thus, this forest vegetation comprised one community group, two communities, four groups, and two subgroups, forming five vegetation units. Analysis of the correlations between the five vegetation units and environmental factors (altitude, bare rock, number of species present, and tree-layer coverage) using a coincidence method showed that the A. koreana community and the C. cordata typical group were distributed above 1,000 m in altitude, whereas the S. borealis group was distributed below 1,000 m. Except for vegetation unit 1, the vegetation units tended to occur mainly where bare rock was less than 20%. There was no clear trend in the number of species present; vegetation unit 5 showed the greatest abundance among the vegetation units. Tree-layer coverage mostly exceeded 60%, except in vegetation unit 1.