• Title/Summary/Keyword: tree classification method

An Improved Text Classification Method for Sentiment Classification

  • Wang, Guangxing;Shin, Seong Yoon
    • Journal of information and communication convergence engineering / v.17 no.1 / pp.41-48 / 2019
  • Sentiment analysis research has become popular in recent years, and its results have proved valuable in practical applications such as Amazon's book recommendation system and the North American movie box office evaluation system. Analyzing big data on user preferences and evaluations, and recommending best-selling books and highly rated movies to users in a targeted manner, greatly improves book sales and movie attendance [1, 2]. However, traditional machine learning-based sentiment analysis methods such as the Classification and Regression Tree (CART), Support Vector Machine (SVM), and k-nearest neighbor (kNN) classification have shown poor accuracy. This paper proposes an improved kNN classification method that raises accuracy through the improved method combined with data normalization. The three classification algorithms and the improved algorithm are then compared on experimental data. Experiments show that the improved kNN method performs best, with an accuracy rate of 11.5% and a precision rate of 20.3%.
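
The role of normalization in kNN mentioned in the abstract above can be illustrated with a minimal sketch. This is plain kNN with min-max scaling, not the paper's improved method (which the abstract does not specify); the function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def min_max_normalize(rows):
    """Rescale every feature column to [0, 1]; constant columns map to 0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points closest to the query."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical data: (review length in characters, share of positive words).
rows = [[1000, 0.1], [1100, 0.9], [5000, 0.2], [5200, 0.8]]
labels = ["neg", "pos", "neg", "pos"]
query = [3000, 0.85]

# Without scaling, the large-valued length feature dominates the distance.
raw_prediction = knn_predict(rows, labels, query)

# Scale the training rows and the query together, then classify.
scaled = min_max_normalize(rows + [query])
scaled_prediction = knn_predict(scaled[:-1], labels, scaled[-1])
```

On this toy data the unscaled vote is driven almost entirely by review length, while the scaled vote also reflects the second feature.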

Feature Based Decision Tree Model for Fault Detection and Classification of Semiconductor Process (반도체 공정의 이상 탐지와 분류를 위한 특징 기반 의사결정 트리)

  • Son, Ji-Hun;Ko, Jong-Myoung;Kim, Chang-Ouk
    • IE interfaces / v.22 no.2 / pp.126-134 / 2009
  • As product quality and yield are essential factors in semiconductor manufacturing, monitoring the main manufacturing steps is a critical task. For this purpose, fault detection and classification (FDC) is used to diagnose fault states in the processes by monitoring the data streams collected by equipment sensors. This paper proposes an FDC model based on a decision tree, which provides if-then classification rules for causal analysis of the processing results. Unlike previous decision tree approaches, we reflect the structural aspects of the data stream in FDC. To do so, we segment the data stream into multiple subregions, define structural features for each subregion, and select the features that have high relevance to the process results and low redundancy with other features. As a result, we can construct a simple but highly accurate FDC model. Experiments using a data stream collected from an etching process show that the proposed method is able to classify normal and abnormal states with high accuracy.

A Study on the Classification of Variables Affecting Smartphone Addiction in Decision Tree Environment Using Python Program

  • Kim, Seung-Jae
    • International journal of advanced smart convergence / v.11 no.4 / pp.68-80 / 2022
  • Since the launch of AI, technology development to implement complete and sophisticated AI functions has continued. In efforts to develop technologies for complete automation, machine learning and deep learning techniques are mainly used. These techniques rely on supervised learning, unsupervised learning, and reinforcement learning as internal technical elements, and in turn use big data analysis to lay the groundwork for decision-making. Established decisions are then improved through subsequent repetition and renewal of decision-making standards. In other words, big data analysis, which enables data classification and recognition, is important enough to be called a key technical element of AI functions. Big data analysis is therefore important in itself and requires sophisticated analysis. In this study, among the various tools that can analyze big data, we use a Python program to find out which variables can affect addiction related to smartphone use in a decision tree environment. Using the Python program, we check whether data classification by decision tree shows the same performance as other tools, and whether it can lend reliability to decision-making about the addictiveness of smartphone use. The results of this study indicate that big data analysis can be performed without problems using any of the various statistical tools, such as Python and R.

A methodology for Internet Customer segmentation using Decision Trees

  • Cho, Y.B.;Kim, S.H.
    • Proceedings of the Korea Intelligent Information System Society Conference / 2003.05a / pp.206-213 / 2003
  • Applying existing decision tree algorithms to Internet retail customer classification tends to produce a bushy tree because the source data are imprecise. Even an exhaustive analysis may not guarantee business effectiveness, although the results are derived from fully detailed segments. It is therefore necessary to determine the appropriate number of segments at a certain level of abstraction. In this study, we developed a stopping rule that considers the total amount of information gained while generating a rule tree. In addition to proceeding forward from the root to intermediate nodes at a certain level of abstraction, the decision tree is examined by a backtracking pruning method that uses misclassification loss information.

Prediction method of slope hazards using a decision tree model (의사결정나무모형을 이용한 급경사지재해 예측기법)

  • Song, Young-Suk;Chae, Byung-Gon;Cho, Yong-Chan
    • Proceedings of the Korean Geotechnical Society Conference / 2008.03a / pp.1365-1371 / 2008
  • Based on data obtained from field investigations and soil tests of sections with and without slope hazards in a gneiss area, a prediction technique was developed using a decision tree model. The slope hazard data for Seoul and Kyonggi Province covered 104 sections in the gneiss area. After excluding sections with missing values, 61 sections were used to develop the prediction model. The statistical analyses using the decision tree model applied the entropy index. As a result of the analyses, slope angle, degree of saturation, and elevation were selected as the classification criteria. The decision tree prediction model using the entropy index is most likely accurate. The classification criteria of the selected prediction model are, starting from the first selection stage, the slope angle, the degree of saturation, and the elevation. The classification threshold values of the slope angle, the degree of saturation, and the elevation are $17.9^{\circ}$, 52.1%, and 320 m, respectively.
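
How an entropy index can yield a threshold on a variable such as slope angle can be sketched as follows. This is a generic information-gain split search, assumed here for illustration; it is not the authors' software, and the sample values are hypothetical rather than taken from the 61 surveyed sections.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, information_gain) of the best split `value <= t`.

    Candidate thresholds are midpoints between consecutive distinct values.
    Returns (None, 0.0) when no split reduces the entropy.
    """
    base = entropy(labels)
    pairs = sorted(zip(values, labels))
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        weighted = (len(left) * entropy(left)
                    + len(right) * entropy(right)) / len(pairs)
        gain = base - weighted
        if gain > best[1]:
            best = (threshold, gain)
    return best


# Hypothetical slope angles (degrees) and hazard occurrence (1 = occurred).
angles = [10, 12, 15, 20, 25, 30]
occurred = [0, 0, 0, 1, 1, 1]
threshold, gain = best_threshold(angles, occurred)
```

A tree builder would repeat this search over every candidate variable, split on the one with the largest gain, and recurse into each side.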

Shot Change Detection Using Multiple Features and Binary Decision Tree (다수의 특징과 이진 분류 트리를 이용한 장면 전환 검출)

  • 홍승범;백중환
    • The Journal of Korean Institute of Communications and Information Sciences / v.28 no.5C / pp.514-522 / 2003
  • In contrast to previous methods, this paper proposes an enhanced shot change detection method using multiple features and a binary decision tree. Previous methods usually used a single feature and a fixed threshold between consecutive frames. However, contents such as color, shape, background, and texture change simultaneously at shot change points in a video sequence. We therefore detect shot changes effectively using multiple features that complement each other, rather than a single feature. To classify the shot changes, we use a binary classification tree. From the classification result, we extract the important features among the multiple features and obtain a threshold value for each feature. We also perform cross-validation and droop-case to verify the performance of our method. The experimental results show that the EI of our method was on average 2% better than that of conventional shot change detection methods.

Prediction Model for the Risk of Scapular Winging in Young Women Based on the Decision Tree

  • Gwak, Gyeong-tae;Ahn, Sun-hee;Kim, Jun-hee;Weon, Young-soo;Kwon, Oh-yun
    • Physical Therapy Korea / v.27 no.2 / pp.140-148 / 2020
  • Background: Scapular winging (SW) could be caused by tightness or weakness of the periscapular muscles. Although data mining techniques are useful in classifying or predicting the risk of musculoskeletal disorders, predictive models for the risk of musculoskeletal disorders using the results of clinical tests or quantitative data are scarce. Objectives: This study aimed to (1) investigate the difference between young women with and without SW, (2) establish a predictive model for the presence of SW, and (3) determine the cutoff value of each variable for predicting the risk of SW using the decision tree method. Methods: Fifty young female subjects participated in this study. To classify the presence of SW as the outcome variable, scapular protractor strength, elbow flexor strength, shoulder internal rotation, and whether the scapula is on the dominant or nondominant side were determined. Results: The classification tree selected scapular protractor strength, shoulder internal rotation range of motion, and whether the scapula is on the dominant or nondominant side as predictor variables. The classification tree model correctly classified 78.79% (p = 0.02) of the training data set. The accuracy obtained by the classification tree on the test data set was 82.35% (p = 0.04). Conclusion: The classification tree showed acceptable accuracy (82.35%) and high specificity (95.65%) but low sensitivity (54.55%). Based on the predictive model in this study, we suggest that 20% of body weight in scapular protractor strength is a meaningful cutoff value for the presence of SW.
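
The relationship between the accuracy, sensitivity, and specificity figures in the abstract above can be made concrete with a small sketch. The confusion-matrix counts below are an assumption: the abstract does not report them, and they were chosen only because their ratios reproduce the stated test-set percentages.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity (recall), and specificity from confusion counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # share of true SW cases detected
        "specificity": tn / (tn + fp),  # share of non-SW cases cleared
    }


# Assumed counts (not from the paper): 11 SW cases, 23 non-SW cases.
metrics = classification_metrics(tp=6, fn=5, tn=22, fp=1)
percentages = {name: round(value * 100, 2) for name, value in metrics.items()}
```

With these assumed counts the model misses 5 of 11 positive cases while mislabeling only 1 of 23 negatives, which is the pattern of high specificity and low sensitivity described in the conclusion.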

Classification of COVID-19 Disease: A Machine Learning Perspective

  • Kinza Sardar
    • International Journal of Computer Science & Network Security / v.24 no.3 / pp.107-112 / 2024
  • The deadly virus known as COVID-19, which emerged in Wuhan, China in 2019, has spread all over the world. The disease affected millions of people in a very short time. Because COVID-19 has so many symptoms, identifying an infected person is a difficult task, and it is equally challenging to determine whether an individual will test positive or negative. We are developing a framework that uses machine learning techniques. The proposed method uses DecisionTree, KNearestNeighbors, GaussianNB, LogisticRegression, BernoulliNB, and RandomForest machine learning methods as classifiers for the diagnosis of COVID-19, with 5-fold and 10-fold cross-validation applied throughout the classification process. Data preprocessing techniques were applied to improve classification performance. Recall, accuracy, precision, and F-score metrics were used to evaluate the classification performance. The experimental results showed that the best accuracy was obtained from the Decision Tree classifier. In future work, we will apply different techniques to improve model accuracy beyond the 93 percent achieved so far.
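
The 5-fold and 10-fold cross-validation mentioned in the abstract above partitions the samples so that each one is used for testing exactly once. A minimal, library-free sketch of the index splitting follows; the function name is hypothetical, no shuffling or stratification is applied, and it does not reproduce the authors' pipeline.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds.

    The first n_samples % k folds receive one extra sample so that
    fold sizes differ by at most one.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Example: 10 samples split into 5 folds of 2 test samples each.
folds = list(k_fold_indices(10, 5))
```

Each classifier would be fitted on the training indices of a fold and scored on its test indices, and the scores averaged over the k folds.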

The Study on Improving Accuracy of Land Cover Classification using Spectral Library of Hyperspectral Image (초분광영상의 분광라이브러리를 이용한 토지피복분류의 정확도 향상에 관한 연구)

  • Park, Jung-Seo;Seo, Jin-Jae;Go, Je-Woong;Cho, Gi-Sung
    • Journal of Cadastre & Land InformatiX / v.46 no.2 / pp.239-251 / 2016
  • Hyperspectral images are widely used for land cover classification because they have a large number of narrow bands, which allows each pixel to carry much more information than in earlier multispectral images. However, the higher spectral resolution of hyperspectral images results in an increase in data volume and a decrease in noise efficiency. SAM (Spectral Angle Mapping), a method based on the vector inner product for comparing spectral distributions, is a highly valuable and popular way to analyze the continuous spectrum of a hyperspectral image. SAM is shown to be less accurate when it is used to analyze hyperspectral images for land cover classification using a spectral library. This inaccuracy is due to atmospheric effects. We suggest a decision tree-based method to compensate for this defect and show that the method improves the accuracy of land cover classification.
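
The inner-product comparison behind SAM, as described in the abstract above, can be sketched as the angle between a pixel spectrum and each reference spectrum in a library. The function names and reflectance values below are hypothetical, and the sketch covers only the standard spectral angle, not the paper's decision tree correction.

```python
import math


def spectral_angle(pixel, reference):
    """Angle in radians between two spectra treated as vectors."""
    dot = sum(p * r for p, r in zip(pixel, reference))
    norm = (math.sqrt(sum(p * p for p in pixel))
            * math.sqrt(sum(r * r for r in reference)))
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.acos(cosine)


def classify_by_sam(pixel, library):
    """Return the library class whose spectrum has the smallest angle."""
    return min(library, key=lambda name: spectral_angle(pixel, library[name]))


# Hypothetical three-band reflectance spectra.
library = {
    "water": [0.10, 0.05, 0.02],
    "vegetation": [0.05, 0.10, 0.50],
}
label = classify_by_sam([0.10, 0.20, 0.90], library)
```

Because the angle ignores vector length, a uniformly brighter or darker copy of a spectrum still matches its reference; distortions that change the shape of the spectrum are not compensated in this way.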

CANCER CLASSIFICATION AND PREDICTION USING MULTIVARIATE ANALYSIS

  • Shon, Ho-Sun;Lee, Heon-Gyu;Ryu, Keun-Ho
    • Proceedings of the KSRS Conference / v.2 / pp.706-709 / 2006
  • Cancer is one of the major causes of death; however, the survival rate can be increased if it is discovered at an early stage and treated in time. According to 2002 statistics from the World Health Organization, breast cancer was the most prevalent of all cancers occurring in women worldwide, and it accounts for 16.8% of all cancers affecting Korean women today. To classify breast cancer as benign or malignant, this study applied discriminant analysis and the decision tree method of data mining to breast cancer data published on the web. Discriminant analysis is a statistical method that seeks discriminant criteria and a discriminant function to separate population groups on the basis of observations obtained from two or more groups, and then uses them to assign a given observation to the appropriate group. The decision tree analyzes collected data records to reveal the patterns among them, namely the combinations of attributes that characterize each class, and builds a classification model in the form of a tree. This type of analysis may provide systematic information in advance on the factors that cause breast cancer and help prevent the risk of recurrence after surgery.
