• Title/Abstract/Keywords: Classification Performance

Search results: 3,704 items (processing time: 0.041 s)

의사결정트리의 분류 정확도 향상 (Classification Accuracy Improvement for Decision Tree)

  • 메하리 마르타 레제네;박상현
    • 한국정보처리학회:학술대회논문집
    • /
    • 한국정보처리학회 2017년도 춘계학술발표대회
    • /
    • pp.787-790
    • /
    • 2017
  • Data quality is a central issue in classification problems; in general, noisy instances in the training dataset undermine robust classification performance. Such instances may cause the generated decision tree to over-fit and its accuracy to decrease. Decision trees are useful, efficient, and commonly used for solving various real-world classification problems in data mining. In this paper, we introduce a preprocessing technique to improve the classification accuracy of the C4.5 decision tree algorithm. In the proposed preprocessing method, we apply the naive Bayes classifier to remove noisy instances from the training dataset. We applied the proposed method to a real e-commerce sales dataset to compare its performance against the existing C4.5 decision tree classifier. Experimental results show that the proposed method improved classification accuracy by 8.5% and 14.32% when evaluated on the training dataset and with 10-fold cross-validation, respectively.
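
The abstract above does not say how the naive Bayes classifier decides that an instance is noisy. The following is a minimal sketch under one plausible assumption: an instance counts as noise when a naive Bayes model trained on the full training set disagrees with its label. The function names, the categorical features, and the add-one smoothing are illustrative choices, not details from the paper.

```python
from collections import Counter, defaultdict
import math


def train_naive_bayes(rows, labels):
    """Fit a categorical naive Bayes model by counting (assumed setup)."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    feature_values = defaultdict(set)    # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values


def predict(model, row):
    """Return the label with the highest log posterior, using add-one smoothing."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        for i, value in enumerate(row):
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(feature_values[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def filter_noisy_instances(rows, labels):
    """Keep only instances whose label agrees with the naive Bayes prediction.

    Assumption: disagreement with naive Bayes is the noise criterion.
    """
    model = train_naive_bayes(rows, labels)
    kept = [(r, y) for r, y in zip(rows, labels) if predict(model, r) == y]
    return [r for r, _ in kept], [y for _, y in kept]
```

The filtered rows and labels would then be passed to whatever decision tree learner is in use; the C4.5 training step itself is not shown here.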

가우시안 혼합모델을 이용한 솔라셀 색상분류 (Solar Cell Classification using Gaussian Mixture Models)

  • 고진석;임재열
    • 반도체디스플레이기술학회지
    • /
    • Vol. 10, No. 2
    • /
    • pp.1-5
    • /
    • 2011
  • In recent years, worldwide production of solar wafers has increased rapidly. As a result, solar wafer technology has already become an industry in developed countries, and related industries such as solar wafer manufacturing equipment have developed rapidly. In this paper, we propose a color classification method for polycrystalline solar wafers that is needed in manufacturing equipment. Solar wafers produced in the manufacturing process do not have a uniform color, so solar panels assembled without regard to color uniformity are aesthetically inferior. Gaussian mixture models (GMM) are among the most statistically mature methods for clustering, and we use them to classify polycrystalline solar wafers. In addition, we compare the performance of color feature vectors from various color spaces for color classification. Experimental results show that the feature vector from the YCbCr color space performs most efficiently, with a correct classification rate of 97.4%.
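
As a rough illustration of the pipeline the abstract describes, the sketch below converts an 8-bit RGB color to YCbCr and assigns it to the class whose Gaussian mixture gives the highest likelihood. Several things here are assumptions rather than details from the paper: the full-range BT.601 (JPEG/JFIF) conversion constants, diagonal covariances, and the idea that each color class has its own pre-fitted mixture. Fitting the mixtures (for example with EM) is not shown.

```python
import math


def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 conversion (JPEG/JFIF convention) for 8-bit RGB."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def log_gaussian(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (assumed form)."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def mixture_log_likelihood(x, components):
    """log sum_k w_k N(x | mean_k, var_k), using log-sum-exp for stability.

    `components` is a list of (weight, mean, variance) tuples.
    """
    terms = [math.log(w) + log_gaussian(x, mean, var) for w, mean, var in components]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def classify(x, class_mixtures):
    """Pick the class whose mixture assigns the feature vector the highest likelihood."""
    return max(
        class_mixtures,
        key=lambda name: mixture_log_likelihood(x, class_mixtures[name]),
    )
```

In use, `class_mixtures` would map each wafer color grade (a hypothetical label such as `"dark_blue"`) to the components fitted on that grade's training wafers, and `x` would be the YCbCr feature vector of the wafer under inspection.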

A Deep Learning-based Article- and Paragraph-level Classification

  • Kim, Euhee
    • 한국컴퓨터정보학회논문지
    • /
    • Vol. 23, No. 11
    • /
    • pp.31-41
    • /
    • 2018
  • Text classification has long been studied in the Natural Language Processing field. In this paper, we propose an article- and paragraph-level genre classification system using Word2Vec-based LSTM, GRU, and CNN models for large-scale English corpora. In both article- and paragraph-level classification, LSTM achieved the best accuracy, followed by GRU and CNN. Thus, in evaluating the classification performance of LSTM, GRU, and CNN, the results indicate that word sequential information for articles is more effective than word feature extraction for paragraphs when pre-trained Word2Vec-based word embeddings are used in both the article- and paragraph-level classification tasks.

Object Classification Method Using Dynamic Random Forests and Genetic Optimization

  • Kim, Jae Hyup;Kim, Hun Ki;Jang, Kyung Hyun;Lee, Jong Min;Moon, Young Shik
    • 한국컴퓨터정보학회논문지
    • /
    • Vol. 21, No. 5
    • /
    • pp.79-89
    • /
    • 2016
  • In this paper, we propose an object classification method using a genetic, dynamic random forest composed of an optimal combination of unit trees. A random forest is an ensemble classification model that combines unit decision trees through bagging; by randomizing the training samples and the feature selection assigned to each decision tree, it can ensure good generalization performance when a large number of trees are combined. However, because a random forest is composed of randomly built unit trees, it shows excellent classification performance only when a sufficient number of trees are combined. There is no quantitative method for determining the number of trees, so there is no choice but to keep generating random tree structures repeatedly. The proposed algorithm builds a random forest from an optimal combination of trees while maintaining the generalization performance of the random forest. To achieve this, improving classification performance is cast as an optimization problem of finding the optimal tree combination, which is solved with a genetic algorithm. Experiments show that the proposed algorithm improves classification performance by about 3~5% over the existing random forest in specific cases, such as a common database and our own infrared database. In addition, we show that the optimal tree combination is reached at about 55~60% of the maximum number of trees.
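
The idea of searching for a tree combination with a genetic algorithm can be sketched as follows. This is not the authors' algorithm: the bitmask encoding, tournament selection, one-point crossover, elitism, and the restriction to binary labels are all assumptions made to keep the example short. Each tree is represented only by its predictions on a validation set (`tree_votes`), so no tree training is shown.

```python
import random


def vote_accuracy(mask, tree_votes, labels):
    """Accuracy of the majority vote over the trees selected by `mask` (labels are 0/1)."""
    chosen = [votes for bit, votes in zip(mask, tree_votes) if bit]
    if not chosen:
        return 0.0
    correct = 0
    for i, label in enumerate(labels):
        ones = sum(votes[i] for votes in chosen)
        prediction = 1 if 2 * ones > len(chosen) else 0  # ties go to class 0
        correct += prediction == label
    return correct / len(labels)


def select_trees(tree_votes, labels, generations=30, population_size=20,
                 mutation_rate=0.05, seed=0):
    """Search for a tree subset (bitmask) with high majority-vote accuracy.

    Assumes at least two trees and a population of at least three masks.
    """
    rng = random.Random(seed)
    n = len(tree_votes)

    def fitness(mask):
        return vote_accuracy(mask, tree_votes, labels)

    # Start from the full forest plus random subsets.
    population = [[1] * n] + [
        [rng.randint(0, 1) for _ in range(n)] for _ in range(population_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        next_population = population[:2]  # elitism: keep the two best masks unchanged
        while len(next_population) < population_size:
            # Tournament selection of two parents.
            mother = max(rng.sample(population, 3), key=fitness)
            father = max(rng.sample(population, 3), key=fitness)
            cut = rng.randrange(1, n)  # one-point crossover
            child = mother[:cut] + father[cut:]
            # Bit-flip mutation.
            child = [bit ^ (rng.random() < mutation_rate) for bit in child]
            next_population.append(child)
        population = next_population
    return max(population, key=fitness)
```

Because the full forest is in the initial population and the best masks are carried over unchanged, the returned subset is never worse on the validation data than using every tree.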

평행사변형 분류 알고리즘의 성능에 대한 연구 (A Study on the Performance of Parallelepiped Classification Algorithm)

  • 용환기
    • 한국지리정보학회지
    • /
    • Vol. 4, No. 4
    • /
    • pp.1-7
    • /
    • 2001
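  • Satellite imagery is the most important source data for acquiring GIS information. To extract useful information such as thematic maps from it, satellite images, that is, multispectral images, must be classified in a way suited to the purpose. Satellite image classification techniques are broadly divided into supervised and unsupervised techniques. This paper analyzes how the initial values set for clusters affect the performance of the parallelepiped algorithm, one of the supervised classification techniques. We first examine the relationship between the performance of the parallelepiped algorithm and changes in the initial values on a serial computer, and then extend the analysis to an MIMD parallel computer model. Depending on the initial values, the parallelepiped algorithm showed a performance improvement of up to 2.4 times on a serial computer and up to 2.5 times on the MIMD parallel model. Computer simulations confirmed that initial values have a considerable effect on the performance of the parallelepiped classification algorithm in supervised classification of satellite images, and that setting them appropriately improves the performance of the classification technique on both serial and MIMD parallel computers.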
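
For readers unfamiliar with the parallelepiped rule itself, here is a minimal sketch of the basic decision step. It assumes each class box is defined by the per-band minimum and maximum of its training pixels and that a pixel goes to the first class whose box contains it. The paper's study of initial values and its serial/MIMD timing comparison are not reproduced here.

```python
def fit_boxes(samples_by_class):
    """For each class, record the (min, max) of every band over its training pixels."""
    boxes = {}
    for name, pixels in samples_by_class.items():
        bands = list(zip(*pixels))  # transpose: one tuple of values per band
        boxes[name] = [(min(band), max(band)) for band in bands]
    return boxes


def classify_pixel(pixel, boxes, unclassified=None):
    """Return the first class whose box contains the pixel in every band."""
    for name, bounds in boxes.items():
        if all(low <= value <= high for value, (low, high) in zip(pixel, bounds)):
            return name
    return unclassified  # the pixel falls outside every class box
```

Some formulations instead build each box from the class mean plus or minus a multiple of the standard deviation per band; swapping that in only changes `fit_boxes`.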

Stream-based Biomedical Classification Algorithms for Analyzing Biosignals

  • Fong, Simon;Hang, Yang;Mohammed, Sabah;Fiaidhi, Jinan
    • Journal of Information Processing Systems
    • /
    • Vol. 7, No. 4
    • /
    • pp.717-732
    • /
    • 2011
  • Classification in biomedical applications is an important task that predicts or classifies an outcome based on a given set of input variables such as diagnostic tests or the symptoms of a patient. Traditionally, classification algorithms have had to digest a stationary set of historical data in order to train a decision-tree model, and the learned model could then be used for testing new samples. However, a new breed of classification called stream-based classification can handle continuous data streams, which are ever evolving, unbounded, and unstructured, for instance, biosignal live feeds. These emerging algorithms can potentially be used for real-time classification over biosignal data streams such as EEG and ECG. This paper presents a pioneering effort that studies the feasibility of classification algorithms for analyzing biosignals in the form of infinite data streams. First, a performance comparison is made between traditional and stream-based classification. The results show that accuracy declines intermittently for traditional classification because the model must be re-learned as new data arrives. Second, we show by simulation that biosignal data streams can be processed with a satisfactory level of performance in terms of accuracy, memory requirement, and speed, by using a collection of stream-mining algorithms called Optimized Very Fast Decision Trees. The algorithms can effectively serve as a cornerstone technology for real-time classification in future biomedical applications.
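
Very Fast Decision Trees decide when a leaf has seen enough stream samples to split by using the Hoeffding bound, ε = sqrt(R² ln(1/δ) / (2n)). The sketch below shows that test in its standard form. How the "optimized" variant in this paper adjusts it is not stated in the abstract, so the function and parameter names here are illustrative only.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """epsilon = sqrt(R^2 * ln(1/delta) / (2n)).

    value_range: range R of the split metric (e.g. 1.0 for a two-class information gain)
    delta: allowed probability that the chosen split is not the true best one
    n: number of stream samples observed at the leaf
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, value_range, delta, n):
    """Split once the gap between the two best attributes exceeds the bound."""
    return best_gain - second_gain > hoeffding_bound(value_range, delta, n)
```

The bound shrinks as `n` grows, so a leaf that cannot separate its two best attributes yet simply waits for more samples instead of storing the stream.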

컨볼루션 신경망 모델을 이용한 분류에서 입력 영상의 종류가 정확도에 미치는 영향 (The Effect of Type of Input Image on Accuracy in Classification Using Convolutional Neural Network Model)

  • 김민정;김정훈;박지은;정우연;이종민
    • 대한의용생체공학회:의공학회지
    • /
    • Vol. 42, No. 4
    • /
    • pp.167-174
    • /
    • 2021
  • The purpose of this study is to classify TIFF, PNG, and JPEG images using deep learning and to compare accuracy by verifying classification performance. TIFF, PNG, and JPEG images converted from chest X-ray DICOM images were applied to five deep neural network models used for image recognition and classification to compare classification performance. The data consisted of a total of 4,000 X-ray images, which were converted from DICOM images into 16-bit TIFF images and 8-bit PNG and JPEG images. The learning models are the CNN models VGG16, ResNet50, InceptionV3, DenseNet121, and EfficientNetB0. For TIFF images, the accuracy of the five convolutional neural network models is 99.86%, 99.86%, 99.99%, 100%, and 99.89%, respectively. For PNG images, it is 99.88%, 100%, 99.97%, 99.87%, and 100%. For JPEG images, it is 100%, 100%, 99.96%, 99.89%, and 100%. Validation of classification performance using test data showed 100% in accuracy, precision, recall, and F1 score. Our classification results show that when DICOM images are converted to TIFF, PNG, and JPEG images and learned after preprocessing, the learning works well in all formats. In medical imaging research using deep learning, classification performance is not affected by which of these formats the DICOM images are converted to.

Feature Selection Algorithm for Intrusions Detection System using Sequential Forward Search and Random Forest Classifier

  • Lee, Jinlee;Park, Dooho;Lee, Changhoon
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 11, No. 10
    • /
    • pp.5132-5148
    • /
    • 2017
  • Cyber attacks are evolving commensurate with recent developments in information security technology. Intrusion detection systems collect various types of data from computers and networks to detect security threats and analyze the attack information. The large amount of data examined makes the large number of computations and low detection rates problematic. Feature selection is expected to improve the classification performance and provide faster and more cost-effective results. Despite the various feature selection studies conducted for intrusion detection systems, it is difficult to automate feature selection because it is based on the knowledge of security experts. This paper proposes a feature selection technique to overcome the performance problems of intrusion detection systems. Focusing on feature selection, the first phase of the proposed system constructs a feature subset using a sequential forward floating search (SFFS) to reduce the dimension of the variables. The second phase constructs a classification model with the selected feature subset using a random forest classifier (RFC) and evaluates the classification accuracy. Experiments were conducted with the NSL-KDD dataset using SFFS-RF, and the results indicated that feature selection techniques are a necessary preprocessing step to improve the overall system performance in systems that handle large datasets. They also verified that SFFS-RF could be used for data classification. In conclusion, SFFS-RF could be the key to improving the classification model performance in machine learning.
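
The floating search in the first phase can be sketched generically. The version below is a simplified SFFS, not the paper's implementation: `score` is a hypothetical callable that rates a feature subset (in the paper's setting it would presumably be the accuracy of a random forest trained on that subset), and `target_size` is assumed not to exceed the number of features.

```python
def sffs(features, score, target_size):
    """Sequential forward floating search.

    Returns a dict mapping each subset size reached to the best subset found
    at that size. `score(subset)` must return a number; higher is better.
    """
    selected = []
    best = {}  # subset size -> (score, subset)
    while len(selected) < target_size:
        # Forward step: add the single feature that raises the score most.
        candidates = [f for f in features if f not in selected]
        added = max(candidates, key=lambda f: score(selected + [f]))
        selected = selected + [added]
        current = score(selected)
        if len(selected) not in best or current > best[len(selected)][0]:
            best[len(selected)] = (current, list(selected))
        # Floating step: drop features while doing so beats the best
        # smaller subset recorded so far.
        while len(selected) > 2:
            dropped = max(selected, key=lambda f: score([g for g in selected if g != f]))
            reduced = [g for g in selected if g != dropped]
            reduced_score = score(reduced)
            if reduced_score > best[len(reduced)][0]:
                selected = reduced
                best[len(reduced)] = (reduced_score, list(reduced))
            else:
                break
    return {size: subset for size, (_, subset) in best.items()}
```

The floating step is what separates this from plain forward selection: a feature that looked best early can be discarded later if a combination without it scores higher.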

공연종사자 피난을 위한 안전시설의 운영전략 연구 (A Study of the Safety Facilities Operation Strategies for Performing Arts Workers Evacuation)

  • 정성학;박용규
    • 대한안전경영과학회지
    • /
    • Vol. 26, No. 1
    • /
    • pp.63-74
    • /
    • 2024
  • The objectives of this study are to classify evacuation types, derive the characteristics of the four types, develop and identify evacuation routes within the performance hall space, and present the statistical classification results of the evacuation classification model by type. To achieve this, the characteristics of each of the four evacuation types are applied through a network reliability analysis method and used for institutional improvement and policy. This study applies building law and evacuation and relief safety standards when establishing a performance hall safety management plan, and reflects the results in safety-related laws, safety standards, and policy systems. Statistical data were analyzed by evacuation type, and measurement characteristics were compared across evacuation types. We evaluate the morphological similarity and reliability of evacuation types according to door width and passage length and propose installation positions for evacuation guidance sign boards. The results of this study are expected to serve as basic data for operation strategies for evacuation information sign boards in safety facilities, according to evacuation route classification type, when establishing a safety management plan. The operation strategy for installing evacuation sign boards, which integrates employee guidance and safety training, is applied to the performance hall safety management plan. It will contribute to establishing an operational strategy for performance space safety when performance facilities are constructed in the future.

InceptionV3 기반의 심장비대증 분류 정확도 향상 연구 (A Study on the Improvement of Accuracy of Cardiomegaly Classification Based on InceptionV3)

  • 정우연;김정훈
    • 대한의용생체공학회:의공학회지
    • /
    • Vol. 43, No. 1
    • /
    • pp.45-51
    • /
    • 2022
  • The purpose of this study is to improve classification accuracy over the existing InceptionV3 model by proposing a new model that modifies the fully connected hierarchical structure of InceptionV3, which has shown excellent performance in medical image classification. The model was trained, after data augmentation, on a total of 1,026 chest X-ray images of patients diagnosed with a normal heart or cardiomegaly at Kyungpook National University Hospital. In the experiment, the training classification accuracy and loss of the InceptionV3 model were 99.57% and 1.42, and those of the proposed model were 99.81% and 0.92. In the classification performance evaluation of InceptionV3, the precision for the normal heart was 78%, the recall was 100%, and the F1 score was 88. The precision for cardiomegaly was 100%, the recall was 78%, and the F1 score was 88. For the proposed model, by contrast, the precision for a normal heart was 100%, the recall was 92%, and the F1 score was 96. The precision for cardiomegaly was 95%, the recall was 100%, and the F1 score was 97. These results suggest that the proposed model can classify chest X-ray images of normal hearts and cardiomegaly more accurately, and that the reliability of classification performance will gradually increase.
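
The F1 scores quoted above follow from the harmonic mean of precision and recall, assuming each per-class figure paired with a recall value is a precision. The short sketch below uses that standard definition; the function name is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, a precision of 0.78 with a recall of 1.00 gives an F1 of about 0.876, which rounds to the reported 88, and 0.95 with 1.00 gives about 0.974, which rounds to 97.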