• Title/Summary/Keyword: class imbalance classification (클래스 불균형 분류)

Fault Detection of Unbalanced Cycle Signal Data Using SOM-based Feature Signal Extraction Method (SOM기반 특징 신호 추출 기법을 이용한 불균형 주기 신호의 이상 탐지)

  • Kim, Song-Ee;Kang, Ji-Hoon;Park, Jong-Hyuck;Kim, Sung-Shick;Baek, Jun-Geol
    • Journal of the Korea Society for Simulation / v.21 no.2 / pp.79-90 / 2012
  • In this paper, a feature signal extraction method is proposed to improve the poor fault detection performance caused by unbalanced data, that is, data with a severe disparity between the numbers of instances in each class. Most of the cyclic signals gathered during a process are normal, while only a few are faults, so most cyclic signal data sets are unbalanced. A SOM (Self-Organizing Map)-based feature signal extraction method is used to counter the adverse effects of unbalanced data. The weight vectors of the neurons mapped to every node of the SOM grid are extracted as the feature signals of both classes and used as a reference data set for fault detection. kNN (k-Nearest Neighbor) and SVM (Support Vector Machine) are used to build fault detection models, which are compared with Hotelling's $T^2$ control chart, the most widely used method for fault detection. Experiments are conducted using simulated process signals that resemble the cyclic signals frequently seen in semiconductor manufacturing.
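
The idea of using SOM node weights as a balanced reference set can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 1-D grid of four nodes, the decay schedules, the synthetic sine signals in `make_signal`, and the use of 1-nearest-neighbour instead of the paper's kNN/SVM models are all assumptions made for the example.

```python
import math
import random

def train_som(signals, n_nodes=4, epochs=30, lr0=0.5, seed=0):
    """Train a 1-D SOM and return its node weight vectors (the 'feature signals')."""
    rng = random.Random(seed)
    # Initialise each node from a randomly chosen training signal.
    weights = [list(rng.choice(signals)) for _ in range(n_nodes)]
    radius0 = max(n_nodes / 2.0, 1.0)
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                      # learning rate decays linearly
        radius = max(radius0 * (1.0 - frac), 0.5)    # neighbourhood shrinks as well
        for x in rng.sample(signals, len(signals)):  # one shuffled pass per epoch
            # Best-matching unit: the node whose weights are closest to the signal.
            bmu = min(range(n_nodes), key=lambda j: math.dist(weights[j], x))
            for j in range(n_nodes):
                h = math.exp(-((j - bmu) ** 2) / (2.0 * radius ** 2))
                weights[j] = [w + lr * h * (xi - w) for w, xi in zip(weights[j], x)]
    return weights

def make_signal(rng, fault):
    # Hypothetical cyclic signal: one sine period; faults have a larger amplitude.
    amp = 2.0 if fault else 1.0
    return [amp * math.sin(2 * math.pi * t / 16) + rng.gauss(0, 0.05) for t in range(16)]

rng = random.Random(1)
normal = [make_signal(rng, False) for _ in range(200)]   # majority class
fault = [make_signal(rng, True) for _ in range(10)]      # minority class

# One SOM per class: both classes contribute the same number of prototypes,
# so the reference set is balanced even though the raw data is 200:10.
reference = ([(w, "normal") for w in train_som(normal)] +
             [(w, "fault") for w in train_som(fault)])

def predict(x):
    # 1-nearest-neighbour against the balanced reference set.
    return min(reference, key=lambda item: math.dist(item[0], x))[1]
```

Because each class is summarised by the same number of node weights, a distance-based classifier is no longer dominated by the sheer number of normal signals.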

Data Processing of AutoML-based Classification Models for Improving Performance in Unbalanced Classes (불균형 클래스에서 AutoML 기반 분류 모델의 성능 향상을 위한 데이터 처리)

  • Lee, Dong-Joon;Kang, Ji-Soo;Chung, Kyungyong
    • Journal of Convergence for Information Technology / v.11 no.6 / pp.49-54 / 2021
  • With the recent development of smart healthcare technology, interest in everyday diseases is increasing. However, healthcare data is imbalanced between positive and negative cases. This arises because people without a given disease far outnumber patients with it, which makes patient data difficult to collect. Data imbalance needs to be adjusted because it affects learning performance in disease prediction and analysis. Therefore, in this paper, we replace missing values through multiple imputation in models that detect whether a disease is present, and resolve data imbalance through over-sampling. Using AutoML on the preprocessed data, we generate several models and select the top 3 to build an ensemble model.
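
The abstract does not say which over-sampling variant was used. As a rough sketch, the simplest form (random over-sampling, which duplicates minority rows until every class matches the largest one) might look like this; the function name `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class up to the majority count."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))   # sampled with replacement
            out_labels.append(cls)
    return out_rows, out_labels

# Hypothetical toy data: 8 negative cases, 2 positive cases.
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2
bal_rows, bal_labels = random_oversample(rows, labels)
balanced_counts = Counter(bal_labels)
```

Over-sampling is normally applied to the training split only, so that duplicated rows do not leak into the evaluation data.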

Gender Prediction and Precision Inference Method based on the naive Bayesian (나이브 베이지안에 기반한 성별 예측 및 정확률 추론 기법)

  • Kwon, TaeWon;Lee, Euijong;Baik, Doo-Kwon
    • Proceedings of the Korea Information Processing Society Conference / 2016.04a / pp.588-590 / 2016
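  • A user's gender is basic yet important marketing data. Recently, however, with the trend toward stronger protection of personal information, simplified sign-up that does not ask for details such as gender or age has become common. This has increased the need for gender prediction research to recover such missing information. Existing studies have used a variety of methods to predict the gender of users who did not provide it from the information of users who did, and many of the best-performing approaches are based on SVM, a binary classifier. However, because the SVM algorithm only performs binary classification, the precision of a gender prediction cannot be known. Knowing the precision of a gender prediction can prevent inaccurate classifications, and the precision can be used as a weight in product recommendation. This study applies naive Bayes, which is probability-based and therefore allows the precision to be inferred. The class imbalance problem was mitigated with the SMOTE technique, which increases the number of examples in the data set in a balanced way. In addition, noise was removed to suit the characteristics of gender prediction, and weights were applied to items that are decisive for gender classification. The proposed method was applied to real data to demonstrate its effectiveness.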
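
The core SMOTE step mentioned above creates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The following is a minimal sketch of that step only; the function name `smote`, the value `k=3`, and the four toy points are assumptions for illustration, not details from the paper.

```python
import math
import random

def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # The k nearest minority neighbours of x, excluding x itself.
        neighbours = sorted((p for p in minority if p is not x),
                            key=lambda p: math.dist(p, x))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()   # position along the segment from x to nb
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic

# Hypothetical minority-class feature vectors.
minority = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.3], [1.1, 1.1]]
new_points = smote(minority, n_new=6)
```

Each synthetic point lies on a line segment between two real minority points, so the minority region is filled in rather than existing rows being duplicated.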

Image-Based Skin Cancer Classification System Using Attention Layer (Attention layer를 활용한 이미지 기반 피부암 분류 시스템)

  • GyuWon Lee;SungHee Woo
    • Journal of Practical Engineering Education / v.16 no.1_spc / pp.59-64 / 2024
  • As the aging population grows, the incidence of cancer is increasing. Skin cancer appears externally, but people often do not notice it or simply overlook it. As a result, if the window for early detection is missed, the survival rate for late-stage cancer is only 7.5-11%. However, diagnosing serious skin cancer has the disadvantage of requiring a lot of time and money, because it involves detailed examination and cell tests rather than simple visual diagnosis. To overcome these challenges, we propose a skin cancer classification system built on an Attention-based CNN model. If skin cancer can be detected early, it can be treated quickly, and the proposed system can greatly help the work of a specialist. To mitigate the imbalance of image data across skin cancer types, the classification model applies the over-sampling technique to data with a high distribution ratio and adds a pre-trained model without an Attention layer. This model is then compared to the model without the Attention layer. We also plan to address the data imbalance problem by strengthening data augmentation techniques for specific classes.

Feature Selection for Anomaly Detection Based on Genetic Algorithm (유전 알고리즘 기반의 비정상 행위 탐지를 위한 특징선택)

  • Seo, Jae-Hyun
    • Journal of the Korea Convergence Society / v.9 no.7 / pp.1-7 / 2018
  • Feature selection, a data preprocessing technique, is a major research area in many applications that deal with large datasets. It has been used in pattern recognition, machine learning, and data mining, and is now widely applied in fields such as text classification, image retrieval, intrusion detection, and genome analysis. The proposed method is based on a genetic algorithm, which is a meta-heuristic algorithm. There are two approaches to finding feature subsets: the filter method and the wrapper method. In this study, we use a wrapper method, which evaluates feature subsets using an actual classifier, to find an optimal feature subset. The training dataset used in the experiment has a severe class imbalance, which makes it difficult to improve classification performance for rare classes. After preprocessing the training dataset with SMOTE, we select features and evaluate them with various machine learning algorithms.
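
A wrapper-style genetic algorithm of the kind described can be sketched as below. Everything specific here is an assumption for illustration: the nearest-centroid classifier used as the fitness function, the population size, the one-point crossover and bit-flip mutation, and the synthetic data with one informative feature. The paper's SMOTE step is omitted, and a real wrapper would score subsets on held-out or cross-validated data rather than on the training set.

```python
import random

def centroid_accuracy(X, y, mask):
    """Wrapper fitness: training accuracy of a nearest-centroid classifier on the kept features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    classes = sorted(set(y))
    centroids = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, label in zip(X, y):
        pred = min(classes,
                   key=lambda c: sum((x[j] - m) ** 2 for j, m in zip(cols, centroids[c])))
        correct += pred == label
    return correct / len(y)

def ga_select(X, y, pop_size=12, generations=15, p_mut=0.1, seed=0):
    """Search for a 0/1 feature mask with a small elitist genetic algorithm."""
    rng = random.Random(seed)
    n = len(X[0])
    fitness = lambda m: centroid_accuracy(X, y, m)
    # Start from the full feature set plus random masks.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        nxt = pop[:2]                                    # elitism: keep the two best masks
        while len(nxt) < pop_size:
            a, b = rng.sample(pop[:pop_size // 2], 2)    # parents from the better half
            cut = rng.randrange(1, n)                    # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < p_mut) for bit in child]  # bit-flip mutation
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

# Hypothetical imbalanced data: 6 rare-class rows out of 60; only feature 0 is informative.
rng = random.Random(1)
X, y = [], []
for i in range(60):
    label = 1 if i < 6 else 0
    informative = rng.gauss(3.0 if label else 0.0, 0.5)
    X.append([informative, rng.gauss(0, 5), rng.gauss(0, 5), rng.gauss(0, 5)])
    y.append(label)

best_mask = ga_select(X, y)
```

Because the full feature set is in the initial population and the best masks are always carried over, the selected subset never scores worse than using all features under this fitness function.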

Ensemble Size Reduction in Fraud Detection System (축소된 앙상블에 의한 부정행위 적발 모형)

  • Song, Yeong-Mi;Ji, Won-Cheol;Han, Wan-Gyu
    • Proceedings of the Korea Society of Management Information Systems Conference (한국경영정보학회:학술대회논문집) / 2007.06a / pp.597-602 / 2007
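  • The usefulness of ensemble models is widely recognized in data mining, because when diversity among the base models that make up an ensemble is ensured, the accuracy and stability of the final model improve. However, how many base models should be combined, and in what way, still requires more research. This study examines the usefulness of ensemble models for the illegal cash advance problem, one type of credit card fraud. Fraud detection is a typical classification problem, but it is characterized by very severe class imbalance. Accordingly, we developed a diversity measure suited to the illegal cash advance problem and proposed a way to build an ensemble from a minimal number of base models. Using real data from a Korean credit card company, we showed that the reduced ensemble achieves nearly the same accuracy and stability as an ensemble that combines a large number of models.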
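
The abstract does not define the diversity measure the authors developed. As a stand-in, the sketch below uses the standard pairwise disagreement measure (the fraction of samples on which two base models predict differently) to show how diversity among base models can be quantified; the model names and predictions are hypothetical.

```python
from itertools import combinations

def disagreement(pred_a, pred_b):
    """Fraction of samples on which two base models give different predictions."""
    return sum(a != b for a, b in zip(pred_a, pred_b)) / len(pred_a)

def mean_pairwise_disagreement(predictions):
    """Average disagreement over all pairs of base models."""
    pairs = list(combinations(predictions, 2))
    return sum(disagreement(a, b) for a, b in pairs) / len(pairs)

# Hypothetical 0/1 fraud predictions from three base models on eight transactions.
preds = {
    "m1": [0, 0, 1, 0, 1, 0, 0, 0],
    "m2": [0, 0, 1, 0, 1, 0, 0, 0],   # identical to m1, so it adds no diversity
    "m3": [0, 1, 1, 0, 0, 0, 0, 0],
}
d12 = disagreement(preds["m1"], preds["m2"])
d13 = disagreement(preds["m1"], preds["m3"])
ensemble_diversity = mean_pairwise_disagreement(list(preds.values()))
```

A model such as `m2` that never disagrees with an existing member is a natural candidate for removal when shrinking an ensemble.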

A Study on the Classification of Fault Motors using Sound Data (소리 데이터를 이용한 불량 모터 분류에 관한 연구)

  • Il-Sik, Chang;Gooman, Park
    • Journal of Broadcast Engineering / v.27 no.6 / pp.885-896 / 2022
  • Motor failures in manufacturing have a significant effect on subsequent after-sales service and reliability. Motor failure is detected by measuring sound, current, and vibration. The data used in this paper are sounds from the gear box of a car's side mirror motor. The motor sounds fall into three classes. The sound data are converted to Mel spectrograms (MelSpectrogram) before being input to the network model. To improve fault motor classification performance, data augmentation was applied together with several methods for handling class imbalance: resampling, reweighting, changing the loss function, and two-stage learning that separates representation learning from classification. In addition, curriculum learning and self-paced learning were compared across five network models (Bidirectional LSTM Attention, Convolutional Recurrent Neural Network, Multi-Head Attention, Bidirectional Temporal Convolution Network, and Convolution Neural Network), and the optimal configuration for motor sound classification was identified.
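
Of the imbalance techniques listed, reweighting is the easiest to show in a few lines. The sketch below assumes inverse-frequency ("balanced") class weights, w_c = N / (K * n_c), and a class-weighted cross-entropy; the abstract does not state which weighting scheme the authors used, and the three-class counts are hypothetical.

```python
import math
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency weights: w_c = N / (K * n_c) for K classes and N samples."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

def weighted_cross_entropy(probs, labels, weights):
    """Mean of -w[y] * log p(y) over the samples."""
    total = sum(-weights[y] * math.log(p[y]) for p, y in zip(probs, labels))
    return total / len(labels)

# Hypothetical three-class label counts: 80 / 15 / 5.
labels = [0] * 80 + [1] * 15 + [2] * 5
weights = balanced_class_weights(labels)

# Two hypothetical predictions: one for a class-0 sample, one for a class-2 sample.
loss = weighted_cross_entropy([[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]], [0, 2], weights)
```

With these weights, a mistake on the rarest class costs sixteen times as much as the same mistake on the most common class, which is the ratio of their counts.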

Predictability of emergency water supply using machine learning-based classification techniques (딥러닝 기반 분류기법을 활용한 비상급수 예측 가능성 검토)

  • Oh, Yeoung Rok;Jun, Kyung Soo
    • Proceedings of the Korea Water Resources Association Conference / 2022.05a / pp.303-303 / 2022
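  • Climate change is making extreme weather events more frequent, and the frequency of droughts is also increasing. Accordingly, research is needed both on building a proactive drought response system that reduces drought damage and on minimizing damage after a drought has occurred. This study treated the occurrence of drought damage as a binary classification problem and examined its predictability. Drought damage was defined using emergency water supply data (restricted water supply and water supply by transport): cases in which emergency water supply was implemented were regarded as drought damage, and cases in which it was not were classified as no damage. Precipitation, temperature, relative humidity, and other variables were used as weather variables. In addition, the ratio of water storage to total annual water supply in each region was used to reflect current regional conditions. Analysis with a decision tree gave an accuracy of 0.95 or higher from the confusion matrix, which is commonly used to evaluate imbalanced-class problems, while the F1-score was about 0.5. This means that across all predictions, drought damage can be distinguished correctly 95% of the time, whereas for drought damage cases alone the accuracy is about 50%. However, this study did not consider enough of the environmental variables that trigger emergency water supply and did not analyze a variety of deep learning models. Model accuracy is therefore expected to improve if the factors that trigger emergency water supply are considered more fully and the deep learning techniques are refined.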
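
The gap between an accuracy of 0.95 and an F1-score of 0.5 is easy to reproduce. The confusion-matrix counts below are hypothetical (they are not the study's data) and were chosen only so that the two metrics match the reported values.

```python
def metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts: 1,000 cases, of which 50 are true drought-damage cases.
accuracy, precision, recall, f1 = metrics(tp=25, fp=25, fn=25, tn=925)
print(accuracy, f1)  # 0.95 0.5
```

With these counts, a model that always predicts "no damage" would also reach 0.95 accuracy (950 of 1,000 correct), which is why F1 on the rare class is the more informative figure.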

A Study on the Prediction of Uniaxial Compressive Strength Classification Using Slurry TBM Data and Random Forest (이수식 TBM 데이터와 랜덤포레스트를 이용한 일축압축강도 분류 예측에 관한 연구)

  • Tae-Ho Kang;Soon-Wook Choi;Chulho Lee;Soo-Ho Chang
    • Tunnel and Underground Space / v.33 no.6 / pp.547-560 / 2023
  • Recently, research on predicting ground classification using machine learning techniques, TBM excavation data, and ground data has been increasing. In this study, multi-class prediction of uniaxial compressive strength (UCS) was carried out by applying a random forest model, a decision-tree-based machine learning technique widely used in various fields, to machine data and ground data acquired at three slurry shield TBM sites. For the classification, the data were split into training and test sets at a ratio of 7:3, and a grid search with 5-fold cross-validation was used to select the optimal parameters. The random forest classifier for UCS achieved high accuracy, 0.983 on the training set and 0.982 on the test set. However, because of the imbalance in data distribution between classes, recall was low for class 4. Additional research is needed to increase the amount of measured UCS data acquired at various sites.
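
High overall accuracy with low recall in one class is what a multi-class confusion matrix looks like when one class is rare. The matrix below is hypothetical (the abstract does not report one) and only illustrates how per-class recall is read off.

```python
def per_class_recall(confusion):
    """confusion[i][j] = number of samples of true class i predicted as class j."""
    return [row[i] / sum(row) for i, row in enumerate(confusion)]

# Hypothetical 4-class confusion matrix; the last row (class 4) has only 10 samples.
confusion = [
    [500, 2, 0, 0],
    [3, 300, 1, 0],
    [0, 2, 150, 0],
    [0, 0, 6, 4],
]
recalls = per_class_recall(confusion)
accuracy = sum(confusion[i][i] for i in range(4)) / sum(map(sum, confusion))
macro_recall = sum(recalls) / len(recalls)
```

In this example accuracy stays above 0.98 while recall for the rare class is 0.4, so the macro-averaged recall gives a less flattering and more balanced summary than accuracy.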

Design of Fetal Health Classification Model for Hospital Operation Management (효율적인 병원보건관리를 위한 태아건강분류 모델)

  • Chun, Je-Ran
    • Journal of Digital Convergence / v.19 no.5 / pp.263-268 / 2021
  • The purpose of this study was to propose a model suited to actual delivery systems by designing a hospital operation management scheme for fetal delivery together with a fetal health classification model. The number of deaths during childbirth is similar to the number of maternal deaths, which was 295,000 as of 2017. Of these deaths, 94% are preventable in most cases. Therefore, in this paper, we propose a model that uses a random forest to predict the health condition of the fetus from data such as fetal heart rate, fetal movements, and uterine contractions extracted from Cardiotocogram (CTG) tests. The proposed model is intended to keep the fetal delivery health management system stable even when the data are unbalanced. To secure the accuracy of the system, we remove outliers by setting upper and lower thresholds based on the standard deviation. In addition, because the target class is the health status of the fetus, the minority classes were replicated by data resampling to balance the classes. This gave a 4~5% improvement, and as a result we reached an accuracy of 97.75%. The developed model is expected to contribute to preventing deaths, to effective fetal health management, and to disease prevention by accurately predicting and managing fetal deaths and diseases in advance.
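
The outlier step described above (upper and lower thresholds set from the standard deviation) can be sketched as follows. The abstract does not give the threshold width, so the multiplier `n_std` is an assumption, as are the example heart-rate values.

```python
import statistics

def remove_outliers(values, n_std=3.0):
    """Keep only values within mean ± n_std standard deviations."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    lower, upper = mean - n_std * std, mean + n_std * std
    return [v for v in values if lower <= v <= upper]

# Hypothetical fetal heart-rate readings (bpm) with one implausible value.
heart_rates = [140, 138, 142, 135, 145, 139, 141, 137, 143, 140, 300]
kept = remove_outliers(heart_rates, n_std=2.0)
```

Note that a single extreme value inflates both the mean and the standard deviation, so a narrower threshold or a robust alternative (for example one based on the median) may be needed on small samples.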