• Title/Summary/Keyword: 오버샘플링 기법

Search Result 57, Processing Time 0.024 seconds

Attention-Based Ensemble for Mitigating Side Effects of Data Imbalance Method (데이터 불균형 기법의 부작용 완화를 위한 어텐션 기반 앙상블)

  • Yo-Han Park;Yong-Seok Choi;Wencke Liermann;Kong Joo Lee
    • Annual Conference on Human and Language Technology
    • /
    • 2023.10a
    • /
    • pp.546-551
    • /
    • 2023
  • 일반적으로 딥러닝 모델은 모든 라벨에 데이터 수가 균형을 이룰 때 가장 좋은 성능을 보인다. 그러나 현실에서는 특정라벨에 대한 데이터가 부족한 경우가 많으며 이로 인해 불균형 데이터 문제가 발생한다. 이에 대한 해결책으로 오버샘플링과 가중치 손실과 같은 데이터 불균형 기법이 연구되었지만 이러한 기법들은 데이터가 적은 라벨의 성능을 개선하는 동시에 데이터가 많은 라벨의 성능을 저하시키는 부작용을 가지고 있다. 본 논문에서는 이 문제를 완화시키고자 어텐션 기반의 앙상블 기법을 제안한다. 어텐션 기반의 앙상블은 데이터 불균형 기법을 적용한 모델과 적용하지 않은 모델의 출력 값을 가중 평균하여 최종 예측을 수행한다. 이때 가중치는 어텐션 메커니즘을 통해 동적으로 조절된다. 그로므로 어텐션 기반의 앙상블 모델은 입력 데이터 특성에 따라 가중치를 조절할 수가 있다. 실험은 에세이 자동 평가 데이터를 대상으로 수행하였다. 실험 결과로는 제안한 모델이 데이터 불균형 기법의 부작용을 완화하고 성능이 개선되었다.

  • PDF

A Design of Variable Rate Clock and Data Recovery Circuit for Biomedical Silicon Bead (생체 의학 정보 수집이 가능한 실리콘 비드용 가변적인 속도 클록 데이터 복원 회로 설계)

  • Cho, Sung-Hun;Lee, Dong-Soo;Park, Hyung-Gu;Lee, Kang-Yoon
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.20 no.4
    • /
    • pp.39-45
    • /
    • 2015
  • In this paper, variable rate CDR(Clock and Data Recovery) circuit adopting blind oversampling architecture is presented. The clock recovery circuit is implemented by using wide range voltage controlled oscillator and band selection method and the data recovery circuit is designed to digital circuit used majority voting method in order to low power and small area. The designed low power variable clock and data recovery is implemented by wide range voltage controlled oscillator and digital data recovery circuit. The designed variable rate CDR is operated from 10 bps to 2 Mbps. The total power consumption is about 4.4mW at 1MHz clock. The supply voltage is 1.2V. The designed die area is $120{\mu}m{\times}75{\mu}m$ and this circuit is fabricated in $0.13{\mu}m$ CMOS process.

Analysis of Incident Impact Factors and Development of SMOGN-DNN Model for Prediction of Incident Clearance Time (돌발상황 처리시간 예측을 위한 영향요인 분석 및 SMOGN-DNN 모델 개발)

  • Yun, Gyu Ri;Bae, Sang Hoon
    • The Journal of The Korea Institute of Intelligent Transport Systems
    • /
    • v.20 no.4
    • /
    • pp.46-56
    • /
    • 2021
  • Predicting the incident clearance time is important for eliminating the high transportation costs and congestion from non-repetitive congestion caused by incidents. In this study, the factors influencing the clearance time suitable for domestic road conditions were analyzed, using a training dataset for predicting the incident clearance time using artificial neural networks. In a previous study, the under-prediction problem for high incident clearance time was used. In the present study, over-sampling training data applied using the SMOGN technique was obtained and applied to the model as a solution. As a result, the DNN model applying the SMOGN technique could compensate for the limitations of the previously developed prediction model by predicting the clearance time with the highest accuracy among the models developed in the research process with MAE = 18.3 minutes.

Arrhythmia Classification using GAN-based Over-Sampling Method and Combination Model of CNN-BLSTM (GAN 오버샘플링 기법과 CNN-BLSTM 결합 모델을 이용한 부정맥 분류)

  • Cho, Ik-Sung;Kwon, Hyeog-Soong
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.26 no.10
    • /
    • pp.1490-1499
    • /
    • 2022
  • Arrhythmia is a condition in which the heart has an irregular rhythm or abnormal heart rate, early diagnosis and management is very important because it can cause stroke, cardiac arrest, or even death. In this paper, we propose arrhythmia classification using hybrid combination model of CNN-BLSTM. For this purpose, the QRS features are detected from noise removed signal through pre-processing and a single bit segment was extracted. In this case, the GAN oversampling technique is applied to solve the data imbalance problem. It consisted of CNN layers to extract the patterns of the arrhythmia precisely, used them as the input of the BLSTM. The weights were learned through deep learning and the learning model was evaluated by the validation data. To evaluate the performance of the proposed method, classification accuracy, precision, recall, and F1-score were compared by using the MIT-BIH arrhythmia database. The achieved scores indicate 99.30%, 98.70%, 97.50%, 98.06% in terms of the accuracy, precision, recall, F1 score, respectively.

The Development of Property Prediction Model in Consideration of Biodegradable Fiber Spinning Process Data Characteristics (생분해성 섬유 방사 공정 데이터 특성을 고려한 물성 예측 모델 개발)

  • Park, SeChan;Kim, Deok Yeop;Seo, Kang Bok;Lee, Woo Jin
    • Annual Conference of KIPS
    • /
    • 2022.05a
    • /
    • pp.362-365
    • /
    • 2022
  • 최근 노동 집약적인 성격의 섬유 산업에서는 AI를 통해 공정에 들어가는 시간과 비용을 줄이고 품질을 최적화 하려는 시도를 하고 있다. 그러나 섬유 방사 공정은 데이터 수집에 필요한 비용이 크고 체계적인 데이터 처리 시스템이 부족하여 축적된 데이터양이 적다. 또 방사 목적에 따라 특정 변수 위주의 조합에 대한 데이터만을 우선적으로 수집하여 데이터 불균형이 발생하며, 물성 측정환경 차이로 인해 동일 방사조건에서 수집된 샘플 간에도 오차가 존재한다. 이러한 데이터 특성들을 고려하지 않고 AI 모델에 활용할 경우 과적합과 성능 저하 등의 문제가 발생할 수 있다. 따라서 본 논문에서는 물성 단위 및 허용오차를 고려한 이상치 처리 기법과 데이터 불균형 정도 및 물성과의 상관성을 고려한 오버샘플링 기법을 물성 예측 모델에 적용한다. 두 기법들을 모델에 적용한 결과 그렇지 않은 모델에 비해 물성 예측 오차와 방사 공정 데이터에 대한 모델의 적합도가 개선됨을 보인다.

On sampling algorithms for imbalanced binary data: performance comparison and some caveats (불균형적인 이항 자료 분석을 위한 샘플링 알고리즘들: 성능비교 및 주의점)

  • Kim, HanYong;Lee, Woojoo
    • The Korean Journal of Applied Statistics
    • /
    • v.30 no.5
    • /
    • pp.681-690
    • /
    • 2017
  • Various imbalanced binary classification problems exist such as fraud detection in banking operations, detecting spam mail and predicting defective products. Several sampling methods such as over sampling, under sampling, SMOTE have been developed to overcome the poor prediction performance of binary classifiers when the proportion of one group is dominant. In order to overcome this problem, several sampling methods such as over-sampling, under-sampling, SMOTE have been developed. In this study, we investigate prediction performance of logistic regression, Lasso, random forest, boosting and support vector machine in combination with the sampling methods for binary imbalanced data. Four real data sets are analyzed to see if there is a substantial improvement in prediction performance. We also emphasize some precautions when the sampling methods are implemented.

Conditional Generative Adversarial Network based Collaborative Filtering Recommendation System (Conditional Generative Adversarial Network(CGAN) 기반 협업 필터링 추천 시스템)

  • Kang, Soyi;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.3
    • /
    • pp.157-173
    • /
    • 2021
  • With the development of information technology, the amount of available information increases daily. However, having access to so much information makes it difficult for users to easily find the information they seek. Users want a visualized system that reduces information retrieval and learning time, saving them from personally reading and judging all available information. As a result, recommendation systems are an increasingly important technologies that are essential to the business. Collaborative filtering is used in various fields with excellent performance because recommendations are made based on similar user interests and preferences. However, limitations do exist. Sparsity occurs when user-item preference information is insufficient, and is the main limitation of collaborative filtering. The evaluation value of the user item matrix may be distorted by the data depending on the popularity of the product, or there may be new users who have not yet evaluated the value. The lack of historical data to identify consumer preferences is referred to as data sparsity, and various methods have been studied to address these problems. However, most attempts to solve the sparsity problem are not optimal because they can only be applied when additional data such as users' personal information, social networks, or characteristics of items are included. Another problem is that real-world score data are mostly biased to high scores, resulting in severe imbalances. One cause of this imbalance distribution is the purchasing bias, in which only users with high product ratings purchase products, so those with low ratings are less likely to purchase products and thus do not leave negative product reviews. Due to these characteristics, unlike most users' actual preferences, reviews by users who purchase products are more likely to be positive. Therefore, the actual rating data is over-learned in many classes with high incidence due to its biased characteristics, distorting the market. Applying collaborative filtering to these imbalanced data leads to poor recommendation performance due to excessive learning of biased classes. Traditional oversampling techniques to address this problem are likely to cause overfitting because they repeat the same data, which acts as noise in learning, reducing recommendation performance. In addition, pre-processing methods for most existing data imbalance problems are designed and used for binary classes. Binary class imbalance techniques are difficult to apply to multi-class problems because they cannot model multi-class problems, such as objects at cross-class boundaries or objects overlapping multiple classes. To solve this problem, research has been conducted to convert and apply multi-class problems to binary class problems. However, simplification of multi-class problems can cause potential classification errors when combined with the results of classifiers learned from other sub-problems, resulting in loss of important information about relationships beyond the selected items. Therefore, it is necessary to develop more effective methods to address multi-class imbalance problems. We propose a collaborative filtering model using CGAN to generate realistic virtual data to populate the empty user-item matrix. Conditional vector y identify distributions for minority classes and generate data reflecting their characteristics. Collaborative filtering then maximizes the performance of the recommendation system via hyperparameter tuning. This process should improve the accuracy of the model by addressing the sparsity problem of collaborative filtering implementations while mitigating data imbalances arising from real data. Our model has superior recommendation performance over existing oversampling techniques and existing real-world data with data sparsity. SMOTE, Borderline SMOTE, SVM-SMOTE, ADASYN, and GAN were used as comparative models and we demonstrate the highest prediction accuracy on the RMSE and MAE evaluation scales. Through this study, oversampling based on deep learning will be able to further refine the performance of recommendation systems using actual data and be used to build business recommendation systems.

Fraud Detection System Model Using Generative Adversarial Networks and Deep Learning (생성적 적대 신경망과 딥러닝을 활용한 이상거래탐지 시스템 모형)

  • Ye Won Kim;Ye Lim Yu;Hong Yong Choi
    • Information Systems Review
    • /
    • v.22 no.1
    • /
    • pp.59-72
    • /
    • 2020
  • Artificial Intelligence is establishing itself as a familiar tool from an intractable concept. In this trend, financial sector is also looking to improve the problem of existing system which includes Fraud Detection System (FDS). It is being difficult to detect sophisticated cyber financial fraud using original rule-based FDS. This is because diversification of payment environment and increasing number of electronic financial transactions has been emerged. In order to overcome present FDS, this paper suggests 3 types of artificial intelligence models, Generative Adversarial Network (GAN), Deep Neural Network (DNN), and Convolutional Neural Network (CNN). GAN proves how data imbalance problem can be developed while DNN and CNN show how abnormal financial trading patterns can be precisely detected. In conclusion, among the experiments on this paper, WGAN has the highest improvement effects on data imbalance problem. DNN model reflects more effects on fraud classification comparatively.

Fashion Category Oversampling Automation System

  • Minsun Yeu;Do Hyeok Yoo;SuJin Bak
    • Journal of the Korea Society of Computer and Information
    • /
    • v.29 no.1
    • /
    • pp.31-40
    • /
    • 2024
  • In the realm of domestic online fashion platform industry the manual registration of product information by individual business owners leads to inconvenience and reliability issues, especially when dealing with simultaneous registrations of numerous product groups. Moreover, bias is significantly heightened due to the low quality of product images and an imbalance in data quantity. Therefore, this study proposes a ResNet50 model aimed at minimizing data bias through oversampling techniques and conducting multiple classifications for 13 fashion categories. Transfer learning is employed to optimize resource utilization and reduce prolonged learning times. The results indicate improved discrimination of up to 33.4% for data augmentation in classes with insufficient data compared to the basic convolution neural network (CNN) model. The reliability of all outcomes is underscored by precision and affirmed by the recall curve. This study is suggested to advance the development of the domestic online fashion platform industry to a higher echelon.

Cross-Layer Handover Scheme Using Linear Regression Analysis in Mobile WiMAX Networks (선형 회귀 분석을 이용한 모바일 와이맥스에서 계층 통합적 핸드오버 기법)

  • Choi, Yong-Hoon;Yun, Seok-Yeul;Chung, Young-Uk;Kim, Beom-Joon;Lee, Jung-Ryun;Lee, Hyun-Joon
    • The Journal of The Korea Institute of Intelligent Transport Systems
    • /
    • v.8 no.2
    • /
    • pp.91-99
    • /
    • 2009
  • Mobile WiMAX is an emerging technology that can provide ubiquitous Internet access. To provide seamless service in mobile WiMAX environment, delay or disruption in dealing with mobility must be minimized. However offering seamless services on IEEE 802.16e networks is very hard due to long handover latency both in layer 2 and 3. In this paper, we propose a fast cross-layer handover scheme based on prediction algorithm. With the help of the prediction, layer-3 handover activities are able to occur prior to layer-2 handover, and therefore, total handover latency can be reduced. The experiments conducted with system parameters and propagation model defined by WiMAX Forum demonstrate that the proposed method predicts the future signal level accurately and reduces the total handover latency.

  • PDF