• Title/Summary/Keyword: class imbalance

A Study on the Improvement of Image Classification Performance in the Defense Field through Cost-Sensitive Learning of Imbalanced Data (불균형데이터의 비용민감학습을 통한 국방분야 이미지 분류 성능 향상에 관한 연구)

  • Jeong, Miae;Ma, Jungmok
    • Journal of the Korea Institute of Military Science and Technology / v.24 no.3 / pp.281-292 / 2021
  • With the development of deep learning technology, researchers and practitioners keep attempting to apply deep learning in various industrial and academic fields, including defense. Most of these attempts assume that the data are balanced. In reality, much data is imbalanced, so the classifier is not built properly and model performance can be low. This study therefore proposes cost-sensitive learning as a solution to the imbalanced-data problem of image classification in the defense field. In the proposed model, cost-sensitive learning assigns a higher weight to the minority class in the cost function. Across 160 experiments using a submarine/non-submarine dataset and a warship/non-warship dataset, the test F1-score is higher with cost-sensitive learning than with general learning. Furthermore, statistical tests show that the difference is significant.
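
The weighting idea in this abstract can be sketched as a class-weighted binary cross-entropy. This is only an illustration under stated assumptions: the function name `weighted_bce`, the toy labels, and the weight of 4.0 (the inverse class ratio of the toy data, a common heuristic) are hypothetical, and the abstract does not give the weights or loss the authors actually used.

```python
import math

def weighted_bce(y_true, y_prob, minority_weight=1.0, eps=1e-12):
    """Mean binary cross-entropy with an extra weight on the positive (minority) class."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        if y == 1:
            # Errors on the minority class are scaled up by minority_weight.
            total += -minority_weight * math.log(p)
        else:
            total += -math.log(1.0 - p)
    return total / len(y_true)

# Hypothetical toy data: 1 = minority class, 0 = majority class (4:1 imbalance).
y_true = [0, 0, 0, 0, 1]
y_prob = [0.1, 0.2, 0.1, 0.3, 0.4]  # the single minority sample is scored below 0.5

plain = weighted_bce(y_true, y_prob)                          # ordinary loss
weighted = weighted_bce(y_true, y_prob, minority_weight=4.0)  # cost-sensitive loss
```

With the weight applied, the misclassified minority sample contributes four times as much to the loss, so an optimizer is pushed harder to correct it.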

A Study on Optimization of Classification Performance through Fourier Transform and Image Augmentation (푸리에 변환 및 이미지 증강을 통한 분류 성능 최적화에 관한 연구)

  • Kihyun Kim;Seong-Mok Kim;Yong Soo Kim
    • Journal of Korean Society for Quality Management / v.51 no.1 / pp.119-129 / 2023
  • Purpose: This study proposes a classification model for implementing condition-based maintenance (CBM) by monitoring the real-time status of a machine using acceleration sensor data collected from a vehicle. Methods: The classification model's performance was improved by applying Fourier transform to convert the acceleration sensor data from the time domain to the frequency domain. Additionally, the Generative Adversarial Network (GAN) algorithm was used to augment images and further enhance the classification model's performance. Results: Experimental results demonstrate that the GAN algorithm can effectively serve as an image augmentation technique to enhance the performance of the classification model. Consequently, the proposed approach yielded a significant improvement in the classification model's accuracy. Conclusion: While this study focused on the effectiveness of the GAN algorithm as an image augmentation method, further research is necessary to compare its performance with other image augmentation techniques. Additionally, it is essential to consider the potential for performance degradation due to class imbalance and conduct follow-up studies to address this issue.
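
The time-domain to frequency-domain conversion mentioned in the Methods can be illustrated with a naive discrete Fourier transform. This is a minimal sketch, not the study's pipeline: the sampling rate, the synthetic two-tone signal standing in for acceleration data, and the helper name `dft_magnitudes` are all assumptions made for illustration.

```python
import cmath
import math

def dft_magnitudes(signal):
    """Magnitudes of DFT bins 0..N//2 of a real signal (naive O(N^2) transform)."""
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(signal))
        mags.append(abs(acc))
    return mags

sample_rate = 64  # Hz, hypothetical
n_samples = 64    # one second of data, so each bin is 1 Hz wide

# Synthetic stand-in for a vibration signal: a 5 Hz tone plus a weaker 12 Hz tone.
signal = [
    math.sin(2 * math.pi * 5 * t / sample_rate)
    + 0.5 * math.sin(2 * math.pi * 12 * t / sample_rate)
    for t in range(n_samples)
]

mags = dft_magnitudes(signal)
# Skip bin 0 (the DC component) when looking for the dominant frequency.
dominant_bin = max(range(1, len(mags)), key=lambda k: mags[k])
dominant_hz = dominant_bin * sample_rate / n_samples
```

In the frequency domain the two tones appear as two isolated peaks, which is the kind of structure a classifier can pick up more easily than the raw waveform.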

Two-Branch Classifier for Retinal Imaging Analysis (망막 영상 분석을 위한 두 갈래 분류기)

  • Oh, Young-tack;Park, Hyunjin
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.05a / pp.614-616 / 2021
  • The world faces difficulties in eye care, including treatment, quality of prevention, vision rehabilitation services, and a scarcity of trained eye care experts. Developing a method for classifying various ocular diseases is difficult, however, because existing public retinal image datasets do not cover the variety of diseases found in clinical practice. We propose a method for classifying ocular diseases using the Retinal Fundus Multi-disease Image Dataset (RFMiD), a dataset published in the ISBI-2021 challenge. Our goal is to develop a robust and generalizable model for screening retinal images into normal and abnormal categories. The proposed model achieves an area under the curve (AUC) score of 0.9782 on the test dataset.
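
For readers unfamiliar with the AUC metric reported here, it can be computed as the fraction of (abnormal, normal) pairs in which the abnormal sample receives the higher score. The sketch below uses hypothetical labels and scores and a hypothetical helper `roc_auc`; it does not reproduce the paper's evaluation.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical screening output: 0 = normal, 1 = abnormal.
labels = [0, 0, 1, 1, 1]
scores = [0.1, 0.6, 0.4, 0.8, 0.9]
auc = roc_auc(labels, scores)  # 5 of the 6 pairs are ranked correctly
```

Because AUC depends only on how samples are ranked, it is less sensitive to the normal/abnormal ratio than plain accuracy.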

Cost-Sensitive Learning for Cardio-Cerebrovascular Disease Risk Prediction (심혈관질환 위험 예측을 위한 비용민감 학습 모델)

  • Yu Na Lee;Kyung-Hee Lee;Wan-Sup Cho
    • The Journal of Bigdata / v.6 no.2 / pp.161-168 / 2021
  • In this study, we propose a cardiovascular disease prediction model using machine learning. First, a multidimensional analysis of various differences between the two groups is performed and the results are visualized. In particular, we propose a predictive model using cost-sensitive learning that can improve sensitivity when there is a high class imbalance between the normal and patient groups, as is typical of disease data. A predictive model is developed using CART and XGBoost, which are representative machine learning techniques, and their predictions and performance are compared on cardiovascular disease patient data. According to the study results, CART showed higher accuracy and specificity than XGBoost, with an accuracy of about 70% to 74%.
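
The distinction between accuracy, sensitivity, and specificity drawn in this abstract can be made concrete with a small confusion-matrix calculation. The data and the helper name `classification_metrics` are hypothetical; the sketch only shows why accuracy alone can mislead when patients are rare.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall on class 1), and specificity (recall on class 0)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }

# Hypothetical cohort: 8 normal subjects (0) and 2 patients (1).
y_true = [0] * 8 + [1] * 2
always_normal = [0] * 10  # a degenerate model that never predicts "patient"

result = classification_metrics(y_true, always_normal)
```

The degenerate model reaches 80% accuracy and perfect specificity while detecting no patients at all, which is the failure mode that cost-sensitive learning is meant to counter.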

A Study of Analysis on the Menu Concept of the Hotel Semi Buffet Restaurants - Focusing on the 1st class hotels in seoul - (호텔 세미뷔페 레스토랑의 메뉴 컨셉 분석 - 서울시내 특1급 호텔을 중심으로 -)

  • Min, Kye-Hong;Choi, Young-Ki
    • Journal of the Korean Society of Food Culture / v.22 no.5 / pp.597-602 / 2007
  • For the hotel industry, management difficulties are growing because of rising costs and labor costs, the imbalance between supply and demand, and stiffening competition between hotels. Hotels have therefore planned major changes to attract customers, moving away from existing forms of management in order to secure competitiveness in the food and beverage field. To that end, we investigate preferences for buffet restaurants in ten five-star hotels in Seoul. Through this analysis, we also present the menu concepts that stand out and are preferred by customers in managing semi-buffet restaurants. Because the linear and planar coordinate values of the H Hotels and I Hotels both came out positive (+) in a similarity analysis using MOS, we can predict that they are positioned on the same dimension. Furthermore, an analysis of the menu items most preferred by customers at each station of a semi-buffet restaurant suggests that antipasto, sushi, sashimi, and desserts are positioned on the same dimension. Based on these results, the menus of buffet restaurants require continuous supervision.

Analyzing Key Variables in Network Attack Classification on NSL-KDD Dataset using SHAP (SHAP 기반 NSL-KDD 네트워크 공격 분류의 주요 변수 분석)

  • Sang-duk Lee;Dae-gyu Kim;Chang Soo Kim
    • Journal of the Society of Disaster Information / v.19 no.4 / pp.924-935 / 2023
  • Purpose: The central aim of this study is to leverage machine learning techniques for the classification of Intrusion Detection System (IDS) data, with a specific focus on identifying the variables responsible for enhancing overall performance. Method: First, we classified 'R2L' (Remote to Local) and 'U2R' (User to Root) attacks in the NSL-KDD dataset, which are difficult to detect due to class imbalance, using seven machine learning models, including Logistic Regression (LR) and K-Nearest Neighbor (KNN). Next, we applied SHapley Additive exPlanation (SHAP) to the two classification models that showed high performance, Random Forest (RF) and Light Gradient-Boosting Machine (LGBM), to check the importance of the variables that affect classification in each model. Result: The 'service' variable for RF and the 'dst_host_srv_count' variable for LGBM were confirmed to be the most important variables. These variables are key factors capable of enhancing classification performance for each respective model. Conclusion: This paper identifies RF and LGBM as the best-performing models for classifying 'R2L' and 'U2R' attacks and elucidates the crucial variables associated with each selected model.

Intrusion Detection Approach using Feature Learning and Hierarchical Classification (특징학습과 계층분류를 이용한 침입탐지 방법 연구)

  • Han-Sung Lee;Yun-Hee Jeong;Se-Hoon Jung
    • The Journal of the Korea institute of electronic communication sciences / v.19 no.1 / pp.249-256 / 2024
  • Machine learning-based intrusion detection methodologies require a large amount of uniform learning data for each class to be classified, and the entire system must be retrained when an attack type to be detected or classified is added. In this paper, we use feature learning and hierarchical classification methods to solve classification problems and data imbalance problems using relatively little training data, and propose an intrusion detection methodology that makes it easy to add new attack types. The feasibility of the proposed system was verified through experiments using KDD IDS data.

Multi-scale context fusion network for melanoma segmentation

  • Zhenhua Li;Lei Zhang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.7 / pp.1888-1906 / 2024
  • Melanoma images are difficult to segment accurately because lesion edges are fuzzy, contrast with the background is low, and hair occludes the lesion. To address these problems, this paper proposes MSCNet, a model for melanoma segmentation based on the U-Net framework. First, a multi-scale pyramid fusion module is designed to reconstruct the skip connections and transmit global information to the decoder. Second, a contextual information conduction module is added to the top of the encoder. The module provides different receptive fields for the segmented target by using dilated (atrous) convolution with different dilation rates, so as to better fuse multi-scale contextual information. In addition, to suppress redundant information in the input image and pay more attention to melanoma feature information, a global channel attention mechanism is introduced into the decoder. Finally, to address lesion class imbalance, this paper uses a combined loss function. The algorithm is verified on the ISIC 2017 and ISIC 2018 public datasets. The experimental results indicate that the proposed algorithm segments melanoma more accurately than other CNN-based image segmentation algorithms.
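
The abstract does not say which terms make up the combined loss. One common choice for imbalanced segmentation, assumed here purely for illustration, is a weighted sum of binary cross-entropy and Dice loss. The function names, the mixing factor `alpha`, the smoothing constant, and the flattened toy mask are all hypothetical.

```python
import math

def bce_loss(target, pred, eps=1e-7):
    """Mean pixel-wise binary cross-entropy over a flattened mask."""
    total = 0.0
    for t, p in zip(target, pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(target)

def dice_loss(target, pred, smooth=1.0):
    """1 - Dice coefficient; driven by overlap with the lesion, not by background pixels."""
    intersection = sum(t * p for t, p in zip(target, pred))
    denominator = sum(target) + sum(pred)
    return 1.0 - (2.0 * intersection + smooth) / (denominator + smooth)

def combined_loss(target, pred, alpha=0.5):
    """Assumed combination: alpha * BCE + (1 - alpha) * Dice."""
    return alpha * bce_loss(target, pred) + (1.0 - alpha) * dice_loss(target, pred)

# Hypothetical flattened mask: 6 background pixels, 2 lesion pixels.
target = [0, 0, 0, 0, 0, 0, 1, 1]
all_background = [0.05] * 8            # ignores the lesion entirely
good = [0.05] * 6 + [0.9, 0.9]         # finds the lesion

loss_bad = combined_loss(target, all_background)
loss_good = combined_loss(target, good)
```

The Dice term penalizes the all-background prediction heavily even though most pixels are labeled correctly, which is why overlap-based terms are often paired with cross-entropy when the lesion occupies few pixels.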

A Class-C type Wideband Current-Reuse VCO With 2-Step Auto Amplitude Calibration(AAC) Loop (2 단계 자동 진폭 캘리브레이션 기법을 적용한 넓은 튜닝 범위를 갖는 클래스-C 타입 전류 재사용 전압제어발진기 설계)

  • Kim, Dongyoung;Choi, Jinwook;Lee, Dongsoo;Lee, Kang-Yoon
    • Journal of the Institute of Electronics and Information Engineers / v.51 no.11 / pp.94-100 / 2014
  • This paper presents the design of a low-power Current-Reuse Voltage Controlled Oscillator (VCO) with a wide tuning range of about 1.95 GHz to 3.15 GHz. A Class-C topology is applied to improve phase noise, and a 2-Step Auto Amplitude Calibration (AAC) loop is used to minimize the imbalance of the differential VCO output voltage, which is the main issue of a Current-Reuse VCO. The mismatch of the differential VCO output voltage is about 1.5 mV to 4.5 mV, which is within 0.6 % of the VCO output voltage. The proposed Current-Reuse VCO is designed in a $0.13\,\mu\mathrm{m}$ CMOS process. The supply voltage is 1.2 V and the current consumption is 2.6 mA at the center frequency. The phase noise is -116.267 dBc/Hz at a 1 MHz offset from a 2.3 GHz VCO frequency. The layout size is $720 \times 580\,\mu\mathrm{m}^2$.

The Performance Improvement of U-Net Model for Landcover Semantic Segmentation through Data Augmentation (데이터 확장을 통한 토지피복분류 U-Net 모델의 성능 개선)

  • Baek, Won-Kyung;Lee, Moung-Jin;Jung, Hyung-Sup
    • Korean Journal of Remote Sensing / v.38 no.6_2 / pp.1663-1676 / 2022
  • Recently, a number of deep-learning-based land cover segmentation studies have been introduced. Some studies noted that land cover segmentation performance deteriorated due to insufficient training data. In this study, we verified the improvement of land cover segmentation performance through data augmentation. U-Net was implemented as the segmentation model, and a 2020 satellite-derived land cover dataset was used as the study data. The pixel accuracies were 0.905 and 0.923 for U-Net trained on the original and augmented data, respectively. The mean F1 scores of those models were 0.720 and 0.775, respectively, indicating better performance with data augmentation. In addition, the F1 scores for the building, road, paddy field, upland field, forest, and unclassified area classes were 0.770, 0.568, 0.433, 0.455, 0.964, and 0.830 for the U-Net trained on the original data. Data augmentation proved effective in that the F1 scores of every class improved, to 0.838, 0.660, 0.791, 0.530, 0.969, and 0.860, respectively. Although we applied data augmentation without considering class balance, a comparison of the two models shows that data augmentation can mitigate the biased segmentation performance caused by data imbalance. This study is expected to help demonstrate the importance and effectiveness of data augmentation in various image processing fields.