• Title/Summary/Keyword: class imbalanced dataset

Search Result 26, Processing Time 0.022 seconds

Ensemble Knowledge Distillation for Classification of 14 Thorax Diseases using Chest X-ray Images (흉부 X-선 영상을 이용한 14 가지 흉부 질환 분류를 위한 Ensemble Knowledge Distillation)

  • Ho, Thi Kieu Khanh;Jeon, Younghoon;Gwak, Jeonghwan
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2021.07a
    • /
    • pp.313-315
    • /
    • 2021
  • Timely and accurate diagnosis of lung diseases using Chest X-ray images has been gained much attention from the computer vision and medical imaging communities. Although previous studies have presented the capability of deep convolutional neural networks by achieving competitive binary classification results, their models were seemingly unreliable to effectively distinguish multiple disease groups using a large number of x-ray images. In this paper, we aim to build an advanced approach, so-called Ensemble Knowledge Distillation (EKD), to significantly boost the classification accuracies, compared to traditional KD methods by distilling knowledge from a cumbersome teacher model into an ensemble of lightweight student models with parallel branches trained with ground truth labels. Therefore, learning features at different branches of the student models could enable the network to learn diverse patterns and improve the qualify of final predictions through an ensemble learning solution. Although we observed that experiments on the well-established ChestX-ray14 dataset showed the classification improvements of traditional KD compared to the base transfer learning approach, the EKD performance would be expected to potentially enhance classification accuracy and model generalization, especially in situations of the imbalanced dataset and the interdependency of 14 weakly annotated thorax diseases.

  • PDF

Using Machine Learning Technique for Analytical Customer Loyalty

  • Mohamed M. Abbassy
    • International Journal of Computer Science & Network Security
    • /
    • v.23 no.8
    • /
    • pp.190-198
    • /
    • 2023
  • To enhance customer satisfaction for higher profits, an e-commerce sector can establish a continuous relationship and acquire new customers. Utilize machine-learning models to analyse their customer's behavioural evidence to produce their competitive advantage to the e-commerce platform by helping to improve overall satisfaction. These models will forecast customers who will churn and churn causes. Forecasts are used to build unique business strategies and services offers. This work is intended to develop a machine-learning model that can accurately forecast retainable customers of the entire e-commerce customer data. Developing predictive models classifying different imbalanced data effectively is a major challenge in collected data and machine learning algorithms. Build a machine learning model for solving class imbalance and forecast customers. The satisfaction accuracy is used for this research as evaluation metrics. This paper aims to enable to evaluate the use of different machine learning models utilized to forecast satisfaction. For this research paper are selected three analytical methods come from various classifications of learning. Classifier Selection, the efficiency of various classifiers like Random Forest, Logistic Regression, SVM, and Gradient Boosting Algorithm. Models have been used for a dataset of 8000 records of e-commerce websites and apps. Results indicate the best accuracy in determining satisfaction class with both gradient-boosting algorithm classifications. The results showed maximum accuracy compared to other algorithms, including Gradient Boosting Algorithm, Support Vector Machine Algorithm, Random Forest Algorithm, and logistic regression Algorithm. The best model developed for this paper to forecast satisfaction customers and accuracy achieve 88 %.

Training Data Sets Construction from Large Data Set for PCB Character Recognition

  • NDAYISHIMIYE, Fabrice;Gang, Sumyung;Lee, Joon Jae
    • Journal of Multimedia Information System
    • /
    • v.6 no.4
    • /
    • pp.225-234
    • /
    • 2019
  • Deep learning has become increasingly popular in both academic and industrial areas nowadays. Various domains including pattern recognition, Computer vision have witnessed the great power of deep neural networks. However, current studies on deep learning mainly focus on quality data sets with balanced class labels, while training on bad and imbalanced data set have been providing great challenges for classification tasks. We propose in this paper a method of data analysis-based data reduction techniques for selecting good and diversity data samples from a large dataset for a deep learning model. Furthermore, data sampling techniques could be applied to decrease the large size of raw data by retrieving its useful knowledge as representatives. Therefore, instead of dealing with large size of raw data, we can use some data reduction techniques to sample data without losing important information. We group PCB characters in classes and train deep learning on the ResNet56 v2 and SENet model in order to improve the classification performance of optical character recognition (OCR) character classifier.

Data anomaly detection for structural health monitoring using a combination network of GANomaly and CNN

  • Liu, Gaoyang;Niu, Yanbo;Zhao, Weijian;Duan, Yuanfeng;Shu, Jiangpeng
    • Smart Structures and Systems
    • /
    • v.29 no.1
    • /
    • pp.53-62
    • /
    • 2022
  • The deployment of advanced structural health monitoring (SHM) systems in large-scale civil structures collects large amounts of data. Note that these data may contain multiple types of anomalies (e.g., missing, minor, outlier, etc.) caused by harsh environment, sensor faults, transfer omission and other factors. These anomalies seriously affect the evaluation of structural performance. Therefore, the effective analysis and mining of SHM data is an extremely important task. Inspired by the deep learning paradigm, this study develops a novel generative adversarial network (GAN) and convolutional neural network (CNN)-based data anomaly detection approach for SHM. The framework of the proposed approach includes three modules : (a) A three-channel input is established based on fast Fourier transform (FFT) and Gramian angular field (GAF) method; (b) A GANomaly is introduced and trained to extract features from normal samples alone for class-imbalanced problems; (c) Based on the output of GANomaly, a CNN is employed to distinguish the types of anomalies. In addition, a dataset-oriented method (i.e., multistage sampling) is adopted to obtain the optimal sampling ratios between all different samples. The proposed approach is tested with acceleration data from an SHM system of a long-span bridge. The results show that the proposed approach has a higher accuracy in detecting the multi-pattern anomalies of SHM data.

A Hybrid Oversampling Technique for Imbalanced Structured Data based on SMOTE and Adapted CycleGAN (불균형 정형 데이터를 위한 SMOTE와 변형 CycleGAN 기반 하이브리드 오버샘플링 기법)

  • Jung-Dam Noh;Byounggu Choi
    • Information Systems Review
    • /
    • v.24 no.4
    • /
    • pp.97-118
    • /
    • 2022
  • As generative adversarial network (GAN) based oversampling techniques have achieved impressive results in class imbalance of unstructured dataset such as image, many studies have begun to apply it to solving the problem of imbalance in structured dataset. However, these studies have failed to reflect the characteristics of structured data due to changing the data structure into an unstructured data format. In order to overcome the limitation, this study adapted CycleGAN to reflect the characteristics of structured data, and proposed hybridization of synthetic minority oversampling technique (SMOTE) and the adapted CycleGAN. In particular, this study tried to overcome the limitations of existing studies by using a one-dimensional convolutional neural network unlike previous studies that used two-dimensional convolutional neural network. Oversampling based on the method proposed have been experimented using various datasets and compared the performance of the method with existing oversampling methods such as SMOTE and adaptive synthetic sampling (ADASYN). The results indicated the proposed hybrid oversampling method showed superior performance compared to the existing methods when data have more dimensions or higher degree of imbalance. This study implied that the classification performance of oversampling structured data can be improved using the proposed hybrid oversampling method that considers the characteristic of structured data.

A Data Sampling Technique for Secure Dataset Using Weight VAE Oversampling(W-VAE) (가중치 VAE 오버샘플링(W-VAE)을 이용한 보안데이터셋 샘플링 기법 연구)

  • Kang, Hanbada;Lee, Jaewoo
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.26 no.12
    • /
    • pp.1872-1879
    • /
    • 2022
  • Recently, with the development of artificial intelligence technology, research to use artificial intelligence to detect hacking attacks is being actively conducted. However, the fact that security data is a representative imbalanced data is recognized as a major obstacle in composing the learning data, which is the key to the development of artificial intelligence models. Therefore, in this paper, we propose a W-VAE oversampling technique that applies VAE, a deep learning generation model, to data extraction for oversampling, and sets the number of oversampling for each class through weight calculation using K-NN for sampling. In this paper, a total of five oversampling techniques such as ROS, SMOTE, and ADASYN were applied through NSL-KDD, an open network security dataset. The oversampling method proposed in this paper proved to be the most effective sampling method compared to the existing oversampling method through the F1-Score evaluation index.