• Title/Summary/Keyword: SMOTE (Synthetic Minority Oversampling TEchnique)

Experimental Analysis of Bankruptcy Prediction with SHAP framework on Polish Companies

  • Tuguldur Enkhtuya;Dae-Ki Kang
    • International journal of advanced smart convergence / v.12 no.1 / pp.53-58 / 2023
  • With artificial intelligence developing rapidly, users increasingly demand explanations of algorithmic results and want to know which parameters influence them. In this paper, we propose a model for bankruptcy prediction with interpretability using the SHAP framework. SHAP (SHapley Additive exPlanations) is a framework that produces visualized results that can be used to explain and interpret machine learning models. With it, we can describe which features are important for the output of our deep learning model. The SHAP force plot shows the top features that mainly drive the overall model score. Even though Fully Connected Neural Networks (FCNNs) are "black box" models, Shapley values help alleviate the "black box" problem. FCNNs perform well on a complex dataset with more than 60 financial ratios. Combined with the SHAP framework, we create an effective model with an understandable interpretation. Because bankruptcy is a rare event, we address the imbalanced dataset problem with SMOTE. SMOTE is an oversampling technique that generates synthetic samples for the minority class. It uses the K-nearest neighbors algorithm and produces new examples along the line segments connecting a minority sample to its neighbors. We expect our model's results to assist financial analysts who are interested in forecasting company bankruptcy in detail.
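
The interpolation step described in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' code: the function name `smote_sketch`, its parameters, and the toy data are assumptions made for illustration, and a production pipeline would more likely rely on an established library implementation.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by linear interpolation.

    Hypothetical helper for illustration; X_min holds only minority-class rows
    and must contain at least two samples.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)  # cannot have more neighbors than other samples

    # Pairwise Euclidean distances; a sample is never its own neighbor.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbors = np.argsort(dist, axis=1)[:, :k]  # k nearest neighbors per row

    # Pick a random base sample and one of its k neighbors for each new point.
    base = rng.integers(0, n, size=n_new)
    chosen = neighbors[base, rng.integers(0, k, size=n_new)]

    # Place the new point at a random position on the connecting line segment.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[chosen] - X_min[base])

# Toy minority class: the four corners of the unit square.
X_minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_sketch(X_minority, n_new=10, k=2)
```

Because every synthetic point lies on a segment between two existing minority samples, the generated data stays inside the region already occupied by the minority class.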

Failure Prognostics of Start Motor Based on Machine Learning (머신러닝을 이용한 스타트 모터의 고장예지)

  • Ko, Do-Hyun;Choi, Wook-Hyun;Choi, Seong-Dae;Hur, Jang-Wook
    • Journal of the Korean Society of Manufacturing Process Engineers / v.20 no.12 / pp.85-91 / 2021
  • In daily life, artificial intelligence performs both simple and complicated tasks as humans do, including operating mobile phones and working in homes and workplaces. In industrial technology, artificial intelligence is used to diagnose various types of equipment through machine learning. This study presents a failure mode and effects analysis (FMEA) of start motors using machine learning and big data. Across the collected data, we observed that the primary failure of the start motor was caused by melting of the magnetic switch inside it. Long short-term memory (LSTM) was used to diagnose the condition of the magnetic locations, and synthetic data were generated using the synthetic minority oversampling technique (SMOTE). This technique has the advantage of increasing data accuracy. LSTM can also predict a start motor failure.

Influence of Social Capital on Depression of Older Adults Living in Rural Area: A Cross-Sectional Study Using the 2019 Korea Community Health Survey (사회자본이 농촌 거주 노인의 우울 상태에 미치는 영향: 2019년도 지역사회건강조사를 이용한 단면연구)

  • Jung, Minho;Kim, Jinhyun
    • Journal of Korean Academy of Nursing / v.52 no.2 / pp.144-156 / 2022
  • Purpose: This study aimed to investigate the influence of social capital on depression in older adults living in rural areas. Methods: Data sets were obtained from the 2019 Korea Community Health Survey. The participants were 39,390 older adults over 65 years old living in rural areas. Indicators of social capital included trust, reciprocity, network, and social participation. Depression, the dependent variable, was measured using the Patient Health Questionnaire-9 (PHQ-9). Hierarchical ordinal logistic regression was conducted to identify factors associated with depression after the number of data records was adjusted to 102,601 by applying the Synthetic Minority Oversampling Technique (SMOTE). Results: The independent variables, the indicators of social capital, exhibited a significant association with depression in older adults. The odds ratios of depression were higher in groups without social capital variables. Conclusion: To reduce depression, we recommend increasing social capital. Factors identified in this study need to be considered in older adult depression intervention programs and policies.

LSTM-based fraud detection system framework using real-time data resampling techniques (실시간 리샘플링 기법을 활용한 LSTM 기반의 사기 거래 탐지 시스템)

  • Seo-Yi Kim;Yeon-Ji Lee;Il-Gu Lee
    • Proceedings of the Korea Information Processing Society Conference / 2024.05a / pp.505-508 / 2024
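  • The digital transformation of the financial industry offers convenience to users but has introduced security vulnerabilities that did not previously exist. To address this problem, fraud detection systems that apply machine learning are being actively researched. However, the data imbalance that arises during model training makes training slow and degrades detection performance. In this paper, we propose a new Fraud Detection System (FDS) that uses real-time data oversampling to resolve the data imbalance problem in detecting anomalous transactions and to improve model training time. Compared with a conventional LSTM-based FDS model, the proposed FDS framework, which is based on the LSTM (Long Short-Term Memory) algorithm with SMOTE (Synthetic Minority Oversampling Technique) applied, reduced the data size by 96.5% and improved precision, recall, and F1-score by 34.81%, 11.14%, and 22.51%, respectively.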
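
For readers unfamiliar with the three metrics reported in this abstract, the sketch below shows how precision, recall, and F1-score are conventionally computed from confusion-matrix counts. The function name and the counts are hypothetical and are not taken from the paper.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # share of flagged cases that are truly fraud
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # share of fraud cases that were flagged
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0  # harmonic mean of the two
    return precision, recall, f1

# Hypothetical counts: 8 frauds caught, 2 false alarms, 8 frauds missed.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
```

On imbalanced fraud data these metrics are more informative than plain accuracy, since a model that never flags fraud can still score high accuracy while its recall is zero.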

Development of an Anomaly Detection Algorithm for Verification of Radionuclide Analysis Based on Artificial Intelligence in Radioactive Wastes (방사성폐기물 핵종분석 검증용 이상 탐지를 위한 인공지능 기반 알고리즘 개발)

  • Seungsoo Jang;Jang Hee Lee;Young-su Kim;Jiseok Kim;Jeen-hyeng Kwon;Song Hyun Kim
    • Journal of Radiation Industry / v.17 no.1 / pp.19-32 / 2023
  • The amount of radioactive waste is expected to increase dramatically with the decommissioning of nuclear power plants such as Kori-1, the first nuclear power plant in South Korea. Accurate nuclide analysis is necessary to manage radioactive waste safely, but research on the verification of radionuclide analysis has yet to be well established. This study aimed to develop a technology that can verify the results of radionuclide analysis based on artificial intelligence. We propose an anomaly detection algorithm for inspecting errors in radionuclide analysis. We used the data from 'Updated Scaling Factors in Low-Level Radwaste' (NP-5077) published by EPRI (Electric Power Research Institute), and resampling was performed using the SMOTE (Synthetic Minority Oversampling Technique) algorithm to augment the data. A total of 149,676 data points augmented with the SMOTE algorithm were used to train the artificial neural networks (classification and anomaly detection networks), and 324 data points from the NP-5077 report were used to verify the performance of the networks. The anomaly detection algorithm for radionuclide analysis was divided into two modules: one detects cases where radioactive waste was incorrectly classified, and the other identifies abnormal data such as missing or incorrectly written data. The classification network was constructed using fully connected layers, and the anomaly detection network was composed of an encoder and a decoder. The latter was operated by loading the latent vector from the end layer of the classification network. This study conducted exploratory data analysis (i.e., statistics, histogram, correlation, covariance, PCA, k-means clustering, DBSCAN). The analysis showed that it is difficult to distinguish the type of radioactive waste because the data distributions overlap. In spite of these complexities, our deep learning based algorithm can distinguish abnormal data from normal data. Radionuclide analysis was verified using our anomaly detection algorithm, and meaningful results were obtained.