• Title/Summary/Keyword: SHapley Additive exPlanations (SHAP) algorithm


Investigation of characteristic values in TDR waveform using SHapley Additive exPlanations (SHAP) for dielectric constant estimation during curing time

  • Won-Taek Hong;WooJin Han;Yong-Hoon Byun;Hyung-Koo Yoon
    • Smart Structures and Systems / v.34 no.1 / pp.25-32 / 2024
  • As materials cure, the internal electrical flow changes, leading to variations in the dielectric constant over time. This study aims to assess the impact of voltage values extracted from time domain reflectometry (TDR) waveforms, measured during the curing of materials, on predicting the dielectric constant. The experiments are conducted over a curing period ranging from 60 to 8640 minutes, with 30 TDR trials. From the measured waveforms, values of V0, V1, V2, Vf, and Δt are deduced. Additionally, curing time is included as an input variable. Groups A and B are distinguished based on the presence or absence of Δt, indicating a physical relationship between Δt and the dielectric constant. The dielectric constant is set as the output variable. The SHapley Additive exPlanations (SHAP) algorithm is applied to the compiled data. The results indicate that Δt and V1 are the most influential input variables in both Group-A and Group-B. The study also presents the distribution of SHAP values and SHAP interaction values to infer the interrelationships among the input variables. To validate the reliability of these findings, the partial dependence (PD) algorithm is applied to estimate the marginal effects of each input variable, with outcomes closely aligning with those of the SHAP algorithm. This research suggests that understanding the contributions and proportional relationships of each input variable can aid in interpreting the relationships among various material properties.
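
The SHAP attributions mentioned in the abstract above are built on Shapley values. As a minimal sketch of the underlying idea (not the authors' code), the following computes exact Shapley values for a small function by enumerating every feature coalition. The toy model, the input, and the all-zero baseline are hypothetical, and replacing absent features with baseline values is an assumption made here for simplicity.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model at x, relative to a baseline input.

    Assumption: a feature that is absent from a coalition takes its
    baseline value. Cost grows exponentially with the number of features.
    """
    n = len(x)

    def value(coalition):
        # Coalition features come from x, all others from the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy model with one additive term and one interaction term.
def toy_model(z):
    return 2.0 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The attributions sum to the difference between the model output at `x` and at the baseline, which is the "additive" property in the method's name. The interaction term is split equally between the two features involved.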

Experimental Analysis of Bankruptcy Prediction with SHAP framework on Polish Companies

  • Tuguldur Enkhtuya;Dae-Ki Kang
    • International journal of advanced smart convergence / v.12 no.1 / pp.53-58 / 2023
  • With the rapid development of artificial intelligence, users increasingly demand explanations of algorithm results and want to know which parameters influence them. In this paper, we propose a model for bankruptcy prediction with interpretability using the SHAP framework. SHAP (SHapley Additive exPlanations) is a framework that gives a visualized result that can be used for explanation and interpretation of machine learning models. As a result, we can describe which features are important for the result of our deep learning model. The force plot of the SHAP framework shows the top features that mainly drive the overall model score. Even though Fully Connected Neural Networks (FCNNs) are a "black box" model, Shapley values help us to alleviate the "black box" problem. FCNNs perform well on a complex dataset with more than 60 financial ratios. Combined with the SHAP framework, we create an effective model with understandable interpretation. Because bankruptcy is a rare event, we address the imbalanced dataset problem with the help of SMOTE. SMOTE is an oversampling technique that generates synthetic samples for the minority class. It uses the K-nearest neighbors algorithm and interpolates along the line connecting neighboring samples to produce new examples. We expect our model results to assist financial analysts who are interested in forecasting the bankruptcy of companies in detail.
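
The SMOTE step described in the abstract above can be illustrated with a short sketch. This is a simplified, assumption-laden version of the idea (pick a minority sample, pick one of its k nearest minority neighbors, and interpolate between them), not the authors' pipeline. The function name `smote`, its parameters, and the sample points are hypothetical.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples by neighbor interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of base by squared Euclidean
        # distance, excluding base itself.
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbor)])
    return synthetic


# Hypothetical minority-class samples with two features each.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_new=6, k=2)
print(new_points)
```

Each synthetic point lies on the line segment between two existing minority samples, so the new data stays inside the region already occupied by the minority class.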

Hybrid machine learning with moth-flame optimization methods for strength prediction of CFDST columns under compression

  • Quang-Viet Vu;Dai-Nhan Le;Thai-Hoan Pham;Wei Gao;Sawekchai Tangaramvong
    • Steel and Composite Structures / v.51 no.6 / pp.679-695 / 2024
  • This paper presents a novel technique that combines machine learning (ML) with moth-flame optimization (MFO) methods to predict the axial compressive strength (ACS) of concrete-filled double-skin steel tube (CFDST) columns. The proposed model is trained and tested with a dataset containing 125 tests of CFDST columns subjected to compressive loading. Five ML models, including extreme gradient boosting (XGBoost), gradient tree boosting (GBT), categorical gradient boosting (CAT), support vector machines (SVM), and decision tree (DT) algorithms, are utilized in this work. The MFO algorithm is applied to find optimal hyperparameters of these ML models and to determine the most effective model in predicting the ACS of CFDST columns. Predictive results given by several performance metrics reveal that the MFO-CAT model provides superior accuracy compared to the other considered models. The accuracy of the MFO-CAT model is validated by comparing its predictive results with existing design codes and formulae. Moreover, the significance and contribution of each feature in the dataset are examined by employing the SHapley Additive exPlanations (SHAP) method. A comprehensive uncertainty quantification on probabilistic characteristics of the ACS of CFDST columns is conducted for the first time to examine the models' responses to variations of input variables in stochastic environments. Finally, a web-based application is developed to predict the ACS of CFDST columns, enabling rapid practical utilization without requiring any programming or machine learning expertise.

Defect Prediction and Variable Impact Analysis in CNC Machining Process (CNC 가공 공정 불량 예측 및 변수 영향력 분석)

  • Hong, Ji Soo;Jung, Young Jin;Kang, Sung Woo
    • Journal of Korean Society for Quality Management / v.52 no.2 / pp.185-199 / 2024
  • Purpose: The improvement of yield and quality in product manufacturing is crucial from the perspective of process management. Controlling key variables within the process is essential for enhancing the quality of the produced items. In this study, we aim to identify key variables influencing product defects and facilitate quality enhancement in the CNC machining process using SHAP (SHapley Additive exPlanations). Methods: First, we conduct model training using boosting algorithm-based models such as AdaBoost, GBM, XGBoost, LightGBM, and CatBoost. The CNC machining process data is divided into training data and test data at a ratio of 9:1 for model training and test experiments. Subsequently, we select a model with excellent Accuracy and F1-score performance and apply SHAP to extract variables influencing defects in the CNC machining process. Results: By comparing the performances of different models, the selected CatBoost model demonstrated an Accuracy of 97% and an F1-score of 95%. Using Shapley values, we extract key variables that positively or negatively impact the dependent variable (good/defective product). We also identify variables with relatively low importance, suggesting which variables should be prioritized for management. Conclusion: The extraction of key variables using SHAP provides explanatory power distinct from traditional machine learning techniques. This study holds significance in identifying key variables that should be prioritized for management in the CNC machining process. It is expected to contribute to enhancing the production quality of the CNC machining process.
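
The evaluation setup described in the abstract above (a 9:1 train/test split, then Accuracy and F1-score) can be sketched in a few lines. This is an illustrative sketch only. The helper names, the seed, and the label vectors are hypothetical and are not taken from the paper.

```python
import random


def split_9_to_1(rows, seed=0):
    """Shuffle rows and split them into 90% training and 10% test data."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = len(shuffled) * 9 // 10
    return shuffled[:cut], shuffled[cut:]


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Accuracy and F1-score for binary labels; `positive` marks a defect."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


train, test = split_9_to_1(list(range(100)))

# Hypothetical labels: 1 = defective, 0 = good.
y_true = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 1, 1, 0, 0]
accuracy, f1 = accuracy_and_f1(y_true, y_pred)
print(len(train), len(test), accuracy, f1)
```

Reporting F1 alongside Accuracy matters when defective items are rare, because a model that always predicts "good" can score high Accuracy while detecting no defects.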

A Study on Efficient AI Model Drift Detection Methods for MLOps (MLOps를 위한 효율적인 AI 모델 드리프트 탐지방안 연구)

  • Ye-eun Lee;Tae-jin Lee
    • Journal of Internet Computing and Services / v.24 no.5 / pp.17-27 / 2023
  • Today, as AI (Artificial Intelligence) technology develops and its practicality increases, it is widely used in various real-life application fields. An AI model is typically trained on the statistical properties of its training data and then deployed to a system, but unexpected changes in rapidly changing data degrade the model's performance. In particular, as it becomes important to find drift signals of deployed models in order to respond to the new and unknown attacks that are constantly created in the security field, the need for lifecycle management of the entire model is gradually emerging. In general, drift can be detected through changes in the model's accuracy and error rate (loss), but this approach is limited in practice because it requires actual labels for the model's predictions, and the point at which actual drift occurs remains uncertain. This is because the model's error rate is greatly influenced by various external environmental factors, model selection and parameter settings, and new input data, so there are limits to precisely determining when actual drift in the data occurs based on that value alone. Therefore, this paper proposes a method to detect when actual drift occurs through an anomaly analysis technique based on XAI (eXplainable Artificial Intelligence). In tests on a classification model that detects DGA (Domain Generation Algorithm), anomaly scores were extracted from the SHAP (SHapley Additive exPlanations) values of post-deployment data, and it was confirmed that efficient drift point detection was possible.

Machine learning-based probabilistic predictions of shear resistance of welded studs in deck slab ribs transverse to beams

  • Vitaliy V. Degtyarev;Stephen J. Hicks
    • Steel and Composite Structures / v.49 no.1 / pp.109-123 / 2023
  • Headed studs welded to steel beams and embedded within the concrete of deck slabs are vital components of modern composite floor systems, where safety and economy depend on the accurate predictions of the stud shear resistance. The multitude of existing deck profiles and the complex behavior of studs in deck slab ribs make developing accurate and reliable mechanical or empirical design models challenging. The paper addresses this issue by presenting a machine learning (ML) model developed from the natural gradient boosting (NGBoost) algorithm capable of producing probabilistic predictions and a database of 464 push-out tests, which is considerably larger than the databases used for developing existing design models. The proposed model outperforms models based on other ML algorithms and existing descriptive equations, including those in EC4 and AISC 360, while offering probabilistic predictions unavailable from other models and producing higher shear resistances for many cases. The present study also showed that the stud shear resistance is insensitive to the concrete elastic modulus, stud welding type, location of slab reinforcement, and other parameters considered important by existing models. The NGBoost model was interpreted by evaluating the feature importance and dependence determined with the SHapley Additive exPlanations (SHAP) method. The model was calibrated via reliability analyses in accordance with the Eurocodes to ensure that its predictions meet the required reliability level and facilitate its use in design. An interactive open-source web application was created and deployed to the cloud to allow for convenient and rapid stud shear resistance predictions with the developed model.

Data-centric XAI-driven Data Imputation of Molecular Structure and QSAR Model for Toxicity Prediction of 3D Printing Chemicals (3D 프린팅 소재 화학물질의 독성 예측을 위한 Data-centric XAI 기반 분자 구조 Data Imputation과 QSAR 모델 개발)

  • ChanHyeok Jeong;SangYoun Kim;SungKu Heo;Shahzeb Tariq;MinHyeok Shin;ChangKyoo Yoo
    • Korean Chemical Engineering Research / v.61 no.4 / pp.523-541 / 2023
  • As accessibility to 3D printers increases, exposure to chemicals associated with 3D printing is becoming more frequent. However, research on the toxicity and harmfulness of chemicals generated by 3D printing is insufficient, and the performance of toxicity prediction using in silico techniques is limited due to missing molecular structure data. In this study, a quantitative structure-activity relationship (QSAR) model based on a data-centric AI approach was developed to predict the toxicity of new 3D printing materials by imputing missing values in molecular descriptors. First, the MissForest algorithm was utilized to impute missing values in the molecular descriptors of hazardous 3D printing materials. Then, based on four different machine learning models (decision tree, random forest, XGBoost, SVM), a machine learning (ML)-based QSAR model was developed to predict the bioconcentration factor (Log BCF), octanol-air partition coefficient (Log Koa), and partition coefficient (Log P). Furthermore, the reliability of the data-centric QSAR model was validated through the Tree-SHAP (SHapley Additive exPlanations) method, which is one of the explainable artificial intelligence (XAI) techniques. The proposed imputation method based on MissForest yielded approximately 2.5 times more molecular structure data than the existing data. Based on the imputed dataset of molecular descriptors, the developed data-centric QSAR model achieved approximately 73%, 76% and 92% prediction performance for Log BCF, Log Koa, and Log P, respectively. Lastly, Tree-SHAP analysis demonstrated that the data-centric QSAR model achieved high prediction performance for toxicity information by identifying key molecular descriptors highly correlated with toxicity indices. Therefore, the proposed QSAR model based on the data-centric XAI approach can be extended to predict the toxicity of potential pollutants in emerging printing chemicals, chemical processes, and semiconductor or display processes.

Retrieval of Hourly Aerosol Optical Depth Using Top-of-Atmosphere Reflectance from GOCI-II and Machine Learning over South Korea (GOCI-II 대기상한 반사도와 기계학습을 이용한 남한 지역 시간별 에어로졸 광학 두께 산출)

  • Seyoung Yang;Hyunyoung Choi;Jungho Im
    • Korean Journal of Remote Sensing / v.39 no.5_3 / pp.933-948 / 2023
  • Atmospheric aerosols not only have adverse effects on human health but also exert direct and indirect impacts on the climate system. Consequently, it is imperative to comprehend the characteristics and spatiotemporal distribution of aerosols. Numerous research endeavors have been undertaken to monitor aerosols, predominantly through the retrieval of aerosol optical depth (AOD) via satellite-based observations. Nonetheless, this approach primarily relies on a look-up table-based inversion algorithm, characterized by computationally intensive operations and associated uncertainties. In this study, a novel high-resolution AOD direct retrieval algorithm, leveraging machine learning, was developed using top-of-atmosphere reflectance data derived from the Geostationary Ocean Color Imager-II (GOCI-II), in conjunction with their differences from the past 30-day minimum reflectance, and meteorological variables from numerical models. The Light Gradient Boosting Machine (LGBM) technique was harnessed, and the resultant estimates underwent rigorous validation encompassing random, temporal, and spatial N-fold cross-validation (CV) using ground-based observation data from Aerosol Robotic Network (AERONET) AOD. The three CV results consistently demonstrated robust performance, yielding R² = 0.70-0.80, RMSE = 0.08-0.09, and within the expected error (EE) of 75.2-85.1%. The SHapley Additive exPlanations (SHAP) analysis confirmed the substantial influence of reflectance-related variables on AOD estimation. A comprehensive examination of the spatiotemporal distribution of AOD in Seoul and Ulsan revealed that the developed LGBM model yielded results that are in close concordance with AERONET AOD over time, thereby confirming its suitability for AOD retrieval at high spatiotemporal resolution (i.e., hourly, 250 m). Furthermore, upon comparing data coverage, it was ascertained that the LGBM model enhanced data retrieval frequency by approximately 8.8% in comparison to the GOCI-II L2 AOD products, ameliorating issues associated with excessive masking over very illuminated surfaces that are often encountered in physics-based AOD retrieval processes.
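
The abstract above mentions temporal and spatial N-fold cross-validation without giving the exact scheme. One common way to set this up, shown here purely as an assumption, is to hold out whole groups (for example, every sample from one ground station, or every sample from one date) so that no group appears in both training and test data. The function name `grouped_folds` and the station labels are hypothetical.

```python
def grouped_folds(groups, n_folds):
    """Yield (train_idx, test_idx) pairs with no group split across the two."""
    unique = sorted(set(groups))
    # Assign each distinct group to one fold, round-robin.
    fold_of = {g: i % n_folds for i, g in enumerate(unique)}
    for fold in range(n_folds):
        test = [i for i, g in enumerate(groups) if fold_of[g] == fold]
        train = [i for i, g in enumerate(groups) if fold_of[g] != fold]
        yield train, test


# Hypothetical station label for each sample (spatial CV); using
# observation dates instead would give a temporal CV.
stations = ["A", "A", "B", "B", "C", "C", "D", "D", "E", "E"]
folds = list(grouped_folds(stations, n_folds=5))
for train_idx, test_idx in folds:
    print(test_idx)
```

Compared with a purely random split, grouping by station or date gives a stricter test of how well a model transfers to unseen locations or times.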