• Title/Summary/Keyword: Defect prediction models

Ambiguity Analysis of Defectiveness in NASA MDP Data Sets (NASA MDP 데이터 집합의 결함도 모호성 분석)

  • Hong, Euyseok
    • Journal of Information Technology Services, v.12 no.2, pp.361-371, 2013
  • Public-domain defect data sets, such as the NASA data sets available from the NASA MDP and PROMISE repositories, make it possible to compare the results of different defect prediction models on the same data. This means that repeatable and general prediction models can be built. However, some recent studies have raised questions about the quality of the two versions of the NASA data sets and have produced new, cleaned data sets by applying their own data cleaning processes. We find that the NASA MDP versions offer two ways to determine the defectiveness of a module (0 or 1), and that the two results differ in some cases. To our knowledge, this serious problem has not been addressed in previous studies. To handle this ambiguity, we define two kinds of module defectiveness and two conditions that can be used to identify the ambiguous cases. We analyze 5 of the 13 NASA projects in detail using our ambiguity analysis method. The results show that JM1 and PC4 are the best projects, with few ambiguous cases.

Effect of Boundary Conditions of Failure Pressure Models on Reliability Estimation of Buried Pipelines

  • Lee, Ouk-Sub;Pyun, Jang-Sik;Kim, Dong-Hyeok
    • International Journal of Precision Engineering and Manufacturing, v.4 no.6, pp.12-19, 2003
  • This paper presents the effect of boundary conditions in various published failure pressure models used to estimate failure pressure. The approach is then extended to failure prediction with the aid of a failure probability model. The first-order Taylor series expansion of the limit state function is used to estimate the probability of failure associated with each corrosion defect in buried pipelines over long exposure periods measured in years. A failure probability model based on the von Mises failure criterion is adapted, and log-normal and standard normal probability functions are used for the random variables. The effects of random variables such as defect depth, pipe diameter, defect length, fluid pressure, corrosion rate, material yield stress, material ultimate tensile strength, and pipe thickness on the failure probability of corroded buried pipelines are systematically investigated using the adapted failure probability model with varying failure pressure models.
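
As a rough illustration of the first-order approach this abstract describes, the sketch below linearizes a limit state function around the mean values of its inputs and converts the resulting reliability index into a failure probability. The failure pressure formula, the variable names, and all numbers are assumptions made for illustration; they are not the paper's models or data, and the inputs are treated as independent normal variables for simplicity.

```python
from math import sqrt
from statistics import NormalDist


def limit_state(x):
    """g(x) = failure pressure - operating pressure; g < 0 means failure.

    The failure pressure here is a simplified, hypothetical thin-wall
    expression reduced by the relative defect depth. It is NOT one of the
    failure pressure models compared in the paper.
    """
    p_fail = (2.0 * x["sigma_y"] * x["t"] / x["D"]) * (1.0 - x["d"] / x["t"])
    return p_fail - x["p_op"]


def first_order_failure_probability(g, means, stds, rel_step=1e-6):
    """First-order (mean-value) estimate of P(g < 0).

    g is expanded in a first-order Taylor series about the means, so
    mean(g) ~ g(means) and var(g) ~ sum((dg/dx_i * sigma_i)^2),
    assuming independent inputs.
    """
    mu_g = g(means)
    var_g = 0.0
    for name, sigma in stds.items():
        h = rel_step * max(abs(means[name]), 1.0)
        up = dict(means)
        up[name] += h
        down = dict(means)
        down[name] -= h
        grad = (g(up) - g(down)) / (2.0 * h)  # central difference
        var_g += (grad * sigma) ** 2
    beta = mu_g / sqrt(var_g)  # reliability index
    return beta, NormalDist().cdf(-beta)


# Hypothetical inputs: lengths in mm, stresses and pressures in MPa.
means = {"d": 3.0, "t": 10.0, "D": 600.0, "sigma_y": 400.0, "p_op": 6.0}
stds = {"d": 1.0, "t": 0.5, "D": 3.0, "sigma_y": 30.0, "p_op": 0.6}

beta, p_failure = first_order_failure_probability(limit_state, means, stds)
print(f"reliability index = {beta:.3f}, failure probability = {p_failure:.4f}")
```

Swapping in a different `limit_state` function is enough to compare how the choice of failure pressure model shifts the estimated failure probability, which is the kind of comparison the abstract reports.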

Analyzing Machine Learning Techniques for Fault Prediction Using Web Applications

  • Malhotra, Ruchika;Sharma, Anjali
    • Journal of Information Processing Systems, v.14 no.3, pp.751-770, 2018
  • Web applications are indispensable in the software industry and continuously evolve, either to meet newer criteria or to include new functionality. However, despite quality assurance through testing, the presence of defects hinders straightforward development. Several factors contribute to defects, and they are often minimized only at high expense in terms of man-hours. Thus, detecting fault proneness in the early phases of software development is important, and a fault prediction model for identifying fault-prone classes in a web application is highly desirable. In this work, we compare 14 machine learning techniques to analyse the relationship between object-oriented metrics and fault prediction in web applications. The study is carried out using various releases of the Apache Click and Apache Rave datasets. En route to the predictive analysis, the input basis set for each release is first optimized using the filter-based correlation feature selection (CFS) method. It is found that the LCOM3, WMC, NPM and DAM metrics are the most significant predictors. The statistical analysis of these metrics also shows good conformity with the CFS evaluation and affirms the role of these metrics in the defect prediction of web applications. The overall predictive ability of the different fault prediction models is first ranked using the Friedman technique and then statistically compared using Nemenyi post-hoc analysis. The results not only uphold the predictive capability of machine learning models for faulty classes in web applications, but also show that ensemble algorithms are the most appropriate for defect prediction in the Apache datasets. Further, we also derive a consensus between the metrics selected by the CFS technique and the statistical analysis of the datasets.
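
The Friedman ranking and Nemenyi comparison mentioned in this abstract can be sketched in a few lines. The example below is a minimal, standard-library illustration, not the authors' procedure: the model names and scores are invented, the Friedman statistic omits the tie correction, and the critical value `Q_ALPHA_K3` (2.343 for three models at a significance level of 0.05) is taken from commonly published Nemenyi tables and should be checked against such a table before reuse.

```python
from math import sqrt


def average_ranks(scores):
    """Mean rank of each model over all datasets (rank 1 = best score).

    scores: one row per dataset, one column per model, higher is better.
    Tied scores within a row share the average of the ranks they span.
    """
    k = len(scores[0])
    totals = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: -row[j])
        ranks = [0.0] * k
        i = 0
        while i < k:
            j = i
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            shared = (i + j) / 2.0 + 1.0
            for m in range(i, j + 1):
                ranks[order[m]] = shared
            i = j + 1
        totals = [a + b for a, b in zip(totals, ranks)]
    return [t / len(scores) for t in totals]


def friedman_statistic(mean_ranks, n_datasets):
    """Friedman chi-square statistic (no tie correction)."""
    k = len(mean_ranks)
    spread = sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4.0
    return 12.0 * n_datasets / (k * (k + 1)) * spread


def nemenyi_critical_difference(k, n_datasets, q_alpha):
    """Two models differ significantly if their mean ranks differ by more than this."""
    return q_alpha * sqrt(k * (k + 1) / (6.0 * n_datasets))


# Hypothetical AUC values: 6 releases (rows) x 3 models (columns).
models = ["ensemble", "tree", "bayes"]
auc = [
    [0.80, 0.75, 0.70],
    [0.82, 0.78, 0.71],
    [0.79, 0.80, 0.69],
    [0.85, 0.81, 0.75],
    [0.83, 0.79, 0.72],
    [0.81, 0.77, 0.73],
]
Q_ALPHA_K3 = 2.343  # assumed table value for k = 3, alpha = 0.05

mean_ranks = average_ranks(auc)
chi_square = friedman_statistic(mean_ranks, len(auc))
cd = nemenyi_critical_difference(len(models), len(auc), Q_ALPHA_K3)

print("mean ranks:", dict(zip(models, mean_ranks)))
print("Friedman chi-square:", chi_square)
print("Nemenyi critical difference:", cd)
```

The Friedman statistic answers whether the models differ at all; the critical difference is then used pairwise, so a pair of models is reported as significantly different only when the gap between their mean ranks exceeds it.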

Prediction Model of CNC Processing Defects Using Machine Learning (머신러닝을 이용한 CNC 가공 불량 발생 예측 모델)

  • Han, Yong Hee
    • Journal of the Korea Convergence Society, v.13 no.2, pp.249-255, 2022
  • This study proposes an analysis framework for real-time prediction of CNC processing defects using machine learning-based models, which have recently attracted attention as processing defect prediction methods, and applies it to CNC machines. The analysis shows that the XGBoost, CatBoost, and LightGBM models share the best accuracy, precision, recall, F1 score, and AUC, and that among them the LightGBM model has the shortest execution time. This short run time has practical advantages, such as lower system deployment costs, a lower probability of CNC machine damage thanks to rapid prediction of defects, and higher overall CNC machine utilization, confirming that the LightGBM model is the most effective machine learning model for CNC machines with only basic sensors installed. In addition, it was confirmed that classification performance is maximized by an ensemble model consisting of LightGBM, ExtraTrees, k-Nearest Neighbors, and logistic regression models when there are no restrictions on execution time or computing power.
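
For readers less familiar with the metrics compared in this abstract, the sketch below computes accuracy, precision, recall, and F1 score from true and predicted labels. The labels are invented (1 = defective, 0 = good) and the function name is hypothetical; AUC is left out because it needs predicted scores rather than hard labels.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary defect classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels for ten machined parts.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 0, 0, 1, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Because defective parts are usually the minority class, precision, recall, and F1 tend to be more informative than accuracy alone when models are compared this way.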

Automated condition assessment of concrete bridges with digital imaging

  • Adhikari, Ram S.;Bagchi, Ashutosh;Moselhi, Osama
    • Smart Structures and Systems, v.13 no.6, pp.901-925, 2014
  • The reliability of a bridge management system depends on the quality of visual inspection and on reliable estimation of the bridge condition rating. However, current visual inspection practices have several limitations: they are time-consuming, provide incomplete information, and rely on the inspectors' experience. To overcome these limitations, this paper presents an approach for automating the prediction of condition ratings for bridges based on digital image analysis. The proposed methodology encompasses image acquisition, development of a 3D visualization model, image processing, and a condition rating model. In this method, scaling in concrete bridge components is considered as the candidate defect, and the guidelines in the Ontario Structure Inspection Manual (OSIM) have been adopted for developing and testing the proposed method. The automated algorithms for scaling depth prediction and mapping of condition ratings are based on training back-propagation neural networks. The developed models showed better prediction capability for condition ratings than existing methods such as Naïve Bayes classifiers and bagged decision trees.

Defect Prediction Using Machine Learning Algorithm in Semiconductor Test Process (기계학습 알고리즘을 이용한 반도체 테스트공정의 불량 예측)

  • Jang, Suyeol;Jo, Mansik;Cho, Seulki;Moon, Byungmoo
    • Journal of the Korean Institute of Electrical and Electronic Material Engineers, v.31 no.7, pp.450-454, 2018
  • Because of its rapidly changing environment and high uncertainty, the semiconductor industry needs appropriate forecasting technology. In particular, both the cost and the time of the test process are increasing because the process has become more complicated and there are more factors to consider. In this paper, we propose a model that predicts a final "good" or "bad" result on the basis of preconditioning test data generated in the semiconductor test process. The proposed model addresses the classification and regression problems that are often dealt with in the semiconductor process and provides reliable predictions. We implemented the prediction model with various machine learning algorithms and compared the performance of the models built with each algorithm. Actual data from the semiconductor test process was used to construct an accurate prediction model and to verify it effectively.

Severity-based Fault Prediction using Unsupervised Learning (비감독형 학습 기법을 사용한 심각도 기반 결함 예측)

  • Hong, Euyseok
    • The Journal of the Institute of Internet, Broadcasting and Communication, v.18 no.3, pp.151-157, 2018
  • Most previous studies of software fault prediction have focused on supervised learning models for binary classification, which determine whether an input module has faults or not. However, a binary classification model determines only the presence or absence of faults in a module without considering the complex characteristics of the faults, and a supervised model is limited by its need for a training data set that most development groups do not have. To solve these two problems, this paper proposes a severity-based ternary classification model using unsupervised learning algorithms. Experimental results show that the proposed model performs comparably to the supervised models.

Bayesian Network-based Probabilistic Management of Software Metrics for Refactoring (리팩토링을 위한 소프트웨어 메트릭의 베이지안 네트워크 기반 확률적 관리)

  • Choi, Seunghee;Lee, Goo Yeon
    • Journal of KIISE, v.43 no.12, pp.1334-1341, 2016
  • In recent years, managing software defects in the implementation stage has become more important because of the rapid development and wide-ranging use of intelligent smart devices. Although quite a few studies have been conducted on prediction models for software defects, their outcomes have not been widely shared. This paper proposes an efficient probabilistic management model of software metrics based on a Bayesian network, to overcome the limits of approaches such as binary defect prediction models. The proposed model configures the Bayesian network from various software metrics, which can help identify improvements for refactoring. Once the source code has been improved through refactoring, the measured values of the related metrics also change. The proposed model presents probability values that reflect the effect of defect removal achieved by improving metrics through refactoring. By using probability values rather than conclusive binary predictions, the model can offer more flexibility in decision making.

Defect Prediction and Variable Impact Analysis in CNC Machining Process (CNC 가공 공정 불량 예측 및 변수 영향력 분석)

  • Hong, Ji Soo;Jung, Young Jin;Kang, Sung Woo
    • Journal of Korean Society for Quality Management, v.52 no.2, pp.185-199, 2024
  • Purpose: Improving yield and quality in product manufacturing is crucial from the perspective of process management, and controlling key variables within the process is essential for enhancing the quality of the produced items. In this study, we aim to identify the key variables influencing product defects and to facilitate quality enhancement in the CNC machining process using SHAP (SHapley Additive exPlanations). Methods: First, we train boosting-based models such as AdaBoost, GBM, XGBoost, LightGBM, and CatBoost. The CNC machining process data is divided into training data and test data at a ratio of 9:1 for model training and test experiments. Subsequently, we select a model with excellent Accuracy and F1-score performance and apply SHAP to extract the variables influencing defects in the CNC machining process. Results: In a comparison of the different models, the selected CatBoost model demonstrated an Accuracy of 97% and an F1-score of 95%. Using Shapley values, we extract key variables that positively or negatively impact the dependent variable (good/defective product). We also identify variables with relatively low importance, suggesting which variables should be prioritized for management. Conclusion: The extraction of key variables using SHAP provides explanatory power distinct from traditional machine learning techniques. This study is significant in identifying key variables that should be prioritized for management in the CNC machining process, and it is expected to contribute to enhancing the production quality of that process.
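
SHAP attributes a prediction to input variables using Shapley values. The sketch below computes exact Shapley values for a tiny hand-written scoring function by enumerating all feature subsets, with "absent" features replaced by a baseline value. It does not use the SHAP library or the paper's CatBoost model; the function `defect_score`, the three feature meanings, and all numbers are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of each feature for model(x) relative to model(baseline).

    Enumerates every subset of the other features, so it is only practical
    for a handful of features.
    """
    n = len(x)

    def value(present):
        # Features in `present` take their real value, the rest the baseline.
        mixed = [x[i] if i in present else baseline[i] for i in range(n)]
        return model(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                phi[i] += weight * (value(present | {i}) - value(present))
    return phi


def defect_score(features):
    """Hypothetical defect score: one additive term plus one interaction term."""
    spindle_load, feed_rate, vibration = features
    return 0.5 * spindle_load + 0.2 * feed_rate * vibration


x = [2.0, 1.0, 3.0]          # hypothetical observation
baseline = [0.0, 0.0, 0.0]   # hypothetical reference point

phi = shapley_values(defect_score, x, baseline)
print("Shapley values:", phi)
print("sum of values:", sum(phi))
print("score difference:", defect_score(x) - defect_score(baseline))
```

The attributions always sum to the difference between the score at `x` and the score at the baseline, which is what lets positive and negative contributions be read directly as pushing a part toward or away from the defective class.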

Uncertainty Analysis of Data-Based Models for Estimating Collapse Moments of Wall-Thinned Pipe Bends and Elbows

  • Kim, Dong-Su;Kim, Ju-Hyun;Na, Man-Gyun;Kim, Jin-Weon
    • Nuclear Engineering and Technology, v.44 no.3, pp.323-330, 2012
  • The development of data-based models requires uncertainty analysis to explain the accuracy of their predictions. In this paper, an uncertainty analysis of the support vector regression (SVR) model, which is a data-based model, was performed because previous research showed that the SVR method accurately estimates the collapse moments of wall-thinned pipe bends and elbows. The uncertainty analysis method used in this study was an analytic uncertainty analysis method, and estimates with a 95% confidence interval were obtained for 370 test data points. From the results, the prediction interval (PI) was very narrow, which means that the predicted values are quite accurate. Therefore, the proposed SVR method can be used effectively to assess and validate the integrity of the wall-thinned pipe bends and elbows.