• Title/Summary/Keyword: k-fold cross validation

Search Result 154, Processing Time 0.022 seconds

Rubber O-ring defect detection system using K-fold cross validation and support vector machine (K-겹 교차 검증과 서포트 벡터 머신을 이용한 고무 오링결함 검출 시스템)

  • Lee, Yong Eun;Choi, Nak Joon;Byun, Young Hoo;Kim, Dae Won;Kim, Kyung Chun
    • Journal of the Korean Society of Visualization
    • /
    • v.19 no.1
    • /
    • pp.68-73
    • /
    • 2021
  • In this study, the detection of rubber o-ring defects was carried out using k-fold cross validation and Support Vector Machine (SVM) algorithm. The data process was carried out in 3 steps. First, we proceeded with a frame alignment to eliminate unnecessary regions in the learning and secondly, we applied gray-scale changes for computational reduction. Finally, data processing was carried out using image augmentation to prevent data overfitting. After processing data, SVM algorithm was used to obtain normal and defect detection accuracy. In addition, we applied the SVM algorithm through the k-fold cross validation method to compare the classification accuracy. As a result, we obtain results that show better performance by applying the k-fold cross validation method.

Application of Time-series Cross Validation in Hyperparameter Tuning of a Predictive Model for 2,3-BDO Distillation Process (시계열 교차검증을 적용한 2,3-BDO 분리공정 온도예측 모델의 초매개변수 최적화)

  • An, Nahyeon;Choi, Yeongryeol;Cho, Hyungtae;Kim, Junghwan
    • Korean Chemical Engineering Research
    • /
    • v.59 no.4
    • /
    • pp.532-541
    • /
    • 2021
  • Recently, research on the application of artificial intelligence in the chemical process has been increasing rapidly. However, overfitting is a significant problem that prevents the model from being generalized well to predict unseen data on test data, as well as observed training data. Cross validation is one of the ways to solve the overfitting problem. In this study, the time-series cross validation method was applied to optimize the number of batch and epoch in the hyperparameters of the prediction model for the 2,3-BDO distillation process, and it compared with K-fold cross validation generally used. As a result, the RMSE of the model with time-series cross validation was lower by 9.06%, and the MAPE was higher by 0.61% than the model with K-fold cross validation. Also, the calculation time was 198.29 sec less than the K-fold cross validation method.

Developing a Molecular Prognostic Predictor of a Cancer based on a Small Sample

  • Kim Inyoung;Lee Sunho;Rha Sun Young;Kim Byungsoo
    • Proceedings of the Korean Statistical Society Conference
    • /
    • 2004.11a
    • /
    • pp.195-198
    • /
    • 2004
  • One Important problem in a cancer microarray study is to identify a set of genes from which a molecular prognostic indicator can be developed. In parallel with this problem is to validate the chosen set of genes. We develop in this note a K-fold cross validation procedure by combining a 'pre-validation' technique and a bootstrap resampling procedure in the Cox regression . The pre-validation technique predicts the microarray predictor of a case without having seen the true class level of the case. It was suggested by Tibshirani and Efron (2002) to avoid the possible over-fitting in the regression in which a microarray based predictor is employed. The bootstrap resampling procedure for the Cox regression was proposed by Sauerbrei and Schumacher (1992) as a means of overcoming the instability of a stepwise selection procedure. We apply this K-fold cross validation to the microarray data of 92 gastric cancers of which the experiment was conducted at Cancer Metastasis Research Center, Yonsei University. We also share some of our experience on the 'false positive' result due to the information leak.

  • PDF

Region of Interest (ROI) Selection of Land Cover Using SVM Cross Validation (SVM 교차검증을 활용한 토지피복 ROI 선정)

  • Jeong, Jong-Chul;Youn, Hyoung-Jin
    • Journal of Cadastre & Land InformatiX
    • /
    • v.50 no.1
    • /
    • pp.75-85
    • /
    • 2020
  • This study examines machine learning cross-validation to utilized create ROI for classification of land cover. The study area located in Sejong and one KOMPSAT-3A image was used in this analysis: procedure on October 28, 2019. We used four bands(Red, Green, Blue, Near infra-red) for learning cross validation process. In this study, we used K-fold method in cross validation and used SVM kernel type with cross validation result. In addition, we used 4 kernels of SVM(Linear, Polynomial, RBF, Sigmoid) for supervised classification land cover map using extracted ROI. During the cross validation process, 1,813 data extracted from 3,500 data, and the most of the building, road and grass class data were removed about 60% during cross validation process. Based on this, the supervised SVM linear technique showed the highest classification accuracy of 91.77% compared to other kernel methods. The grass' producer accuracy showed 79.43% and identified a large mis-classification in forests. Depending on the results of the study, extraction ROI using cross validation may be effective in forest, water and agriculture areas, but it is deemed necessary to improve the distinction of built-up, grass and bare-soil area.

Threatening privacy by identifying appliances and the pattern of the usage from electric signal data (스마트 기기 환경에서 전력 신호 분석을 통한 프라이버시 침해 위협)

  • Cho, Jae yeon;Yoon, Ji Won
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.25 no.5
    • /
    • pp.1001-1009
    • /
    • 2015
  • In Smart Grid, smart meter sends our electric signal data to the main server of power supply in real-time. However, the more efficient the management of power loads become, the more likely the user's pattern of usage leaks. This paper points out the threat of privacy and the need of security measures in smart device environment by showing that it's possible to identify the appliances and the specific usage patterns of users from the smart meter's data. Learning algorithm PCA is used to reduce the dimension of the feature space and k-NN Classifier to infer appliances and states of them. Accuracy is validated with 10-fold Cross Validation.

A Study on Deriving the Statistical Weight Estimation Formula for an Aircraft Wing (항공기 날개의 통계적 중량 예측식 도출 연구)

  • Kim, Seok-Beom;Jeong, Han-Gyu;Hwang, Ho-Yon
    • Journal of the Korean Society for Aeronautical & Space Sciences
    • /
    • v.46 no.1
    • /
    • pp.32-40
    • /
    • 2018
  • In this research, a method of deriving statistical weight prediction formula which is used during the conceptual design phase was studied and it was programmed using Microsoft Excel and verified by applying to jet transport aircraft. The database was built while referencing the variables of conventional wing weight estimation formulas and it was used for modeling the jet transport wing weight regression equation. The model was evaluated using the K-fold cross validation method to solve the overfitting problem of the model.

Applicability study on urban flooding risk criteria estimation algorithm using cross-validation and SVM (교차검증과 SVM을 이용한 도시침수 위험기준 추정 알고리즘 적용성 검토)

  • Lee, Hanseung;Cho, Jaewoong;Kang, Hoseon;Hwang, Jeonggeun
    • Journal of Korea Water Resources Association
    • /
    • v.52 no.12
    • /
    • pp.963-973
    • /
    • 2019
  • This study reviews a urban flooding risk criteria estimation model to predict risk criteria in areas where flood risk criteria are not precalculated by using watershed characteristic data and limit rainfall based on damage history. The risk criteria estimation model was designed using Support Vector Machine, one of the machine learning algorithms. The learning data consisted of regional limit rainfall and watershed characteristic. The learning data were applied to the SVM algorithm after normalization. We calculated the mean absolute error and standard deviation using Leave-One-Out and K-fold cross-validation algorithms and evaluated the performance of the model. In Leave-One-Out, models with small standard deviation were selected as the optimal model, and models with less folds were selected in the K-fold. The average accuracy of the selected models by rainfall duration is over 80%, suggesting that SVM can be used to estimate flooding risk criteria.

Penalized logistic regression models for determining the discharge of dyspnea patients (호흡곤란 환자 퇴원 결정을 위한 벌점 로지스틱 회귀모형)

  • Park, Cheolyong;Kye, Myo Jin
    • Journal of the Korean Data and Information Science Society
    • /
    • v.24 no.1
    • /
    • pp.125-133
    • /
    • 2013
  • In this paper, penalized binary logistic regression models are employed as statistical models for determining the discharge of 668 patients with a chief complaint of dyspnea based on 11 blood tests results. Specifically, the ridge model based on $L^2$ penalty and the Lasso model based on $L^1$ penalty are considered in this paper. In the comparison of prediction accuracy, our models are compared with the logistic regression models with all 11 explanatory variables and the selected variables by variable selection method. The results show that the prediction accuracy of the ridge logistic regression model is the best among 4 models based on 10-fold cross-validation.

Machine Learning Based Hybrid Approach to Detect Intrusion in Cyber Communication

  • Neha Pathak;Bobby Sharma
    • International Journal of Computer Science & Network Security
    • /
    • v.23 no.11
    • /
    • pp.190-194
    • /
    • 2023
  • By looking the importance of communication, data delivery and access in various sectors including governmental, business and individual for any kind of data, it becomes mandatory to identify faults and flaws during cyber communication. To protect personal, governmental and business data from being misused from numerous advanced attacks, there is the need of cyber security. The information security provides massive protection to both the host machine as well as network. The learning methods are used for analyzing as well as preventing various attacks. Machine learning is one of the branch of Artificial Intelligence that plays a potential learning techniques to detect the cyber-attacks. In the proposed methodology, the Decision Tree (DT) which is also a kind of supervised learning model, is combined with the different cross-validation method to determine the accuracy and the execution time to identify the cyber-attacks from a very recent dataset of different network attack activities of network traffic in the UNSW-NB15 dataset. It is a hybrid method in which different types of attributes including Gini Index and Entropy of DT model has been implemented separately to identify the most accurate procedure to detect intrusion with respect to the execution time. The different DT methodologies including DT using Gini Index, DT using train-split method and DT using information entropy along with their respective subdivision such as using K-Fold validation, using Stratified K-Fold validation are implemented.

Prediction of concrete compressive strength using non-destructive test results

  • Erdal, Hamit;Erdal, Mursel;Simsek, Osman;Erdal, Halil Ibrahim
    • Computers and Concrete
    • /
    • v.21 no.4
    • /
    • pp.407-417
    • /
    • 2018
  • Concrete which is a composite material is one of the most important construction materials. Compressive strength is a commonly used parameter for the assessment of concrete quality. Accurate prediction of concrete compressive strength is an important issue. In this study, we utilized an experimental procedure for the assessment of concrete quality. Firstly, the concrete mix was prepared according to C 20 type concrete, and slump of fresh concrete was about 20 cm. After the placement of fresh concrete to formworks, compaction was achieved using a vibrating screed. After 28 day period, a total of 100 core samples having 75 mm diameter were extracted. On the core samples pulse velocity determination tests and compressive strength tests were performed. Besides, Windsor probe penetration tests and Schmidt hammer tests were also performed. After setting up the data set, twelve artificial intelligence (AI) models compared for predicting the concrete compressive strength. These models can be divided into three categories (i) Functions (i.e., Linear Regression, Simple Linear Regression, Multilayer Perceptron, Support Vector Regression), (ii) Lazy-Learning Algorithms (i.e., IBk Linear NN Search, KStar, Locally Weighted Learning) (iii) Tree-Based Learning Algorithms (i.e., Decision Stump, Model Trees Regression, Random Forest, Random Tree, Reduced Error Pruning Tree). Four evaluation processes, four validation implements (i.e., 10-fold cross validation, 5-fold cross validation, 10% split sample validation & 20% split sample validation) are used to examine the performance of predictive models. This study shows that machine learning regression techniques are promising tools for predicting compressive strength of concrete.