• Title/Summary/Keyword: 10-fold cross-validation

Clinical significance of APOB inactivation in hepatocellular carcinoma

  • Lee, Gena;Jeong, Yun Seong;Kim, Do Won;Kwak, Min Jun;Koh, Jiwon;Joo, Eun Wook;Lee, Ju-Seog;Kah, Susie;Sim, Yeong-Eun;Yim, Sun Young
    • Experimental and Molecular Medicine / v.50 no.11 / pp.7.1-7.12 / 2018
  • Recent findings from The Cancer Genome Atlas project have provided a comprehensive map of genomic alterations that occur in hepatocellular carcinoma (HCC), including unexpected mutations in apolipoprotein B (APOB). We aimed to determine the clinical significance of this non-oncogenetic mutation in HCC. An Apob gene signature was derived from genes that differed between control mice and mice treated with siRNA specific for Apob (1.5-fold difference; P < 0.005). Human gene expression data were collected from four independent HCC cohorts (n = 941). A prediction model was constructed using Bayesian compound covariate prediction, and the robustness of the APOB gene signature was validated in HCC cohorts. The APOB signature was correlated with previously validated gene signatures, and network analysis was conducted using Ingenuity Pathway Analysis. APOB inactivation was associated with poor prognosis when the APOB gene signature was applied in all human HCC cohorts. Poor prognosis with APOB inactivation was consistently observed through cross-validation with previously reported gene signatures (NCIP A, HS, high-recurrence SNUR, and high RS subtypes). Knowledge-based gene network analysis using genes that differed between low-APOB and high-APOB groups in all four cohorts revealed that low-APOB activity was associated with upregulation of oncogenic and metastatic regulators, such as HGF, MTIF, ERBB2, FOXM1, and CD44, and inhibition of tumor suppressors, such as TP53 and PTEN. In conclusion, APOB inactivation is associated with poor outcome in patients with HCC, and APOB may play a role in regulating multiple genes involved in HCC development.

Analysis of Occupational Injury and Feature Importance of Fall Accidents on the Construction Sites using Adaboost (에이다 부스트를 활용한 건설현장 추락재해의 강도 예측과 영향요인 분석)

  • Choi, Jaehyun;Ryu, HanGuk
    • Journal of the Architectural Institute of Korea Structure & Construction / v.35 no.11 / pp.155-162 / 2019
  • The construction industry accounts for 28.55% of all industrial accidents in Korea, the highest share of any industry. Within construction, falls are the most common accident type, making up 60.16% of accidents. We therefore analyzed the major factors affecting fall accidents and derived feature importances across a range of variables. We used data collected from the Korea Occupational Safety & Health Agency (KOSHA) to train the proposed model and make predictions. We sought to predict the severity of occupational fall accidents using the machine learning model AdaBoost, short for Adaptive Boosting. AdaBoost is a machine learning meta-algorithm that can be used in conjunction with many other types of learning algorithms to improve performance. In this model, decision trees were combined with AdaBoost to predict and classify the severity of occupational fall accidents. Hyperopt was used to optimize the hyperparameters, combined with k-fold cross-validation by hierarchy. We extracted and analyzed the importances of the features affecting fall accidents using the permutation technique. In this study, we verified the predicted severity of fall accidents in terms of predictive accuracy. The machine learning model was also confirmed to be applicable to safety accident analysis on construction sites. In the future, if safety accident data are accumulated automatically and in real time in a networked system using IoT (Internet of Things) technology on construction sites, it will be possible to analyze the factors and types of accidents according to site conditions from the real-time data.
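
The permutation technique mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the helper names `accuracy` and `permutation_importance`, the toy data, and the stand-in `predict` function are all hypothetical. The idea is to shuffle one feature column at a time and record how much the accuracy drops.

```python
import random

def accuracy(predict, X, y):
    """Fraction of rows for which predict(row) equals the label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only; all other columns stay untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical toy data: feature 0 determines the label, feature 1 is noise.
data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [int(row[0] > 0.5) for row in X]

def predict(row):
    """Stand-in for a trained classifier (assumption, not a real AdaBoost model)."""
    return int(row[0] > 0.5)

importances = permutation_importance(predict, X, y)
```

A feature the model ignores gets an importance of zero, while shuffling an informative feature pushes accuracy toward chance level.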

Feature selection and prediction modeling of drug responsiveness in Pharmacogenomics (약물유전체학에서 약물반응 예측모형과 변수선택 방법)

  • Kim, Kyuhwan;Kim, Wonkuk
    • The Korean Journal of Applied Statistics / v.34 no.2 / pp.153-166 / 2021
  • A main goal of pharmacogenomics studies is to predict an individual's drug responsiveness based on high-dimensional genetic variables. Because the number of variables is large, feature selection is required to reduce it. The selected features are then used to construct a predictive model using machine learning algorithms. In the present study, we applied several hybrid feature selection methods, such as combinations of logistic regression, ReliefF, TuRF, random forest, and LASSO, to a next-generation sequencing data set of 400 epilepsy patients. We then applied the selected features to machine learning methods including random forest, gradient boosting, and support vector machine, as well as a stacking ensemble method. Our results showed that the stacking model with a hybrid feature selection of random forest and ReliefF performed better than the other combinations of approaches. Based on a 5-fold cross-validation partition, the mean test accuracy of the best model was 0.727 and its mean test AUC was 0.761. The stacking models also outperformed single machine learning predictive models when using the same selected features.
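
As a rough illustration of how a mean test accuracy over a 5-fold partition is obtained, the sketch below splits synthetic data into five folds and averages the held-out accuracy. Everything in it is an assumption for illustration: the helper names, the one-feature threshold classifier, and the random data. Only the sample size of 400 echoes the abstract, and the resulting score has no relation to the reported 0.727.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(fit, X, y, k=5, seed=0):
    """Return the per-fold test accuracies of the models produced by fit()."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        # Train on k-1 folds, evaluate on the remaining fold.
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return scores

def fit_threshold(X_train, y_train):
    """Hypothetical classifier: cut at the midpoint between the two class means."""
    mean0 = sum(x for x, t in zip(X_train, y_train) if t == 0) / y_train.count(0)
    mean1 = sum(x for x, t in zip(X_train, y_train) if t == 1) / y_train.count(1)
    cut = (mean0 + mean1) / 2
    return lambda x: int(x > cut)

# Synthetic one-feature data: class 1 is shifted by one standard deviation.
rng = random.Random(42)
y = [i % 2 for i in range(400)]
X = [rng.gauss(1.0 * label, 1.0) for label in y]

scores = cross_validate(fit_threshold, X, y, k=5)
mean_accuracy = sum(scores) / len(scores)
```

Each sample is used for testing exactly once, so the mean over folds uses all 400 samples without ever scoring a model on data it was trained on.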

Quality Prediction Model for Manufacturing Process of Free-Machining 303-series Stainless Steel Small Rolling Wire Rods (쾌삭 303계 스테인리스강 소형 압연 선재 제조 공정의 생산품질 예측 모형)

  • Seo, Seokjun;Kim, Heungseob
    • Journal of Korean Society of Industrial and Systems Engineering / v.44 no.4 / pp.12-22 / 2021
  • This article proposes a machine learning model, i.e., a classifier, for predicting the production quality of free-machining 303-series stainless steel (STS303) small rolling wire rods according to the operating conditions of the manufacturing process. To develop the classifier, manufacturing data for 37 operating variables were collected from the manufacturing execution system (MES) of Company S, and 12 types of derived variables were generated based on a literature review and interviews with field experts. The research comprised data preprocessing, exploratory data analysis, feature selection, machine learning modeling, and the evaluation of alternative models. In the preprocessing stage, missing values and outliers were removed, and oversampling with SMOTE (synthetic minority oversampling technique) was applied to resolve data imbalance. Features were selected by the variable importance of LASSO (least absolute shrinkage and selection operator) regression, extreme gradient boosting (XGBoost), and random forest models. Finally, logistic regression, support vector machine (SVM), random forest, and XGBoost models were developed as classifiers to predict whether products will be adequate or defective under new operating conditions. The optimal hyperparameters for each model were investigated by grid search and random search based on k-fold cross-validation. In the experiment, XGBoost showed relatively high predictive performance compared with the other models, with an accuracy of 0.9929, specificity of 0.9372, F1-score of 0.9963, and logarithmic loss of 0.0209. The classifier developed in this study is expected to improve productivity by enabling effective management of the manufacturing process for STS303 small rolling wire rods.
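
The SMOTE step named in this abstract can be sketched in a simplified form. This is an illustrative approximation, not the authors' pipeline or a library implementation: the function `smote`, its parameters, and the sample values are hypothetical. Each synthetic point is placed at a random position on the line segment between a minority-class sample and one of its nearest minority-class neighbours.

```python
import random

def smote(minority, n_synthetic, k=3, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and one of their k nearest minority-class neighbours (simplified SMOTE)."""
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest neighbours of the base point, excluding the point itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sq_dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Hypothetical minority-class samples described by two operating variables.
minority = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.2], [1.1, 2.1], [1.3, 2.3]]
new_points = smote(minority, n_synthetic=10)
```

Because every synthetic point is a convex combination of two real minority samples, the new points stay inside the region already occupied by that class.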

Reliability-based combined high and low cycle fatigue analysis of turbine blade using adaptive least squares support vector machines

  • Ma, Juan;Yue, Peng;Du, Wenyi;Dai, Changping;Wriggers, Peter
    • Structural Engineering and Mechanics / v.83 no.3 / pp.293-304 / 2022
  • In this work, a novel reliability approach for combined high and low cycle fatigue (CCF) estimation is developed by combining an active learning strategy with a least squares support vector machine (LS-SVM) surrogate model (named ALS-SVM) to address multi-source uncertainties, including working loads, material properties, and the model itself. Initially, a new active learning function combining the LS-SVM approach with Monte Carlo simulation (MCS) is presented to improve computational efficiency with fewer calls to the performance function. To consider the uncertainty of the surrogate model at candidate sample points, the learning function employs the k-fold cross-validation method and introduces the predicted variance to sequentially select samples. Following that, low cycle fatigue (LCF) loads and high cycle fatigue (HCF) loads are first estimated based on the training samples extracted from finite element (FE) simulations, and their simulated responses together with the sample points of model parameters in the Coffin-Manson formula are selected as the MC samples to establish the ALS-SVM model. In this analysis, the MC samples are substituted to predict the CCF reliability of turbine blades by using the built ALS-SVM model. A comparison of the two approaches indicates that the reliability model based on the linear cumulative damage rule provides a non-conservative result compared with the proposed one. In addition, the results demonstrate that ALS-SVM is an effective analysis method with high computational efficiency that attains accurate fatigue reliability from small training samples.

Decision based uncertainty model to predict rockburst in underground engineering structures using gradient boosting algorithms

  • Kidega, Richard;Ondiaka, Mary Nelima;Maina, Duncan;Jonah, Kiptanui Arap Too;Kamran, Muhammad
    • Geomechanics and Engineering / v.30 no.3 / pp.259-272 / 2022
  • Rockburst is a dynamic, multivariate, and non-linear phenomenon that occurs in underground mining and civil engineering structures. Predicting rockburst is challenging since conventional models are not standardized. Hence, machine learning techniques would improve the prediction accuracies. This study describes decision based uncertainty models to predict rockburst in underground engineering structures using gradient boosting algorithms (GBM). The model input variables were uniaxial compressive strength (UCS), uniaxial tensile strength (UTS), maximum tangential stress (MTS), excavation depth (D), stress ratio (SR), and brittleness coefficient (BC). Several models were trained using different combinations of the input variables and a 3-fold cross-validation resampling procedure. The hyperparameters comprising learning rate, number of boosting iterations, tree depth, and number of minimum observations were tuned to attain the optimum models. The performance of the models was tested using classification accuracy, Cohen's kappa coefficient (k), sensitivity and specificity. The best-performing model showed a classification accuracy, k, sensitivity and specificity values of 98%, 93%, 1.00 and 0.957 respectively by optimizing model ROC metrics. The most and least influential input variables were MTS and BC, respectively. The partial dependence plots revealed the relationship between the changes in the input variables and model predictions. The findings reveal that GBM can be used to anticipate rockburst and guide decisions about support requirements before mining development.
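
The performance measures listed in this abstract (accuracy, Cohen's kappa, sensitivity, and specificity) can all be derived from a confusion matrix. The sketch below assumes a binary setting for simplicity; the function name `binary_metrics` and the counts are hypothetical and are not taken from the paper.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, sensitivity, specificity and Cohen's kappa from 2x2 counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    # Agreement expected by chance, from the row and column marginals.
    p_yes = ((tp + fp) / total) * ((tp + fn) / total)
    p_no = ((fn + tn) / total) * ((fp + tn) / total)
    expected = p_yes + p_no
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }

# Hypothetical counts for 100 cases (illustration only).
metrics = binary_metrics(tp=40, fp=5, fn=10, tn=45)
# accuracy 0.85, sensitivity 0.8, specificity 0.9, kappa 0.7
```

Kappa is lower than accuracy here because it discounts the 50% agreement that would be expected by chance with these marginals.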

A ResNet based multiscale feature extraction for classifying multi-variate medical time series

  • Zhu, Junke;Sun, Le;Wang, Yilin;Subramani, Sudha;Peng, Dandan;Nicolas, Shangwe Charmant
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.5 / pp.1431-1445 / 2022
  • We construct a deep neural network model named ECGResNet. The model can diagnose eight common cardiovascular diseases from 12-lead ECG data with high accuracy. We chose the 16 blocks of ResNet50 as the main body of the model and added the Squeeze-and-Excitation (SE) module to adaptively learn the information between channels. As our feature extraction method, we replaced the first convolutional layer of ResNet50, which has a convolutional kernel of size 7, with a superposition of convolutional kernels of sizes 8 and 16. This allows the model to focus on the overall trend of the ECG signal while also noticing subtle changes. The model further improves the accuracy of cardiovascular and cerebrovascular disease classification by using a fully connected layer that integrates factors such as gender and age. The ECGResNet model adds Dropout layers to both the residual block and the SE module of ResNet50, further reducing overfitting. The model was finally trained using five-fold cross-validation and the Flooding training method, reaching an accuracy of 95% on the test set and an F1-score of 0.841. We design a new deep neural network, introduce a multi-scale feature extraction method, and apply the SE module to extract features of ECG data.

Machine Learning Algorithm for Estimating Ink Usage (머신러닝을 통한 잉크 필요량 예측 알고리즘)

  • Se Wook Kwon;Young Joo Hyun;Hyun Chul Tae
    • Journal of Korean Society of Industrial and Systems Engineering / v.46 no.1 / pp.23-31 / 2023
  • Research and interest in sustainable printing are increasing in the packaging printing industry. Currently, the amount of ink required for each job is predicted from the experience and intuition of field workers. If more ink is produced than necessary, the remainder cannot be reused and is discarded, adversely affecting the company's productivity and the environment. Machine learning models can now be used to address this problem. This study compares machine learning models for predicting ink usage. Linear regression models such as multiple regression analysis cannot reflect the nonlinear relationships among the variables involved in packaging printing, which limits how accurately they can predict the amount of ink needed. This study therefore built several prediction models based on CART (Classification and Regression Tree): Decision Tree, Random Forest, Gradient Boosting Machine, and XGBoost. The accuracy of the models was assessed by K-fold cross-validation, using root mean squared error, mean absolute error, and R-squared as evaluation metrics. Among these models, XGBoost has the highest prediction accuracy and can reduce wasted ink by 2134 g per job. This study thus demonstrates the potential of machine learning to help advance productivity and protect the environment.
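
For reference, the three evaluation metrics named in this abstract can be computed as in the sketch below. The function name `regression_metrics` and the ink quantities are hypothetical values chosen so the arithmetic is easy to check by hand; they are not data from the study.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, R-squared) for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)              # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return rmse, mae, r2

# Hypothetical ink usage in grams: actual versus predicted.
y_true = [100.0, 150.0, 200.0, 250.0]
y_pred = [110.0, 140.0, 210.0, 240.0]
rmse, mae, r2 = regression_metrics(y_true, y_pred)
# rmse = 10.0, mae = 10.0, r2 = 0.968
```

RMSE and MAE are in the same unit as the target (grams here), whereas R-squared is unitless and measures the share of variance explained.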

CT-Based Fagotti Scoring System for Non-Invasive Prediction of Cytoreduction Surgery Outcome in Patients with Advanced Ovarian Cancer

  • Na Young Kim;Dae Chul Jung;Jung Yun Lee;Kyung Hwa Han;Young Taik Oh
    • Korean Journal of Radiology / v.22 no.9 / pp.1481-1489 / 2021
  • Objective: To construct a CT-based Fagotti scoring system by analyzing the correlations between laparoscopic findings and CT features in patients with advanced ovarian cancer. Materials and Methods: This retrospective cohort study included patients diagnosed with stage III/IV ovarian cancer who underwent diagnostic laparoscopy and debulking surgery between January 2010 and June 2018. Two radiologists independently reviewed preoperative CT scans and assessed ten CT features known as predictors of suboptimal cytoreduction. Correlation analysis between ten CT features and seven laparoscopic parameters based on the Fagotti scoring system was performed using Spearman's correlation. Variable selection and model construction were performed by logistic regression with the least absolute shrinkage and selection operator method using a predictive index value (PIV) ≥ 8 as an indicator of suboptimal cytoreduction. The final CT-based scoring system was internally validated using 5-fold cross-validation. Results: A total of 157 patients (median age, 56 years; range, 27-79 years) were evaluated. Among 120 (76.4%) patients with a PIV ≥ 8, 105 patients received neoadjuvant chemotherapy followed by interval debulking surgery, and the optimal cytoreduction rate was 90.5% (95 of 105). Among 37 (23.6%) patients with PIV < 8, 29 patients underwent primary debulking surgery, and the optimal cytoreduction rate was 93.1% (27 of 29). CT features showing significant correlations with PIV ≥ 8 were mesenteric involvement, gastro-transverse mesocolon-splenic space involvement, diaphragmatic involvement, and para-aortic lymphadenopathy. The area under the receiver operating curve of the final model for prediction of PIV ≥ 8 was 0.72 (95% confidence interval: 0.62-0.82). Conclusion: Central tumor burden and upper abdominal spread features on preoperative CT were identified as distinct predictive factors for high PIV on diagnostic laparoscopy. The CT-based PIV prediction model might be useful for patient stratification before cytoreduction surgery for advanced ovarian cancer.
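
The area under the curve reported in this abstract can be read as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case. The sketch below computes it that way by pairwise comparison. The function name `roc_auc`, the labels, and the scores are hypothetical and unrelated to the study's patients.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly;
    tied scores count as half a correct pair."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = high PIV) and predicted scores.
labels = [1, 1, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.7, 0.2, 0.6]
auc = roc_auc(labels, scores)
# 14 of the 16 positive/negative pairs are ordered correctly: auc = 0.875
```

The pairwise form is quadratic in the number of cases, which is fine for a cohort of a few hundred patients; a rank-based formula gives the same value more efficiently on large data.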

A Deep Learning Approach for Covid-19 Detection in Chest X-Rays

  • Sk. Shalauddin Kabir;Syed Galib;Hazrat Ali;Fee Faysal Ahmed;Mohammad Farhad Bulbul
    • International Journal of Computer Science & Network Security / v.24 no.3 / pp.125-134 / 2024
  • The 2019 novel coronavirus disease, COVID-19, has spread swiftly worldwide. Early diagnosis is important for controlling its rapid spread. Medical imaging techniques, such as chest computed tomography and chest X-ray, are playing a vital role in the identification and testing of COVID-19 in the present pandemic. Chest X-ray is a cost-effective method for COVID-19 detection; however, manual X-ray analysis is time-consuming given that the number of infected individuals keeps growing rapidly. For this reason, it is very important to develop an automated COVID-19 detection process to control the pandemic. In this study, we address the task of automatic COVID-19 detection using a popular deep learning model, VGG19. We used 1300 healthy and 1300 confirmed COVID-19 chest X-ray images in this experiment. We performed three experiments by freezing different blocks and layers of VGG19 and then used a machine learning classifier, SVM, to detect COVID-19. In every experiment, we used five-fold cross-validation to train and validate the model, and we ultimately achieved 98.1% overall classification accuracy. Experimental results show that our proposed method using the deep learning-based VGG19 model can be used as a tool to aid radiologists and can play a crucial role in the timely diagnosis of COVID-19.