• Title/Summary/Keyword: time-dependent ROC

Review for time-dependent ROC analysis under diverse survival models (생존 분석 자료에서 적용되는 시간 가변 ROC 분석에 대한 리뷰)

  • Kim, Yang-Jin
    • The Korean Journal of Applied Statistics / v.35 no.1 / pp.35-47 / 2022
  • The receiver operating characteristic (ROC) curve was developed to quantify how well marker values (covariates) classify a response variable, and it has been extended to survival data with diverse missing data structures. When survival data are viewed as binary data (alive or dead) at each time point, evaluating the ROC curve at every time point yields a time-dependent ROC curve and a time-dependent area under the curve (AUC). In particular, a follow-up study involves a cohort that changes over time and incomplete data structures such as censoring and competing risks. In this paper, we review time-dependent ROC estimators under several contexts and perform simulations to check the performance of each estimator. We also analyze a dementia dataset to compare the prognostic power of markers.
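
The idea of treating survival status as a binary outcome at each time point can be sketched in a few lines. The following is a minimal illustration, not one of the estimators reviewed in the paper: it assumes every event time is fully observed (no censoring or competing risks), uses the cumulative/dynamic definition (cases have an event by time t, controls are still event-free at t), and the function name and toy data are hypothetical.

```python
def time_dependent_auc(markers, event_times, t):
    """Cumulative/dynamic AUC at time t.

    Assumption: all event times are observed (no censoring).
    Cases: event_time <= t. Controls: event_time > t.
    A higher marker value is taken to indicate higher risk.
    """
    cases = [m for m, T in zip(markers, event_times) if T <= t]
    controls = [m for m, T in zip(markers, event_times) if T > t]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at time t")
    # Fraction of (case, control) pairs ranked correctly; ties count as 1/2.
    concordant = sum(
        1.0 if mc > mk else 0.5 if mc == mk else 0.0
        for mc in cases
        for mk in controls
    )
    return concordant / (len(cases) * len(controls))


# Toy data, invented for illustration only.
markers = [0.9, 0.8, 0.3, 0.6, 0.1]
event_times = [1.0, 2.0, 3.0, 4.0, 5.0]
auc_by_time = {t: time_dependent_auc(markers, event_times, t) for t in (1.5, 2.5, 3.5)}
```

Because the case and control groups are redefined at each t, the AUC becomes a function of time rather than a single number.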

Estimation of the time-dependent AUC for cure rate model with covariate dependent censoring

  • Yang-Jin Kim
    • Communications for Statistical Applications and Methods / v.31 no.4 / pp.365-375 / 2024
  • Diverse methods for evaluating prediction models for a time to event have been proposed for right-censored data in which all subjects are assumed to be susceptible. A time-dependent AUC (area under the curve) measures the predictive ability of a marker based on case and control groups that vary over time. When a substantial portion of subjects remain event-free, the population consists of a susceptible group and a cured group. Because it is uncertain whether a censored subject is cured, both the case group and the control group are difficult to define. In this paper, our goal is to propose a time-dependent AUC for a cure rate model when the censoring distribution is related to covariates. A class of inverse probability of censoring weighted (IPCW) AUC estimators is proposed to adjust for the possible sampling bias. We evaluate the finite sample performance of the suggested methods under diverse simulation schemes, and an application to a melanoma dataset is presented for comparison with other methods.
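
To make the IPCW idea concrete, here is a minimal sketch of a standard IPCW cumulative/dynamic AUC. It is not the estimator proposed in this paper: it assumes everyone is susceptible (no cured fraction) and that censoring is independent of covariates, so the censoring distribution G is estimated by a plain Kaplan-Meier curve. Observed cases are up-weighted by 1/G(Y-), where Y is the observed time; controls at time t share the constant weight 1/G(t), which cancels. All names and data are hypothetical, and ties between event and censoring times are not treated specially.

```python
def km_censoring_survival(times, events):
    """Return g_minus(u), a Kaplan-Meier estimate of P(C >= u).

    Censoring (event indicator 0) is treated as the 'event' of interest.
    """
    censor_times = sorted({y for y, e in zip(times, events) if e == 0})

    def g_minus(u):
        surv = 1.0
        for s in censor_times:
            if s >= u:  # left limit: only censoring strictly before u counts
                break
            at_risk = sum(1 for y in times if y >= s)
            censored = sum(1 for y, e in zip(times, events) if y == s and e == 0)
            surv *= 1.0 - censored / at_risk
        return surv

    return g_minus


def ipcw_auc(markers, times, events, t):
    """IPCW cumulative/dynamic AUC at time t (higher marker = higher risk)."""
    g_minus = km_censoring_survival(times, events)
    # Cases: observed events by time t, each weighted by 1 / G(Y-).
    cases = [(m, 1.0 / g_minus(y))
             for m, y, e in zip(markers, times, events) if y <= t and e == 1]
    # Controls: still under observation and event-free after t.
    controls = [m for m, y in zip(markers, times) if y > t]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at time t")
    num = sum(w * (1.0 if mc > mk else 0.5 if mc == mk else 0.0)
              for mc, w in cases for mk in controls)
    den = sum(w for _, w in cases) * len(controls)
    return num / den


# Toy data, invented for illustration: subjects 2 and 5 are censored.
markers = [5, 1, 2, 4, 3]
times = [1, 2, 3, 4, 5]
events = [1, 0, 1, 1, 0]
example_auc = ipcw_auc(markers, times, events, 3)
```

With no censoring every weight equals 1 and the estimator reduces to the ordinary pairwise AUC. Handling covariate-dependent censoring, as the paper does, would require replacing the Kaplan-Meier step with a model for G given covariates.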

Machine Learning-Based Prediction of COVID-19 Severity and Progression to Critical Illness Using CT Imaging and Clinical Data

  • Subhanik Purkayastha;Yanhe Xiao;Zhicheng Jiao;Rujapa Thepumnoeysuk;Kasey Halsey;Jing Wu;Thi My Linh Tran;Ben Hsieh;Ji Whae Choi;Dongcui Wang;Martin Vallieres;Robin Wang;Scott Collins;Xue Feng;Michael Feldman;Paul J. Zhang;Michael Atalay;Ronnie Sebro;Li Yang;Yong Fan;Wei-hua Liao;Harrison X. Bai
    • Korean Journal of Radiology / v.22 no.7 / pp.1213-1224 / 2021
  • Objective: To develop a machine learning (ML) pipeline based on radiomics to predict Coronavirus Disease 2019 (COVID-19) severity and future deterioration to critical illness using CT and clinical variables. Materials and Methods: Clinical data were collected from 981 patients in a multi-institutional international cohort with real-time polymerase chain reaction-confirmed COVID-19. Radiomics features were extracted from the patients' chest CT. The cohort was randomly divided into training, validation, and test sets in a 7:1:2 ratio. An ML pipeline consisting of a model to predict severity and a time-to-event model to predict progression to critical illness was trained on radiomics features and clinical variables. The receiver operating characteristic area under the curve (ROC-AUC), concordance index (C-index), and time-dependent ROC-AUC were calculated to determine model performance, which was compared with consensus CT severity scores obtained by visual interpretation by radiologists. Results: Among 981 patients with confirmed COVID-19, 274 patients developed critical illness. Radiomics features combined with clinical variables gave the best performance for predicting disease severity, with a test ROC-AUC of 0.76, compared with 0.70 for the visual CT severity score combined with clinical variables (p = 0.023). The progression prediction model achieved a test C-index of 0.868 when based on the combination of CT radiomics and clinical variables, compared with 0.767 when based on CT radiomics features alone (p < 0.001), 0.847 when based on clinical variables alone (p = 0.110), and 0.860 when based on the combination of visual CT severity scores and clinical variables (p = 0.549). Furthermore, the model based on the combination of CT radiomics and clinical variables achieved time-dependent ROC-AUCs of 0.897, 0.933, and 0.927 for predicting progression risk at 3, 5, and 7 days, respectively.
Conclusion: CT radiomics features combined with clinical variables were predictive of COVID-19 severity and progression to critical illness with fairly high accuracy.

Effects of a Five Times Sit to Stand Test on the Daily Life Independence of Korean Elderly and Cut-Off Analysis

  • Nam, Seung-Min;Kim, Seong-Gil
    • Journal of the Korean Society of Physical Medicine / v.14 no.4 / pp.29-35 / 2019
  • PURPOSE: The aim of this study was to provide a reference value of the Five Times Sit to Stand Test (FTSST) for the daily life independence of the elderly in Korea and to examine how FTSST performance relates to independence in daily life. METHODS: This study was conducted on elderly people over 65 years of age living in Gyeongsangbuk-do, Korea. The FTSST was performed starting from a seated position on a chair. The subjects were classified into independent and dependent living groups according to their lifestyle, and the influence of FTSST time on group membership was examined through logistic regression analysis. To determine the usefulness and cut-off value of the FTSST, an ROC curve analysis was performed. RESULTS: As the FTSST time increased, the elderly were more likely to belong to the dependent living group than the independent one (OR=1.098, p<.05). The area under the ROC curve was .707 (p<.05). The cut-off value was chosen from the coordinates of the ROC curve by considering both sensitivity and specificity. At the cut-off of 15.62 seconds, the sensitivity and specificity were .626 and .753, respectively. CONCLUSION: The elderly in Korea become more likely to live dependently than independently as FTSST time increases, and this likelihood increases further for every additional second beyond 15.62 seconds. The loss of independence in daily life could be predicted from a subject's lower leg strength as measured by the FTSST.
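
The abstract does not say which rule was used to pick the cut-off from the ROC coordinates. One common choice, assumed here purely for illustration, is Youden's index J = sensitivity + specificity - 1, maximized over candidate thresholds. The sketch below uses invented FTSST-like times, not the study's data, and the function name is hypothetical.

```python
def youden_cutoff(scores, labels):
    """Pick the threshold c maximizing J = sensitivity + specificity - 1.

    labels: 1 = positive class (e.g. dependent living), 0 = negative.
    A subject is classified positive when score >= c.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best = None
    for c in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= c)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < c)
        sens = tp / positives
        spec = tn / negatives
        j = sens + spec - 1.0
        if best is None or j > best[0]:
            best = (j, c, sens, spec)
    return best[1], best[2], best[3]


# Invented completion times in seconds and living status (1 = dependent).
times_sec = [10.2, 11.5, 12.8, 14.0, 15.1, 16.3, 17.9, 19.4]
dependent = [0, 0, 0, 0, 1, 0, 1, 1]
cutoff, sensitivity, specificity = youden_cutoff(times_sec, dependent)
```

Other rules, such as the point closest to the top-left corner of the ROC plot, can give a different cut-off on the same data.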

Analysis of stage III proximal colon cancer using the Cox proportional hazards model (Cox 비례위험모형을 이용한 우측 대장암 3기 자료 분석)

  • Lee, Taeseob;Lee, Minjung
    • Journal of the Korean Data and Information Science Society / v.28 no.2 / pp.349-359 / 2017
  • In this paper, we conducted survival analyses by fitting the Cox proportional hazards model to stage III proximal colon cancer data obtained from the Surveillance, Epidemiology, and End Results program of the National Cancer Institute. We investigated the effect of covariates on the hazard function for death from stage III proximal colon cancer in patients who underwent surgery, and estimated the survival probability for a patient with specific covariates. We showed that the proportional hazards assumption is satisfied for the covariates used in the analyses, using a test based on the Schoenfeld residuals and plots of the Schoenfeld residuals and $\log[-\log\{\hat{S}(t)\}]$. We evaluated model calibration and discriminatory accuracy using a calibration plot and the time-dependent area under the ROC curve, both calculated with 10-fold cross validation.
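
As a brief reminder of why the log-minus-log plot is informative (a standard property of the Cox model, not a result of this paper): writing $S_0(t)$ for the baseline survival function, $x$ for a covariate vector and $\beta$ for the coefficient vector (symbols introduced here for illustration), proportional hazards implies

$$S(t \mid x) = S_0(t)^{\exp(\beta^\top x)} \quad\Longrightarrow\quad \log[-\log S(t \mid x)] = \log[-\log S_0(t)] + \beta^\top x .$$

The curves for different covariate groups should therefore be roughly parallel in $t$, separated by a constant vertical shift; curves that cross or diverge suggest that the assumption does not hold.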

Prognostic Value of 18F-FDG PET/CT Radiomics in Extranodal Nasal-Type NK/T Cell Lymphoma

  • Yu Luo;Zhun Huang;Zihan Gao;Bingbing Wang;Yanwei Zhang;Yan Bai;Qingxia Wu;Meiyun Wang
    • Korean Journal of Radiology / v.25 no.2 / pp.189-198 / 2024
  • Objective: To investigate the prognostic utility of radiomics features extracted from 18F-fluorodeoxyglucose (FDG) PET/CT combined with clinical factors and metabolic parameters in predicting progression-free survival (PFS) and overall survival (OS) in individuals diagnosed with extranodal nasal-type NK/T cell lymphoma (ENKTCL). Materials and Methods: A total of 126 adults with ENKTCL who underwent 18F-FDG PET/CT examination before treatment were retrospectively included and randomly divided into training (n = 88) and validation (n = 38) cohorts at a ratio of 7:3. Least absolute shrinkage and selection operator (LASSO) Cox regression analysis was used to select the best radiomics features and calculate each patient's radiomics scores (RadPFS and RadOS). Kaplan-Meier curves and the log-rank test were used to compare survival between patient groups risk-stratified by the radiomics scores. Various models to predict PFS and OS were constructed, including clinical, metabolic, clinical + metabolic, and clinical + metabolic + radiomics models. The discriminative ability of each model was evaluated using Harrell's C-index. The performance of each model in predicting PFS and OS at 1, 3, and 5 years was evaluated using the time-dependent receiver operating characteristic (ROC) curve. Results: Kaplan-Meier curve analysis demonstrated that the radiomics scores effectively identified high- and low-risk patients (all P < 0.05). Multivariable Cox analysis showed that the Ann Arbor stage, maximum standardized uptake value (SUVmax), and RadPFS were independent risk factors associated with PFS. Further, β2-microglobulin, Eastern Cooperative Oncology Group performance status score, SUVmax, and RadOS were independent risk factors for OS. The clinical + metabolic + radiomics model exhibited the greatest discriminative ability for both PFS (Harrell's C-index: 0.805 in the validation cohort) and OS (Harrell's C-index: 0.833 in the validation cohort).
The time-dependent ROC analysis indicated that the clinical + metabolic + radiomics model had the best predictive performance. Conclusion: The PET/CT-based clinical + metabolic + radiomics model can enhance prognostication among patients with ENKTCL and may be a non-invasive and efficient risk stratification tool for clinical practice.
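
For readers unfamiliar with Harrell's C-index, the following is a minimal sketch of how it is usually computed. It is a generic illustration rather than the authors' code; the function name and toy data are hypothetical, and pairs with tied observed times are simply skipped.

```python
def harrell_c_index(risk_scores, times, events):
    """Harrell's C: share of comparable pairs that the risk score orders correctly.

    A pair (i, j) is comparable when subject i has an observed event
    (events[i] == 1) strictly before subject j's observed time.
    The pair is concordant when subject i also has the higher risk score.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data, invented for illustration: the second subject is censored.
risk = [0.9, 0.4, 0.2, 0.7]
follow_up = [2, 4, 6, 8]
observed = [1, 0, 1, 1]
c_index = harrell_c_index(risk, follow_up, observed)
```

Unlike a time-dependent AUC, which is evaluated at a chosen horizon such as 1, 3, or 5 years, the C-index summarizes ranking accuracy over the whole follow-up period in a single number.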

A Study on NOx Emission Control Methods in the Cement Firing Process Using Data Mining Techniques (데이터 마이닝을 이용한 시멘트 소성공정 질소산화물(NOx)배출 관리 방법에 관한 연구)

  • Park, Chul Hong;Kim, Yong Soo
    • Journal of Korean Society for Quality Management / v.46 no.3 / pp.739-752 / 2018
  • Purpose: The purpose of this study was to investigate the relationship between kiln processing parameters and NOx emissions that occur in the sintering and calcination steps of the cement manufacturing process and to derive the main factors responsible for producing emissions outside emission limit criteria, as determined by category models and classification rules, using data mining techniques. The results from this study are expected to be useful as guidelines for NOx emission control standards. Methods: Data were collected from Precalciner Kiln No.3 used in one of the domestic cement plants in Korea. Thirty-four independent variables affecting NOx generation and dependent variables that exceeded or were below the NOx emission limit (>1 and <0, respectively) were examined during kiln processing. These data were used to construct a detection model of NOx emission, in which emissions exceeded or were below the set limits. The model was validated in SPSS MODELER 18.0 using artificial neural network, decision tree (C5.0), and logistic regression data mining techniques. Results: The decision tree (C5.0) algorithm best represented NOx emission behavior and was used to identify 10 processing variables that resulted in NOx emissions outside limit criteria. Conclusion: The results of this study indicate that the decision tree (C5.0) can be applied for real-time monitoring and management of NOx emissions during the cement firing process to satisfy NOx emission control standards and to provide for a more eco-friendly cement product.

Performances of Prognostic Models in Stratifying Patients with Advanced Gastric Cancer Receiving First-line Chemotherapy: a Validation Study in a Chinese Cohort

  • Xu, Hui;Zhang, Xiaopeng;Wu, Zhijun;Feng, Ying;Zhang, Cheng;Xie, Minmin;Yang, Yahui;Zhang, Yi;Feng, Chong;Ma, Tai
    • Journal of Gastric Cancer / v.21 no.3 / pp.268-278 / 2021
  • Purpose: While several prognostic models for the stratification of death risk have been developed for patients with advanced gastric cancer receiving first-line chemotherapy, they have seldom been tested in the Chinese population. This study investigated the performance of these models and identified the optimal tools for Chinese patients. Materials and Methods: Patients diagnosed with metastatic or recurrent gastric adenocarcinoma who received first-line chemotherapy were eligible for inclusion in the validation cohort. Their clinical data and survival outcomes were retrieved and documented. Time-dependent receiver operating characteristic (ROC) and calibration curves were used to evaluate the predictive ability of the models. Kaplan-Meier curves were plotted for patients in different risk groups defined by 7 published stratification tools. Log-rank tests with pairwise comparisons were used to compare survival differences. Results: The analysis included a total of 346 patients with metastatic or recurrent disease. The median overall survival time was 11.9 months. The patients were divided into different risk groups according to the prognostic stratification models, which varied in their ability to distinguish mortality risk in these patients. The model proposed by Kim et al. showed relatively higher predictive ability than the other models, with the highest χ² value (25.8) in log-rank tests across subgroups, and area under the curve values at 6, 12, and 24 months of 0.65 (95% confidence interval [CI]: 0.59-0.72), 0.60 (0.54-0.65), and 0.63 (0.56-0.69), respectively. Conclusions: Among existing prognostic tools, the model constructed by Kim et al., which incorporates performance status score, neutrophil-to-lymphocyte ratio, alkaline phosphatase, albumin, and tumor differentiation, was more effective in stratifying Chinese patients with gastric cancer receiving first-line chemotherapy.

A Time Series Graph based Convolutional Neural Network Model for Effective Input Variable Pattern Learning : Application to the Prediction of Stock Market (효과적인 입력변수 패턴 학습을 위한 시계열 그래프 기반 합성곱 신경망 모형: 주식시장 예측에의 응용)

  • Lee, Mo-Se;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.24 no.1 / pp.167-181 / 2018
  • Over the past decade, deep learning has been in the spotlight among various machine learning algorithms. In particular, the CNN (Convolutional Neural Network), which is known as an effective solution for recognizing and classifying images or voices, has been widely applied to classification and prediction problems. In this study, we investigate how to apply CNNs to business problem solving. Specifically, this study proposes applying a CNN to stock market prediction, one of the most challenging tasks in machine learning research. As mentioned, CNNs are strong at interpreting images. Thus, the model proposed in this study adopts a CNN as a binary classifier that predicts stock market direction (upward or downward) using time series graphs as its inputs. That is, our proposal is to build a machine learning algorithm that mimics the experts called 'technical analysts', who examine graphs of past price movements and predict future financial price movements. Our proposed model, named 'CNN-FG (Convolutional Neural Network using Fluctuation Graph)', consists of five steps. In the first step, it divides the dataset into intervals of 5 days. In step 2, it creates time series graphs for the divided dataset. The size of the image in which each graph is drawn is $40 \times 40$ pixels, and the graph of each independent variable is drawn in a different color. In step 3, the model converts the images into matrices. Each image is converted into a combination of three matrices in order to express the color values on the R (red), G (green), and B (blue) scales. In the next step, it splits the dataset of graph images into training and validation datasets. We used 80% of the total dataset as the training dataset and the remaining 20% as the validation dataset. In the final step, CNN classifiers are trained using the images of the training dataset.
Regarding the parameters of CNN-FG, we adopted two convolution filters ($5 \times 5 \times 6$ and $5 \times 5 \times 9$) in the convolution layer. In the pooling layer, a $2 \times 2$ max pooling filter was used. The numbers of nodes in the two hidden layers were set to 900 and 32, respectively, and the number of nodes in the output layer was set to 2 (one for the prediction of an upward trend and the other for a downward trend). The activation functions for the convolution layer and the hidden layers were set to ReLU (Rectified Linear Unit), and the one for the output layer was set to the Softmax function. To validate CNN-FG, we applied it to the prediction of KOSPI200 for 2,026 days over eight years (from 2009 to 2016). To match the proportions of the two groups in the dependent variable (i.e., tomorrow's stock market movement), we selected 1,950 samples by random sampling. Finally, we built the training dataset using 80% of the total dataset (1,560 samples) and the validation dataset using 20% (390 samples). The independent variables of the experimental dataset included twelve technical indicators widely used in previous studies. They include Stochastic %K, Stochastic %D, Momentum, ROC (rate of change), LW %R (Larry William's %R), A/D oscillator (accumulation/distribution oscillator), OSCP (price oscillator), CCI (commodity channel index), and so on. To confirm the superiority of CNN-FG, we compared its prediction accuracy with that of other classification models. Experimental results showed that CNN-FG outperforms LOGIT (logistic regression), ANN (artificial neural network), and SVM (support vector machine) with statistical significance. These empirical results imply that converting time series business data into graphs and building CNN-based classification models using these graphs can be effective from the perspective of prediction accuracy.
Thus, this paper sheds light on how to apply deep learning techniques to the domain of business problem solving.