• Title/Summary/Keyword: time-dependent covariates

Comparison of survival prediction models for pancreatic cancer: Cox model versus machine learning models

  • Kim, Hyunsuk; Park, Taesung; Jang, Jinyoung; Lee, Seungyeoun
    • Genomics & Informatics / v.20 no.2 / pp.23.1-23.9 / 2022
  • A survival prediction model was recently developed to evaluate the prognosis of resected nonmetastatic pancreatic ductal adenocarcinoma. It is based on a Cox model and uses two nationwide databases: Surveillance, Epidemiology and End Results (SEER) and Korea Tumor Registry System-Biliary Pancreas (KOTUS-BP). In this study, we applied two machine learning methods for survival analysis, random survival forests (RSF) and support vector machines (SVM), and compared their prediction performance using the SEER and KOTUS-BP datasets. Three schemes were used for model development and evaluation. First, we used data from SEER for model development and data from KOTUS-BP for external evaluation. Second, the two datasets were swapped, with data from KOTUS-BP used for model development and data from SEER for external evaluation. Finally, we mixed the two datasets half and half and used the mixed datasets for model development and validation. We used 9,624 patients from SEER and 3,281 patients from KOTUS-BP to construct a prediction model with seven covariates: age, sex, histologic differentiation, adjuvant treatment, resection margin status, and the American Joint Committee on Cancer 8th edition T-stage and N-stage. Across the three schemes, the Cox model, RSF, and SVM all performed better with the mixed datasets than with the unmixed datasets. With the mixed datasets, the C-index and the 1-year, 2-year, and 3-year time-dependent areas under the curve for the Cox model were 0.644, 0.698, 0.680, and 0.687, respectively. The Cox model performed slightly better than RSF and SVM.
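The C-index reported in the abstract above measures how often a model ranks pairs of patients in the right order. As a rough illustration only, the following is a minimal sketch of Harrell's C-index for right-censored data. It is not the authors' implementation; the function name, the toy data, and the simplification of skipping pairs with tied follow-up times are all assumptions made here.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data (illustrative sketch).

    times:       observed follow-up times
    events:      1 if the event was observed, 0 if censored
    risk_scores: higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that subject a has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the times differ and the earlier
        # time is an observed event (assumption: tied times are skipped).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # higher risk failed first: concordant
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four patients, the second one censored.
toy_times = [5, 10, 12, 20]
toy_events = [1, 0, 1, 1]
toy_risk = [0.9, 0.4, 0.1, 0.6]
c_index = concordance_index(toy_times, toy_events, toy_risk)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives some context for the reported value of 0.644.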

Statistical analyses in an occupational health study (산업보건연구에서의 통계학적 분석)

  • 백도명; 최정근; 손미아
    • The Korean Journal of Applied Statistics / v.6 no.2 / pp.201-215 / 1993
  • The health status of workers in a foundry was analyzed in a study that combined evaluations of respiratory health with environmental measurements. The environmental measurements showed values exceeding permissible exposure limits. A t-test was performed on both log-transformed and untransformed data to examine the statistical significance of the noncompliance with exposure standards. For the analysis of categorical health outcomes, the $\chi^2$ test with $2 \times 2$ tables and logistic regression analysis were employed. For continuous variables, multiple linear regression was performed against the assessed risk factors. The pros and cons of different parameters in compliance (or noncompliance) testing were presented. Respiratory function did not show any relation with occupational exposures, which may be due to the healthy worker effect. Strategies for controlling time-dependent covariates were discussed in relation to the healthy worker effect. The scope of statistical analysis in occupational health studies is still limited in Korea without a suitable external comparison group, such as credible vital statistics for the whole nation. Internal comparisons between groups with different exposure status often result in unstable estimates of effect, and the proportional morbidity study is discussed as an alternative potential research tool.
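The abstract above does not say how the compliance t-test was set up. One plausible reading, shown here purely as a sketch, is a one-sample t statistic comparing the measurements with the exposure limit, computed once on the raw scale and once on the log scale. The measurements, the limit, and the helper name `one_sample_t` are hypothetical and do not come from the paper.

```python
import math
import statistics


def one_sample_t(values, reference):
    """Return the one-sample t statistic and its degrees of freedom."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    t = (mean - reference) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical exposure measurements and a hypothetical permissible limit.
measurements = [1.2, 2.5, 3.1, 0.9, 4.8, 2.2]
limit = 2.0

# Untransformed data: compares the arithmetic mean with the limit.
t_raw, df = one_sample_t(measurements, limit)

# Log-transformed data: compares the geometric mean with the limit.
t_log, _ = one_sample_t([math.log(x) for x in measurements], math.log(limit))
```

The two versions answer slightly different questions, since the log-scale test concerns the geometric mean. This is one reason the choice of parameter matters in compliance testing.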
