• Title/Summary/Keyword: covariate analysis (공변량 분석)

Comparison of GEE Estimation Methods for Repeated Binary Data with Time-Varying Covariates on Different Missing Mechanisms (시간-종속적 공변량이 포함된 이분형 반복측정자료의 GEE를 이용한 분석에서 결측 체계에 따른 회귀계수 추정방법 비교)

  • Park, Boram;Jung, Inkyung
    • The Korean Journal of Applied Statistics / v.26 no.5 / pp.697-712 / 2013
  • When analyzing repeated binary data, the generalized estimating equations (GEE) approach produces consistent estimates of the regression parameters even if an incorrect working correlation matrix is used. In finite samples, however, the coefficient estimates for time-varying covariates change more across working correlation structures than those for time-invariant covariates. In addition, the GEE approach may give biased estimates when data are missing at random (MAR). Weighted estimating equations and multiple imputation have been proposed to reduce bias in parameter estimates under MAR. This article studies whether the two methods produce robust estimates across various working correlation structures for longitudinal binary data with time-varying covariates under different missing mechanisms. Through simulation, we observe that parameter estimates for time-varying covariates differ more across working correlation structures than those for time-invariant covariates. The multiple imputation method produces estimates that are more robust to the choice of working correlation structure and have smaller biases than those of the other two methods.

A study on prediction for attendances of Korean probaseball games using covariates (공변량을 이용한 한국프로야구 관중 수 예측에 대한 고찰)

  • Han, Ga-Hee;Chung, Jigyu;Yoo, Jae Keun
    • Journal of the Korean Data and Information Science Society / v.25 no.6 / pp.1481-1489 / 2014
  • ARIMA models have so far been widely adopted for predicting yearly total attendance at Korean professional baseball games. In this paper, we discuss two alternatives that use an exogenous variable, ARIMAX models and growth curves, to predict attendance. Using the exogenous variable improves the prediction compared to ARIMA. We conclude that various statistical methods should be considered for better prediction and that the results can be applied to predict attendance for other professional sports.

Pattern-Mixture Model of the Cox Proportional Hazards Model with Missing Binary Covariates (결측이 있는 이산형 공변량에 대한 Cox비례위험모형의 패턴-혼합 모델)

  • Youk, Tae-Mi;Song, Ju-Won
    • The Korean Journal of Applied Statistics / v.25 no.2 / pp.279-291 / 2012
  • When fitting a Cox proportional hazards model with missing covariates, it is inefficient to exclude observations with missing values from the analysis. Furthermore, if the missing-data mechanism is not missing completely at random (MCAR), exclusion may lead to biased parameter estimates. Many approaches have been suggested for handling the Cox proportional hazards model when covariates are sometimes missing, but they are based on the selection model. This paper suggests an approach to handling the Cox proportional hazards model with missing covariates by using the pattern-mixture model (Little, 1993). The pattern-mixture model is expressed by the joint distribution of survival time and the missing-data mechanism. In the pattern-mixture model, many models can be considered by setting up various restrictions, and different results under various restrictions indicate the sensitivity of the model to missing covariates. A simulation study was conducted to show the sensitivity of parameter estimation under different restrictions in a pattern-mixture model. The proposed approach was also applied to mouse leukemia data.

The EM algorithm for mixture regression with missing covariates (결측 공변량을 갖는 혼합회귀모형에서의 EM 알고리즘)

  • Kim, Hyungmin;Ham, Geonhee;Seo, Byungtae
    • The Korean Journal of Applied Statistics / v.29 no.7 / pp.1347-1359 / 2016
  • Finite mixtures of regression models provide an effective tool for exploring a hidden functional relationship between a response variable and covariates. In practice, however, data are often not fully observed for various reasons. In this paper, we derive an expectation-maximization (EM) algorithm to obtain the maximum likelihood estimator when some covariates are missing at random in finite mixtures of regression models. We conduct simulation studies and provide real data examples to show the validity of the derived EM algorithm.

Non-stationary frequency analysis of monthly maximum daily rainfall in summer season considering surface air temperature and dew-point temperature (지표면 기온 및 이슬점 온도를 고려한 여름철 월 최대 일 강수량의 비정상성 빈도해석)

  • Lee, Okjeong;Sim, Ingyeong;Kim, Sangdan
    • Journal of Wetlands Research / v.20 no.4 / pp.338-344 / 2018
  • In this study, the surface air temperature (SAT) and the dew-point temperature (DPT) are applied as covariates of the location parameter, one of the three parameters of the generalized extreme value (GEV) distribution, to reflect the non-stationarity of extreme rainfall due to climate change. Busan station is selected as the study site, and the monthly maximum daily rainfall depth from May to October is used for analysis. Various models are constructed to select the most appropriate covariate (SAT and DPT) function for the location parameter of the GEV distribution, and the model with the smallest Akaike Information Criterion (AIC) is selected as the optimal model. As a result, the non-stationary GEV distribution with exp(DPT) as the covariate is found to be the best. The selected model is used to analyze the effect of climate change scenarios on extreme rainfall quantiles. It is confirmed that the design rainfall depth is highly likely to increase as the future DPT increases.
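
The model-selection idea described above can be sketched roughly as follows. This is a minimal illustration on simulated data, not the authors' code or the Busan data: the covariate `z`, the linear location function, the true parameter values, and the helper names `neg_loglik` and `fit_gev` are all assumptions made for the example. It fits a stationary GEV and a GEV whose location parameter depends on a covariate, then compares the two by AIC.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import genextreme


def neg_loglik(params, x, design):
    """Negative GEV log-likelihood with location mu = design @ beta."""
    *beta, log_sigma, c = params  # c is SciPy's shape (opposite sign to xi)
    mu = design @ np.asarray(beta)
    ll = genextreme.logpdf(x, c, loc=mu, scale=np.exp(log_sigma))
    # logpdf is -inf when an observation falls outside the GEV support
    return -ll.sum() if np.all(np.isfinite(ll)) else 1e10


def fit_gev(x, design):
    """Maximum likelihood fit; returns (estimates, AIC).

    Assumes the first column of `design` is the intercept.
    """
    beta0, *_ = np.linalg.lstsq(design, x, rcond=None)  # least-squares start
    sigma0 = (x - design @ beta0).std() * np.sqrt(6) / np.pi
    beta0[0] -= 0.5772 * sigma0  # Gumbel mean is mu + 0.5772 * sigma
    start = np.concatenate([beta0, [np.log(sigma0), 0.0]])
    res = minimize(neg_loglik, start, args=(x, design), method="Nelder-Mead",
                   options={"maxiter": 5000, "maxfev": 5000})
    aic = 2 * len(start) + 2 * res.fun
    return res.x, aic


# Simulated maxima whose location shifts with a hypothetical covariate z
rng = np.random.default_rng(0)
n = 300
z = rng.normal(size=n)
x = genextreme.rvs(-0.1, loc=50.0 + 10.0 * z, scale=8.0, size=n,
                   random_state=rng)

ones = np.ones(n)
est_st, aic_st = fit_gev(x, np.column_stack([ones]))      # stationary
est_ns, aic_ns = fit_gev(x, np.column_stack([ones, z]))   # mu = b0 + b1 * z

print(f"AIC stationary:     {aic_st:.1f}")
print(f"AIC non-stationary: {aic_ns:.1f}")
print(f"estimated slope b1: {est_ns[1]:.2f}")
```

To compare other covariate functions, such as an exponential transform of the covariate, one would pass a different second column in the design matrix and again pick the model with the smallest AIC.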

ROC Curve Fitting with Normal Mixtures (정규혼합분포를 이용한 ROC 분석)

  • Hong, Chong-Sun;Lee, Won-Yong
    • The Korean Journal of Applied Statistics / v.24 no.2 / pp.269-278 / 2011
  • Many studies have considered distribution functions and appropriate covariates for the scores in order to improve the accuracy of a diagnostic test, including the ROC curve, which represents the relation between sensitivity and specificity. ROC analysis has been carried out with regression models that include covariates, under the assumption that the distribution function is known or estimable. In this work, we consider the general situation in which both the distribution function and the effects of covariates are unknown. For the ROC analysis, mixtures of normal distributions are used to estimate the distribution function fitted to credit evaluation data, which consist of the score random variable and two sub-populations of parameters. The AUC measure is used for comparison with the nonparametric and empirical ROC curves. We conclude that the method using normal mixtures agrees with the classical one more closely than the other methods.

Clinical data analysis in retrospective study through equality adjustment between groups (후향적연구의 집단 간 동등성확보를 통한 임상자료분석)

  • Kwak, Sang Gyu;Shin, Im Hee
    • Journal of the Korean Data and Information Science Society / v.26 no.6 / pp.1317-1325 / 2015
  • There are two types of clinical research for identifying risk factors for disease from collected data. One is the prospective study, which follows subjects forward from the present, and the other is the retrospective study, which looks for risk factors using subjects' past information. The approaches and study designs differ, but both aim to identify significant differences between two groups and to find which variables influence the groups. In particular, when comparing two groups in clinical research, we have to examine how the effects of clinical variables differ by group while controlling for baseline characteristics such as age and sex. In a retrospective study, however, differences in baseline characteristics occur more often because the past records did not assign subjects to the two groups at random. Clinical data analysis uses covariates to address this problem. Typical methods are the analysis of covariance, adjusted models, and propensity score matching. This study introduces methods of equality adjustment between groups using covariates in retrospective clinical studies and applies them to gastric cancer recurrence data.
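
As a rough illustration of covariate adjustment in a non-randomized comparison, the sketch below simulates data in which group membership depends on age, then contrasts the raw group difference with an analysis-of-covariance style estimate from a linear model that includes age. Everything here is assumed for the example (the simulated data, the effect sizes, and the variable names); it is not the gastric cancer data or the authors' procedure.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Baseline covariate; older subjects are more likely to be in group 1,
# mimicking non-random group assignment in retrospective records.
age = rng.normal(60.0, 10.0, size=n)
p_group1 = 1.0 / (1.0 + np.exp(-(age - 60.0) / 10.0))
group = (rng.uniform(size=n) < p_group1).astype(float)

# Outcome depends on both group (true effect 2.0) and age.
true_effect = 2.0
y = true_effect * group + 0.5 * age + rng.normal(0.0, 5.0, size=n)

# Unadjusted comparison: difference in group means (confounded by age).
unadjusted = y[group == 1].mean() - y[group == 0].mean()

# Adjusted comparison: least-squares fit of y on intercept, group, and age.
X = np.column_stack([np.ones(n), group, age])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
adjusted = coef[1]

print(f"true effect:         {true_effect:.2f}")
print(f"unadjusted estimate: {unadjusted:.2f}")
print(f"adjusted estimate:   {adjusted:.2f}")
```

In this simulation the unadjusted difference absorbs the age imbalance between groups, while the adjusted coefficient should land close to the true effect. Propensity score matching pursues the same goal by balancing the covariates before the comparison rather than modeling them in the outcome regression.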

Analysis of the cause-specific proportional hazards model with missing covariates (누락된 공변량을 가진 원인별 비례위험모형의 분석)

  • Lee, Minjung
    • The Korean Journal of Applied Statistics / v.37 no.2 / pp.225-237 / 2024
  • In the analysis of competing risks data, some covariates may not be fully observed for some subjects. In such cases, excluding subjects with missing covariate values from the analysis may result in biased estimates and loss of efficiency. In this paper, we study multiple imputation and the augmented inverse probability weighting method for regression parameter estimation in the cause-specific proportional hazards model with missing covariates. The performance of the estimators obtained from the two methods is evaluated by simulation studies, which show that both perform well. Multiple imputation and the augmented inverse probability weighting method were applied to investigate significant risk factors for death from breast cancer and from other causes, using breast cancer data with missing values for tumor size obtained from the Prostate, Lung, Colorectal, and Ovarian Cancer Screen Trial Study. Under the cause-specific proportional hazards model, the methods show that race, marital status, stage, grade, and tumor size are significant risk factors for breast cancer mortality, and that stage has the greatest effect on increasing the risk of breast cancer death. Age at diagnosis and tumor size have significant effects on increasing the risk of other-cause death.

Analysis of stage III proximal colon cancer using the Cox proportional hazards model (Cox 비례위험모형을 이용한 우측 대장암 3기 자료 분석)

  • Lee, Taeseob;Lee, Minjung
    • Journal of the Korean Data and Information Science Society / v.28 no.2 / pp.349-359 / 2017
  • In this paper, we conducted survival analyses by fitting the Cox proportional hazards model to stage III proximal colon cancer data obtained from the Surveillance, Epidemiology, and End Results program of the National Cancer Institute. We investigated the effect of covariates on the hazard function for death from stage III proximal colon cancer in patients who underwent surgery, and estimated the survival probability for a patient with specific covariates. We showed that the proportional hazards assumption is satisfied for the covariates used in the analyses, using a test based on the Schoenfeld residuals and plots of the Schoenfeld residuals and $\log[-\log\{\hat{S}(t)\}]$. We evaluated model calibration and discriminatory accuracy using a calibration plot and the time-dependent area under the ROC curve, both calculated with 10-fold cross validation.
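
A brief note on why the log-minus-log survival plot is a check of proportional hazards. This is a standard property of the model, written here with a generic covariate vector $x$, coefficient vector $\beta$, and baseline survival function $S_0(t)$; the notation is assumed for illustration and is not taken from the paper.

$$
S(t \mid x) = S_0(t)^{\exp(\beta^\top x)}
\quad\Longrightarrow\quad
\log[-\log\{S(t \mid x)\}] = \beta^\top x + \log[-\log\{S_0(t)\}]
$$

If the assumption holds, the curves for different covariate values should therefore be roughly parallel over time, separated vertically by the difference in $\beta^\top x$. Curves that cross or clearly diverge suggest a violation.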

Covariate selection criteria for controlling confounding bias in a causal study (인과연구에서 중첩편향을 제거하기 위한 공변량선택기준)

  • Thepepomma, Seethad;Kim, Ji-Hyun
    • The Korean Journal of Applied Statistics / v.29 no.5 / pp.849-858 / 2016
  • It is important to control confounding bias when estimating the causal effect of a treatment in an observational study. We illustrate that covariate selection in causal inference differs from variable selection in the ANCOVA model. We then investigate three criteria of covariate selection for controlling confounding bias, which can be used when we have inadequate information to draw a complete causal graph. VanderWeele and Shpitser (2011) proposed one of them and claimed it was better than the other two. We show by example that their criterion also has limitations and some disadvantages. There is no clear winner; however, their criterion is better than the other two (if some correction is made to its condition) because it can remove the confounding bias.