Bayesian Variable Selection in the Proportional Hazard Model with Application to Microarray Data

  • Lee, Kyeong-Eun (Department of Statistics, Kyungpook National University)
  • Mallick, Bani K. (Department of Statistics, Texas A&M University)
  • Published : 2005.05.20

Abstract

In this paper we consider the well-known semiparametric proportional hazards model for survival analysis. This model is usually applied with few covariates and many observations (subjects). In a typical gene expression study based on DNA microarrays, however, the number of covariates $p$ exceeds the number of samples $n$. Given a vector of response values, which are times to event (death or censoring times), and $p$ gene expressions (covariates), we address the problem of reducing the dimension by selecting the significant genes. This approach enables us to estimate the survival curve when $n \ll p$. Rather than fixing the number of selected genes, we assign a prior distribution to this number. The approach gains flexibility by allowing constraints to be imposed, such as bounding the dimension via a prior, which in effect works as a penalty. To implement our methodology, we use a Markov chain Monte Carlo (MCMC) method. We demonstrate the methodology on diffuse large B-cell lymphoma (DLBCL) complementary DNA (cDNA) data and on breast carcinoma data.
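
As a brief illustration of how a prior on the number of selected genes can act as a penalty, consider the following sketch. The hazard is the standard proportional hazards form; the inclusion indicators $\gamma_j$, the inclusion probability $\pi$, and the independent Bernoulli prior are assumptions introduced here for illustration and are not necessarily the specification used in the paper.

$$
h(t \mid x) = h_0(t)\,\exp\!\Big(\sum_{j=1}^{p} \gamma_j \beta_j x_j\Big), \qquad \gamma_j \in \{0, 1\},
$$

$$
\gamma_j \overset{\text{iid}}{\sim} \text{Bernoulli}(\pi)
\;\Longrightarrow\;
\log p(\gamma) = |\gamma| \log\frac{\pi}{1-\pi} + p \log(1-\pi),
\qquad |\gamma| = \sum_{j=1}^{p} \gamma_j .
$$

Under this assumed prior the number of selected genes $|\gamma|$ follows a $\text{Binomial}(p, \pi)$ distribution. For $\pi < 1/2$ the coefficient $\log\{\pi/(1-\pi)\}$ is negative, so each additional selected gene lowers the log prior by a fixed amount, much like a penalty on model size.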

Keywords