Statistical Analysis for Feature Subset Selection Procedures.

  • Kim, In-Young (Cancer Metastasis Research Center, Yonsei University College of Medicine) ;
  • Lee, Sun-Ho (Department of Applied Mathematics, Sejong University) ;
  • Kim, Sang-Cheol (Cancer Metastasis Research Center, Yonsei University College of Medicine) ;
  • Rha, Sun-Young (Cancer Metastasis Research Center, Yonsei University College of Medicine) ;
  • Chung, Hyun-Cheol (Cancer Metastasis Research Center, Yonsei University College of Medicine) ;
  • Kim, Byung-Soo (Department of Applied Statistics, Yonsei University)
  • Published : 2003.10.31

Abstract

In this paper, we propose using Hotelling's T² statistic to detect a set of differentially expressed (DE) genes in colorectal cancer, based on gene expression levels in tumor tissues compared with those in normal tissues, and we evaluate its predictivity, which lets us rank genes for the development of biomarkers for population screening of colorectal cancer. We compared the prediction rates based on the DE genes selected by Hotelling's T² statistic and by the univariate t statistic using various prediction methods, namely regularized discriminant analysis and a support vector machine. The results show that the prediction rate based on T² is better than that based on the univariate t. This implies that it may not be sufficient to look at each gene in isolation, and that evaluating combinations of genes reveals interesting information that would not be discovered otherwise.
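
To make the contrast between the two statistics concrete, the following is a minimal sketch of a two-sample Hotelling's T² computed on a candidate gene subset. It is an illustration under stated assumptions, not the procedure used in the paper: it assumes two independent groups (tumor and normal) with a pooled covariance matrix, and the function names `hotelling_t2` and `t2_to_f` are hypothetical. The abstract does not say whether the samples were treated as paired or independent, nor how gene subsets were searched.

```python
import numpy as np


def hotelling_t2(x, y):
    """Two-sample Hotelling's T^2 for a gene subset.

    x: (n1, p) expression values for group 1 (e.g. tumor), one column per gene.
    y: (n2, p) expression values for group 2 (e.g. normal).
    Assumes independent groups and a common covariance matrix (an assumption
    made for this sketch, not something stated in the abstract).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = x.shape[0], y.shape[0]

    # Difference of the mean expression vectors.
    diff = x.mean(axis=0) - y.mean(axis=0)

    # Pooled sample covariance; atleast_2d handles the single-gene case.
    s_pooled = (
        (n1 - 1) * np.cov(x, rowvar=False) + (n2 - 1) * np.cov(y, rowvar=False)
    ) / (n1 + n2 - 2)
    s_pooled = np.atleast_2d(s_pooled)

    # T^2 = (n1 n2 / (n1 + n2)) * diff' S^{-1} diff
    return float(n1 * n2 / (n1 + n2) * diff @ np.linalg.solve(s_pooled, diff))


def t2_to_f(t2, n1, n2, p):
    """Convert T^2 to the statistic that follows F(p, n1 + n2 - p - 1)
    under the null hypothesis, assuming multivariate normality."""
    return t2 * (n1 + n2 - p - 1) / (p * (n1 + n2 - 2))
```

For a single gene (p = 1), this T² reduces to the square of the pooled two-sample t statistic, so the univariate t is a special case. With more than one gene, the covariance term lets correlated genes contribute jointly, which is the kind of information a gene-by-gene ranking cannot capture. Note that the pooled covariance is invertible only when the subset size p is at most n1 + n2 - 2, so this form applies to small gene subsets rather than to all genes on an array at once.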

Keywords