Statistical Method of Ranking Candidate Genes for the Biomarker

  • Kim, Byung-Soo (Department of Applied Statistics, Yonsei University) ;
  • Kim, In-Young (Department of Epidemiology and Public Health, School of Medicine, Yale University) ;
  • Lee, Sun-Ho (Department of Applied Mathematics, Sejong University) ;
  • Rha, Sun-Young (Cancer Metastasis Research Center, College of Medicine, Yonsei University)
  • Published : 2007.04.30


The receiver operating characteristic (ROC) approach can be used to rank candidate genes from a microarray experiment, in particular for biomarker development aimed at population screening for a cancer. In a cancer microarray experiment based on $n$ patients, the researcher often wants to compare tumor tissue with normal tissue from the same individual using a common reference RNA. Ideally, this experiment produces $n$ pairs of microarray data. In practice, however, values are often missing from either the normal or the tumor tissue data, so that we have $n_1$ pairs of complete observations, $n_2$ "normal only" observations, and $n_3$ "tumor only" observations. We refer to such a data set as a mixed data set. We develop an ROC approach for the mixed data set to rank candidate genes for biomarker development in colorectal cancer screening. The correlation between the ROC-based and t-statistic-based ranks, computed on the top 50 genes by ROC rank, turns out to be less than 0.6. This result indicates that choosing an appropriate method of ranking candidate genes for biomarker development matters for the allocation of resources.
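The ranking comparison described above can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes the mixed data set is handled simply by pooling every available normal value ($n_1 + n_2$) and every available tumor value ($n_1 + n_3$) for each gene, uses the empirical area under the ROC curve as the ROC summary, and uses a Welch-type t statistic that ignores the pairing. The function names, the `None` encoding of missing values, and the toy expression values are all hypothetical.

```python
import math
from statistics import mean, variance

def split_mixed(normal, tumor):
    """Drop missing entries (None) from the per-patient value lists."""
    return ([x for x in normal if x is not None],
            [x for x in tumor if x is not None])

def empirical_auc(normal, tumor):
    """Mann-Whitney estimate of P(tumor > normal) + 0.5 * P(tumor == normal)."""
    wins = 0.0
    for t in tumor:
        for x in normal:
            if t > x:
                wins += 1.0
            elif t == x:
                wins += 0.5
    return wins / (len(normal) * len(tumor))

def welch_t(normal, tumor):
    """Two-sample t statistic with unequal variances (pairing ignored)."""
    se = math.sqrt(variance(normal) / len(normal) + variance(tumor) / len(tumor))
    return (mean(tumor) - mean(normal)) / se

def rank_desc(scores):
    """Rank 1 goes to the largest score; ties are broken by position."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks

def rank_correlation(ranks_a, ranks_b):
    """Pearson correlation of two rank vectors (needs at least 2 distinct ranks)."""
    ma, mb = mean(ranks_a), mean(ranks_b)
    cov = sum((a - ma) * (b - mb) for a, b in zip(ranks_a, ranks_b))
    va = sum((a - ma) ** 2 for a in ranks_a)
    vb = sum((b - mb) ** 2 for b in ranks_b)
    return cov / math.sqrt(va * vb)

def top_k_rank_correlation(roc_ranks, t_ranks, k):
    """Correlate the two rankings over the k genes ranked best by ROC."""
    top = [i for i, r in enumerate(roc_ranks) if r <= k]
    return rank_correlation([roc_ranks[i] for i in top],
                            [t_ranks[i] for i in top])

# Hypothetical toy data: 5 patients, 3 complete pairs, 1 normal only, 1 tumor only.
genes = {
    "geneA": ([1.0, 1.2, 0.9, 1.1, None], [2.0, 2.3, 1.9, None, 2.1]),
    "geneB": ([1.0, 1.5, 0.7, 1.2, None], [1.1, 1.4, 0.9, None, 1.6]),
    "geneC": ([2.0, 2.2, 1.8, 2.1, None], [1.0, 1.9, 0.8, None, 1.2]),
}

roc_scores, t_scores = [], []
for name, (normal, tumor) in genes.items():
    nrm, tum = split_mixed(normal, tumor)
    auc = empirical_auc(nrm, tum)
    # Up- and down-regulated genes are treated as equally useful.
    roc_scores.append(max(auc, 1.0 - auc))
    t_scores.append(abs(welch_t(nrm, tum)))

roc_ranks = rank_desc(roc_scores)
t_ranks = rank_desc(t_scores)
print(dict(zip(genes, zip(roc_ranks, t_ranks))))
print(top_k_rank_correlation(roc_ranks, t_ranks, k=3))
```

In a real analysis `k` would be 50, as in the comparison reported above, and the within-patient correlation of the $n_1$ complete pairs would need to be accounted for when assessing the variability of the ROC summary; this sketch only produces point estimates for ranking.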

