Improved Statistical Testing of Two-class Microarrays with a Robust Statistical Approach

  • Oh, Hee-Seok (Department of Statistics, Seoul National University) ;
  • Jang, Dong-Ik (Department of Statistics, Seoul National University) ;
  • Oh, Seung-Yoon (Interdisciplinary Program in Bioinformatics, Seoul National University) ;
  • Kim, Hee-Bal (Interdisciplinary Program in Bioinformatics, Seoul National University)
  • Received : 2010.03.17
  • Accepted : 2010.05.31
  • Published : 2010.06.30

Abstract

The most common type of microarray experiment has a simple design that compares data obtained from two different groups or conditions. A typical method for identifying differentially expressed genes (DEGs) between two conditions is the conventional Student's t-test, which estimates each gene's population variance simply by the sample variance of its expression levels. Although the empirical Bayes approach improves on the t-statistic by not ranking a gene highly merely because it has a small sample variance, it rests on the same basic assumption as the ordinary t-test: equal variances across experimental groups. Both the t-test and the empirical Bayes approach suffer from low statistical power because they assume that microarray data follow normal, unimodal distributions. We propose a method that addresses these problems and is robust to outliers and skewed data, while retaining the advantages of the classical t-test and of modified t-statistics. The resulting data transformation, which makes the data better fit the normality assumption, increases the statistical power of these statistics for identifying DEGs.
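The contrast the abstract draws between the ordinary t-statistic and the empirical Bayes (moderated) t-statistic can be illustrated with a minimal Python sketch. It assumes a pooled-variance Student's t and a Smyth-style moderated t in which each gene's pooled sample variance is shrunk toward a prior variance. The prior values `D0` and `S0_SQ` are hypothetical and fixed by hand rather than estimated across all genes as a real analysis would do, and the two toy genes are invented for illustration. This is not the robust transformation proposed in the paper, whose details the abstract does not give.

```python
import math
from statistics import mean, variance


def pooled_variance(x, y):
    """Pooled (equal-variance) sample variance of two groups."""
    n1, n2 = len(x), len(y)
    return ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)


def pooled_t(x, y):
    """Ordinary two-sample Student's t-statistic with pooled variance."""
    n1, n2 = len(x), len(y)
    s2 = pooled_variance(x, y)
    return (mean(x) - mean(y)) / math.sqrt(s2 * (1 / n1 + 1 / n2))


def moderated_t(x, y, d0, s0_sq):
    """Moderated t-statistic: the gene's pooled variance is shrunk toward a
    prior variance s0_sq that carries d0 prior degrees of freedom."""
    n1, n2 = len(x), len(y)
    d = n1 + n2 - 2  # residual degrees of freedom for this gene
    s2_post = (d0 * s0_sq + d * pooled_variance(x, y)) / (d0 + d)
    return (mean(x) - mean(y)) / math.sqrt(s2_post * (1 / n1 + 1 / n2))


# Hypothetical prior (assumption): in practice these are estimated from all genes.
D0, S0_SQ = 4, 0.05

# Invented toy data: (condition 1 replicates, condition 2 replicates).
gene_a = ([5.02, 5.03, 5.01], [4.99, 5.00, 4.98])  # tiny change, tiny variance
gene_b = ([6.0, 6.5, 5.5], [4.9, 5.4, 4.4])        # large change, moderate variance

for name, (x, y) in [("gene A", gene_a), ("gene B", gene_b)]:
    print(name, round(pooled_t(x, y), 2), round(moderated_t(x, y, D0, S0_SQ), 2))
```

With these numbers the ordinary t is about 3.67 for gene A and 2.69 for gene B, so gene A ranks first only because its replicates barely vary; after shrinkage the moderated t is about 0.23 for gene A and 3.48 for gene B, reversing the ranking.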
