Improved Statistical Testing of Two-class Microarrays with a Robust Statistical Approach

Oh, Hee-Seok; Jang, Dong-Ik; Oh, Seung-Yoon; Kim, Hee-Bal

doi:10.4051/ibc.2010.2.2.0004

Interdisciplinary Bio Central

Volume 2 Issue 2 / Pages 4.1-4.6 / 2010 / 2005-8543 (eISSN)

Korean Society for Bioinformatics (한국생명정보학회)

Oh, Hee-Seok (Department of Statistics, Seoul National University) ;
Jang, Dong-Ik (Department of Statistics, Seoul National University) ;
Oh, Seung-Yoon (Interdisciplinary Program in Bioinformatics, Seoul National University) ;
Kim, Hee-Bal (Interdisciplinary Program in Bioinformatics, Seoul National University)

Received : 2010.03.17
Accepted : 2010.05.31
Published : 2010.06.30

https://doi.org/10.4051/ibc.2010.2.2.0004

Abstract

The most common type of microarray experiment has a simple design that compares data obtained from two different groups or conditions. A typical method for identifying differentially expressed genes (DEGs) between two conditions is the conventional Student's t-test. The t-test estimates the population variance of a gene from the sample variance of its expression levels. The empirical Bayes approach improves on the t-statistic by not giving a high rank to genes merely because they have a small sample variance, but it rests on the same basic assumption as the ordinary t-test: equality of variances across experimental groups. Both the t-test and the empirical Bayes approach also assume normal, unimodal distributions, and they suffer from low statistical power when microarray data violate that assumption. We propose a method that addresses these problems and is robust to outliers and skewed data, while maintaining the advantages of the classical t-test and modified t-statistics. The resulting data transformation fits the data to the normality assumption and thereby increases the statistical power of these statistics for identifying DEGs.
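To make the outlier problem concrete, the following minimal Python sketch computes the classical equal-variance t-statistic for one gene, with and without a single outlying measurement pulled back toward the bulk of the data. It is an illustration under stated assumptions, not the transformation proposed in the paper: the function names `pooled_t` and `huber_clip`, the tuning constant `c=1.345`, and the expression values are all hypothetical, and the clipping step is a simple Huber-style winsorization chosen only because it is easy to follow.

```python
import math
import statistics


def pooled_t(x, y):
    """Classical two-sample Student's t-statistic (equal-variance form)."""
    nx, ny = len(x), len(y)
    # Pooled sample variance: the per-gene variance estimate the t-test relies on.
    sp2 = ((nx - 1) * statistics.variance(x)
           + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    return (statistics.mean(x) - statistics.mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))


def huber_clip(values, c=1.345):
    """Clip values lying more than c robust standard deviations from the median.

    Hypothetical stand-in for a robust transformation; NOT the paper's method.
    """
    med = statistics.median(values)
    # Median absolute deviation, rescaled to estimate the SD under normality.
    mad = statistics.median([abs(v - med) for v in values])
    scale = 1.4826 * mad
    if scale == 0:
        return list(values)
    lo, hi = med - c * scale, med + c * scale
    return [min(max(v, lo), hi) for v in values]


# Made-up log-expression values for one gene; the last treated value is an outlier.
control = [5.1, 4.9, 5.0, 5.2, 4.8]
treated = [6.0, 6.2, 5.9, 6.1, 9.5]

t_raw = pooled_t(control, treated)
t_clipped = pooled_t(huber_clip(control), huber_clip(treated))
print(f"t on raw data:     {t_raw:.2f}")
print(f"t on clipped data: {t_clipped:.2f}")
```

In this toy example the single outlier inflates the pooled variance, so the raw t-statistic is much smaller in magnitude than the one computed after clipping, even though the group means clearly differ. A statistic computed on clipped data no longer follows the exact Student's t distribution, so the sketch shows only the direction of the effect, not a calibrated test.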

