Permutation-Based Test with Small Samples for Detecting Differentially Expressed Genes

  • Lee, Ju-Hyoung (Department of Biostatistics, Medical College, The Catholic University of Korea)
  • Song, Hae-Hiang (Department of Biostatistics, Medical College, The Catholic University of Korea)
  • Published: 2009.10.31

Abstract

In the analysis of microarray data with a very small number of samples (arrays), the main task is to identify differentially expressed genes by means of a test statistic. This requires generating a null distribution from the expression values of thousands or tens of thousands of genes, and when the number of samples is very small, permutation methods are the most suitable way to generate it. In this paper we propose very simple test statistics that can be used to construct the null distribution, together with permutation methods appropriate for constructing it. In a simulation study, we compare the null distributions generated by existing test statistics with those generated by the proposed test statistics, and we apply the methods to a real microarray data set to detect differentially expressed genes.
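The abstract describes building a null distribution by pooling permuted statistics across many genes when each gene has only a handful of arrays. The Python sketch below illustrates that general idea under stated assumptions: it uses a plain difference of group means as the test statistic and enumerates every reassignment of arrays to the two groups. Neither choice is taken from the paper, whose proposed statistics and permutation schemes are not specified in this abstract. The function names `mean_diff` and `pooled_permutation_pvalues` and the data layout (the first `n1` columns form group 1) are hypothetical.

```python
import bisect
import itertools
import random
import statistics


def mean_diff(row, group1_idx):
    """Mean of the arrays in group1_idx minus the mean of the remaining arrays.

    A stand-in statistic for illustration, not the statistic proposed in the paper.
    """
    chosen = set(group1_idx)
    g1 = [x for i, x in enumerate(row) if i in chosen]
    g2 = [x for i, x in enumerate(row) if i not in chosen]
    return statistics.fmean(g1) - statistics.fmean(g2)


def pooled_permutation_pvalues(data, n1):
    """Two-sided p-values against a null distribution pooled over all genes.

    data: one list of expression values per gene; the first n1 columns are
    assumed to be group 1 and the remaining columns group 2 (hypothetical layout).
    """
    n = len(data[0])
    group1 = set(range(n1))
    group2 = set(range(n1, n))
    # All reassignments of n1 arrays to group 1, dropping the observed
    # labeling and its mirror image (the latter only exists when n1 == n - n1).
    relabelings = [
        c for c in itertools.combinations(range(n), n1)
        if set(c) != group1 and set(c) != group2
    ]
    observed = [abs(mean_diff(row, range(n1))) for row in data]
    # Pool |T| from every gene under every relabeling into one null sample.
    null = sorted(abs(mean_diff(row, c)) for row in data for c in relabelings)
    m = len(null)
    pvals = []
    for t in observed:
        # k = number of pooled null statistics with |T| >= observed |T|.
        k = m - bisect.bisect_left(null, t)
        # The +1 in numerator and denominator keeps every p-value above zero.
        pvals.append((k + 1) / (m + 1))
    return pvals


if __name__ == "__main__":
    rng = random.Random(2009)
    # Hypothetical simulated data: 200 genes, 3 vs 3 arrays;
    # genes 0-9 are shifted upward by 3 units in group 1.
    genes = [[rng.gauss(0.0, 1.0) for _ in range(6)] for _ in range(200)]
    for row in genes[:10]:
        for j in range(3):
            row[j] += 3.0
    p = pooled_permutation_pvalues(genes, n1=3)
    print(sum(x < 0.01 for x in p[:10]), "of 10 shifted genes have p < 0.01")
```

With 3 + 3 arrays there are only C(6,3) = 20 relabelings per gene, and for a two-sided statistic the observed labeling and its mirror image give the same |T|, so a purely per-gene permutation p-value can never fall below 2/20 = 0.1. Pooling the permuted statistics of all genes is what gives the null distribution enough resolution to detect differential expression. Excluding the observed labeling and its mirror from the pool keeps the observed statistics of truly shifted genes from inflating the null.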
