Developing a Parametric Method for Testing the Significance of Gene Sets in Microarray Data Analysis

마이크로어레이 자료분석에서 모수적 방법을 이용한 유전자군의 유의성 검정

  • Lee, Sun-Ho (이선호; Department of Applied Statistics, Sejong University) ;
  • Lee, Seung-Kyu (이승규; Department of Applied Statistics, Sejong University) ;
  • Lee, Kwang-Hyun (이광현; Department of Applied Statistics, Sejong University)
  • Published : 2009.05.31

Abstract

The development of microarray technology makes it possible to analyse many thousands of genes simultaneously. While it is important to test whether each gene shows changes in expression associated with a phenotype, human diseases are thought to occur through the interactions of multiple genes within the same functional category. Recent research therefore aims to test the behavior of sets of functionally related genes directly, instead of focusing on single genes. Gene set enrichment analysis (GSEA), significance analysis of microarray for gene sets (SAM-GS) and parametric analysis of gene set enrichment (PAGE) have been widely applied as tools for gene-set analysis. We describe their problems and propose an alternative parametric method that adopts a normal score transformation of gene expression values. The performance of the newly derived method is compared with that of the previous methods on three real microarray datasets.

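Microarray technology has made it possible to observe the expression patterns of tens of thousands of genes simultaneously, and basic information for disease diagnosis, the establishment of treatments, and new drug development has been built around differentially expressed genes found by testing genes one at a time. However, as various problems with single-gene analysis came to light, methods were proposed that analyze groups of genes sharing a biological pathway or chromosomal location in order to find the groups that affect disease occurrence or survival. Research on the significance of such gene sets began at MIT in 2002, and GSEA, SAM-GS, and PAGE, a parametric method based on the central limit theorem, are now in use. In this paper we propose a new parametric method that overcomes the structural limitations of these statistics and is simple to compute, and we demonstrate its efficiency through data analysis.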
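The abstract names two ingredients, a normal score transformation and a PAGE-style parametric test, without giving formulas. The following is a minimal Python sketch of those two ingredients only, not the authors' actual statistic. It assumes Blom's normal score formula, Φ⁻¹((r − 3/8)/(n + 1/4)), and the usual PAGE z-score, (set mean − overall mean)·√m / overall standard deviation. The function names, the example values, and the gene-set indices are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, pstdev


def blom_normal_scores(values):
    """Map values to Blom normal scores: Phi^-1((r - 3/8) / (n + 1/4))."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - 0.375) / (n + 0.25)) for r in ranks]


def gene_set_z(scores, set_indices):
    """PAGE-style z-score comparing a gene set's mean with all genes."""
    mu = mean(scores)
    sigma = pstdev(scores)
    m = len(set_indices)
    set_mean = mean(scores[i] for i in set_indices)
    return (set_mean - mu) * sqrt(m) / sigma


def two_sided_p(z):
    """Two-sided p-value under a standard normal reference distribution."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical per-gene values and a hypothetical gene set (indices into the list).
gene_values = [2.1, -0.3, 0.8, 1.9, -1.2, 0.1, 2.5, -0.7, 0.4, -2.0]
gene_set = [0, 3, 6]

scores = blom_normal_scores(gene_values)
z = gene_set_z(scores, gene_set)
p = two_sided_p(z)
print(f"z = {z:.3f}, p = {p:.4f}")
```

Transforming to normal scores first makes the per-gene values approximately standard normal by construction, so the normal reference distribution for the set mean no longer relies on the central limit theorem alone. Whether the paper applies the transformation to raw expression values or to per-gene statistics is not stated in the abstract.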

References

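  1. Lee, K.-H. and Lee, S.-H. (2008). 절대치와 절삭을 이용한 유전자 집단 분석 [Gene set analysis using absolute values and truncation], 응용통계연구 (The Korean Journal of Applied Statistics), 21, 523-535 https://doi.org/10.5351/KJAS.2008.21.3.523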
  2. Barry, W. T., Nobel, A. B. and Wright, F. A. (2005). Significance analysis of functional categories in gene expression studies: A structured permutation approach, Bioinformatics, 21, 1943-1949 https://doi.org/10.1093/bioinformatics/bti260
  3. Blom, G. (1958). Statistical Estimates and Transformed Beta-Variables, John Wiley & Sons, New York
  4. Curtis, R. K., Oresic, M. and Vidal-Puig, A. (2005). Pathways to the analysis of microarray data, Trends in Biotechnology, 23, 429-435 https://doi.org/10.1016/j.tibtech.2005.05.011
  5. Damian, D. and Gorfine, M. (2004). Statistical concerns about the GSEA procedure, Nature Genetics, 36, 663 https://doi.org/10.1038/ng0704-663a
  6. Dinu, I., Potter, J. D., Mueller, T., Adewale, A. J., Jhangri, G. S., Einecke, G., Famulski, K. S., Halloran, P. and Yasui, Y. (2007). Improving GSEA for analysis of biologic pathways for differential gene expression across a binary phenotype, COBRA Preprint Series, Article 16
  7. Doniger, S. W., Salomonis, N., Dahlquist, K. D., Vranizan, K., Lawlor, S. C. and Conklin, B. R. (2003). MAPPFinder: Using Gene Ontology and GenMAPP to create a global gene-expression profile from microarray data, Genome Biology, 4, R7 https://doi.org/10.1186/gb-2003-4-1-r7
  8. Draghici, S., Khatri, P., Martins, R. P., Ostermeier, G. C. and Krawetz, S. A. (2003). Global functional profiling of gene expression, Genomics, 81, 98-104 https://doi.org/10.1016/S0888-7543(02)00021-6
  9. Efron, B. and Tibshirani, R. (2007). On testing the significance of sets of genes, The Annals of Applied Statistics, 1, 107-129 https://doi.org/10.1214/07-AOAS101
  10. Goeman, J. J., van de Geer, S. A., de Kort, F. and van Houwelingen, H. C. (2004). A global test for groups of genes: Testing association with a clinical outcome, Bioinformatics, 20, 93-99 https://doi.org/10.1093/bioinformatics/btg382
  11. Goeman, J. J., Oosting, J., Cleton-Jansen, A. M., Anninga, J. K. and van Houwelingen, H. C. (2005). Testing association of a pathway with survival using gene expression data, Bioinformatics, 21, 1950-1957 https://doi.org/10.1093/bioinformatics/bti267
  12. Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Coller, H., Loh, M. L., Downing, J. R., Caligiuri, M. A., Bloomfield, C. D. and Lander, E. S. (1999). Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring, Science, 286, 531-537 https://doi.org/10.1126/science.286.5439.531
  13. Khatri, P., Bhavsar, P., Bawa, G. and Draghici, S. (2004). Onto-Tools: An ensemble of web-accessible, ontology-based tools for the functional design and interpretation of high-throughput gene expression experiments, Nucleic Acids Research, 32, 449-456 https://doi.org/10.1093/nar/gkh409
  14. Kim, S. Y. and Volsky, D. J. (2005). PAGE: Parametric analysis of gene set enrichment, BMC Bioinformatics, 6, 144 https://doi.org/10.1186/1471-2105-6-144
  15. Mootha, V. K., Lindgren, C. M., Eriksson, K. F., Subramanian, A., Sihag, S., Lehar, J., Puigserver, P., Carlsson, E., Ridderstrale, M., Laurila, E., Houstis, N., Daly, M. J., Patterson, N., Mesirov, J. P., Golub, T. R., Tamayo, P., Spiegelman, B., Lander, E. S., Hirschhorn, J. N., Altshuler, D. and Groop, L. C. (2003). PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes, Nature Genetics, 34, 267-273 https://doi.org/10.1038/ng1180
  16. Newton, M. A., Quintana, F. A., den Boon, J. A., Sengupta, S. and Ahlquist, P. (2007). Random-set methods identify distinct aspects of the enrichment signal in gene-set analysis, The Annals of Applied Statistics, 1, 85-106 https://doi.org/10.1214/07-AOAS104
  17. Pavlidis, P., Lewis, D. P. and Noble, W. S. (2002). Exploring gene expression data with class scores, In Proceedings of the Pacific Symposium on Biocomputing, 474-485
  18. Subramanian, A., Tamayo, P., Mootha, V. K., Mukherjee, S., Ebert, B. L., Gillette, M. A., Paulovich, A., Pomeroy, S. L., Golub, T. R., Lander, E. S. and Mesirov, J. P. (2005). Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles, PNAS, 102, 15545-15550 https://doi.org/10.1073/pnas.0506580102
  19. Tibshirani, R., Hastie, T., Narasimhan, B. and Chu, G. (2002). Diagnosis of multiple cancer types by shrunken centroids of gene expression, PNAS, 99, 6567-6572 https://doi.org/10.1073/pnas.082099299
  20. Tusher, V. G., Tibshirani, R. and Chu, G. (2001). Significance analysis of microarrays applied to the ionizing radiation response, PNAS, 98, 5116-5121 https://doi.org/10.1073/pnas.091062498

Cited by

  1. Identifying statistically significant gene sets based on differential expression and differential coexpression vol.29, pp.3, 2016, https://doi.org/10.5351/KJAS.2016.29.3.437