Comparison of Univariate and Multivariate Gene Set Analysis in Acute Lymphoblastic Leukemia

  • Soheila, Khodakarim (Department of Epidemiology, Faculty of Public Health, Shahid Beheshti University of Medical Sciences) ;
  • Hamid, AlaviMajd (Department of Biostatistics, Faculty of Paramedical Sciences, Shahid Beheshti University of Medical Sciences) ;
  • Farid, Zayeri (Department of Biostatistics, Faculty of Paramedical Sciences, Shahid Beheshti University of Medical Sciences) ;
  • Mostafa, Rezaei-Tavirani (Proteomics Research Center, Shahid Beheshti University of Medical Sciences, Tehran University) ;
  • Nasrin, Dehghan-Nayeri (Department of Proteomics, Faculty of Paramedical Sciences, Shahid Beheshti University of Medical Sciences) ;
  • Syyed-Mohammad, Tabatabaee (Department of Medical Informatics, Faculty of Paramedical Sciences, Shahid Beheshti University of Medical Sciences) ;
  • Vahide, Tajalli (Department of Linguistics, Faculty of Literature and Human Sciences, Tehran University)
  • Published: 2013.03.30


Background: Gene set analysis (GSA) combines biological knowledge with statistical methods to identify gene sets that are differentially expressed between two or more phenotypes. Materials and Methods: In this paper, GSA methods were used to determine the gene sets differentially expressed between acute lymphoblastic leukaemia (ALL) with BCR-ABL and ALL with no observed cytogenetic abnormalities. BCR-ABL is an abnormal gene found in some people with ALL. Results: Of the two GSA methods, the Category test identified 30 gene sets differentially expressed between the two phenotypes, whereas Hotelling's $T^2$ test identified only 19. Moreover, an assessment of the genes shared among the significant gene sets showed high agreement between the GSA results and the findings reported by biologists. In addition, the performance of the two methods was compared using simulated data and the ALL data. Conclusions: On the simulated data, as the correlation between gene pairs increased, the multivariate test (Hotelling's $T^2$) showed a decreasing type I error rate and increasing power, in contrast to the univariate test (Category).
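The abstract contrasts a univariate gene-set test (Category) with a multivariate one (Hotelling's $T^2$) as the correlation between genes changes. The sketch below, assuming Python with NumPy and SciPy, illustrates that contrast; it is not the authors' code or their simulation design. `hotelling_t2` is the standard two-sample Hotelling's $T^2$ test with its exact F transformation. `sum_t_test` is a hypothetical, simplified stand-in for a Category-style univariate test: it sums per-gene t-statistics, divides by the square root of the set size, and refers the result to N(0, 1), which implicitly treats genes as independent. It is not the Bioconductor Category package's implementation. `type_i_error` is a hypothetical helper that simulates null data with equicorrelated genes; the set size, group sizes, and number of replicates are arbitrary choices.

```python
import numpy as np
from scipy import stats


def hotelling_t2(x, y):
    """Two-sample Hotelling's T^2 test for one gene set.

    x, y: arrays of shape (n_samples, n_genes), one per phenotype.
    Assumes multivariate normal data with a common covariance matrix,
    and returns (T2, p_value) via the exact F transformation.
    """
    n1, p = x.shape
    n2 = y.shape[0]
    df2 = n1 + n2 - p - 1
    if df2 <= 0:
        raise ValueError("need n1 + n2 - 2 >= number of genes in the set")
    diff = x.mean(axis=0) - y.mean(axis=0)
    # Pooled within-group covariance matrix (divisor n1 + n2 - 2).
    s_pooled = np.atleast_2d(
        ((n1 - 1) * np.cov(x, rowvar=False) + (n2 - 1) * np.cov(y, rowvar=False))
        / (n1 + n2 - 2)
    )
    # Mahalanobis-type distance between the two mean vectors, scaled by group sizes.
    t2 = (n1 * n2 / (n1 + n2)) * float(diff @ np.linalg.solve(s_pooled, diff))
    f_stat = df2 / (p * (n1 + n2 - 2)) * t2
    return t2, stats.f.sf(f_stat, p, df2)


def sum_t_test(x, y):
    """Hypothetical univariate gene-set test (simplified Category-style stand-in).

    Computes one pooled-variance t-statistic per gene, sums them, and
    divides by sqrt(set size). Referring the result to N(0, 1) treats
    the genes as independent, so inter-gene correlation is ignored.
    """
    t, _ = stats.ttest_ind(x, y, axis=0)
    z = t.sum() / np.sqrt(x.shape[1])
    return z, 2 * stats.norm.sf(abs(z))


def type_i_error(test, rho, n_genes=5, n_per_group=20, reps=500, alpha=0.05, seed=0):
    """Hypothetical helper: null rejection rate when all gene pairs have correlation rho."""
    rng = np.random.default_rng(seed)
    cov = np.full((n_genes, n_genes), rho)
    np.fill_diagonal(cov, 1.0)
    mean = np.zeros(n_genes)
    rejections = 0
    for _ in range(reps):
        # Both phenotypes share the same mean vector, so the null hypothesis is true.
        x = rng.multivariate_normal(mean, cov, size=n_per_group)
        y = rng.multivariate_normal(mean, cov, size=n_per_group)
        if test(x, y)[1] < alpha:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    for rho in (0.0, 0.5):
        print(f"rho={rho}: sum-t {type_i_error(sum_t_test, rho):.3f}, "
              f"Hotelling {type_i_error(hotelling_t2, rho):.3f}")
```

Under these assumptions, the variance of the summed statistic grows roughly as 1 + (p - 1)ρ when all gene pairs share correlation ρ, so the univariate test rejects a true null more often as ρ rises. In this sketch's setting (multivariate normal data, more samples than genes) the $T^2$ test is exact, so its rejection rate should stay close to alpha because it uses the pooled covariance matrix. Note that Hotelling's $T^2$ requires n1 + n2 - 2 ≥ p, which restricts it to gene sets smaller than the sample size unless some form of covariance regularization is used.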

