FCAnalyzer: A Functional Clustering Analysis Tool for Predicted Transcription Regulatory Elements and Gene Ontology Terms

  • Kim, Sang-Bae (Korean BioInformation Center, Korea Research Institute of Bioscience and Biotechnology) ;
  • Ryu, Gil-Mi (Center for Genome Science, National Institute of Health) ;
  • Kim, Young-Jin (Center for Genome Science, National Institute of Health) ;
  • Heo, Jee-Yeon (Center for Genome Science, National Institute of Health) ;
  • Park, Chan (Center for Genome Science, National Institute of Health) ;
  • Oh, Berm-Seok (Center for Genome Science, National Institute of Health) ;
  • Kim, Hyung-Lae (Center for Genome Science, National Institute of Health) ;
  • Kimm, Ku-Chan (Center for Genome Science, National Institute of Health) ;
  • Kim, Kyu-Won (College of Pharmacy, Seoul National University) ;
  • Kim, Young-Youl (Center for Genome Science, National Institute of Health)
  • Published : 2007.03.31

Abstract

Numerous studies have reported that genes with similar expression patterns are co-regulated. Working from gene expression data, we assumed that genes with similar expression patterns share similar transcription factor binding sites (TFBSs), which serve as the binding regions for transcription factors (TFs) and thereby regulate gene expression. Various analysis tools have been developed in this context. However, they fall short in two respects: the combined analysis of expression patterns and significant TFBSs, and the functional analysis of the target genes of significantly overrepresented putative regulators. In this study, we present FCAnalyzer, a web-based Functional Clustering Analysis Tool for Predicted Transcription Regulatory Elements and Gene Ontology Terms. The system integrates microarray data clustered by expression pattern with the TFBS data of each cluster. FCAnalyzer is designed to perform two independent clustering procedures. The first clusters gene expression profiles using the K-means method; the second clusters the predicted TFBSs in the upstream regions of the previously clustered genes using the hierarchical biclustering method, which groups genes and samples simultaneously. The system retrieves information on the predicted TFBSs in each cluster using $Match^{TM}$ in the TRANSFAC database. Gene ontology (GO) term analysis is used for functional annotation of genes in the same cluster, and a combinatorial analysis of TFBS pairs is also provided. The enrichment found in the TFBS and GO term analyses is assessed statistically by calculating P values based on Fisher’s exact test, the hypergeometric distribution, and Bonferroni correction. FCAnalyzer is a web-based, user-friendly functional clustering analysis system that facilitates the transcriptional regulatory analysis of co-expressed genes. It presents analyses of clustered genes, significant TFBSs, significantly enriched TFBS combinations, their target genes, and TFBS-TF pairs.
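
The enrichment statistics named in the abstract can be illustrated with a minimal Python sketch. It is not FCAnalyzer's implementation: the function names, the site labels `SITE_A` and `SITE_B`, and all gene counts are hypothetical and chosen only for illustration. It relies on the fact that the one-sided Fisher's exact test for over-representation is the upper tail of the hypergeometric distribution, and it applies Bonferroni correction by multiplying each P value by the number of tests.

```python
from math import comb


def hypergeom_upper_tail(hits, cluster_size, annotated_total, population):
    """P(X >= hits) when cluster_size genes are drawn without replacement
    from `population` genes, of which `annotated_total` carry the feature
    (a predicted TFBS or a GO term). Equivalent to a one-sided Fisher's
    exact test for over-representation."""
    max_hits = min(cluster_size, annotated_total)
    denominator = comb(population, cluster_size)
    numerator = sum(
        comb(annotated_total, i)
        * comb(population - annotated_total, cluster_size - i)
        for i in range(hits, max_hits + 1)
    )
    return numerator / denominator


def bonferroni(p_values):
    """Multiply each P value by the number of tests, capped at 1.0."""
    n_tests = len(p_values)
    return {name: min(1.0, p * n_tests) for name, p in p_values.items()}


# Hypothetical counts: 1000 genes on the array, 50 genes in one cluster.
POPULATION = 1000
CLUSTER_SIZE = 50

# feature -> (genes in the cluster with the site, genes on the array with it)
observed = {
    "SITE_A": (20, 100),  # expected about 5 by chance, so strongly enriched
    "SITE_B": (10, 200),  # expected about 10 by chance, so not enriched
}

raw = {
    name: hypergeom_upper_tail(hits, CLUSTER_SIZE, total, POPULATION)
    for name, (hits, total) in observed.items()
}
adjusted = bonferroni(raw)

for name in observed:
    print(f"{name}: raw P = {raw[name]:.3g}, Bonferroni P = {adjusted[name]:.3g}")
```

The same calculation applies to GO terms by substituting the number of genes annotated with a term for the number of genes carrying a site. Bonferroni correction is conservative when many sites or terms are tested, which is the trade-off for its simplicity.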
