Genomic Tree of Gene Contents Based on Functional Groups of KEGG Orthology

  • Kim Jin-Sik (Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology)
  • Lee Sang-Yup (Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology; Department of BioSystems, BioProcess Engineering Research Center and Bioinformatics Research Center, Korea Advanced Institute of Science and Technology)
  • Published : 2006.05.01

Abstract

We propose a genome-scale clustering approach to identify whole-genome relationships using the functional groups given by the Kyoto Encyclopedia of Genes and Genomes Orthology (KO) database. The metabolic capabilities of each organism were defined by the number of genes in each functional category. Archaeal, bacterial, and eukaryotic genomes were compared simultaneously using a two-step clustering method consisting of a self-organizing tree algorithm followed by unsupervised hierarchical clustering. The clustering results were consistent with various phenotypic characteristics of the organisms analyzed and, additionally, revealed a different aspect of the relationships between genomes from that previously established through rRNA-based comparisons. The proposed approach, which collects and clusters the metabolic functional capabilities of organisms, should be a useful tool for predicting relationships among organisms.
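The idea of describing each genome by its gene count per functional category and then clustering those profiles can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the genome names, category labels, and counts are invented toy data, the Pearson correlation distance and average linkage are choices not stated in the abstract, and the first step of the two-step method (the self-organizing tree algorithm) is omitted, so only a plain hierarchical clustering step is shown.

```python
from collections import Counter
from itertools import combinations
import math

# Hypothetical functional categories and toy genomes (not from the paper).
# Each genome is a list with one category label per annotated gene.
CATEGORIES = ["energy", "lipid", "translation", "transport"]
GENOMES = {
    "genome_A": ["energy"] * 3 + ["lipid"] * 1 + ["translation"] * 2,
    "genome_B": ["energy"] * 4 + ["lipid"] * 1 + ["translation"] * 2 + ["transport"] * 1,
    "genome_C": ["lipid"] * 3 + ["translation"] * 1 + ["transport"] * 4,
}


def category_profile(gene_categories, categories):
    """Count the genes of one genome in each functional category."""
    counts = Counter(gene_categories)
    return [counts.get(c, 0) for c in categories]


def pearson_distance(x, y):
    """Return 1 - Pearson correlation (an assumed distance measure)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 1.0  # a constant profile carries no correlation signal
    return 1.0 - cov / (sx * sy)


def average_linkage(profiles):
    """Agglomerative clustering; returns the merge history.

    Each entry is (members_a, members_b, distance), in merge order.
    """
    clusters = [(name,) for name in sorted(profiles)]
    merges = []

    def dist(a, b):
        # Average pairwise distance between the members of two clusters.
        pairs = [(p, q) for p in a for q in b]
        total = sum(pearson_distance(profiles[p], profiles[q]) for p, q in pairs)
        return total / len(pairs)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        merges.append((a, b, dist(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


profiles = {name: category_profile(genes, CATEGORIES) for name, genes in GENOMES.items()}
merges = average_linkage(profiles)
for left, right, d in merges:
    print(left, right, round(d, 3))
```

In this toy data, the two genomes with similar category counts join first and the dissimilar genome joins last, which is the kind of grouping a gene-content tree is meant to expose. A real analysis would replace the toy lists with per-genome KO category counts and would likely need normalization for genome size, which this sketch does not attempt.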
