• Title/Summary/Keyword: orthologous groups

An Orthologous Group Clustering Technique based on the Grid Computing

  • Oh, J.S.;Kim, T.K.;Kim, S.S.;Kwon, H.R.;Kim, Y.C.;Yoo, J.S.;Cho, W.S.
    • Proceedings of the Korean Society for Bioinformatics Conference / 2005.09a / pp.72-77 / 2005
  • Orthologs are genes in different species that have the same function and that diverged from a single gene in the last common ancestor of these species. Orthologous groups are useful for genome annotation, studies of gene evolution, and comparative genomics. However, constructing orthologous groups is difficult to automate and time-consuming, and the accuracy of the resulting groups is hard to guarantee. We propose a system that constructs orthologous groups across many genomes automatically and rapidly. We use grid computing to reduce sequence alignment time, and a clustering algorithm implemented in a database application to automate the whole process. Thanks to grid computing, we generated orthologous groups for 20 complete prokaryotic genomes in a single day. Furthermore, the clustering algorithm and grid computing make it easy to accommodate new genomes. We compared the generated orthologous groups with COGs (Clusters of Orthologous Groups of proteins) and KO (KEGG Orthology); they showed about 85 percent similarity to these well-known orthologous databases.
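The abstract does not say which clustering algorithm the system uses. As a rough illustration only, the sketch below assumes pairwise alignment scores have already been computed (for example, on the grid) and groups genes with a common simplified scheme: reciprocal best hits between genomes, merged into groups with union-find. This is neither the COG triangle procedure nor the authors' method; the "genome:gene" ID format, the scores, and all function names are hypothetical.

```python
from collections import defaultdict

def best_hits(scores):
    """Map (query gene, other genome) -> best-scoring subject gene in that genome.

    scores: dict {(query, subject): alignment score}; gene IDs are
    hypothetical "genome:gene" strings.
    """
    best = {}
    for (query, subject), score in scores.items():
        q_genome, s_genome = query.split(":")[0], subject.split(":")[0]
        if q_genome == s_genome:
            continue  # ignore within-genome hits
        key = (query, s_genome)
        if key not in best or score > best[key][0]:
            best[key] = (score, subject)
    return {key: subject for key, (_, subject) in best.items()}

def reciprocal_best_hits(scores):
    """Pairs of genes that are each other's best hit between two genomes."""
    best = best_hits(scores)
    pairs = set()
    for (query, _), subject in best.items():
        if best.get((subject, query.split(":")[0])) == query:
            pairs.add(tuple(sorted((query, subject))))
    return pairs

def cluster(pairs):
    """Merge linked genes into groups (connected components) with union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)
    groups = defaultdict(set)
    for gene in list(parent):
        groups[find(gene)].add(gene)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical alignment scores for three genomes A, B and C.
scores = {
    ("A:1", "B:1"): 90, ("B:1", "A:1"): 90,
    ("A:1", "C:1"): 80, ("C:1", "A:1"): 80,
    ("B:1", "C:1"): 85, ("C:1", "B:1"): 85,
    ("A:2", "B:2"): 70, ("B:2", "A:2"): 70,
    ("A:2", "B:1"): 20,  # weaker hit, not a best hit
}
print(cluster(reciprocal_best_hits(scores)))
# [['A:1', 'B:1', 'C:1'], ['A:2', 'B:2']]
```

Union-find keeps the grouping step close to linear in the number of linked pairs, which matters when new genomes are added and groups must be recomputed.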

Investigation of Conserved Genes in Eukaryotes Common to Prokaryotes (원핵생물과 공통인 진핵생물의 보존적 유전자 탐색)

  • Lee, Dong-Geun
    • Journal of Life Science / v.23 no.4 / pp.595-601 / 2013
  • The clusters of orthologous groups of proteins (COG) algorithm was applied to identify essential proteins in eukaryotes and to measure their degree of conservation. Sixty-three orthologous groups conserved in 66 microbial genomes expanded to 104 eukaryotic orthologous groups (KOGs), and 71 KOGs were conserved in the nuclear genomes of 7 eukaryotes. Fifty-four of these 71 were translation-related genes, highlighting the importance of proteins in modern organisms. Translation initiation factors (KOG0343, KOG3271) and prolyl-tRNA synthetase (KOG4163) showed high conservation in the distance value analysis. Caenorhabditis elegans genes appear to harbor high genetic variation, because its genome showed the highest variation in the 71 conserved proteins among the 7 genomes. The 71 conserved genes will be valuable in basic and applied research, for example as targets for antibiotic development.

Verifying Orthologous Paralogenes using Whole Genome Alignment

  • Chan, P.Y.;Lam, T.W.;Yiu, S.M.
    • Proceedings of the Korean Society for Bioinformatics Conference / 2005.09a / pp.109-112 / 2005
  • Identifying orthologous paralogenes is a fundamental problem in comparative genomics and can facilitate the study of the evolutionary history of species. Existing approaches for locating paralogs use local-alignment-based algorithms such as BLAST. However, genes with high alignment scores are not always paralogenes. Whole genome alignment tools, on the other hand, are designed to locate orthologs; most of them identify an orthologous pair by the unique substrings (called anchors) it contains. Intuitively, these tools might not help identify orthologous paralogenes, because paralogenes are very similar and may not contain enough unique anchors. However, our study shows that this is not the case: although paralogenes are similar, they have undergone different mutations, so there are enough unique anchors to identify them. Based on this counter-intuitive finding, we propose using whole genome alignment tools to help verify paralogenes. Our experiments on five pairs of human-mouse chromosomes show that the approach is effective and can identify most (more than 80%) of the misclassified paralog groups. A simulation study further confirms that whole genome alignment tools are able to locate orthologous paralogenes.
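The anchor idea can be illustrated with a toy example. The sketch below treats fixed-length k-mers that occur exactly once in each of two sequences as candidate anchors: a k-mer repeated within a sequence (as in a region shared by two paralogous copies) is discarded, while k-mers spanning copy-specific mutations survive. Real whole genome aligners typically find maximal unique matches with suffix-based indexes rather than a fixed k; the function names, sequences and k value here are assumptions for illustration.

```python
from collections import Counter

def unique_kmers(seq, k):
    """Return the set of k-mers that occur exactly once in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {kmer for kmer, n in counts.items() if n == 1}

def shared_anchors(seq_a, seq_b, k):
    """k-mers unique in both sequences: candidate anchors linking them."""
    return unique_kmers(seq_a, k) & unique_kmers(seq_b, k)

# "ACGT" occurs twice in the first sequence, so it cannot serve as an anchor.
print(sorted(unique_kmers("ACGTACGT", 4)))        # ['CGTA', 'GTAC', 'TACG']
print(shared_anchors("ACGTACGT", "TTCGTAGG", 4))  # {'CGTA'}
```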

Investigation of COGs (Clusters of Orthologous Groups of proteins) in 1,309 Species of Prokaryotes (원핵생물 1,309종에 분포된 COGs (Clusters of Orthologous Groups of proteins) 연구)

  • Lee, Dong-Geun;Lee, Sang-Hyeon
    • Journal of Life Science / v.31 no.9 / pp.834-839 / 2021
  • The authors previously reported analyses of COGs (Clusters of Orthologous Groups of proteins) in 711 prokaryotes. The COG data were substantially updated in 2020 to cover 1,309 prokaryotic genomes. Here, we report analyses of the 3,455,853 proteins comprising the 4,877 updated COGs, from the perspectives of both COGs and prokaryotes. The number of COGs per prokaryote ranged from 97 to 2,281, with a mean of 1,430.0 and a standard deviation of 414.2. At the phylum level, the mean number of COGs ranged from a minimum of 497.86 for Mollicutes to a maximum of 1,642.90 for Cyanobacteria. The 10 species with the most COGs were all Proteobacteria, whereas 9 of the bottom 10 could not be cultured in vitro. The number of proteins per COG ranged from 2 to 22,048, and each of the top 11 COGs contained over 12,000 proteins. Five of these top 11 COGs bind DNA and are involved in gene expression, indicating the importance of regulating gene expression in prokaryotes living in a changing environment. COG data are expected to be widely used, for example to identify the genes in a genome and to select genes for strain improvement.
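The per-genome and per-COG statistics reported above reduce to simple counting once each protein has a COG assignment. A minimal sketch, assuming a hypothetical input that maps each genome to the list of COG IDs of its proteins; the abstract does not say whether the standard deviation is the sample or population value, so the sample version used here is an assumption, and the toy data are invented.

```python
import statistics
from collections import Counter

def cog_count_summary(cogs_by_genome):
    """Min, max, mean and sample SD of the number of distinct COGs per genome."""
    counts = [len(set(cogs)) for cogs in cogs_by_genome.values()]
    return {
        "min": min(counts),
        "max": max(counts),
        "mean": round(statistics.fmean(counts), 1),
        "sd": round(statistics.stdev(counts), 1),  # sample SD; needs >= 2 genomes
    }

def proteins_per_cog(cogs_by_genome):
    """Number of proteins assigned to each COG across all genomes."""
    return Counter(cog for cogs in cogs_by_genome.values() for cog in cogs)

# Invented toy input: one COG ID per protein, so a genome may repeat a COG.
toy = {
    "genome_a": ["COG0001", "COG0002", "COG0002"],
    "genome_b": ["COG0001"],
    "genome_c": ["COG0001", "COG0002", "COG0003"],
}
print(cog_count_summary(toy))            # {'min': 1, 'max': 3, 'mean': 2.0, 'sd': 1.0}
print(proteins_per_cog(toy)["COG0002"])  # 3
```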

Comparison of Mitochondria-related Conserved Genes in Eukaryotes and Prokaryotes (진핵생물과 원핵생물의 미토콘드리아 관련 보존적 유전자 비교)

  • Lee, Dong-Geun
    • Journal of Life Science / v.24 no.7 / pp.791-797 / 2014
  • Sixty-two orthologous groups (OGs) of proteins conserved in 63 prokaryotes and seven eukaryotes were analyzed to identify essential proteins in eukaryotic mitochondria and their counterparts in prokaryotes. Twenty OGs were common to eukaryotic mitochondria, and all were translation-related. Encephalitozoon cuniculi, an obligate parasitic eukaryote, shares no mitochondrial OGs with the other 69 organisms. Seventeen conserved OGs were mitochondria-related in these 69 organisms. In the distance value analysis, mitochondria-related and non-related OGs differed significantly in prokaryotic genomes (p<0.001, paired t-test), unlike in eukaryotic genomes. The most commonly conserved mitochondria-related OG was COG0048-KOG1750 (ribosomal small subunit protein S12), whereas among non-related OGs it was COG0100-KOG0407 (ribosomal small subunit protein S11). These results could be applied in research on phylogenetic relationships and in areas such as drug development.
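The abstract reports a paired t-test but not how observations were paired. A minimal sketch, assuming each genome contributes one pair of values (for example, a distance value for its mitochondria-related OGs and one for its non-related OGs); the function name, the pairing and the numbers are hypothetical. It returns only the t statistic and degrees of freedom; if SciPy is available, scipy.stats.ttest_rel computes the same statistic together with a p-value.

```python
import math
import statistics

def paired_t(x, y):
    """Paired t statistic and degrees of freedom for matched samples x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(x, y)]
    mean_d = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))  # standard error of the mean difference
    return mean_d / se, len(diffs) - 1

# Hypothetical per-genome values: related OGs vs non-related OGs.
t, df = paired_t([3, 5, 7], [1, 2, 3])
print(round(t, 3), df)  # 5.196 2
```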

Detection of Conserved Genes in Proteobacteria by using a COG Algorithm (COG 알고리즘을 통한 Proteobacteria의 보존적 유전자 파악)

  • 이동근;강호영;이재화;김철민
    • KSBB Journal / v.17 no.6 / pp.560-565 / 2002
  • The COG (clusters of orthologous groups of proteins) algorithm was used to detect conserved genes within Proteobacteria and to determine their relationships. As the comparison was narrowed from the sequences of 42 prokaryotes to 33 eubacteria and then to 16 Proteobacteria, the number of conserved genes increased. All analyzed prokaryotes shared 75 COGs. COG0195, COG0358 and COG0528 were represented only in the 42 prokaryotes. Sixty-four COGs were added as conserved genes in the 33 eubacteria. Each Proteobacteria group has a unique repertoire of COGs, and metabolic COGs were more diverse in the beta Proteobacteria group than in the other groups. These results could be used to determine the origins and evolutionary relationships of Proteobacteria. Because the likelihood of detecting new biological molecules is high in phylogenetically related organisms, this algorithm may make it possible to identify useful proteins.
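The nested comparison above amounts to intersecting COG sets. A minimal sketch, assuming a hypothetical mapping from genome name to its set of COG IDs: the conserved core of a group is the intersection over its members, so narrowing the group can only keep or grow the core, and the COGs "added" at a level are the difference between two cores. Genome names and COG memberships in the example are invented.

```python
from functools import reduce

def core_cogs(cogs_by_genome, genomes):
    """COGs present in every genome of the (non-empty) subset `genomes`."""
    return reduce(set.intersection, (set(cogs_by_genome[g]) for g in genomes))

# Invented toy data: one archaeon, one non-proteobacterial bacterium, two proteobacteria.
cogs = {
    "arch1": {"COG0001", "COG0002"},
    "bact1": {"COG0001", "COG0002", "COG0003"},
    "proteo1": {"COG0001", "COG0002", "COG0003", "COG0004"},
    "proteo2": {"COG0001", "COG0002", "COG0003", "COG0004", "COG0005"},
}
prokaryotes = list(cogs)
bacteria = ["bact1", "proteo1", "proteo2"]
proteobacteria = ["proteo1", "proteo2"]

for name, group in [("prokaryotes", prokaryotes), ("bacteria", bacteria),
                    ("proteobacteria", proteobacteria)]:
    print(name, len(core_cogs(cogs, group)))  # 2, then 3, then 4

# COGs that become conserved only when the set is narrowed to bacteria.
added = core_cogs(cogs, bacteria) - core_cogs(cogs, prokaryotes)
print(sorted(added))  # ['COG0003']
```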

Conserved Genes of Proteobacteria Identified by the COG Algorithm (COG 알고리즘으로 파악한 Proteobacteria의 보존적 유전자)

  • Lee, Dong-Geun;Lee, Jin-Ok;Lee, Jae-Hwa
    • Proceedings of the Korean Society for Biotechnology and Bioengineering Conference (한국생물공학회:학술대회논문집) / 2003.04a / pp.715-718 / 2003
  • The COG (clusters of orthologous groups of proteins) algorithm, which is based on protein similarities among genomes, was used to detect conserved genes and to determine their relationships within 42 prokaryotes, 33 Bacteria and 16 Proteobacteria. All analyzed prokaryotes shared 75 COGs. COG0195, COG0358 and COG0528 were represented only in the 42 prokaryotes. Sixty-four COGs were added as conserved genes in the 33 eubacteria. Each Proteobacteria group has a unique repertoire of COGs, and metabolic COGs were more diverse in the beta-Proteobacteria group than in the other groups. Because the likelihood of detecting new biological molecules is high in phylogenetically related organisms, this algorithm may make it possible to identify useful proteins.

Expressed Sequence Tags of Trichinella spiralis Muscle Stage Larvae

  • Park, Hae-Kyung;Chang, Seong-Won;Kang, Se-Won;Cho, Min-Kyoung;Choi, Sun-Hee;Hong, Yeon-Chul;Lee, Yong-Seok;Jeong, Hae-Jin;Yu, Hak-Sun
    • Parasites, Hosts and Diseases / v.46 no.2 / pp.59-63 / 2008
  • To gain greater insight into the genomic expression patterns of Trichinella spiralis, 992 expressed sequence tags (ESTs) were collected from a cDNA library of T. spiralis muscle stage larvae and assembled into 60 clusters and 385 singletons. Of these, 445 (44.7%) ESTs were annotated to homologous genes, and small fractions matched known nematode genes. The annotated ESTs were classified into 25 eukaryotic orthologous groups (KOGs). Cytochrome c oxidase (34 clones) was the most frequently represented gene.

Conservative Genes of Less Orthologous Prokaryotes (Orthologs 수가 적은 원핵생물들의 보존적 유전자)

  • Lee, Dong-Geun
    • Journal of Life Science / v.27 no.6 / pp.694-701 / 2017
  • Mycoplasma genitalium has the smallest genome among prokaryotes that can be grown in pure culture. To discover and compare the orthologs (conserved genes) of M. genitalium and 14 uncultivable prokaryotes that have fewer orthologs than M. genitalium, COG (clusters of orthologous groups of proteins) analyses were applied. The analyzed prokaryotes were M. genitalium; the hyperthermophilic exosymbiotic archaeon Nanoarchaeum equitans; four intracellular plant-pathogenic eubacteria of the genus Candidatus Phytoplasma; and nine endosymbiotic eubacteria of phloem- and xylem-feeding insects. Of the 367 orthologs of M. genitalium, 284 were shared with at least one other prokaryote. All 15 prokaryotes share 29 orthologs, reflecting the significance of these proteins for life: 25 translation-related orthologs (including 22 ribosomal proteins), 3 RNA polymerase subunits, and 1 protein-folding-related ortholog. Among the 15 prokaryotes, 40 orthologs were found exclusively in all four Candidatus Phytoplasma. The other nine Candidatus species, all insect endosymbionts, shared only a single COG, COG0539 (ribosomal protein S1), reflecting the diversity of orthologs among them. These results may provide clues to understanding conserved genes in uncultivable prokaryotes, and may be helpful in industrial applications, such as handling prokaryotes that produce amino acids, antibiotics, and precursors for organic synthesis.
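The two headline counts above (orthologs of a reference genome shared with at least one other genome, and orthologs shared by all genomes) follow from set union and intersection. A minimal sketch with invented data; the genome labels, COG memberships and function names are hypothetical.

```python
def shared_with_others(cogs_by_genome, reference):
    """COGs of `reference` that occur in at least one other genome."""
    others = set().union(*(set(c) for g, c in cogs_by_genome.items() if g != reference))
    return set(cogs_by_genome[reference]) & others

def shared_by_all(cogs_by_genome):
    """COGs present in every genome."""
    return set.intersection(*(set(c) for c in cogs_by_genome.values()))

# Invented toy data: a reference genome and two smaller genomes.
cogs = {
    "reference": {"COG0001", "COG0002", "COG0003", "COG0004"},
    "genome_b": {"COG0001", "COG0002"},
    "genome_c": {"COG0001", "COG0003", "COG0005"},
}
print(sorted(shared_with_others(cogs, "reference")))  # ['COG0001', 'COG0002', 'COG0003']
print(sorted(shared_by_all(cogs)))                    # ['COG0001']
```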

Genetic Composition Analysis of Marine-Origin Euryarchaeota by using a COG Algorithm (COG 알고리즘을 통한 해양성 Euryarchaeota의 유전적 조성 분석)

  • 이재화;이동근;김철민;이은열
    • Journal of Life Science / v.13 no.3 / pp.298-307 / 2003
  • To identify conserved and newly added genes at each phylogenetic level of Archaea, the COG (clusters of orthologous groups of proteins) algorithm was applied. The number of conserved genes was 340 within 9 species of Archaea and 388 within 8 species of Euryarchaeota. Of the 265 conserved COGs that are specific to Archaea (absent from Bacteria and S. cerevisiae), many were related to 'information storage and processing' (94 COGs, 35.5%) and 'metabolism' (82 COGs, 30.9%). COGs with these functions were presumed to be highly conserved and to underpin the distinctive life form of Archaea. Euryarchaeota and Crenarchaeota appeared to differ somewhat in 'nucleotide transport and metabolism' but little in 'information storage and processing'. Marine-origin Euryarchaeota had conserved COGs different from those of terrestrial Euryarchaeota, including COGs related to carbohydrate transport and metabolism, suggesting that their physiology may differ. This study may help in understanding the origin and conserved genes at each phylogenetic level of marine-origin Euryarchaeota, and in mining useful genes from marine Archaea, as in Manco et al. (Arch. Biochem. Biophys. 373, 182 (2000)).
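Finding COGs that are conserved in Archaea but absent from Bacteria and S. cerevisiae is a set difference, and the functional breakdown is a tally. A minimal sketch, assuming hypothetical inputs (genome-to-COG sets and a COG-to-category map, both invented here); the percentage check rebuilds only the counts 94 and 82 out of 265 stated above, not the real COG list.

```python
from collections import Counter

def lineage_specific(cogs_by_genome, ingroup, outgroup):
    """COGs conserved in every ingroup genome and absent from all outgroup genomes."""
    conserved = set.intersection(*(set(cogs_by_genome[g]) for g in ingroup))
    seen_outside = set().union(*(set(cogs_by_genome[g]) for g in outgroup))
    return conserved - seen_outside

def category_shares(cog_ids, category_of):
    """Count and percentage (1 decimal) of COGs in each functional category."""
    counts = Counter(category_of[c] for c in cog_ids)
    total = sum(counts.values())
    return {cat: (n, round(100 * n / total, 1)) for cat, n in counts.items()}

# Invented toy genomes: two archaea, one bacterium, one yeast.
cogs = {
    "arch1": {"COG0001", "COG0002", "COG0003"},
    "arch2": {"COG0001", "COG0002", "COG0004"},
    "bact1": {"COG0002"},
    "yeast": {"COG0005"},
}
print(sorted(lineage_specific(cogs, ["arch1", "arch2"], ["bact1", "yeast"])))  # ['COG0001']

# Percentage check with placeholder IDs: 94 and 82 of 265 COGs.
ids = [f"COG{i:04d}" for i in range(265)]
category_of = {c: ("information storage and processing" if i < 94
                   else "metabolism" if i < 176 else "other")
               for i, c in enumerate(ids)}
shares = category_shares(ids, category_of)
print(shares["information storage and processing"])  # (94, 35.5)
print(shares["metabolism"])                           # (82, 30.9)
```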