• Title/Summary/Keyword: Genome similarity

Gene Set Analyses of Genome-Wide Association Studies on 49 Quantitative Traits Measured in a Single Genetic Epidemiology Dataset

  • Kim, Jihye;Kwon, Ji-Sun;Kim, Sangsoo
    • Genomics & Informatics / v.11 no.3 / pp.135-141 / 2013
  • Gene set analysis is a powerful tool for interpreting genome-wide association study results and is gaining popularity. Comparison of the gene sets obtained for a variety of traits measured from a single genetic epidemiology dataset may give insights into the biological mechanisms underlying these traits. Based on the previously published single nucleotide polymorphism (SNP) genotype data on 8,842 individuals enrolled in the Korea Association Resource project, we performed a series of systematic genome-wide association analyses for 49 quantitative traits of basic epidemiological, anthropometric, or blood chemistry parameters. Each analysis result was subjected to subsequent gene set analyses based on Gene Ontology (GO) terms using the gene set analysis software GSA-SNP, identifying a set of GO terms significantly associated with each trait ($p_{corr}$ < 0.05). Pairwise comparison of the traits in terms of the semantic similarity of their GO sets revealed surprising cases where phenotypically uncorrelated traits showed high similarity in terms of biological pathways. For example, the pH level was related to 7 other traits that showed low phenotypic correlations with it. A literature survey implies that these traits may be regulated partly by common pathways that involve neuronal or nervous systems.

Relationship Between Genome Similarity and DNA-DNA Hybridization Among Closely Related Bacteria

  • Kang, Cheol-Hee;Nam, Young-Do;Chung, Won-Hyong;Quan, Zhe-Xue;Park, Yong-Ha;Park, Soo-Je;Desmone, Racheal;Wan, Xiu-Feng;Rhee, Sung-Keun
    • Journal of Microbiology and Biotechnology / v.17 no.6 / pp.945-951 / 2007
  • DNA-DNA hybridization has been established as an important technology in bacterial species taxonomy and phylogenetic analysis. In this study, we analyzed how the efficiency with which the genomic DNA from one species hybridizes to the genomic DNA of another species (DNA-DNA hybridization) in microarray analysis relates to the similarity between two genomes. We found that the predicted DNA-DNA hybridization based on genome sequence similarity correlated well with the experimentally determined microarray hybridization. Between closely related strains, significant numbers of highly divergent genes (>55% identity) and/or the accumulation of mismatches between conserved genes lowered the DNA-DNA hybridization signal, reducing it to below 70% even for bacterial strains with over 97% 16S rRNA gene identity. In addition, our results suggest that a DNA-DNA hybridization signal intensity of over 40% indicates that two genomes share at least 30% conserved genes (>60% gene identity). This study may expand our knowledge of DNA-DNA hybridization based on genomic sequence similarity comparison and provide further insights for bacterial phylogeny analyses.

A New Approach to Find Orthologous Proteins Using Sequence and Protein-Protein Interaction Similarity

  • Kim, Min-Kyung;Seol, Young-Joo;Park, Hyun-Seok;Jang, Seung-Hwan;Shin, Hang-Cheol;Cho, Kwang-Hwi
    • Genomics & Informatics / v.7 no.3 / pp.141-147 / 2009
  • Existing proteome-scale ortholog and paralog prediction methods are mainly based on sequence similarity. However, it is known that even the closest BLAST hit often is not the closest neighbor. For this reason, we added conserved interaction information to find orthologs. We propose a genome-scale, automated ortholog prediction method named OrthoInterBlast, which is based on both sequence and interaction similarity. When we applied this method to fly and yeast, 17% of the ortholog candidates differed from the results of Inparanoid. By adding protein-protein interaction information, proteins that have low sequence similarity, and thus cannot be easily detected by sequence homology alone, can still be selected as orthologs.

The Analysis of Genome Database Compaction based on Sequence Similarity

  • Kwon, Sunyoung;Lee, Byunghan;Park, Seunghyun;Jo, Jeonghee;Yoon, Sungroh
    • KIISE Transactions on Computing Practices / v.23 no.4 / pp.250-255 / 2017
  • Given the explosion of genomic data and the expansion of applications such as precision medicine, the importance of efficient genome-database management continues to grow. Traditional compression techniques may be effective in reducing the size of a database, but they raise a new challenge: performing operations such as comparisons and searches on the compressed database. Based on the observations that many genome databases contain numerous duplicated or similar sequences, and that the runtime of genome analyses is normally proportional to the number of sequences in a database, we propose a technique that compresses a genome database by eliminating similar entries. Through our experiments, we show that we can remove approximately 84% of sequences with a 1% similarity threshold, accelerating the downstream classification tasks by approximately 10 times. We also confirm that our compression method does not significantly affect the accuracy of taxonomy diversity assessments or classification.
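
As a rough illustration of the idea of eliminating similar entries (not the authors' implementation), a greedy pass can keep a sequence only if it is not too similar to any representative already kept. The k-mer Jaccard measure, the function names, and the default threshold of 0.9 are all assumptions made for this sketch; the abstract does not specify how similarity is computed.

```python
def kmers(seq, k=4):
    """Return the set of overlapping k-mers in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two k-mer sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def compact(sequences, threshold=0.9, k=4):
    """Greedily keep sequences whose similarity to every kept one is below threshold.

    Hypothetical helper: the similarity measure and threshold are illustrative only.
    """
    kept = []
    kept_kmers = []
    for seq in sequences:
        ks = kmers(seq, k)
        if all(jaccard(ks, other) < threshold for other in kept_kmers):
            kept.append(seq)
            kept_kmers.append(ks)
    return kept
```

For example, `compact(["ACGTACGTAC", "ACGTACGTAC", "TTTTGGGGCC"])` drops the exact duplicate and keeps the other two sequences. The pass is quadratic in the number of kept sequences, so a real database would need indexing or sketching to scale.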

Compositional Correlations in Canine Genome Reflects Similarity with Human Genes

  • Joy, Faustin;Basak, Surajit;Gupta, Sanjib Kumar;Das, Pranab Jyoti;Ghosh, Shankar Kumar;Ghosh, Tapash Chandra
    • BMB Reports / v.39 no.3 / pp.240-246 / 2006
  • The base compositional correlations that hold among various coding and noncoding regions of the canine genome have been analysed. The distribution pattern of genes, on the basis of $GC_3$ composition, shows a wide range similar to that observed in human. However, the maximum number of genes was observed in the range of 65-75% $GC_3$ composition. The correlation between the canine coding DNA sequences and the different noncoding regions (introns and flanking regions) is found to be significant, and in many cases the degree of correlation is similar to that in the human genome. We found that these correlations are not limited to GC content alone, but also hold at the level of the frequency of individual bases. The present study suggests that canines, along with human beings, belong to the predicted 'general mammalian pattern' of genome composition.

Evaluation of DNA Microarray Approach for Identifying Strain-Specific Genes

  • Hwang, Keum-Ok;Cho, Jae-Chang
    • Journal of Microbiology and Biotechnology / v.16 no.11 / pp.1773-1777 / 2006
  • We evaluated the usefulness of DNA microarrays as a comparative genomics tool, and tested the validity of the cutoff values for defining absent genes in test genomes. Three genome-sequenced E. coli strains (K-12, EDL933, and CFT073) were subjected to comparative genomic hybridization with DNA microarrays covering almost all ORFs of the reference strain K-12, and the microarray results were compared with the results obtained from in silico analyses of genome sequences. For defining the K-12 ORFs absent in test genomes (reference strain-specific ORFs), we applied and evaluated the cutoff level of -1. The average sequence similarity between ORFs whose corresponding spots showed a log-ratio of > -1 was $96.9 \pm 4.8$. The numbers of spots showing a log-ratio of < -1 (P < 0.05, t-test) were 90 (2.5%) and 417 (10.6%) for the EDL933 genome and the CFT073 genome, respectively. The frequency of false negatives (FN) was ca. 0.2, and a cutoff level of -1.3 was required to achieve an FN of 0.1. The average sequence similarity of the false negative ORFs was $77.8 \pm 14.8$, indicating that the majority of the false negatives were caused by highly divergent genes. We concluded that the microarray is useful for identifying missing or divergent ORFs in closely related prokaryotic genomes.
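
The cutoff rule described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the abstract does not give the logarithm base, so base 2 is assumed here, and the function names, ORF labels, and signal values are hypothetical. The t-test filter (P < 0.05) mentioned in the abstract is omitted.

```python
import math


def log_ratio(test_signal, reference_signal):
    """Log-ratio of test to reference spot intensity (base 2 is an assumption)."""
    return math.log2(test_signal / reference_signal)


def absent_orfs(spots, cutoff=-1.0):
    """Return the ORF names whose log-ratio falls below the cutoff.

    spots maps an ORF name to a (test_signal, reference_signal) pair.
    """
    return [
        name
        for name, (test, ref) in spots.items()
        if log_ratio(test, ref) < cutoff
    ]


# Hypothetical intensities, chosen only to show the effect of the two cutoffs.
spots = {"orfA": (100.0, 100.0), "orfB": (20.0, 100.0), "orfC": (45.0, 100.0)}
print(absent_orfs(spots))               # cutoff -1
print(absent_orfs(spots, cutoff=-1.3))  # stricter cutoff -1.3
```

With these made-up values, `orfC` (log-ratio of about -1.15) is called absent at the cutoff of -1 but not at -1.3, which shows how a stricter cutoff trades absent calls for fewer false negatives.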

NOGSEC: A NOnparametric method for Genome SEquence Clustering

  • 이영복;김판규;조환규
    • Korean Journal of Microbiology / v.39 no.2 / pp.67-75 / 2003
  • One major topic in comparative genomics is predicting functional annotation by classifying protein sequences. Computational approaches to function prediction include protein structure prediction, sequence alignment, and domain or binding site prediction. This paper concerns another computational approach: searching for sets of homologous sequences in a sequence similarity graph. Methods based on similarity graphs do not need prior knowledge about the sequences, but largely depend on the researcher's subjective threshold settings. In this paper, we propose a genome sequence clustering method based on iterative testing and graph decomposition, and a simple method to calculate a strict threshold with biochemical meaning. The proposed method was applied to known bacterial genome sequences, and the results were compared with those of the BAG algorithm. The resulting clusters lack some completeness, but the confidence level is very high and the method does not need user-defined thresholds.
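
For context, the baseline that this abstract contrasts against, clustering a similarity graph with a fixed, user-chosen threshold, can be sketched as below. This is not NOGSEC itself, which the abstract says avoids user-defined thresholds through iterative testing and graph decomposition. The function name, node labels, and scores are hypothetical.

```python
def cluster(nodes, scored_pairs, threshold):
    """Group nodes into connected components using edges with score >= threshold.

    scored_pairs is an iterable of (node_a, node_b, similarity_score) tuples.
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in scored_pairs:
        if score >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical pairwise similarity scores between four sequences.
nodes = ["p1", "p2", "p3", "p4"]
pairs = [("p1", "p2", 0.9), ("p2", "p3", 0.4), ("p3", "p4", 0.8)]
print(cluster(nodes, pairs, threshold=0.5))
print(cluster(nodes, pairs, threshold=0.3))
```

Lowering the threshold from 0.5 to 0.3 merges the two clusters into one, which illustrates the sensitivity to subjective threshold settings that the abstract describes.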

KUGI: A Database and Search System for Korean Unigene and Pathway Information

  • Yang, Jin-Ok;Hahn, Yoon-Soo;Kim, Nam-Soon;Yu, Ung-Sik;Woo, Hyun-Goo;Chu, In-Sun;Kim, Yong-Sung;Yoo, Hyang-Sook;Kim, Sang-Soo
    • Proceedings of the Korean Society for Bioinformatics Conference / 2005.09a / pp.407-411 / 2005
  • The KUGI (Korean UniGene Information) database contains annotation information for cDNA sequences obtained from samples of diseases prevalent in Koreans. A total of about 157,000 5'-EST high-throughput sequences collected from cDNA libraries of stomach, liver, and some cancer tissues or established cell lines from Korean patients were clustered into about 35,000 contigs. From each cluster, a representative clone having the longest high-quality sequence or the start codon was selected. We stored the sequences of the representative clones and the clustered contigs in the KUGI database, together with the information obtained by running BLAST against the RefSeq, human mRNA, and UniGene databases from NCBI. We provide a web-based search engine for the KUGI database with two types of user interfaces: attribute-based search and sequence similarity search. For attribute-based search we use DBMS technology, while for similarity search we use BLAST, which supports various similarity search options. The search system allows not only multiple queries, but also various query types. The results are as follows: 1) information of clones and libraries, 2) accession keys, location on genome, gene ontology, and pathways to public databases, 3) links to external programs, and 4) sequence information of contig and 5'-end of clones. We believe that the KUGI database and search system may provide very useful information for studies elucidating the causes of diseases that are prevalent in Koreans.

Complete chloroplast genome sequence of Clematis calcicola (Ranunculaceae), a species endemic to Korea

  • Beom Kyun PARK;Young-Jong JANG;Dong Chan SON;Hee-Young GIL;Sang-Chul KIM
    • Korean Journal of Plant Taxonomy / v.52 no.4 / pp.262-268 / 2022
  • The complete chloroplast genome (cp genome) sequence of Clematis calcicola J. S. Kim (Ranunculaceae) is 159,655 bp in length. It consists of large (79,451 bp) and small (18,126 bp) single-copy regions and a pair of identical inverted repeats (31,039 bp). The genome contains 92 protein-coding genes, 36 transfer RNA genes, eight ribosomal RNA genes, and two pseudogenes. A phylogenetic analysis based on the cp genome of 19 taxa showed high similarity between our cp genome and data published for C. calcicola, which is recognized as a species endemic to the Korean Peninsula. The complete cp genome sequence of C. calcicola reported here provides important information for future phylogenetic and evolutionary studies of Ranunculaceae.
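
The region lengths reported in this abstract can be checked against the total: a chloroplast genome with this quadripartite structure consists of the large single-copy region, the small single-copy region, and two copies of the inverted repeat. The function name below is hypothetical; the numbers are taken from the abstract.

```python
def quadripartite_length(lsc, ssc, ir):
    """Total length of a genome made of LSC + SSC + two inverted-repeat copies."""
    return lsc + ssc + 2 * ir


# Lengths in bp, as reported in the abstract.
total = quadripartite_length(lsc=79_451, ssc=18_126, ir=31_039)
print(total)  # 159655
```

The sum, 79,451 + 18,126 + 2 × 31,039 = 159,655 bp, matches the reported genome length.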

Complete Sequence Analysis of a Korean Isolate of Chinese Yam Necrotic Mosaic Virus and Generation of the Virus Specific Primers for Molecular Detection

  • Kwon, Sun-Jung;Cho, In-Sook;Choi, Seung-Kook;Yoon, Ju-Yeon;Choi, Gug-Seoun
    • Research in Plant Disease / v.22 no.3 / pp.194-197 / 2016
  • Chinese yam necrotic mosaic virus (CYNMV) is one of the most widespread viruses in Chinese yam (Dioscorea opposita Thunb.) and causes serious yield losses. Currently, genetic information on CYNMV is very limited, and complete genome sequences of only two isolates (one from Japan and another from China) have been reported. In this study, we determined the complete genome sequence of the CYNMV isolate AD collected from Andong, Korea. Genetic analysis of the polyprotein amino acid sequence revealed that the Korean isolate AD has high similarity with the Japanese isolate PES3 (97%) but relatively low similarity with the Chinese isolate FX1 (78%). Phylogenetic analysis using the CYNMV 3' proximal nucleotide sequences harboring the coat protein and 3' untranslated region further supported the genetic relationship among the CYNMV isolates. Based on comparative analysis of the CYNMV genome sequences determined in this study and in previous studies, we generated molecular detection primers that are highly specific and efficient for CYNMV diagnosis.