• Title/Summary/Keyword: gene annotation

Gene annotation by the "interactome" analysis in KEGG

  • Kanehisa, Minoru
    • Proceedings of the Korean Society for Bioinformatics Conference
    • /
    • 2000.11a
    • /
    • pp.56-58
    • /
    • 2000
  • Post-genomics may be defined in different ways depending on how one views the challenges after the genome. A popular view is to follow the concept of the central dogma in molecular biology, namely from genome to transcriptome to proteome. Projects are under way to analyze gene expression profiles at both the mRNA and protein levels and to catalog protein 3D structure families, which will no doubt help the understanding of information in the genome. However complete, such catalogs of genes, RNAs, and proteins only tell us about the building blocks of life. They do not tell us much about the wiring (interaction) of building blocks, which is essential for uncovering systemic functional behaviors of the cell or the organism. Thus, an alternative view of post-genomics is to go up from the molecular level to the cellular level, and to understand what I call the "interactome", a complete picture of molecular interactions in the cell. KEGG (http://www.genome.ad.jp/kegg/) is our attempt to computerize current knowledge on various cellular processes as a collection of "generalized" protein-protein interaction networks, to develop new graph-based algorithms for predicting such networks from the genome information, and to actually reconstruct the interactomes for all the completely sequenced genomes and some partial genomes. During the reconstruction process, it becomes readily apparent that certain pathways and molecular complexes are present or absent in each organism, indicating modular structures of the interactome. In addition, the reconstruction uncovers missing components in an otherwise complete pathway or complex, which may result from misannotation of the genome or misrepresentation of the KEGG pathway. When combined with additional experimental data on protein-protein interactions, such as those from yeast two-hybrid systems, the reconstruction may uncover unknown partners for a particular pathway or complex. Thus, the reconstruction is tightly coupled with the annotation of individual genes, which is maintained in the GENES database in KEGG. We are also trying to expand our literature survey to include in the GENES database the most up-to-date information about gene functions.
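
The idea of spotting missing components in an otherwise complete pathway can be illustrated with plain set operations. This is only a minimal sketch, not the graph-based algorithms used in KEGG: the pathway names, step labels, and the `max_missing` threshold are hypothetical, and each pathway is reduced to a flat set of required functions.

```python
from typing import Dict, Set


def pathway_gaps(reference: Dict[str, Set[str]],
                 annotated: Set[str]) -> Dict[str, Set[str]]:
    """For each reference pathway, return the required functions
    that have no annotated gene in the genome."""
    return {name: steps - annotated for name, steps in reference.items()}


def nearly_complete(reference: Dict[str, Set[str]],
                    annotated: Set[str],
                    max_missing: int = 1) -> Dict[str, Set[str]]:
    """Keep pathways that miss at least one but at most `max_missing`
    functions. These are candidates for a misannotated gene or a
    misrepresented reference pathway."""
    gaps = pathway_gaps(reference, annotated)
    return {name: missing for name, missing in gaps.items()
            if 0 < len(missing) <= max_missing}


# Hypothetical reference pathways and genome annotation (illustrative only).
reference = {
    "pathway_1": {"step_A", "step_B", "step_C"},
    "complex_1": {"subunit_X", "subunit_Y"},
}
annotated = {"step_A", "step_C", "subunit_X", "subunit_Y"}

candidates = nearly_complete(reference, annotated)
print(candidates)  # {'pathway_1': {'step_B'}}
```

A pathway that appears in `candidates` is a prompt to re-examine the gene annotation for the missing step, which is the coupling between reconstruction and annotation that the abstract describes.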

Development of an Analysis Program of Type I Polyketide Synthase Gene Clusters Using Homology Search and Profile Hidden Markov Model

  • Tae, Hong-Seok;Sohng, Jae-Kyung;Park, Kie-Jung
    • Journal of Microbiology and Biotechnology
    • /
    • v.19 no.2
    • /
    • pp.140-146
    • /
    • 2009
  • MAPSI (Management and Analysis for Polyketide Synthase Type I) has been developed to offer computational analysis methods to detect type I PKS (polyketide synthase) gene clusters in genome sequences. MAPSI provides a genome analysis component, which detects PKS gene clusters by identifying domains in proteins of a genome. MAPSI also contains databases on polyketides and genome annotation data, as well as analytic components such as new PKS assembly and domain analysis. The polyketide data and analysis component are accessible through Web interfaces and are displayed with diverse information. MAPSI, which was developed to aid researchers studying type I polyketides, provides diverse components to access and analyze polyketide information and should become a very powerful computational tool for polyketide research. The system can be extended through further studies of factors related to the biological activities of polyketides.

A Comprehensive Review of Emerging Computational Methods for Gene Identification

  • Yu, Ning;Yu, Zeng;Li, Bing;Gu, Feng;Pan, Yi
    • Journal of Information Processing Systems
    • /
    • v.12 no.1
    • /
    • pp.1-34
    • /
    • 2016
  • Gene identification is at the center of genomic studies. Although the first phase of the Encyclopedia of DNA Elements (ENCODE) project has been claimed to be complete, the annotation of the functional elements is far from being so. Computational methods in gene identification continue to play important roles in this area and other relevant issues. So far, a lot of work has been performed in this area, and a plethora of computational methods and avenues have been developed. Many review papers have summarized these methods and other related work. However, most of them focus on the methodologies from a particular aspect or perspective. Different from these existing bodies of research, this paper aims to comprehensively summarize the mainstream computational methods in gene identification and tries to provide a concise technical reference for future studies. Moreover, this review sheds light on the emerging trends and cutting-edge techniques that are believed to be capable of leading the research in this field in the future.

Identifying Statistically Significant Gene-Sets by Gene Set Enrichment Analysis Using Fisher Criterion

  • Kim, Jae-Young;Shin, Mi-Young
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.45 no.4
    • /
    • pp.19-26
    • /
    • 2008
  • Gene set enrichment analysis (GSEA) is a computational method that identifies gene sets showing statistically significant differences between two groups of microarray expression profiles and simultaneously uncovers their biological meaning by employing gene annotation databases such as Cytogenetic Band, KEGG pathways, and Gene Ontology. In GSEA, all the genes in a given dataset are first ordered by the signal-to-noise ratio between the groups, and further analyses are then carried out. Despite its impressive results in several previous studies, gene ranking by the signal-to-noise ratio makes it difficult to consider highly up-regulated genes and highly down-regulated genes at the same time as candidate significant genes, a combination that may reflect situations arising in metabolic and signaling pathways. To deal with this problem, we investigate GSEA with the Fisher criterion for gene ranking and evaluate its effects in leukemia-related pathway analyses.
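
The difference between the two ranking scores can be seen on a toy example. The sketch below assumes the commonly used definitions, which the abstract itself does not spell out: signal-to-noise ratio (μ1 − μ2) / (σ1 + σ2) and Fisher criterion (μ1 − μ2)² / (σ1² + σ2²). The gene names and expression values are made up for illustration.

```python
from statistics import mean, stdev, variance
from typing import Callable, Dict, List, Sequence, Tuple

# (expression values in group 1, expression values in group 2)
Profile = Tuple[Sequence[float], Sequence[float]]


def signal_to_noise(g1: Sequence[float], g2: Sequence[float]) -> float:
    """Signed score: up-regulated genes are positive, down-regulated negative.
    Assumes at least two samples per group and non-zero spread."""
    return (mean(g1) - mean(g2)) / (stdev(g1) + stdev(g2))


def fisher_criterion(g1: Sequence[float], g2: Sequence[float]) -> float:
    """Unsigned score: the squared mean difference removes the direction."""
    return (mean(g1) - mean(g2)) ** 2 / (variance(g1) + variance(g2))


def rank_genes(profiles: Dict[str, Profile],
               score: Callable[[Sequence[float], Sequence[float]], float]) -> List[str]:
    """Order gene names from highest to lowest score."""
    return sorted(profiles, key=lambda g: score(*profiles[g]), reverse=True)


# Hypothetical expression profiles (illustrative values only).
profiles = {
    "gene_up": ([8.0, 9.0, 10.0], [1.0, 2.0, 3.0]),
    "gene_down": ([1.0, 2.0, 3.0], [8.0, 9.0, 10.0]),
    "gene_flat": ([5.0, 6.0, 5.0], [5.0, 5.0, 6.0]),
}

print(rank_genes(profiles, signal_to_noise))   # up, flat, down
print(rank_genes(profiles, fisher_criterion))  # up and down first, flat last
```

With the signed score, the strongly up-regulated and strongly down-regulated genes land at opposite ends of the ranked list with the unchanged gene between them. With the Fisher criterion both receive the same high score and sit together at the top, which is the property the abstract relies on.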

OryzaGP 2021 update: a rice gene and protein dataset for named-entity recognition

  • Larmande, Pierre;Liu, Yusha;Yao, Xinzhi;Xia, Jingbo
    • Genomics & Informatics
    • /
    • v.19 no.3
    • /
    • pp.27.1-27.4
    • /
    • 2021
  • Due to the rapid evolution of high-throughput technologies, a tremendous amount of data is being produced in the biological domain, which poses a challenging task for information extraction and natural language understanding. Biological named entity recognition (NER) and named entity normalisation (NEN) are two common tasks aiming at identifying and linking biologically important entities such as genes or gene products mentioned in the literature to biological databases. In this paper, we present an updated version of OryzaGP, a gene and protein dataset for rice species created to help natural language processing (NLP) tools in processing NER and NEN tasks. To create the dataset, we selected more than 15,000 abstracts associated with articles previously curated for rice genes. We developed four dictionaries of gene and protein names associated with database identifiers. We used these dictionaries to annotate the dataset. We also annotated the dataset using pretrained NLP models. Finally, we analysed the annotation results and discussed how to improve OryzaGP.
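
Dictionary-based annotation of the kind described above can be sketched as a longest-match lookup over the text. This is an assumption-laden illustration rather than the OryzaGP pipeline: the gene names, identifiers, and the case-insensitive whole-word matching rule are all hypothetical choices.

```python
import re
from typing import Dict, List, Tuple

# (start offset, end offset, matched text, database identifier)
Annotation = Tuple[int, int, str, str]


def annotate(text: str, dictionary: Dict[str, str]) -> List[Annotation]:
    """Tag every dictionary name found in `text` and link it to its identifier."""
    if not dictionary:
        return []
    # Try longer names first so a long name wins over a shorter one
    # that is its prefix.
    names = sorted(dictionary, key=len, reverse=True)
    alternatives = "|".join(re.escape(name) for name in names)
    # Require non-word characters (or text boundaries) around the match.
    pattern = re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)", re.IGNORECASE)
    lookup = {name.lower(): ident for name, ident in dictionary.items()}
    return [(m.start(), m.end(), m.group(0), lookup[m.group(0).lower()])
            for m in pattern.finditer(text)]


# Hypothetical dictionary and sentence (names and identifiers are made up).
dictionary = {"GeneA": "ID:0001", "GeneA-like 1": "ID:0002"}
sentence = "Expression of GeneA-like 1 and genea increased."

for start, end, surface, ident in annotate(sentence, dictionary):
    print(start, end, surface, ident)
```

Normalisation (NEN) is reduced here to the identifier lookup; a real dictionary would also need to handle synonyms that map to several identifiers.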

Development of Integrated Retrieval System Based on Web Service for Gene Annotation Database

  • 이희전;용환승
    • Proceedings of the Korea Multimedia Society Conference
    • /
    • 2003.11a
    • /
    • pp.355-358
    • /
    • 2003
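  • Approaches to integrating gene annotation data are currently being actively discussed in the field of bioinformatics. In this paper, we use the web service concept of BioDAS to build an integrated retrieval system across distributed annotation data servers, thereby implementing a metasearch system. The system provides users with metasearch and result-saving functions and offers a web service to external users.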

Genome Sequencing and Genome-Wide Identification of Carbohydrate-Active Enzymes (CAZymes) in the White Rot Fungus Flammulina fennae

  • Lee, Chang-Soo;Kong, Won-Sik;Park, Young-Jin
    • Microbiology and Biotechnology Letters
    • /
    • v.46 no.3
    • /
    • pp.300-312
    • /
    • 2018
  • Whole-genome sequencing of the wood-rotting fungus, Flammulina fennae, was carried out to identify carbohydrate-active enzymes (CAZymes). De novo genome assembly (k-mer size 31) of short reads by next-generation sequencing revealed a total genome length of 32,423,623 base pairs (39% GC). A total of 11,591 gene models in the assembled genome sequence of F. fennae were predicted by ab initio gene prediction using the AUGUSTUS tool. In a genome-wide comparison, 6,715 orthologous groups shared at least one gene with F. fennae and 10,667 (92%) of 11,591 genes for F. fennae proteins had orthologs among the Dikarya. Additionally, F. fennae contained 23 species-specific genes, of which 16 were paralogous. CAZyme identification and annotation revealed 513 CAZymes, including 82 auxiliary activities, 220 glycoside hydrolases, 85 glycosyltransferases, 20 polysaccharide lyases, 57 carbohydrate esterases, and 45 carbohydrate-binding modules in the F. fennae genome. The genome information of F. fennae increases the understanding of this basidiomycete fungus. CAZyme gene information will be useful for detailed studies of lignocellulosic biomass degradation for biotechnological and industrial applications.

HorseDB; an Integrated Horse Resource and Web Service (Construction of a Horse Database)

  • Kim Dae-Soo;Jo Un-Jong;Huh Jae-Won;Choe Eun-Sang;Cho Byung-Wook;Kim Heui-Soo
    • Journal of Life Science
    • /
    • v.16 no.3 s.76
    • /
    • pp.472-476
    • /
    • 2006
  • We have built a database server called HorseDB, which contains genome annotation information and biological information for the horse from public database entries. HorseDB aims to integrate biological information and horse genome data on a genome scale using bioinformatic methods. To facilitate the extraction of useful information from the collected horse genome and biological data, we developed a user-friendly interface system, HorseDB; an Integrated Horse Resource and Web Service. The database is organized into general horse information, sequence annotation data, and a web-based analysis program interface. The database also gives users easy access to useful information within horse genomes and supports analysis results such as sequence alignments and gene annotations. HorseDB can be accessed at http://www.primate.or.kr./horse.

Genomic Analysis of a Freshwater Actinobacterium, "Candidatus Limnosphaera aquatica" Strain IMCC26207, Isolated from Lake Soyang

  • Kim, Suhyun;Kang, Ilnam;Cho, Jang-Cheon
    • Journal of Microbiology and Biotechnology
    • /
    • v.27 no.4
    • /
    • pp.825-833
    • /
    • 2017
  • Strain IMCC26207 was isolated from the surface layer of Lake Soyang in Korea by the dilution-to-extinction culturing method, using a liquid medium prepared with filtered and autoclaved lake water. The strain could neither be maintained in a synthetic medium other than natural freshwater medium nor grown on solid agar plates. Phylogenetic analysis of 16S rRNA gene sequences indicated that strain IMCC26207 formed a distinct lineage in the order Acidimicrobiales of the phylum Actinobacteria. The closest relative among the previously identified bacterial taxa was "Candidatus Microthrix parvicella" with 16S rRNA gene sequence similarity of 91.7%. Here, the draft genome sequence of strain IMCC26207, a freshwater actinobacterium, is reported with the description of the genome properties and annotation summary. The draft genome consisted of 10 contigs with a total size of 3,316,799 bp and an average G+C content of 57.3%. The IMCC26207 genome was predicted to contain 2,975 protein-coding genes and 51 non-coding RNA genes, including 45 tRNA genes. Approximately 76.8% of the protein-coding genes could be assigned a specific function. Annotation of the IMCC26207 genome showed several traits of adaptation to living in oligotrophic freshwater environments, such as phosphorus-limited conditions. Comparative genomic analysis revealed that the genome of strain IMCC26207 was distinct from that of "Candidatus Microthrix" strains; therefore, we propose the name "Candidatus Limnosphaera aquatica" for this bacterium.
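
Genome properties such as contig count, total size, and G+C content are simple summaries over the assembled contigs. The sketch below shows how such figures are typically computed; the contig sequences are toy strings, not data from the IMCC26207 assembly, and the function name is hypothetical.

```python
from typing import Dict, List, Union


def genome_summary(contigs: List[str]) -> Dict[str, Union[int, float]]:
    """Return contig count, total length in bp, and G+C percentage.
    The percentage is taken over all bases, including any ambiguous ones."""
    total_bp = sum(len(contig) for contig in contigs)
    gc_bp = sum(contig.upper().count("G") + contig.upper().count("C")
                for contig in contigs)
    gc_percent = 100.0 * gc_bp / total_bp if total_bp else 0.0
    return {"contigs": len(contigs), "total_bp": total_bp, "gc_percent": gc_percent}


# Toy contigs for illustration only.
toy_contigs = ["ATGCGC", "ggccat", "ATAT"]
print(genome_summary(toy_contigs))
# {'contigs': 3, 'total_bp': 16, 'gc_percent': 50.0}
```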