• Title/Summary/Keyword: genome annotation

Hybrid Fungal Genome Annotation Pipeline Combining ab initio, Evidence-, and Homology-based gene model evaluation

  • Min, Byoungnam;Choi, In-Geol
    • 한국균학회소식:학술대회논문집 (Korean Society of Mycology Newsletter: Conference Proceedings) / 2018.05a / pp.22-22 / 2018
  • Fungal genome sequencing and assembly have become routine. Genome analysis relies on high-quality gene prediction and annotation, and an automatic fungal genome annotation pipeline is essential for handling the exponentially accumulating genomic sequence data. However, building an automatic annotation procedure for fungal genomes is not an easy task. FunGAP (Fungal Genome Annotation Pipeline) was developed for precise and accurate prediction of gene models from any fungal genome assembly. To produce high-quality gene models, the pipeline employs multiple gene prediction programs encompassing ab initio, evidence-based, and homology-based evaluation. FunGAP evaluates all predicted genes and filters the gene models. To guide the removal of false-positive genes, we used a scoring function that seeks a consensus by scoring each gene model on its homology to known proteins or domains. FunGAP is freely available for non-commercial users on GitHub (https://github.com/CompSynBioLab-KoreaUniv/FunGAP).
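The filtering idea in the abstract above can be illustrated with a minimal sketch. It is not FunGAP's actual scoring function: the `GeneModel` fields, the `score` values, and the `min_score` threshold are all hypothetical, and the sketch assumes every model lies on the same contig. Among overlapping candidate models it greedily keeps the one with the highest evidence score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneModel:
    name: str     # hypothetical identifier
    source: str   # predictor family that produced the model
    start: int    # 1-based inclusive start on the contig
    end: int      # 1-based inclusive end on the contig
    score: float  # hypothetical evidence score (e.g. from homology hits)


def overlaps(a: GeneModel, b: GeneModel) -> bool:
    """True if the two models share at least one base."""
    return a.start <= b.end and b.start <= a.end


def filter_models(models, min_score=0.0):
    """Greedily keep the best-scoring model among overlapping candidates."""
    kept = []
    for model in sorted(models, key=lambda m: m.score, reverse=True):
        if model.score < min_score:
            continue  # treated as a likely false positive
        if not any(overlaps(model, other) for other in kept):
            kept.append(model)
    return sorted(kept, key=lambda m: m.start)


# Toy input: g1 and g2 compete for the same locus, g3 has weak support.
models = [
    GeneModel("g1", "ab_initio", 100, 900, 35.0),
    GeneModel("g2", "evidence", 150, 950, 80.0),
    GeneModel("g3", "homology", 2000, 2600, 5.0),
    GeneModel("g4", "ab_initio", 3000, 3500, 40.0),
]
selected = filter_models(models, min_score=10.0)
print([m.name for m in selected])
```

A greedy pass is only one way to resolve competing models; a real pipeline would also have to handle strands, multiple contigs, and ties.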

Caution and Curation for Complete Mitochondrial Genome from Next-Generation Sequencing: A Case Study from Dermatobranchus otome (Gastropoda, Nudibranchia)

  • Do, Thinh Dinh;Choi, Yisoo;Jung, Dae-Wui;Kim, Chang-Bae
    • Animal Systematics, Evolution and Diversity / v.36 no.4 / pp.336-346 / 2020
  • The mitochondrial genome is an important molecule for systematic and evolutionary studies in metazoans. The development of next-generation sequencing (NGS) techniques has rapidly increased the number of mitogenome sequences. Generating a mitochondrial genome from NGS data involves several steps, from DNA preparation and sequencing to assembly and annotation. Despite efforts to improve sequencing, assembly, and annotation methods for mitogenomes, low-quality and/or low-quantity sequence can still end up in the final map. Therefore, it is necessary to check and curate the mitochondrial genome sequence after annotation for proofreading and feedback. In this study, we introduce a pipeline for sequencing and curating mitogenomes based on NGS. For this purpose, two mitogenomes of Dermatobranchus otome were sequenced on the Illumina MiSeq system with different amounts of raw read data. The generated reads were assembled and annotated with commonly used programs. Because abnormal repeat regions were present in the mitogenomes after annotation, primers covering these regions were designed, and conventional PCR followed by Sanger sequencing was performed to curate the mitogenome sequences. The obtained sequences were used to replace the abnormal regions. Following the replacement, each mitochondrial genome was compared with the other, as well as with sequences of closely related species available in GenBank, for confirmation. After curation, both mitogenomes of D. otome were typical circular molecules of 14,559 bp containing 13 protein-coding genes, 22 tRNA genes, and two rRNA genes. The phylogenetic tree revealed a close relationship between D. otome and Tritonia diomedea. These findings indicate the importance of caution and curation when generating mitogenomes from NGS data.

Complete genome sequence of Pantoea intestinalis SRCM103226, a microbial C40 carotenoid zeaxanthin producer (Korean title: Genome sequencing of Pantoea intestinalis SRCM103226, a carotenoid-producing strain isolated from the edible insect mealworm)

  • Kim, Jin Won;Ha, Gwangsu;Jeong, Seong-Yeop;Jeong, Do-Youn
    • Korean Journal of Microbiology / v.55 no.2 / pp.167-170 / 2019
  • Pantoea intestinalis SRCM103226, isolated from the edible insect mealworm, overproduces zeaxanthin as its main carotenoid. The complete genome of P. intestinalis SRCM103226 was sequenced using the Pacific Biosciences (PacBio) RS II platform. The genome comprises a 4,784,919 bp circular chromosome (53.41% G+C content) and is devoid of extrachromosomal plasmids. Annotation using the RAST server revealed 4,332 coding sequences and 107 RNAs (22 rRNA genes, 85 tRNA genes). Genome annotation analysis revealed five genes involved in the carotenoid pathway. The genome information provides fundamental knowledge for comparative genomics studies of the zeaxanthin pathway.
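For readers unfamiliar with the G+C statistic quoted above, the following is a minimal sketch of how a G+C percentage can be computed from a sequence string. The function name and the toy sequence are illustrative assumptions; the sketch ignores ambiguous bases such as N and is not the method used by the authors or by RAST.

```python
def gc_content(seq: str) -> float:
    """Return the G+C percentage over unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no unambiguous bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted


# Toy sequence for illustration only: 9 of its 15 bases are G or C.
toy = "ATGCGCGTTAACCGG"
print(f"{gc_content(toy):.2f}% G+C")
```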

PromoterWizard: An Integrated Promoter Prediction Program Using Hybrid Methods

  • Park, Kie-Jung;Kim, Ki-Bong
    • Genomics & Informatics / v.9 no.4 / pp.194-196 / 2011
  • Promoter prediction is an important problem closely related to central problems of bioinformatics such as the construction of gene regulatory networks and gene function annotation. In this context, we developed PromoterWizard, an integrated promoter prediction program using hybrid methods, which can be employed to detect the core promoter region and the transcription start site (TSS) in vertebrate genomic DNA sequences, an issue of obvious importance for genome annotation efforts. PromoterWizard consists of three main modules and two auxiliary modules. The three main modules are the CDRM (Composite Dependency Reflecting Model) module, the SVM (Support Vector Machine) module, and the ICM (Interpolated Context Model) module. The two auxiliary modules, CpG Island Detector and GCPlot, may improve the predictive accuracy of the three main modules and help a human curator decide on the final annotation.
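The abstract does not describe how the CpG Island Detector module works. As a hedged illustration of the general idea, the sketch below applies the widely used window criteria (G+C fraction above 0.5 and observed/expected CpG ratio above 0.6 in a window of at least 200 bp). The function names, the window handling, and the thresholds as defaults are assumptions and may differ from PromoterWizard.

```python
def cpg_stats(window: str):
    """Return (G+C fraction, observed/expected CpG ratio) for one window."""
    w = window.upper()
    n = len(w)
    c, g = w.count("C"), w.count("G")
    cpg = w.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    # Observed/expected ratio: (#CpG * N) / (#C * #G)
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def cpg_windows(seq: str, size=200, step=1, min_gc=0.5, min_ratio=0.6):
    """Return start offsets (0-based) of windows meeting both criteria."""
    hits = []
    for start in range(0, len(seq) - size + 1, step):
        gc_fraction, obs_exp = cpg_stats(seq[start:start + size])
        if gc_fraction > min_gc and obs_exp > min_ratio:
            hits.append(start)
    return hits


# Toy sequence: a CpG-rich stretch followed by an AT-only stretch.
toy = "CG" * 100 + "AT" * 100
print(cpg_windows(toy, size=200, step=200))
```

Overlapping qualifying windows would normally be merged into islands afterwards; that step is omitted here to keep the sketch short.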

Patome: Database of Patented Bio-sequences

  • Kim, SeonKyu;Lee, ByungWook
    • Genomics & Informatics / v.3 no.3 / pp.94-97 / 2005
  • We have built a database server called Patome, which contains annotation information for patented bio-sequences from the Korean Intellectual Property Office (KIPO). The aims of Patome are to annotate Korean patent bio-sequences and to provide information on the patent relationships of public database entries. The patent sequences were annotated against the Reference Sequence (RefSeq) database or NCBI's nr database. The raw patent data and the annotated data are stored in the database. The annotation information can be used to determine whether a particular RefSeq ID or NCBI nr ID is related to a Korean patent. The Patome infrastructure consists of three components: the database itself, a sequence data loader, and an online database query interface. The database can be queried by submission number, organism, title, applicant name, or accession number. Patome can be accessed at http://www.patome.net. The information will be updated every two months.

In Silico Identification of 6-Phosphogluconolactonase Genes that are Frequently Missing from Completely Sequenced Bacterial Genomes

  • Jeong, Hae-Young;F. Kim, Ji-Hyun;Park, Hong-Seog
    • Genomics & Informatics / v.4 no.4 / pp.182-187 / 2006
  • 6-Phosphogluconolactonase (6PGL) is one of the key enzymes in the ubiquitous pathways of central carbon metabolism, but bacterial 6PGL had long been known as a missing enzyme even after complete bacterial genome sequences became available. Although recent experimental characterization suggests that there are two types of 6PGLs (DevB and YbhE), their phylogenetic distribution is severely biased. Here we show that proteins in the COG group previously described as 3-carboxymuconate cyclase (COG2706) are actually YbhE-type 6PGLs, which are widely distributed in Proteobacteria and Firmicutes. This case exemplifies how an erroneous functional description of a member of a reference database commonly used in transitive genome annotation causes systematic problems in gene prediction, even for genes with universal cellular functions.

Functional annotation of de novo variants from healthy individuals

  • Lee, Jean;Hong, Sung Eun
    • Genomics & Informatics / v.17 no.4 / pp.46.1-46.7 / 2019
  • The implications of germline de novo variants (DNVs) in diseases are well documented. Despite extensive research, inconsistencies between studies remain a challenge, and the distribution and genetic characteristics of DNVs need to be precisely evaluated. To address this issue at the whole-genome scale, a large number of DNVs identified from the whole-genome sequencing of 1,902 healthy trios (i.e., parents and progeny) from the Simons Foundation for Autism Research Initiative study and 20 healthy Korean trios were analyzed. These apparently nonpathogenic DNVs were enriched in functional elements of the genome but relatively depleted in regions of common copy number variants, implying their potential function as triggers of evolution even in healthy groups. No strong mutational hotspots were identified. The pathogenicity of the DNVs was not strongly elevated, reflecting the health status of the cohort. The mutational signatures were consistent with previous studies. This study will serve as a reference for future DNV studies.
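The abstract above does not state how DNVs were called. As a minimal sketch of the underlying definition only, a candidate de novo variant is a site where the child carries an allele seen in neither parent. The data layout, site coordinates, and function name below are hypothetical, and real pipelines add depth, quality, and population-frequency filters that are omitted here.

```python
def is_de_novo(child, mother, father) -> bool:
    """True if the child carries an allele present in neither parent."""
    parental_alleles = set(mother) | set(father)
    return any(allele not in parental_alleles for allele in child)


# Hypothetical genotype calls for one trio, keyed by (chromosome, position).
trio_calls = {
    ("chr1", 12345): {"child": ("A", "G"), "mother": ("A", "A"), "father": ("A", "A")},
    ("chr2", 67890): {"child": ("C", "T"), "mother": ("C", "T"), "father": ("C", "C")},
}

candidates = [
    site
    for site, gt in trio_calls.items()
    if is_de_novo(gt["child"], gt["mother"], gt["father"])
]
print(candidates)
```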

Towards cross-platform interoperability for machine-assisted text annotation

  • de Castilho, Richard Eckart;Ide, Nancy;Kim, Jin-Dong;Klie, Jan-Christoph;Suderman, Keith
    • Genomics & Informatics / v.17 no.2 / pp.19.1-19.10 / 2019
  • In this paper, we investigate cross-platform interoperability for natural language processing (NLP) and, in particular, annotation of textual resources, with an eye toward identifying the design elements of annotation models and processes that are particularly problematic for, or amenable to, enabling seamless communication across different platforms. The study is conducted in the context of a specific annotation methodology, namely machine-assisted interactive annotation (also known as human-in-the-loop annotation). This methodology requires the ability to freely combine resources from different document repositories, access a wide array of NLP tools that automatically annotate corpora for various linguistic phenomena, and use a sophisticated annotation editor that enables interactive manual annotation coupled with on-the-fly machine learning. We consider three independently developed platforms, each of which utilizes a different model for representing annotations over text, and each of which performs a different role in the process.

Rough Computational Annotation and Hierarchical Conserved Area Viewing Tool for Genomes Using Multiple Relation Graph (Korean title: A tool for hierarchical visualization of conserved genome regions and rough computational annotation using a multiple relation graph)

  • Lee, Do-Hoon
    • Journal of Life Science / v.18 no.4 / pp.565-571 / 2008
  • With the rapid development of bioinformatics technologies, a wide variety of biological data has been produced in silico, and complicated, large-scale biodata are now used to meet researchers' requirements. Developing visualization and annotation tools for such data remains an active topic, although these tools have been studied for a decade. However, the diversity of users and their requirements makes it hard to develop a general-purpose tool. In this paper, I propose a novel system, Genome Viewer and Annotation tool (GenoVA), which annotates and visualizes relations among genomes using known information and a multiple relation graph. Several multiple alignment tools exist, but they lose conserved areas because of the complexity of their constraints. GenoVA extracts all associated information between all pairs of genomes by extending pairwise alignment. Areas with high-frequency conservation and high BLAST scores form the block nodes of the relation graph. To represent the multiple relation graph, the system connects associated block nodes. The system also shows known information for each block node: COG, gene, and hierarchical path. The system can then annotate missed areas and unknown genes by navigating the clustering of a particular block node. I tested the system on ten bacterial genomes, extracting features to visualize and annotate them. GenoVA also supports simple, rough computational annotation of a new genome.
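The relation graph described above can be sketched roughly as follows. This is an illustration under stated assumptions, not GenoVA's implementation: the block identifiers, the scores, and the `min_score` threshold are hypothetical, and clustering is approximated here by connected components found with a breadth-first search.

```python
from collections import defaultdict, deque


def build_relation_graph(hits, min_score):
    """Connect two block nodes when their pairwise hit score is high enough."""
    graph = defaultdict(set)
    for block_a, block_b, score in hits:
        if score >= min_score:
            graph[block_a].add(block_b)
            graph[block_b].add(block_a)
    return dict(graph)


def clusters(graph):
    """Return connected components of the graph as sorted lists of blocks."""
    seen = set()
    result = []
    for node in sorted(graph):
        if node in seen:
            continue
        seen.add(node)
        queue = deque([node])
        component = []
        while queue:
            current = queue.popleft()
            component.append(current)
            for neighbour in sorted(graph[current]):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        result.append(sorted(component))
    return result


# Hypothetical pairwise hits: (block in genome X, block in genome Y, score).
hits = [
    ("gA:block1", "gB:block7", 310.0),
    ("gB:block7", "gC:block2", 280.0),
    ("gA:block5", "gC:block9", 45.0),   # below threshold, no edge
    ("gB:block3", "gC:block4", 150.0),
]
graph = build_relation_graph(hits, min_score=100.0)
print(clusters(graph))
```

In such a cluster, a block with no annotation could tentatively inherit the annotation of its annotated neighbours, which is the spirit of the rough annotation the abstract mentions.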