• Title/Summary/Keyword: RefSeq

Patome: Database of Patented Bio-sequences

  • Kim, SeonKyu;Lee, ByungWook
    • Genomics & Informatics / v.3 no.3 / pp.94-97 / 2005
  • We have built a database server called Patome, which contains annotation information for patented bio-sequences from the Korean Intellectual Property Office (KIPO). The aims of Patome are to annotate Korean patent bio-sequences and to provide information on how public database entries relate to patents. The patent sequences were annotated with the Reference Sequence (RefSeq) database or NCBI's nr database. The raw patent data and the annotated data were stored in the database. The annotation information can be used to determine whether a particular RefSeq ID or NCBI nr ID is related to a Korean patent. The Patome infrastructure consists of three components: the database itself, a sequence data loader, and an online database query interface. The database can be queried by submission number, organism, title, applicant name, or accession number. Patome can be accessed at http://www.patome.net. The information will be updated every two months.

Comparative analysis of HiSeq3000 and BGISEQ-500 sequencing platform with shotgun metagenomic sequencing data

  • Animesh Kumar;Espen M. Robertsen;Nils P. Willassen;Juan Fu;Erik Hjerde
    • Genomics & Informatics / v.21 no.4 / pp.49.1-49.11 / 2023
  • Recent advances in sequencing technologies have made it possible to generate metagenomic sequences on different sequencing platforms. In this study, we analyzed and compared shotgun metagenomic sequences generated by the HiSeq3000 and BGISEQ-500 platforms from 12 sediment samples collected across the Norwegian coast. Metagenomic DNA sequences were normalized to an equal number of bases for both platforms and further evaluated using different taxonomic classifiers, reference databases, and assemblers. Normalized BGISEQ-500 sequences retained more reads and base counts after preprocessing, while a slightly higher fraction of HiSeq3000 sequences were taxonomically classified. Kaiju classified a higher percentage of reads than Kraken2 for both platforms, and a comparison of reference databases for taxonomic classification showed that the MAR database outperformed RefSeq. Assembly with MEGAHIT produced longer assemblies and a higher total contig count than metaSPAdes in the majority of HiSeq3000 samples, but the assembly statistics notably improved with unprocessed or normalized reads. Our results indicate that both platforms perform comparably in terms of the percentage of taxonomically classified reads and assembled contig statistics for metagenomic samples. This study provides valuable insights for researchers selecting an appropriate sequencing platform and bioinformatics pipeline for their metagenomics studies.

Functional Annotation and Analysis of Korean Patented Biological Sequences Using Bioinformatics

  • Lee, Byung Wook;Kim, Tae Hyung;Kim, Seon Kyu;Kim, Sang Soo;Ryu, Gee Chan;Bhak, Jong
    • Molecules and Cells / v.21 no.2 / pp.269-275 / 2006
  • A recent report of the Korean Intellectual Property Office (KIPO) showed that the number of biological sequence-based patents is rapidly increasing in Korea. We present biological features of Korean patented sequences through bioinformatic analysis. The analysis is divided into two steps. The first is an annotation step in which the patented sequences were annotated with the Reference Sequence (RefSeq) database. The second is an association step in which the patented sequences were linked to genes, diseases, pathways, and biological functions. We used the Entrez Gene, Online Mendelian Inheritance in Man (OMIM), Kyoto Encyclopedia of Genes and Genomes (KEGG), and Gene Ontology (GO) databases. Through the association analysis, we found that nearly 2.6% of human genes were associated with Korean patents, compared to 20% of human genes in U.S. patents. The association between the biological functions and the patented sequences indicated that genes whose products act as hormones on defense responses in the extracellular environment were the most highly targeted for patenting. The analysis data are available at http://www.patome.net

Compiling Multicopy Single-Stranded DNA Sequences from Bacterial Genome Sequences

  • Yoo, Wonseok;Lim, Dongbin;Kim, Sangsoo
    • Genomics & Informatics / v.14 no.1 / pp.29-33 / 2016
  • A retron is a bacterial retroelement that encodes an RNA gene and a reverse transcriptase (RT). The former, once transcribed, works as a template primer for reverse transcription by the latter. The resulting DNA is covalently linked to the upstream part of the RNA; this chimera is called multicopy single-stranded DNA (msDNA), which is extrachromosomal DNA found in many bacterial species. Based on the conserved features in the eight known msDNA sequences, we developed a detection method and applied it to scan National Center for Biotechnology Information (NCBI) RefSeq bacterial genome sequences. Among 16,844 bacterial sequences possessing a retron-type RT domain, we identified 48 unique types of msDNA. Currently, the biological role of msDNA is not well understood. Our work will be a useful tool in studying the distribution, evolution, and physiological role of msDNA.

KUGI: A Database and Search System for Korean Unigene and Pathway Information

  • Yang, Jin-Ok;Hahn, Yoon-Soo;Kim, Nam-Soon;Yu, Ung-Sik;Woo, Hyun-Goo;Chu, In-Sun;Kim, Yong-Sung;Yoo, Hyang-Sook;Kim, Sang-Soo
    • Proceedings of the Korean Society for Bioinformatics Conference / 2005.09a / pp.407-411 / 2005
  • The KUGI (Korean UniGene Information) database contains annotation information for cDNA sequences obtained from samples of diseases prevalent in Koreans. A total of about 157,000 high-throughput 5'-EST sequences, collected from cDNA libraries of stomach, liver, and some cancer tissues or established cell lines from Korean patients, were clustered into about 35,000 contigs. From each cluster, a representative clone having the longest high-quality sequence or the start codon was selected. We stored the sequences of the representative clones and the clustered contigs in the KUGI database together with the information obtained by running BLAST against the RefSeq, human mRNA, and UniGene databases from NCBI. We provide a web-based search engine for the KUGI database with two types of user interface: attribute-based search and sequence similarity search. For attribute-based search we use DBMS technology, while for similarity search we use BLAST, which supports various similarity search options. The search system allows not only multiple queries but also various query types. The results include the following: 1) information on clones and libraries, 2) accession keys, location on the genome, gene ontology, and pathways linked to public databases, 3) links to external programs, and 4) sequence information for contigs and the 5'-ends of clones. We believe that the KUGI database and search system may provide very useful information for studies aimed at elucidating the causes of diseases that are prevalent in Koreans.

StrokeBase: A Database of Cerebrovascular Disease-related Candidate Genes

  • Kim, Young-Uk;Kim, Il-Hyun;Bang, Ok-Sun;Kim, Young-Joo
    • Genomics & Informatics / v.6 no.3 / pp.153-156 / 2008
  • Complex diseases such as stroke and cancer involve two or more genetic loci and are affected by environmental factors that contribute to the diseases. Due to the complex characteristics of these diseases, identifying candidate genes requires a system-level analysis of gene ontology, pathways, and interactions. A database and user interface, termed StrokeBase, was developed; StrokeBase provides queries that search for pathways, candidate genes, candidate SNPs, and gene networks. The database was developed by in silico data mining of HGNC, ENSEMBL, STRING, RefSeq, UCSC, GO, HPRD, KEGG, GAD, and OMIM. Forty candidate genes that are associated with cerebrovascular disease were selected by human experts and from public databases. Networked cerebrovascular disease gene maps were also developed; these maps describe gene-gene interactions and biological pathways. We identified 1127 genes, related indirectly to cerebrovascular disease but directly to the etiology of cerebrovascular disease. We found that a protein-protein interaction (PPI) network associated with cerebrovascular disease follows the power-law degree distribution that is evident in other biological networks. In addition to in silico data mining, 250K Affymetrix SNP chips were used in the 320 control/disease association study to generate associated markers pertinent to cerebrovascular disease as a genome-wide search. The associated genes and the genes retrieved from the in silico data mining system were compared and analyzed. We developed a well-curated cerebrovascular disease-associated gene network and provided bioinformatic resources to cerebrovascular disease researchers. This cerebrovascular disease network can be used as a framework for systematic genomic research, applicable to other complex diseases. The ongoing database therefore supports medical and genetic research aimed at overcoming cerebrovascular disease.

KBUD: The Korea Brain UniGene Database

  • Jeon, Yeo-Jin;Oh, Jung-Hwa;Yang, Jin-Ok;Kim, Nam-Soon
    • Genomics & Informatics / v.3 no.3 / pp.86-93 / 2005
  • Human brain EST data provide important clues for our understanding of the molecular biology associated with the function of the normal brain and the molecular pathophysiology of brain disorders. To systematically and efficiently study the function and disorders of the human brain, 45,773 human brain ESTs were collected from 27 human brain cDNA libraries, which were constructed from normal brains and brain disorders such as brain tumors, Parkinson's disease (PD), and epilepsy. An analysis of the 45,773 human brain ESTs using our EST analysis pipeline resulted in 38,396 high-quality ESTs and 35,906 ESTs, which were coalesced into 8,246 unique gene clusters, showing significant similarity to known genes in the human RefSeq, human mRNA, and UniGene databases. In addition, among the 8,246 gene clusters, 4,287 genes (52%) were found to contain full-length cDNA clones. To facilitate the extraction of useful information from these collected human brain ESTs, we developed a user-friendly interface system, the Korea Brain UniGene Database (KBUD). The KBUD web interface allows access to our human brain data through three major search modes: BioCarta pathway, keyword, and BLAST searches. Each result viewed in KBUD offers comprehensive information on the analyzed human brain ESTs from our data, as well as data linked to various other public databases. KBUD, the first World Wide Web interface for human brain EST data that includes ESTs from human brain disorders as well as normal brains, will be a helpful system for developing a better understanding of the underlying mechanisms of the normal brain as well as brain disorders. The KBUD system is freely accessible at http://kugi.kribb.re.kr/KU/cgi-bin/brain.pl.

Current Status and Prospect of Wheat Functional Genomics using Next Generation Sequencing (차세대 염기서열분석을 통한 밀 기능유전체 연구의 현황과 전망)

  • Choi, Changhyun;Yoon, Young-Mi;Son, Jae-Han;Cho, Seong-Woo;Kang, Chon-Sik
    • Korean Journal of Breeding Science / v.50 no.4 / pp.364-377 / 2018
  • Hexaploid wheat (common wheat/bread wheat) is one of the most important cereal crops in the world and a model for research on allopolyploid plants with large, highly repetitive genomes. Variation in gene presence/absence plays an important role in the heritability of agronomic traits. However, there have been relatively few studies on gene presence/absence variation in crop species, including common wheat. Recently, a reference genome sequence of common wheat has been fully annotated and published. In addition, advanced next-generation sequencing (NGS) technology provides high-quality genome sequences at continually decreasing cost, thereby ushering in full-scale functional genomic studies in common wheat as well as other crops, in spite of their large and complex genomes. In this review, we provide information about the available tools and methodologies for wheat functional genomics research supported by NGS technology. The use of NGS and functional genomics technology is expected to be a powerful strategy for selecting elite lines from a number of germplasms.