• Title/Abstract/Keywords: RefSeq

8 search results (processing time: 0.016 s)

Patome: Database of Patented Bio-sequences

  • Kim, SeonKyu;Lee, ByungWook
    • Genomics & Informatics
    • /
    • Vol. 3, No. 3
    • /
    • pp.94-97
    • /
    • 2005
  • We have built a database server called Patome, which contains annotation information for patented bio-sequences from the Korean Intellectual Property Office (KIPO). The aims of Patome are to annotate Korean patent bio-sequences and to provide information on the patent relationships of public database entries. The patent sequences were annotated with the Reference Sequence (RefSeq) database or NCBI's nr database. Both the raw patent data and the annotated data were stored in the database. The annotation information can be used to determine whether a particular RefSeq ID or NCBI nr ID is related to a Korean patent. The Patome infrastructure consists of three components: the database itself, a sequence data loader, and an online database query interface. The database can be queried by submission number, organism, title, applicant name, or accession number. Patome can be accessed at http://www.patome.net. The information will be updated every two months.
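The five query fields named in the abstract can be illustrated with a minimal sketch. The table name, column names, and demo rows below are hypothetical placeholders and do not reflect the actual Patome schema or data; SQLite is used only because it ships with Python.

```python
import sqlite3

# Query fields listed in the abstract; the column names are assumptions.
ALLOWED_FIELDS = {
    "submission_number", "organism", "title",
    "applicant_name", "accession_number",
}

def build_demo_db():
    """Create an in-memory table with made-up placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE patent_sequence ("
        "submission_number TEXT, organism TEXT, title TEXT, "
        "applicant_name TEXT, accession_number TEXT)"
    )
    conn.executemany(
        "INSERT INTO patent_sequence VALUES (?, ?, ?, ?, ?)",
        [
            ("SUB-0001", "Homo sapiens", "Example title A",
             "Example Applicant", "ACC-EXAMPLE-1"),
            ("SUB-0002", "Mus musculus", "Example title B",
             "Example Applicant", "ACC-EXAMPLE-2"),
        ],
    )
    return conn

def query_patents(conn, field, value):
    """Return (submission_number, accession_number) rows matching one field."""
    if field not in ALLOWED_FIELDS:
        raise ValueError(f"unsupported query field: {field}")
    # The field name comes from a fixed whitelist; the value is bound safely.
    sql = (
        "SELECT submission_number, accession_number FROM patent_sequence "
        f"WHERE {field} = ? ORDER BY submission_number"
    )
    return conn.execute(sql, (value,)).fetchall()
```

Restricting `field` to a whitelist keeps the column name out of user control, while the search value itself is passed as a bound parameter.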

Comparative analysis of HiSeq3000 and BGISEQ-500 sequencing platform with shotgun metagenomic sequencing data

  • Animesh Kumar;Espen M. Robertsen;Nils P. Willassen;Juan Fu;Erik Hjerde
    • Genomics & Informatics
    • /
    • Vol. 21, No. 4
    • /
    • pp.49.1-49.11
    • /
    • 2023
  • Recent advances in sequencing technologies have made it possible to generate metagenomic sequences on different sequencing platforms. In this study, we analyzed and compared shotgun metagenomic sequences generated by the HiSeq3000 and BGISEQ-500 platforms from 12 sediment samples collected across the Norwegian coast. Metagenomic DNA sequences were normalized to an equal number of bases for both platforms and further evaluated using different taxonomic classifiers, reference databases, and assemblers. Normalized BGISEQ-500 sequences retained more reads and base counts after preprocessing, while a slightly higher fraction of HiSeq3000 sequences were taxonomically classified. Kaiju classified a higher percentage of reads than Kraken2 for both platforms, and a comparison of reference databases for taxonomic classification showed that the MAR database outperformed RefSeq. Assembly using MEGAHIT produced longer assemblies and higher total contig counts than metaSPAdes in the majority of HiSeq3000 samples, but the assembly statistics notably improved with unprocessed or normalized reads. Our results indicate that both platforms perform comparably in terms of the percentage of taxonomically classified reads and assembled contig statistics for metagenomic samples. This study provides valuable insights for researchers selecting an appropriate sequencing platform and bioinformatics pipeline for their metagenomic studies.
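The abstract does not say how reads were normalized to an equal number of bases. One plausible approach, sketched here purely as an assumption, is to subsample each platform's reads at random down to the smaller of the two base totals. The function names are hypothetical.

```python
import random

def subsample_to_bases(reads, target_bases, seed=0):
    """Randomly keep reads whose combined length does not exceed target_bases."""
    rng = random.Random(seed)
    shuffled = list(reads)
    rng.shuffle(shuffled)
    kept, total = [], 0
    for read in shuffled:
        if total + len(read) > target_bases:
            continue  # this read would overshoot; a shorter one may still fit
        kept.append(read)
        total += len(read)
    return kept

def normalize_pair(reads_a, reads_b, seed=0):
    """Subsample two read sets to the smaller of their total base counts."""
    target = min(sum(map(len, reads_a)), sum(map(len, reads_b)))
    return (
        subsample_to_bases(reads_a, target, seed),
        subsample_to_bases(reads_b, target, seed),
    )
```

The smaller data set is returned unchanged in content, because every one of its reads fits within its own base total; only the larger set loses reads.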

Functional Annotation and Analysis of Korean Patented Biological Sequences Using Bioinformatics

  • Lee, Byung Wook;Kim, Tae Hyung;Kim, Seon Kyu;Kim, Sang Soo;Ryu, Gee Chan;Bhak, Jong
    • Molecules and Cells
    • /
    • Vol. 21, No. 2
    • /
    • pp.269-275
    • /
    • 2006
  • A recent report of the Korean Intellectual Property Office (KIPO) showed that the number of biological sequence-based patents is rapidly increasing in Korea. We present biological features of Korean patented sequences through bioinformatic analysis. The analysis is divided into two steps. The first is an annotation step in which the patented sequences were annotated with the Reference Sequence (RefSeq) database. The second is an association step in which the patented sequences were linked to genes, diseases, pathways, and biological functions. We used the Entrez Gene, Online Mendelian Inheritance in Man (OMIM), Kyoto Encyclopedia of Genes and Genomes (KEGG), and Gene Ontology (GO) databases. Through the association analysis, we found that nearly 2.6% of human genes were associated with Korean patents, compared to 20% of human genes in U.S. patents. The association between the biological functions and the patented sequences indicated that genes whose products act as hormones on defense responses in extracellular environments were the most highly targeted for patenting. The analysis data are available at http://www.patome.net.

Compiling Multicopy Single-Stranded DNA Sequences from Bacterial Genome Sequences

  • Yoo, Wonseok;Lim, Dongbin;Kim, Sangsoo
    • Genomics & Informatics
    • /
    • Vol. 14, No. 1
    • /
    • pp.29-33
    • /
    • 2016
  • A retron is a bacterial retroelement that encodes an RNA gene and a reverse transcriptase (RT). The former, once transcribed, works as a template primer for reverse transcription by the latter. The resulting DNA is covalently linked to the upstream part of the RNA; this chimera is called multicopy single-stranded DNA (msDNA), which is extrachromosomal DNA found in many bacterial species. Based on the conserved features in the eight known msDNA sequences, we developed a detection method and applied it to scan National Center for Biotechnology Information (NCBI) RefSeq bacterial genome sequences. Among 16,844 bacterial sequences possessing a retron-type RT domain, we identified 48 unique types of msDNA. Currently, the biological role of msDNA is not well understood. Our work will be a useful tool in studying the distribution, evolution, and physiological role of msDNA.

KUGI: A Database and Search System for Korean Unigene and Pathway Information

  • Yang, Jin-Ok;Hahn, Yoon-Soo;Kim, Nam-Soon;Yu, Ung-Sik;Woo, Hyun-Goo;Chu, In-Sun;Kim, Yong-Sung;Yoo, Hyang-Sook;Kim, Sang-Soo
    • Korean Society for Bioinformatics: Conference Proceedings
    • /
    • Korean Society for Bioinformatics and Systems Biology, 2005, BIOINFO 2005
    • /
    • pp.407-411
    • /
    • 2005
  • The KUGI (Korean UniGene Information) database contains annotation information for cDNA sequences obtained from samples of diseases prevalent in Koreans. A total of about 157,000 high-throughput 5'-EST sequences, collected from cDNA libraries of stomach, liver, and some cancer tissues or established cell lines from Korean patients, were clustered into about 35,000 contigs. From each cluster, a representative clone having the longest high-quality sequence or the start codon was selected. We stored the sequences of the representative clones and the clustered contigs in the KUGI database, together with the information obtained by running BLAST against the RefSeq, human mRNA, and UniGene databases from NCBI. We provide a web-based search engine for the KUGI database with two types of user interface: attribute-based search and sequence similarity search. For attribute-based search we use DBMS technology, while for similarity search we use BLAST, which supports various similarity search options. The search system allows not only multiple queries but also various query types. The results include: 1) information on clones and libraries, 2) accession keys, genome location, gene ontology, and pathways linked to public databases, 3) links to external programs, and 4) sequence information for contigs and the 5'-ends of clones. We believe that the KUGI database and search system may provide very useful information for studies aimed at elucidating the causes of the diseases that are prevalent in Koreans.
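The representative-clone step can be sketched as follows. The abstract does not explain how the two criteria (longest high-quality sequence, presence of the start codon) are combined, so this sketch uses only the high-quality length and ignores the start codon. The record keys `cluster`, `clone_id`, and `hq_length` are hypothetical.

```python
def pick_representatives(clones):
    """Map each cluster to the clone with the longest high-quality sequence.

    Each clone is a dict with the hypothetical keys 'cluster', 'clone_id',
    and 'hq_length'. Ties keep the clone seen first.
    """
    best = {}
    for clone in clones:
        current = best.get(clone["cluster"])
        if current is None or clone["hq_length"] > current["hq_length"]:
            best[clone["cluster"]] = clone
    return {cluster: clone["clone_id"] for cluster, clone in best.items()}
```

A start-codon preference could be layered on top by comparing a tuple such as `(has_start_codon, hq_length)`, but that ordering would be a further assumption.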

StrokeBase: A Database of Cerebrovascular Disease-related Candidate Genes

  • Kim, Young-Uk;Kim, Il-Hyun;Bang, Ok-Sun;Kim, Young-Joo
    • Genomics & Informatics
    • /
    • Vol. 6, No. 3
    • /
    • pp.153-156
    • /
    • 2008
  • Complex diseases such as stroke and cancer involve two or more genetic loci and are affected by environmental factors that contribute to the diseases. Due to the complex characteristics of these diseases, identifying candidate genes requires a system-level analysis of gene ontology, pathways, and interactions. A database and user interface, termed StrokeBase, was developed; StrokeBase provides queries that search for pathways, candidate genes, candidate SNPs, and gene networks. The database was developed by in silico data mining of HGNC, ENSEMBL, STRING, RefSeq, UCSC, GO, HPRD, KEGG, GAD, and OMIM. Forty candidate genes associated with cerebrovascular disease were selected using human experts and public databases. Networked cerebrovascular disease gene maps were also developed; these maps describe gene-gene interactions and biological pathways. We identified 1127 genes, related indirectly to cerebrovascular disease but directly to the etiology of cerebrovascular disease. We found that a protein-protein interaction (PPI) network associated with cerebrovascular disease follows the power-law degree distribution that is evident in other biological networks. In addition to in silico data mining, 250K Affymetrix SNP chips were used in the 320 control/disease association study to generate associated markers pertinent to cerebrovascular disease as a genome-wide search. The associated genes and the genes retrieved from the in silico data mining system were compared and analyzed. We developed a well-curated cerebrovascular disease-associated gene network and provided bioinformatic resources to cerebrovascular disease researchers. This cerebrovascular disease network can be used as a framework for systematic genomic research that is applicable to other complex diseases. Therefore, the ongoing database efficiently supports medical and genetic research aimed at overcoming cerebrovascular disease.
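A power-law degree distribution means that the number of nodes with degree k falls off roughly as k raised to a negative exponent, which appears as a straight line on a log-log plot. The sketch below is an illustration and not the authors' method: it tabulates the degree distribution of an undirected edge list and estimates the log-log slope by ordinary least squares, which is only a rough check rather than a rigorous power-law fit. Function names are hypothetical.

```python
import math
from collections import Counter

def degree_distribution(edges):
    """Return a Counter mapping degree k to the number of nodes with degree k."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return Counter(degree.values())

def loglog_slope(dist):
    """Least-squares slope of log(count) against log(degree).

    Needs at least two distinct degrees; a clearly negative slope is
    consistent with, but does not prove, power-law behavior.
    """
    xs = [math.log(k) for k in dist]
    ys = [math.log(n) for n in dist.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

For a real interaction network, the edge list would come from a resource such as STRING or HPRD, both of which the abstract names as data sources.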

KBUD: The Korea Brain UniGene Database

  • Jeon, Yeo-Jin;Oh, Jung-Hwa;Yang, Jin-Ok;Kim, Nam-Soon
    • Genomics & Informatics
    • /
    • Vol. 3, No. 3
    • /
    • pp.86-93
    • /
    • 2005
  • Human brain EST data provide important clues for understanding the molecular biology associated with the function of the normal brain and the molecular pathophysiology of brain disorders. To systematically and efficiently study the function and disorders of the human brain, 45,773 human brain ESTs were collected from 27 human brain cDNA libraries, which were constructed from normal brains and from brain disorders such as brain tumors, Parkinson's disease (PD), and epilepsy. An analysis of the 45,773 human brain ESTs using our EST analysis pipeline resulted in 38,396 high-quality ESTs and 35,906 ESTs, which were coalesced into 8,246 unique gene clusters, showing significant similarity to known genes in the human RefSeq, human mRNA, and UniGene databases. In addition, among the 8,246 gene clusters, 4,287 genes (52%) were found to contain full-length cDNA clones. To facilitate the extraction of useful information from these collected human brain ESTs, we developed a user-friendly interface system, the Korea Brain UniGene Database (KBUD). The KBUD web interface allows access to our human brain data through three major search modes: BioCarta pathway, keyword, and BLAST searches. Each result viewed in KBUD offers comprehensive information on the analyzed human brain ESTs from our data as well as data linked to various other public databases. KBUD, the first World Wide Web interface for human brain EST data that includes ESTs from human brain disorders as well as normal brains, will be a helpful system for developing a better understanding of the underlying mechanisms of the normal brain as well as brain disorders. The KBUD system is freely accessible at http://kugi.kribb.re.kr/KU/cgi-bin/brain.pl.

Current Status and Prospect of Wheat Functional Genomics using Next Generation Sequencing

  • 최창현;윤영미;손재한;조성우;강천식
    • Korean Journal of Breeding Science
    • /
    • Vol. 50, No. 4
    • /
    • pp.364-377
    • /
    • 2018
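  • The application of next-generation sequencing (NGS) technology is rapidly expanding knowledge of plant genomics and thereby advancing functional gene research. In particular, progress in wheat functional genomics seemed out of reach with conventional sequencing technologies. However, advances in NGS have made possible not only the completion of a high-quality RefSeq for common wheat but also the resequencing of diverse wheat lines. With the high-quality genetic information obtained in this way, and with genetic resources whose genetic polymorphisms have been characterized, wheat functional genomics research is now entering a new stage. Advances in NGS technology and reverse genetics will enable analysis of the genetic diversity of wild and cultivated wheat lines spread across the world and will greatly help deepen our understanding of wheat genetics and evolution. The combination of NGS technology and bioinformatics will accelerate wheat functional genomics research, which has lagged behind that of other crops. An era of wheat breeding that draws on functional genomics research is approaching, as it has in the Arabidopsis and rice fields.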