• Title/Summary/Keyword: Biological Sequence Database


DEVELOPMENT OF XML BASED PERSONALIZED DATABASE MANAGEMENT SYSTEM FOR BIOLOGISTS

  • Cho Kyung Hwan;Jung Kwang Su;Kim Sun Shin;Ryu Keun Ho
    • Proceedings of the KSRS Conference / 2005.10a / pp.770-773 / 2005
  • In most biological laboratories, sequences from sequencing machines are stored on disk as simple files. This makes it hard to store and manage the sequence data with consistency and integrity, for example when redundant files accumulate. A system that integrates and manages genome data with consistency and integrity is therefore needed for accurate sequence analysis. In this paper, we not only store gene and protein sequence data produced by sequencing but also manage them. We also design an integrated schema for transforming between file formats and use it to design the database system. Because the integrated schema is designed in BSML, the XSL style language can be applied to it, which allows us to convert among heterogeneous sequence formats.

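
The abstract describes converting among sequence formats by applying XSL to a BSML-based schema. As a rough illustration of that mapping, the sketch below uses Python's standard `xml.etree.ElementTree` in place of an XSL stylesheet to turn the sequences in a simplified BSML-like document into FASTA. The element and attribute names (`Sequence`, `Seq-data`, `title`, `id`) are assumptions modeled loosely on BSML, not the authors' integrated schema, and `bsml_to_fasta` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified BSML-like document. Real BSML files carry many more
# elements and attributes; the names used here are illustrative assumptions.
BSML_EXAMPLE = """<Bsml>
  <Definitions>
    <Sequences>
      <Sequence id="seq1" title="sample_A" molecule="dna" length="12">
        <Seq-data>ATGCGTACGTTA</Seq-data>
      </Sequence>
      <Sequence id="seq2" title="sample_B" molecule="dna" length="8">
        <Seq-data>GGCCAATT</Seq-data>
      </Sequence>
    </Sequences>
  </Definitions>
</Bsml>"""


def bsml_to_fasta(bsml_text: str, width: int = 60) -> str:
    """Convert every <Sequence> element with <Seq-data> into a FASTA record."""
    root = ET.fromstring(bsml_text)
    records = []
    for seq in root.iter("Sequence"):
        data = seq.findtext("Seq-data", default="")
        # Drop whitespace that XML pretty-printing may have introduced.
        residues = "".join(data.split()).upper()
        # Prefer the title as the FASTA header, fall back to the id.
        header = seq.get("title") or seq.get("id", "unnamed")
        lines = [f">{header}"]
        # Wrap the sequence at a fixed width, as FASTA files usually are.
        lines += [residues[i:i + width] for i in range(0, len(residues), width)]
        records.append("\n".join(lines))
    return "\n".join(records) + "\n"


if __name__ == "__main__":
    print(bsml_to_fasta(BSML_EXAMPLE), end="")
```

An XSL stylesheet would express the same element-to-text mapping declaratively; the Python version is used only so the sketch runs with the standard library alone.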

Building an Integrated Protein Data Management System Using the XPath Query Process

  • Cha Hyo Soung;Jung Kwang Su;Jung Young Jin;Ryu Keun Ho
    • Proceedings of the KSRS Conference / 2004.10a / pp.99-102 / 2004
  • With recent advances in bioinformatics techniques, there has been a great deal of research on large amounts of biological data, and a variety of files and databases are used to manage these data efficiently. However, because of a lack of standardization, managing the data and converting among heterogeneous formats pose many problems. We are interested in integrating, saving, and managing gene and protein sequence data generated through sequencing. Accordingly, the goal of this paper is to implement a system that manages sequence data and transforms a sequence file from one format into another. To satisfy these requirements, we adopt BSML (Bioinformatics Sequence Markup Language) as the standard for managing bioinformatics data. We then integrate and store the heterogeneous flat file formats using a DTD based on the BSML schema. We also developed a system that applies the characteristics of an object-oriented database and processes XPath queries, an efficient kind of structural query, so that XML documents can be saved and managed easily.

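
The system in this abstract answers XPath queries over stored XML documents. Python's standard library implements only a small XPath subset, but it is enough to sketch the kind of structural query described: selecting sequences by an attribute and fetching one sequence's data by id. The document layout is a hypothetical BSML-like fragment, and `titles_by_molecule` and `seq_data_by_id` are illustrative names, not part of the authors' system.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical BSML-like store; element and attribute names are assumptions.
STORE = """<Bsml>
  <Definitions>
    <Sequences>
      <Sequence id="g1" molecule="dna" title="geneA"><Seq-data>ATGAAA</Seq-data></Sequence>
      <Sequence id="p1" molecule="aa" title="protA"><Seq-data>MKV</Seq-data></Sequence>
      <Sequence id="g2" molecule="dna" title="geneB"><Seq-data>ATGCCC</Seq-data></Sequence>
    </Sequences>
  </Definitions>
</Bsml>"""


def titles_by_molecule(xml_text: str, molecule: str) -> list[str]:
    """Return the titles of all sequences whose molecule attribute matches."""
    root = ET.fromstring(xml_text)
    # ElementTree's XPath subset supports './/' descendant search and
    # [@attr='value'] predicates. Values containing quotes are not handled.
    path = f".//Sequence[@molecule='{molecule}']"
    return [seq.get("title") for seq in root.findall(path)]


def seq_data_by_id(xml_text: str, seq_id: str) -> str | None:
    """Fetch the raw residues for one sequence id, or None if it is absent."""
    root = ET.fromstring(xml_text)
    node = root.find(f".//Sequence[@id='{seq_id}']/Seq-data")
    return None if node is None else node.text


if __name__ == "__main__":
    print(titles_by_molecule(STORE, "dna"))
    print(seq_data_by_id(STORE, "p1"))
```

Full XPath 1.0 (axes, functions such as `contains()`) would need a dedicated library such as lxml; the standard-library subset is used here only to keep the sketch self-contained.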

Sequence Validation for the Identification of the White-Rot Fungi Bjerkandera in Public Sequence Databases

  • Jung, Paul Eunil;Fong, Jonathan J.;Park, Myung Soo;Oh, Seung-Yoon;Kim, Changmu;Lim, Young Woon
    • Journal of Microbiology and Biotechnology / v.24 no.10 / pp.1301-1307 / 2014
  • White-rot fungi of the genus Bjerkandera are cosmopolitan and have shown potential for industrial application and bioremediation. When distinguishing morphological characters are no longer present (e.g., cultures or dried specimen fragments), characterizing true sequences of Bjerkandera is crucial for accurate identification and application of the species. To build a framework for molecular identification of Bjerkandera, we carefully identified specimens of B. adusta and B. fumosa from Korea based on morphological characters, followed by sequencing the internal transcribed spacer region and 28S nuclear ribosomal large subunit. The phylogenetic analysis of Korean Bjerkandera specimens showed clear genetic differentiation between the two species. Using this phylogeny as a framework, we examined the identification accuracy of sequences available in GenBank. Analyses revealed that many Bjerkandera sequences in the database are either misidentified or unidentified. This study provides robust reference sequences for sequence-based identification of Bjerkandera, and further demonstrates the presence and dangers of incorrect sequences in GenBank.
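
The study validates GenBank species labels against a phylogeny built from carefully identified specimens. A much cruder check, swapped in here purely for illustration and not the authors' method, is to assign each query sequence to its nearest reference by uncorrected p-distance and flag any disagreement with its database label. The helper names and the toy sequences in the usage example are hypothetical; real ITS or 28S data would first need a multiple alignment, and a nearest-neighbour distance does not replace phylogenetic analysis.

```python
from __future__ import annotations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two pre-aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def check_label(query: str, label: str,
                references: dict[str, list[str]]) -> tuple[str, float, bool]:
    """Assign the query to the species of its closest reference sequence.

    Returns (best_species, best_distance, label_agrees).
    """
    best_species, best_dist = None, float("inf")
    for species, seqs in references.items():
        for ref in seqs:
            d = p_distance(query, ref)
            if d < best_dist:
                best_species, best_dist = species, d
    return best_species, best_dist, best_species == label


if __name__ == "__main__":
    # Toy, made-up aligned fragments; not real Bjerkandera sequences.
    refs = {"B. adusta": ["ACGTACGTAC"], "B. fumosa": ["ACGTTCGAAC"]}
    print(check_label("ACGTACGTAA", "B. fumosa", refs))
```

In this toy run the query sits one site away from the `B. adusta` reference and three sites from `B. fumosa`, so a record labelled `B. fumosa` would be flagged for closer inspection.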

PC-Based Hybrid Grid Computing for Huge Biological Data Processing

  • Cho, Wan-Sup;Kim, Tae-Kyung;Na, Jong-Hwa
    • Journal of the Korean Data and Information Science Society / v.17 no.2 / pp.569-579 / 2006
  • Recently, the amount of genome sequence data has been increasing rapidly owing to advanced computational techniques and experimental tools in biology. Sequence comparison is a very useful operation for predicting the functions of genes or proteins. However, comparing long sequences takes a great deal of time, and much research has addressed fast sequence comparison. In this paper, we propose a hybrid grid system, based on the LanLinux system, that improves the performance of sequence comparisons. Compared with conventional approaches, the hybrid grid is easy to construct, maintain, and manage because software does not need to be installed on every node. In a real experiment, we constructed an orthologous database for 89 prokaryotes in just one week on the hybrid grid, a task that requires 33 weeks on a single computer.

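
Building an orthologous database typically involves comparing genomes pair by pair, which is what makes the workload easy to spread across nodes. The sketch below only enumerates the unordered genome pairs and splits them over a hypothetical number of worker nodes in round-robin order; it does not describe how LanLinux or the authors' hybrid grid actually schedules jobs, and the genome names and node count are placeholders. For scale, the reported 1 week versus 33 weeks corresponds to roughly a 33-fold reduction in wall-clock time.

```python
from __future__ import annotations

from itertools import combinations


def pairwise_tasks(genomes: list[str]) -> list[tuple[str, str]]:
    """All unordered genome pairs; each pair is one comparison task."""
    return list(combinations(genomes, 2))


def assign_round_robin(tasks: list, n_nodes: int) -> list[list]:
    """Distribute tasks over n_nodes worker queues in round-robin order."""
    queues: list[list] = [[] for _ in range(n_nodes)]
    for i, task in enumerate(tasks):
        queues[i % n_nodes].append(task)
    return queues


if __name__ == "__main__":
    genomes = [f"genome_{i:02d}" for i in range(89)]  # placeholder names
    tasks = pairwise_tasks(genomes)
    queues = assign_round_robin(tasks, n_nodes=10)  # hypothetical node count
    print(len(tasks))                 # 3916 = 89 * 88 / 2
    print([len(q) for q in queues])   # [392, 392, 392, 392, 392, 392, 391, 391, 391, 391]
```

If each genome pair must be searched in both directions (as in reciprocal best-hit ortholog detection), `itertools.permutations(genomes, 2)` would give the 89 × 88 ordered pairs instead.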

HorseDB; an Integrated Horse Resource and Web Service (Construction of a Horse Database)

  • Kim Dae-Soo;Jo Un-Jong;Huh Jae-Won;Choe Eun-Sang;Cho Byung-Wook;Kim Heui-Soo
    • Journal of Life Science / v.16 no.3 s.76 / pp.472-476 / 2006
  • We have built a database server called HorseDB that contains genome annotation information and biological information for the horse, drawn from public database entries. HorseDB aims to integrate biological information and horse genome data at genome scale using bioinformatic methods. To facilitate the extraction of useful information from the collected horse genome and biological data, we developed a user-friendly interface system, HorseDB; an Integrated Horse Resource and Web Service. The database is organized into general horse information, sequence annotation data, and a web-based analysis program interface. It also gives users easy access to useful information within horse genomes and provides analyzed information such as sequence alignments and gene annotation results. HorseDB can be accessed at http://www.primate.or.kr./horse.

Identification and Phylogenetic Analysis of SINE-R Retroposon Family in cDNA Library of Human Fetal Brain

  • Yi, Joo-Mi;Shin, Kyung-Mi;Lee, Ji-Won;Paik, In-Ho;Jang, Kyung-Lib;Kim, Heui-Soo
    • Animal Cells and Systems / v.5 no.3 / pp.231-236 / 2001
  • SINE-R retroposons have been derived from the human endogenous retrovirus HERV-K family and have been found to be hominoid-specific. Both SINE-R retroposons and the HERV-K family can potentially affect the expression of nearby genes. From a cDNA library of human fetal brain, we identified seven SINE-R retroposons and compared them with sequences from the GenBank database. The SINE-R retroposons from human fetal brain showed 85∼97% sequence similarity with the human-specific retroposon SINE-R.C2. They also showed 88∼96% sequence similarity with the sequence of the schizo-cDNA clone, which was derived from postmortem frontal cortex tissue of a schizophrenic patient. Phylogenetic analysis using the neighbor-joining method revealed that the seven new SINE-R retroposons from the human fetal brain cDNA library have proliferated independently during human evolution. The data indicate that these SINE-R retroposons are expressed in human fetal brain and deserve further investigation as potential leads toward understanding neuropsychiatric diseases.

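
The 85∼97% and 88∼96% figures are pairwise similarity values. The abstract does not say exactly how they were computed; one common convention, assumed here, is percent identity over the columns of a pairwise alignment. The helper below is a hypothetical illustration of that convention and expects sequences that have already been aligned.

```python
from __future__ import annotations


def percent_identity(a: str, b: str) -> float:
    """Percent identity between two aligned sequences of equal length.

    Columns where both sequences have a gap are ignored; a gap opposite a
    residue counts as a mismatch. Comparison is case-insensitive.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = columns = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue  # gap-gap column carries no information
        columns += 1
        if x == y:
            matches += 1
    return 100.0 * matches / columns if columns else 0.0


if __name__ == "__main__":
    print(percent_identity("AAAA", "AAAT"))              # 75.0
    print(round(percent_identity("ACGT-A", "ACGTTA"), 1))  # 83.3
```

Other conventions (for example, dividing by the length of the shorter ungapped sequence) give somewhat different numbers for the same alignment, so reported ranges are only comparable when the same definition is used.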

Implementation of an Information Management System for Nucleotide Sequences based on BSML using Active Trigger Rules (Implementation of a Nucleotide Sequence Information Management System Using BSML-Based Active Trigger Rules)

  • Park Sung Hee;Jung Kwang Su;Ryu Keun Ho
    • Journal of KIISE:Databases / v.32 no.1 / pp.24-42 / 2005
  • Biological data, including genome sequences, are heterogeneous and diverse. Although the need for genome sequence management systems that reflect biological characteristics has been recognized, most current biological databases serve only as repositories for biological data. This paper therefore describes a management system for nucleotide sequences at the level of the biological laboratory. It supports format transformation, editing, storage, and retrieval of nucleotide sequences collected from public databases, and it also handles sequences produced by experiments. It uses the XML-based BSML as a common format in order to extract data fields and convert among heterogeneous sequence formats. To manage sequences and their changes, a version management system for the original DNA is required so that newly changed sequences can be detected and database updates triggered. Our experimental results show that applying active trigger rules to manage sequence changes allows those changes to be stored in the database automatically.
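
The abstract relies on active (event-condition-action) trigger rules to record sequence changes automatically. The authors' trigger engine is not described here, so the sketch below substitutes an ordinary SQLite trigger, created through Python's built-in `sqlite3` module, to show the same idea: when a stored sequence is updated (event) and its residues actually differ (condition), the old version is archived and the version number is incremented (action). All table, column, trigger, and function names are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE sequence (
    seq_id   TEXT PRIMARY KEY,
    residues TEXT NOT NULL,
    version  INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE sequence_history (
    seq_id     TEXT NOT NULL,
    version    INTEGER NOT NULL,
    residues   TEXT NOT NULL,
    changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Event: an UPDATE touching residues. Condition: the residues changed.
-- Action: archive the old version and bump the version counter.
CREATE TRIGGER archive_sequence_change
AFTER UPDATE OF residues ON sequence
FOR EACH ROW WHEN OLD.residues <> NEW.residues
BEGIN
    INSERT INTO sequence_history (seq_id, version, residues)
    VALUES (OLD.seq_id, OLD.version, OLD.residues);
    UPDATE sequence SET version = OLD.version + 1 WHERE seq_id = NEW.seq_id;
END;
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the schema and trigger installed."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_sequence(conn: sqlite3.Connection, seq_id: str, residues: str) -> None:
    with conn:  # commits on success
        conn.execute("INSERT INTO sequence (seq_id, residues) VALUES (?, ?)",
                     (seq_id, residues))


def update_sequence(conn: sqlite3.Connection, seq_id: str, residues: str) -> None:
    with conn:
        conn.execute("UPDATE sequence SET residues = ? WHERE seq_id = ?",
                     (residues, seq_id))


if __name__ == "__main__":
    conn = open_db()
    add_sequence(conn, "clone_1", "ATGCGT")
    update_sequence(conn, "clone_1", "ATGCGA")  # changed: archived, version 2
    update_sequence(conn, "clone_1", "ATGCGA")  # unchanged: condition is false
    print(conn.execute("SELECT version FROM sequence").fetchone())  # (2,)
    print(conn.execute("SELECT seq_id, version, residues FROM sequence_history").fetchall())
```

Because the inner `UPDATE` only sets `version`, it does not match the `UPDATE OF residues` event, so the trigger does not fire again on its own action.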

Construction of EST Database for Comparative Gene Studies of Acanthamoeba

  • Moon, Eun-Kyung;Kim, Joung-Ok;Xuan, Ying-Hua;Yun, Young-Sun;Kang, Se-Won;Lee, Yong-Seok;Ahn, Tae-In;Hong, Yeon-Chul;Chung, Dong-Il;Kong, Hyun-Hee
    • Parasites, Hosts and Diseases / v.47 no.2 / pp.103-107 / 2009
  • The genus Acanthamoeba can cause severe infections such as granulomatous amebic encephalitis and amebic keratitis in humans. However, little genomic information on Acanthamoeba has been reported. Here, we constructed an Acanthamoeba expressed sequence tag (EST) database (Acanthamoeba EST DB) derived from our four Acanthamoeba cDNA libraries. The Acanthamoeba EST DB contains 3,897 ESTs generated from amebae under various conditions (long-term in vitro culture, mouse brain passage, or encystation), together with Acanthamoeba data downloaded from the National Center for Biotechnology Information (NCBI) and the Taxonomically Broad EST Database (TBestDB). Almost all reported cDNA/genomic sequences of Acanthamoeba are provided through a stand-alone BLAST system with nucleotide (BLAST NT) and amino acid (BLAST AA) sequence databases. In the BLAST results, each gene links to relevant information, including sequence data, gene orthology annotations, references, and a BlastX result. This is the first attempt to construct an Acanthamoeba database of genes expressed under diverse conditions. These data were integrated into a database (http://www.amoeba.or.kr).
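
Each gene in the Acanthamoeba EST DB's BLAST results links to further information. How the authors process BLAST output is not stated; a common first step, sketched here under the assumption that searches are run with BLAST+ tabular output (`-outfmt 6`, with its default 12 columns), is to keep the best-scoring hit per query before building such links. The e-value cutoff, the function name, and the sample rows are illustrative assumptions.

```python
from __future__ import annotations

import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

# Made-up example rows in that format; identifiers are placeholders.
SAMPLE = (
    "est_001\tsubj_A\t98.5\t200\t3\t0\t1\t200\t10\t209\t1e-100\t370\n"
    "est_001\tsubj_B\t80.0\t150\t30\t1\t1\t150\t5\t154\t1e-20\t110\n"
    "est_002\tsubj_C\t60.0\t50\t20\t0\t1\t50\t1\t50\t0.5\t30\n"
)


def best_hits(tabular_text: str, max_evalue: float = 1e-5) -> dict[str, dict]:
    """Keep the highest-bitscore hit per query that passes the e-value cutoff."""
    best: dict[str, dict] = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        for key in ("pident", "evalue", "bitscore"):
            hit[key] = float(hit[key])
        if hit["evalue"] > max_evalue:
            continue
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


if __name__ == "__main__":
    for query, hit in best_hits(SAMPLE).items():
        print(query, hit["sseqid"], hit["evalue"])
```

With the default cutoff, `est_002` is dropped because its only hit has an e-value of 0.5, and `est_001` keeps `subj_A`, its higher-bitscore hit.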

GEDA: New Knowledge Base of Gene Expression in Drug Addiction

  • Suh, Young-Ju;Yang, Moon-Hee;Yoon, Suk-Joon;Park, Jong-Hoon
    • BMB Reports / v.39 no.4 / pp.441-447 / 2006
  • Repeated drug abuse can elicit compulsive drug-seeking behavior and ultimately lead to addiction. We developed a procedure for standardizing microarray gene expression data from rat brain in drug addiction and stored the data in a single integrated database system, with a focus on more effective data processing and interpretation. The database is also systematically flexible for statistical analysis and for linking with other databases. As the foundation of the database, we adopt an intelligent SQL querying system to set up an interactive module that automatically reads raw gene expression data in the standardized format. To maximize usability, the database helps users study significant gene expression and identify the biological functions of genes through integrated, up-to-date gene information such as GO annotations and metabolic pathways. To retrieve the latest information on selected genes, we also set up a local BLAST search engine and a non-redundant sequence database that is updated from the NCBI server daily. We find that the database is a useful query interface and data-mining tool, specifically for identifying genes related to drug addiction. We apply this system to identifying and characterizing the behavior of methamphetamine-induced genes in the rat brain.
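
GEDA keeps a local non-redundant sequence database for BLAST searches. One simple reading of "non-redundant", assumed here and similar in spirit to how NCBI's nr database merges identical sequences, is that entries with identical sequences are collapsed into a single record that carries all of their identifiers. The helper below is a hypothetical sketch of that collapsing step only; it does not download or refresh anything from NCBI.

```python
from __future__ import annotations


def make_nonredundant(records: list[tuple[str, str]]) -> list[tuple[list[str], str]]:
    """Collapse records with identical sequences into one entry.

    records: (identifier, sequence) pairs.
    Returns (identifiers, sequence) pairs in first-seen order; identifiers
    that shared a sequence are kept together in one list.
    """
    merged: dict[str, list[str]] = {}
    for ident, seq in records:
        key = "".join(seq.split()).upper()  # normalise whitespace and case
        merged.setdefault(key, []).append(ident)
    # dicts preserve insertion order, so output follows first occurrence.
    return [(ids, seq) for seq, ids in merged.items()]


if __name__ == "__main__":
    recs = [("a", "MKV"), ("b", "mkv"), ("c", "MKL")]  # toy records
    print(make_nonredundant(recs))  # [(['a', 'b'], 'MKV'), (['c'], 'MKL')]
```

Keeping every identifier with the merged sequence means a BLAST hit against the collapsed entry can still be traced back to all original database records.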

Construction of PANM Database (Protostome DB) for rapid annotation of NGS data in Mollusks

  • Kang, Se Won;Park, So Young;Patnaik, Bharat Bhusan;Hwang, Hee Ju;Kim, Changmu;Kim, Soonok;Lee, Jun Sang;Han, Yeon Soo;Lee, Yong Seok
    • The Korean Journal of Malacology / v.31 no.3 / pp.243-247 / 2015
  • An existing stand-alone BLAST server provides a convenient platform for analyzing molluscan sequence information, especially EST sequences generated by traditional sequencing methods. However, the server has limitations in annotating molluscan sequences generated on next-generation sequencing (NGS) platforms, owing to inconsistencies in the molluscan sequences available at NCBI. We constructed a web-based interface for a new stand-alone BLAST database, called PANM-DB (Protostome DB), for the analysis of molluscan NGS data. PANM-DB includes amino acid sequences from the protostome groups Arthropoda, Nematoda, and Mollusca, downloaded from GenBank using the NCBI Taxonomy Browser. The sequences were converted into multi-FASTA format and stored in the database using NCBI's formatdb program. PANM-DB contains 6% of the sequences in the NCBInr database (as of 24-06-2015), and for an input of 10,000 RNA-seq sequences, processing was 15 times faster with PANM-DB than with NCBInr DB. PANM-DB also showed twice as many significant hits, with diverse annotation profiles, as Mollusks DB. Hence, the construction of PANM-DB is a significant step in the annotation of molluscan sequence information obtained from NGS platforms. PANM-DB is freely downloadable from the web-based interface (Malacological Society of Korea, http://malacol.or.kr/blast) as a compressed file and can run on any compatible operating system.
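
PANM-DB is built by downloading protostome protein sequences from GenBank, writing them as multi-FASTA, and indexing the result with `formatdb`. The sketch below covers only the middle step: filtering already-downloaded records to the three phyla named in the abstract and emitting wrapped multi-FASTA that a database-formatting tool could consume. The record layout (a dict with hypothetical `id`, `phylum`, and `seq` keys) is an assumption, not the actual export format of the NCBI Taxonomy Browser.

```python
from __future__ import annotations

# Phylum-level protostome groups named in the abstract.
PROTOSTOME_GROUPS = {"Arthropoda", "Nematoda", "Mollusca"}


def to_multifasta(records: list[dict], width: int = 60) -> str:
    """Write records whose 'phylum' is a protostome group as multi-FASTA.

    Each record is a dict with hypothetical keys 'id', 'phylum', and 'seq'.
    Records from other phyla are skipped; an empty string means no records
    passed the filter.
    """
    lines: list[str] = []
    for rec in records:
        if rec["phylum"] not in PROTOSTOME_GROUPS:
            continue
        seq = "".join(rec["seq"].split())  # strip any embedded whitespace
        lines.append(f">{rec['id']}")
        # Wrap residues at a fixed line width.
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    recs = [  # toy records, not real GenBank entries
        {"id": "sp1", "phylum": "Mollusca", "seq": "MKT" * 30},
        {"id": "sp2", "phylum": "Chordata", "seq": "MAA"},
    ]
    print(to_multifasta(recs), end="")
```

In current BLAST+ releases, `makeblastdb` has replaced the legacy `formatdb` program for indexing such a FASTA file into a searchable database.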