• Title/Summary/Keyword: Genome Database

Higher Order Knowledge Processing: Pathway Database and Ontologies

  • Fukuda, Ken Ichiro
    • Genomics & Informatics / v.3 no.2 / pp.47-51 / 2005
  • Molecular mechanisms of biological processes are typically represented as 'pathways' that have a graph-analogical network structure. However, because pathways cover such a wide range of topics, their constituent biological entities are highly diverse and their semantics are embedded only implicitly. The kinds of interactions that connect biological entities are likewise diverse. Consequently, how to model or process pathway data is not a trivial issue. In this review article, we give an overview of the challenges in pathway database development, taking the INOH project as an example.

WebChemDB: An Integrated Chemical Database Retrieval System

  • Hou, Bo-Kyeng;Moon, Eun-Joung;Moon, Sung-Chul;Kim, Hae-Jin
    • Genomics & Informatics / v.7 no.4 / pp.212-216 / 2009
  • WebChemDB is an integrated chemical database retrieval system that provides access to over 8 million publicly available chemical structures, including related information on their biological activities and direct links to other public chemical resources, such as PubChem, ChEBI, and DrugBank. The data are publicly available over the web, using two-dimensional (2D) and three-dimensional (3D) structure retrieval systems with various filters and molecular descriptors. The web services API also provides researchers with functionalities to programmatically manipulate, search, and analyze the data.

Genomic and Proteomic Databases: Foundations, Current Status and Future Applications

  • Navathe, Shamkant B.;Patil, Upen;Guan, Wei
    • Journal of Computing Science and Engineering / v.1 no.1 / pp.1-30 / 2007
  • In this paper, we provide an extensive survey of the databases and other resources relevant to current bioinformatics research, and of the issues that confront database researchers in supporting biologists. We first give an overview of the concepts and principles fundamental to understanding the data captured in these databases. We briefly trace the evolution of biological advances and point out the importance of capturing data about genes, the fundamental building blocks that encode the characteristics of life, and proteins, the essential ingredients for sustaining life. The study of genes and proteins has become extremely important and is known as genomics and proteomics, respectively. Although there are numerous databases related to various subfields of biology, we focus on genomic and proteomic databases, which are crucial stepping stones for other fields and are expected to play an important role in future applications of biology and medicine. A detailed listing of these databases, with information about their sizes, formats, and current status, is presented. Related databases, such as molecular pathway and interconnection network databases, are mentioned, but their full coverage would be beyond the scope of a single paper. We comment on the peculiar nature of biological data, which presents special problems in organizing and accessing these databases. We also discuss the capabilities needed for database development and information management in the bioinformatics arena, with particular attention to ontology development. We summarize two case studies from our own research: the development of a new genome database called Mitomap and the creation of a framework for discovering relationships among genes from the biomedical literature. The paper concludes with an overview of the applications in medicine and healthcare that these databases will drive. A glossary of important terms is provided at the end of the paper.

Calibrating Thresholds to Improve the Detection Accuracy of Putative Transcription Factor Binding Sites

  • Kim, Young-Jin;Ryu, Gil-Mi;Park, Chan;Kim, Kyu-Won;Oh, Berm-Seok;Kim, Young-Youl;Gu, Man-Bok
    • Genomics & Informatics / v.5 no.4 / pp.143-151 / 2007
  • To understand the mechanism of transcriptional regulation, it is essential to detect promoters and regulatory elements. Various methods have been introduced to improve the prediction accuracy of regulatory elements. Because few regulatory elements have been experimentally validated, previous studies have used criteria based solely on the level of scores over background sequences. However, selecting the detection criteria for different prediction methods is not feasible. Here, we studied the calibration of thresholds to improve regulatory element prediction. We predicted regulatory elements using MATCH, a powerful tool for transcription factor binding site (TFBS) detection. To increase the prediction accuracy, we used a regulatory potential (RP) score, which measures the similarity of patterns in alignments to those in known regulatory regions. Next, we calibrated the thresholds to find relevant scores, increasing the true positives while decreasing possible false positives. By applying various thresholds, we compared predicted regulatory elements with validated regulatory elements from the Open Regulatory Annotation (ORegAnno) database. The regulators predicted at the selected threshold were validated through enrichment analysis of muscle-specific gene sets from the Tissue-Specific Transcripts and Genes (T-STAG) database. We found 14 known muscle-specific regulators at a false discovery rate (FDR) below 5% in a single TFBS analysis, as well as known transcription factor combinations in our combinatorial TFBS analysis.
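
The threshold sweep described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the site identifiers, the scores, and the selection rule (maximize validated hits minus unvalidated hits, preferring the stricter threshold on ties) are assumptions made purely for illustration, and a prediction absent from the validated set is only a possible false positive.

```python
from typing import Dict, Iterable, List, Set, Tuple


def sweep_thresholds(
    scores: Dict[str, float],
    validated: Set[str],
    thresholds: Iterable[float],
) -> List[Tuple[float, int, int]]:
    """Return (threshold, validated hits, unvalidated hits) for each threshold."""
    rows = []
    for t in thresholds:
        # A site is "predicted" when its score reaches the threshold.
        predicted = {site for site, score in scores.items() if score >= t}
        hits = len(predicted & validated)    # predicted and validated
        misses = len(predicted - validated)  # predicted but not validated
        rows.append((t, hits, misses))
    return rows


def pick_threshold(rows: List[Tuple[float, int, int]]) -> float:
    """Hypothetical rule: maximize hits - misses; ties go to the stricter threshold."""
    return max(rows, key=lambda r: (r[1] - r[2], r[0]))[0]
```

Calling `sweep_thresholds` on a dictionary of site scores and a set of validated site identifiers gives one row per candidate threshold, and `pick_threshold` then selects among them. A real analysis would replace the toy selection rule with a criterion suited to the validation data.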

Theoretical Peptide Mass Distribution in the Non-Redundant Protein Database of the NCBI

  • Lim, Da-Jeong;Oh, Hee-Seok;Kim, Hee-Bal
    • Genomics & Informatics / v.4 no.2 / pp.65-70 / 2006
  • Peptide mass mapping is the matching of experimentally generated peptide masses with the predicted masses of digested proteins contained in a database. The peptide mass fingerprinting technique identifies proteins by matching their constituent fragment masses to the theoretical peptide masses generated from a protein database. Thus, it is important to know the theoretical mass distribution of the database. However, few studies have reported the peptide mass distribution of a database. We analyzed the peptide mass distribution of the NCBI non-redundant protein sequence database after digestion with 15 different types of enzymes. To characterize the peptide mass distribution for the different digestion enzymes, a power law distribution (Zipf's law) was applied to the distribution. After simulated digestion of the protein database, a rank-frequency plot of peptide fragments was used to fit a Zipf's law curve for all enzymes. As a result, our data appear to fit Zipf's law with statistically significant parameter values.
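
The workflow in this abstract (simulated digestion, a rank-frequency table, a power-law fit) can be sketched in Python as follows. This is a rough illustration rather than the authors' method. It assumes a single trypsin-like rule (cleave after K or R unless followed by P, with no missed cleavages) in place of the 15 enzymes, counts peptides in 1 Da mass bins, and fits the curve by ordinary least squares on log-log values. The function names are hypothetical.

```python
import math
from collections import Counter

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide


def digest_trypsin(seq):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        nxt = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass; raises KeyError on non-standard residues."""
    return sum(MONO[aa] for aa in peptide) + WATER


def mass_bin_counts(proteins):
    """Count peptides per 1 Da mass bin (the bin width is an assumption)."""
    counts = Counter()
    for seq in proteins:
        for pep in digest_trypsin(seq):
            counts[round(peptide_mass(pep))] += 1
    return counts


def zipf_fit(freqs):
    """Least-squares fit of log(frequency) = intercept + slope * log(rank)."""
    ordered = sorted(freqs, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ordered) + 1)]
    ys = [math.log(f) for f in ordered]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx
```

Passing `mass_bin_counts(proteins).values()` to `zipf_fit` gives the fitted exponent (the slope) for a collection of sequences. A slope near -1 corresponds to the classic form of Zipf's law. Assessing statistical significance, as the abstract reports, would need additional analysis not shown here.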

REPEATOME: A Database for Repeat Element Comparative Analysis in Human and Chimpanzee

  • Woo, Tae-Ha;Hong, Tae-Hui;Kim, Sang-Soo;Chung, Won-Hyong;Kang, Hyo-Jin;Kim, Chang-Bae;Seo, Jung-Min
    • Genomics & Informatics / v.5 no.4 / pp.179-187 / 2007
  • An increasing number of primate genomes are being sequenced. A direct comparison of repeat elements in human genes and their corresponding chimpanzee orthologs will not only give information on their evolution, but also shed light on the major evolutionary events that shaped our species. We have developed REPEATOME to enable visualization and subsequent comparisons of human and chimpanzee repeat elements. REPEATOME (http://www.repeatome.org/) provides easy access to a complete repeat element map of the human genome, as well as repeat element-associated information. It provides a convenient and effective way to access the repeat elements within or spanning the functional regions in human and chimpanzee genome sequences. REPEATOME includes information to compare repeat elements and gene structures of human genes and their counterparts in chimpanzee. This database can be accessed using comparative search options such as intersection, union, and difference to find lineage-specific or common repeat elements. REPEATOME allows researchers to perform visualization and comparative analysis of repeat elements in human and chimpanzee.
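
The comparative search options named in this abstract (intersection, union, and difference) correspond directly to set operations. The Python sketch below shows the idea on invented data. It does not reflect REPEATOME's actual schema or query interface, and the gene labels and the (gene, repeat family) keying are assumptions for illustration only.

```python
def compare_repeats(human, chimp):
    """Classify repeat elements, keyed here as (gene, repeat family) pairs."""
    return {
        "common": human & chimp,          # intersection: present in both
        "human_specific": human - chimp,  # difference: human lineage only
        "chimp_specific": chimp - human,  # difference: chimpanzee lineage only
        "all": human | chimp,             # union: present in either
    }


# Hypothetical example data; the gene labels are placeholders, not real annotations.
human_repeats = {("GENE_A", "AluY"), ("GENE_A", "L1"), ("GENE_B", "AluY")}
chimp_repeats = {("GENE_A", "L1"), ("GENE_B", "AluY"), ("GENE_B", "L1")}

result = compare_repeats(human_repeats, chimp_repeats)
```

Here `result["human_specific"]` contains only `("GENE_A", "AluY")`, the one element absent from the chimpanzee set.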

Trend and Technology of Gene and Genome Research (유전자 및 유전체 연구 기술과 동향)

  • 이진성;김기환;서동상;강석우;황재삼
    • Journal of Sericultural and Entomological Science / v.42 no.2 / pp.126-141 / 2000
  • A major step toward understanding the genetic basis of an organism is determining the complete sequence of all genes in the target genome. The nucleotide sequence encoded in the genome contains the information that specifies the amino acid sequence of every protein and functional RNA molecule. In principle, it will be possible to identify every protein responsible for the structure and function of the body of the target organism. The pattern of expression in different cell types will specify where and when each protein is used. The amino acid sequence of the protein encoded by each gene will be derived from the conceptual translation of the nucleotide sequence. Comparison of these sequences with those of known proteins, whose sequences are stored in databases, will suggest an approximate function for many proteins. This mini-review describes the development of new sequencing methods and the optimization of sequencing strategies for whole-genome, various cDNA, and genomic analyses.


Meta- and Gene Set Analysis of Stomach Cancer Gene Expression Data

  • Kim, Seon-Young;Kim, Jeong-Hwan;Lee, Heun-Sik;Noh, Seung-Moo;Song, Kyu-Sang;Cho, June-Sik;Jeong, Hyun-Yong;Kim, Woo Ho;Yeom, Young-Il;Kim, Nam-Soon;Kim, Sangsoo;Yoo, Hyang-Sook;Kim, Yong Sung
    • Molecules and Cells / v.24 no.2 / pp.200-209 / 2007
  • We generated gene expression data from the tissues of 50 gastric cancer patients, and applied meta-analysis and gene set analysis to these data and three other stomach cancer gene expression data sets to define the gene expression changes in gastric tumors. By meta-analysis we identified genes consistently changed in gastric carcinomas, while gene set analysis revealed consistently changed biological themes. Genes and gene sets involved in digestion, fatty acid metabolism, and ion transport were consistently down-regulated in gastric carcinomas, while those involved in cellular proliferation, cell cycle, and DNA replication were consistently up-regulated. We also found significant differences between the genes and gene sets expressed in diffuse and intestinal type gastric carcinoma. By gene set analysis of cytogenetic bands, we identified many chromosomal regions with possible gross chromosomal changes (amplifications or deletions). Similar analysis of transcription factor binding sites (TFBSs) revealed transcription factors that may have caused the observed gene expression changes in gastric carcinomas, and we confirmed the overexpression of one of these, E2F1, in many gastric carcinomas by tissue array and immunohistochemistry. We have incorporated the results of our meta- and gene set analyses into a web-accessible database (http://human-genome.kribb.re.kr/stomach/).

WinBioDBs: A Windows-based Integrated Program for Manipulating Major Biological Databases

  • Nam, Hye-Weon;Lee, Jin-Ho;Park, Kie-Jung
    • Genomics & Informatics / v.7 no.3 / pp.175-177 / 2009
  • We have developed WinBioDBs, a program with Windows interfaces that includes import modules and search interfaces for 10 major public databases: GenBank, PIR, SwissProt, Pathway, EPD, ENZYME, REBASE, Prosite, Blocks, and Pfam. User databases can be constructed from the results of search queries, and their entries can be edited. The program is a stand-alone database search program for Windows PCs. Database updates are supported by importing and indexing raw database files after downloading them. Users can adjust their own search environments and report formats and construct their own projects consisting of a combination of local databases. WinBioDBs is implemented in VC++, and its database is based on MySQL.

BioSubroutine: an Open Web Server for Bioinformatics Algorithms and Subroutines

  • Lee, Joowon;Kim, Hana;Lee, Wonhye;Chung, Dongil;Bhak, Jong
    • Genomics & Informatics / v.3 no.1 / pp.35-38 / 2005
  • We present BioSubroutine, an open depository server that automatically categorizes various subroutines frequently used in bioinformatics research. We processed a large bioinformatics subroutine library called Bio.pl that was the first Bioperl subroutine library built in 1995. Over 1000 subroutines were processed automatically and an HTML interface has been created. BioSubroutine can accept new subroutines and algorithms from any such subroutine library, as well as provide interactive user forms. The subroutines are stored in an SQL database for quick searching and accessing. BioSubroutine is an open access project under the BioLicense license scheme.
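
The abstract says the subroutines are stored in an SQL database for quick searching. One way to picture that is the small Python sketch below, which uses the standard `sqlite3` module. The table layout, column names, and sample rows are hypothetical, since the abstract does not describe BioSubroutine's actual schema or database engine.

```python
import sqlite3


def build_index(rows):
    """Load (name, category, source) rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per subroutine.
    conn.execute(
        "CREATE TABLE subroutine (name TEXT PRIMARY KEY, category TEXT, source TEXT)"
    )
    conn.executemany("INSERT INTO subroutine VALUES (?, ?, ?)", rows)
    conn.execute("CREATE INDEX idx_category ON subroutine (category)")
    conn.commit()
    return conn


def search(conn, keyword):
    """Return names whose name or category contains the keyword, sorted by name."""
    pattern = f"%{keyword}%"
    cur = conn.execute(
        "SELECT name FROM subroutine "
        "WHERE name LIKE ? OR category LIKE ? ORDER BY name",
        (pattern, pattern),
    )
    return [row[0] for row in cur]
```

The keyword is passed as a bound parameter rather than formatted into the SQL string, which matters for a server that accepts input from arbitrary users.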