Retrieving Protein Domain Encoding DNA Sequences Automatically Through Database Cross-referencing

  • Choi, Yoon-Sup (School of Interdisciplinary Bioscience and Bioengineering, Pohang University of Science and Technology) ;
  • Yang, Jae-Seong (School of Interdisciplinary Bioscience and Bioengineering, Pohang University of Science and Technology) ;
  • Ryu, Sung-Ho (School of Interdisciplinary Bioscience and Bioengineering, Department of Molecular and Life Science, Pohang University of Science and Technology) ;
  • Kim, Sang-Uk (School of Interdisciplinary Bioscience and Bioengineering, Department of Molecular and Life Science, Biological Research Information Center, Pohang University of Science and Technology)
  • Published : 2006.05.28

Abstract

Recent proteomic studies of protein domains require high-throughput and systematic approaches. Since most experiments using protein domains, the modules of protein-protein interactions, require gene cloning, the first experimental step should be retrieving the DNA sequences of domain-encoding regions from databases. For large-scale proteomic research, however, manually extracting a large number of domain sequences from several inter-linked databases is a laborious task. We present a new methodology to retrieve DNA sequences of domain-encoding regions through automatic database cross-referencing. To extract protein domain-encoding regions, the method traverses several inter-connected databases with a validation process. We applied this method to retrieve all the EGF domain-encoding DNA sequences of Homo sapiens. The new algorithm was implemented using the Python library PAMIE, which enables automatic cross-referencing across distinct databases.
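
The abstract does not spell out how a domain annotation is mapped onto DNA or how the validation works, so the following is only a minimal sketch of one plausible final step, under stated assumptions: a domain is given as 1-based inclusive residue positions on a protein, the coding sequence (CDS) starts at the first codon of that protein, and validation means translating the extracted DNA with the standard genetic code and comparing it with the annotated domain residues. The function names `translate` and `domain_dna` are hypothetical and are not taken from the paper or from PAMIE.

```python
# Standard genetic code, built from the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string codon by codon; trailing partial codons are ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def domain_dna(cds, protein, start, end):
    """Return the CDS slice encoding residues start..end (1-based, inclusive).

    Assumption: the CDS is in frame with the protein from its first codon.
    Raises ValueError if the slice does not translate back to the domain.
    """
    if not (1 <= start <= end <= len(protein)):
        raise ValueError("domain coordinates fall outside the protein")
    region = cds[(start - 1) * 3:end * 3]
    # Validation step: the extracted DNA must encode the annotated residues.
    if translate(region) != protein[start - 1:end]:
        raise ValueError("extracted DNA does not encode the annotated domain")
    return region


if __name__ == "__main__":
    # Toy example, not real EGF data: protein MCEGC, domain at residues 2-4.
    print(domain_dna("ATGTGTGAAGGTTGCTAA", "MCEGC", 2, 4))
```

In this sketch a failed check raises an error rather than returning a sequence, so a record whose cross-referenced protein and nucleotide entries disagree would be flagged instead of silently producing a wrong cloning target.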

Keywords