Development of an Organism-specific Protein Interaction Database with Supplementary Data from the Web Sources

Development of a Protein Interaction Database for a Specific Organism Using Diverse Web Data

  • Published: 2002.12.01

Abstract

This paper presents the development of a protein interaction database. The system has three main characteristics. First, it maintains not only experimentally obtained protein interaction data but also genomic information about the proteins involved. Second, it uses purpose-built wrappers to extract details on interacting proteins. Third, these wrappers extract biologically meaningful data from various web sources, and the system integrates the results into a relational database. To resolve the syntactic and semantic heterogeneity among the multiple data sources, the system adopts a layered, modular architecture based on a wrapper-mediator approach. So far, the system has wrapped relevant data from the accessible sources for, on average, about 40% of roughly 11,500 proteins. By supporting data interoperability and integration, the wrapper-mediator approach makes the protein interaction data more comprehensive and useful. The database, which is still under development, is expected to be useful for further knowledge mining and for analyses related to human life in proteomics studies.
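The wrapper-mediator approach described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the two source formats, the class names (`TabularWrapper`, `FlatFileWrapper`, `Mediator`), the field mapping, the case-insensitive ID rule, and the first-source-wins conflict policy are all hypothetical. Each wrapper hides one source's syntax and maps its field names onto a shared schema; the mediator then merges the records by protein ID.

```python
from dataclasses import dataclass, field

# Shared (mediated) schema: every wrapper must emit records in this shape.
@dataclass
class ProteinRecord:
    protein_id: str
    attributes: dict = field(default_factory=dict)

class TabularWrapper:
    """Hypothetical wrapper for a source serving tab-separated rows:
    <id>\\t<gene symbol>\\t<function>"""
    source = "tabular"

    def extract(self, raw: str) -> list[ProteinRecord]:
        records = []
        for line in raw.strip().splitlines():
            pid, gene, function = line.split("\t")
            records.append(ProteinRecord(pid, {"gene": gene, "function": function}))
        return records

class FlatFileWrapper:
    """Hypothetical wrapper for a source serving 'KEY value' flat-file
    entries separated by '//' lines, using its own field names."""
    source = "flatfile"
    # Source-specific keys -> shared attribute names (semantic mapping).
    FIELD_MAP = {"ID": "protein_id", "GN": "gene", "DE": "description"}

    def extract(self, raw: str) -> list[ProteinRecord]:
        records = []
        for entry in raw.strip().split("//"):
            fields = {}
            for line in entry.strip().splitlines():
                key, _, value = line.partition(" ")
                if key in self.FIELD_MAP:
                    fields[self.FIELD_MAP[key]] = value.strip()
            if "protein_id" in fields:
                pid = fields.pop("protein_id")
                records.append(ProteinRecord(pid, fields))
        return records

class Mediator:
    """Merges wrapper output into one view keyed by a normalized protein ID."""

    def __init__(self, wrappers):
        self.wrappers = wrappers

    @staticmethod
    def normalize_id(pid: str) -> str:
        # Assumed rule: IDs from different sources compare case-insensitively.
        return pid.strip().upper()

    def integrate(self, raw_by_source: dict[str, str]) -> dict[str, dict]:
        merged: dict[str, dict] = {}
        for wrapper in self.wrappers:
            raw = raw_by_source.get(wrapper.source)
            if raw is None:
                continue  # source unavailable: keep what the others provide
            for rec in wrapper.extract(raw):
                view = merged.setdefault(self.normalize_id(rec.protein_id), {})
                for name, value in rec.attributes.items():
                    # First source to supply an attribute wins.
                    view.setdefault(name, value)
        return merged

# Usage with made-up input from two sources.
mediator = Mediator([TabularWrapper(), FlatFileWrapper()])
view = mediator.integrate({
    "tabular": "cg1234\tabc\tkinase",
    "flatfile": "ID CG1234\nGN abc\nDE Protein kinase ABC\n//\nID CG5678\nGN xyz\n//",
})
print(view["CG1234"])
# {'gene': 'abc', 'function': 'kinase', 'description': 'Protein kinase ABC'}
```

A real mediator would also need explicit rules for conflicting values and for mapping between different identifier schemes; the first-source-wins policy here only keeps the sketch short.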

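This paper describes the development of a protein interaction database. The developed system has the following features. First, it provides protein interaction and gene data obtained through biologists' direct experiments. Second, it uses wrappers to extract biologically relevant data in diverse formats from widely distributed websites. Third, to mitigate the syntactic and semantic heterogeneity among the various web data, the extracted data are integrated through a layered modular architecture based on a wrapper-mediator approach, then stored in the database, where they can be searched. So far, biologically meaningful data have been stored in the database for about 40% of the roughly 11,500 given proteins. The system is expected to be useful for data analysis in proteomics research.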
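The final step, storing the integrated data in a relational database and searching it, might look like the following sketch using Python's built-in `sqlite3` module. The table and column names, the toy protein records, and the `load`/`partners` helpers are hypothetical and show only one possible schema: one table for per-protein genomic annotation and one for experimentally observed interactions, with each undirected pair stored once.

```python
import sqlite3

# Hypothetical schema: per-protein annotation plus pairwise interactions.
SCHEMA = """
CREATE TABLE protein (
    protein_id TEXT PRIMARY KEY,
    gene       TEXT,
    location   TEXT,  -- e.g. a chromosomal location from a genome source
    source     TEXT   -- which web source supplied the annotation
);
CREATE TABLE interaction (
    protein_a TEXT NOT NULL REFERENCES protein(protein_id),
    protein_b TEXT NOT NULL REFERENCES protein(protein_id),
    method    TEXT,   -- experimental method reported for the pair
    PRIMARY KEY (protein_a, protein_b)
);
"""

def load(conn, proteins, interactions):
    conn.executemany("INSERT OR REPLACE INTO protein VALUES (?, ?, ?, ?)", proteins)
    # Store each undirected pair once, with its IDs in sorted order.
    conn.executemany(
        "INSERT OR IGNORE INTO interaction VALUES (?, ?, ?)",
        [(min(a, b), max(a, b), m) for a, b, m in interactions],
    )
    conn.commit()

def partners(conn, protein_id):
    """Return (partner_id, gene, method) for every interaction involving protein_id."""
    rows = conn.execute(
        """
        SELECT CASE WHEN i.protein_a = ? THEN i.protein_b ELSE i.protein_a END AS partner,
               p.gene, i.method
        FROM interaction AS i
        JOIN protein AS p
          ON p.protein_id = CASE WHEN i.protein_a = ? THEN i.protein_b ELSE i.protein_a END
        WHERE ? IN (i.protein_a, i.protein_b)
        ORDER BY partner
        """,
        (protein_id, protein_id, protein_id),
    )
    return rows.fetchall()

# Usage with made-up records.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
load(
    conn,
    proteins=[("CG1234", "abc", "2L:100", "tabular"),
              ("CG5678", "xyz", "3R:200", "flatfile"),
              ("CG9999", "qrs", "X:300", "flatfile")],
    interactions=[("CG5678", "CG1234", "two-hybrid"),
                  ("CG1234", "CG9999", "co-IP")],
)
print(partners(conn, "CG1234"))
# [('CG5678', 'xyz', 'two-hybrid'), ('CG9999', 'qrs', 'co-IP')]
```

Storing each pair in sorted order keeps the undirected interaction unique under the primary key, at the cost of the `CASE` expressions needed to report "the other" protein in queries.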
