Implementation of an Information Management System for Nucleotide Sequences based on BSML using Active Trigger Rules

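  • Sung Hee Park (Department of Computer Science, Chungbuk National University) ;
  • Kwang Su Jung (Department of Computer Science, Chungbuk National University) ;
  • Keun Ho Ryu (School of Electrical and Computer Engineering, Chungbuk National University)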
  • Published : 2005.01.01

Abstract

Biological data, including genome sequences, change continuously and are heterogeneous and diverse. Although management systems that reflect these characteristics are needed, most existing biological databases serve only as repositories for biological data. This paper therefore presents a sequence information management system for use at the level of a biological laboratory. It performs file format conversion, editing, storage, and retrieval for nucleotide sequences that are either produced by sequencing experiments or collected from public databases. To extract data fields and convert between heterogeneous sequence formats, it uses BSML, an XML-based format, as the common format. For sequence storage, we define sequence versions to record changes in the sequence composition of the same DNA fragment, and we show how active trigger rules detect these changes and generate the corresponding version information. Our experiments show that triggers can automatically store sequence changes in the database, and we evaluate the performance of this approach.
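
The version-management idea in the abstract can be illustrated with a minimal sketch. It is not the system described in the paper: the table names (`sequence`, `sequence_version`), the trigger names, the sample identifier `clone_001`, and the choice of SQLite are all assumptions made for illustration. The sketch shows the event-condition-action shape of an active trigger rule: an update to a stored sequence is the event, a difference between the old and new residues is the condition, and appending a new version row is the action.

```python
import sqlite3

# Hypothetical schema: one row per DNA fragment, plus a history table of versions.
SCHEMA = """
CREATE TABLE sequence (
    seq_id   TEXT PRIMARY KEY,
    residues TEXT NOT NULL
);
CREATE TABLE sequence_version (
    seq_id     TEXT NOT NULL,
    version_no INTEGER NOT NULL,
    residues   TEXT NOT NULL,
    PRIMARY KEY (seq_id, version_no)
);

-- Event: a fragment is stored for the first time. Action: record version 1.
CREATE TRIGGER trg_first_version AFTER INSERT ON sequence
BEGIN
    INSERT INTO sequence_version VALUES (NEW.seq_id, 1, NEW.residues);
END;

-- Event: residues are updated. Condition: the residues really changed.
-- Action: append the next version number for that fragment.
CREATE TRIGGER trg_new_version AFTER UPDATE OF residues ON sequence
WHEN OLD.residues <> NEW.residues
BEGIN
    INSERT INTO sequence_version
    SELECT NEW.seq_id, COALESCE(MAX(version_no), 0) + 1, NEW.residues
    FROM sequence_version
    WHERE seq_id = NEW.seq_id;
END;
"""


def demo():
    """Store a fragment, edit it twice (once with no real change), return its versions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO sequence VALUES ('clone_001', 'ATGCCGTA')")
    # A real edit: the trigger condition holds, so version 2 is created.
    conn.execute("UPDATE sequence SET residues = 'ATGCCGTTA' WHERE seq_id = 'clone_001'")
    # Same residues again: the WHEN clause filters this out, so no new version.
    conn.execute("UPDATE sequence SET residues = 'ATGCCGTTA' WHERE seq_id = 'clone_001'")
    rows = conn.execute(
        "SELECT version_no, residues FROM sequence_version "
        "WHERE seq_id = 'clone_001' ORDER BY version_no"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for version_no, residues in demo():
        print(version_no, residues)
```

In this sketch the application only inserts or updates the current sequence; the version history is maintained by the database itself, which is the property the abstract attributes to active trigger rules.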
