Computational Detection of Prokaryotic Core Promoters in Genomic Sequences

  • Kim Ki-Bong (Department of Bioinformatics Engineering, Sangmyung University)
  • Sim Jeong Seop (Department of Computer Science and Engineering, Inha University)
  • Published : 2005.10.01

Abstract

High-throughput sequencing of microbial genomes has led to the rapid accumulation of an enormous amount of genomic sequence data. In this context, the computational detection of promoters in genomic DNA sequences has attracted considerable research attention in recent years. This paper describes the development of a predictive model, the dependence decomposition weight matrix model (DDWMM), designed to detect the core promoter region, including the -10 region and the transcription start sites (TSSs), in prokaryotic genomic DNA sequences, a task of some importance for genome annotation. The model captures the most significant dependencies between positions, whether adjacent or non-adjacent, via the maximal dependence decomposition (MDD) procedure, which iteratively decomposes a data set into subsets on the basis of significant dependence between positions in the promoter region being modeled. Such dependencies may be closely related to biological and structural considerations, since promoter elements occur in a variety of combinations and are separated by various distances. The DDWMM may therefore be well suited to detecting core promoter regions and TSSs in long microbial genomic contigs. To demonstrate the effectiveness of the model, we performed 10-fold cross-validation experiments on 607 experimentally verified promoter sequences; the model showed good performance in terms of sensitivity.
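
The MDD idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`mdd`, `weight_matrix`, `score`), the chi-square threshold, the minimum subset size, and the pseudocount are all assumptions chosen for illustration, and the input is assumed to be equal-length aligned sequences over A, C, G, T. At each step the sketch picks the position whose consensus match or mismatch depends most strongly on the other positions, splits the data on it, and fits an ordinary weight matrix to each final subset.

```python
from collections import Counter
import math

ALPHABET = "ACGT"

def consensus(seqs, i):
    """Most frequent base at position i."""
    return Counter(s[i] for s in seqs).most_common(1)[0][0]

def chi_square(seqs, i, j):
    """Chi-square statistic of the 2x4 table:
    (consensus match at i) versus (base at j)."""
    c = consensus(seqs, i)
    table = {(m, b): 0 for m in (True, False) for b in ALPHABET}
    for s in seqs:
        table[(s[i] == c, s[j])] += 1
    n = len(seqs)
    stat = 0.0
    for m in (True, False):
        row = sum(table[(m, b)] for b in ALPHABET)
        for b in ALPHABET:
            col = table[(True, b)] + table[(False, b)]
            expected = row * col / n
            if expected > 0:
                stat += (table[(m, b)] - expected) ** 2 / expected
    return stat

def weight_matrix(seqs, pseudocount=1.0):
    """Per-position base probabilities with a pseudocount (assumed value)."""
    total = len(seqs) + 4 * pseudocount
    return [
        {b: (sum(s[i] == b for s in seqs) + pseudocount) / total
         for b in ALPHABET}
        for i in range(len(seqs[0]))
    ]

def mdd(seqs, threshold=16.3, min_size=4, used=frozenset()):
    """Recursively split seqs on the position with maximal summed dependence.
    threshold and min_size are illustrative assumptions."""
    leaf = {"wmm": weight_matrix(seqs), "n": len(seqs)}
    free = [i for i in range(len(seqs[0])) if i not in used]
    if len(seqs) < min_size or len(free) < 2:
        return leaf
    # Summed chi-square of each free position against all other free positions.
    sums = {i: sum(chi_square(seqs, i, j) for j in free if j != i)
            for i in free}
    best = max(sums, key=sums.get)
    if sums[best] < threshold:
        return leaf
    c = consensus(seqs, best)
    match = [s for s in seqs if s[best] == c]
    rest = [s for s in seqs if s[best] != c]
    if not match or not rest:
        return leaf
    used = used | {best}
    return {
        "split_pos": best,
        "consensus": c,
        "match": mdd(match, threshold, min_size, used),
        "rest": mdd(rest, threshold, min_size, used),
    }

def score(tree, seq):
    """Log-probability of seq under the leaf weight matrix it is routed to."""
    while "wmm" not in tree:
        branch = "match" if seq[tree["split_pos"]] == tree["consensus"] else "rest"
        tree = tree[branch]
    return sum(math.log(col[b]) for col, b in zip(tree["wmm"], seq))
```

In use, one would build the tree from aligned training promoters with `mdd(training_seqs)` and then slide a window along a contig, calling `score(tree, window)` at each offset. How scores are converted into promoter calls (for example, a cutoff or a background model) is left open here, since the abstract does not specify it.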
