A Review of Three Different Studies on Hidden Markov Models for Epigenetic Problems: A Computational Perspective

  • Lee, Kyung-Eun (Ewha Information and Telecommunication Institute, Ewha Womans University)
  • Park, Hyun-Seok (Ewha Information and Telecommunication Institute, Ewha Womans University)
  • Received : 2014.10.17
  • Accepted : 2014.11.23
  • Published : 2014.12.31

Abstract

Recent technical advances, such as chromatin immunoprecipitation combined with DNA microarrays (ChIP-chip) and chromatin immunoprecipitation-sequencing (ChIP-seq), have generated large quantities of high-throughput data. Because epigenomic datasets are arranged along chromosomes, their analysis must account for spatial or temporal characteristics. Simple clustering or classification methodologies, which treat each measurement independently, are therefore inadequate for the analysis of multi-track ChIP-chip or ChIP-seq data. Approaches based on hidden Markov models (HMMs) can integrate dependencies between directly adjacent measurements in the genome. Here, we review, from a computational perspective, three HMM-based studies that have contributed to epigenetic research. We also give a brief tutorial on HMM modelling, targeted at bioinformaticians who are new to the field.
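To make concrete how an HMM integrates dependencies between adjacent measurements, the following is a minimal sketch of Viterbi decoding for a two-state model over binarized signal bins. The state names, all probabilities, and the helper `independent_calls` are hypothetical choices made for illustration; they are not taken from any of the three reviewed studies.

```python
import math

# Hypothetical two-state model: these names and numbers are illustrative
# assumptions, not parameters from any of the reviewed studies.
STATES = ("background", "enriched")
START = {"background": 0.9, "enriched": 0.1}
TRANS = {
    "background": {"background": 0.9, "enriched": 0.1},
    "enriched": {"background": 0.2, "enriched": 0.8},
}
# Emission probabilities for a binarized signal per genomic bin (0 = low, 1 = high).
EMIT = {
    "background": {0: 0.8, 1: 0.2},
    "enriched": {0: 0.3, 1: 0.7},
}


def viterbi(observations):
    """Return the most probable hidden-state path for a sequence of 0/1 bins."""
    if not observations:
        return []
    # score[s] = best log-probability of any path that ends in state s.
    score = {
        s: math.log(START[s]) + math.log(EMIT[s][observations[0]]) for s in STATES
    }
    backpointers = []
    for obs in observations[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            # Best predecessor of s, taking the transition cost into account.
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = (
                score[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][obs])
            )
            pointers[s] = prev
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def independent_calls(observations):
    """Call each bin from its own emission likelihood, ignoring neighbours."""
    return [max(STATES, key=lambda s: EMIT[s][o]) for o in observations]
```

Under these assumed parameters, `viterbi([0, 0, 0, 1, 0, 0, 0])` labels every bin as background, whereas `independent_calls` flags the isolated high bin as enriched. A run of four high bins, as in `[0, 0, 1, 1, 1, 1, 0, 0]`, is decoded as an enriched segment. This is the sense in which the transition probabilities let neighbouring bins inform each call.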
