Comparative analysis of commonly used peak calling programs for ChIP-Seq analysis

  • Jeon, Hyeongrin (Department of Life Sciences, Pohang University of Science and Technology (POSTECH))
  • Lee, Hyunji (Department of Life Sciences, Pohang University of Science and Technology (POSTECH))
  • Kang, Byunghee (Department of Life Sciences, Pohang University of Science and Technology (POSTECH))
  • Jang, Insoon (Department of Life Sciences, Pohang University of Science and Technology (POSTECH))
  • Roh, Tae-Young (Department of Life Sciences, Pohang University of Science and Technology (POSTECH))
  • Received : 2020.10.06
  • Accepted : 2020.11.22
  • Published : 2020.12.31

Abstract

Chromatin immunoprecipitation coupled with high-throughput DNA sequencing (ChIP-Seq) is a powerful technology for profiling the locations of proteins of interest on a whole-genome scale. Many programs and algorithms have been proposed to identify the regions where these proteins are enriched. However, none of the commonly used peak calling programs accurately explains the binding features of the target proteins detected by ChIP-Seq. Here, publicly available data on 12 histone modifications (H3K4ac/me1/me2/me3, H3K9ac/me3, H3K27ac/me3, H3K36me3, H3K56ac, and H3K79me1/me2) generated from a human embryonic stem cell line (H1) were profiled with five peak callers: CisGenome, MACS1, MACS2, PeakSeq, and SISSRs. The performance of the peak calling programs was compared in terms of reproducibility between replicates, the response of called enriched regions to varying sequencing depth, specificity relative to noise signal, and sensitivity of peak prediction. There were no major differences among the peak callers when analyzing point-source histone modifications. Peak calling results for histone modifications with low fidelity, such as H3K4ac, H3K56ac, and H3K79me1/me2, showed low performance on all of these measures, which indicates that their peak positions might not be located accurately. These comparative results could serve as a guide for choosing a suitable peak calling program for a specific histone modification.
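
As an illustration of the first criterion, reproducibility between replicates, the following is a minimal sketch of one simple way such a comparison could be made: the fraction of peaks called in one replicate that overlap at least one peak called in the other. The abstract does not state which reproducibility measure the authors used, so the function name `overlap_fraction`, the peak representation as `(chrom, start, end)` half-open intervals, and the toy coordinates are all assumptions made for illustration only.

```python
from collections import defaultdict


def overlaps(a, b):
    """Return True if half-open intervals a=(start, end) and b=(start, end) share a base."""
    return a[0] < b[1] and b[0] < a[1]


def overlap_fraction(peaks_a, peaks_b):
    """Fraction of peaks in peaks_a that overlap at least one peak in peaks_b.

    Hypothetical helper, not the measure used in the paper.
    Each peak is a (chrom, start, end) tuple with a half-open interval.
    """
    if not peaks_a:
        return 0.0
    # Group the second replicate's peaks by chromosome for lookup.
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks_b:
        by_chrom[chrom].append((start, end))
    # Brute-force comparison; adequate for a sketch, slow for genome-scale peak sets.
    hits = sum(
        1
        for chrom, start, end in peaks_a
        if any(overlaps((start, end), iv) for iv in by_chrom.get(chrom, []))
    )
    return hits / len(peaks_a)


# Toy peak sets standing in for two replicates (made-up coordinates).
rep1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)]
rep2 = [("chr1", 150, 250), ("chr2", 300, 400)]

print(overlap_fraction(rep1, rep2))
print(overlap_fraction(rep2, rep1))
```

The measure is asymmetric, so it is worth reporting in both directions. For real peak sets, an interval tool such as BEDTools would be a more practical choice than this brute-force loop.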

Acknowledgement

This work was supported by grants from the National Research Foundation of Korea (NRF-2014M3C9A3064548 and NRF-2017M3C9A6047625) and by the BK21 Plus program funded by the Ministry of Education, Republic of Korea (10Z20130012243).
