Fig. 1. Cost of DNA analysis
Fig. 2. NGS analysis process [28]
Fig. 3. Overall architecture of Halvade [24]
Fig. 4. Workflow of SparkGA
Table 1. Tools for each NGS step
References
- M. Choi, "Development Trends of Medical Genomics Using Next Generation Sequencing Techniques," Molecular Cell Biology Newsletter, Apr. 2014.
- https://www.genome.gov/sequencingcostsdata/
- M. C. Schatz, B. Langmead, and S. L. Salzberg, "Cloud Computing and the DNA Data Race," Nature Biotechnology, vol. 28, no. 7, 2010, pp. 691-693. https://doi.org/10.1038/nbt0710-691
- M. Baker, "Next-generation Sequencing: Adjusting to Data Overload," Nature Methods, vol. 7, no. 7, 2010, pp. 495-499. https://doi.org/10.1038/nmeth0710-495
- B. Calabrese and M. Cannataro, "Bioinformatics and Microarray Data Analysis on the Cloud," Methods in Molecular Biology, vol. 1375, 2016, pp. 25-39.
- http://ngenebio.com/
- C. Lee, Bioinformatics Analysis of Next-Generation Sequence Data, BRIC View Trend Report, 2016.
- G. A. Van der Auwera, M. O. Carneiro, C. Hartl, R. Poplin, G. del Angel, A. Levy-Moonshine, T. Jordan, K. Shakir, D. Roazen, J. Thibault, E. Banks, K. V. Garimella, D. Altshuler, S. Gabriel, and M. A. DePristo, "From FastQ Data to High Confidence Variant Calls: the Genome Analysis Toolkit Best Practices Pipeline," Current Protocols in Bioinformatics, vol. 43, 2013, pp. 11.10.1-11.10.33.
- https://www.bioin.or.kr/board.do?cmd=view&bid=tech&num=216321
- BWA, https://github.com/lh3/bwa
- GATK, https://software.broadinstitute.org/gatk/
- B. Langmead, C. Trapnell, M. Pop, and S. L. Salzberg, "Ultrafast and Memory-efficient Alignment of Short DNA Sequences to the Human Genome," Genome Biology, vol. 10, no. 3, 2009, p. R25.
- http://broadinstitute.github.io/picard/
- https://github.com/GregoryFaust/samblaster
- https://github.com/broadinstitute/mutect
- https://hpc.nih.gov/apps/MutSig.html
- https://github.com/ekg/freebayes
- https://github.com/WGLab/doc-ANNOVAR/
- https://www.ensembl.org/vep
- https://gencore.bio.nyu.edu/variant-calling-pipeline/
- https://wikis.utexas.edu/display/bioiteam/DNAseq+Variant+Calling+Pipeline
- https://hadoop.apache.org/
- https://spark.apache.org/
- D. Decap, J. Reumers, C. Herzeel, P. Costanza, and J. Fostier, "Halvade: Scalable Sequence Analysis with MapReduce," Bioinformatics, vol. 31, no. 15, 2015, pp. 2482-2488. https://doi.org/10.1093/bioinformatics/btv179
- https://github.com/citiususc/BigBWA
- https://github.com/citiususc/SparkBWA
- J. Lee, H. Lee, J. Moon, H. Kang, S. Song, and S. Yu, "Parallel and Distributed PCR Duplication Marking Algorithm Integrated with Genome Sequence Alignment by Using Streaming Technology," Proceedings of TBC 2017, 2017.
- H. Mushtaq and Z. Al-Ars, "Cluster-based Apache Spark Implementation of the GATK DNA Analysis Pipeline," In Proceedings of IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2015, pp. 1471-1477.
- H. Mushtaq, F. Liu, C. Costa, G. Liu, P. Hofstee, and Z. Al-Ars, "SparkGA: A Spark Framework for Cost Effective, Fast and Accurate DNA Analysis at Scale," In Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, 2017, pp. 148-157.