Integrative Comparison of Burrows-Wheeler Transform-Based Mapping Algorithm with de Bruijn Graph for Identification of Lung/Liver Cancer-Specific Gene

  • Ajaykumar, Atul (Department of Information, Communication and Electronics Engineering, The Catholic University of Korea)
  • Yang, Jung Jin (Department of Computer Science Engineering, The Catholic University of Korea)
  • Received : 2021.10.13
  • Accepted : 2021.12.23
  • Published : 2022.02.28

Abstract

Cancers of the lung and liver are among the top 10 leading causes of cancer death worldwide. Thus, it is essential to identify the genes specifically expressed in these two cancer types to develop new therapeutics. Although a large amount of messenger RNA (mRNA) sequencing data from these cancer cells is available owing to advances in next-generation sequencing (NGS) technologies, optimized data processing methods still need to be developed to identify novel cancer-specific genes. Here, we conducted an analytical comparison between Bowtie2, a Burrows-Wheeler transform-based alignment tool, and Kallisto, which adopts pseudoalignment based on a transcriptome de Bruijn graph, using mRNA sequencing data from normal cells and lung/liver cancer tissues. Before the cancer data were used, simulated mRNA sequencing reads were generated, and the high Transcripts Per Million (TPM) values were compared. mRNA sequencing read data from lung/liver cancer cells were also extracted and quantified. While Kallisto directly outputs TPM values, Bowtie2 provided counts. Thus, TPM values were calculated by processing the Sequence Alignment Map (SAM) file in R with the Rsubread package and subsequently in Python. Analysis of the simulated sequencing data revealed that Kallisto detected more transcripts and achieved a higher overlap than Bowtie2. Evaluation of the two data processing methods using known lung cancer biomarkers indicates that, in standard settings without any dedicated quality control, Kallisto produces results faster and more accurately than Bowtie2. The same conclusion was confirmed with known biomarkers specific to liver cancer.
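The count-to-TPM step mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' script: the function name `counts_to_tpm`, the transcript identifiers, and the example numbers are hypothetical, and the sketch assumes per-transcript counts and annotated transcript lengths in base pairs are already available (for example, exported from a counting step). Quantification tools may use an effective length rather than the annotated length, so values computed this way need not match their output exactly.

```python
from typing import Dict


def counts_to_tpm(counts: Dict[str, float],
                  lengths_bp: Dict[str, float]) -> Dict[str, float]:
    """Convert per-transcript read counts to Transcripts Per Million (TPM).

    Assumes every transcript in `counts` has a positive length in `lengths_bp`.
    """
    # Step 1: reads per kilobase (RPK), i.e. count divided by length in kb.
    rpk = {t: counts[t] / (lengths_bp[t] / 1000.0) for t in counts}

    # Step 2: scale so that the values sum to one million.
    total_rpk = sum(rpk.values())
    if total_rpk == 0:
        # No reads at all: report zero for every transcript.
        return {t: 0.0 for t in counts}
    return {t: value / total_rpk * 1e6 for t, value in rpk.items()}


# Hypothetical toy input: three transcripts with made-up counts and lengths.
example_counts = {"tx_a": 100, "tx_b": 200, "tx_c": 0}
example_lengths = {"tx_a": 1000, "tx_b": 4000, "tx_c": 500}

tpm = counts_to_tpm(example_counts, example_lengths)
for transcript, value in sorted(tpm.items()):
    print(f"{transcript}\t{value:.2f}")
```

In this toy input, tx_b has twice the reads of tx_a but is four times as long, so after length normalization its TPM is half that of tx_a. Because TPM values always sum to one million within a sample, they can be compared across tools that start from different raw outputs.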

Acknowledgement

Special thanks to Dong Woo Ko for his help in data processing and to Chun Chang for providing the lung and liver biomarkers and their associated references. This work was supported by the Catholic University of Korea, Research Fund, 2020.
