Comparative analysis of HiSeq3000 and BGISEQ-500 sequencing platform with shotgun metagenomic sequencing data

  • Animesh Kumar (Center for Bioinformatics, Department of Chemistry, UiT The Arctic University of Norway) ;
  • Espen M. Robertsen (Center for Bioinformatics, Department of Chemistry, UiT The Arctic University of Norway) ;
  • Nils P. Willassen (Center for Bioinformatics, Department of Chemistry, UiT The Arctic University of Norway) ;
  • Juan Fu (Faculty of Biosciences, Department of Livestock and Aquaculture Science, Norwegian University of Life Sciences) ;
  • Erik Hjerde (Center for Bioinformatics, Department of Chemistry, UiT The Arctic University of Norway)
  • Received : 2023.09.18
  • Accepted : 2023.12.04
  • Published : 2023.12.31

Abstract

Recent advances in sequencing technologies have made it possible to generate metagenomic sequences on several different sequencing platforms. In this study, we analyzed and compared shotgun metagenomic sequences generated by the HiSeq3000 and BGISEQ-500 platforms from 12 sediment samples collected along the Norwegian coast. Metagenomic DNA sequences from both platforms were normalized to an equal number of bases and then evaluated with different taxonomic classifiers, reference databases, and assemblers. After preprocessing, the normalized BGISEQ-500 sequences retained more reads and bases, while a slightly higher fraction of HiSeq3000 sequences was taxonomically classified. Kaiju classified a higher percentage of reads than Kraken2 for both platforms, and a comparison of reference databases for taxonomic classification showed that the MAR database outperformed RefSeq. For the majority of HiSeq3000 samples, assembly with MEGAHIT produced longer assemblies and a higher total contig count than metaSPAdes, but assembly statistics improved notably with unprocessed or normalized reads. Our results indicate that both platforms perform comparably in terms of the percentage of taxonomically classified reads and assembled contig statistics for metagenomic samples. This study provides insights that can help researchers select an appropriate sequencing platform and bioinformatics pipeline for their metagenomic studies.
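The abstract rests on three simple computations: bringing each sample to an equal number of bases, measuring the fraction of reads a classifier assigns, and summarizing an assembly by its contig statistics. The sketch below illustrates each one under stated assumptions and is not the authors' pipeline. `subsample_to_bases`, `percent_classified` and `assembly_stats` are hypothetical helpers; random subsampling without replacement is an assumed way of reaching an equal base count; `percent_classified` relies on the per-read output of Kraken2 and Kaiju, where the first tab-separated field is `C` (classified) or `U` (unclassified); and N50 is included as a common assembly metric that the abstract itself does not name.

```python
import random
from typing import Dict, Iterable, List, Sequence


def subsample_to_bases(reads: Sequence[str], target_bases: int, seed: int = 42) -> List[str]:
    """Hypothetical helper: randomly subsample reads to roughly target_bases.

    Reads are drawn in a seeded random order without replacement; drawing
    stops once the cumulative base count meets or exceeds the target, so the
    result can overshoot by at most one read length. If the sample holds
    fewer bases than the target, every read is returned.
    """
    order = list(range(len(reads)))
    random.Random(seed).shuffle(order)
    picked: List[str] = []
    total = 0
    for i in order:
        if total >= target_bases:
            break
        picked.append(reads[i])
        total += len(reads[i])
    return picked


def percent_classified(lines: Iterable[str]) -> float:
    """Percentage of reads flagged 'C' in Kraken2- or Kaiju-style output.

    Assumes one line per read (or read pair) whose first tab-separated
    field is 'C' or 'U'; blank lines are ignored.
    """
    flags = [line.split("\t", 1)[0] for line in lines if line.strip()]
    if not flags:
        return 0.0
    return 100.0 * sum(flag == "C" for flag in flags) / len(flags)


def assembly_stats(contig_lengths: Sequence[int]) -> Dict[str, int]:
    """Contig count, total length, longest contig and N50 of one assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    n50 = 0
    running = 0
    for length in lengths:
        running += length
        # N50: length of the contig at which half of the assembly is covered.
        if 2 * running >= total:
            n50 = length
            break
    return {
        "contigs": len(lengths),
        "total_length": total,
        "longest": lengths[0] if lengths else 0,
        "n50": n50,
    }


if __name__ == "__main__":
    reads = ["ACGT" * 25] * 10  # ten 100-bp reads
    print(len(subsample_to_bases(reads, target_bases=350)))
    print(percent_classified(["C\tr1\t562", "U\tr2\t0", "C\tr3\t1280", "U\tr4\t0"]))
    print(assembly_stats([400, 300, 200, 100]))
```

With ten 100-bp reads and a 350-base target, four reads (400 bases) are kept, which shows the one-read overshoot. Normalizing on bases rather than on read counts keeps the comparison fair when the two platforms produce reads of different lengths.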

Keywords

Acknowledgement

Illumina HiSeq3000 sequencing was performed by the Norwegian Sequencing Centre (www.sequencing.uio.no), a national technology platform hosted by the University of Oslo and Oslo University Hospital and supported by the "Functional Genomics" and "Infrastructure" programs of the Research Council of Norway and the Southeastern Regional Health Authorities. BGI Tech Solutions (Hong Kong) Co., Ltd. assisted with sequencing on the BGISEQ-500 sequencing platform, acting as an overseas sample receiving site and temporary storage point. We also thank Dr. Tao Jin for help with sequencing at CNGB. The computations were performed on a local compute cluster and on resources provided by Sigma2, the National Infrastructure for High-Performance Computing and Data Storage, Norway. This work was supported by UiT The Arctic University of Norway.
