A Comparative Analysis of the Illumina Truseq Synthetic Long-read Haplotyping Sequencing Platform versus the 10X Genomics Chromium Genome Sequencing Platform for Haplotype Phasing and the Identification of Single-nucleotide variants (SNVs) in Hanwoo (Korean Native Cattle)

  • Park, Woncheoul (Animal Genomics and Bioinformatics Division, National Institute of Animal Science, RDA) ;
  • Srikanth, Krishnamoorthy (Animal Genomics and Bioinformatics Division, National Institute of Animal Science, RDA) ;
  • Park, Jong-Eun (Animal Genomics and Bioinformatics Division, National Institute of Animal Science, RDA) ;
  • Shin, Donghyun (Departments of Animal Biotechnology, Chonbuk National University) ;
  • Ko, Haesu (Animal Genomics and Bioinformatics Division, National Institute of Animal Science, RDA) ;
  • Lim, Dajeong (Animal Genomics and Bioinformatics Division, National Institute of Animal Science, RDA) ;
  • Cho, In-Cheol (Animal Genomics and Bioinformatics Division, National Institute of Animal Science, RDA)
  • Received : 2018.07.13
  • Accepted : 2018.11.19
  • Published : 2019.01.30

Abstract

Comparative analyses of high-depth sequencing and haplotype phasing are scarce for Hanwoo (Korean native cattle), particularly comparisons of the Truseq Synthetic Long-Read Haplotyping sequencing platform from Illumina (TSLRH) with the Chromium Genome sequencing platform from 10X Genomics (10XG). DNA was extracted from the sperm of a Hanwoo breeding bull (ID: TN1505D2184/27214) provided by the Hanwoo Research Center and was used to generate sequence data on both platforms. We then identified SNVs using an analysis pipeline tailored to each platform. The TSLRH and 10XG platforms generated 355,208,304 and 1,632,772,004 reads in total, with Q30 values of 89.04% and 88.60%, of which 351,992,768 (99.09%) and 1,526,641,824 (93.50%) were successfully mapped, respectively. For the TSLRH and 10XG platforms, respectively, the mean sequencing depth was 13.04X and 74.3X, the longest phase block was 1,982,706 bp and 1,480,081 bp, the N50 phase block was 57,637 bp and 114,394 bp, the total number of SNVs identified was 4,534,989 and 8,496,813, and the total phased rate was 72.29% and 87.67%. Moreover, for each chromosome, we identified the SNVs unique to each platform and the SNVs common to both. The number of SNVs was directly proportional to the length of the chromosome. Based on these results, we recommend the 10XG platform for haplotype phasing and SNV identification, as it produced a longer N50 phase block, a higher mean depth, more reads, more SNVs, and a higher phased rate than the TSLRH platform.
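Two of the summary metrics reported in the abstract, the mapping rate and the N50 phase block, can be illustrated with a minimal sketch. The function names `mapping_rate` and `n50` and the toy block lengths are assumptions made for illustration, and the N50 definition used here is the common one (the block length at which the cumulative length of blocks, sorted from longest to shortest, first reaches half of the total). The platforms' own pipelines may compute it differently.

```python
def mapping_rate(mapped_reads: int, total_reads: int) -> float:
    """Percentage of sequenced reads that mapped to the reference."""
    return 100.0 * mapped_reads / total_reads


def n50(block_lengths: list[int]) -> int:
    """Length of the block at which the cumulative length of blocks,
    taken from longest to shortest, first reaches half of the total."""
    ordered = sorted(block_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("block_lengths must not be empty")


# Read counts reported in the abstract.
tslrh_rate = mapping_rate(351_992_768, 355_208_304)
xg_rate = mapping_rate(1_526_641_824, 1_632_772_004)

# Hypothetical phase block lengths (bp), not taken from the study.
toy_blocks = [10, 20, 30, 40, 50]
toy_n50 = n50(toy_blocks)
```

With the read counts from the abstract, the two rates round to the reported 99.09% and 93.50%.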

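Few papers on Hanwoo (Korean native cattle) present comparative analyses that use high-depth sequencing for haplotype phasing. Among such high-depth sequencing platforms, no paper specifically compares the Truseq Synthetic Long-Read Haplotyping sequencing platform serviced by Illumina (TSLRH) with the Chromium Genome sequencing platform serviced by 10X Genomics (10XG). We extracted DNA from the semen of a Hanwoo breeding bull (ID: TN1505D2184 or 27214) from the Hanwoo Research Center and generated sequencing data from this DNA on each platform. We then identified single-nucleotide variants (SNVs) using the analysis method suited to each platform. For TSLRH and 10XG, respectively, the total number of reads was 355,208,304 and 1,632,772,004; the number of mapped reads was 351,992,768 (99.09%) and 1,526,641,824 (93.50%); Q30 was 89.04% and 88.60%; the mean depth was 13.04X and 74.3X; the longest phase block was 1,982,706 bp and 1,480,081 bp; the N50 phase block was 57,637 bp and 114,394 bp; the total number of SNVs was 4,534,989 and 8,496,813; and the total phased rate was 72.29% and 87.67%. Furthermore, by comparing the two platforms, we identified for each chromosome the SNVs unique to each platform and the SNVs common to both, and confirmed that the number of SNVs is directly proportional to chromosome length. In conclusion, when research funds are limited, we recommend 10XG over TSLRH, because its total read count, total SNV count, N50 phase block, phased rate, and mean depth are higher or better than those of TSLRH.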
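The per-chromosome comparison of platform-specific and shared SNVs can be sketched with plain set operations. This is only an illustration under stated assumptions: the variant key `(chromosome, position, ref, alt)`, the function name `partition_by_chromosome`, and the toy calls are hypothetical and are not taken from the study's pipeline.

```python
from collections import defaultdict

# A variant is keyed by (chromosome, position, reference allele, alternate allele).
Variant = tuple[str, int, str, str]


def partition_by_chromosome(
    calls_a: set[Variant], calls_b: set[Variant]
) -> dict[str, dict[str, int]]:
    """Count variants found only in A, only in B, and in both, per chromosome."""
    counts: dict[str, dict[str, int]] = defaultdict(
        lambda: {"only_a": 0, "only_b": 0, "common": 0}
    )
    for variant in calls_a & calls_b:
        counts[variant[0]]["common"] += 1
    for variant in calls_a - calls_b:
        counts[variant[0]]["only_a"] += 1
    for variant in calls_b - calls_a:
        counts[variant[0]]["only_b"] += 1
    return dict(counts)


# Hypothetical toy calls standing in for the two platforms' SNV sets.
tslrh_calls = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"), ("chr2", 40, "G", "A")}
xg_calls = {("chr1", 100, "A", "G"), ("chr2", 40, "G", "A"), ("chr2", 90, "T", "C")}

summary = partition_by_chromosome(tslrh_calls, xg_calls)
```

Keying on the alleles as well as the position means that two platforms reporting different alternate alleles at the same site are counted as platform-specific rather than common; whether that is the desired behavior depends on the analysis.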

Fig. 1. Graphical scheme of the experimental design, sequencing, and bioinformatics methods.

Fig. 2. Histograms of the number of SNVs from the Illumina TSLRH platform and the 10X Genomics Chromium Genome platform: (A) number of SNPs for TSLRH, 10X, and common to both; (B) number of indels for TSLRH, 10X, and common to both.

Fig. 3. Visualization of phase blocks, haplotypes, structural variants, and mapped reads from the 10X Genomics Chromium Genome data using the Loupe program: (A) phase blocks of each chromosome; (B) structural variants; (C) haplotypes; (D) mapped reads.

Table 1. Summary of the basic performance results for the 10X Genomics Chromium Genome and Illumina TSLRH platforms

Table 2. Summary of the number of SNVs, the proportion of each SNV type, and the phasing results from the 10X Genomics Chromium Genome and Illumina TSLRH sequencing data
