Imputation Accuracy from 770K SNP Chips to Next Generation Sequencing Data in a Hanwoo (Korean Native Cattle) Population using Minimac3 and Beagle

  • An, Na-Rae (Animal Genomics and Bioinformatics Division, National Institute of Animal Science, Rural Development Administration) ;
  • Son, Ju-Hwan (Animal Genomics and Bioinformatics Division, National Institute of Animal Science, Rural Development Administration) ;
  • Park, Jong-Eun (Animal Genomics and Bioinformatics Division, National Institute of Animal Science, Rural Development Administration) ;
  • Chai, Han-Ha (Animal Genomics and Bioinformatics Division, National Institute of Animal Science, Rural Development Administration) ;
  • Jang, Gul-Won (Animal Genomics and Bioinformatics Division, National Institute of Animal Science, Rural Development Administration) ;
  • Lim, Dajeong (Animal Genomics and Bioinformatics Division, National Institute of Animal Science, Rural Development Administration)
  • Received : 2018.06.04
  • Accepted : 2018.11.16
  • Published : 2018.11.30

Abstract

Whole-genome analysis has been made possible by advances in DNA sequencing technologies and the discovery of many single nucleotide polymorphisms (SNPs). Because SNPs are now available for livestock genomes as well as the human genome, large numbers of SNPs can be genotyped with SNP chips. Among the various genotype imputation programs, Minimac3 is reported to be highly accurate and relatively fast, with a simplified workflow. In the present study, we used Minimac3 to impute missing genotypes in 770K SNP chip data from 1,226 Hanwoo animals with next generation sequencing data from 311 animals. Imputation accuracy was about 94~96% per chromosome and about 92~98% per individual sample. After imputation, SNPs with R Square ($R^2$) values of at least 0.4, 0.6, and 0.8 accounted for about 91%, 84%, and 70% of all SNPs, respectively. Across the seven Minor Allele Frequency (MAF) intervals (0, 0.025), (0.025, 0.05), (0.05, 0.1), (0.1, 0.2), (0.2, 0.3), (0.3, 0.4), and (0.4, 0.5), $R^2$ values ranged from 64% to 88%. The total analysis time was about 12 hr. As the size and complexity of genomic datasets increase in future SNP chip studies, we expect that genomic imputation using Minimac3 can improve the reliability of chip data for Hanwoo discrimination.
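
The two $R^2$ summaries reported in the abstract (the share of SNPs at or above each threshold, and the values per MAF interval) could be computed along the following lines. This is a minimal sketch, not the authors' pipeline: the function names, the `(maf, r2)` input layout, the use of a per-bin mean, and the treatment of each interval as `(low, high]` are all assumptions made for illustration. Only the thresholds and bin edges come from the abstract.

```python
from bisect import bisect_left

# Thresholds and MAF bin edges as listed in the abstract.
R2_THRESHOLDS = (0.4, 0.6, 0.8)
MAF_EDGES = (0.0, 0.025, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5)


def fraction_at_or_above(r2_values, threshold):
    """Share of SNPs whose imputation R^2 is >= threshold."""
    if not r2_values:
        raise ValueError("no SNPs supplied")
    return sum(r2 >= threshold for r2 in r2_values) / len(r2_values)


def mean_r2_by_maf_bin(snps):
    """Mean R^2 per MAF bin; snps is an iterable of (maf, r2) pairs.

    Assumption: bins are treated as (low, high], and SNPs with
    maf == 0 or maf > 0.5 are skipped.
    """
    n_bins = len(MAF_EDGES) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for maf, r2 in snps:
        if not 0.0 < maf <= 0.5:
            continue
        # bisect_left puts a value equal to an edge into the bin it closes.
        idx = bisect_left(MAF_EDGES, maf) - 1
        sums[idx] += r2
        counts[idx] += 1
    return {
        (MAF_EDGES[i], MAF_EDGES[i + 1]): (sums[i] / counts[i] if counts[i] else None)
        for i in range(n_bins)
    }


if __name__ == "__main__":
    # Made-up toy values, only to show the call pattern.
    toy = [(0.01, 0.5), (0.02, 0.7), (0.3, 0.9), (0.45, 1.0)]
    r2_only = [r2 for _, r2 in toy]
    for t in R2_THRESHOLDS:
        print(t, fraction_at_or_above(r2_only, t))
    print(mean_r2_by_maf_bin(toy))
```

Empty bins are reported as `None` rather than 0 so that a bin with no SNPs is not mistaken for a bin with poor imputation quality.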


Keywords

Fig. 1. Line plot showing the accuracy of imputation for each chromosome using Minimac3 & Beagle. X axis is chromosome number, Y axis is imputation accuracy.

Fig. 2. Bar plot showing the imputation accuracy for 50 Hanwoo bulls that were genotyped from both whole genome sequence data and 770K SNP chip data. X axis is percentage of accuracy, Y axis is number of animals.
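
A per-animal accuracy of the kind plotted in Fig. 2 can be illustrated as genotype concordance between imputed and sequenced calls. This is a sketch under stated assumptions: this page does not define how accuracy was calculated, so concordance is assumed here, and the 0/1/2 allele-count coding, the `None` marker for missing calls, and both function names are hypothetical.

```python
def concordance_percent(imputed, observed):
    """Percent of comparable genotype calls where imputed == observed.

    Genotypes are assumed to be coded 0/1/2 (allele counts), with None
    marking a missing call; pairs with a missing call are ignored.
    """
    if len(imputed) != len(observed):
        raise ValueError("genotype vectors must have the same length")
    pairs = [(i, o) for i, o in zip(imputed, observed)
             if i is not None and o is not None]
    if not pairs:
        raise ValueError("no comparable genotypes")
    matches = sum(i == o for i, o in pairs)
    return 100.0 * matches / len(pairs)


def per_animal_accuracy(imputed_by_animal, observed_by_animal):
    """Concordance for every animal present in both dictionaries."""
    return {
        animal: concordance_percent(genotypes, observed_by_animal[animal])
        for animal, genotypes in imputed_by_animal.items()
        if animal in observed_by_animal
    }


if __name__ == "__main__":
    # Made-up toy genotypes for two hypothetical animals.
    imputed = {"bull_A": [0, 1, 2, 2], "bull_B": [1, 1, None, 0]}
    observed = {"bull_A": [0, 1, 1, 2], "bull_B": [1, 1, 2, 0]}
    print(per_animal_accuracy(imputed, observed))
```

Applying the same function to all calls on one chromosome, instead of all calls for one animal, would give a chromosome-wise figure.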

Fig. 3. Effect of chromosome size on imputation accuracy, calculated by analyzing the correlation ($R^2$) between chromosome size and accuracy of imputation. X axis is chromosome number, Y axis is $R^2$.

Fig. 4. Effect of MAF (Minor Allele Frequency) on imputation accuracy, calculated by analyzing the correlation ($R^2$) between MAF and accuracy of imputation. X axis is MAF, Y axis is $R^2$.

Table 1. Number of single nucleotide polymorphisms (SNPs) in the 770K SNP chip and next generation sequencing (NGS) data

Table 2. Accuracy (in %) of the imputed SNPs with Minimac3 & Beagle

Table 3. Chromosome-wise average accuracy (in %) using Minimac3 & Beagle

Table 4. Chromosome-wise time taken by Minimac3 for imputation
