Pure additive contribution of genetic variants to a risk prediction model using propensity score matching: application to type 2 diabetes

  • Park, Chanwoo (Department of Statistics, Seoul National University)
  • Jiang, Nan (Interdisciplinary Program in Bioinformatics, Seoul National University)
  • Park, Taesung (Department of Statistics, Seoul National University)
  • Received : 2019.11.21
  • Accepted : 2019.12.09
  • Published : 2019.12.31

Abstract

Genome-wide association studies have made it possible to predict diseases such as type 2 diabetes (T2D) from single-nucleotide polymorphisms (SNPs). Most T2D risk prediction models combine SNPs with demographic variables. However, it is difficult to evaluate the pure additive contribution of genetic variants over classically used demographic models. Because prediction models include heritable traits such as body mass index, the contribution of SNPs estimated from unmatched case-control samples may be underestimated. In this article, we propose a method that uses propensity score matching of case and control samples to avoid this underestimation and thereby determine the pure additive contribution of SNPs. To illustrate the proposed method, we used SNP data from the Korea Association Resource project together with SNPs reported in the genome-wide association study catalog. We selected various SNP sets via stepwise logistic regression (SLR), the least absolute shrinkage and selection operator (LASSO), and the elastic-net (EN) algorithm. Using these SNP sets, we built prediction models with SLR, LASSO, and EN as logistic regression modeling techniques. Prediction accuracy was compared in terms of the area under the receiver operating characteristic curve (AUC). The contribution of SNPs to T2D prediction was evaluated as the difference in AUC between models using only demographic variables and models that also included the SNPs. In the largest difference among our models, the AUC of the model using genetic variants with demographic variables was 0.107 higher than that of the corresponding model using only demographic variables.
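
The evaluation idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that a propensity score (here taken to be the estimated probability of being a case given demographic variables only) and the predicted risks from a demographic-only model and a demographic-plus-SNP model have already been computed, for example by logistic regression. The function names `match_pairs`, `auc`, and `delta_auc`, the greedy 1:1 nearest-neighbour rule, the caliper of 0.05, and all numbers are illustrative assumptions rather than details taken from the study.

```python
def match_pairs(ps, is_case, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching on the propensity score.

    Returns (case_index, control_index) pairs. Each control is used at most
    once; cases with no unused control within `caliper` are left unmatched.
    """
    controls = [j for j, c in enumerate(is_case) if c == 0]
    used = set()
    pairs = []
    for i, c in enumerate(is_case):
        if c != 1:
            continue
        candidates = [j for j in controls
                      if j not in used and abs(ps[i] - ps[j]) <= caliper]
        if candidates:
            best = min(candidates, key=lambda j: abs(ps[i] - ps[j]))
            used.add(best)
            pairs.append((i, best))
    return pairs


def auc(risk, is_case):
    """AUC as the fraction of (case, control) pairs in which the case has the
    higher predicted risk; ties count as one half."""
    pos = [r for r, c in zip(risk, is_case) if c == 1]
    neg = [r for r, c in zip(risk, is_case) if c == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def delta_auc(risk_demo, risk_full, is_case, keep):
    """AUC of the demographic+SNP model minus AUC of the demographic-only
    model, both evaluated on the matched samples listed in `keep`."""
    y = [is_case[k] for k in keep]
    return (auc([risk_full[k] for k in keep], y)
            - auc([risk_demo[k] for k in keep], y))


# Toy values for illustration only (not data from the study).
is_case = [1, 0, 1, 0, 1]
ps = [0.30, 0.32, 0.70, 0.71, 0.95]          # hypothetical propensity scores
risk_demo = [0.40, 0.45, 0.60, 0.55, 0.90]   # hypothetical demographic-only risks
risk_full = [0.55, 0.40, 0.70, 0.60, 0.95]   # hypothetical demographic+SNP risks

pairs = match_pairs(ps, is_case)
keep = [k for pair in pairs for k in pair]   # indices of matched samples
delta = delta_auc(risk_demo, risk_full, is_case, keep)
print(pairs, delta)
```

In this toy example the last case has no control within the caliper and is dropped, so the AUC difference is computed only on the two matched pairs. Because matched cases and controls have similar demographic profiles, the demographic-only model separates them poorly, and the gain in AUC can be read as the additional contribution of the SNPs.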
