Selection probability of multivariate regularization to identify pleiotropic variants in genetic association studies

  • Kim, Kipoong (Department of Statistics, Pusan National University)
  • Sun, Hokeun (Department of Statistics, Pusan National University)
  • Received : 2020.04.26
  • Accepted : 2020.07.12
  • Published : 2020.09.30

Abstract

In genetic association studies, pleiotropy is the phenomenon in which a variant or a genetic region affects multiple traits or diseases. Many studies have identified cross-phenotype genetic associations. However, most statistical approaches for detecting pleiotropy are based on individual tests, in which the association of a single variant with multiple traits is tested one variant at a time. These approaches fail to account for relations among correlated variants. Recently, multivariate regularization methods have been proposed to detect pleiotropy in the analysis of high-dimensional genomic data. However, they suffer from the problem of tuning parameter selection, which often results in either too many false positives or too few true positives. In this article, we applied selection probability to multivariate regularization methods in order to identify pleiotropic variants associated with multiple phenotypes. Selection probability was applied to the individual elastic-net, unified elastic-net, and multi-response elastic-net regularization methods. In simulation studies, the selection performance of the three multivariate regularization methods was evaluated under varying total numbers of phenotypes, numbers of phenotypes associated with a variant, and correlations among phenotypes. We also applied the regularization methods to a wild bean dataset consisting of 169,028 variants and 17 phenotypes.
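To make the idea of selection probability concrete, the following is a minimal sketch rather than the procedure used in the article. It assumes the common subsampling recipe: repeatedly draw a random half of the samples, fit an elastic-net at a fixed tuning parameter, and record how often each variant receives a nonzero coefficient. Only the single-phenotype ("individual") elastic-net is shown; the unified and multi-response variants would replace the fitting step. The function names, the penalty parameterization (`lam`, `alpha`), the half-sample size, and the simulated toy data are all illustrative assumptions.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used by lasso-type coordinate descent."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net(X, y, lam, alpha, n_iter=50):
    """Elastic-net by cyclic coordinate descent (no intercept).

    Minimizes (1/2n)*||y - Xb||^2 + lam*(alpha*||b||_1 + (1-alpha)/2*||b||_2^2).
    X is a list of rows; columns are assumed to be roughly centered.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X*beta, with beta = 0 initially
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue
            # Covariance of column j with the partial residual (j excluded).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_ss[j] * beta[j]
            new = soft_threshold(rho, lam * alpha) / (col_ss[j] + lam * (1 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new
    return beta


def selection_probability(X, y, lam, alpha, n_subsamples=50, seed=0):
    """Fraction of random half-samples in which each variable is selected."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), n // 2)  # half-sample without replacement
        beta = elastic_net([X[i] for i in idx], [y[i] for i in idx], lam, alpha)
        for j in range(p):
            if beta[j] != 0.0:
                counts[j] += 1
    return [c / n_subsamples for c in counts]


# Simulated toy data (not from the article): only the first two of six
# predictors affect the response.
data_rng = random.Random(1)
n, p = 80, 6
X = [[data_rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [2.0 * row[0] - 2.0 * row[1] + data_rng.gauss(0.0, 1.0) for row in X]

probs = selection_probability(X, y, lam=0.5, alpha=0.9)
print([round(v, 2) for v in probs])
```

In practice the selection frequency is usually computed over a grid of tuning parameters and summarized per variant, for example by taking the maximum over the grid, so that variants are ranked without committing to a single tuning parameter value. The sketch fixes one `(lam, alpha)` pair only to stay short.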

Acknowledgement

This work was supported by a 2-Year Research Grant of Pusan National University.
