A Review of Power and Sample Size Estimation in Genomewide Association Studies

  • Park, Ae-Kyung (박애경), Graduate School of Public Health, Seoul National University
  • Kim, Ho (김호), Graduate School of Public Health, Seoul National University
  • Published: 2007.03.31

Abstract

Power and sample size estimation is a crucial step in planning a genetic association study. A well-designed study maximizes the chance of achieving the ultimate goal, identifying candidate genes for disease susceptibility, while minimizing cost. Here we review the optimal two-stage genotyping designs for genomewide association studies recently investigated by Wang et al. (2006). We also review the two mathematical frameworks most commonly used to compute power in advance of a genetic association study: Monte Carlo simulation and the non-central chi-square approximation. Statistical power is computed by both approaches for case-control genotypic tests under a one-stage direct association study design. We then discuss how the strength of linkage disequilibrium affects power and sample size, and how empirically derived distributions of important parameters can be used in power calculations. Finally, we provide information on publicly available software for computing power and sample size under various study designs.
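As a rough illustration of the two frameworks named in the abstract, the sketch below estimates the power of a 2-df case-control genotypic test at a directly typed locus in both ways. It is a minimal sketch and not the authors' implementation. The disease model (Hardy-Weinberg proportions, a fixed prevalence, genotype relative risks) and every function name and parameter value are assumptions chosen for illustration. Only the Python standard library is used, so the non-central chi-square tail is evaluated as a Poisson mixture of central chi-square tails, which has a closed form when the degrees of freedom are even.

```python
import math
import random


def genotype_freqs(p, rr_het, rr_hom, prevalence):
    """Genotype frequencies (aa, Aa, AA) in cases and in controls.

    Assumes Hardy-Weinberg proportions in the population and a
    relative-risk disease model (illustrative assumptions).
    """
    q = 1.0 - p
    pop = [q * q, 2 * p * q, p * p]
    rel = [1.0, rr_het, rr_hom]
    base = prevalence / sum(f * r for f, r in zip(pop, rel))  # baseline penetrance
    pen = [base * r for r in rel]
    cases = [f * w / prevalence for f, w in zip(pop, pen)]
    controls = [f * (1 - w) / (1 - prevalence) for f, w in zip(pop, pen)]
    return cases, controls


def noncentrality(cases, controls, n_cases, n_controls):
    """Pearson chi-square statistic evaluated at the expected counts."""
    total = n_cases + n_controls
    lam = 0.0
    for pc, pk in zip(cases, controls):
        pooled = (n_cases * pc + n_controls * pk) / total
        lam += n_cases * (pc - pooled) ** 2 / pooled
        lam += n_controls * (pk - pooled) ** 2 / pooled
    return lam


def power_noncentral(lam, alpha, terms=500):
    """P(noncentral chi-square(2 df, lam) > critical value)."""
    half = -math.log(alpha)          # critical value / 2; exact for 2 df
    weight = math.exp(-lam / 2.0)    # Poisson(lam / 2) probability of j = 0
    term, partial, power = 1.0, 1.0, 0.0
    for j in range(terms):
        # Tail of a central chi-square with 2 + 2j df at the critical value.
        power += weight * math.exp(-half) * partial
        weight *= (lam / 2.0) / (j + 1)
        term *= half / (j + 1)
        partial += term
    return power


def pearson_stat(case_counts, control_counts):
    """Pearson chi-square for a 2 x 3 table; empty columns are skipped."""
    n1, n0 = sum(case_counts), sum(control_counts)
    stat = 0.0
    for c, k in zip(case_counts, control_counts):
        col = c + k
        if col == 0:
            continue
        e1 = n1 * col / (n1 + n0)
        e0 = n0 * col / (n1 + n0)
        stat += (c - e1) ** 2 / e1 + (k - e0) ** 2 / e0
    return stat


def sample_counts(rng, probs, n):
    """Draw n genotypes and return the counts per genotype class."""
    counts = [0, 0, 0]
    for g in rng.choices((0, 1, 2), weights=probs, k=n):
        counts[g] += 1
    return counts


def power_monte_carlo(cases, controls, n_cases, n_controls, alpha, reps, seed=0):
    """Fraction of simulated studies whose statistic exceeds the cutoff."""
    rng = random.Random(seed)
    crit = -2.0 * math.log(alpha)
    hits = 0
    for _ in range(reps):
        stat = pearson_stat(sample_counts(rng, cases, n_cases),
                            sample_counts(rng, controls, n_controls))
        hits += stat > crit
    return hits / reps


if __name__ == "__main__":
    # Hypothetical design: risk allele frequency 0.2, multiplicative risks.
    cases, controls = genotype_freqs(p=0.2, rr_het=1.5, rr_hom=2.25, prevalence=0.05)
    lam = noncentrality(cases, controls, 300, 300)
    print("non-central chi-square power:", power_noncentral(lam, 0.05))
    print("Monte Carlo power:", power_monte_carlo(cases, controls, 300, 300, 0.05, 2000))
```

The two estimates should agree closely when the expected cell counts are not small. The simulation is slower, but it is easier to adapt to designs for which no convenient asymptotic distribution is available.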

References

  1. Thomas DC, Haile RW, Duggan D. Recent developments in genomewide association scans: a workshop summary and review. Am J Hum Genet 2005; 77(3): 337-345 https://doi.org/10.1086/432962
  2. Wang H, Thomas DC, Pe'er I, Stram DO. Optimal two-stage genotyping designs for genome-wide association scans. Genet Epidemiol 2006; 30(4): 356-368 https://doi.org/10.1002/gepi.20150
  3. Satagopan JM, Elston RC. Optimal two-stage genotyping in population-based association studies. Genet Epidemiol 2003; 25(2): 149-157 https://doi.org/10.1002/gepi.10260
  4. Guedj M, Della-Chiesa E, Picard F, Nuel G. Computing power in case-control association studies through the use of quadratic approximations: Application to meta-statistics. Ann Hum Genet 2007; 71(Pt 2): 262-270 https://doi.org/10.1111/j.1469-1809.2006.00316.x
  5. Pritchard JK, Przeworski M. Linkage disequilibrium in humans: models and data. Am J Hum Genet 2001; 69(1): 1-14 https://doi.org/10.1086/321275
  6. Kruglyak L. Prospects for whole-genome linkage disequilibrium mapping of common disease genes. Nat Genet 1999; 22(2): 139-144 https://doi.org/10.1038/9642
  7. Schork NJ. Power calculations for genetic association studies using estimated probability distributions. Am J Hum Genet 2002; 70(6): 1480-1489 https://doi.org/10.1086/340788
  8. Ambrosius WT, Lange EM, Langefeld CD. Power for genetic association studies with random allele frequencies and genotype distributions. Am J Hum Genet 2004; 74(4): 683-693 https://doi.org/10.1086/383282
  9. Purcell S, Cherny SS, Sham PC. Genetic power calculator: Design of linkage and association genetic mapping studies of complex traits. Bioinformatics 2003; 19(1): 149-150 https://doi.org/10.1093/bioinformatics/19.1.149
  10. Sham PC, Cherny SS, Purcell S, Hewitt JK. Power of linkage versus association analysis of quantitative traits, by use of variance-components models, for sibship data. Am J Hum Genet 2000; 66(5): 1616-1630 https://doi.org/10.1086/302891
  11. Gordon D, Finch SJ, Nothnagel M, Ott J. Power and sample size calculations for case-control genetic association tests when errors are present: Application to single nucleotide polymorphisms. Hum Hered 2002; 54(1): 22-33 https://doi.org/10.1159/000066696
  12. Edwards BJ, Haynes C, Levenstien MA, Finch SJ, Gordon D. Power and sample size calculations in the presence of phenotype errors for case/control genetic association studies. BMC Genet 2005; 6(1): 18 https://doi.org/10.1186/1471-2156-6-18
  13. Gordon D, Haynes C, Blumenfeld J, Finch SJ. PAWE-3D: Visualizing power for association with error in case-control genetic studies of complex traits. Bioinformatics 2005; 21(20): 3935-3937 https://doi.org/10.1093/bioinformatics/bti643
  14. Skol AD, Scott LJ, Abecasis GR, Boehnke M. Joint analysis is more efficient than replication-based analysis for two-stage genome-wide association studies. Nat Genet 2006; 38(2): 209-213 https://doi.org/10.1038/ng1706
  15. Kang D, Lee KM. Current status of genomic epidemiology research. Korean J Prev Med 2003; 36(3): 213-222
  16. Park S. Statistical issues in genomic cohort studies. J Prev Med Public Health (Korean) (in press) https://doi.org/10.3961/jpmph.2007.40.2.108