Study on Effects of Population Stratification on Haplotype Trend Test in Case-Control Studies

  • Kim, Jin-Heum (Department of Applied Statistics, University of Suwon) ;
  • Kang, Dae-Ryong (Clinical Trials Center, Severance Hospital, Yonsei University) ;
  • Lim, Hyun-Sun (Department of Biostatistics, Yonsei University College of Medicine) ;
  • Nam, Chung-Mo (Department of Preventive Medicine, Yonsei University College of Medicine)
  • Published : 2009.10.31

Abstract

In case-control association studies, population stratification can produce spurious associations between a candidate gene and a disease even when the two are unrelated. To address this problem in haplotype-based case-control association studies, we add population indicators as covariates to the haplotype trend regression model proposed by Zaykin et al. (2002). Through simulations, we examine how population stratification and measurement error in each individual's population membership affect the type I error rates of the haplotype significance tests based on Zaykin et al.'s (2002) model and on the proposed model. When the population is stratified but each individual's population is known exactly, the test based on Zaykin et al.'s (2002) model fails to control the type I error rate, whereas the test based on the proposed model controls it well. When the population is stratified and population membership is measured with error, however, the test based on the proposed model also fails: its type I error rate exceeds the nominal significance level, increasingly so as the measurement error grows. Thus, as in case-control association studies based on single nucleotide polymorphisms, false positives are difficult to avoid in haplotype-based case-control association studies when information on population stratification is available but contains measurement error.
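The comparison summarized in the abstract can be illustrated with a small simulation. The sketch below is not the authors' simulation design; it is a simplified stand-in under stated assumptions: a single haplotype whose dosage (0, 1, or 2 copies) is observed without phase ambiguity, two subpopulations with hypothetical haplotype frequencies of 0.2 and 0.6, cases drawn mostly from the first subpopulation and controls mostly from the second, and no true haplotype effect on disease. The trend test is approximated by an ordinary least squares regression of case status on dosage, with the squared t-statistic compared against the chi-square critical value with one degree of freedom (a large-sample approximation). All function names and parameter values are illustrative.

```python
import numpy as np

CHI2_1DF_CRIT_05 = 3.841459  # upper 5% point of the chi-square distribution, 1 df


def wald_stat(y, X):
    """Squared t-statistic for the last column of X in an OLS fit of y on X."""
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[-1] ** 2 / cov[-1, -1]


def simulate_once(rng, n_per_group=300, freqs=(0.2, 0.6),
                  pop1_share=(0.7, 0.3), misclass=0.0):
    """Generate one null data set; return (unadjusted, adjusted) statistics.

    freqs      : assumed haplotype frequency in subpopulations 1 and 2
    pop1_share : assumed share of subpopulation 1 among cases and controls
    misclass   : probability that an individual's population label is wrong
    """
    y = np.repeat([1.0, 0.0], n_per_group)           # cases first, then controls
    share = np.where(y == 1.0, pop1_share[0], pop1_share[1])
    pop = (rng.random(y.size) >= share).astype(int)  # 0 = subpop. 1, 1 = subpop. 2
    # Dosage depends only on subpopulation, never on disease status (null model).
    dosage = rng.binomial(2, np.asarray(freqs)[pop]).astype(float)
    flip = rng.random(y.size) < misclass
    observed_pop = np.where(flip, 1 - pop, pop).astype(float)
    ones = np.ones_like(y)
    unadjusted = wald_stat(y, np.column_stack([ones, dosage]))
    adjusted = wald_stat(y, np.column_stack([ones, observed_pop, dosage]))
    return unadjusted, adjusted


def type1_error(n_rep=500, seed=0, **kwargs):
    """Empirical rejection rates at the 5% level: (unadjusted, adjusted)."""
    rng = np.random.default_rng(seed)
    stats = np.array([simulate_once(rng, **kwargs) for _ in range(n_rep)])
    return (stats > CHI2_1DF_CRIT_05).mean(axis=0)


if __name__ == "__main__":
    for m in (0.0, 0.1, 0.2):
        unadj, adj = type1_error(misclass=m)
        print(f"misclassification={m:.1f}  unadjusted={unadj:.3f}  adjusted={adj:.3f}")
```

Under these assumed settings, one would expect the unadjusted rejection rate to sit far above 0.05, the adjusted rate to stay near 0.05 when `misclass=0.0`, and the adjusted rate to rise as `misclass` increases, which mirrors the qualitative pattern the abstract reports.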

References

  1. Armitage, P. (1955). Tests for linear trends in proportions and frequencies, Biometrics, 11, 375-386 https://doi.org/10.2307/3001775
  2. Devlin, B. and Roeder, K. (1999). Genomic control for association studies, Biometrics, 55, 997-1004 https://doi.org/10.1111/j.0006-341X.1999.00997.x
  3. Epstein, M. P. and Satten, G. A. (2003). Inference on haplotype effects in case-control studies using unphased genotype data, The American Journal of Human Genetics, 73, 1316-1329 https://doi.org/10.1086/380204
  4. Excoffier, L. and Slatkin, M. (1995). Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population, Molecular Biology and Evolution, 12, 921-927
  5. Fallin, D., Cohen, A., Essioux, L., Chumakov, I., Blumenfeld, M., Cohen, D. and Schork, N. J. (2001). Genetic analysis of case/control data using estimated haplotype frequencies: Application to APOE locus variation and Alzheimer's disease, Genome Research, 11, 143-151 https://doi.org/10.1101/gr.148401
  6. Hoggart, C. J., Parra, E. J., Shriver, M. D., Bonilla, C., Kittles, R. A., Clayton, D. G. and McKeigue, P. M. (2003). Control of confounding of genetic associations in stratified populations, The American Journal of Human Genetics, 72, 1492-1504 https://doi.org/10.1086/375613
  7. Jorde, L. B. (1995). Linkage disequilibrium as a gene-mapping tool, The American Journal of Human Genetics, 56, 11-14
  8. Keavney, B. (2002). Genetic epidemiological studies of coronary heart disease, International Journal of Epidemiology, 31, 730-736 https://doi.org/10.1093/ije/31.4.730
  9. Kim, J., Kang, D. R., Lee, Y. K., Shin, S. M., Suh, I. and Nam, C. M. (2004). Statistical algorithm in genetic linkage based on haplotypes, Journal of Preventive Medicine and Public Health, 37, 366-372
  10. Long, J. C., Williams, R. C. and Urbanek, M. (1995). An E-M algorithm and testing strategy for multiple-locus haplotypes, The American Journal of Human Genetics, 56, 799-810
  11. Nielsen, D. M. and Weir, B. S. (1999). A classical setting for associations between markers and loci affecting quantitative traits, Genetical Research, 74, 271-277 https://doi.org/10.1017/S0016672399004231
  12. Pritchard, J. K., Stephens, M. and Donnelly, P. (2000a). Inference of population structure using multilocus genotype data, Genetics, 155, 945-959
  13. Pritchard, J. K., Stephens, M., Rosenberg, N. A. and Donnelly, P. (2000b). Association mapping in structured populations, The American Journal of Human Genetics, 67, 170-181 https://doi.org/10.1086/302959
  14. SAS Institute. (2002). SAS/Genetics User's Guide, SAS Institute, Cary
  15. Sasieni, P. D. (1997). From genotypes to genes: Doubling the sample size, Biometrics, 53, 1253-1261 https://doi.org/10.2307/2533494
  16. Satten, G. A., Flanders, W. D. and Yang, Q. (2001). Accounting for unmeasured population substructure in case-control studies of genetic association using a novel latent-class model, The American Journal of Human Genetics, 68, 466-477 https://doi.org/10.1086/318195
  17. Schaid, D. J., Rowland, C. M., Tines, D. E., Jacobson, R. M. and Poland, G. A. (2002). Score tests for association between traits and haplotypes when linkage phase is ambiguous, The American Journal of Human Genetics, 70, 425-434 https://doi.org/10.1086/338688
  18. Setakis, E., Stirnadel, H. and Balding, D. J. (2006). Logistic regression protects against population structure in genetic association studies, Genome Research, 16, 290-296 https://doi.org/10.1101/gr.4346306
  19. Tanck, M. W. T., Klerkx, A. H. E. M., Jukema, J. W., De Knijff, P., Kastelein, J. J. P. and Zwinderman, A. H. (2003). Estimation of multilocus haplotype effects using weighted penalised log-likelihood: Analysis of five sequence variations at the cholesteryl ester transfer protein gene locus, Annals of Human Genetics, 67, 175-184 https://doi.org/10.1046/j.1469-1809.2003.00021.x
  20. Terwilliger, J. and Ott, J. (1994). Handbook of Human Genetic Linkage, Johns Hopkins University Press, Baltimore
  21. Zaykin, D. V., Westfall, P. H., Young, S. S., Karnoub, M. A., Wagner, M. J. and Ehm, M. G. (2002). Testing association of statistically inferred haplotypes with discrete and continuous traits in samples of unrelated individuals, Human Heredity, 53, 79-91 https://doi.org/10.1159/000057986
  22. Zhao, J. H., Curtis, D. and Sham, P. C. (2000). Model-free analysis and permutation tests for allelic associations, Human Heredity, 50, 133-139 https://doi.org/10.1159/000022901
  23. Zhu, X., Zhang, S., Zhao, H. and Cooper, R. S. (2002). Association mapping, using a mixture model for complex traits, Genetic Epidemiology, 23, 181-196 https://doi.org/10.1002/gepi.210