A Prediction Model for Complex Diseases using Set Association & Artificial Neural Network

  • Published: 2008.08.29

Abstract

Complex diseases are caused by interactions among multiple genes, and this complexity limits the power of traditional statistical methods to predict their onset. New approaches based on machine learning techniques have recently been proposed. Neural networks are well suited to finding and classifying patterns in such complex data. When a large amount of data is fed into a neural network, however, training takes a long time and patterns become harder to find. In this study we propose a model that combines a neural network with set association, a statistical technique for identifying a small number of important disease-associated SNPs from a large set of SNP data. We applied the model to asthma-related SNP data to predict the presence of asthma; it achieved higher prediction accuracy and shorter execution time than a neural network alone. We expect the model to be effective for predicting other complex diseases as well.
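The SNP-reduction step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the allele-count chi-square statistic, the function names (`snp_statistic`, `ranked_sums`, `set_association`), the parameter defaults, and the toy data are all assumptions made for illustration. The sketch ranks SNPs by their association statistic, sums the top n statistics, estimates a permutation p-value for each n, and keeps the n with the smallest p-value. It omits refinements of the published set association procedure, such as weighting the statistics and testing the overall significance of the selected set.

```python
import random

def snp_statistic(genotypes, labels):
    """Chi-square statistic of a 2x2 allele-count table (cases vs. controls).

    genotypes: minor-allele counts (0, 1, or 2), one per individual.
    labels: 1 for a case, 0 for a control.
    """
    case_minor = sum(g for g, y in zip(genotypes, labels) if y == 1)
    ctrl_minor = sum(g for g, y in zip(genotypes, labels) if y == 0)
    n_case = 2 * sum(labels)                    # alleles carried by cases
    n_ctrl = 2 * (len(labels) - sum(labels))    # alleles carried by controls
    table = [[case_minor, n_case - case_minor],
             [ctrl_minor, n_ctrl - ctrl_minor]]
    total = n_case + n_ctrl
    chi2 = 0.0
    for i, row_total in enumerate((n_case, n_ctrl)):
        for j in range(2):
            expected = row_total * (table[0][j] + table[1][j]) / total
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2

def ranked_sums(snps, labels, max_n):
    """Rank SNPs by statistic; return the ranking and the sums of the top 1..max_n."""
    stats = [snp_statistic(column, labels) for column in snps]
    order = sorted(range(len(snps)), key=lambda k: stats[k], reverse=True)
    sums, running = [], 0.0
    for k in order[:max_n]:
        running += stats[k]
        sums.append(running)
    return order, sums

def set_association(snps, labels, max_n=5, n_perm=200, seed=0):
    """Select the top-n SNPs whose summed statistic has the smallest permutation p-value."""
    rng = random.Random(seed)
    order, observed = ranked_sums(snps, labels, max_n)
    exceed = [0] * len(observed)
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)                   # break any genotype-phenotype link
        _, permuted = ranked_sums(snps, shuffled, max_n)
        for n, (obs, perm) in enumerate(zip(observed, permuted)):
            if perm >= obs:
                exceed[n] += 1
    p_values = [(count + 1) / (n_perm + 1) for count in exceed]
    best_n = min(range(len(p_values)), key=lambda n: p_values[n]) + 1
    return order[:best_n], p_values

# Toy data (hypothetical): 20 cases followed by 20 controls.
# SNP 0 is perfectly associated with case status; SNPs 1-4 are noise.
labels = [1] * 20 + [0] * 20
snps = [[2] * 20 + [0] * 20] + [[(i + j) % 3 for i in range(40)] for j in range(1, 5)]

selected, p_values = set_association(snps, labels, max_n=3)

# One row per individual, restricted to the selected SNPs: the reduced network input.
reduced = [[snps[k][i] for k in selected] for i in range(len(labels))]
```

In a pipeline of the kind the abstract describes, `reduced` rather than the full genotype matrix would be passed to the neural network, which is where the shorter training time is expected to come from.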
