Nonparametric Bayesian Statistical Models in Biomedical Research

Nonparametric Bayesian Statistical Models for Biological, Health, and Medical Research

  • Noh, Heesang (노희상), Department of Mathematical Sciences, Korea Advanced Institute of Science and Technology
  • Park, Jinsu (박진수), Department of Mathematical Sciences, Korea Advanced Institute of Science and Technology
  • Sim, Gyuseok (심규석), Department of Mathematical Sciences, Korea Advanced Institute of Science and Technology
  • Yu, Jae-Eun (유재은), Department of Mathematical Sciences, Korea Advanced Institute of Science and Technology
  • Chung, Yeonseung (정연승), Department of Mathematical Sciences, Korea Advanced Institute of Science and Technology
  • Received : 2014.10.27
  • Accepted : 2014.12.01
  • Published : 2014.12.31

Abstract

Nonparametric Bayesian (np Bayes) statistical models are widely used across research areas because of their flexibility and computational convenience. This paper reviews np Bayes models with a focus on biomedical research applications. We review the key probability models for np Bayes inference and illustrate, with biomedical examples, how each model is used to answer different types of research questions. The examples are chosen to highlight problems that are challenging for standard parametric inference but can be addressed with nonparametric inference. We discuss np Bayes inference under four topics: (1) density estimation, (2) clustering, (3) random-effects distributions, and (4) regression.

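Nonparametric Bayesian statistical models have recently been applied in many fields owing to their flexibility and computational convenience. This paper gives an overview of the nonparametric Bayesian statistical models used in biological, medical, and health research. We introduce the probability models that are central to nonparametric Bayesian modeling and use a range of examples to show how these models are applied. In particular, the examples were chosen because they involve research hypotheses that are difficult to examine with parametric statistical models, so that they point out the limitations of parametric models and underline the need for nonparametric Bayesian models. The review is organized into four topics: probability density estimation, cluster analysis, estimation of random-effects distributions, and regression analysis.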
