
A Bayesian zero-inflated negative binomial regression model based on Pólya-Gamma latent variables with an application to pharmaceutical data

  • Seo, Gi Tae (Department of Applied Statistics, Chung-Ang University)
  • Hwang, Beom Seuk (Department of Applied Statistics, Chung-Ang University)
  • Received: 2021.12.23
  • Reviewed: 2022.01.22
  • Published: 2022.04.30

Abstract

Count data with excess zeros arise frequently in many research fields, and the zero-inflated model is the most common choice for analyzing such data. Traditional Bayesian inference for the zero-inflated model has long been recognized as difficult because the conditional posterior distributions are not available in closed form, which makes model fitting cumbersome. Recently, however, the Pólya-Gamma data-augmentation strategy proposed by Pillow and Scott (2012) and Polson et al. (2013) has made Gibbs sampling possible for logistic and negative binomial regression models, which in turn facilitates Bayesian inference for the zero-inflated model. In this paper, we apply a Bayesian zero-inflated negative binomial regression model to the pharmaceutical study data previously analyzed by Min and Agresti (2005). These data are longitudinal zero-inflated counts with a complex structure. To carry out posterior inference by Gibbs sampling for this longitudinal zero-inflated model, we use the Pólya-Gamma data-augmentation strategy.
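The key computational point above is that Pólya-Gamma augmentation turns the logistic and negative binomial coefficient updates into Gaussian draws. The following is a minimal Python sketch of that idea under stated assumptions, not the authors' implementation. It uses only the standard library; it approximates PG(b, c) draws by truncating the infinite sum-of-gammas representation given in Polson et al. (2013) instead of using their exact sampler; it assumes a N(0, I/prior_prec) prior on the coefficients; and it omits the latent at-risk indicators and longitudinal random effects that a full zero-inflated negative binomial sampler would need. The names `sample_pg`, `pg_gibbs_step`, and `demo_logistic`, the simulated data, and the true coefficients are hypothetical.

```python
import math
import random


def sample_pg(b, c, rng, n_terms=50):
    """Approximate PG(b, c) draw via a truncated sum-of-gammas representation.

    omega = 1/(2 pi^2) * sum_{k>=1} g_k / ((k - 1/2)^2 + c^2 / (4 pi^2)),
    g_k ~ Gamma(b, 1). Truncating after n_terms slightly underestimates
    omega, so this is an approximation, not an exact PG sampler.
    """
    shift = c * c / (4.0 * math.pi ** 2)
    total = sum(rng.gammavariate(b, 1.0) / ((k - 0.5) ** 2 + shift)
                for k in range(1, n_terms + 1))
    return total / (2.0 * math.pi ** 2)


def cholesky(a):
    """Lower-triangular L with L L^T = a (a symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def forward_solve(L, h):
    """Solve L y = h for lower-triangular L."""
    y = []
    for i in range(len(h)):
        y.append((h[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def back_solve(L, y):
    """Solve L^T x = y for lower-triangular L."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def pg_gibbs_step(X, b, kappa, beta, prior_prec, rng):
    """One Gibbs sweep of a PG-augmented model, prior beta ~ N(0, I / prior_prec).

    Logistic regression: b_i = 1, kappa_i = y_i - 1/2.
    Negative binomial with fixed r (log-odds linear predictor):
        b_i = y_i + r, kappa_i = (y_i - r) / 2.
    """
    n, p = len(X), len(X[0])
    # Step 1: omega_i | beta ~ PG(b_i, x_i^T beta)
    omega = [sample_pg(b[i], sum(xj * bj for xj, bj in zip(X[i], beta)), rng)
             for i in range(n)]
    # Step 2: beta | omega ~ N(P^{-1} X^T kappa, P^{-1}), P = X^T Omega X + prior_prec I
    P = [[sum(omega[i] * X[i][r] * X[i][s] for i in range(n))
          + (prior_prec if r == s else 0.0) for s in range(p)] for r in range(p)]
    h = [sum(X[i][r] * kappa[i] for i in range(n)) for r in range(p)]
    L = cholesky(P)
    mean = back_solve(L, forward_solve(L, h))
    # L^{-T} z has covariance (L L^T)^{-1} = P^{-1}
    noise = back_solve(L, [rng.gauss(0.0, 1.0) for _ in range(p)])
    return [m + e for m, e in zip(mean, noise)]


def demo_logistic(n=200, iters=200, burn=50, seed=1):
    """Posterior means on simulated logistic data, hypothetical beta = (-0.5, 1.0)."""
    rng = random.Random(seed)
    true_beta = [-0.5, 1.0]
    X = [[1.0, rng.gauss(0.0, 1.0)] for _ in range(n)]
    y = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(true_beta[0] + true_beta[1] * x[1])))
         else 0 for x in X]
    b = [1.0] * n
    kappa = [yi - 0.5 for yi in y]
    beta, draws = [0.0, 0.0], []
    for t in range(iters):
        beta = pg_gibbs_step(X, b, kappa, beta, prior_prec=0.01, rng=rng)
        if t >= burn:
            draws.append(beta)
    return [sum(d[j] for d in draws) / len(draws) for j in range(2)]
```

Calling `demo_logistic()` returns posterior means of the intercept and slope, which should lie near the hypothetical values (-0.5, 1.0) used to simulate the data; the exact numbers depend on the seed. Passing `b_i = y_i + r` and `kappa_i = (y_i - r)/2` instead reuses the same sweep for a negative binomial count model with fixed dispersion r, which is why one augmentation scheme can serve both parts of a zero-inflated negative binomial model.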

Acknowledgement

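This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Korean government (Ministry of Science and ICT) in 2019 (NRF-2019R1C1C1011710), and by the Korea Institute of Energy Technology Evaluation and Planning (KETEP) and the Ministry of Trade, Industry and Energy (MOTIE) (No. 20199710100060).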

References

  1. Banerjee S, Carlin BP, and Gelfand AE (2014). Hierarchical Modeling and Analysis for Spatial Data, Chapman & Hall/CRC, Boca Raton, FL.
  2. Böhning D (1998). Zero-inflated Poisson models and C.A.MAN: A tutorial collection of evidence, Biometrical Journal, 40, 833-843. https://doi.org/10.1002/(SICI)1521-4036(199811)40:7<833::AID-BIMJ833>3.0.CO;2-O
  3. Casella G and George EI (1992). Explaining the Gibbs sampler, The American Statistician, 46, 167-174. https://doi.org/10.2307/2685208
  4. Chen MH, Dey DK, and Shao QM (1999). A new skewed link model for dichotomous quantal response data, Journal of the American Statistical Association, 94, 1172-1186. https://doi.org/10.1080/01621459.1999.10473872
  5. Chib S and Greenberg E (1995). Understanding the Metropolis-Hastings algorithm, The American Statistician, 49, 327-335. https://doi.org/10.2307/2684568
  6. Fahrmeir L and Osuna EL (2006). Structured additive regression for overdispersed and zero-inflated count data, Applied Stochastic Models in Business and Industry, 22, 351-369. https://doi.org/10.1002/asmb.631
  7. Gelman A, Meng XL, and Stern H (1996). Posterior predictive assessment of model fitness via realized discrepancies, Statistica Sinica, 6, 733-807.
  8. Geweke J (1992). Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments, Bayesian Statistics, 4, 641-649.
  9. Ghosh SK, Mukhopadhyay P, and Lu JC (2006). Bayesian analysis of zero-inflated regression models, Journal of Statistical Planning and Inference, 136, 1360-1375. https://doi.org/10.1016/j.jspi.2004.10.008
  10. Gilks WR and Wild P (1992). Adaptive rejection sampling for Gibbs sampling, Journal of the Royal Statistical Society: Series C (Applied Statistics), 41, 337-348. https://doi.org/10.2307/2347565
  11. Gschlößl S and Czado C (2008). Modelling count data with overdispersion and spatial effects, Statistical Papers, 49, 531-552. https://doi.org/10.1007/s00362-006-0031-6
  12. Lambert D (1992). Zero-inflated Poisson regression, with an application to defects in manufacturing, Technometrics, 34, 1-14. https://doi.org/10.1080/00401706.1992.10485228
  13. Min Y and Agresti A (2005). Random effect models for repeated measures of zero-inflated count data, Statistical Modelling, 5, 1-19. https://doi.org/10.1191/1471082X05st084oa
  14. Neelon B (2019). Bayesian zero-inflated negative binomial regression based on Pólya-Gamma mixtures, Bayesian Analysis, 14, 829-855. https://doi.org/10.1214/18-BA1132
  15. Pillow J and Scott J (2012). Fully Bayesian inference for neural models with negative-binomial spiking, Advances in Neural Information Processing Systems, 25, 1898-1906.
  16. Polson NG, Scott JG, and Windle J (2013). Bayesian inference for logistic models using Pólya-Gamma latent variables, Journal of the American Statistical Association, 108, 1339-1349. https://doi.org/10.1080/01621459.2013.829001
  17. Rodrigues J (2003). Bayesian analysis of zero-inflated distributions, Communications in Statistics - Theory and Methods, 32, 281-289. https://doi.org/10.1081/STA-120018186
  18. Rubin DB (1984). Bayesianly justifiable and relevant frequency calculations for the applied statistician, The Annals of Statistics, 12, 1151-1172. https://doi.org/10.1214/aos/1176346785
  19. Yau KK, Wang K, and Lee AH (2003). Zero-inflated negative binomial mixed regression modeling of overdispersed count data with extra zeros, Biometrical Journal, 45, 437-452. https://doi.org/10.1002/bimj.200390024
  20. Zhou M and Carin L (2015). Negative binomial process count and mixture modeling, IEEE Transactions on Pattern Analysis and Machine Intelligence, 37, 307-320. https://doi.org/10.1109/TPAMI.2013.211