• Title/Summary/Keyword: zero-inflated count data

Search Result 31, Processing Time 0.025 seconds

Bayesian Analysis of a Zero-inflated Poisson Regression Model: An Application to Korean Oral Hygienic Data (영과잉 포아송 회귀모형에 대한 베이지안 추론: 구강위생 자료에의 적용)

  • Lim, Ah-Kyoung;Oh, Man-Suk
    • The Korean Journal of Applied Statistics
    • /
    • v.19 no.3
    • /
    • pp.505-519
    • /
    • 2006
  • We consider zero-inflated count data, which is discrete count data but has too many zeroes compared to the Poisson distribution. Zero-inflated data can be found in various areas. Despite its increasing importance in practice, appropriate statistical inference on zero-inflated data is limited. Classical inference based on a large number theory does not fit unless the sample size is very large. And regular Poisson model shows lack of St due to many zeroes. To handle the difficulties, a mixture of distributions are considered for the zero-inflated data. Specifically, a mixture of a point mass at zero and a Poisson distribution is employed for the data. In addition, when there exist meaningful covariates selected to the response variable, loglinear link is used between the mean of the response and the covariates in the Poisson distribution part. We propose a Bayesian inference for the zero-inflated Poisson regression model by using a Markov Chain Monte Carlo method. We applied the proposed method to a Korean oral hygienic data and compared the inference results with other models. We found that the proposed method is superior in that it gives small parameter estimation error and more accurate predictions.

A Bayesian joint model for continuous and zero-inflated count data in developmental toxicity studies

  • Hwang, Beom Seuk
    • Communications for Statistical Applications and Methods
    • /
    • v.29 no.2
    • /
    • pp.239-250
    • /
    • 2022
  • In many applications, we frequently encounter correlated multiple outcomes measured on the same subject. Joint modeling of such multiple outcomes can improve efficiency of inference compared to independent modeling. For instance, in developmental toxicity studies, fetal weight and number of malformed pups are measured on the pregnant dams exposed to different levels of a toxic substance, in which the association between such outcomes should be taken into account in the model. The number of malformations may possibly have many zeros, which should be analyzed via zero-inflated count models. Motivated by applications in developmental toxicity studies, we propose a Bayesian joint modeling framework for continuous and count outcomes with excess zeros. In our model, zero-inflated Poisson (ZIP) regression model would be used to describe count data, and a subject-specific random effects would account for the correlation across the two outcomes. We implement a Bayesian approach using MCMC procedure with data augmentation method and adaptive rejection sampling. We apply our proposed model to dose-response analysis in a developmental toxicity study to estimate the benchmark dose in a risk assessment.

A Zero-Inated Model for Insurance Data (제로팽창 모형을 이용한 보험데이터 분석)

  • Choi, Jong-Hoo;Ko, In-Mi;Cheon, Soo-Young
    • The Korean Journal of Applied Statistics
    • /
    • v.24 no.3
    • /
    • pp.485-494
    • /
    • 2011
  • When the observations can take only the non-negative integer values, it is called the count data such as the numbers of car accidents, earthquakes, or insurance coverage. In general, the Poisson regression model has been used to model these count data; however, this model has a weakness in that it is restricted by the equality of the mean and the variance. On the other hand, the count data often tend to be too dispersed to allow the use of the Poisson model in practice because the variance of data is significantly larger than its mean due to heterogeneity within groups. When overdispersion is not taken into account, it is expected that the resulting parameter estimates or standard errors will be inefficient. Since coverage is the main issue for insurance, some accidents may not be covered by insurance, and the number covered by insurance may be zero. This paper considers the zero-inflated model for the count data including many zeros. The performance of this model has been investigated by using of real data with overdispersion and many zeros. The results indicate that the Zero-Inflated Negative Binomial Regression Model performs the best for model evaluation.

An application to Multivariate Zero-Inflated Poisson Regression Model

  • Kim, Kyung-Moo
    • Journal of the Korean Data and Information Science Society
    • /
    • v.14 no.2
    • /
    • pp.177-186
    • /
    • 2003
  • The Zero-Inflated Poisson regression is a model for count data with exess zeros. When the correlated response variables are intrested, we have to extend the univariate zero-inflated regression model to multivariate model. In this paper, we study and simulate the multivariate zero-inflated regression model. A real example was applied to this model. Regression parameters are estimated by using MLE's. We also compare the fitness of multivariate zero-inflated Poisson regression model with the decision tree model.

  • PDF

A Bayesian zero-inflated negative binomial regression model based on Pólya-Gamma latent variables with an application to pharmaceutical data (폴랴-감마 잠재변수에 기반한 베이지안 영과잉 음이항 회귀모형: 약학 자료에의 응용)

  • Seo, Gi Tae;Hwang, Beom Seuk
    • The Korean Journal of Applied Statistics
    • /
    • v.35 no.2
    • /
    • pp.311-325
    • /
    • 2022
  • For count responses, the situation of excess zeros often occurs in various research fields. Zero-inflated model is a common choice for modeling such count data. Bayesian inference for the zero-inflated model has long been recognized as a hard problem because the form of conditional posterior distribution is not in closed form. Recently, however, Pillow and Scott (2012) and Polson et al. (2013) proposed a Pólya-Gamma data-augmentation strategy for logistic and negative binomial models, facilitating Bayesian inference for the zero-inflated model. We apply Bayesian zero-inflated negative binomial regression model to longitudinal pharmaceutical data which have been previously analyzed by Min and Agresti (2005). To facilitate posterior sampling for longitudinal zero-inflated model, we use the Pólya-Gamma data-augmentation strategy.

A Bayesian zero-inflated Poisson regression model with random effects with application to smoking behavior (랜덤효과를 포함한 영과잉 포아송 회귀모형에 대한 베이지안 추론: 흡연 자료에의 적용)

  • Kim, Yeon Kyoung;Hwang, Beom Seuk
    • The Korean Journal of Applied Statistics
    • /
    • v.31 no.2
    • /
    • pp.287-301
    • /
    • 2018
  • It is common to encounter count data with excess zeros in various research fields such as the social sciences, natural sciences, medical science or engineering. Such count data have been explained mainly by zero-inflated Poisson model and extended models. Zero-inflated count data are also often correlated or clustered, in which random effects should be taken into account in the model. Frequentist approaches have been commonly used to fit such data. However, a Bayesian approach has advantages of prior information, avoidance of asymptotic approximations and practical estimation of the functions of parameters. We consider a Bayesian zero-inflated Poisson regression model with random effects for correlated zero-inflated count data. We conducted simulation studies to check the performance of the proposed model. We also applied the proposed model to smoking behavior data from the Regional Health Survey (2015) of the Korea Centers for disease control and prevention.

Weighted zero-inflated Poisson mixed model with an application to Medicaid utilization data

  • Lee, Sang Mee;Karrison, Theodore;Nocon, Robert S.;Huang, Elbert
    • Communications for Statistical Applications and Methods
    • /
    • v.25 no.2
    • /
    • pp.173-184
    • /
    • 2018
  • In medical or public health research, it is common to encounter clustered or longitudinal count data that exhibit excess zeros. For example, health care utilization data often have a multi-modal distribution with excess zeroes as well as a multilevel structure where patients are nested within physicians and hospitals. To analyze this type of data, zero-inflated count models with mixed effects have been developed where a count response variable is assumed to be distributed as a mixture of a Poisson or negative binomial and a distribution with a point mass of zeros that include random effects. However, no study has considered a situation where data are also censored due to the finite nature of the observation period or follow-up. In this paper, we present a weighted version of zero-inflated Poisson model with random effects accounting for variable individual follow-up times. We suggested two different types of weight function. The performance of the proposed model is evaluated and compared to a standard zero-inflated mixed model through simulation studies. This approach is then applied to Medicaid data analysis.

An application to Zero-Inflated Poisson Regression Model

  • Kim, Kyung-Moo
    • Journal of the Korean Data and Information Science Society
    • /
    • v.14 no.1
    • /
    • pp.45-53
    • /
    • 2003
  • The Zero-Inflated Poisson regression is a model for count data with exess zeros. When the reponse variables have excess zeros, it is not easy to apply the Poisson regression model. In this paper, we study and simulate the zero-inflated Poisson regression model. An real example was applied to this model. Regression parameters are estimated by using MLE's. We also compare the fitness of zero-inflated Poisson model with the Poisson regression and decision tree model.

  • PDF

Sample size calculations for clustered count data based on zero-inflated discrete Weibull regression models

  • Hanna Yoo
    • Communications for Statistical Applications and Methods
    • /
    • v.31 no.1
    • /
    • pp.55-64
    • /
    • 2024
  • In this study, we consider the sample size determination problem for clustered count data with many zeros. In general, zero-inflated Poisson and binomial models are commonly used for zero-inflated data; however, in real data the assumptions that should be satisfied when using each model might be violated. We calculate the required sample size based on a discrete Weibull regression model that can handle both underdispersed and overdispersed data types. We use the Monte Carlo simulation to compute the required sample size. With our proposed method, a unified model with a low failure risk can be used to cope with the dispersed data type and handle data with many zeros, which appear in groups or clusters sharing a common variation source. A simulation study shows that our proposed method provides accurate results, revealing that the sample size is affected by the distribution skewness, covariance structure of covariates, and amount of zeros. We apply our method to the pancreas disorder length of the stay data collected from Western Australia.

A joint modeling of longitudinal zero-inflated count data and time to event data (경시적 영과잉 가산자료와 생존자료의 결합모형)

  • Kim, Donguk;Chun, Jihun
    • The Korean Journal of Applied Statistics
    • /
    • v.29 no.7
    • /
    • pp.1459-1473
    • /
    • 2016
  • Both longitudinal data and survival data are collected simultaneously in longitudinal data which are observed throughout the passage of time. In this case, the effect of the independent variable becomes biased (provided that sole use of longitudinal data analysis does not consider the relation between both data used) if the missing that occurred in the longitudinal data is non-ignorable because it is caused by a correlation with the survival data. A joint model of longitudinal data and survival data was studied as a solution for such problem in order to obtain an unbiased result by considering the survival model for the cause of missing. In this paper, a joint model of the longitudinal zero-inflated count data and survival data is studied by replacing the longitudinal part with zero-inflated count data. A hurdle model and proportional hazards model were used for each longitudinal zero inflated count data and survival data; in addition, both sub-models were linked based on the assumption that the random effect of sub-models follow the multivariate normal distribution. We used the EM algorithm for the maximum likelihood estimator of parameters and estimated standard errors of parameters were calculated using the profile likelihood method. In simulation, we observed a better performance of the joint model in bias and coverage probability compared to the separate model.