• Title/Summary/Keyword: clustered count data


Modeling clustered count data with discrete weibull regression model

  • Yoo, Hanna
    • Communications for Statistical Applications and Methods / v.29 no.4 / pp.413-420 / 2022
  • In this study, we adapt the discrete Weibull regression model to clustered count data. The discrete Weibull regression model has the attractive feature that it can handle both underdispersed and overdispersed data. We analyzed the eighth Korean National Health and Nutrition Examination Survey (KNHANES VIII) from 2019 to assess the factors influencing the 1-month outpatient stay in 17 different regions. We compared the results of the clustered discrete Weibull regression model with those of the Poisson, negative binomial, generalized Poisson, and Conway-Maxwell Poisson regression models, which are widely used in count data analyses. The results show that the clustered discrete Weibull regression model with a random intercept gives the best fit. A simulation study is also conducted to investigate the performance of the clustered discrete Weibull model under various dispersion settings and zero-inflation probabilities. The paper shows that a discrete Weibull regression with a random effect can flexibly model count data with various dispersion without the risk of making wrong assumptions about the data dispersion.
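
The claim that a discrete Weibull distribution covers both under- and overdispersion can be checked numerically. The sketch below is not the paper's clustered regression model; it assumes the standard type I discrete Weibull form P(Y = y) = q^(y^beta) - q^((y+1)^beta) for y = 0, 1, 2, ..., and the function names `dw_pmf` and `dw_mean_var` are illustrative only.

```python
def dw_pmf(y, q, beta):
    """Type I discrete Weibull pmf, assuming 0 < q < 1 and beta > 0."""
    return q ** (y ** beta) - q ** ((y + 1) ** beta)


def dw_mean_var(q, beta, tol=1e-12, y_max=100_000):
    """Approximate mean and variance by summing the pmf until the
    survival probability P(Y > y) = q^((y+1)^beta) drops below tol.
    For small beta the tail is heavy, so this truncation is cruder."""
    mean = 0.0
    second_moment = 0.0
    for y in range(y_max):
        p = dw_pmf(y, q, beta)
        mean += y * p
        second_moment += y * y * p
        if q ** ((y + 1) ** beta) < tol:
            break
    return mean, second_moment - mean ** 2


# beta = 1 reduces to a geometric distribution (variance exceeds mean);
# a larger beta concentrates the mass and gives variance below the mean.
for beta in (1.0, 3.0):
    m, v = dw_mean_var(0.5, beta)
    print(f"beta={beta}: mean={m:.4f}, variance={v:.4f}, ratio={v / m:.4f}")
```

A variance-to-mean ratio above 1 indicates overdispersion relative to the Poisson, and a ratio below 1 indicates underdispersion.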

Sample size calculations for clustered count data based on zero-inflated discrete Weibull regression models

  • Yoo, Hanna
    • Communications for Statistical Applications and Methods / v.31 no.1 / pp.55-64 / 2024
  • In this study, we consider the sample size determination problem for clustered count data with many zeros. In general, zero-inflated Poisson and binomial models are commonly used for zero-inflated data; however, in real data the assumptions required by each model might be violated. We calculate the required sample size based on a discrete Weibull regression model that can handle both underdispersed and overdispersed data. We use Monte Carlo simulation to compute the required sample size. With our proposed method, a unified model with a low failure risk can be used to cope with either dispersion type and to handle data with many zeros that appear in groups or clusters sharing a common source of variation. A simulation study shows that our proposed method provides accurate results and reveals that the sample size is affected by the skewness of the distribution, the covariance structure of the covariates, and the proportion of zeros. We apply our method to pancreas disorder length-of-stay data collected in Western Australia.

Bayesian Parameter Estimation and Variable Selection in Random Effects Generalised Linear Models for Count Data

  • Oh, Man-Suk;Park, Tae-Sung
    • Journal of the Korean Statistical Society / v.31 no.1 / pp.93-107 / 2002
  • Random effects generalised linear models are useful for analysing clustered count data in which responses are usually correlated. We propose a Bayesian approach to parameter estimation and variable selection in random effects generalised linear models for count data. A simple Gibbs sampling algorithm for parameter estimation is presented and a simple and efficient variable selection is done by using the Gibbs outputs. An illustrative example is provided.

Modelling Count Responses with Overdispersion

  • Jeong, Kwang Mo
    • Communications for Statistical Applications and Methods / v.19 no.6 / pp.761-770 / 2012
  • We frequently encounter count outcomes that have extra variation. This paper considers several alternative models for overdispersed count responses, such as a quasi-Poisson model, a zero-inflated Poisson model, and a negative binomial model, with a special focus on a generalized linear mixed model. We also explain various goodness-of-fit criteria, discussing when each is applicable and cautioning against misuse depending on the pattern of response categories. The overdispersion models for count data are illustrated through two examples with different response patterns.

Weighted zero-inflated Poisson mixed model with an application to Medicaid utilization data

  • Lee, Sang Mee;Karrison, Theodore;Nocon, Robert S.;Huang, Elbert
    • Communications for Statistical Applications and Methods / v.25 no.2 / pp.173-184 / 2018
  • In medical or public health research, it is common to encounter clustered or longitudinal count data that exhibit excess zeros. For example, health care utilization data often have a multi-modal distribution with excess zeros as well as a multilevel structure where patients are nested within physicians and hospitals. To analyze this type of data, zero-inflated count models with mixed effects have been developed, in which the count response is assumed to follow a mixture of a Poisson or negative binomial distribution and a point mass at zero, with random effects included. However, no study has considered a situation where data are also censored due to the finite nature of the observation period or follow-up. In this paper, we present a weighted version of the zero-inflated Poisson model with random effects that accounts for variable individual follow-up times. We suggest two different types of weight function. The performance of the proposed model is evaluated and compared to a standard zero-inflated mixed model through simulation studies. This approach is then applied to Medicaid data analysis.
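
The mixture structure described in this abstract can be illustrated with a minimal sketch. It covers only the standard zero-inflated Poisson building block with a log-link random intercept, not the weighted model the paper proposes; the names `zip_pmf` and `rate_with_random_intercept`, and the parameter names `pi_zero`, `eta`, and `b`, are assumptions made for illustration.

```python
import math


def zip_pmf(y, lam, pi_zero):
    """Zero-inflated Poisson pmf: with probability pi_zero the count is a
    structural zero, otherwise it is drawn from Poisson(lam)."""
    if y < 0:
        return 0.0
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    if y == 0:
        return pi_zero + (1.0 - pi_zero) * poisson
    return (1.0 - pi_zero) * poisson


def rate_with_random_intercept(eta, b):
    """Hypothetical helper: Poisson rate for one cluster under a log link,
    where eta is the fixed-effect linear predictor and b is the cluster's
    random intercept."""
    return math.exp(eta + b)


# Two clusters sharing the same covariates but different random intercepts.
for b in (-0.5, 0.5):
    lam = rate_with_random_intercept(eta=0.7, b=b)
    print(f"b={b}: rate={lam:.3f}, P(Y=0)={zip_pmf(0, lam, 0.3):.3f}")
```

The marginal mean of this mixture is (1 - pi_zero) * lam, so excess zeros lower the mean while the random intercept shifts the rate cluster by cluster.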

A Bayesian zero-inflated Poisson regression model with random effects with application to smoking behavior (랜덤효과를 포함한 영과잉 포아송 회귀모형에 대한 베이지안 추론: 흡연 자료에의 적용)

  • Kim, Yeon Kyoung;Hwang, Beom Seuk
    • The Korean Journal of Applied Statistics / v.31 no.2 / pp.287-301 / 2018
  • It is common to encounter count data with excess zeros in various research fields such as the social sciences, natural sciences, medical science, and engineering. Such count data have mainly been modeled with the zero-inflated Poisson model and its extensions. Zero-inflated count data are also often correlated or clustered, in which case random effects should be taken into account in the model. Frequentist approaches have commonly been used to fit such data. However, a Bayesian approach has the advantages of incorporating prior information, avoiding asymptotic approximations, and allowing practical estimation of functions of the parameters. We consider a Bayesian zero-inflated Poisson regression model with random effects for correlated zero-inflated count data. We conducted simulation studies to check the performance of the proposed model. We also applied the proposed model to smoking behavior data from the Regional Health Survey (2015) of the Korea Centers for Disease Control and Prevention.

A Study for Recent Development of Generalized Linear Mixed Model (일반화된 선형 혼합 모형(GENERALIZED LINEAR MIXED MODEL: GLMM)에 관한 최근의 연구 동향)

  • 이준영
    • The Korean Journal of Applied Statistics / v.13 no.2 / pp.541-562 / 2000
  • The generalized linear mixed model framework is for handling count-type categorical data as well as for clustered or overdispersed non-Gaussian data, or for non-linear model data. In this study, we review its general formulation and estimation methods, based on quasi-likelihood and Monte-Carlo techniques. The current research areas and topics for further development are also mentioned.


Maximum Likelihood Estimation Using Laplace Approximation in Poisson GLMMs

  • Ha, Il-Do
    • Communications for Statistical Applications and Methods / v.16 no.6 / pp.971-978 / 2009
  • Poisson generalized linear mixed models (GLMMs) have been widely used for the analysis of clustered or correlated count data. For inference, the marginal likelihood, which is obtained by integrating out the random effects, is often used. It gives the maximum likelihood (ML) estimator, but the integration is usually intractable. In this paper, we propose a way to obtain the ML estimator via a Laplace approximation based on the hierarchical-likelihood (h-likelihood) approach under Poisson GLMMs. In particular, the h-likelihood avoids the integration itself and gives a statistically efficient procedure for various random-effect models including GLMMs. The proposed method is illustrated using two practical examples and simulation studies.

Generalized Linear Mixed Model for Multivariate Multilevel Binomial Data (다변량 다수준 이항자료에 대한 일반화선형혼합모형)

  • Lim, Hwa-Kyung;Song, Seuck-Heun;Song, Ju-Won;Cheon, Soo-Young
    • The Korean Journal of Applied Statistics / v.21 no.6 / pp.923-932 / 2008
  • We are likely to face complex multivariate data which can be characterized by having a non-trivial correlation structure. For instance, omitted covariates may simultaneously affect more than one count in clustered data; hence, the modeling of the correlation structure is important for the efficiency of the estimator and the computation of correct standard errors, i.e., valid inference. A standard way to insert dependence among counts is to assume that they share some common unobservable variables. For this assumption, we fitted correlated random effect models considering multilevel model. Estimation was carried out by adopting the semiparametric approach through a finite mixture EM algorithm without parametric assumptions upon the random coefficients distribution.

Characteristics and Survival of Genus Vibrio Isolated in the Intertidal Zone of the Yellow Sea near Kunsan (군산인근해역에서 분리동정된 Vibrio 속의 특성과 해수에서의 생존)

  • 왕혜영;이건형
    • Korean Journal of Environmental Biology / v.17 no.4 / pp.439-448 / 1999
  • To investigate the population dynamics and survival of the genus Vibrio, population densities of aerobic saprophytic bacteria and Vibrio groups were measured 4 times in the intertidal waters of the Yellow Sea near Kunsan from November 1997 to June 1998. The distribution of heterotrophic bacteria during the survey periods by the plate count and direct count methods ranged from $1.2 \pm 0.6 \times 10^{3}$ to $2.0 \pm 1.5 \times 10^{4}$ CFU ml$^{-1}$ and from $6.0 \pm 4.0 \times 10^{5}$ to $1.9 \pm 1.5 \times 10^{7}$ cells ml$^{-1}$, respectively. Vibrio groups were distributed in the range of $1 \times 10$ to $6 \pm 2.2 \times 10^{2}$ CFU ml$^{-1}$. The proportion of Vibrio groups to total heterotrophic bacteria was between 0.1 and 6% during the survey periods. A total of 51 isolates were obtained from TCBS agar plates and identified to species level by the Biolog Identification System$^{TM}$. As a result, the dominant genera were V. mediterranei, V. anguillarum, V. metschnikovii, and V. parahaemolyticus, and the isolates were clustered into 26 groups based on relatedness by the average linkage clustering method at the 70% level. As for the susceptibility of the 51 isolates to 7 kinds of antibacterial agents (gentamicin, ampicillin, chloramphenicol, streptomycin, kanamycin, tetracycline, carbenicillin), 96% of isolates showed high resistance to more than one antibiotic, and 65% of isolates contained a plasmid whose size was observed to be greater than 12 kb. The number of cells of 3 tested strains (V. anguillarum, V. vulnificus, and V. metschnikovii) in filtered aged seawater decreased by approximately 1 to 5 orders of magnitude during 30-d incubation. In most cases, the number of cells decreased rapidly until day 3, then decreased slowly until day 30. Cells incubated at $15^{\circ}$C showed higher survival than those at $4^{\circ}$C and $25^{\circ}$C. These results may serve as basic supporting data in the risk assessment of vibriosis in summer.
