• Title/Summary/Keyword: count data model


Application of discrete Weibull regression model with multiple imputation

  • Yoo, Hanna
    • Communications for Statistical Applications and Methods / v.26 no.3 / pp.325-336 / 2019
  • In this article we extend the discrete Weibull regression model to the presence of missing data. Discrete Weibull regression models can be adapted to various types of dispersion data; however, they are not widely used. Recently Yoo (Journal of the Korean Data and Information Science Society, 30, 11-22, 2019) adapted the discrete Weibull regression model using single imputation. We extend that study by using multiple imputation under several settings and compare the results. The purpose of this study is to address the merit of using multiple imputation in the presence of missing data in discrete count data. We analyzed the seventh Korean National Health and Nutrition Examination Survey (KNHANES VII) from 2016 to assess the factors influencing the variable 1 month hospital stay, and we compared the results of the discrete Weibull regression model with those of the Poisson, negative binomial and zero-inflated Poisson regression models, which are widely used in count data analyses. The results showed that the discrete Weibull regression model using multiple imputation provided the best fit. We also performed simulation studies to show the accuracy of discrete Weibull regression using multiple imputation under both under- and over-dispersed distributions, as well as varying missing rates and sample sizes. Sensitivity analysis showed the influence of mis-specification and the robustness of the discrete Weibull model. Using imputation with discrete Weibull regression to analyze discrete data will increase explanatory power and is widely applicable to various types of dispersion data with a unified model.
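
The claim that one distribution covers both under- and over-dispersed counts can be illustrated with a minimal sketch. It assumes the type I discrete Weibull parameterization, P(Y = y) = q^(y^β) − q^((y+1)^β) for y = 0, 1, 2, ...; the abstract does not state which parameterization the article uses, and the function names and parameter values below are illustrative only.

```python
def dw_pmf(y, q, beta):
    """Type I discrete Weibull pmf (assumed parameterization), 0 < q < 1, beta > 0."""
    if y < 0:
        return 0.0
    return q ** (y ** beta) - q ** ((y + 1) ** beta)


def dw_moments(q, beta, y_max=2000):
    """Mean and variance by direct summation, truncated at y_max."""
    mean = sum(y * dw_pmf(y, q, beta) for y in range(y_max + 1))
    second = sum(y * y * dw_pmf(y, q, beta) for y in range(y_max + 1))
    return mean, second - mean ** 2


def dispersion_index(q, beta):
    """Variance-to-mean ratio: below 1 is under-dispersed, above 1 over-dispersed."""
    mean, var = dw_moments(q, beta)
    return var / mean


# Illustrative parameter values, not taken from the article.
for beta in (0.8, 1.0, 3.0):
    print(beta, round(dispersion_index(0.9, beta), 3))
```

With β = 1 the distribution reduces to the geometric distribution, whose variance-to-mean ratio is 1/(1 − q), so it is over-dispersed; larger β pulls the ratio below 1.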

Hurdle Model for Longitudinal Zero-Inflated Count Data Analysis (영과잉 경시적 가산자료 분석을 위한 허들모형)

  • Jin, Iktae;Lee, Keunbaik
    • The Korean Journal of Applied Statistics / v.27 no.6 / pp.923-932 / 2014
  • The hurdle model can be used to analyze zero-inflated count data. This model combines a logit model for the binary component with a truncated Poisson model for the truncated count component. We propose a new hurdle model with a general heterogeneous random effects covariance matrix to analyze longitudinal zero-inflated count data using the modified Cholesky decomposition. This decomposition factors the random effects covariance matrix into generalized autoregressive parameters and innovation variances. The parameters are modeled using (generalized) linear models and estimated with a Bayesian method. We use these methods to analyze a real dataset.
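
The two-part structure described above can be sketched as a probability mass function. This is a minimal illustration without random effects; it assumes the logit part models P(Y > 0) (the abstract does not say whether it models zero or positive counts), and the names `eta` and `lam` are hypothetical.

```python
import math


def hurdle_poisson_pmf(y, eta, lam):
    """Hurdle Poisson pmf.

    eta: linear predictor of the logit model for P(Y > 0) (assumed direction).
    lam: rate of the Poisson distribution before zero truncation.
    """
    p_positive = 1.0 / (1.0 + math.exp(-eta))
    if y == 0:
        return 1.0 - p_positive
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    # Rescale the Poisson mass so that it sums to 1 over y = 1, 2, ...
    return p_positive * poisson / (1.0 - math.exp(-lam))


# Illustrative values: zeros and positive counts are equally likely.
probs = [hurdle_poisson_pmf(y, eta=0.0, lam=2.0) for y in range(60)]
print(probs[0], sum(probs))
```

Because the zero probability is set only by the binary part, a hurdle model can represent either more or fewer zeros than a plain Poisson model would predict.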

Count Data Model for The Estimation of Bus Ridership (Focusing on Commuters and Students in Seoul) (가산자료모형(Count Data Model)을 이용한 버스이용횟수추정에 관한 연구 (서울시 통근.통학자를 대상으로))

  • 문진수;김순관;임강원
    • Journal of Korean Society of Transportation / v.17 no.5 / pp.123-135 / 1999
  • The rapid increase in passenger cars, caused by the discomfort of public transit and the preference for automobiles, is the major factor behind increasing traffic congestion in Seoul. Because shifting motorists to public transit may be the most important policy for easing this congestion, this study focuses on the behavioral aspects of company employees and university students and investigates factors influencing bus ridership. In brief, by estimating bus ridership through count models, this study identifies factors that influence bus ridership and draws policy suggestions for leading motorists to public transit. The purpose of this study is to apply an appropriate count data model. Count data models have been widely applied in economics since the mid-1980s and in transportation, mainly in foreign countries, since the late 1980s. Although a few studies in this country have applied count data models to count data, all of them used Poisson regression models without suitable tests of the model specification. Statistical testing showed that the negative binomial regression model, which is suitable for overdispersed data, was appropriate for the weekly bus ridership data. To emphasize the importance of model specification, both the Poisson regression model and the negative binomial regression model were estimated and the results were compared.
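
The reason a Poisson specification needs checking can be sketched with a simple moment comparison. The Poisson model forces the variance to equal the mean, while the negative binomial (NB2) variance is μ + αμ². The snippet below is a rough method-of-moments check, not the specification test used in the paper, and the ridership counts are invented for illustration.

```python
from statistics import mean, variance


def dispersion_summary(counts):
    """Compare the sample variance with the mean and back out a moment estimate of alpha."""
    m = mean(counts)
    s2 = variance(counts)  # sample variance with n - 1 in the denominator
    # Under NB2, Var(Y) = m + alpha * m**2; alpha = 0 recovers the Poisson case.
    alpha = max(0.0, (s2 - m) / (m * m))
    return {"mean": m, "variance": s2, "ratio": s2 / m, "alpha_mom": alpha}


# Invented weekly bus-ride counts, for illustration only.
weekly_rides = [0, 0, 1, 2, 2, 3, 5, 5, 8, 10, 10, 14]
summary = dispersion_summary(weekly_rides)
print(summary)
```

A variance-to-mean ratio well above 1 suggests overdispersion, in which case Poisson standard errors tend to be too small and a negative binomial model is worth estimating alongside it.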


Sample size calculations for clustered count data based on zero-inflated discrete Weibull regression models

  • Hanna Yoo
    • Communications for Statistical Applications and Methods / v.31 no.1 / pp.55-64 / 2024
  • In this study, we consider the sample size determination problem for clustered count data with many zeros. In general, zero-inflated Poisson and binomial models are commonly used for zero-inflated data; however, in real data the assumptions required by each model might be violated. We calculate the required sample size based on a discrete Weibull regression model that can handle both underdispersed and overdispersed data. We use Monte Carlo simulation to compute the required sample size. With our proposed method, a unified model with a low failure risk can be used to cope with both dispersion types and to handle data with many zeros, which appear in groups or clusters sharing a common source of variation. A simulation study shows that our proposed method provides accurate results, revealing that the sample size is affected by the distribution skewness, the covariance structure of the covariates, and the amount of zeros. We apply our method to pancreas disorder length-of-stay data collected in Western Australia.

A Bayesian zero-inflated Poisson regression model with random effects with application to smoking behavior (랜덤효과를 포함한 영과잉 포아송 회귀모형에 대한 베이지안 추론: 흡연 자료에의 적용)

  • Kim, Yeon Kyoung;Hwang, Beom Seuk
    • The Korean Journal of Applied Statistics / v.31 no.2 / pp.287-301 / 2018
  • It is common to encounter count data with excess zeros in various research fields such as the social sciences, natural sciences, medical science and engineering. Such count data have been explained mainly by the zero-inflated Poisson model and its extensions. Zero-inflated count data are also often correlated or clustered, in which case random effects should be taken into account in the model. Frequentist approaches have commonly been used to fit such data. However, a Bayesian approach has the advantages of incorporating prior information, avoiding asymptotic approximations and allowing practical estimation of functions of the parameters. We consider a Bayesian zero-inflated Poisson regression model with random effects for correlated zero-inflated count data. We conducted simulation studies to check the performance of the proposed model. We also applied the proposed model to smoking behavior data from the Regional Health Survey (2015) of the Korea Centers for Disease Control and Prevention.
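
The zero-inflated Poisson distribution at the core of this model can be sketched as follows. This is the standard mixture of a point mass at zero (weight ω) and a Poisson(λ) distribution, shown without covariates, random effects, or priors; the function and parameter names are illustrative.

```python
import math


def zip_pmf(y, omega, lam):
    """Zero-inflated Poisson pmf: extra zeros with probability omega, else Poisson(lam)."""
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    if y == 0:
        return omega + (1.0 - omega) * poisson
    return (1.0 - omega) * poisson


def zip_mean_var(omega, lam):
    """Closed-form mean and variance of the ZIP distribution."""
    mean = (1.0 - omega) * lam
    var = (1.0 - omega) * lam * (1.0 + omega * lam)
    return mean, var


# Illustrative values: 30% structural zeros, Poisson rate 3.
omega, lam = 0.3, 3.0
numeric_mean = sum(y * zip_pmf(y, omega, lam) for y in range(80))
print(numeric_mean, zip_mean_var(omega, lam))
```

The variance exceeds the mean whenever ω > 0, which is why excess zeros show up as overdispersion when a plain Poisson model is fitted.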

Application of Bootstrap Method to Primary Model of Microbial Food Quality Change

  • Lee, Dong-Sun;Park, Jin-Pyo
    • Food Science and Biotechnology / v.17 no.6 / pp.1352-1356 / 2008
  • The bootstrap method, a computer-intensive statistical technique for estimating the distribution of a statistic, was applied to deal with the uncertainty and variability of experimental data in stochastic prediction modeling of microbial growth on a chill-stored food. Three bootstrapping methods for fitting curves to the microbial count data were compared in determining the parameters of the Baranyi and Roberts growth model: nonlinear regression to the static version of the function with residuals resampled onto all the experimental microbial count data; static version regression onto mean counts at sampling times; and dynamic version fitting of differential equations onto the bootstrapped mean counts. All the methods produced almost the same mean parameter values but differed in the distributions of those values. Parameter search according to the dynamic form of the differential equations resulted in the widest distribution of the model parameters but produced confidence intervals for the predicted microbial count close to those from nonlinear regression of the static equation.
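
The residual-resampling idea in the first of the three methods can be sketched on a much simpler model. The snippet below substitutes a straight-line fit of log10 counts against time for the Baranyi and Roberts model, which is a deliberate simplification, and the storage data are invented for illustration.

```python
import random
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    xb, yb = mean(x), mean(y)
    sxx = sum((xi - xb) ** 2 for xi in x)
    slope = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) / sxx
    return yb - slope * xb, slope


def residual_bootstrap_slopes(x, y, n_boot=1000, seed=0):
    """Refit the line to fitted values plus residuals resampled with replacement."""
    rng = random.Random(seed)
    a, b = fit_line(x, y)
    fitted = [a + b * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    slopes = []
    for _ in range(n_boot):
        y_star = [fi + rng.choice(resid) for fi in fitted]
        slopes.append(fit_line(x, y_star)[1])
    return sorted(slopes)


def percentile_interval(sorted_values, level=0.95):
    """Percentile bootstrap interval from a sorted list of replicates."""
    n = len(sorted_values)
    lower = sorted_values[int((1 - level) / 2 * n)]
    upper = sorted_values[int((1 + level) / 2 * n) - 1]
    return lower, upper


# Invented storage data: days and log10 CFU/g, for illustration only.
days = [0, 1, 2, 3, 4, 5, 6, 7]
log_cfu = [3.1, 3.4, 4.0, 4.3, 4.9, 5.2, 5.8, 6.1]
slopes = residual_bootstrap_slopes(days, log_cfu)
print(fit_line(days, log_cfu)[1], percentile_interval(slopes))
```

The spread of the bootstrapped slopes plays the role that the parameter distributions play in the study: it describes how uncertain the fitted growth parameter is given the scatter in the counts.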

An Analysis of Panel Count Data from Multiple random processes

  • Park, You-Sung;Kim, Hee-Young
    • Proceedings of the Korean Statistical Society Conference / 2002.11a / pp.265-272 / 2002
  • An integer-valued autoregressive integrated (INARI) model is introduced to eliminate stochastic trend and seasonality from time series of count data. The INARI model extends the previous integer-valued ARMA model. We show that it is stationary and ergodic in order to establish asymptotic normality for the conditional least squares estimator. Optimal estimating equations are used so that the estimation reflects the categorical and serial correlations arising from panel count data and the variations arising from the three random processes through which observations are obtained. Under regularity conditions for martingale sequences, we show asymptotic normality for the estimators from the estimating equations. Using cancer mortality data provided by the U.S. National Center for Health Statistics (NCHS), we apply our results to estimate the probability of cells classified by 4 causes of death and 6 age groups and to forecast the death count of each cell. We also investigate the impact of the three random processes on estimation.
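
Integer-valued autoregressive models keep a series on the non-negative integers by replacing scalar multiplication with a thinning operator. As a rough illustration of that building block, the sketch below simulates a first-order INAR process with binomial thinning and Poisson innovations. It is not the INARI model of the paper, and the choice of binomial thinning, the parameter values, and the function names are assumptions.

```python
import math
import random


def poisson_draw(rng, lam):
    """Poisson variate by Knuth's multiplication method (adequate for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def binomial_thinning(rng, alpha, x):
    """alpha o x: each of the x units survives independently with probability alpha."""
    return sum(1 for _ in range(x) if rng.random() < alpha)


def simulate_inar1(n, alpha, lam, seed=0):
    """X_t = alpha o X_{t-1} + e_t with e_t ~ Poisson(lam)."""
    rng = random.Random(seed)
    # Start from the stationary marginal, which is Poisson(lam / (1 - alpha)).
    x = poisson_draw(rng, lam / (1.0 - alpha))
    series = []
    for _ in range(n):
        x = binomial_thinning(rng, alpha, x) + poisson_draw(rng, lam)
        series.append(x)
    return series


series = simulate_inar1(20000, alpha=0.5, lam=2.0)
print(sum(series) / len(series))  # stationary mean is lam / (1 - alpha)
```

Every simulated value is a count, and the lag-one dependence is governed by α in the same way an AR(1) coefficient governs a Gaussian series.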


Poisson linear mixed models with ARMA random effects covariance matrix

  • Choi, Jiin;Lee, Keunbaik
    • Journal of the Korean Data and Information Science Society / v.28 no.4 / pp.927-936 / 2017
  • To analyze longitudinal count data, Poisson linear mixed models are commonly used. In these models the random effects covariance matrix explains both within-subject variation and serial correlation of repeated count outcomes. When the random effects covariance matrix is misspecified, the estimates of covariate effects can be biased. Therefore, we propose reasonable and flexible structures for the covariance matrix using the autoregressive and moving average Cholesky decomposition (ARMACD). The ARMACD factors the covariance matrix into generalized autoregressive parameters (GARPs), generalized moving average parameters (GMAPs) and innovation variances (IVs). Positive IVs guarantee the positive-definiteness of the covariance matrix. In this paper, we use the ARMACD to model the random effects covariance matrix in Poisson loglinear mixed models. We analyze epileptic seizure data using our proposed model.
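
How autoregressive parameters and innovation variances determine a covariance matrix can be sketched for the purely autoregressive case, that is, the modified Cholesky decomposition without the moving average part. The sketch assumes the recursion b_t = Σ_{j<t} φ_tj b_j + e_t with Var(e_t) = d_t; the GMAP component of the ARMACD is not shown, and the names and values are illustrative.

```python
def covariance_from_garps(phi, iv):
    """Build Sigma = L D L^T, where L is the inverse of the unit lower-triangular T.

    phi[t][j] (j < t): generalized autoregressive parameters.
    iv[t]: innovation variances (must be positive).
    """
    n = len(iv)
    # Row t of L writes b_t as a linear combination of innovations e_0..e_t.
    L = [[0.0] * n for _ in range(n)]
    for t in range(n):
        L[t][t] = 1.0
        for j in range(t):
            for k in range(j + 1):
                L[t][k] += phi[t][j] * L[j][k]
    return [
        [sum(L[r][k] * iv[k] * L[c][k] for k in range(n)) for c in range(n)]
        for r in range(n)
    ]


# Illustrative AR(1)-type structure: each effect depends only on the previous one.
phi = [[], [0.6], [0.0, 0.6]]
iv = [1.0, 0.64, 0.64]
sigma = covariance_from_garps(phi, iv)
for row in sigma:
    print([round(v, 4) for v in row])
```

Because Σ is assembled as L D Lᵀ with a unit-diagonal L, any positive innovation variances yield a positive-definite matrix, so the parameters can be modeled without constraints beyond d_t > 0.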

Weighted zero-inflated Poisson mixed model with an application to Medicaid utilization data

  • Lee, Sang Mee;Karrison, Theodore;Nocon, Robert S.;Huang, Elbert
    • Communications for Statistical Applications and Methods / v.25 no.2 / pp.173-184 / 2018
  • In medical or public health research, it is common to encounter clustered or longitudinal count data that exhibit excess zeros. For example, health care utilization data often have a multi-modal distribution with excess zeros as well as a multilevel structure where patients are nested within physicians and hospitals. To analyze this type of data, zero-inflated count models with mixed effects have been developed, in which a count response variable is assumed to be distributed as a mixture of a Poisson or negative binomial distribution and a point mass at zero, with random effects included. However, no study has considered a situation where data are also censored due to the finite nature of the observation period or follow-up. In this paper, we present a weighted version of the zero-inflated Poisson model with random effects that accounts for variable individual follow-up times. We suggest two different types of weight function. The performance of the proposed model is evaluated and compared to a standard zero-inflated mixed model through simulation studies. This approach is then applied to Medicaid data analysis.

Estimating the Economic Value of Recreation Sea Fishing in the Yellow Sea: An Application of Count Data Model (가산자료모형을 이용한 서해 태안군 유어객의 편익추정)

  • Choi, Jong Du
    • Environmental and Resource Economics Review / v.23 no.2 / pp.331-347 / 2014
  • The purpose of this study is to estimate the economic value of recreational sea fishing in the Yellow Sea using count data models. To estimate consumer surplus, we used several count data models of travel cost recreation demand: a Poisson model (PM), a negative binomial model (NBM), a truncated Poisson model (TPM), and a truncated negative binomial model (TNBM). Model results show that there is no over-dispersion problem and that the NBM was statistically more suitable than the other models. All estimated parameters are statistically significant and theoretically valid. The NBM was applied to estimate the travel demand and consumer surplus. The consumer surplus per trip was estimated to be 254,453 won, and the total consumer surplus per person per year was estimated to be 1,536,896 won.
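
The link between a count-data demand model and consumer surplus can be sketched with a standard result: when expected trips are exp(β₀ + β_tc · travel cost + ...), consumer surplus per trip is −1/β_tc. Whether the paper used exactly this formula is an assumption. The two won figures below are taken from the abstract; the implied coefficient and trip count are back-of-the-envelope values, not results reported by the author.

```python
def consumer_surplus_per_trip(beta_travel_cost):
    """Per-trip consumer surplus for a semi-log (exponential mean) count demand model."""
    if beta_travel_cost >= 0:
        raise ValueError("the travel cost coefficient must be negative")
    return -1.0 / beta_travel_cost


# Figures reported in the abstract (won).
cs_per_trip = 254_453
cs_per_year = 1_536_896

# If the annual figure is per-trip surplus times average annual trips,
# the implied number of trips per person per year is their ratio.
implied_trips = cs_per_year / cs_per_trip

# Travel cost coefficient (per won) that would reproduce the per-trip surplus.
implied_beta = -1.0 / cs_per_trip

print(round(implied_trips, 2), implied_beta)
```

Because the surplus is the reciprocal of the coefficient, small changes in an estimate close to zero translate into large changes in the valuation, which is one reason the choice among the Poisson, negative binomial and truncated models matters.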