Similarity between the dispersion parameter in zero-altered model and the two goodness-of-fit statistics

Yun, Yujeong; Kim, Honggie

doi:10.7465/jkdi.2017.28.3.493

Journal of the Korean Data and Information Science Society

Volume 28 Issue 3 / Pages 493-504 / 2017 / 1598-9402 (pISSN)

The Korean Data and Information Science Society (한국데이터정보과학회)

Yun, Yujeong (Research Division, Asia Pacific Population Institute) ;
Kim, Honggie (Department of Information and Statistics, Chungnam National University)

Received : 2017.04.08
Accepted : 2017.05.18
Published : 2017.05.31

Abstract

Count data often exhibit over-dispersion caused by an excess of zeros or under-dispersion caused by a shortage of zeros. To handle these problems, Ghosh and Kim (2007) proposed the zero-altered distribution model. Their model can accommodate both over-dispersion and under-dispersion with a single parameter, which had not previously been possible. The type of dispersion depends on the sign of the parameter ${\delta}$ in the zero-altered distribution. In this study, we demonstrate the role of the dispersion type parameter ${\delta}$ using data on the number of births in Korea. Using both the chi-square statistic and the Kolmogorov statistic for goodness-of-fit, we also examine the differences between the theoretical distribution and an observed distribution that exhibits either over-dispersion or under-dispersion. Finally, we examine whether these goodness-of-fit test statistics behave similarly to the dispersion type parameter ${\delta}$.

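The number-of-births data from the Population Census of Statistics Korea are readily accessible count data. They form the basis for government decisions on childbirth policy aimed at raising national competitiveness and for analyses of the expected effects of that policy. Because earlier studies found count models such as the Poisson model to be superior for analyzing number-of-births data, count-model-based methods are in use. However, the Poisson model, the most widely used count model, rests on the restrictive assumption of equidispersion, so applying it directly to number-of-births data inevitably leads to loss of information and biased estimation. To overcome this limitation, Ghosh and Kim (2007) proposed the zero-altered model, which can account for both the over-dispersion and the under-dispersion caused by an excess or a shortage of zeros. In this paper, we apply the zero-altered model of Ghosh and Kim (2007) to the observed distribution of the number of births, obtain the dispersion type parameter ${\delta}$ of the zero-altered model, and analyze its role. We then examine the similarity between the dispersion type parameter ${\delta}$ of the observed distribution and the goodness-of-fit test statistics used to compare the observed distribution with the theoretical one.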
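The relationship described in the abstract can be illustrated with a small numerical sketch. The zero-modified Poisson form used here is an assumption chosen for illustration; the abstract does not state the parameterization of Ghosh and Kim (2007), so the function `zero_modified_pmf`, the rate `lam`, and the sample size `n` are all hypothetical. Under this assumed form, a positive `delta` adds zeros (over-dispersion), a negative `delta` removes zeros (under-dispersion), and `delta = 0` gives the ordinary Poisson distribution. The sketch then computes a chi-square statistic and a Kolmogorov-type distance against a Poisson distribution with the same mean.

```python
import math


def poisson_pmf(k, lam):
    """Ordinary Poisson probability P(Y = k)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zero_modified_pmf(k, lam, delta):
    """Illustrative zero-modified Poisson (assumed form, not taken from the paper).

    delta > 0 shifts mass toward zero, delta < 0 shifts mass away from zero.
    Probabilities stay non-negative for -1/(1 - p0) <= delta <= 1/p0.
    """
    p0 = poisson_pmf(0, lam)
    if k == 0:
        return p0 * (1.0 + delta * (1.0 - p0))
    return (1.0 - delta * p0) * poisson_pmf(k, lam)


def dispersion_index(lam, delta, upper=60):
    """Variance-to-mean ratio, summed numerically over counts 0..upper-1."""
    mean = sum(k * zero_modified_pmf(k, lam, delta) for k in range(upper))
    second = sum(k * k * zero_modified_pmf(k, lam, delta) for k in range(upper))
    return (second - mean ** 2) / mean


def cell_probs(pmf, max_count):
    """Probabilities for cells 0..max_count-1 plus one pooled tail cell."""
    probs = [pmf(k) for k in range(max_count)]
    probs.append(max(0.0, 1.0 - sum(probs)))
    return probs


def fit_statistics(lam, delta, n=1000, max_count=10):
    """Chi-square and Kolmogorov-type distance versus a mean-matched Poisson.

    Exact cell probabilities scaled by a hypothetical sample size n stand in
    for observed frequencies, so the result is deterministic.
    """
    observed = cell_probs(lambda k: zero_modified_pmf(k, lam, delta), max_count)
    mu = lam * (1.0 - delta * poisson_pmf(0, lam))  # mean of the modified law
    expected = cell_probs(lambda k: poisson_pmf(k, mu), max_count)
    chi_square = n * sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    cum_o = cum_e = 0.0
    kolmogorov = 0.0
    for o, e in zip(observed, expected):
        cum_o += o
        cum_e += e
        kolmogorov = max(kolmogorov, abs(cum_o - cum_e))
    return chi_square, kolmogorov


if __name__ == "__main__":
    for delta in (-0.5, 0.0, 0.5):
        chi_square, kolmogorov = fit_statistics(2.0, delta)
        print(delta, dispersion_index(2.0, delta), chi_square, kolmogorov)
```

In this assumed form the variance-to-mean ratio works out to $1 + {\delta} p_0 \lambda$, where $p_0 = e^{-\lambda}$, so the sign of ${\delta}$ fixes the dispersion type, while both fit statistics are zero at ${\delta} = 0$ and grow as ${\delta}$ moves away from zero in either direction. The fit statistics measure only the size of the departure from the Poisson fit, not its direction.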

Keywords

References

Castillo, J. D. and Perez-Casany, M. (2005). Overdispersed and underdispersed Poisson generalizations. Journal of Statistical Planning and Inference, 134, 486-500. https://doi.org/10.1016/j.jspi.2004.04.019
Chun, H. (2016). The factors of insurance solicitor's turnovers of life insurance using Poisson regression. Journal of the Korean Data & Information Science Society, 27, 1337-1347. https://doi.org/10.7465/jkdi.2016.27.5.1337
Ghosh, S. K. and Kim, H. (2007). Semiparametric inference based on a class of zero-altered distributions. Statistical Methodology, 4, 371-378. https://doi.org/10.1016/j.stamet.2007.01.001
Gupta, P. L., Gupta, R. C. and Tripathi, R. C. (1996). Analysis of zero-adjusted count data. Computational Statistics and Data Analysis, 23, 207-218. https://doi.org/10.1016/S0167-9473(96)00032-1
Hall, D. B. (2000). Zero-inflated Poisson and binomial regression with random effects: a case study. Biometrics, 56, 1030-1039. https://doi.org/10.1111/j.0006-341X.2000.01030.x
Heilbron, D. C. (1994). Zero-altered and other regression models for count data with added zeroes. Biometrical Journal, 36, 531-547. https://doi.org/10.1002/bimj.4710360505
Kang, S., Han, J., Seo, Y. and Jeong, J. (2014). Goodness-of-fit tests for the inverse Weibull or extreme value distribution based on multiply type-II censored samples. Journal of the Korean Data & Information Science Society, 25, 903-914. https://doi.org/10.7465/jkdi.2014.25.4.903
Kim, K. (2013). A study on the role of the dispersion parameter in Ghosh and Kim's zero-altered model, Master's thesis, Graduate School of Chungnam National University, Daejeon, Republic of Korea.
Lambert, D. (1992). Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics, 34, 1-14. https://doi.org/10.2307/1269547
Ra, Y. (2012). Application of zero-altered model to the distribution of number of children of Korean married women, Master's thesis, Graduate School of Chungnam National University, Daejeon, Republic of Korea.
Ridout, M., Demetrio, C. G. B. and Hinde, J. (1998). Models for count data with many zeros. International Biometric Conference.
Ridout, M., Hinde, J. and Demetrio, C. G. B. (2001). A score test for testing a zero-inflated Poisson regression model against zero-inflated negative binomial alternatives. Biometrics, 57, 219-223. https://doi.org/10.1111/j.0006-341X.2001.00219.x
Van den Broek, J. (1995). A score test for zero inflation in a Poisson distribution. Biometrics, 51, 738-743. https://doi.org/10.2307/2532959
Xiang, L., Lee, A. H., Yau, K. K. W. and McLachlan, G. J. (2006). A score test for over-dispersion in zero-inflated Poisson mixed regression model. Inter Science, 26, 1608-1622.
Xie, M., He, B. and Goh, T. N. (2001). Zero-inflated Poisson model in statistical process control. Computational Statistics and Data Analysis, 38, 191-201. https://doi.org/10.1016/S0167-9473(01)00033-0
Zhao, Y. (2006). Score test for generalization and zero-inflation in count data modeling. ProQuest, 131.
