A Zero-Inflated Model for Insurance Data

Choi, Jong-Hoo; Ko, In-Mi; Cheon, Soo-Young

doi:10.5351/KJAS.2011.24.3.485

The Korean Journal of Applied Statistics (응용통계연구)

Volume 24, Issue 3, pp. 485-494, 2011
ISSN 1225-066X (print), 2383-5818 (online)

The Korean Statistical Society (한국통계학회)

Korean title: 제로팽창 모형을 이용한 보험데이터 분석 (Analysis of Insurance Data Using a Zero-Inflated Model)

Choi, Jong-Hoo (최종후), Department of Informational Statistics, Korea University;
Ko, In-Mi (고인미), Department of Informational Statistics, Korea University;
Cheon, Soo-Young (전수영), Department of Informational Statistics, Korea University

Received: April 2011
Accepted: May 2011
Published: June 30, 2011

Abstract

Count data are observations that take only non-negative integer values, such as the numbers of car accidents, earthquakes, or insurance claims. The Poisson regression model is generally used for such data, but it is restricted by the assumption that the mean equals the variance. In practice, count data are often too dispersed for the Poisson model, because heterogeneity within groups makes the variance substantially larger than the mean (overdispersion). If overdispersion is not taken into account, the resulting parameter estimates are inefficient and their standard errors are unreliable. Because insurance centers on coverage, many insured events never become claims, so the number of claims is frequently zero. This paper considers zero-inflated models for count data containing many zeros and evaluates their performance on real data that exhibit both overdispersion and excess zeros. The results indicate that the zero-inflated negative binomial regression model performs best among the models compared.
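A standard formulation of the models the abstract compares is sketched below for reference. The notation (means μ_i, zero-inflation probabilities π_i, dispersion α, covariate vectors x_i and z_i with coefficients β and γ), the log and logit links, and the NB2 parameterization of the negative binomial are assumptions that follow common usage; the paper's own parameterization may differ.

```latex
% Poisson regression: equidispersion is built into the model
P(Y_i = y) = \frac{e^{-\mu_i}\,\mu_i^{y}}{y!}, \qquad
E(Y_i) = \operatorname{Var}(Y_i) = \mu_i, \qquad
\log \mu_i = \mathbf{x}_i^{\top}\boldsymbol{\beta}

% Zero-inflated model: a structural zero with probability \pi_i,
% otherwise a draw from a count density f(y; \mu_i)
P(Y_i = 0) = \pi_i + (1 - \pi_i)\, f(0; \mu_i), \qquad
P(Y_i = y) = (1 - \pi_i)\, f(y; \mu_i), \quad y = 1, 2, \dots, \qquad
\operatorname{logit}(\pi_i) = \mathbf{z}_i^{\top}\boldsymbol{\gamma}

% ZIP: f is Poisson.  ZINB: f is negative binomial (NB2) with dispersion \alpha > 0
f(y; \mu_i, \alpha) = \frac{\Gamma(y + \alpha^{-1})}{\Gamma(\alpha^{-1})\, y!}
  \left(\frac{1}{1 + \alpha\mu_i}\right)^{\alpha^{-1}}
  \left(\frac{\alpha\mu_i}{1 + \alpha\mu_i}\right)^{y}, \qquad
E_f(Y) = \mu_i, \quad \operatorname{Var}_f(Y) = \mu_i + \alpha\mu_i^{2}

% Moments of the ZINB model
E(Y_i) = (1 - \pi_i)\,\mu_i, \qquad
\operatorname{Var}(Y_i) = (1 - \pi_i)\,\mu_i\,(1 + \pi_i\mu_i + \alpha\mu_i)
```

Letting α → 0 recovers the ZIP variance (1 − π_i)μ_i(1 + π_iμ_i), which already exceeds the mean whenever π_i > 0: excess zeros alone produce overdispersion, and α absorbs any additional heterogeneity in the count component.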

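Count data are data whose response variable is a non-negative count, such as the number of car accidents, the number of earthquakes, or the number of insurance claims. The Poisson regression model is mainly used in such cases, but it is restricted to situations in which the mean and the variance are equal. Empirical data often show overdispersion, in which the variance is very large because of heterogeneity between groups; ignoring it biases the regression coefficients or standard errors. Because insurance is strongly protection-oriented, in many cases no claim is actually filed, so claim counts can take the value '0'. This paper considers the zero-inflated model for analyzing data with many zeros and compares the efficiency of several models using empirical data. The empirical analysis shows that the zero-inflated negative binomial regression model is the most efficient model for data exhibiting both overdispersion and zero inflation.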
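The two symptoms the abstract describes, overdispersion and excess zeros, can be checked directly, and the ZIP and ZINB probability functions written out, in a few lines of Python. The sketch below uses only the standard library; the function names, the NB2 parameterization, and the small claim-count sample are hypothetical illustrations, not the authors' code or data. It evaluates log-likelihoods at given parameter values and does not estimate them.

```python
import math
from statistics import mean, variance


def dispersion_ratio(counts):
    """Sample variance divided by sample mean; a Poisson model implies about 1."""
    return variance(counts) / mean(counts)  # assumes a nonzero mean


def excess_zero_share(counts):
    """Observed share of zeros minus the share a Poisson with the same mean predicts."""
    observed = counts.count(0) / len(counts)
    return observed - math.exp(-mean(counts))


def poisson_logpmf(y, mu):
    """log P(Y = y) for a Poisson distribution with mean mu > 0."""
    return -mu + y * math.log(mu) - math.lgamma(y + 1)


def nb2_logpmf(y, mu, alpha):
    """log P(Y = y) for NB2: mean mu, variance mu + alpha * mu**2, alpha > 0."""
    r = 1.0 / alpha
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))


def zero_inflated_logpmf(y, pi, count_logpmf):
    """Mix a point mass at zero (weight pi) with a count distribution (weight 1 - pi)."""
    if y == 0:
        return math.log(pi + (1.0 - pi) * math.exp(count_logpmf(0)))
    return math.log(1.0 - pi) + count_logpmf(y)


def zip_logpmf(y, mu, pi):
    return zero_inflated_logpmf(y, pi, lambda k: poisson_logpmf(k, mu))


def zinb_logpmf(y, mu, alpha, pi):
    return zero_inflated_logpmf(y, pi, lambda k: nb2_logpmf(k, mu, alpha))


def log_likelihood(counts, logpmf):
    """Sum of log-probabilities of the observed counts under one log-pmf."""
    return sum(logpmf(y) for y in counts)


if __name__ == "__main__":
    claims = [0, 0, 0, 0, 0, 0, 0, 1, 2, 5]  # hypothetical claim counts
    print(round(dispersion_ratio(claims), 2))   # 3.28: variance well above the mean
    print(round(excess_zero_share(claims), 2))  # 0.25: more zeros than Poisson predicts
    # Log-likelihoods at illustrative (not estimated) parameter values;
    # both models have mean 0.8, matching the sample mean.
    print(log_likelihood(claims, lambda y: poisson_logpmf(y, 0.8)))          # about -15.27
    print(log_likelihood(claims, lambda y: zinb_logpmf(y, 2.0, 0.5, 0.6)))   # about -11.37
```

In a regression setting, μ_i = exp(x_iᵀβ) and π_i = 1/(1 + exp(−z_iᵀγ)) vary by policyholder, and β, γ, and α are chosen to maximize the summed log-likelihood; statsmodels, for example, provides `ZeroInflatedPoisson` and `ZeroInflatedNegativeBinomialP` in `statsmodels.discrete.count_model` for that estimation step. Fitted models are then commonly compared with criteria such as AIC, although the abstract does not state which criteria the paper used.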

Cited by

The study on the determinants of the number of job changes, vol. 26, no. 2, 2015. https://doi.org/10.7465/jkdi.2015.26.2.387
