Bayesian Clustering of Prostate Cancer Patients by Using a Latent Class Poisson Model

Bayesian Grouping of Prostate Cancer Patients Using a Latent Class Poisson Model

  • Oh Man-Suk (Department of Statistics, Ewha Womans University)
  • Published : 2005.03.01

Abstract

Latent class models have recently been considered by many researchers and practitioners as a tool for identifying heterogeneous segments or groups in a population and for grouping objects into those segments. In this paper we consider data on prostate cancer patients from the Korean National Cancer Institute and propose a method for grouping prostate cancer patients by using a latent class Poisson model. A Bayesian approach equipped with a Markov chain Monte Carlo method is used to overcome the limitations of classical likelihood approaches. Advantages of the proposed Bayesian method are easy estimation of parameters with their standard errors, segmentation of objects into groups, and provision of uncertainty measures for the segmentation. In addition, we provide a method to determine an appropriate number of segments for the given data, so that the method automatically chooses the number of segments and partitions objects into heterogeneous segments.

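Many researchers and practitioners have recently turned to the latent class model as one way to uncover heterogeneity among the different groups (classes, segments) latent in a population and to partition objects into those groups. Based on data on the number of prostate cancer deaths by age in Korea reported to the National Cancer Center in 2000, this paper groups prostate cancer patients by age using a latent class Poisson model. To overcome the limitations of classical inference methods such as maximum likelihood estimation, we propose a Bayesian estimation method that uses Markov chain Monte Carlo (MCMC) as its tool. The advantages of the proposed Bayesian method are that parameters are easy to estimate and their estimation errors are provided, and that each object's group membership can be determined together with its associated uncertainty, namely the probability that the object belongs to each cluster. We also present a method for determining the most suitable number of groups for the given data, so that the number of groups and the basis for segmentation need not be supplied in advance but are determined automatically from the information in the data.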
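
To make the approach in the abstract concrete, the following is a minimal sketch of a Gibbs sampler for a K-class Poisson mixture, written with the Python standard library only. It is not the paper's implementation. The function name `gibbs_poisson_mixture`, the Gamma(a, b) prior on each class rate, the symmetric Dirichlet(alpha) prior on the class weights, and the ordering of classes by rate to limit label switching are all assumptions made for illustration. The sketch also ignores any population or exposure offsets that age-specific death counts might call for, and it does not choose the number of classes.

```python
import math
import random


def gibbs_poisson_mixture(counts, n_classes, n_iter=2000, burn_in=500,
                          a=1.0, b=0.1, alpha=1.0, seed=0):
    """Gibbs sampler for a K-class Poisson mixture (illustrative sketch).

    Assumed priors: rate_k ~ Gamma(shape=a, rate=b),
    weights ~ Dirichlet(alpha, ..., alpha).
    Returns (posterior mean rates, posterior mean weights,
    per-observation class membership probabilities).
    """
    if burn_in >= n_iter:
        raise ValueError("burn_in must be smaller than n_iter")
    rng = random.Random(seed)
    n, K = len(counts), n_classes

    # Start with rates spread across the observed range, equal weights.
    lo, hi = min(counts), max(counts)
    rates = [lo + (hi - lo) * (k + 0.5) / K + 1e-3 for k in range(K)]
    weights = [1.0 / K] * K

    rate_sum = [0.0] * K
    weight_sum = [0.0] * K
    member_count = [[0] * K for _ in range(n)]
    kept = 0

    for it in range(n_iter):
        # Step 1: draw each latent class label given rates and weights.
        sizes = [0] * K
        totals = [0] * K
        labels = []
        for y in counts:
            logp = [math.log(max(weights[k], 1e-300))
                    + y * math.log(max(rates[k], 1e-300)) - rates[k]
                    for k in range(K)]
            top = max(logp)
            probs = [math.exp(v - top) for v in logp]
            z = rng.choices(range(K), weights=probs)[0]
            labels.append(z)
            sizes[z] += 1
            totals[z] += y

        # Step 2: draw rates from their conjugate Gamma posteriors.
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        rates = [rng.gammavariate(a + totals[k], 1.0 / (b + sizes[k]))
                 for k in range(K)]

        # Step 3: draw weights from the Dirichlet posterior
        # by normalising independent Gamma draws.
        g = [rng.gammavariate(alpha + sizes[k], 1.0) for k in range(K)]
        g_total = sum(g)
        weights = [v / g_total for v in g]

        # Relabel classes in increasing order of rate (assumed constraint).
        order = sorted(range(K), key=lambda k: rates[k])
        rank = [0] * K
        for new, old in enumerate(order):
            rank[old] = new
        rates = [rates[old] for old in order]
        weights = [weights[old] for old in order]
        labels = [rank[z] for z in labels]

        # Accumulate posterior summaries after burn-in.
        if it >= burn_in:
            kept += 1
            for k in range(K):
                rate_sum[k] += rates[k]
                weight_sum[k] += weights[k]
            for i, z in enumerate(labels):
                member_count[i][z] += 1

    rate_mean = [s / kept for s in rate_sum]
    weight_mean = [s / kept for s in weight_sum]
    membership = [[c / kept for c in row] for row in member_count]
    return rate_mean, weight_mean, membership
```

Calling `gibbs_poisson_mixture(counts, n_classes=2)` on a list of non-negative integer counts returns the estimated class rates, the class weights, and, for each observation, the fraction of retained draws in which it was assigned to each class. That last quantity plays the role of the membership probability the abstract describes as the uncertainty measure for the segmentation.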
