Bayesian analysis of finite mixture model with cluster-specific random effects

  • Lee, Hyejin (Department of Statistics, Duksung Women's University) ;
  • Kyung, Minjung (Department of Statistics, Duksung Women's University)
  • Received : 2016.09.22
  • Accepted : 2016.12.20
  • Published : 2017.02.28

Abstract

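Cluster analysis is used in many fields because it helps reveal the overall characteristics and structure of large data sets. The expectation-maximization (EM) algorithm defined in Dempster et al. (1977) is the most widely used clustering method. Finite mixtures of linear models are another widely used clustering approach, and Bayesian clustering was first applied by Bernardo and Girón (1988) to the case where only the cluster weight probabilities are unknown. In this study, rather than the usual finite mixture of linear models, we include cluster-specific random effects in the model and fit it with Gibbs sampling, a Bayesian analysis method. We describe the properties of the proposed model and its sampling scheme, and assess the usefulness of the model through simulation studies and real data analysis. We apply the model to the CO2 data of Hurn et al. (2003) and compare it with a model without random effects and with a subject-specific random effects model.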

Clustering algorithms attempt to find a partition of a finite set of objects into a potentially predetermined number of nonempty subsets. Gibbs sampling of a normal mixture of linear mixed regressions with a Dirichlet prior distribution calculates posterior probabilities when the number of clusters is known. Our approach provides simultaneous partitioning and parameter estimation, together with the computation of classification probabilities. A Monte Carlo study of curve estimation showed that the model is useful for function estimation. Examples are given to show how these models perform on real data.
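
As a rough illustration of the sampler described above, the following Python sketch shows two of the conditional draws in one Gibbs sweep for a K-component mixture of simple linear regressions: the classification (allocation) probabilities and the Dirichlet update of the cluster weights. It is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the single covariate, and the symmetric Dirichlet parameter `alpha` are hypothetical choices made here, and the updates for the regression coefficients, variances, and cluster-specific random effects are omitted for brevity.

```python
import math
import random


def log_normal_pdf(y, mean, sd):
    """Log density of N(mean, sd^2) evaluated at y."""
    z = (y - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def classification_probs(x, y, weights, betas, sds):
    """P(z = k | x, y, parameters) for each cluster k.

    Assumption: cluster k has mean b0 + b1 * x with betas[k] = (b0, b1);
    random effects are left out of this sketch.
    """
    logs = [
        math.log(w) + log_normal_pdf(y, b0 + b1 * x, s)
        for w, (b0, b1), s in zip(weights, betas, sds)
    ]
    top = max(logs)  # subtract the maximum to avoid underflow
    unnorm = [math.exp(v - top) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def sample_allocations(xs, ys, weights, betas, sds, rng):
    """Draw one cluster label per observation from its classification probabilities."""
    labels = []
    for x, y in zip(xs, ys):
        probs = classification_probs(x, y, weights, betas, sds)
        labels.append(rng.choices(range(len(probs)), weights=probs)[0])
    return labels


def sample_weights(labels, n_clusters, alpha, rng):
    """Draw weights from Dirichlet(alpha + n_1, ..., alpha + n_K).

    Uses the standard construction of a Dirichlet draw from
    normalized independent gamma variates.
    """
    counts = [labels.count(k) for k in range(n_clusters)]
    gammas = [rng.gammavariate(alpha + c, 1.0) for c in counts]
    total = sum(gammas)
    return [g / total for g in gammas]
```

In a full sampler these two steps would alternate with draws of the remaining parameters, and averaging the classification probabilities over iterations would give posterior cluster membership probabilities for each observation.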

References

  1. Banfield, J. D. and Raftery, A. E. (1993). Model-based Gaussian and non-Gaussian clustering, Biometrics, 49, 803-821. https://doi.org/10.2307/2532201
  2. Bensmail, H., Celeux, G., Raftery, A. E., and Robert, C. P. (1997). Inference in model-based cluster analysis, Statistics and Computing, 7, 1-10. https://doi.org/10.1023/A:1018510926151
  3. Bernardo, J. M. and Girón, F. J. (1988). A Bayesian analysis of simple mixture problems. In Bayesian Statistics 3, Bernardo, J. M., DeGroot, M. H., Lindley, D. V., and Smith, A. F. M. (Eds), Clarendon, New York, 67-78.
  4. Cao, G. and West, M. (1996). Practical Bayesian inference using mixtures of mixtures, Biometrics, 52, 1334-1341. https://doi.org/10.2307/2532848
  5. Carlin, B. and Chib, S. (1995). Bayesian model choice via Markov chain Monte Carlo methods, Journal of the Royal Statistical Society Series B (Methodological), 57, 473-484.
  6. Dasgupta, A. and Raftery, A. E. (1998). Detecting features in spatial point processes with clutter via model-based clustering, Journal of the American Statistical Association, 93, 294-302. https://doi.org/10.1080/01621459.1998.10474110
  7. Dellaportas, P. (1998). Bayesian classification of neolithic tools, Applied Statistics, 47, 279-297.
  8. Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society Series B (Methodological), 39, 1-38.
  9. De Veaux, R. D. (1989). Mixtures of linear regressions, Computational Statistics & Data Analysis, 8, 227-245. https://doi.org/10.1016/0167-9473(89)90043-1
  10. Diebolt, J. and Robert, C. (1990). Bayesian estimation of finite mixture distributions, part II: sampling implementation, Technical Report 111, LSTA, Université Paris VI, Paris.
  11. Diebolt, J. and Robert, C. (1994). Estimation of finite mixture distributions through Bayesian sampling, Journal of the Royal Statistical Society Series B (Methodological), 56, 363-375.
  12. Escobar, M. and West, M. (1995). Bayesian density estimation and inference using mixtures, Journal of the American Statistical Association, 90, 577-588. https://doi.org/10.1080/01621459.1995.10476550
  13. Fraley, C. and Raftery, A. E. (2002). Model-based clustering, discriminant analysis, and density estimation, Journal of the American Statistical Association, 97, 611-631. https://doi.org/10.1198/016214502760047131
  14. Frühwirth-Schnatter, S. (2005). Finite Mixture and Markov Switching Models, Springer Science & Business Media, New York.
  15. Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B. (1995). Bayesian Data Analysis (pp. 526), Chapman and Hall, Boca Raton, FL.
  16. Geoffery, M. and David, P. (2000). Finite Mixture Models, John Wiley & Sons, New York.
  17. Hurn, M., Justel, A., and Robert, C. P. (2003). Estimating mixtures of regressions, Journal of Computational and Graphical Statistics, 12, 55-79. https://doi.org/10.1198/1061860031329
  18. Kyung, M. (2015). Dirichlet process mixtures of linear mixed regressions, Communications for Statistical Applications and Methods, 22, 625-637. https://doi.org/10.5351/CSAM.2015.22.6.625
  19. McLachlan, G. J. and Basford, K. E. (1988). Mixture Models: Inference and Applications to Clustering, Marcel Dekker, New York.
  20. McLachlan, G. J. and Peel, D. (2000). Finite Mixture Models, Wiley, New York.
  21. Mengersen, K. and Robert, C. (1996). Testing for mixtures: a Bayesian entropic approach. In Bayesian Statistics 5, Proceedings of the Fifth Valencia International Meeting, Bernardo, J. M., Berger, J. O., Dawid, A. P., and Smith, A. F. M. (Eds), Oxford University Press, Oxford, 255-276.
  22. Phillips, D. B. and Smith, A. F. M. (1996). Bayesian model comparison via jump diffusions. In Markov Chain Monte Carlo in Practice, Gilks, W. R., Richardson, S., and Spiegelhalter, D. J. (Eds), Chapman and Hall, London, 215-239.
  23. Quandt, R. E. (1958). The estimation of the parameters of a linear regression system obeying two separate regimes, Journal of the American Statistical Association, 53, 873-880. https://doi.org/10.1080/01621459.1958.10501484
  24. Quandt, R. E. and Ramsey, J. B. (1978). Estimating mixtures of normal distributions and switching regressions, Journal of the American Statistical Association, 73, 730-738. https://doi.org/10.1080/01621459.1978.10480085
  25. Raftery, A. E. (1996). Approximate Bayes factors and accounting for model uncertainty in generalized linear models, Biometrika, 83, 251-266. https://doi.org/10.1093/biomet/83.2.251
  26. Richardson, S. and Green, P. J. (1997). On Bayesian analysis of mixtures with an unknown number of components (with discussion), Journal of the Royal Statistical Society Series B (Statistical Methodology), 59, 731-792. https://doi.org/10.1111/1467-9868.00095
  27. Robert, C. P. (1996). Mixtures of distributions: inference and estimation. In Markov Chain Monte Carlo in Practice, Gilks, W. R., Richardson, S., and Spiegelhalter, D. J. (Eds), Chapman and Hall, London, 441-464.
  28. Robert, C. P. and Mengersen, K. L. (1999). Reparameterization issues in mixture modelling and their bearings on MCMC algorithms, Computational Statistics & Data Analysis, 29, 325-343. https://doi.org/10.1016/S0167-9473(98)00058-9
  29. Roeder, K. and Wasserman, L. (1997). Practical density estimation using mixtures of normals, Journal of the American Statistical Association, 92, 894-902. https://doi.org/10.1080/01621459.1997.10474044
  30. Scott, A. J. and Symons, M. J. (1971). Clustering methods based on likelihood ratio criteria, Biometrics, 27, 387-389. https://doi.org/10.2307/2529003
  31. Smith, A. F. M. and Roberts, G. O. (1993). Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods, Journal of the Royal Statistical Society Series B (Methodological), 55, 3-23.
  32. Vounatsou, P., Smith, T., and Smith, A. F. M. (1998). Bayesian analysis of two-component mixture distributions applied to estimating malaria attributable fractions, Applied Statistics, 47, 575-587.
  33. West, M. (1992). Modelling with mixtures. In Bayesian Statistics 4, Proceedings of the Fourth Valencia International Meeting, Bernardo, J. M., Berger, J. O., Dawid, A. P., and Smith, A. F. M. (Eds), Oxford University Press, Oxford, 503-524.
  34. West, M., Müller, P., and Escobar, M. D. (1994). Hierarchical priors and mixture models with application in regression and density estimation. In Aspects of Uncertainty: A Tribute to D. V. Lindley, Smith, A. F. M. and Freeman, P. (Eds), Wiley, New York, 363-386.
  35. Yu, J. Z. and Tanner, M. A. (1999). An analytical study of several Markov chain Monte Carlo estimators of the marginal likelihood, Journal of Computational and Graphical Statistics, 8, 839-853.