Multivariate empirical distribution plot and goodness-of-fit test

  • Hong, Chong Sun (Department of Statistics, Sungkyunkwan University) ;
  • Park, Yongho (Department of Statistics, Sungkyunkwan University) ;
  • Park, Jun (Department of Statistics, Sungkyunkwan University)
  • Received : 2017.06.14
  • Accepted : 2017.07.26
  • Published : 2017.08.31

Abstract

A multivariate empirical distribution function can be defined when the distribution function of the multivariate data is known or can be estimated. Bivariate empirical distribution functions can be visualized with the step plot and the quantile plot. In this paper, we propose the multivariate empirical distribution plot, which represents the multivariate empirical distribution function on the unit square. Empirical distribution plots drawn for various multivariate normal distributions and other specific distributions show that the plot is sensitive to the assumed distribution function and to its correlation coefficients. Based on this finding, we propose goodness-of-fit tests for the assumed multivariate distribution using five test statistics, and we obtain their critical values by Monte Carlo simulation. These critical values differ little from those given in textbooks. We therefore conclude that the proposed test statistics can easily be used with the known critical values.

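If the distribution function of multivariate data is known or can be estimated, a multivariate empirical distribution function can be defined. In the bivariate case, the empirical distribution function can be visualized with the step plot and the quantile plot; in this study, we propose the multivariate empirical distribution plot, which represents the empirical distribution function on a square in the multivariate case. We drew empirical distribution plots for several kinds of multivariate normal distributions and for specific distributions, examined their features, and found that the plot responds sensitively to the distribution function, including its various variance-covariance matrices. Based on this, we propose a goodness-of-fit test for the multivariate distribution function assumed when obtaining the empirical distribution function. We use five representative goodness-of-fit test methods and obtain the rejection region of each test statistic for various distribution functions. The rejection regions obtained in this study are found not to differ much from those available in the literature. Therefore, the proposed goodness-of-fit tests have the advantage that they can easily be used with the rejection regions given in the literature.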
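
The abstract reports that critical values were obtained by Monte Carlo simulation and turned out close to textbook values. The following Python sketch illustrates that kind of comparison in the simplest possible setting and is not the paper's multivariate construction. It assumes, for illustration only, that under the null hypothesis the values being tested behave like a Uniform(0, 1) sample, and it uses the Kolmogorov-Smirnov statistic as one example of an EDF statistic. The function names, the sample size `n = 50`, the number of replications, and the seed are all hypothetical choices. The reference value 1.358 is the commonly tabulated 5% point for the modified statistic D(sqrt(n) + 0.12 + 0.11/sqrt(n)).

```python
import numpy as np


def ks_statistic(u):
    """Kolmogorov-Smirnov distance between the EDF of u and the Uniform(0, 1) CDF."""
    u = np.sort(np.asarray(u, dtype=float))
    n = u.size
    i = np.arange(1, n + 1)
    d_plus = np.max(i / n - u)         # largest gap where the EDF lies above the CDF
    d_minus = np.max(u - (i - 1) / n)  # largest gap where the EDF lies below the CDF
    return max(d_plus, d_minus)


def monte_carlo_critical_value(n, alpha=0.05, reps=20000, seed=0):
    """Estimate the upper-alpha critical value of the KS statistic by simulation."""
    rng = np.random.default_rng(seed)
    samples = rng.random((reps, n))  # reps independent Uniform(0, 1) samples of size n
    stats = np.array([ks_statistic(row) for row in samples])
    return float(np.quantile(stats, 1.0 - alpha))


n = 50  # illustrative sample size (an assumption, not taken from the paper)
simulated = monte_carlo_critical_value(n)

# Rescale so the simulated value can be compared with the tabulated 5% point.
modified = simulated * (np.sqrt(n) + 0.12 + 0.11 / np.sqrt(n))
print(f"simulated 5% critical value of D_n: {simulated:.4f}")
print(f"modified statistic: {modified:.3f} (tabulated 5% point: 1.358)")
```

If the simulated and tabulated values agree closely, as the abstract reports for the proposed statistics, a practitioner can use the published tables directly instead of rerunning the simulation.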
