Ordering Variables and Categories on the Mosaic Plot

  • Published: 2008.10.31

Abstract

Mosaic plots, proposed by Hartigan and Kleiner (1981, 1984), are a very useful tool for visualizing and exploring categorical data. In a mosaic plot, the cell frequencies of a multi-way classification are represented by rectangles whose areas are proportional to those frequencies. The plot is therefore easy to understand while preserving the information contained in the data. Its appearance, however, changes substantially depending on the order of the variables and the order of the categories within each variable. In this study, we propose algorithms for ordering the variables and categories of categorical data to be explored via mosaic plots: variables are ordered using Cramér's V coefficient, and categories are ordered using the gamma coefficient. We apply our methods to three well-known public datasets: Titanic, Housing and PreSex.
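The two association measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' ordering algorithm, which the abstract does not spell out; it only computes the textbook Cramér's V and Goodman-Kruskal gamma for a two-way table. The function names `cramers_v` and `goodman_kruskal_gamma` and the example counts are hypothetical, chosen for illustration, and the sketch assumes every row and column total is positive.

```python
from math import sqrt


def cramers_v(table):
    """Cramér's V for a two-way table given as a list of rows of counts.

    Assumes all row and column totals are positive and that the table
    has at least two rows and two columns.
    """
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_tot), len(col_tot)) - 1
    return sqrt(chi2 / (n * k))


def goodman_kruskal_gamma(table):
    """Goodman-Kruskal gamma, treating row and column order as ordinal.

    Assumes at least one concordant or discordant pair exists.
    """
    concordant = discordant = 0
    rows, cols = len(table), len(table[0])
    for i in range(rows):
        for j in range(cols):
            for k in range(i + 1, rows):
                for m in range(cols):
                    if m > j:
                        concordant += table[i][j] * table[k][m]
                    elif m < j:
                        discordant += table[i][j] * table[k][m]
    return (concordant - discordant) / (concordant + discordant)


# Illustrative counts only (not taken from Titanic, Housing or PreSex).
example = [[10, 20], [30, 40]]
reordered = example[::-1]  # same data with the row categories swapped

print(cramers_v(example), cramers_v(reordered))
print(goodman_kruskal_gamma(example), goodman_kruskal_gamma(reordered))
```

Swapping the row categories leaves Cramér's V unchanged but flips the sign of gamma, which is consistent with using an order-free measure to compare variables and an order-sensitive one to compare category arrangements.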

References

  1. Bickel, P. J., Hammel, E. A. and O'Connell, J. W. (1975). Sex bias in graduate admissions: Data from Berkeley, Science, 187, 398-403 https://doi.org/10.1126/science.187.4175.398
  2. Cramér, H. (1946). Mathematical Methods of Statistics, Princeton University Press, Princeton
  3. Friendly, M. (1994). Mosaic displays for multi-way contingency tables, Journal of the American Statistical Association, 89, 190-200 https://doi.org/10.2307/2291215
  4. Garson, G. D. (2008). Nominal association: Phi, contingency coefficient, Tschuprow's T, Cramer's V, lambda, uncertainty coefficient, Statnotes: Topics in Multivariate Analysis, Retrieved from http://www2.chass.ncsu.edu/garson/pa765/statnote.htm 06/25/2008
  5. Goodman, L. A. and Kruskal, W. H. (1979). Measures of Association for Cross Classifications, Springer-Verlag, New York
  6. Greenacre, M. J. (1984). Theory and Applications of Correspondence Analysis, Academic Press, London
  7. Hartigan, J. A. and Kleiner, B. (1981). Mosaics for contingency tables, In Computer Science and Statistics: Proceedings of the 13th Symposium on the Interface (ed. by W. F. Eddy), Springer-Verlag, New York, 268-273
  8. Hartigan, J. A. and Kleiner, B. (1984). A mosaic of television ratings, The American Statistician, 38, 32-35 https://doi.org/10.2307/2683556
  9. Huh, M. Y. (2004). Line mosaic plot: Algorithm and implementation, COMPSTAT 2004 Symposium, Physica-Verlag/Springer
  10. Hurley, C. B. (2004). Clustering visualizations of multidimensional data, Journal of Computational & Graphical Statistics, 13, 788-806 https://doi.org/10.1198/106186004X12425
  11. Madsen, M. (1976). Statistical analysis of multiple contingency tables: Two examples, Scandinavian Journal of Statistics, 3, 97-106
  12. Thornes, B. and Collard, J. (1979). Who Divorces?, Routledge and Kegan, London
  13. van der Heijden, P. G. M. and de Leeuw, J. (1985). Correspondence analysis used complementary to log-linear analysis, Psychometrika, 50, 429-447 https://doi.org/10.1007/BF02296262