Compositional data analysis by the square-root transformation: Application to NBA USG% data

  • Jeseok Lee (Department of Statistics, Kyungpook National University)
  • Byungwon Kim (Department of Statistics, Kyungpook National University)
  • Received: 2023.10.21
  • Accepted: 2024.02.05
  • Published: 2024.05.31

Abstract

Compositional data are data whose components sum to a constant. Their sample space is therefore a simplex, and statistical methods developed for the usual Euclidean vector space cannot be applied to them directly. A natural way to overcome this restriction is to apply a transformation that maps the sample space onto a Euclidean space; log-ratio-type transformations, such as the additive log-ratio (ALR), the centered log-ratio (CLR), and the isometric log-ratio (ILR) transformations, are the most commonly used. However, in sparse settings, where some components take exact zero values, these log-ratio-type transformations may not be effective, because the logarithm of zero is undefined. In this work, we propose an alternative, the square-root transformation, which maps the original sample space onto the directional space: for a composition whose parts sum to one, the square roots of the parts form a unit vector, so each transformed observation lies on the unit hypersphere. We compare the square-root transformation with the log-ratio-type transformations through a simulation study and a real data example. In the real data example, we apply both types of transformations to NBA usage percentage (USG%) data and cluster the transformed data with DBSCAN (density-based spatial clustering of applications with noise), a density-based clustering method, to present the results.
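The contrast drawn in the abstract can be illustrated with a minimal Python sketch (standard library only) on hypothetical three-part compositions, not the NBA USG% data. It assumes closure to a unit sum before taking square roots, uses the CLR as a representative log-ratio transformation (the ALR and ILR also take logarithms of the parts, so they share the zero problem), and clusters the transformed points with a small hand-written DBSCAN under the great-circle (angular) distance. The distance choice, `eps=0.2`, `min_pts=3`, the data, and all function names are illustrative assumptions, not the settings or code used in the paper.

```python
import math

def sqrt_transform(x):
    """Square-root transformation: close the composition to a unit sum, then
    take elementwise square roots. The result is a unit vector, i.e. a point
    on the positive orthant of the unit hypersphere. Zero parts are allowed."""
    total = sum(x)
    return [math.sqrt(v / total) for v in x]

def clr(x):
    """Centered log-ratio: log of each part minus the mean log (the log of the
    geometric mean). Undefined when any part is zero, since log(0) is undefined."""
    if any(v <= 0 for v in x):
        raise ValueError("CLR undefined: composition contains a zero part")
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [lv - mean_log for lv in logs]

def angular_distance(u, v):
    """Great-circle distance (in radians) between two unit vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.acos(max(-1.0, min(1.0, dot)))  # clamp against rounding error

def dbscan(points, eps, min_pts, dist):
    """Minimal DBSCAN. Returns one label per point: a cluster id >= 0,
    or -1 for noise. A point's eps-neighborhood includes the point itself."""
    n = len(points)
    labels = [None] * n  # None means "not yet visited"

    def region(i):
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later be claimed as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue  # already assigned; do not expand from it
            labels[j] = cluster
            neighbors = region(j)
            if len(neighbors) >= min_pts:
                queue.extend(neighbors)  # j is also a core point: keep expanding
    return labels

# Hypothetical three-part compositions (illustrative only, not NBA USG% data).
comps = [
    [0.70, 0.20, 0.10],
    [0.68, 0.22, 0.10],
    [0.72, 0.18, 0.10],
    [0.02, 0.20, 0.78],
    [0.01, 0.22, 0.77],
    [0.00, 0.21, 0.79],  # exact zero part
    [0.34, 0.33, 0.33],
]

try:
    clr(comps[5])
except ValueError as err:
    print("CLR:", err)

points = [sqrt_transform(c) for c in comps]
labels = dbscan(points, eps=0.2, min_pts=3, dist=angular_distance)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

The zero in the sixth composition is harmless for the square-root map but makes `clr` fail. Because every transformed point has unit norm, the angle between two points is a natural dissimilarity; here the two groups of similar compositions form clusters 0 and 1, and the balanced composition is labeled noise (-1). In practice a library implementation, such as scikit-learn's `DBSCAN` with a precomputed distance matrix, would be the more likely choice.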

Acknowledgement

This research was supported by Kyungpook National University Research Fund, 2021.
