A guideline for the statistical analysis of compositional data in immunology

  • Yoo, Jinkyung (Department of Statistics, Kyungpook National University)
  • Sun, Zequn (Department of Preventive Medicine - Biostatistics, Northwestern University)
  • Greenacre, Michael (Department of Economics and Business, Universitat Pompeu Fabra, and Barcelona School of Management)
  • Ma, Qin (Department of Biomedical Informatics, The Ohio State University)
  • Chung, Dongjun (Department of Biomedical Informatics, The Ohio State University)
  • Kim, Young Min (Department of Statistics, Kyungpook National University)
  • Received: 2021.12.28
  • Accepted: 2022.04.28
  • Published: 2022.07.31

Abstract

The study of immune cellular composition has attracted great scientific interest in immunology, driven by the generation of multiple large-scale datasets. From a statistical point of view, such immune cellular data should be treated as compositional. In compositional data, each element is positive and all elements sum to a constant, which can be set to one without loss of generality. Standard statistical methods are not directly applicable to compositional data because they do not appropriately handle the correlations between compositional elements. In this paper, we review statistical methods for compositional data analysis and illustrate them in the context of immunology. Specifically, we focus on regression analyses using log-ratio transformations and on the alternative approach of Dirichlet regression, discuss their theoretical foundations, and illustrate their application to immune cellular fraction data generated from colorectal cancer patients.
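
As a minimal illustration of the log-ratio idea mentioned above, the following Python sketch closes a table of strictly positive parts so that each row sums to one and then applies the centered and additive log-ratio transformations. The function names (closure, clr, alr) and the numeric fractions are illustrative assumptions and are not taken from the paper or from any particular package; the sketch also assumes that no part is zero, since the logarithm of zero is undefined.

```python
import numpy as np


def closure(x):
    """Rescale each row of positive parts so that it sums to one."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)


def clr(x):
    """Centered log-ratio: log of each part minus the row mean of the logs."""
    log_x = np.log(closure(x))
    return log_x - log_x.mean(axis=-1, keepdims=True)


def alr(x, ref=-1):
    """Additive log-ratio: log of each part relative to one reference part."""
    log_x = np.log(closure(x))
    others = np.delete(log_x, ref, axis=-1)  # drop the reference column
    return others - log_x[..., [ref]]        # subtract the reference log


# Hypothetical cell-type fractions for two samples (made-up numbers).
fractions = np.array([[0.50, 0.30, 0.20],
                      [0.10, 0.60, 0.30]])

clr_values = clr(fractions)  # same shape as the input; each row sums to zero
alr_values = alr(fractions)  # one column fewer than the input
```

The transformed values can then be passed to standard regression tools. Because both transformations depend only on ratios between parts, rescaling a sample (for example, from fractions to percentages) leaves the result unchanged.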

Acknowledgement

This work was supported by the National Institutes of Health (grant numbers R01-GM122078, R21-CA209848, U01-DA045300), awarded to Dongjun Chung, and by the Human Resources Program in Energy Technology of the Korea Institute of Energy Technology Evaluation and Planning (KETEP), with financial resources granted by the Ministry of Trade, Industry & Energy, Republic of Korea (No. 20204010600060), awarded to Young Min Kim. The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.
