• Title/Summary/Keyword: compositional data

A guideline for the statistical analysis of compositional data in immunology

  • Yoo, Jinkyung;Sun, Zequn;Greenacre, Michael;Ma, Qin;Chung, Dongjun;Kim, Young Min
    • Communications for Statistical Applications and Methods / v.29 no.4 / pp.453-469 / 2022
  • The study of immune cellular composition has been of great scientific interest in immunology because multiple large-scale data sets have been generated. From the statistical point of view, such immune cellular data should be treated as compositional. In compositional data, each element is positive, and all the elements sum to a constant, which can generally be set to one. Standard statistical methods are not directly applicable to the analysis of compositional data because they do not appropriately handle correlations between the compositional elements. In this paper, we review statistical methods for compositional data analysis and illustrate them in the context of immunology. Specifically, we focus on regression analyses using log-ratio transformations and the alternative approach using Dirichlet regression analysis, discuss their theoretical foundations, and illustrate their applications with immune cellular fraction data generated from colorectal cancer patients.
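
As a rough illustration of the log-ratio idea mentioned in this abstract, the sketch below applies the centered log-ratio (CLR) transformation to a single composition. The function names and the sample values are hypothetical and are not taken from the paper, which may use a different log-ratio transformation.

```python
import math

def closure(parts):
    """Rescale positive parts so that they sum to one."""
    total = sum(parts)
    return [p / total for p in parts]

def clr(composition):
    """Centered log-ratio: log of each part minus the mean of the logs."""
    logs = [math.log(p) for p in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

# Hypothetical cell-type fractions for one sample (illustrative values only).
fractions = closure([30.0, 50.0, 20.0])
coords = clr(fractions)
```

The CLR coordinates sum to zero and do not depend on the constant the parts were scaled to, which is why the raw counts and the closed fractions give the same coordinates.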

Binary classification on compositional data

  • Joo, Jae Yun;Lee, Seokho
    • Communications for Statistical Applications and Methods / v.28 no.1 / pp.89-97 / 2021
  • Because of their boundedness and sum constraint, compositional data are often transformed by a log-ratio transformation, and the transformed data are then fed into traditional binary classification or discriminant analysis. However, it may be problematic to directly apply traditional multivariate approaches to the transformed data because the class distributions are not Gaussian and the Bayes decision boundaries are not polynomial in the transformed space. In this study, we propose applying flexible classification approaches to the transformed data for compositional data classification. Empirical studies using synthetic and real examples demonstrate that flexible approaches outperform traditional multivariate classification or discriminant analysis.

Ranking subjects based on paired compositional data with application to age-related hearing loss subtyping

  • Nam, Jin Hyun;Khatiwada, Aastha;Matthews, Lois J.;Schulte, Bradley A.;Dubno, Judy R.;Chung, Dongjun
    • Communications for Statistical Applications and Methods / v.27 no.2 / pp.225-239 / 2020
  • Analysis approaches for single compositional data are well established; however, effective analysis strategies for paired compositional data remain to be investigated. The current project was motivated by studies of age-related hearing loss (presbyacusis), where subjects are classified into four audiometric phenotypes and need to be ranked within these phenotypes based on their paired compositional data. We address this challenge by formulating it as a classification problem and integrating a penalized multinomial logistic regression model with compositional data analysis approaches. We use the Elastic Net as the penalty function, while considering average, absolute difference, and perturbation operators for compositional data. We applied the proposed approach to the presbyacusis study of 532 subjects with probabilities that each ear of a subject belongs to each of four presbyacusis subtypes. We further investigated the ranking of presbyacusis subjects using the proposed approach based on previous literature. The data analysis results indicate that the proposed approach is effective for ranking subjects based on paired compositional data.
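
The perturbation operator named in this abstract is a standard operation on compositions: multiply the parts element-wise and rescale to sum to one. The sketch below shows only that general definition. How the paper combines the two ears of a subject is not stated here, so the pairing and the probability values are assumptions made for illustration.

```python
def closure(parts):
    """Rescale positive parts so that they sum to one."""
    total = sum(parts)
    return [p / total for p in parts]

def perturb(x, y):
    """Perturbation of two compositions: element-wise product, then closure."""
    return closure([a * b for a, b in zip(x, y)])

# Hypothetical subtype probabilities for the two ears of one subject.
left_ear = [0.50, 0.30, 0.15, 0.05]
right_ear = [0.40, 0.40, 0.10, 0.10]
combined = perturb(left_ear, right_ear)

# The uniform composition is the neutral element of perturbation.
uniform = [0.25, 0.25, 0.25, 0.25]
unchanged = perturb(left_ear, uniform)
```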

Comparison of Methods for Reducing the Dimension of Compositional Data with Zero Values

  • Song, Taeg-Youn;Choi, Byung-Jin
    • Communications for Statistical Applications and Methods / v.19 no.4 / pp.559-569 / 2012
  • Compositional data consist of compositions that are non-negative vectors of proportions with the unit-sum constraint. In disciplines such as petrology and archaeometry, it is fundamental to statistically analyze this type of data. Aitchison (1983) introduced a log-contrast principal component analysis that involves log-ratio transformed data, as a dimension-reduction technique to understand and interpret the structure of compositional data. However, the analysis is not usable when zero values are present in the data. In this paper, we introduce four possible methods to reduce the dimension of compositional data with zero values. Two real data sets are analyzed using the methods and the obtained results are compared.
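
To make the zero-value problem concrete, the sketch below shows that the logarithm needed by a log-ratio transformation fails on a zero part, and then applies multiplicative zero replacement, one common workaround. This is not necessarily one of the four methods compared in the paper. The function name, the `delta` value, and the sample composition are assumptions.

```python
import math

def multiplicative_replacement(composition, delta=1e-3):
    """Replace zeros by a small delta and shrink the nonzero parts so that
    the result still sums to one. Assumes the input already sums to one."""
    n_zero = sum(1 for p in composition if p == 0)
    return [delta if p == 0 else p * (1 - n_zero * delta) for p in composition]

# Hypothetical composition with one zero part.
sample = [0.6, 0.4, 0.0]

try:
    math.log(sample[2])        # the log of zero is undefined
    log_failed = False
except ValueError:
    log_failed = True

replaced = multiplicative_replacement(sample)
```

The replacement keeps the ratios between the nonzero parts unchanged, so log-ratios among those parts are unaffected by the adjustment.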

Statistical analysis of metagenomics data

  • Calle, M. Luz
    • Genomics & Informatics / v.17 no.1 / pp.6.1-6.9 / 2019
  • Understanding the role of the microbiome in human health and how it can be modulated is becoming increasingly relevant for preventive medicine and for the medical management of chronic diseases. The development of high-throughput sequencing technologies has boosted microbiome research by enabling the study of microbial genomes and a more precise quantification of microbiome abundances and function. Microbiome data analysis is challenging because it involves high-dimensional, structured, sparse multivariate data and because of its compositional nature. In this review we outline some of the procedures that are most commonly used for microbiome analysis and that are implemented in R packages. We place particular emphasis on the compositional structure of microbiome data. We describe the principles of compositional data analysis and distinguish between standard methods and those that fit into compositional data analysis.

Principal Component Analysis of Compositional Data using Box-Cox Contrast Transformation (Box-Cox 대비변환을 이용한 구성비율자료의 주성분분석)

  • 최병진;김기영
    • The Korean Journal of Applied Statistics / v.14 no.1 / pp.137-148 / 2001
  • Compositional data found in many practical applications consist of non-negative vectors of proportions with the constraint that the sum of the elements of each vector is unity. It is well known that the statistical analysis of compositional data suffers from the unit-sum constraint. Moreover, the non-linear pattern frequently displayed by the data does not facilitate the application of linear multivariate techniques such as principal component analysis. In this paper we develop a new type of principal component analysis for compositional data using the Box-Cox contrast transformation. Numerical illustrations are provided for comparative purposes.

Geostatistical Simulation of Compositional Data Using Multiple Data Transformations (다중 자료 변환을 이용한 구성 자료의 지구통계학적 시뮬레이션)

  • Park, No-Wook
    • Journal of the Korean Earth Science Society / v.35 no.1 / pp.69-87 / 2014
  • This paper suggests a conditional simulation framework based on multiple data transformations for geostatistical simulation of compositional data. First, a log-ratio transformation is applied to the original compositional data so that conventional statistical methodologies can be used. Next, minimum/maximum autocorrelation factor (MAF) and indicator transformations are applied sequentially. The MAF transformation generates independent new variables, so that each variable can be simulated independently. The indicator transformation is applied for non-parametric modeling of the conditional cumulative distribution function of variables that do not follow multi-Gaussian random function models. Finally, inverse transformations are applied in the reverse order of the forward transformations. A case study with surface sediment compositions in tidal flats is carried out to illustrate the applicability of the presented simulation framework. All simulation results satisfied the constraints of compositional data and reproduced the statistical characteristics of the sample data well. Through surface sediment classification based on multiple simulation results of compositions, a probabilistic evaluation of the classification results was possible, an evaluation unavailable in a conventional kriging approach. Therefore, it is expected that the presented simulation framework can be effectively applied to geostatistical simulation of various compositional data.
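
The first and last steps of this framework can be sketched with the additive log-ratio (ALR) transformation and its inverse. The abstract does not say which log-ratio transformation the paper uses, so ALR is an assumption here, and the MAF and indicator steps are not shown. The sediment fractions and the simulated values are hypothetical.

```python
import math

def alr(composition):
    """Additive log-ratio: log of each part relative to the last part."""
    reference = composition[-1]
    return [math.log(p / reference) for p in composition[:-1]]

def alr_inverse(coords):
    """Map unconstrained coordinates back to a composition summing to one."""
    exps = [math.exp(c) for c in coords]
    denom = 1.0 + sum(exps)
    return [e / denom for e in exps] + [1.0 / denom]

# Hypothetical sand / silt / clay fractions at one location.
sediment = [0.5, 0.3, 0.2]
restored = alr_inverse(alr(sediment))

# Any values simulated in the unconstrained space map back to a valid
# composition, which is why the constraints hold after back-transformation.
simulated = alr_inverse([-1.7, 2.4])
```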

Automatic Extraction Method of Compositional Herb Using Herb List (본초 목록을 이용한 방제의 본초 구성 자동 추출 방법)

  • Kim, Ki-Wook;Kim, Thae-Yul;Lee, Byung-Wook
    • Journal of Korean Medical Classics / v.27 no.3 / pp.155-166 / 2014
  • Objectives: Entering compositional herb data from ancient documents requires many keyboarding staff, and it is the most difficult part of building a database. We therefore studied how to solve this problem. Methods: We used Windows 7 and Access 2013 to automatically extract the compositional herbs in the Taipinghuiminhejijufang(太平惠民和劑局方). First, we made a list of herb synonyms and a list of herb volume notations. We then analyzed the order of arrangement between herb names and herb volume notations. Results: We analyzed a part of the prescriptions in the Taipinghuiminhejijufang(太平惠民和劑局方) (519 prescriptions). The automatic extraction of compositional herbs from the 519 prescriptions was completed in 91 seconds. Conclusions: If the order of arrangement between herb names and herb volume notations can be analyzed, the compositional herbs of prescriptions can be extracted automatically from ancient documents.

Compositional and Contextual Factors Related to Area Differentials in Suicide (지역의 자살률 차이와 관련된 구성적 요인과 상황적 요인)

  • Kang, Eunjeong
    • Korean Journal of Health Education and Promotion / v.30 no.1 / pp.41-52 / 2013
  • Objectives: Rural-urban differences in suicide have been observed in many settings. However, there has been little research addressing what factors can explain these differences. The purpose of this study was to analyze which compositional factors and contextual factors in local areas might be related to local suicide. Methods: The study design was cross-sectional. The data for 251 primary local governments on their age-standardized suicide mortality and their predefined indicators of compositional factors and contextual factors were obtained from the Korean Statistical Information Service as of year 2010. Bivariate analyses, including one-way ANOVA and the chi-square test, were used to identify the differences in local features by area type. Seven Poisson regression models for each of total, males, and females were used to analyze which compositional and contextual factors were related to suicide. Results: There were differences in suicide between gu and goon in the total, male, and female groups. For total, compositional factors including divorce and smoking rate, and contextual factors including financial independence, water and wastewater coverage, and number of wastewater discharge factories were found to explain the urban-rural differences. Conclusions: This study provided some evidence that contextual factors at the local level as well as compositional factors are useful for predicting local suicide mortality.

A study on the shoulder composition methods of power shoulder jackets and corresponding details (파워숄더 재킷의 어깨 구성 방법과 디테일 대응 분석)

  • Park, Jeongah;Lee, Jeongran
    • The Research Journal of the Costume Culture / v.29 no.3 / pp.388-405 / 2021
  • This study classifies the compositional methods of power shoulder jackets from 1980 to the present. It analyzes the relevance of jacket details according to how the power shoulder changes and its compositional methods by era. The research subdivides shoulder compositional techniques into seven, based on shoulder variation, sleeve variation, and the body and sleeve combination. The researcher investigates the frequency and trends of composing shoulders and analyzes details pertaining to the silhouette, jacket length, collar shape, and front closure. The most common method of shoulder composition is an angular shoulder variation. The others are a rounded shoulder variation, puffed sleeve, sleeve variation using pattern incision, raglan and kimono sleeves, and a shoulder variation that expanded the angle and width. The frequency differs slightly for each era. The relationship between shoulder compositional methods and details of power shoulder jackets is statistically significant, showing period-related differences. The homogeneity analysis results reveal that the shoulder composition of power shoulder jackets, the times, and details fall into distinct groups. This analysis shows that the silhouette, length, collar, and front closure of the power shoulder jacket differ depending on the power shoulder's compositional methods. Moreover, the shape of the power shoulder jacket is distinctly different. One can use this data to help develop the power shoulder jacket design by reflecting the details of shoulder compositional methods and changing trends over time.