• Title/Abstract/Keywords: compositional data


A guideline for the statistical analysis of compositional data in immunology

  • Yoo, Jinkyung;Sun, Zequn;Greenacre, Michael;Ma, Qin;Chung, Dongjun;Kim, Young Min
    • Communications for Statistical Applications and Methods / Vol. 29, No. 4 / pp. 453-469 / 2022
  • The study of immune cellular composition has attracted great scientific interest in immunology because of the generation of multiple large-scale datasets. From the statistical point of view, such immune cellular data should be treated as compositional. In compositional data, each element is positive, and all the elements sum to a constant, which can generally be set to one. Standard statistical methods are not directly applicable to compositional data because they do not appropriately handle the correlations between the compositional elements. In this paper, we review statistical methods for compositional data analysis and illustrate them in the context of immunology. Specifically, we focus on regression analyses using log-ratio transformations and on the alternative approach of Dirichlet regression, discuss their theoretical foundations, and illustrate their applications with immune cellular fraction data generated from colorectal cancer patients.
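The log-ratio idea in this abstract can be illustrated with the centered log-ratio (clr) transformation, one standard log-ratio transformation in compositional data analysis. This is a minimal sketch, not the authors' code; the function names and the three-part example fractions are hypothetical.

```python
import math

def closure(x):
    """Rescale positive parts so that they sum to one (the unit-sum constraint)."""
    total = sum(x)
    return [v / total for v in x]

def clr(x):
    """Centered log-ratio: log of each part minus the mean log (log of the geometric mean)."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def clr_inverse(z):
    """Map clr coordinates back to the simplex: exponentiate, then close."""
    return closure([math.exp(v) for v in z])

# Hypothetical fractions of three immune cell types in one sample.
cells = closure([30.0, 50.0, 20.0])
z = clr(cells)
print([round(v, 3) for v in z])
print(clr_inverse(z))  # recovers the original fractions up to rounding error
```

The clr values sum to zero and do not change if the raw values are rescaled, so only relative information enters the analysis. Because of the zero-sum constraint, regression setups often use ilr or alr coordinates instead, to avoid a singular covariance matrix.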

Binary classification on compositional data

  • Joo, Jae Yun;Lee, Seokho
    • Communications for Statistical Applications and Methods / Vol. 28, No. 1 / pp. 89-97 / 2021
  • Because of their boundedness and sum constraint, compositional data are often transformed by a logratio transformation, and the transformed data are then fed into traditional binary classification or discriminant analysis. However, directly applying traditional multivariate approaches to the transformed data may be problematic, because the class distributions are not Gaussian and the Bayes decision boundaries are not polynomial in the transformed space. In this study, we propose applying flexible classification approaches to the transformed data for compositional data classification. Empirical studies using synthetic and real examples demonstrate that flexible approaches outperform traditional multivariate classification and discriminant analysis.
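The abstract does not name the flexible classifiers it uses. As a stand-in, the sketch below maps compositions to isometric log-ratio (ilr) pivot coordinates and classifies them with k-nearest neighbours, one nonparametric method whose decision boundary need not be polynomial. The data, labels, and choice of k are hypothetical.

```python
import math
from collections import Counter

def pivot_ilr(x):
    """Isometric log-ratio (pivot) coordinates of a composition with D positive parts."""
    d = len(x)
    coords = []
    for i in range(d - 1):
        rest = x[i + 1:]
        # Geometric mean of the parts to the right of part i.
        gmean_rest = math.exp(sum(math.log(v) for v in rest) / len(rest))
        scale = math.sqrt((d - i - 1) / (d - i))
        coords.append(scale * math.log(x[i] / gmean_rest))
    return coords

def knn_predict(train, labels, query, k=3):
    """k-nearest-neighbour vote using Euclidean distance in ilr space (the Aitchison distance)."""
    zq = pivot_ilr(query)
    dists = sorted((math.dist(pivot_ilr(x), zq), y) for x, y in zip(train, labels))
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical three-part compositions from two classes.
train = [[0.70, 0.20, 0.10], [0.60, 0.30, 0.10], [0.65, 0.25, 0.10],
         [0.10, 0.30, 0.60], [0.15, 0.25, 0.60], [0.10, 0.20, 0.70]]
labels = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(train, labels, [0.68, 0.22, 0.10]))
print(knn_predict(train, labels, [0.12, 0.28, 0.60]))
```

k-NN makes no Gaussian assumption about the class distributions, which is the weakness the abstract points out in discriminant analysis on transformed data.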

Ranking subjects based on paired compositional data with application to age-related hearing loss subtyping

  • Nam, Jin Hyun;Khatiwada, Aastha;Matthews, Lois J.;Schulte, Bradley A.;Dubno, Judy R.;Chung, Dongjun
    • Communications for Statistical Applications and Methods / Vol. 27, No. 2 / pp. 225-239 / 2020
  • Analysis approaches for single compositional data are well established; however, effective analysis strategies for paired compositional data remain to be investigated. The current project was motivated by studies of age-related hearing loss (presbyacusis), in which subjects are classified into four audiometric phenotypes and need to be ranked within these phenotypes based on their paired compositional data. We address this challenge by formulating it as a classification problem and integrating a penalized multinomial logistic regression model with compositional data analysis approaches. We use the Elastic Net as the penalty function, while considering average, absolute difference, and perturbation operators for compositional data. We applied the proposed approach to a presbyacusis study of 532 subjects, whose data consist of the probabilities that each ear of a subject belongs to each of the four presbyacusis subtypes. We further investigated the ranking of presbyacusis subjects using the proposed approach based on previous literature. The data analysis results indicate that the proposed approach is effective for ranking subjects based on paired compositional data.
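The abstract names average, absolute difference, and perturbation operators without defining them. This sketch assumes common definitions from compositional data analysis (a closed geometric mean, a part-wise absolute difference, and the perturbation difference), which may differ from the paper's. The ear-level probabilities are hypothetical, and the penalized (Elastic Net) multinomial logistic regression that would consume the features is not shown.

```python
import math

def closure(x):
    """Rescale positive parts so that they sum to one."""
    total = sum(x)
    return [v / total for v in x]

def perturbation_difference(x, y):
    """Perturb x by the inverse of y: C(x1/y1, ..., xD/yD)."""
    return closure([a / b for a, b in zip(x, y)])

def aitchison_average(x, y):
    """Compositional mean of two compositions: closed part-wise geometric mean."""
    return closure([math.sqrt(a * b) for a, b in zip(x, y)])

def absolute_difference(x, y):
    """Part-wise absolute difference of the raw probabilities (not a log-ratio operation)."""
    return [abs(a - b) for a, b in zip(x, y)]

def paired_features(left, right):
    """Concatenate the three summaries into one feature vector for a classifier."""
    return (aitchison_average(left, right)
            + absolute_difference(left, right)
            + perturbation_difference(left, right))

# Hypothetical probabilities of the four presbyacusis subtypes for each ear of one subject.
left_ear = [0.6, 0.2, 0.1, 0.1]
right_ear = [0.5, 0.3, 0.1, 0.1]
features = paired_features(left_ear, right_ear)
print([round(v, 3) for v in features])
```

A subject whose two ears are identical gets the uniform composition from the perturbation difference (no relative change between ears) and its own composition back from the average.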

Comparison of Methods for Reducing the Dimension of Compositional Data with Zero Values

  • Song, Taeg-Youn;Choi, Byung-Jin
    • Communications for Statistical Applications and Methods / Vol. 19, No. 4 / pp. 559-569 / 2012
  • Compositional data consist of compositions, which are non-negative vectors of proportions subject to the unit-sum constraint. Statistical analysis of this type of data is fundamental in disciplines such as petrology and archaeometry. Aitchison (1983) introduced log-contrast principal component analysis, which operates on logratio-transformed data, as a dimension-reduction technique for understanding and interpreting the structure of compositional data. However, this analysis cannot be applied when the data contain zero values, because the logratio of a zero part is undefined. In this paper, we introduce four possible methods for reducing the dimension of compositional data with zero values. Two real data sets are analyzed using these methods, and the results are compared.
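Log-contrast PCA amounts to an eigen-analysis of the covariance of clr-transformed data, and it fails when a part is zero because log(0) is undefined. One common workaround, multiplicative zero replacement followed by clr and PCA, is sketched below. The abstract does not name the four methods the paper compares, so this is not necessarily one of them; the data and the replacement value `delta` are hypothetical.

```python
import math

def multiplicative_replacement(x, delta=1e-3):
    """Replace zeros by delta and shrink the non-zero parts so the row still sums to one.

    Assumes x is already closed (sums to one) and delta * (number of zeros) < 1.
    """
    n_zero = sum(1 for v in x if v == 0)
    return [delta if v == 0 else v * (1 - n_zero * delta) for v in x]

def clr(x):
    """Centered log-ratio transform of a strictly positive composition."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def first_principal_component(rows, iters=500):
    """Leading eigenvector of the sample covariance matrix, found by power iteration."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)] for i in range(p)]
    # Start from a unit basis vector: the all-ones vector lies in the null space of a
    # clr covariance matrix, so it cannot be used as a starting point.
    v = [1.0] + [0.0] * (p - 1)
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(val * val for val in w))
        v = [val / norm for val in w]
    return v

# Hypothetical three-part compositions, two of which contain a zero.
data = [[0.5, 0.5, 0.0], [0.6, 0.3, 0.1], [0.2, 0.3, 0.5], [0.0, 0.4, 0.6]]
clr_rows = [clr(multiplicative_replacement(row)) for row in data]
pc1 = first_principal_component(clr_rows)
print([round(v, 3) for v in pc1])
```

The loadings of the first component sum to zero, so it is a log-contrast, as in Aitchison's formulation. The result depends on `delta`, which is one reason different zero-handling methods can give different reductions.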

Statistical analysis of metagenomics data

  • Calle, M. Luz
    • Genomics & Informatics / Vol. 17, No. 1 / pp. 6.1-6.9 / 2019
  • Understanding the role of the microbiome in human health and how it can be modulated is becoming increasingly relevant for preventive medicine and for the medical management of chronic diseases. The development of high-throughput sequencing technologies has boosted microbiome research by enabling the study of microbial genomes and a more precise quantification of microbiome abundance and function. Microbiome data analysis is challenging because it involves high-dimensional, structured, sparse multivariate data and because of the data's compositional nature. In this review, we outline some of the procedures most commonly used for microbiome analysis that are implemented in R packages. We place particular emphasis on the compositional structure of microbiome data. We describe the principles of compositional data analysis and distinguish between standard methods and those that fit within compositional data analysis.
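Why the compositional nature of microbiome data matters can be shown with a toy example: when sequencing yields only relative abundances, growth of one taxon makes every other taxon appear to decline. The abundances below are hypothetical and are not taken from the review.

```python
import math

def closure(counts):
    """Convert absolute abundances into relative abundances (proportions)."""
    total = sum(counts)
    return [c / total for c in counts]

def log_ratio(p, i, j):
    """Log-ratio between parts i and j; unaffected by the total abundance."""
    return math.log(p[i] / p[j])

# Hypothetical absolute abundances of four taxa: only taxon 0 grows between samples.
before = [100, 200, 300, 400]
after = [1100, 200, 300, 400]
p_before, p_after = closure(before), closure(after)

# The proportions of the unchanged taxa 1-3 all fall, purely because of the sum constraint...
print([round(b - a, 3) for a, b in zip(p_before, p_after)])
# ...while the log-ratios among them are identical in both samples.
print(log_ratio(p_before, 2, 1), log_ratio(p_after, 2, 1))
```

A naive per-taxon test on proportions would flag taxa 1-3 as decreased; analyses based on log-ratios avoid this artifact.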

Principal Component Analysis of Compositional Data using Box-Cox Contrast Transformation

  • 최병진;김기영
    • The Korean Journal of Applied Statistics / Vol. 14, No. 1 / pp. 137-148 / 2001
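  • Compositional data, whose elements represent proportions, are constrained so that each row sums to 1, which makes them difficult to handle statistically. Moreover, because the structure of such data is not linear, applying linear multivariate techniques such as principal component analysis to compositional data may lead to incorrect interpretation and inference. In this paper, we propose a new analysis method based on the Box-Cox contrast transformation to address the problems of existing methods for principal component analysis of compositional data. We then compare its performance with the method proposed by Aitchison (1983) through real-data analysis and simulation.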
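The abstract does not give the formula for the Box-Cox contrast transformation. The sketch below assumes one standard Box-Cox generalization of the additive log-ratio, which maps a D-part composition to ((x_i/x_D)^λ - 1)/λ for i = 1, ..., D-1 and tends to ln(x_i/x_D) as λ approaches 0; the paper's exact definition may differ. The composition and λ values are hypothetical.

```python
import math

def box_cox_contrast(x, lam):
    """Box-Cox transform of the ratios x_i / x_D (last part as reference), i = 1..D-1.

    Assumed form: ((x_i / x_D) ** lam - 1) / lam, with the additive log-ratio
    ln(x_i / x_D) as the limiting case lam = 0.
    """
    ref = x[-1]
    if lam == 0:
        return [math.log(v / ref) for v in x[:-1]]
    return [((v / ref) ** lam - 1) / lam for v in x[:-1]]

composition = [0.2, 0.5, 0.3]  # hypothetical three-part composition
for lam in (1.0, 0.5, 0.01, 0.0):
    print(lam, [round(v, 4) for v in box_cox_contrast(composition, lam)])
```

Under this assumed form, λ acts as a tuning parameter between plain ratio differences (λ = 1) and the log-ratio analysis of Aitchison (λ = 0), so PCA can be run on the transformed values for whichever λ best linearizes the data.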


Geostatistical Simulation of Compositional Data Using Multiple Data Transformations

  • 박노욱
    • Journal of the Korean Earth Science Society / Vol. 35, No. 1 / pp. 69-87 / 2014
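  • This paper proposes a conditional simulation framework based on multiple data transformations for the geostatistical simulation of compositional data. First, a log-ratio transformation was applied to the compositional data so that conventional statistical techniques could be used. Next, the min/max autocorrelation factor transformation and the indicator transformation were applied in sequence. The min/max autocorrelation factor transformation was applied to generate new, independent variables, which makes it possible to simulate the individual variables independently. The indicator transformation was then applied for nonparametric modeling of the conditional cumulative distributions of variables that do not follow a multi-Gaussian probability model. Finally, the inverse transformations were applied in the reverse order of the forward transformations. The applicability of the proposed simulation technique was illustrated with surface sediment composition data from a tidal flat. All simulation results satisfied the constraints of compositional data and reflected the statistical characteristics of the sample data well. Classifying surface sediments using multiple simulation results of the compositional data enabled a probabilistic assessment of the classification results, which cannot be obtained with conventional kriging. The proposed simulation framework is therefore expected to be useful for the geostatistical simulation of various kinds of compositional data.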
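Only the first and last steps of this framework are sketched below: a forward log-ratio transform and its inverse, which together guarantee that every simulated realization is positive and sums to one. The abstract does not say which log-ratio was used, so the additive log-ratio (alr) is an assumption, and Gaussian noise stands in for the min/max autocorrelation factor and indicator simulation steps, which are not shown. The sediment fractions are hypothetical.

```python
import math
import random

def alr(x):
    """Additive log-ratio with the last part as the reference (assumed choice)."""
    return [math.log(v / x[-1]) for v in x[:-1]]

def alr_inverse(z):
    """Back-transform: exponentiate, append the reference part, then close to unit sum."""
    parts = [math.exp(v) for v in z] + [1.0]
    total = sum(parts)
    return [p / total for p in parts]

# Hypothetical sand/silt/clay fractions at one sampling location.
observed = [0.6, 0.3, 0.1]
z = alr(observed)

# Stand-in for the simulation step: perturb the log-ratios with Gaussian noise.
rng = random.Random(42)
realizations = [alr_inverse([v + rng.gauss(0.0, 0.3) for v in z]) for _ in range(5)]
for r in realizations:
    print([round(p, 3) for p in r], round(sum(r), 12))
```

Whatever values the simulation produces in log-ratio space, the back-transform maps them into the simplex, which is why all realizations respect the compositional constraints.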

Automatic Extraction Method of Compositional Herb Using Herb List

  • 김기욱;김태열;이병욱
    • The Journal of Korean Medical Classics / Vol. 27, No. 3 / pp. 155-166 / 2014
  • Objectives: Entering data on the herbal composition of prescriptions from ancient documents requires many keyboarding staff, and this is the most difficult part of building a database. We therefore studied how to solve this problem. Methods: We used Windows 7 and Access 2013 to automatically extract the compositional herbs of the prescriptions in the Taipinghuiminhejijufang (太平惠民和劑局方). First, we made a list of herb synonyms and a list of herb volume notations. Then we analyzed the order in which herb names and herb volume notations are arranged. Results: We analyzed part of the prescriptions in the Taipinghuiminhejijufang (太平惠民和劑局方) (519 prescriptions). Automatic extraction of the compositional herbs from the 519 prescriptions was completed in 91 seconds. Conclusions: If the order of arrangement between herb names and herb volume notations can be analyzed, the compositional herbs of a prescription can be extracted automatically from ancient documents.
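The extraction idea (a synonym list, a list of volume notations, and the order in which they appear) can be sketched with a regular expression. The authors used Access 2013; this Python version is a swapped-in illustration under the assumption that each herb name is directly followed by its volume notation. The synonym table, the volume pattern, and the example text are all hypothetical, not taken from the Taipinghuiminhejijufang.

```python
import re

# Hypothetical synonym list: each written form maps to a canonical herb name.
SYNONYMS = {"人參": "人參", "人蔘": "人參", "白朮": "白朮", "白术": "白朮",
            "甘草": "甘草", "炙甘草": "炙甘草"}
# Hypothetical volume notation: Chinese numerals followed by a traditional unit.
VOLUME = r"[一二三四五六七八九十半]+[兩錢分]"

def extract_composition(text):
    """Return (canonical herb, volume or None) pairs in order of appearance."""
    # Try longer names first so that 炙甘草 is not cut short by the shorter 甘草.
    names = sorted(SYNONYMS, key=len, reverse=True)
    pattern = re.compile("(" + "|".join(map(re.escape, names)) + r")\s*(" + VOLUME + ")?")
    return [(SYNONYMS[m.group(1)], m.group(2)) for m in pattern.finditer(text)]

print(extract_composition("人蔘一兩 白术半兩 炙甘草二錢"))
```

Normalizing synonyms to one canonical name is what lets variant spellings be counted as the same herb in the resulting database.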

Compositional and Contextual Factors Related to Area Differentials in Suicide

  • 강은정
    • Korean Journal of Health Education and Promotion / Vol. 30, No. 1 / pp. 41-52 / 2013
  • Objectives: Rural-urban differences in suicide have been observed in many settings. However, little research has addressed which factors explain these differences. The purpose of this study was to analyze which compositional and contextual factors in local areas are related to local suicide rates. Methods: The study design was cross-sectional. Data on age-standardized suicide mortality and on predefined indicators of compositional and contextual factors for 251 primary local governments were obtained from the Korean Statistical Information Service as of 2010. Bivariate analyses, including one-way ANOVA and chi-square tests, were used to identify differences in local features by area type. Seven Poisson regression models for each of the total, male, and female populations were used to analyze which compositional and contextual factors were related to suicide. Results: Suicide differed between gu (urban districts) and goon (rural counties) in the total, male, and female groups. For the total population, compositional factors, including divorce and smoking rates, and contextual factors, including financial independence, water supply and sewerage coverage, and the number of wastewater-discharging factories, were found to explain the urban-rural differences. Conclusions: This study provides some evidence that contextual factors at the local level, as well as compositional factors, are useful for predicting local suicide mortality.
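The abstract reports Poisson regression models but not their exact specification. A common way to model area-level death counts is Poisson regression with the log of the population as an offset, so that exp(coefficient) is a rate ratio. The sketch below fits a one-covariate version by Newton-Raphson; the covariate, populations, and counts are hypothetical, generated to lie exactly on the model so the fit can be checked.

```python
import math

def poisson_regression(x, y, offset, iters=25):
    """Fit log(mu_i) = offset_i + b0 + b1 * x_i by Newton-Raphson (Fisher scoring).

    Returns (b0, b1). The offset is typically log(population at risk).
    """
    b0 = math.log(sum(y) / sum(math.exp(o) for o in offset))
    b1 = 0.0
    for _ in range(iters):
        mu = [math.exp(o + b0 + b1 * xi) for o, xi in zip(offset, x)]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        # Fisher information matrix [[a, b], [b, c]].
        a = sum(mu)
        b = sum(mi * xi for mi, xi in zip(mu, x))
        c = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = a * c - b * b
        b0 += (c * g0 - b * g1) / det
        b1 += (a * g1 - b * g0) / det
    return b0, b1

# Hypothetical areas: covariate value, population, and expected deaths generated
# exactly from log(rate) = -8 + 0.2 * x so the fitted values can be checked.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
population = [50000, 80000, 30000, 120000, 60000]
offset = [math.log(p) for p in population]
deaths = [p * math.exp(-8.0 + 0.2 * xi) for p, xi in zip(population, x)]
b0, b1 = poisson_regression(x, deaths, offset)
print(round(b0, 4), round(b1, 4), round(math.exp(b1), 4))  # exp(b1) is the rate ratio
```

With several area-level covariates, the same update uses the full information matrix in place of the 2 x 2 system solved by hand here.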

A study on the shoulder composition methods of power shoulder jackets and corresponding details

  • 박정아;이정란
    • The Research Journal of the Costume Culture / Vol. 29, No. 3 / pp. 388-405 / 2021
  • This study classifies the compositional methods of power shoulder jackets from 1980 to the present and analyzes how jacket details relate to changes in the power shoulder and its compositional methods in each era. The research subdivides shoulder compositional techniques into seven types, based on shoulder variation, sleeve variation, and the combination of body and sleeve. The researcher investigates the frequency and trends of shoulder compositional methods and analyzes details pertaining to the silhouette, jacket length, collar shape, and front closure. The most common method of shoulder composition is an angular shoulder variation. The others are a rounded shoulder variation, puffed sleeves, sleeve variation using pattern incision, raglan and kimono sleeves, and a shoulder variation that expands the angle and width. The frequency of each method differs slightly by era. The relationship between shoulder compositional methods and the details of power shoulder jackets is statistically significant and shows period-related differences. The homogeneity analysis results reveal that the shoulder composition methods, eras, and details of power shoulder jackets fall into distinct groups. This analysis shows that the silhouette, length, collar, and front closure of the power shoulder jacket differ depending on the power shoulder's compositional method, and that the overall shape of the jacket differs distinctly as well. These findings can inform power shoulder jacket design by reflecting the details of shoulder compositional methods and their changing trends over time.