• Title/Abstract/Keywords: Multivariate statistical method

Search results: 295 items (processing time 0.029 s)

Detecting cell cycle-regulated genes using Self-Organizing Maps with statistical Phase Synchronization (SOMPS) algorithm

  • 김창식;차홍준;배철수;김문환
    • 한국정보전자통신기술학회논문지
    • /
    • Vol. 1, No. 2
    • /
    • pp.39-50
    • /
    • 2008
  • Developing computational methods for identifying cell cycle-regulated genes has been one of the important topics in systems biology. Most previous methods use the periodic characteristics of expression signals to identify cell cycle-regulated genes. We assume instead that cell cycle-regulated genes are relatively active and have relatively many interactions with each other in the underlying cellular network. This motivates us to apply the theory of multivariate phase synchronization to cell cycle expression analysis. In this study, we apply the method known as "Self-Organizing Maps with statistical Phase Synchronization (SOMPS)", a combination of the self-organizing map and multivariate phase synchronization, which produces several subsets of genes that are expected to interact with each other within each subset (Kim, 2008). Our evaluation experiments show that the SOMPS algorithm is able to detect as many cell cycle-regulated genes as a recently reported method that performs better than most existing methods.


Modified partial least squares method implementing mixed-effect model

  • Kyunga Kim;Shin-Jae Lee;Soo-Heang Eo;HyungJun Cho;Jae Won Lee
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 30, No. 1
    • /
    • pp.65-73
    • /
    • 2023
  • Contemporary biomedical data often pose an ill-posed problem owing to small sample sizes and large numbers of multicollinear variables. The partial least squares (PLS) method can be a plausible alternative to ill-conditioned ordinary least squares. However, how to handle random or mixed effects in a PLS model remains a wide-open question worth further investigation. In the present study, we propose a modified multivariate PLS method implementing a mixed-effect model (PLSM). The advantage of PLSM is its versatility in handling serial longitudinal data and its ability to take a random effect into account. We conduct simulations to investigate the statistical properties of PLSM and showcase its real clinical application to predicting the treatment outcome of esthetic surgical procedures on human faces. The proposed PLSM seemed to be particularly beneficial when 1) the random effect is conspicuous; 2) the number of predictors is relatively large compared to the sample size; 3) the multicollinearity is weak or moderate; and/or 4) the random error is considerable.

A Comparison of Multivariate R-Techniques in SAS, SPSS, Minitab and S-PLUS

  • 최용석;문희정
    • 응용통계연구
    • /
    • Vol. 17, No. 1
    • /
    • pp.153-164
    • /
    • 2004
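  • This note compares four statistical packages widely known to users, SAS 8.1, SPSS 10, Minitab 13, and S-PLUS 2000, with respect to the input data format, input options, and output format of principal component analysis and factor analysis, two R-technique multivariate analyses, and summarizes the strengths and weaknesses of each analysis. The aim is to help users choose the most suitable package for R-technique multivariate data analysis.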

FAULT DETECTION, MONITORING AND DIAGNOSIS OF SEQUENCING BATCH REACTOR FOR INTEGRATED WASTEWATER TREATMENT MANAGEMENT SYSTEM

  • Yoo, Chang-Kyoo;Vanrolleghem, Peter A.;Lee, In-Beum
    • Environmental Engineering Research
    • /
    • Vol. 11, No. 2
    • /
    • pp.63-76
    • /
    • 2006
  • Multivariate analysis and batch monitoring of a pilot-scale sequencing batch reactor (SBR) are described for an integrated wastewater treatment management system, in which a batch-wise multiway independent component analysis (MICA) method is used to extract meaningful hidden information from non-Gaussian wastewater treatment data. Three-way batch data from the SBR are unfolded batch-wise, and a non-Gaussian multivariate monitoring method is then used to capture the non-Gaussian characteristics of normal batches in a biological wastewater treatment plant (WWTP). The method is successfully applied to an 80 L SBR for biological wastewater treatment, which is characterized by a variety of error sources with non-Gaussian characteristics. The batch-wise multivariate monitoring results for the pilot-scale SBR in the integrated wastewater treatment management system showed more powerful monitoring performance on a WWTP application than the conventional method, since it can extract non-Gaussian source signals that are independent as well as the cross-correlation of variables.
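
The batch-wise unfolding step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the three-way data are stored as nested lists indexed batch, variable, time, and it assumes a time-major column order; neither the function name `unfold_batchwise` nor the column order comes from the paper.

```python
def unfold_batchwise(batches):
    """Unfold three-way data (batch x variable x time) into a 2-D matrix
    with one row per batch and (variable * time) columns."""
    rows = []
    for batch in batches:
        n_times = len(batch[0])
        if any(len(variable) != n_times for variable in batch):
            raise ValueError("all variables in a batch must have the same length")
        # Assumed column order: all variables at time 0, then all at time 1, ...
        rows.append(
            [batch[j][k] for k in range(n_times) for j in range(len(batch))]
        )
    return rows


# Two batches, two variables, three time points each.
batches = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]
unfolded = unfold_batchwise(batches)
```

Each row of `unfolded` then describes one complete batch, so a monitoring model fitted to the rows of normal batches treats a whole batch as a single observation.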

A Simple Nonparametric Test of Complete Independence

  • Park, Cheol-Yong
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 5, No. 2
    • /
    • pp.411-416
    • /
    • 1998
  • A simple nonparametric test of complete or total independence is suggested for continuous multivariate distributions. This procedure first discretizes the original variables based on their order statistics, and then tests the hypothesis of complete independence for the resulting contingency table. Under the hypothesis of independence, the chi-squared test statistic has an asymptotic chi-squared distribution. We present a simulation study to illustrate the accuracy in finite samples of the limiting distribution of the test statistic. We compare our method to another nonparametric test of complete independence via a simulation study. Finally, we apply our method to the residuals from a real data set.
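
The two steps described in this abstract, rank-based discretization followed by a chi-squared test on the resulting contingency table, might look roughly like the sketch below. The equal-count binning rule, the function names, and the degrees-of-freedom formula (the usual one for complete independence in an m^d table) are assumptions made for illustration, not details taken from the paper.

```python
from collections import Counter
from itertools import product
from math import prod


def discretize_by_rank(values, n_bins):
    """Assign each value to one of n_bins nearly equal-count bins by its rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * n_bins // n
    return labels


def complete_independence_chi2(columns, n_bins=2):
    """Pearson chi-squared statistic for complete independence of d variables.

    columns: list of d equal-length lists, one per variable.
    Returns (statistic, assumed degrees of freedom).
    """
    n = len(columns[0])
    d = len(columns)
    if n < n_bins:
        raise ValueError("need at least n_bins observations")
    labels = [discretize_by_rank(col, n_bins) for col in columns]
    cells = Counter(zip(*labels))             # observed d-way cell counts
    margins = [Counter(lab) for lab in labels]  # one-way marginal counts
    stat = 0.0
    for cell in product(range(n_bins), repeat=d):
        # Expected count under complete independence: n * product of margins.
        expected = n * prod(margins[j][cell[j]] / n for j in range(d))
        stat += (cells.get(cell, 0) - expected) ** 2 / expected
    dof = n_bins ** d - 1 - d * (n_bins - 1)  # assumption, see lead-in
    return stat, dof


x = [1.0, 2.0, 3.0, 4.0]
stat_dep, dof = complete_independence_chi2([x, x])                      # identical columns
stat_ind, _ = complete_independence_chi2([x, [1.0, 3.0, 2.0, 4.0]])    # balanced table
```

The statistic would then be compared with a chi-squared quantile at `dof` degrees of freedom; that comparison is left out here because the standard library has no chi-squared distribution function.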


A General Mixed Linear Model with Left-Censored Data

  • Ha, Il-Do
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 15, No. 6
    • /
    • pp.969-976
    • /
    • 2008
  • Mixed linear models have been widely used for various kinds of correlated data, including multivariate survival data. In this paper, we extend the hierarchical-likelihood (h-likelihood) approach for mixed linear models with right-censored data to left-censored data. We also allow a general random-effect structure and propose an estimation procedure. The proposed method is illustrated using a numerical data set and is compared with the marginal likelihood method.

Variable Selection Based on Direction Vectors

  • Kyungmee Choi
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 5, No. 1
    • /
    • pp.25-33
    • /
    • 1998
  • We review a multivariate version of Kendall's tau based on direction vectors of observations. Using this statistic, we propose an analog of the forward variable selection method, which selects a set of independent variables for further study in building the eventual prediction model. The method assumes neither a distribution for the observations nor a linear model, and it is robust to outliers, with high asymptotic efficiency relative to the parametric Pearson's correlation coefficient.


Prediction of arrhythmia using multivariate time series data

  • 이민혜;노호석
    • 응용통계연구
    • /
    • Vol. 32, No. 5
    • /
    • pp.671-681
    • /
    • 2019
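  • With the recent increase in arrhythmia patients, research on predicting arrhythmia with machine learning has been actively pursued. Many existing studies predict arrhythmia from multivariate data of feature variables extracted from RR interval data at a specific time point. In this study, considering that the pattern by which the heart condition changes over time can also be important information for predicting arrhythmia, we examine the usefulness of predicting arrhythmia from multivariate time series data obtained by extracting multivariate vectors of feature variables at regular time intervals and stacking them. Comparing mainly the 1-Nearest Neighbor method and learners that ensemble it, we confirmed that the multivariate time series based method, which exploits time series information through an appropriately chosen time series distance function that accounts for the characteristics of the series, achieves better classification performance.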
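
A 1-Nearest Neighbor classifier over multivariate time series with a time series distance function can be sketched as follows. The abstract does not name the distance it uses; dynamic time warping (DTW) is swapped in here purely as one common choice, and the function names, labels, and toy data are hypothetical.

```python
from math import dist, inf


def dtw_distance(a, b):
    """Dynamic time warping distance between two multivariate series,
    each given as a list of feature vectors of equal dimension."""
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = dist(a[i - 1], b[j - 1])  # Euclidean distance between vectors
            cost[i][j] = step + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


def predict_1nn(train_series, train_labels, query, distance=dtw_distance):
    """Return the label of the training series closest to the query."""
    best = min(
        range(len(train_series)),
        key=lambda i: distance(train_series[i], query),
    )
    return train_labels[best]


# Toy data: each series is a sequence of 2-D feature vectors (hypothetical).
train_series = [
    [(0.0, 0.0), (0.0, 0.0), (1.0, 1.0)],
    [(5.0, 5.0), (5.0, 5.0), (6.0, 6.0)],
]
train_labels = ["normal", "arrhythmia"]
query = [(0.0, 0.0), (1.0, 1.0)]
prediction = predict_1nn(train_series, train_labels, query)
```

Passing the distance as a parameter keeps the classifier unchanged when a different time series distance is tried, which is the comparison the abstract describes.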

Development of MKDE-ebd for Estimation of Multivariate Probabilistic Distribution Functions

  • 강영진;노유정;임오강
    • 한국전산구조공학회논문집
    • /
    • Vol. 32, No. 1
    • /
    • pp.55-63
    • /
    • 2019
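  • In engineering problems, many random variables are correlated, and the correlation of input variables strongly affects the results of statistical performance analysis of mechanical systems. However, because joint distribution functions are difficult to model, correlated variables are often treated as independent or represented by a specific parametric model, and accurately modeling the joint distribution is even harder when data are scarce. The multivariate kernel density estimation with bounded data developed in this study is designed to estimate multivariate probability distributions of various shapes with nonlinearity. It combines the given data with bounded data generated from the confidence intervals of the parameters of a uniform distribution function, which makes it less sensitive to the quality and quantity of the data. The proposed method can therefore yield conservative statistical modeling and reliability analysis results, and its performance is verified through statistical simulations and engineering examples.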

Order-Restricted Inference with Linear Rank Statistics in Microarray Data

  • Kang, Moon-Su
    • 응용통계연구
    • /
    • Vol. 24, No. 1
    • /
    • pp.137-143
    • /
    • 2011
  • The classification of subjects with an unknown distribution in a small sample often involves order-restricted constraints in multivariate parameter setups. These problems make optimal inference based on the conventional likelihood ratio infeasible. Fortunately, Roy (1953) introduced the union-intersection principle (UIP), which provides an alternative avenue. Multivariate linear rank statistics, combined with that principle, yield an appropriate and robust testing procedure. Furthermore, a conditionally distribution-free test based on exact permutation theory is used to generate p-values, even in a small sample. Applications of this method are illustrated with a real microarray data example (Lobenhofer et al., 2002).