• Title/Abstract/Keywords: Multivariate algorithm

Search results: 186 items (processing time: 0.023 s)

Simple Compromise Strategies in Multivariate Stratification

  • Park, Inho
    • Communications for Statistical Applications and Methods / Vol. 20, No. 2 / pp.97-105 / 2013
  • Stratification is a popular technique used in survey practice, among other applications, to improve the accuracy of estimators. Its full potential benefit can be gained by the effective use, in stratification, of auxiliary variables related to the survey variables. This paper focuses on the problem of stratum formation when multiple stratification variables are available. We first review a variance reduction strategy for univariate stratification. We then discuss its use in multivariate situations in convenient and efficient ways using three methods: compromised measures of size, principal components analysis, and a K-means clustering algorithm. We also consider three types of compromising factors applied to the data when using these three methods. Finally, we compare their efficiency using data from the MU281 Swedish municipality population.
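As a rough illustration of the K-means route mentioned in this abstract, the sketch below forms strata from two auxiliary variables with a plain Lloyd's algorithm. The data, the variable meanings, the function names, and the choice of starting centers are all assumptions made for the example; the paper's own compromise factors and efficiency comparison are not reproduced here.

```python
import math

def standardize(rows):
    """Scale each column to mean 0 and sample standard deviation 1."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]

def kmeans_strata(rows, init_centers, max_iter=100):
    """Lloyd's algorithm; returns one stratum label per unit."""
    centers = [list(c) for c in init_centers]  # copy so the caller's data is untouched
    labels = None
    for _ in range(max_iter):
        # Assignment step: each unit joins the nearest center.
        new_labels = [
            min(range(len(centers)),
                key=lambda k: sum((x - c) ** 2 for x, c in zip(r, centers[k])))
            for r in rows
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [r for r, lab in zip(rows, labels) if lab == k]
            if members:  # keep the old center if a stratum is empty
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical auxiliary data for six units: (population in thousands, revenue).
units = [(10, 1.0), (12, 1.2), (11, 0.9), (100, 9.5), (110, 10.5), (105, 10.0)]
z = standardize(units)
strata = kmeans_strata(z, init_centers=[z[0], z[3]])
print(strata)
```

Standardizing first keeps a variable measured on a large scale from dominating the distance, which is one simple way to let several stratification variables contribute jointly.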

Unmasking Multiple Outliers in Multivariate Data

  • Yoo Jong-Young
    • Communications for Statistical Applications and Methods / Vol. 13, No. 1 / pp.29-38 / 2006
  • We propose a procedure for detecting multiple outliers in multivariate data. Rousseeuw and van Zomeren (1990) suggested the robust distance $RD_i$, computed using the resampling algorithm. However, $RD_i$ is based on the assumption that X is in general position (X is said to be in general position when every subsample of size p+1 has rank p). From a practical point of view, this is clearly unrealistic. In this paper, we propose a computing method for approximating the minimum volume ellipsoid (MVE) that is not subject to these problems. The procedure is easy to compute and works well even if a subsample is a singular or nearly singular matrix.
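The general-position issue raised in this abstract can be seen in a small sketch. For p = 2, a subsample of p + 1 = 3 points defines a mean and covariance, and distances to that subsample require inverting the covariance; three collinear points make it singular. The function names and the tolerance are illustrative assumptions, and this is only the per-subsample step of a resampling scheme, not the paper's proposed MVE approximation.

```python
def subsample_fit(points):
    """Mean, covariance entries and determinant for a 2-D subsample."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    return (mx, my), (sxx, sxy, syy), det

def is_usable(det, tol=1e-12):
    """A subsample is usable only if its covariance is nonsingular."""
    return abs(det) > tol

def squared_distance(point, mean, cov, det):
    """Squared Mahalanobis distance using the explicit 2x2 inverse."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    sxx, sxy, syy = cov
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det

# Three collinear points: not in general position, covariance is singular.
_, _, det_bad = subsample_fit([(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)])

# Three non-collinear points: the covariance can be inverted.
triple = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
mean, cov, det_ok = subsample_fit(triple)
dists = [squared_distance(pt, mean, cov, det_ok) for pt in triple]

print(is_usable(det_bad), is_usable(det_ok))
```

A resampling implementation that simply skips unusable subsamples can discard many draws when the data contain ties or discrete values, which is the practical difficulty the abstract points to.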

Estimation from Incomplete Data in Multivariate Distributions under Stochastic Ordering

  • Kwang Mo Jeoung
    • 응용통계연구 (The Korean Journal of Applied Statistics) / Vol. 7, No. 2 / pp.145-157 / 1994
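  • We discuss maximum likelihood estimation using the EM algorithm when data obtained from multivariate distributions with a stochastic ordering are incomplete because of missing values. The study is restricted to contingency table data in which the observations are partially classified, and the approach has the advantage that the EM algorithm can be carried out with existing isotonic regression estimation programs. The proposed estimation method is illustrated with an example.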

Detecting cell cycle-regulated genes using Self-Organizing Maps with statistical Phase Synchronization (SOMPS) algorithm

  • 김창식;차홍준;배철수;김문환
    • 한국정보전자통신기술학회논문지 (Journal of Korea Institute of Information, Electronics, and Communication Technology) / Vol. 1, No. 2 / pp.39-50 / 2008
  • Developing computational methods for identifying cell cycle-regulated genes has been one of the important topics in systems biology. Most previous methods consider the periodic characteristics of expression signals to identify cell cycle-regulated genes. However, we assume that cell cycle-regulated genes are relatively active, having relatively many interactions with each other based on the underlying cellular network. Thus, we are motivated to apply the theory of multivariate phase synchronization to cell cycle expression analysis. In this study, we apply the method known as "Self-Organizing Maps with statistical Phase Synchronization (SOMPS)", which combines the self-organizing map with multivariate phase synchronization and produces several subsets of genes that are expected to interact with each other within their subset (Kim, 2008). Our evaluation experiments show that the SOMPS algorithm is able to detect cell cycle-regulated genes as well as a recently reported method that performs better than most existing methods.

Development of Real-Time Water Quality Abnormality Warning System for Using Multivariate Statistical Method

  • 허태영;전항배;박상민;이영주
    • 대한환경공학회지 (Journal of Korean Society of Environmental Engineers) / Vol. 37, No. 3 / pp.137-144 / 2015
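  • This study aimed to develop a warning system that can determine in real time whether water quality is abnormal, using principal component analysis, one of the multivariate statistical techniques. Principal component analysis, which takes the correlations among water quality parameters into account, was applied to an algorithm that determines water quality abnormality in real time. The applicability of the real-time monitoring algorithm was verified using actual data provided by K-water, and water quality abnormalities caused by climate change, such as torrential rain, were verified by comparison with data from the Korea Meteorological Administration.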
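One common way to use principal component analysis for this kind of monitoring is to fit the principal axis on normal-period data and flag new readings that fall far from it, since such readings break the usual correlation between parameters. The sketch below does this for two parameters only. The readings, the assumption that both are already on a comparable scale, and the threshold value are hypothetical, and the abstract does not state which PCA statistic the authors' system uses.

```python
import math

def fit_pca_2d(baseline):
    """Center two correlated variables and find the first principal axis."""
    n = len(baseline)
    mx = sum(x for x, _ in baseline) / n
    my = sum(y for _, y in baseline) / n
    sxx = sum((x - mx) ** 2 for x, _ in baseline) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in baseline) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in baseline) / (n - 1)
    # Closed-form angle of the leading eigenvector of a 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (mx, my), (math.cos(theta), math.sin(theta))

def residual(obs, mean, axis):
    """Squared distance from an observation to the first principal axis."""
    dx, dy = obs[0] - mean[0], obs[1] - mean[1]
    score = dx * axis[0] + dy * axis[1]  # projection onto the axis
    return max(0.0, dx * dx + dy * dy - score * score)

# Hypothetical normal-period readings of two positively correlated parameters.
baseline = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.1), (4.0, 3.9), (5.0, 5.1), (6.0, 5.9)]
mean, axis = fit_pca_2d(baseline)

THRESHOLD = 1.0  # hypothetical alarm limit, not taken from the paper
for reading in [(5.0, 5.0), (5.0, 2.0)]:
    r = residual(reading, mean, axis)
    print(reading, "abnormal" if r > THRESHOLD else "normal")
```

Both test readings have an unremarkable first value; only the second departs from the usual relationship between the two parameters, which a separate limit on each parameter would not detect.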

Multivariate quantile regression tree

  • 김재오;조형준;방성완
    • Journal of the Korean Data and Information Science Society / Vol. 28, No. 3 / pp.533-545 / 2017
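  • Quantile regression models provide comprehensive and useful statistical information about the conditional distribution of a response variable. However, in many real data sets the explanatory and response variables have a nonlinear relationship, so the traditional linear quantile regression model can produce distorted and misleading results. In addition, as data become more complex, there is growing demand for more accurate prediction and richer interpretation in the analysis of multivariate data with several response variables. For these reasons, this study proposes a multivariate quantile regression tree model. We point out problems with the split variable selection algorithm of existing multivariate regression tree models and propose an improved split variable selection algorithm. The proposed algorithm can be applied in reasonable computing time, does not suffer from bias in split variable selection, and selects split variables more accurately than existing methods. The superior performance and usefulness of the proposed method are confirmed through simulations and an empirical example.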

Projection Pursuit K-Means Visual Clustering

  • Kim, Mi-Kyung;Huh, Myung-Hoe
    • Journal of the Korean Statistical Society / Vol. 31, No. 4 / pp.519-532 / 2002
  • K-means clustering is a well-known partitioning method for multivariate observations. Recently, the method has been widely implemented in data mining software because of its computational efficiency in handling large data sets. However, it does not yield a suitable visual display of multivariate observations, which is especially important in the exploratory stage of data analysis. The aim of this study is to develop a K-means clustering method that enables visual display of multivariate observations in a low-dimensional space, for which the projection pursuit method is adopted. We propose a computationally inexpensive and reliable algorithm and provide two numerical examples.

Multivariate adaptive regression splines model for reliability assessment of serviceability limit state of twin caverns

  • Zhang, Wengang;Goh, Anthony T.C.
    • Geomechanics and Engineering / Vol. 7, No. 4 / pp.431-458 / 2014
  • Construction of a new cavern close to an existing cavern will result in a modification of the state of stresses in a zone around the existing cavern as interaction between the twin caverns takes place. Extensive plane strain finite difference analyses were carried out to examine the deformations induced by excavation of underground twin caverns. From the numerical results, a fairly simple nonparametric regression algorithm known as multivariate adaptive regression splines (MARS) has been used to relate the maximum key point displacement and the percent strain to various parameters including the rock quality, the cavern geometry and the in situ stress. Probabilistic assessments on the serviceability limit state of twin caverns can be performed using the First-order reliability spreadsheet method (FORM) based on the built MARS model. Parametric studies indicate that the probability of failure $P_f$ increases as the coefficient of variation of Q increases, and $P_f$ decreases with the widening of the pillar.

Nonlinear structural modeling using multivariate adaptive regression splines

  • Zhang, Wengang;Goh, A.T.C.
    • Computers and Concrete / Vol. 16, No. 4 / pp.569-585 / 2015
  • Various computational tools are available for modeling highly nonlinear structural engineering problems that lack a precise analytical theory or understanding of the phenomena involved. This paper adopts a fairly simple nonparametric adaptive regression algorithm known as multivariate adaptive regression splines (MARS) to model the nonlinear interactions between variables. The MARS method makes no specific assumptions about the underlying functional relationship between the input variables and the response. Details of the MARS methodology and its associated procedures are introduced first, followed by a number of examples including three practical structural engineering problems. These examples indicate the accuracy of the MARS prediction approach. Additionally, MARS is able to assess the relative importance of the design variables. As MARS explicitly defines the intervals for the input variables, the model gives engineers insight into where significant changes in the data may occur. An example is also presented to demonstrate how the MARS-developed model can be used to carry out structural reliability analysis.
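The remark that MARS explicitly defines intervals for the input variables refers to its basis functions: hinge functions of the form max(0, x - t) and max(0, t - x), where the knot t marks the point at which the slope changes. The sketch below evaluates a one-variable model of that form. The coefficients and the knot are made up for illustration and do not come from the paper; a fitted MARS model may also contain products of hinge functions to represent interactions.

```python
def hinge(x, knot, direction=+1):
    """max(0, x - knot) for direction=+1, max(0, knot - x) for direction=-1."""
    return max(0.0, direction * (x - knot))

def mars_predict(x, intercept, terms):
    """Evaluate intercept + sum of coef * hinge(x, knot, direction)."""
    return intercept + sum(coef * hinge(x, knot, d) for coef, knot, d in terms)

# Hypothetical model with a single knot at x = 2.0:
# slope +1 to the left of the knot, slope +3 to the right.
INTERCEPT = 1.0
TERMS = [(3.0, 2.0, +1), (-1.0, 2.0, -1)]

for x in (0.0, 2.0, 4.0):
    print(x, mars_predict(x, INTERCEPT, TERMS))
```

Reading off the knots of a fitted model shows where the response changes behavior, which is the kind of insight the abstract describes.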

trunmnt: An R package for calculating moments in a truncated multivariate normal distribution

  • Lee, Seung-Chun
    • Communications for Statistical Applications and Methods / Vol. 28, No. 6 / pp.673-679 / 2021
  • The moment calculation in a truncated multivariate normal distribution is a long-standing problem in statistical computation. Recently, Kan and Robotti (2017) developed an algorithm able to calculate moments of all orders under different types of truncation. This result was implemented in the R package MomTrunc by Galarza et al. (2021); however, it is difficult to use the package in practical statistical problems because the computational burden increases exponentially as the order of the moment or the dimension of the random vector increases. Meanwhile, Lee (2021) presented a numerical method, efficient in both accuracy and computational burden, that uses Gauss-Hermite quadrature. This article introduces trunmnt, an implementation of Lee's work as an R package. The package is believed to be useful for moment calculations in most practical statistical problems.
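For orientation, the one-dimensional case has a closed form, which makes clear what "moments under truncation" means before the multivariate problem becomes hard. The sketch below computes the mean and variance of a normal variable restricted to an interval (a, b) using the standard textbook formulas. It is not the Kan and Robotti algorithm, not Lee's Gauss-Hermite method, and not the interface of the trunmnt or MomTrunc packages; the function names are invented for the example.

```python
import math

def _pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def _cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def _z_pdf(z):
    # z * pdf(z) tends to 0 as z goes to +/- infinity; avoid inf * 0 = nan.
    return 0.0 if math.isinf(z) else z * _pdf(z)

def truncated_normal_moments(mu, sigma, a, b):
    """Mean and variance of N(mu, sigma^2) restricted to the interval (a, b)."""
    alpha, beta = (a - mu) / sigma, (b - mu) / sigma
    mass = _cdf(beta) - _cdf(alpha)            # probability of the interval
    shift = (_pdf(alpha) - _pdf(beta)) / mass
    mean = mu + sigma * shift
    var = sigma ** 2 * (1.0 + (_z_pdf(alpha) - _z_pdf(beta)) / mass - shift ** 2)
    return mean, var

# Standard normal truncated to (0, infinity): the half-normal distribution.
half_mean, half_var = truncated_normal_moments(0.0, 1.0, 0.0, math.inf)
# Symmetric truncation leaves the mean unchanged and shrinks the variance.
sym_mean, sym_var = truncated_normal_moments(0.0, 1.0, -1.0, 1.0)
print(half_mean, half_var, sym_mean, sym_var)
```

In several dimensions the truncation region couples the coordinates, so no such simple expression exists in general, which is why dedicated algorithms and packages are needed.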