• Title/Abstract/Keywords: Multivariate Statistical Analysis

Search results: 632 items (processing time: 0.025 s)

Binary classification on compositional data

  • Joo, Jae Yun;Lee, Seokho
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 28, No. 1
    • /
    • pp.89-97
    • /
    • 2021
  • Because of their boundedness and sum constraint, compositional data are often transformed by a logratio transformation, and the transformed data are then fed into traditional binary classification or discriminant analysis. However, directly applying traditional multivariate approaches to the transformed data can be problematic, because the class distributions are not Gaussian and the Bayes decision boundaries are not polynomial in the transformed space. In this study, we propose applying flexible classification approaches to the transformed data for compositional data classification. Empirical studies using synthetic and real examples demonstrate that flexible approaches outperform traditional multivariate classification or discriminant analysis.
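
As a rough illustration of the logratio transformation mentioned in this abstract, the sketch below implements the centered (CLR) and additive (ALR) logratio transforms. The function names and the sample composition are hypothetical, and the abstract does not say which logratio variant the authors used.

```python
import math


def clr(composition):
    """Centered logratio: log of each part minus the mean log of all parts."""
    if any(part <= 0 for part in composition):
        raise ValueError("all parts must be strictly positive")
    logs = [math.log(part) for part in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def alr(composition):
    """Additive logratio: log of each part relative to the last part."""
    if any(part <= 0 for part in composition):
        raise ValueError("all parts must be strictly positive")
    reference = math.log(composition[-1])
    return [math.log(part) - reference for part in composition[:-1]]


# Hypothetical three-part composition that sums to 1.
sample = [0.2, 0.3, 0.5]
clr_sample = clr(sample)  # three coordinates that sum to zero
alr_sample = alr(sample)  # two unconstrained coordinates
```

Both transforms depend only on ratios between parts, so rescaling the composition (for example to percentages) leaves the result unchanged. The transformed coordinates can then be passed to any classifier.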

Bayesian Analysis of a New Skewed Multivariate Probit for Correlated Binary Response Data

  • Kim, Hea-Jung
    • Journal of the Korean Statistical Society
    • /
    • Vol. 30, No. 4
    • /
    • pp.613-635
    • /
    • 2001
  • This paper proposes a skewed multivariate probit model for analyzing correlated binary response data with covariates. The proposed model is formulated by introducing an asymmetric link based upon a skewed multivariate normal distribution. The model, connected to the asymmetric multivariate link, allows for flexible modeling of the correlation structure among binary responses and straightforward interpretation of the parameters. However, the complex likelihood function of the model prevents us from fitting and analyzing the model analytically. Simulation-based Bayesian inference methodologies are provided to overcome this problem. We examine the suggested methods on two data sets to demonstrate their performance.


Differentiation of Roots of Glycyrrhiza Species by 1H Nuclear Magnetic Resonance Spectroscopy and Multivariate Statistical Analysis

  • Yang, Seung-Ok;Hyun, Sun-Hee;Kim, So-Hyun;Kim, Hee-Su;Lee, Jae-Hwi;Whang, Wan-Kyun;Lee, Min-Won;Choi, Hyung-Kyoon
    • Bulletin of the Korean Chemical Society
    • /
    • Vol. 31, No. 4
    • /
    • pp.825-828
    • /
    • 2010
  • To classify Glycyrrhiza species, samples of different species were analyzed by a $^1H$ NMR-based metabolomics technique. Partial least squares discriminant analysis (PLS-DA) was used for the multivariate statistical analysis of the $^1H$ NMR data sets. There was a clear separation between the various Glycyrrhiza species in the PLS-DA-derived score plots. The PLS-DA model was validated, and the key metabolites contributing to the separation in the score plots of the various Glycyrrhiza species were lactic acid, alanine, arginine, proline, malic acid, asparagine, choline, glycine, glucose, sucrose, 4-hydroxyphenylacetic acid, and formic acid. The compounds present at relatively high levels were glucose and 4-hydroxyphenylacetic acid in G. glabra; lactic acid, alanine, and proline in G. inflata; and arginine, malic acid, and sucrose in G. uralensis. This is the first study to perform global metabolomic profiling and differentiation of Glycyrrhiza species using $^1H$ NMR and multivariate statistical analysis.

Multivariate Process Capability Index Using Inverted Normal Loss Function

  • 문혜진;정영배
    • Journal of Society of Korea Industrial and Systems Engineering
    • /
    • Vol. 41, No. 2
    • /
    • pp.174-183
    • /
    • 2018
  • In industry, the process capability index has been used to evaluate the variation of quality in a process. Traditional process capability indices such as $C_p$, $C_{pk}$, $C_{pm}$ and $C^+_{pm}$ have been widely applied, mainly in univariate analysis. However, recent industry is dominated by multivariate manufacturing processes in which multiple quality characteristics are correlated with each other. Therefore, multivariate statistical methods should be used in process capability analysis, and multivariate process indices need to provide more useful information and broader applicability. Hence, the purpose of this study is to develop a more effective multivariate process index ($MC_{pI}$) using the multivariate inverted normal loss function. The multivariate inverted normal loss function is flexible enough to represent any type of symmetrical or asymmetrical loss function and also carries economic information. In particular, the modeling method proposed in this paper for the multivariate inverted normal loss function (MINLF), together with the expected loss derived from the MINLF, can be applied to any type of symmetrical or asymmetrical loss function. This modeling method can also be easily extended from the bivariate case to the multivariate case.
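
For orientation, the following minimal sketch computes the traditional univariate indices $C_p$, $C_{pk}$, and $C_{pm}$ named in this abstract from their standard textbook definitions. It does not implement the proposed $MC_{pI}$ index, and the function name and process values are hypothetical.

```python
import math


def capability_indices(mean, std, lsl, usl, target):
    """Return (Cp, Cpk, Cpm) for a univariate process.

    lsl and usl are the lower and upper specification limits, and
    target is the nominal value used by Cpm.
    """
    if std <= 0 or usl <= lsl:
        raise ValueError("need std > 0 and usl > lsl")
    spec_width = usl - lsl
    # Cp: specification width relative to the natural 6-sigma spread.
    cp = spec_width / (6 * std)
    # Cpk: distance from the mean to the nearest limit, in 3-sigma units.
    cpk = min(usl - mean, mean - lsl) / (3 * std)
    # Cpm: like Cp, but the spread is inflated by the deviation from target.
    cpm = spec_width / (6 * math.sqrt(std ** 2 + (mean - target) ** 2))
    return cp, cpk, cpm


# Hypothetical process: slightly off-target mean within limits 9.7 to 10.3.
cp, cpk, cpm = capability_indices(mean=10.1, std=0.1, lsl=9.7, usl=10.3, target=10.0)
```

In this example $C_p$ is 1 while $C_{pk}$ and $C_{pm}$ are lower, because the mean sits off-center and off-target. Each index treats a single quality characteristic in isolation, which is the limitation the abstract raises for correlated multivariate characteristics.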

A Study on Construct of Value-Added Productivity Structure Model using Multivariate Statistical Method

  • 이영찬;조성훈;김태성
    • Journal of Society of Korea Industrial and Systems Engineering
    • /
    • Vol. 19, No. 38
    • /
    • pp.117-129
    • /
    • 1996
  • This study uses statistical analysis to examine how three factors (the indices of capital, labor, and distribution) affect value-added productivity. To this end, we selected 12 value-added indices from the 'Annual Report of Korean Companies' published by Korea Investors Service, Inc., covering a total of 85 companies in the chemicals and chemical products sector. Using these data, multivariate statistical analyses such as principal component analysis, factor analysis, and covariance structure analysis are applied to model the effect of the three factors (labor productivity, capital productivity, and the index of distribution) on value-added productivity.


Simultaneous determination and difference evaluation of 14 ginsenosides in Panax ginseng roots cultivated in different areas and ages by high-performance liquid chromatography coupled with triple quadrupole mass spectrometer in the multiple reaction-monitoring mode combined with multivariate statistical analysis

  • Xiu, Yang;Li, Xue;Sun, Xiuli;Xiao, Dan;Miao, Rui;Zhao, Huanxi;Liu, Shuying
    • Journal of Ginseng Research
    • /
    • Vol. 43, No. 4
    • /
    • pp.508-516
    • /
    • 2019
  • Background: Ginsenosides are not only the principal bioactive components of Panax ginseng Meyer but also important indices for assessing its quality. Their contents in cultivated ginseng vary with the growth environment and age. The present study aimed to evaluate the significant differences among 36 cultivated ginseng samples of different cultivation areas and ages, based on the simultaneously determined contents of 14 ginsenosides. Methods: A high-performance liquid chromatography (HPLC) coupled with triple quadrupole mass spectrometer (MS) method was developed and used in the multiple reaction-monitoring (MRM) mode (HPLC-MRM/MS) for the quantitative analysis of ginsenosides. Multivariate statistical analysis, such as principal component analysis and partial least squares-discriminant analysis, was applied to discriminate ginseng samples of various cultivation areas and ages and to discover the differentially accumulated ginsenoside markers. Results: The developed HPLC-MRM/MS method was validated to be precise, accurate, stable, sensitive, and repeatable for the simultaneous determination of 14 ginsenosides. The 3- and 5-yr-old ginseng samples were distinctly differentiated by all of the multivariate statistical analyses, whereas the 4-yr-old samples resembled either the 3- or the 5-yr-old samples in their ginsenoside contents. Among the 14 detected ginsenosides, Rg1, Rb1, Rb2, Rc, 20(S)-Rf, 20(S)-Rh1, and Rb3 were identified as potential markers for differentiating cultivation ages. In addition, the 5-yr-old samples could be classified by cultivation area based on their ginsenoside contents, whereas the 3- and 4-yr-old samples showed little difference between cultivation areas. Conclusion: This study demonstrated that the HPLC-MRM/MS method combined with multivariate statistical analysis provides deep insight into the accumulation characteristics of ginsenosides and could be used to differentiate ginseng cultivated in different areas and at different ages.

A Robust Principal Component Neural Network

  • Changha Hwang;Park, Hyejung;A, Eunyoung-N
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 8, No. 3
    • /
    • pp.625-632
    • /
    • 2001
  • Principal component analysis (PCA) is a multivariate technique falling under the general title of factor analysis. The purpose of PCA is to identify the dependence structure behind a multivariate stochastic observation in order to obtain a compact description of it. In engineering, PCA is used mainly for data compression and restoration. In this paper we propose a new robust Hebbian algorithm for robust PCA. This algorithm is based on a hyperbolic tangent function due to Hampel et al. (1989), which is known to be robust in statistics. We conduct two experiments to investigate the performance of the new robust Hebbian learning algorithm for robust PCA.
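
To make the idea of Hebbian learning for PCA concrete, here is a minimal sketch of the classical (non-robust) Oja rule, which estimates the first principal direction one sample at a time. This is a swapped-in baseline, not the robust hyperbolic-tangent algorithm proposed in the paper. The function names, learning rate, and synthetic data are assumptions made for illustration.

```python
import math
import random


def oja_first_component(samples, learning_rate=0.005, epochs=5):
    """Estimate the first principal direction with Oja's Hebbian rule.

    Samples are assumed to be (approximately) zero-mean vectors.
    """
    dim = len(samples[0])
    w = [1.0] + [0.0] * (dim - 1)  # arbitrary unit-length starting vector
    for _ in range(epochs):
        for x in samples:
            y = sum(wi * xi for wi, xi in zip(w, x))  # neuron output
            # Hebbian term y*x plus the decay term -y^2*w that bounds |w|.
            w = [wi + learning_rate * y * (xi - y * wi) for wi, xi in zip(w, x)]
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [wi / norm for wi in w]


def make_samples(n=2000, seed=0):
    """Hypothetical 2-D data with most variance along the (1, 1) diagonal."""
    rng = random.Random(seed)
    s = 1.0 / math.sqrt(2.0)
    samples = []
    for _ in range(n):
        major = rng.gauss(0.0, 2.0)  # large spread along (1, 1)
        minor = rng.gauss(0.0, 0.3)  # small spread along (1, -1)
        samples.append((s * (major + minor), s * (major - minor)))
    return samples


direction = oja_first_component(make_samples())
```

On this synthetic data the learned `direction` should lie close to the diagonal (1, 1)/√2, up to sign. Because the plain update scales with the raw output `y`, a single outlier can pull the weights strongly, which is the sensitivity a robust variant aims to reduce.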


INVITED PAPER MULTIVARIATE ANALYSIS FOR THE CASE WHEN THE DIMENSION IS LARGE COMPARED TO THE SAMPLE SIZE

  • Fujikoshi, Yasunori
    • Journal of the Korean Statistical Society
    • /
    • Vol. 33, No. 1
    • /
    • pp.1-24
    • /
    • 2004
  • This paper is concerned with statistical methods for multivariate data when the number $p$ of variables is large compared to the sample size $n$. Such data typically appear in the analysis of DNA microarrays, curve data, financial data, etc. However, there is little statistical theory for high-dimensional data. On the other hand, there are some asymptotic results under the assumption that both $n$ and $p$ tend to $\infty$ in some ratio $p/n \rightarrow c$. The results suggest that the new asymptotic results are more useful and insightful than the classical large-sample asymptotics. The main purpose of this paper is to review some asymptotic results for high-dimensional statistics as well as for classical statistics under a high-dimensional asymptotic framework.

Multivariate confidence region using quantile vectors

  • Hong, Chong Sun;Kim, Hong Il
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 24, No. 6
    • /
    • pp.641-649
    • /
    • 2017
  • Multivariate confidence regions have been defined using a chi-square distribution function under a normality assumption, and are represented as ellipses and ellipsoids for bivariate and trivariate normal distributions, respectively. In this work, an alternative confidence region based on multivariate quantile vectors is proposed, which can be defined for the normal distribution as well as for any other distribution. Lower and upper bounds are obtained using quantile vectors, and the region between the two bounds is referred to as the quantile confidence region. Note that the upper and lower bounds of the bivariate and trivariate quantile confidence regions are curves and surfaces, respectively. The quantile confidence region is obtained for various symmetric and asymmetric distribution functions, and its coverage rate is calculated and compared. We conclude that the quantile confidence region will be useful for the analysis of multivariate data, since it is found to have better coverage rates, even for asymmetric distributions.
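
The classical chi-square region that this abstract starts from can be sketched for the bivariate case as follows. The sketch assumes a known mean and covariance matrix and does not implement the quantile confidence region the paper proposes. The function name and example values are hypothetical.

```python
import math


def in_bivariate_normal_region(point, mean, cov, alpha=0.05):
    """Check whether a 2-D point lies in the (1 - alpha) chi-square ellipse.

    The region is {x : (x - mean)' cov^{-1} (x - mean) <= q}, where q is
    the (1 - alpha) quantile of the chi-square distribution with 2 degrees
    of freedom. For 2 degrees of freedom, q = -2 * ln(alpha) exactly.
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    # Squared Mahalanobis distance via the closed-form 2x2 inverse.
    distance_sq = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    threshold = -2.0 * math.log(alpha)  # about 5.99 for alpha = 0.05
    return distance_sq <= threshold


# Hypothetical standard bivariate normal: identity covariance, zero mean.
identity = ((1.0, 0.0), (0.0, 1.0))
inside = in_bivariate_normal_region((2.0, 0.0), (0.0, 0.0), identity)
outside = in_bivariate_normal_region((2.0, 2.0), (0.0, 0.0), identity)
```

The ellipse is symmetric about the mean by construction, so it cannot adapt to skewed distributions, which is the case the abstract's quantile-based alternative targets.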

Multivariate Volatility Analysis via Canonical Correlations for Financial Time Series

  • 이승연;황선영
    • The Korean Journal of Applied Statistics
    • /
    • Vol. 27, No. 7
    • /
    • pp.1139-1149
    • /
    • 2014
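  • The volatility of multivariate financial time series is analyzed using canonical correlation analysis, a multivariate technique. Because volatility is inherently non-negative, we use canonical correlation analysis with non-negative coefficients, namely non-negative and sparse canonical correlation analysis (NSCCA). This paper studies the volatility curves of multivariate time series and illustrates the proposed methodology through an analysis of bivariate stock data.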