• Title/Abstract/Keywords: Multivariate Statistical Analysis

632 search results; processing time 0.027 seconds

Selection of markers in the framework of multivariate receiver operating characteristic curve analysis in binary classification

  • Sameera, G;Vishnu, Vardhan R
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 26, No. 2
    • /
    • pp.79-89
    • /
    • 2019
  • Classification models based on receiver operating characteristic (ROC) curve analysis have been extended from the univariate to the multivariate setup by linearly combining multiple available markers. One such classification model is the multivariate ROC curve analysis. However, in a real scenario not all markers contribute, and some may mask the contribution of other markers in classifying the individuals/objects. This paper addresses this issue by developing an algorithm that helps identify the important markers that are significant and true contributors. The proposed variable selection framework is supported by real datasets and a simulation study, and it is shown to provide insight into each individual marker's significance in providing a classifier rule/linear combination with a good extent of classification.
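
To make the idea of linearly combining markers concrete, here is a minimal sketch, not the authors' algorithm. It scores each subject with a weighted sum of markers and estimates the area under the ROC curve (AUC) by pairwise comparison. The data, the weights, and the function names `linear_score` and `auc` are all hypothetical.

```python
def linear_score(markers, weights):
    """Combine several marker values into one score with a weighted sum."""
    return sum(w * m for w, m in zip(weights, markers))


def auc(pos_scores, neg_scores):
    """Empirical AUC: share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical two-marker data: (marker 1, marker 2) per subject.
diseased = [(2.0, 2.0), (3.0, 2.0), (4.0, 2.0)]
healthy = [(1.0, 0.0), (2.0, 0.0), (2.5, 2.0)]
weights = (1.0, 0.5)  # assumed weights, chosen only for illustration

# AUC using the linear combination of both markers.
auc_combined = auc(
    [linear_score(m, weights) for m in diseased],
    [linear_score(m, weights) for m in healthy],
)

# AUC using marker 1 alone, for comparison.
auc_first = auc([m[0] for m in diseased], [m[0] for m in healthy])
```

Comparing `auc_combined` with the AUC of each marker alone is one simple way to see whether a marker adds anything to the combination. A selection procedure such as the one the abstract describes would need a formal significance test on top of this.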

Residuals Plots for Repeated Measures Data

  • 박태성
    • Korean Statistical Society: Conference Proceedings
    • /
    • Proceedings of the Korean Statistical Society 2000 Autumn Conference
    • /
    • pp.187-191
    • /
    • 2000
  • In the analysis of repeated measurements, multivariate regression models that account for the correlations among the observations from the same subject are widely used. Like the usual univariate regression models, these multivariate regression models also need some model diagnostic procedures. In this paper, we propose a simple graphical method to detect outliers and to investigate the goodness of model fit in repeated measures data. The graphical method is based on the quantile-quantile (Q-Q) plots of the $\chi^2$ distribution and the standard normal distribution. We also propose diagnostic measures to detect influential observations. The proposed method is illustrated using two examples.
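
As a rough illustration of a $\chi^2$ Q-Q plot, the sketch below pairs ordered squared Mahalanobis distances of bivariate points with $\chi^2$ quantiles on 2 degrees of freedom. This is a generic diagnostic, not the paper's residual-based procedure. It assumes two-dimensional data so that the quantile has the closed form $-2\ln(1-q)$. The data and function names are hypothetical.

```python
import math


def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each 2-D point from the sample mean."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Sample covariance entries (divisor n - 1).
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    dists = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Quadratic form with the inverse of the 2x2 covariance matrix.
        dists.append((syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det)
    return dists


def chi2_2df_quantile(q):
    """Quantile of the chi-square distribution with 2 degrees of freedom."""
    return -2.0 * math.log(1.0 - q)


def qq_pairs(points):
    """(theoretical quantile, ordered squared distance) pairs for a Q-Q plot."""
    d2 = sorted(mahalanobis_sq_2d(points))
    n = len(d2)
    return [(chi2_2df_quantile((i + 0.5) / n), d) for i, d in enumerate(d2)]


# Hypothetical data; the last point is placed far from the rest.
points = [(1.0, 2.0), (2.0, 1.5), (3.0, 3.5), (4.0, 3.0), (5.0, 5.5), (9.0, 1.0)]
pairs = qq_pairs(points)
```

Points that fall well above the 45-degree line in a plot of `pairs` are candidate outliers. For data of another dimension, the closed-form quantile would have to be replaced by a general $\chi^2$ quantile function.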

Statistical Outliers in Florida Counties at the Presidential Election 2000

  • 김현철
    • Korean Journal of Applied Statistics
    • /
    • Vol. 15, No. 1
    • /
    • pp.21-32
    • /
    • 2002
  • We searched for outliers in the vote data of the State of Florida from the 2000 presidential election, using multivariate regression analysis. We found several outliers, including Palm Beach County. This means that, to argue that the "Butterfly Ballot" made Palm Beach an outlier, we should analyze the number of disqualified double-punched ballots as well as the votes.

Use of Multivariate Statistical Approaches for Decoding Chemical Evolution of Groundwater near Underground Storage Caverns

  • 이정훈
    • Journal of the Korean Earth Science Society
    • /
    • Vol. 35, No. 4
    • /
    • pp.225-236
    • /
    • 2014
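  • Multivariate statistical techniques have been widely used to analyze and interpret hydrogeochemical data. In this study, correspondence analysis and principal component analysis were used together to examine the characteristics of groundwater affected by anthropogenic activities. The objective of this study is to compute the speciation of the chemical constituents of groundwater using WATEQ4F within the NETPATH program and to extract geochemical information from the results using multivariate statistical techniques. The study area is an LPG storage facility in Ulsan, in the southeastern part of the Korean Peninsula. Groundwater with the ultrabasic composition seen at other storage facilities was observed in the study area. Such high-pH groundwater, caused by anthropogenic influence, can affect the speciation of Al and induce the precipitation of carbonates. This study focused on how the hydrogeochemical characteristics and phases change under two anthropogenic factors that affect groundwater in the study area (cleaning activity and cement influence). A comparison of previous findings with the results of the two statistical analyses shows that principal component analysis and correspondence analysis based on geochemical information can serve as a basis for hydrogeochemical studies.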

Assessment of Water Quality using Multivariate Statistical Techniques: A Case Study of the Nakdong River Basin, Korea

  • Park, Seongmook;Kazama, Futaba;Lee, Shunhwa
    • Environmental Engineering Research
    • /
    • Vol. 19, No. 3
    • /
    • pp.197-203
    • /
    • 2014
  • This study estimated spatial and seasonal variation of water quality to understand the characteristics of the Nakdong River basin, Korea. Altogether, 11 parameters (discharge, water temperature, dissolved oxygen, 5-day biochemical oxygen demand, chemical oxygen demand, pH, suspended solids, electrical conductivity, total nitrogen, total phosphorus, and total organic carbon) at 22 different sites for the period 2003-2011 were analyzed using multivariate statistical techniques (cluster analysis, principal component analysis and factor analysis). Hierarchical cluster analysis grouped the whole river basin into three zones, i.e., relatively less polluted (LP), medium polluted (MP) and highly polluted (HP), based on similarity of water quality characteristics. The results of factor analysis/principal component analysis explained up to 83.0%, 81.7% and 82.7% of the total variance in the water quality data of the LP, MP, and HP zones, respectively. The rotated components of PCA obtained from factor analysis indicate that the parameters responsible for water quality variations were mainly related to discharge and total pollution loads (non-point pollution source) in the LP, MP and HP areas; organic and nutrient pollution in the LP and HP zones; and temperature, DO and TN in the LP zone. This study demonstrates the usefulness of multivariate statistical techniques for the analysis and interpretation of multi-parameter, multi-location and multi-year data sets.

Multivariate analysis of longitudinal surveys for population median

  • Priyanka, Kumari;Mittal, Richa
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 24, No. 3
    • /
    • pp.255-269
    • /
    • 2017
  • This article explores the analysis of longitudinal surveys in which the same units are investigated on several occasions. A multivariate exponential ratio-type estimator is proposed for estimating the finite population median at the current occasion in two-occasion longitudinal surveys. Information on several additional auxiliary variables, which are stable over time and readily available on both occasions, is utilized. Properties of the proposed multivariate estimator, including the optimum replacement strategy, are presented. The proposed multivariate estimator is compared with the sample median estimator when there is no matching from a previous occasion and with the exponential ratio-type estimator in successive sampling when information is available on only one additional auxiliary variable. The merits of the proposed estimator are justified by empirical interpretations and validated by a simulation study with the help of some natural populations.

Statistical Methods for Multivariate Missing Data in Health Survey Research

  • 김동기;박은철;손명세;김한중;박형욱;안재형;임종건;송기준
    • Journal of Preventive Medicine and Public Health
    • /
    • Vol. 31, No. 4
    • /
    • pp.875-884
    • /
    • 1998
  • Missing observations are common in medical research and health survey research. Several statistical methods to handle the missing data problem have been proposed. The EM algorithm (Expectation-Maximization algorithm) is one of the ways of efficiently handling the missing data problem based on sufficient statistics. In this paper, we developed statistical models and methods for survey data with multivariate missing observations. In particular, we adopted the EM algorithm to handle the multivariate missing observations. We assume that the multivariate observations follow a multivariate normal distribution, where the mean vector and the covariance matrix are primarily of interest. We applied the proposed statistical method to analyze data from a health survey. The data set we used came from a physician survey on the Resource-Based Relative Value Scale (RBRVS). In addition to the EM algorithm, we applied the complete case analysis, which uses only completely observed cases, and the available case analysis, which utilizes all available information. The residual and normal probability plots were evaluated to assess the assumption of normality. We found that the residual sum of squares from the EM algorithm was smaller than those of the complete-case and the available-case analyses.
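
The EM idea for normal data with missing values can be sketched in its simplest setting: a bivariate normal where the first variable is fully observed and the second is sometimes missing. This is a textbook-style illustration under that assumption, not the paper's model or code. The function name `em_bivariate_normal` and the data are hypothetical.

```python
def em_bivariate_normal(x, y, n_iter=1000):
    """EM estimates of a bivariate normal's mean and covariance.

    x is fully observed; entries of y may be None (missing).
    Returns ((mu_x, mu_y), (s_xx, s_xy, s_yy)) as maximum likelihood
    estimates (variance divisor n).
    """
    n = len(x)
    # x has no missing values, so its estimates never change.
    mu_x = sum(x) / n
    s_xx = sum((xi - mu_x) ** 2 for xi in x) / n
    # Start y's parameters from the observed values only.
    y_obs = [yi for yi in y if yi is not None]
    mu_y = sum(y_obs) / len(y_obs)
    s_yy = sum((yi - mu_y) ** 2 for yi in y_obs) / len(y_obs)
    s_xy = 0.0
    for _ in range(n_iter):
        beta = s_xy / s_xx                 # slope of y on x
        resid_var = s_yy - beta * s_xy     # conditional variance of y given x
        sum_y = sum_yy = sum_xy = 0.0
        # E-step: expected sufficient statistics for each case.
        for xi, yi in zip(x, y):
            if yi is None:
                ey = mu_y + beta * (xi - mu_x)
                ey2 = ey * ey + resid_var
            else:
                ey, ey2 = yi, yi * yi
            sum_y += ey
            sum_yy += ey2
            sum_xy += xi * ey
        # M-step: re-estimate parameters from the expected statistics.
        mu_y = sum_y / n
        s_yy = sum_yy / n - mu_y ** 2
        s_xy = sum_xy / n - mu_x * mu_y
    return (mu_x, mu_y), (s_xx, s_xy, s_yy)


# Hypothetical data: y is missing for the last three cases.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, None, None, None]
means, cov = em_bivariate_normal(x, y)
```

A complete-case analysis of the same data would use only the first three rows, so its estimate of the mean of `y` ignores the larger `x` values seen in the incomplete rows. The EM estimate uses them through the regression of `y` on `x`.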

Combining cluster analysis and neural networks for the classification problem

  • Kim, Kyungsup;Han, Ingoo
    • Korean Operations Research and Management Science Society: Conference Proceedings
    • /
    • Proceedings of the Korean Operations Research and Management Science Society 1996 Autumn Conference; Korea University, Seoul; 26 Oct. 1996
    • /
    • pp.31-34
    • /
    • 1996
  • Extensive research has compared the performance of neural networks (NN) with that of various statistical techniques for the classification problem. The empirical results of these comparative studies indicate that neural networks often outperform the traditional statistical techniques. Moreover, there have been efforts to combine various classification methods, especially multivariate discriminant analysis, with neural networks. While these efforts improve performance, they risk violating the strict assumptions of multivariate discriminant analysis, namely multivariate normality of the independent variables and equality of the variance-covariance matrices across groups. In contrast, cluster analysis, like neural networks, relaxes these assumptions. We propose a new approach to classification problems that combines cluster analysis with neural networks. The resulting predictions of the composite model are more accurate than those of each individual technique.

Resistant Singular Value Decomposition and Its Statistical Applications

  • Park, Yong-Seok;Huh, Myung-Hoe
    • Journal of the Korean Statistical Society
    • /
    • Vol. 25, No. 1
    • /
    • pp.49-66
    • /
    • 1996
  • The singular value decomposition is one of the most useful methods in the area of matrix computation. It provides dimension reduction, which is the central idea in many multivariate analyses. However, this method is not resistant, i.e., it is very sensitive to small changes in the input data. In this article, we derive a resistant version of the singular value decomposition for principal component analysis. We also give its statistical application to the biplot, which is similar to principal component analysis in reducing the dimension of an n x p data matrix. We thus derive the resistant principal component analysis and biplot based on the resistant singular value decomposition. They provide graphical multivariate data analyses that are relatively little influenced by outlying observations.

Diagnosis of Thickness Quality Using Multivariate Statistical Analysis in Hot Finishing Mill

  • Kim, Heung-Mook
    • Institute of Control, Robotics and Systems: Conference Proceedings
    • /
    • Institute of Control, Robotics and Systems, ICCAS 2001
    • /
    • pp.116.3-116
    • /
    • 2001
  • A diagnosis methodology for thickness quality in a hot finishing mill is proposed based on multivariate statistical analysis. The thickness of hot strip is a key quality factor that is measured by an X-ray thickness gauge. Currently, the thickness quality is guaranteed by upper and lower limits on the thickness deviation from the target thickness. However, if an over-limit occurs, there is no in-line method to identify the causes. In this paper, many parameters are extracted from the thickness deviation signal, such as mean deviation (top, middle, tail), rms deviation (top, middle, tail) and peak deviation (top, middle, tail), as time domain parameters ...
