• 제목/요약/키워드: selection of principal components

검색결과 51건 처리시간 0.023초

Principal Component Regression by Principal Component Selection

  • Lee, Hosung;Park, Yun Mi;Lee, Seokho
    • Communications for Statistical Applications and Methods
    • /
    • 제22권2호
    • /
    • pp.173-180
    • /
    • 2015
  • We propose a selection procedure of principal components in principal component regression. Our method selects principal components using variable selection procedures instead of a small subset of major principal components in principal component regression. Our procedure consists of two steps to improve estimation and prediction. First, we reduce the number of principal components using the conventional principal component regression to yield the set of candidate principal components and then select principal components among the candidate set using sparse regression techniques. The performance of our proposals is demonstrated numerically and compared with the typical dimension reduction approaches (including principal component regression and partial least square regression) using synthetic and real datasets.

주성분회귀분석에서 주성분선정을 위한 새로운 방법 (Procedure for the Selection of Principal Components in Principal Components Regression)

  • 김부용;신명희
    • 응용통계연구
    • /
    • 제23권5호
    • /
    • pp.967-975
    • /
    • 2010
  • 데이터마이닝 분야에서의 회귀모형에는 연관성이 높은 설명변수들이 포함되어 다중공선성을 유발하는 경우가 많은데, 다중공선성이 야기하는 문제를 해결하기 위하여 주성분회귀분석을 적용할 수 있다. 이 분석에서는 적절한 주성분을 선정하는 과정이 핵심인데, 기존의 선정방법들은 다중공선성을 잘 해결하지 못하거나 모형의 적합성을 저하시킨다는 지적을 받고 있다. 따라서 본 논문에서는 다중공선성 문제와 적합성 저하 현상을 동시에 해결할 수 있는 새로운 선정방법을 제안하였다. 다중공선성에 의해 최소제곱추정량의 분산이 팽창되는 문제를 주성분회귀에 의해 해결할 수 있지만, 주성분의 일부를 선정함에 따라 발생하는 편의도 동시에 통제해야 한다. 따라서 주성분회귀추정량의 평균제곱오차를 최소가 되게 하는 상태지수를 측정하고, 이 값에 영향을 미치는 주요 요인들을 컨조인트분석에 의해 파악하여 주성분 선정기준 모형을 구축하였다. 선정기준의 상한과 하한을 설정하고, 상태지수가 상한을 초과하면 해당 주성분을 제외시키고, 하한에 미달하면 해당 주성분을 포함시킨다. 그리고 상한과 하한 사이의 상태지수에 대응하는 주성분들에 대해서는 일반화선형검정을 순차적으로 적용하여 주성분을 선정하는 방법이다.

로버스트주성분회귀에서 최적의 주성분선정을 위한 기준 (A Criterion for the Selection of Principal Components in the Robust Principal Component Regression)

  • 김부용
    • Communications for Statistical Applications and Methods
    • /
    • 제18권6호
    • /
    • pp.761-770
    • /
    • 2011
  • 회귀모형에 연관성이 높은 설명변수들이 포함되면 다중공선성의 문제가 야기되며, 동시에 자료에 회귀 이상점들이 포함되면 최소자승추정량에 바탕을 둔 제반 통계적 추론은 심각한 결함을 갖게 된다. 이러한 현상들은 데이터마이닝 분야에서 많이 볼 수 있는데, 본 논문에서는 두 가지 문제를 동시에 해결하기 위한 방안으로서 로버스트주성분회귀를 제안하였다. 특히 최적의 주성분을 선정하기 위한 새로운 기준을 개발하였는데, 설명변수들의 표본공분산 대신에 MVE-추정량을 기반으로 하였으며, 고유치가 아니라 상태지수의 크기에 바탕을 둔 선정기준을 제안하였다. 그리고 주성분모형에서의 추정을 위하여 회귀이상점에 대해 로버스트한 LTS-추정을 도입하였다. 제안된 선정기준이 기존의 기준들보다 다중공선성과 이상점이 유발하는 문제들을 잘 해결할 수 있음을 모의실험을 통하여 확인하였다.

Logistic Regression Classification by Principal Component Selection

  • Kim, Kiho;Lee, Seokho
    • Communications for Statistical Applications and Methods
    • /
    • 제21권1호
    • /
    • pp.61-68
    • /
    • 2014
  • We propose binary classification methods by modifying logistic regression classification. We use variable selection procedures instead of original variables to select the principal components. We describe the resulting classifiers and discuss their properties. The performance of our proposals are illustrated numerically and compared with other existing classification methods using synthetic and real datasets.

A Penalized Principal Components using Probabilistic PCA

  • Park, Chong-Sun;Wang, Morgan
    • 한국통계학회:학술대회논문집
    • /
    • 한국통계학회 2003년도 춘계 학술발표회 논문집
    • /
    • pp.151-156
    • /
    • 2003
  • Variable selection algorithm for principal component analysis using penalized likelihood method is proposed. We will adopt a probabilistic principal component idea to utilize likelihood function for the problem and use HARD penalty function to force coefficients of any irrelevant variables for each component to zero. Consistency and sparsity of coefficient estimates will be provided with results of small simulated and illustrative real examples.

  • PDF

근적외 스펙트럼을 이용한 정량분석용 최적 주성분회귀모델을 얻기 위한 알고리듬 (Algorithm for Finding the Best Principal Component Regression Models for Quantitative Analysis using NIR Spectra)

  • 조정환
    • Journal of Pharmaceutical Investigation
    • /
    • 제37권6호
    • /
    • pp.377-395
    • /
    • 2007
  • Near infrared(NIR) spectral data have been used for the noninvasive analysis of various biological samples. Nonetheless, absorption bands of NIR region are overlapped extensively. It is very difficult to select the proper wavelengths of spectral data, which give the best PCR(principal component regression) models for the analysis of constituents of biological samples. The NIR data were used after polynomial smoothing and differentiation of 1st order, using Savitzky-Golay filters. To find the best PCR models, all-possible combinations of available principal components from the given NIR spectral data were derived by in-house programs written in MATLAB codes. All of the extensively generated PCR models were compared in terms of SEC(standard error of calibration), $R^2$, SEP(standard error of prediction) and SECP(standard error of calibration and prediction) to find the best combination of principal components of the initial PCR models. The initial PCR models were found by SEC or Malinowski's indicator function and a priori selection of spectral points were examined in terms of correlation coefficients between NIR data at each wavelength and corresponding concentrations. For the test of the developed program, aqueous solutions of BSA(bovine serum albumin) and glucose were prepared and analyzed. As a result, the best PCR models were found using a priori selection of spectral points and the final model selection by SEP or SECP.

Application of varimax rotated principal component analysis in quantifying some zoometrical traits of a relict cow

  • Pares-Casanova, P.M.;Sinfreu, I.;Villalba, D.
    • 대한수의학회지
    • /
    • 제53권1호
    • /
    • pp.7-10
    • /
    • 2013
  • A study was conducted to determine the interdependence among the conformation traits of 28 "Pallaresa" cows using principal component analysis. Originally 21 body linear measurements were obtained, from which eight traits are subsequently eliminated. From the principal components analysis, with raw varimax rotation of the transformation matrix, two principal components were extracted, which accounted for 65.8% of the total variance. The first principal component alone explained 51.6% of the variation, and tended to describe general size, while the second principal component had its loadings for back-sternal diameter. The two extracted principal components, which are traits related to dorsal heights and back-sternal diameter, could be considered in selection programs.

피복 구성을 위한 경부 형태의 관찰 (Observation on the shape of the neck -by principal component analysis of the mesurements-)

  • 이연순
    • 대한인간공학회지
    • /
    • 제10권2호
    • /
    • pp.31-42
    • /
    • 1991
  • To understand the shape of the neck in a view of garment planning, principal component analysis has been appliedto the measurement of the neck. The neck surface development and the cross sections of the neck have been observed. The materials consist of the body mearsurements, the neck surface developments and the cross sec- tions of the necks of a total of 108 korean woman students. The difference between the right side and the left side of the neck has not been reconginiged. But the differenece among the height of the front neck point, that of the side neck point and that of the back neck point has been recognized. 2. The initial 41 items have been found having variety and duplication. So two criteria have been made to solve those problems and the selection of 34 items have been made by each criterion. 3. 43 and 34 items have been compared by means of accumulative ratios of contribution and of clearness within the meaning of principal component. As a result, 34 measurement items have been further anylysis. 4. As a result of principal component analysis on the 34 items, the four principal components have been found obtaines and inter-preted. The four principal components are 1) the thick of the neck, 2) the front neck-line on the waist basic pattern, basic pattern, 3) the shape of the neck surface development, and 4) the back neck-line on the waist basic pattern. 5. According to the graphic informations concerning these principal components, the meaning of these four principal components has been grasped on the visual. As a result, there is a large individual difference in the shape of neck.

  • PDF

화자인식에서 연속밀도 은닉마코프모델의 혼합밀도 결정방법 (Gaussian Density Selection Method of CDHMM in Speaker Recognition)

  • 서창우;이주헌;임재열;이기용
    • 한국음향학회지
    • /
    • 제22권8호
    • /
    • pp.711-716
    • /
    • 2003
  • 본 논문은 연속밀도 은닉마코프모델에서 각 상태별 혼합성분 개수를 결정하는 방법을 제안한다. 지금까지의 대부분의 연구가 연속밀도 은닉마코프모델에서 화자의 스펙트럼 특성에 상관없이 각 상태별 동일한 혼합성분 개수를 적용하였다. 이런 접근방법은 많은 계산량을 요구할 뿐만 아니라, 각 상태의 특성을 무시하고 있기 때문에 각 상태별 음성신호의 정확한 모델링을 할 수 없다. 따라서 본 논문에서 제안한 연속밀도 은닉마코프모델의 파라미터 추정은 각 상태별 혼합성분에 대한 발생 확률값에 따라서 결정하였다. 또한 혼합성분의 개수를 줄이는 과정에서 신호의 상관성을 줄이고 시스템의 전체적인 안정성을 얻기 위해서 주성분 분석을 이용하였다. 제안한 방법은 기존의 은닉마코프모델에 비해서 평균 10% 작은 혼합성분 개수를 이용했을 때를 기준으로 실험하였다. 실험결과에서 혼합성분 결정만을 적용했을 때 거의 비슷한 성능을 얻을 수 있었다. 그리고 주성분 분석을 이용했을 때, 특정벡터가 16 차일 때 평균 0.35%의 성능감소가 일어났지만, 25 차에서는 평균 0.65%의 성능개선을 얻을 수 있었다.

식생이 무성한 지역에서의 Principal Component Analysis 에 의한 Landsat TM 자료의 광역지질도 작성 (Regional Geological Mapping by Principal Component Analysis of the Landsat TM Data in a Heavily Vegetated Area)

  • 朴鍾南;徐延熙
    • 대한원격탐사학회지
    • /
    • 제4권1호
    • /
    • pp.49-60
    • /
    • 1988
  • Principal Component Analysis (PCA) was applied for regional geological mapping to a multivariate data set of the Landsat TM data in the heavily vegetated and topographically rugged Chungju area. The multivariate data set selection was made by statistical analysis based on the magnitude of regression of squares in multiple regression, and it includes R1/2/R3/4, R2/3, R5/7/R4/3, R1/2, R3/4. R4/3. AND R4/5. As a result of application of PCA, some of later principal components (in this study PC 3 and PC 5) are geologically more significant than earlier major components, PC 1 and PC 2 herein. The earlier two major components which comprise 96% of the total information of the data set, mainly represent reflectance of vegetation and topographic effects, while though the rest represent 3% of the total information which statistically indicates the information unstable, geological significance of PC3 and PC5 in the study implies that application of the technique in more favorable areas should lead to much better results.