• Title/Summary/Keyword: PCA(principal component analysis)


A STOCHASTIC VARIANCE REDUCTION METHOD FOR PCA BY AN EXACT PENALTY APPROACH

  • Jung, Yoon Mo;Lee, Jae Hwa;Yun, Sangwoon
    • Bulletin of the Korean Mathematical Society
    • /
    • v.55 no.4
    • /
    • pp.1303-1315
    • /
    • 2018
  • For principal component analysis (PCA) to efficiently analyze large-scale matrices, it is crucial to find a few singular vectors at lower computational cost and with lower memory requirements. To compute them in a fast and robust way, we propose a new stochastic method. Specifically, we adopt the stochastic variance reduced gradient (SVRG) method [11] to avoid the asymptotically slow convergence of stochastic gradient descent methods. For that purpose, we reformulate the PCA problem as an unconstrained optimization problem using a quadratic penalty. In general, the penalty parameter must be increased to infinity for the two problems to be equivalent. However, in this case, exact penalization is guaranteed by applying the analysis in [24]. We establish the convergence rate of the proposed method to a stationary point, and numerical experiments illustrate its validity and efficiency.
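
The idea of combining SVRG with a penalized, unconstrained PCA objective can be sketched as follows. This is a minimal illustration and not the paper's algorithm: the penalty form f(w) = -0.5 wᵀAw + (mu/4)(‖w‖² - 1)², the values of `mu` and `eta`, the function name, and the synthetic data are all assumptions chosen for the example. For this particular objective the minimizer is parallel to the leading eigenvector at a finite `mu`, so only the direction of the result is used.

```python
import numpy as np

def svrg_penalized_pca(X, mu=5.0, eta=0.002, epochs=30, seed=0):
    """Estimate the leading principal direction of the rows of X.

    Minimizes f(w) = -0.5 * w^T A w + (mu / 4) * (||w||^2 - 1)^2 with
    A = X^T X / n, using SVRG. The penalty form, mu, and eta are
    illustrative assumptions, not values taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)

    def penalty_grad(v):
        # Gradient of (mu / 4) * (||v||^2 - 1)^2.
        return mu * (v @ v - 1.0) * v

    for _ in range(epochs):
        w_snap = w.copy()
        # Full gradient at the snapshot, computed once per epoch.
        full_grad = -(X.T @ (X @ w_snap)) / n + penalty_grad(w_snap)
        for i in rng.integers(0, n, size=n):
            a = X[i]
            g_w = -a * (a @ w) + penalty_grad(w)
            g_snap = -a * (a @ w_snap) + penalty_grad(w_snap)
            # Variance-reduced stochastic gradient step.
            w = w - eta * (g_w - g_snap + full_grad)
    return w / np.linalg.norm(w)

# Synthetic data with a clear leading direction (assumed for the demo).
rng = np.random.default_rng(1)
scales = np.array([2.0, 1.0, 0.8, 0.6, 0.5, 0.4, 0.3, 0.2])
X = rng.standard_normal((500, 8)) * scales
w_hat = svrg_penalized_pca(X)

# Compare against the leading eigenvector from a dense eigendecomposition.
top = np.linalg.eigh(X.T @ X / len(X))[1][:, -1]
alignment = abs(w_hat @ top)
```

The inner loop touches one data row per step, which is what keeps the per-iteration cost and memory low. The snapshot gradient is the only full pass over the data in each epoch.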

The Forest Communities of Mt. Chombong Described by Combined Methods of Classification and Ordination (Classification과 Ordination 분석법(分析法)의 병용(竝用)에 의한 점봉산일대(點鳳山一帶) 삼림군집(森林群集)의 해석(解析))

  • Kim, Ji Hong
    • Journal of Korean Society of Forest Science
    • /
    • v.78 no.3
    • /
    • pp.255-262
    • /
    • 1989
  • Vegetation data of the mixed mesophytic forest in the Mt. Chombong area were analyzed by the methods of classification and ordination. 'Weighted group average linkage cluster analysis' recognized five distinctive vegetation groups, based on the abundance data of 83 woody plant species in 70 sampling units. The species diversity was also examined for each group. The importance values of 42 tree species in the groups were subjected to principal component analysis (PCA). The PCA ordinated the five vegetation groups on the first two axes, so that their similarity in species composition could be compared. Acer palmatum, Fraxinus rhynchophylla, Quercus mongolica, and Acer mono had the greatest influence on the group scores, with high eigenvector values (component loadings) on the first axis. The distribution of these four dominant species appeared to be important in determining community association in this diversified forest.


Choice of frequency via principal component in high-frequency multivariate volatility models (주성분을 이용한 다변량 고빈도 실현 변동성의 주기 선택)

  • Jin, M.K.;Yoon, J.E.;Hwang, S.Y.
    • The Korean Journal of Applied Statistics
    • /
    • v.30 no.5
    • /
    • pp.747-757
    • /
    • 2017
  • We investigate multivariate volatilities based on high-frequency time series. The PCA (principal component analysis) method is employed to achieve dimension reduction in multivariate volatility. Multivariate realized volatilities (RV) with various frequencies are calculated from high-frequency data, and an "optimum" frequency is suggested using PCA. Specifically, RVs with various frequencies are compared with existing daily volatilities such as Cholesky, EWMA and BEKK after dimension reduction via PCA. The approach is illustrated with an analysis of high-frequency stock prices of KOSPI, Samsung Electronics and Hyundai Motor Company.

Comparisons of Linear Feature Extraction Methods (선형적 특징추출 방법의 특성 비교)

  • Oh, Sang-Hoon
    • The Journal of the Korea Contents Association
    • /
    • v.9 no.4
    • /
    • pp.121-130
    • /
    • 2009
  • In this paper, feature extraction methods, one approach to reducing the dimensionality of high-dimensional data, are empirically investigated. We selected the traditional PCA (Principal Component Analysis), ICA (Independent Component Analysis), NMF (Non-negative Matrix Factorization), and sNMF (Sparse NMF) for comparison. ICA extracts features similar to those of simple cells in V1. NMF implements a "parts-based representation in the brain", and sNMF is an improved version of NMF. To investigate the extracted features visually, handwritten digits are used. The extracted features are also used to train multi-layer perceptrons for a recognition test. The characteristics of each feature extraction method will be useful when applying feature extraction to real-world problems.
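
The contrast between PCA and NMF features can be sketched with NumPy alone. The snippet is an illustrative assumption, not the paper's experimental setup: it uses random non-negative data instead of handwritten digits, the standard Lee-Seung multiplicative updates for NMF, and hypothetical function names. ICA and sNMF are omitted.

```python
import numpy as np

def pca_components(V, k):
    """Top-k principal directions (rows) of the centered data matrix V."""
    Vc = V - V.mean(axis=0)
    _, _, Vt = np.linalg.svd(Vc, full_matrices=False)
    return Vt[:k]

def nmf(V, k, n_iter=200, seed=0, eps=1e-9):
    """Lee-Seung multiplicative updates for V ~ W @ H with W, H >= 0."""
    rng = np.random.default_rng(seed)
    n, d = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, d)) + eps
    for _ in range(n_iter):
        # Each update rescales entries by a non-negative ratio,
        # so W and H stay non-negative throughout.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Synthetic non-negative data of rank 4 (assumed for the demo).
rng = np.random.default_rng(1)
V = rng.random((100, 4)) @ rng.random((4, 20))

W, H = nmf(V, k=4)
pcs = pca_components(V, k=4)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

The rows of `H` are non-negative basis features that can only add up, which is what gives NMF its parts-based character. The rows of `pcs` are orthonormal and therefore contain entries of both signs.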

A Comparison of PCA, LDA, and Matching Methods for Face Recognition (얼굴인식을 위한 PCA, LDA 및 정합기법의 비교)

  • 박세제;박영태
    • Journal of KIISE:Software and Applications
    • /
    • v.30 no.3_4
    • /
    • pp.372-378
    • /
    • 2003
  • Limitations of linear discriminant analysis (LDA) for face recognition, such as the loss of generalization and computational infeasibility, are addressed and illustrated for a small number of samples. Principal component analysis (PCA) followed by the LDA mapping may be an alternative that can overcome these limitations. We also show that any scheme based on either mappings or template matching is vulnerable to image variations due to rotation, translation, facial expressions, or local illumination conditions. This underscores the importance of proper preprocessing that can compensate for such variations. Simple template matching, when combined with geometrically correlated feature-based detection as preprocessing, is shown to outperform mapping techniques in terms of both accuracy and robustness to image variations.
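
A PCA-then-LDA mapping for the small-sample case can be sketched as below. With fewer samples than dimensions, the within-class scatter matrix is singular in the original space, so the data are first projected onto a few principal components. The function name, the synthetic Gaussian data standing in for face images, and the choice `n_pca=10` are assumptions made for illustration only.

```python
import numpy as np

def pca_then_lda(X, y, n_pca):
    """Project X onto n_pca principal components, then onto Fisher discriminants.

    Returns the training mean and a (d, n_classes - 1) projection matrix.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # PCA via SVD of the centered data; rows of Vt are principal directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_pca].T
    Z = Xc @ P

    classes = np.unique(y)
    Sw = np.zeros((n_pca, n_pca))   # within-class scatter
    Sb = np.zeros((n_pca, n_pca))   # between-class scatter
    for c in classes:
        Zc = Z[y == c]
        mc = Zc.mean(axis=0)
        Sw += (Zc - mc).T @ (Zc - mc)
        Sb += len(Zc) * np.outer(mc, mc)   # overall mean of Z is zero
    # Fisher directions: leading eigenvectors of Sw^{-1} Sb.
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(evals.real)[::-1][: len(classes) - 1]
    return mean, P @ evecs[:, order].real

# 3 classes, 10 samples each, 50 dimensions: fewer samples than dimensions.
rng = np.random.default_rng(0)
d, per_class = 50, 10
centers = 3.0 * rng.standard_normal((3, d))
y_train = np.repeat(np.arange(3), per_class)
X_train = centers[y_train] + rng.standard_normal((3 * per_class, d))
mean, W = pca_then_lda(X_train, y_train, n_pca=10)

# Nearest-class-mean classification of fresh samples in the discriminant space.
proj_means = np.array(
    [((X_train[y_train == c] - mean) @ W).mean(axis=0) for c in range(3)]
)
y_test = np.repeat(np.arange(3), 20)
X_test = centers[y_test] + rng.standard_normal((60, d))
Z_test = (X_test - mean) @ W
dists = ((Z_test[:, None, :] - proj_means[None]) ** 2).sum(axis=2)
accuracy = (np.argmin(dists, axis=1) == y_test).mean()
```

The synthetic classes here are well separated and free of rotation, translation, or illumination changes, so the sketch says nothing about the robustness issues the abstract raises.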

The analysis of variational characteristics on water quality and phytoplankton by principal component analysis(PCA) in Kogum-sudo, Southwestern part of Korea (주성분분석에 의한 거금수도의 수질환경 및 식물플랑크톤 변동 요인 해석)

  • 윤양호;박종식
    • Journal of Environmental Science International
    • /
    • v.9 no.1
    • /
    • pp.1-11
    • /
    • 2000
  • Principal component analysis (PCA) was used to study the variational characteristics of water quality and phytoplankton biomass in Kogum-sudo from February to October 1993. PCA was applied to biological factors, such as chlorophyll a and phytoplankton cell numbers for centric and pennate diatoms, phytoflagellates, and total phytoplankton, as well as physico-chemical factors, such as water temperature, salinity, transparency, dissolved oxygen (DO), DO saturation, apparent oxygen utilization (AOU), chemical oxygen demand (COD), nutrients (ammonia, nitrite, nitrate, phosphate, and silicate), N/P ratio, and suspended solids (SS). The nutrient supply depended on the mineralization of organic matter and the inflow of seawater from outside rather than on freshwater runoff. Phytoplankton biomass changed over short intervals in response to nutrient changes, and it was controlled by a combination of several environmental factors, especially light intensity, ammonia, and phosphate. The marine environmental characteristics were determined by the mineralization of organic matter in winter, by nutrient-rich freshwater runoff in spring, by ammonia uptake and high phytoplankton productivity in summer, and by phosphate supplied by seawater from outside Kogum-sudo in autumn. The PCA score distributions divided Kogum-sudo into two regions: the middle part of the strait, characterized by seawater mixing and accumulated organic matter, and Pungnam Bay together with the waters around Kogum Island, characterized by high phytoplankton biomass and productivity year-round.


The Impact of the PCA Dimensionality Reduction for CNN based Hyperspectral Image Classification (CNN 기반 초분광 영상 분류를 위한 PCA 차원축소의 영향 분석)

  • Kwak, Taehong;Song, Ahram;Kim, Yongil
    • Korean Journal of Remote Sensing
    • /
    • v.35 no.6_1
    • /
    • pp.959-971
    • /
    • 2019
  • CNN (Convolutional Neural Network) is a representative deep learning algorithm that can extract high-level spatial and spectral features, and it has been applied to hyperspectral image classification. However, one significant drawback of applying CNNs to hyperspectral images is the high dimensionality of the data, which increases the training time and processing complexity. To address this problem, several CNN-based hyperspectral image classification studies have used PCA (Principal Component Analysis) for dimensionality reduction. One limitation is that spectral information of the original image can be lost through PCA. Although it is clear that the use of PCA affects the accuracy and the CNN training time, the impact of PCA on CNN-based hyperspectral image classification has been understudied. The purpose of this study is to analyze the quantitative effect of PCA in CNNs for hyperspectral image classification. The hyperspectral images were first transformed through PCA and then fed into the CNN model while varying the size of the reduced dimensionality. In addition, 2D-CNN and 3D-CNN frameworks were applied to analyze the sensitivity of the PCA with respect to the convolution kernel in the model. Experimental results were evaluated based on classification accuracy, learning time, variance ratio, and training process. The reduced dimensionality was most efficient when the explained variance ratio was 99.7% to 99.8%. Compared with the 2D-CNN, the 3D kernel achieved higher classification accuracy with the original-CNN than with the PCA-CNN, indicating that dimensionality reduction was relatively less effective with 3D kernels.
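
Choosing the reduced dimensionality from a target explained variance ratio can be sketched as follows. The function name, the synthetic 100-band data, and the 0.997 target (taken from the lower end of the range reported above) are assumptions for illustration. The CNN stage is not shown.

```python
import numpy as np

def n_components_for_variance(X, target=0.997):
    """Smallest number of principal components whose cumulative explained
    variance ratio reaches `target`. X has shape (n_pixels, n_bands)."""
    Xc = X - X.mean(axis=0)
    # Squared singular values are proportional to the component variances.
    s = np.linalg.svd(Xc, compute_uv=False)
    cumulative = np.cumsum(s**2 / np.sum(s**2))
    # First index where the cumulative ratio is >= target, as a count.
    k = int(np.searchsorted(cumulative, target)) + 1
    return min(k, len(cumulative)), cumulative

# Synthetic "hyperspectral" pixels: 5 latent factors mixed into 100 bands.
rng = np.random.default_rng(0)
latent = rng.standard_normal((2000, 5)) * np.array([10.0, 6.0, 3.0, 1.5, 0.8])
mixing = rng.standard_normal((5, 100))
cube = latent @ mixing + 0.05 * rng.standard_normal((2000, 100))

k, cumulative = n_components_for_variance(cube, target=0.997)
```

In a real pipeline the image cube of shape (rows, cols, bands) would first be reshaped to (rows * cols, bands) before this step.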

Multivariate Time Series Simulation With Component Analysis (독립성분분석을 이용한 다변량 시계열 모의)

  • Lee, Tae-Sam;Salas, Jose D.;Karvanen, Juha;Noh, Jae-Kyoung
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2008.05a
    • /
    • pp.694-698
    • /
    • 2008
  • In hydrology, it is difficult to deal with multivariate time series, such as modeling the streamflows of an entire complex river system. Normal-distribution-based models such as MARMA (Multivariate Autoregressive Moving Average) have been a major approach for modeling multivariate time series. Such models have some limitations. One is the unfavorable data transformation needed to force the data to follow the normal distribution. Furthermore, a high-dimensional multivariate model requires a very large parameter matrix. An alternative is to decompose the multivariate data into independent components and model them individually. In 1985, Lins used Principal Component Analysis (PCA). Five scores, the data decomposed from the original data, were taken and modeled individually: one of the five scores was modeled with an AR-2 model, while the others were modeled with AR-1 models. From the time series analysis using the scores of the five components, he noted that "principal component time series might provide a relatively simple and meaningful alternative to conventional large MARMA models". Inspired by this observation, this study develops a multivariate simulation model using Principal Component Analysis (PCA) and Independent Component Analysis (ICA). The simulation has three modeling steps. (1) PCA decomposes the correlated multivariate data into uncorrelated data, while ICA decomposes the data into independent components. Here, the autocorrelation structure of the decomposed data, inherited from the data in the original domain, is still dominant. (2) Each component is resampled by block bootstrapping or K-nearest neighbor resampling. (3) The resampled components are brought back to the original domain. With the suggested approach, one might expect that a) the simulated data differ from the historical data, b) no data transformation is required (in the case of ICA), and c) a complex system can be decomposed into independent components and modeled individually. The models with PCA and ICA are compared using various statistics, such as basic statistics (mean, standard deviation, skewness, autocorrelation), reservoir-related statistics, and kernel density estimates.
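
The three steps can be sketched for the PCA variant with a block bootstrap. This is a minimal illustration under stated assumptions: the function name, the block length of 12, and the synthetic three-site series are hypothetical, and the ICA decomposition and K-nearest neighbor resampling are not shown.

```python
import numpy as np

def simulate_pca_block_bootstrap(X, n_out, block_len=12, seed=0):
    """Simulate a multivariate series of length n_out from X (n_times, n_sites)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mean = X.mean(axis=0)
    Xc = X - mean
    # Step 1: decorrelate with PCA (eigenvectors of the covariance matrix).
    _, V = np.linalg.eigh(np.cov(Xc, rowvar=False))
    scores = Xc @ V
    # Step 2: resample each component independently in contiguous blocks,
    # which keeps the autocorrelation within each block.
    n_blocks = -(-n_out // block_len)   # ceiling division
    sim_scores = np.empty((n_blocks * block_len, d))
    for j in range(d):
        starts = rng.integers(0, n - block_len + 1, size=n_blocks)
        sim_scores[:, j] = np.concatenate(
            [scores[s:s + block_len, j] for s in starts]
        )
    # Step 3: map the resampled components back to the original domain.
    return sim_scores[:n_out] @ V.T + mean

# Synthetic data: three sites sharing one AR(1) signal plus local noise.
rng = np.random.default_rng(0)
n = 600
z = np.zeros(n)
for t in range(1, n):
    z[t] = 0.5 * z[t - 1] + rng.standard_normal()
X = z[:, None] + 0.5 * rng.standard_normal((n, 3))

sim = simulate_pca_block_bootstrap(X, n_out=1000)
sim_corr = np.corrcoef(sim, rowvar=False)
```

Because the components are uncorrelated, resampling them independently and rotating back approximately restores the cross-site correlation of the historical data, while the simulated sequence itself differs from the record.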


Assessment of CO2 Emissions of Vehicles in Highway Sections Using Principal Component Analysis (주성분분석을 이용한 간선도로 구간 별 차량 당 CO2 다량 배출구간 평가)

  • Lee, Yoon Seok;Kim, Da Ye;Oh, Heung Un
    • KSCE Journal of Civil and Environmental Engineering Research
    • /
    • v.33 no.5
    • /
    • pp.1981-1987
    • /
    • 2013
  • $CO_2$ emissions of vehicles vary with vehicle speed, which in turn varies with road type, location, time, and traffic volume. In this paper, the highway sections in which vehicles emit a large quantity of $CO_2$ per vehicle are identified and analyzed with principal component analysis (PCA). The analysis yielded two principal components, and time of day was identified as the main factor explaining the role of each component. The first principal component explained $CO_2$ emissions per vehicle in the early morning and afternoon hours, and the second explained $CO_2$ emissions per vehicle in the morning and afternoon peak hours. Therefore, the sections with large $CO_2$ emissions per vehicle could be determined from the PCA scores.

Probabilistic penalized principal component analysis

  • Park, Chongsun;Wang, Morgan C.;Mo, Eun Bi
    • Communications for Statistical Applications and Methods
    • /
    • v.24 no.2
    • /
    • pp.143-154
    • /
    • 2017
  • A variable selection method based on probabilistic principal component analysis (PCA) using a penalized likelihood method is proposed. The proposed method is a two-step variable reduction method. The first step uses the probabilistic principal component idea to identify principal components. The penalty function is then used to identify important variables in each component. We then build a model on the original data space instead of on the rotated data space through latent variables (principal components), because the proposed method achieves dimension reduction by identifying important observed variables. Consequently, the proposed method is of more practical use. The proposed estimators perform as the oracle procedure and are root-n consistent with a proper choice of regularization parameters. The proposed method can be successfully applied to high-dimensional PCA problems in which a relatively large portion of irrelevant variables is included in the data set. It is straightforward to extend our likelihood method to handle problems with missing observations using EM algorithms. Further, it could be effectively applied in cases where some data vectors exhibit one or more missing values at random.