• Title/Summary/Keyword: 주성분 분석 (principal component analysis)

Search Result 1,998, Processing Time 0.03 seconds

On Robust Principal Component Analysis using Neural Networks (신경망을 이용한 로버스트 주성분 분석에 관한 연구)

  • Kim, Sang-Min;Oh, Kwang-Sik;Park, Hee-Joo
    • Journal of the Korean Data and Information Science Society / v.7 no.1 / pp.113-118 / 1996
  • Principal component analysis (PCA) is an essential technique for data compression and feature extraction, and has been widely used in statistical data analysis, communication theory, pattern recognition, and image processing. Oja (1992) found that a linear neuron with a constrained Hebbian learning rule can extract the principal component by stochastic gradient ascent. In practice, real data often contain outliers, and these outliers significantly degrade the performance of PCA algorithms. To make PCA robust, Xu & Yuille (1995) applied statistical physics to the problem of robust principal component analysis (RPCA). Devlin et al. (1981) obtained principal components by using techniques such as M-estimation. The purpose of this paper is to investigate, from a statistical point of view, how the RPCA of Xu & Yuille (1995) works under the same simulation conditions as in Devlin et al. (1981).

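The abstract above refers to a linear neuron with a constrained Hebbian rule extracting the first principal component. A minimal sketch of that idea, commonly known as Oja's rule, is given below. It is not the paper's code: the synthetic data, the learning rate, the number of epochs, and the function name `oja_first_component` are all illustrative assumptions, and no outlier handling (the RPCA part) is attempted.

```python
import math
import random

def oja_first_component(data, lr=0.001, epochs=20, seed=0):
    """Estimate the first principal direction of centered data with Oja's rule."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Random unit-length starting weights (an arbitrary choice).
    w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    for _ in range(epochs):
        for x in data:
            y = sum(wi * xi for wi, xi in zip(w, x))  # neuron output y = w . x
            # Hebbian term y*x plus the decay term -y^2*w that keeps ||w|| near 1.
            w = [wi + lr * y * (xi - y * wi) for wi, xi in zip(w, x)]
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w]

# Hypothetical synthetic data: most variance lies along (1, 1)/sqrt(2).
rng = random.Random(1)
points = []
for _ in range(500):
    t = rng.gauss(0.0, 3.0)  # large spread along (1, 1)
    e = rng.gauss(0.0, 0.3)  # small spread along (1, -1)
    points.append([(t + e) / math.sqrt(2), (t - e) / math.sqrt(2)])

# Center the data before applying the rule.
mean = [sum(p[j] for p in points) / len(points) for j in range(2)]
centered = [[p[j] - mean[j] for j in range(2)] for p in points]

w = oja_first_component(centered)
print(w)  # both entries should be close to +/- 0.707
```

The sign of the recovered direction is arbitrary, so only the direction up to sign is meaningful.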

A study on principal component analysis using penalty method (페널티 방법을 이용한 주성분분석 연구)

  • Park, Cheolyong
    • Journal of the Korean Data and Information Science Society / v.28 no.4 / pp.721-731 / 2017
  • In this study, principal component analysis methods using the Lasso penalty are introduced. There are two popular methods that apply the Lasso penalty to principal component analysis. The first method finds an optimal vector of linear combination as the coefficient vector obtained by regressing each principal component on the original data matrix with a Lasso penalty (an elastic net penalty in general). The second method finds an optimal vector of linear combination by minimizing, with a Lasso penalty, the residual matrix obtained from approximating the original matrix by the singular value decomposition. We review both methods in detail and show that they are especially advantageous for data sets that have more variables than cases. The methods are also compared in an application to a real data set using R: the crime data in Ahamad (1967), which has more variables than cases.
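As a rough illustration of the second approach, the sketch below fits a rank-one approximation u v^T to a centered matrix X by alternately minimizing ||X - u v^T||_F^2 + 2*lam*||v||_1 subject to ||u|| = 1. The u-step normalizes X v, and the v-step soft-thresholds X^T u at lam. This is a simplified sketch under stated assumptions, not the procedure or the R code used in the paper: the toy matrix, the penalty value `lam`, and the function names are hypothetical.

```python
import math

def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def sparse_rank_one_loadings(X, lam, iters=50):
    """Alternate between the unit vector u and the Lasso-penalized loading vector v."""
    n, p = len(X), len(X[0])
    v = [1.0] * p  # simple starting point (an assumption)
    for _ in range(iters):
        # u-step: u = X v / ||X v||
        u = [sum(X[i][j] * v[j] for j in range(p)) for i in range(n)]
        norm_u = math.sqrt(sum(ui * ui for ui in u))
        if norm_u == 0.0:
            break  # every loading was shrunk to zero
        u = [ui / norm_u for ui in u]
        # v-step: soft-threshold each entry of X^T u
        v = [soft_threshold(sum(X[i][j] * u[i] for i in range(n)), lam)
             for j in range(p)]
    norm_v = math.sqrt(sum(vj * vj for vj in v))
    return [vj / norm_v for vj in v] if norm_v > 0.0 else v

# Hypothetical centered data: 4 cases, 6 variables (more variables than cases).
# Only the first two variables carry a strong common signal.
X = [
    [3.0, 2.9, 0.1, -0.1, 0.05, 0.1],
    [-3.0, -3.1, 0.1, 0.1, -0.05, -0.1],
    [1.0, 1.1, -0.1, 0.1, 0.05, -0.1],
    [-1.0, -0.9, -0.1, -0.1, -0.05, 0.1],
]

loadings = sparse_rank_one_loadings(X, lam=0.5)
print(loadings)  # loadings of the four weak variables are exactly 0.0
```

Larger values of `lam` set more loadings to zero; with `lam=0` the iteration reduces to an ordinary power method for the leading singular vector.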

A dimensional reduction method in cluster analysis for multidimensional data: principal component analysis and factor analysis comparison (다차원 데이터의 군집분석을 위한 차원축소 방법: 주성분분석 및 요인분석 비교)

  • Hong, Jun-Ho;Oh, Min-Ji;Cho, Yong-Been;Lee, Kyung-Hee;Cho, Wan-Sup
    • The Journal of Bigdata / v.5 no.2 / pp.135-143 / 2020
  • This paper proposes a pre-processing method and a dimension reduction method for shopping cart analysis, where many variables are correlated, in order to segment consumer types in agri-food consumer panel data. Cluster analysis is a widely used method for dividing observations in multivariate data into several clusters. However, when several variables are related, cluster analysis after dimension reduction may be more effective. In this paper, food consumption survey data from 1,987 households were clustered using the K-means method, with 17 variables re-selected for the clustering. Principal component analysis and factor analysis were compared as remedies for multicollinearity and as ways to reduce dimensions for clustering. Both principal component analysis and factor analysis reduced the dataset to two dimensions. Although principal component analysis divided the dataset into three clusters, the differences among the clusters' characteristics were not clearly apparent. Under factor analysis, however, the clusters' consumption patterns were well distinguished.

Modified Kernel PCA Applied To Classification Problem (수정된 커널 주성분 분석 기법의 분류 문제에의 적용)

  • Kim, Byung-Joo;Sim, Joo-Yong;Hwang, Chang-Ha;Kim, Il-Kon
    • The KIPS Transactions:PartB / v.10B no.3 / pp.243-248 / 2003
  • An incremental kernel principal component analysis (IKPCA) is proposed for nonlinear feature extraction from data. One problem with batch kernel principal component analysis (KPCA) is that the computation becomes prohibitive when the data set is large. Another is that the whole eigenspace must be recomputed in order to update the eigenvectors with new data. IKPCA overcomes these problems by incrementally computing the eigenspace model and the empirical kernel map. IKPCA requires less memory than batch KPCA and can easily be improved by re-learning the data. Our experiments show that IKPCA is comparable in performance to batch KPCA for feature extraction and classification on nonlinear data sets.

Classification and Selection of the Breeding materials in the Silkworm, Bombyx mori, by Multivariate Analysis 1. Classification of the Silkworm Genetic Stocks by Principal Component Analysis and Cluster Analysis (다변량 해석법에 의한 누에 육종소재의 탐색 1. 주성분분석과 집락분석을 이용한 누에품종분류)

  • 정도섭;이인정
    • Journal of Sericultural and Entomological Science / v.31 no.2 / pp.102-112 / 1989
  • Principal component analysis and cluster analysis were performed on nine quantitative characters of one hundred and forty-eight silkworm genetic stocks. The six major quantitative characters (cocoon yield, cocoon weight, cocoon shell weight, cocoon shell percentage, larval period of the 5th instar silkworm, and total larval period) showed significant positive correlations with one another. The first three principal components extracted from the initial nine variables accounted for about eighty percent of the original information. The first and second principal components were characterized as factors related to silk productivity and cocoon productivity, respectively. Using the city block distance computed from the first three principal components as a measure of phenotypic diversity, the one hundred and forty-eight silkworm genetic stocks could be clustered into seven varietal groups, and the phenotypic diversity between the varietal groups was partly related to their geographical origins. Among the seven varietal groups, groups II and IV showed higher silk and cocoon productivity.

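The city block (Manhattan) distance mentioned in the abstract above is the sum of absolute differences between coordinates, here the scores on the first three principal components. The short sketch below shows the calculation only; the stock names and score values are hypothetical and are not taken from the paper.

```python
def city_block(a, b):
    """City block (Manhattan) distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Hypothetical scores on the first three principal components.
scores = {
    "stock_A": [1.0, -2.0, 0.5],
    "stock_B": [0.0, 1.0, 0.5],
    "stock_C": [1.5, -2.0, 0.0],
}

# Pairwise distances between all stocks, the usual input to a clustering step.
names = sorted(scores)
distances = {
    (p, q): city_block(scores[p], scores[q])
    for i, p in enumerate(names)
    for q in names[i + 1:]
}

closest = min(distances, key=distances.get)
print(distances)
print(closest)
```

A hierarchical clustering procedure would start by merging the closest pair and then repeat on the updated distances.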

Assessment and Classification of Korean Local Corn Lines by the Application of Principal Component Analysis (I) (Principal Component Analysis Method에 의(依)한 한국재래종(韓國在來種) 옥수수의 해석(解析) 및 계통분류(系統分類)(I))

  • Lee, In Sup;Choe, Bong Ho
    • Korean Journal of Agricultural Science / v.8 no.2 / pp.139-151 / 1981
  • To obtain breeding materials, 57 collected Korean local corn lines were assessed and classified by principal component analysis. The results were as follows. 1. In the principal component analysis of 27 characters, the first four and the first ten principal components accounted for 67.1% and 88.6% of the total variation, respectively. 2. Judging from the values of the characters and principal components, the contributions of the characters to the principal components varied widely. 3. The biological meaning of each principal component, and the plant type corresponding to it, were explained clearly by the correlation coefficients between principal components and characters. 4. The 57 lines were classified into 4 lineal groups by taxonomic distance.


A study on the design of fault diagnostic system based on PCA (PCA-기반 고장 진단 시스템 설계에 관한 연구)

  • Kim, Sung-Ho;Lee, Young-Sam;Han, Yoon-Jong
    • Journal of the Korean Institute of Intelligent Systems / v.13 no.5 / pp.600-605 / 2003
  • PCA (Principal Component Analysis) has emerged as a useful tool for process monitoring and fault diagnosis. The general approach requires the user to identify the root cause by interpreting the residuals or principal components, which can be tedious and is often impossible for a large process. In this paper, a PCA scheme is combined with an FCM-based fault diagnostic algorithm to improve the diagnostic results. The FCM-based fault diagnostic system using PCA is implemented, and its application is illustrated on a two-tank system.

IoT Attack Detection Using PCA and Machine Learning (주성분 분석과 기계학습을 이용한 사물인터넷 공격 탐지)

  • Lee, Ji-Gu;Lee, Soo-Jin
    • Proceedings of the Korean Society of Computer Information Conference / 2022.07a / pp.245-246 / 2022
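  • Research on attack detection models that use machine learning in IoT environments has been active recently, and detection accuracy has been steadily improving. However, detection performance degrades because of characteristics of IoT environments such as low-specification hardware, high-dimensional features, and massive traffic volumes. In this paper, we therefore apply principal component analysis (PCA) and LightGBM to a dataset collected in an IoT environment based on the MQTT (Message Queuing Telemetry Transport) protocol, reducing the dimensionality of the dataset and classifying the attack classes. In the experiments, even when the original dataset was reduced to three principal components (about 9% of the original dimensions), performance was nearly identical to that obtained using all 33 features. Classification performance was also better than that of a detection model from prior work based on feature selection.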


Application of the supplementary principal component analysis for the 1982-1992 Korean Pro Baseball data (89-92 한국 프로야구의 각 팀과 부문별 평균 성적에 대한 추가적 주성분분석의 응용)

  • 최용석;심희정
    • The Korean Journal of Applied Statistics / v.8 no.1 / pp.51-60 / 1995
  • Given an $n \times p$ data matrix, if we add $p_s$ variables of a somewhat different nature from the original $p$ variables, we obtain a new $n \times (p+p_s)$ data matrix. Because of these $p_s$ variables, traditional principal component analysis cannot give efficient results. To address this problem, we review supplementary principal component analysis, which treats the $p_s$ variables as supplementary variables. This technique is based on the algebraic and geometric aspects of traditional principal component analysis. We then analyze the records of eight teams in fourteen categories from the 1982-1992 Korean Pro Baseball data using both supplementary principal component analysis and traditional principal component analysis, and compare the results.
