• Title/Summary/Keyword: Partial least-squares regression (PLS)

Combining Ridge Regression and Latent Variable Regression

  • Kim, Jong-Duk
    • Journal of the Korean Data and Information Science Society, v.18 no.1, pp.51-61, 2007
  • Ridge regression (RR), principal component regression (PCR), and partial least squares regression (PLS) are among the popular regression methods for collinear data. RR adds a small quantity, called the ridge constant, to the diagonal of X'X to stabilize the matrix inversion and the regression coefficients, whereas PCR and PLS use latent variables derived from the original variables to circumvent the collinearity problem. One problem with PCR and PLS is that they are very sensitive to overfitting. A new regression method is presented that combines RR with PCR and with PLS, respectively, in a unified manner. It is intended to provide better predictive ability and improved stability for regression models. A real-world data set from NIR spectroscopy is used to investigate the performance of the newly developed regression method.
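
The two ideas contrasted in this abstract can be illustrated with a minimal NumPy sketch. It is not the combined method of the paper, whose details the abstract does not give; it only shows plain RR (a ridge constant `k` added to the diagonal of X'X) and plain PCR (regression on the first `m` principal-component scores). The simulated data, the function names `ridge_coef` and `pcr_coef`, and the values of `k` and `m` are assumptions made for illustration.

```python
import numpy as np

# Hypothetical, nearly collinear data: every column is one latent signal plus small noise.
rng = np.random.default_rng(0)
n, p = 40, 5
z = rng.normal(size=(n, 1))
X = z + 0.01 * rng.normal(size=(n, p))
y = X.sum(axis=1) + 0.1 * rng.normal(size=n)

# Center so that no intercept term is needed.
Xc = X - X.mean(axis=0)
yc = y - y.mean()


def ridge_coef(Xc, yc, k):
    """Ridge regression: add the ridge constant k to the diagonal of X'X."""
    p = Xc.shape[1]
    return np.linalg.solve(Xc.T @ Xc + k * np.eye(p), Xc.T @ yc)


def pcr_coef(Xc, yc, m):
    """PCR: regress y on the first m principal-component scores of X."""
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:m].T                            # loadings of the first m components
    scores = Xc @ V                         # latent variables
    gamma = (scores.T @ yc) / s[:m] ** 2    # scores are orthogonal, so this is OLS on them
    return V @ gamma                        # map back to the original variables


b_ols = np.linalg.lstsq(Xc, yc, rcond=None)[0]
b_ridge = ridge_coef(Xc, yc, k=1.0)
b_pcr = pcr_coef(Xc, yc, m=1)

print("||b_ols||   =", np.linalg.norm(b_ols))
print("||b_ridge|| =", np.linalg.norm(b_ridge))
print("||b_pcr||   =", np.linalg.norm(b_pcr))
```

Both biased estimators return coefficient vectors whose norm is no larger than that of OLS, which is the stabilizing effect the abstract refers to.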

Expressions for Shrinkage Factors of PLS Estimator

  • Kim, Jong-Duk
    • Journal of the Korean Data and Information Science Society, v.17 no.4, pp.1169-1180, 2006
  • Partial least squares regression (PLS) is a biased, non-least-squares regression method and an alternative to ordinary least squares regression (OLS) when predictors are highly collinear or outnumber observations. One way to understand the properties of biased regression methods is to know how the estimators shrink the OLS estimator. In this paper, we introduce an expression for the shrinkage factor of PLS, develop a new shrinkage expression, and then prove the equivalence of the two representations. We use two near-infrared (NIR) data sets to show the general behavior of the shrinkage and, in particular, the eigendirections along which PLS expands the OLS coefficients.
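
To make the notion of a shrinkage factor concrete, the sketch below computes factors for single-response PLS on simulated data. It assumes a representation commonly given in the PLS literature, f_i = 1 - prod_j (1 - lambda_i / theta_j), where lambda_i are the eigenvalues of X'X and theta_j are the Ritz values of X'X on the Krylov space spanned by X'y, (X'X)X'y, and so on. Whether this matches either of the two expressions in the paper is not stated in the abstract, so treat it as an assumption. The data and all variable names are hypothetical.

```python
import numpy as np

# Hypothetical data: 30 observations, 4 predictors, m = 2 PLS components.
rng = np.random.default_rng(1)
n, p, m = 30, 4, 2
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.normal(size=n)
X = X - X.mean(axis=0)
y = y - y.mean()

A = X.T @ X          # X'X
b = X.T @ y          # X'y

# PLS1 coefficients as the least-squares solution restricted to the Krylov space
# span{b, A b, ..., A^(m-1) b}, using an orthonormal basis Q of that space.
K = np.column_stack([np.linalg.matrix_power(A, j) @ b for j in range(m)])
Q, _ = np.linalg.qr(K)
T = Q.T @ A @ Q
T = (T + T.T) / 2.0  # remove round-off asymmetry before the symmetric eigen-solver
beta_pls = Q @ np.linalg.solve(T, Q.T @ b)

# Eigen-decomposition of X'X and the OLS coefficients.
lam, U = np.linalg.eigh(A)
beta_ols = np.linalg.solve(A, b)

# Ritz values and the assumed shrinkage-factor formula.
ritz = np.linalg.eigvalsh(T)
f = 1.0 - np.prod(1.0 - lam[:, None] / ritz[None, :], axis=1)

# Applying the factors to the OLS coefficients along each eigendirection
# should reproduce the PLS coefficients.
beta_from_f = U @ (f * (U.T @ beta_ols))

for l, fi in zip(lam, f):
    print(f"eigenvalue {l:9.3f}   shrinkage factor {fi: .4f}")
```

A factor above 1 marks an eigendirection in which PLS expands rather than shrinks the OLS coefficient, the behavior the abstract singles out.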

A Method for Screening Product Design Variables for Building A Usability Model: Genetic Algorithm Approach

  • Yang, Hui-Cheol;Han, Seong-Ho
    • Journal of the Ergonomics Society of Korea, v.20 no.1, pp.45-62, 2001
  • This study suggests a genetic algorithm-based partial least squares (GA-based PLS) method to select the design variables for building a usability model. The GA-based PLS uses a genetic algorithm to minimize the root-mean-squared error of a partial least squares regression model. A multiple linear regression method is applied to build a usability model that contains the variables selected by the GA-based PLS. The performance of the usability model turned out to be generally better than that of previous usability models built with other variable selection methods such as expert rating, principal component analysis, cluster analysis, and partial least squares. Furthermore, the model performance was drastically improved by adding the category-type variables selected by the GA-based PLS to the usability model. It is recommended that the GA-based PLS be applied to variable selection when developing a usability model.

Unified Non-iterative Algorithm for Principal Component Regression, Partial Least Squares and Ordinary Least Squares

  • Kim, Jong-Duk
    • Journal of the Korean Data and Information Science Society, v.14 no.2, pp.355-366, 2003
  • A unified procedure for principal component regression (PCR), partial least squares (PLS) and ordinary least squares (OLS) is proposed. The process gives solutions for PCR, PLS and OLS in a unified and non-iterative way. This enables us to see the interrelationships among the three regression coefficient vectors, and it is seen that the so-called E-matrix in the solution expression plays the key role in differentiating the methods. In addition to setting out the procedure, the paper also supplies a robust numerical algorithm for its implementation, which is used to show how the procedure performs on a real world data set.

Shrinkage Structure of Ridge Partial Least Squares Regression

  • Kim, Jong-Duk
    • Journal of the Korean Data and Information Science Society, v.18 no.2, pp.327-344, 2007
  • Ridge partial least squares regression (RPLS) is a regression method obtained by combining ridge regression and partial least squares regression. It is intended to provide better predictive ability and to be less sensitive to overfitting. In this paper, explicit expressions for the shrinkage factor of RPLS are developed. The structure of the shrinkage factor is explored and compared, using a near-infrared data set, with those of other biased regression methods such as ridge regression, principal component regression, ridge principal component regression, and partial least squares regression.
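
For reference, the simpler shrinkage structures that the abstract uses as comparison points can be sketched as follows. The ridge and PCR factors are standard results. The ridge principal component regression (RPCR) factor shown here assumes RPCR means ridge regression restricted to the first `m` components, which the abstract does not confirm. The RPLS expressions developed in the paper are not reproduced. The eigenvalues and the values of `k` and `m` are hypothetical.

```python
import numpy as np


def rr_factors(lam, k):
    """Ridge regression: each eigendirection is shrunk by lam / (lam + k)."""
    return lam / (lam + k)


def pcr_factors(lam, m):
    """PCR: keep the m leading eigendirections (lam sorted descending), drop the rest."""
    f = np.zeros_like(lam)
    f[:m] = 1.0
    return f


def rpcr_factors(lam, k, m):
    """Assumed RPCR: ridge shrinkage applied to the m retained directions only."""
    return rr_factors(lam, k) * pcr_factors(lam, m)


# Hypothetical eigenvalues of X'X, sorted in descending order.
lam = np.array([50.0, 5.0, 0.5, 0.05])
k, m = 0.5, 2

table = np.column_stack(
    [lam, rr_factors(lam, k), pcr_factors(lam, m), rpcr_factors(lam, k, m)]
)
print("eigenvalue      RR     PCR    RPCR")
for row in table:
    print(f"{row[0]:10.2f}  {row[1]:6.3f}  {row[2]:6.3f}  {row[3]:6.3f}")
```

Ridge shrinks every direction smoothly, whereas PCR truncates abruptly. Neither can produce a factor above 1, in contrast to PLS.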

Missing Values Estimation for Time Course Gene Expression Data Using the Sequential Partial Least Squares Regression Fitting

  • Kim, Kyung-Sook;Oh, Mi-Ra;Baek, Jang-Sun;Son, Young-Sook
    • The Korean Journal of Applied Statistics, v.21 no.2, pp.275-290, 2008
  • Microarray gene expression data sets are very large and the observation process is complex, so missing values occur frequently. In this paper we propose the sequential partial least squares (SPLS) regression fitting method to estimate missing values in time course gene expression data, which are correlated across time points. The SPLS method combines the sequential technique with the partial least squares (PLS) regression fitting method. The usefulness of the proposed method is evaluated through a simulation study on three yeast time course data sets.

Simultaneous Kinetic Spectrophotometric Determination of Sulfite and Sulfide Using Partial Least Squares (PLS) Regression

  • Afkhami, Abbas;Sarlak, Nahid;Zarei, Ali Reza;Madrakian, Tayyebeh
    • Bulletin of the Korean Chemical Society, v.27 no.6, pp.863-868, 2006
  • A partial least squares (PLS-1) calibration model based on spectrophotometric measurement is described for the simultaneous determination of sulfite and sulfide. The method is based on the difference between the rates of reaction of sulfide and sulfite with Malachite Green in pH 7.0 buffer solution at 25 $^{\circ}$C. The absorption kinetic profiles of the solutions were monitored by measuring the decrease in the absorbance of Malachite Green at 617 nm over the time range 10-180 s after initiation of the reactions, at 2 s intervals. The experimental calibration matrix for the PLS-1 calibration was designed with 24 samples. Cross-validation was used to select the number of factors. The results showed that simultaneous determination could be performed in the ranges 0.030-1.5 and 0.030-1.2 $\mu$g mL$^{-1}$ for sulfite and sulfide, respectively. The proposed method was successfully applied to the simultaneous determination of sulfite and sulfide in water samples and whole human blood.
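
The step of choosing the number of PLS-1 factors by cross-validation can be sketched as below. This is a generic illustration, not the authors' procedure: it assumes leave-one-out cross-validation and the textbook NIPALS form of PLS-1, and it uses simulated data with 24 samples in place of the kinetic absorbance profiles. The function names and all simulated quantities are hypothetical.

```python
import numpy as np


def pls1_fit(X, y, n_comp):
    """Fit PLS-1 by NIPALS; returns (coefficients, column means of X, mean of y)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = E.T @ f
        w /= np.linalg.norm(w)        # weight vector
        t = E @ w                     # score vector
        tt = t @ t
        p_load = E.T @ t / tt         # X loading
        q_k = f @ t / tt              # y loading
        E = E - np.outer(t, p_load)   # deflate X
        f = f - q_k * t               # deflate y
        W.append(w)
        P.append(p_load)
        q.append(q_k)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)
    return beta, x_mean, y_mean


def pls1_predict(model, X):
    beta, x_mean, y_mean = model
    return (X - x_mean) @ beta + y_mean


def loo_press(X, y, n_comp):
    """Leave-one-out prediction error sum of squares (PRESS) for a given factor count."""
    n = X.shape[0]
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        model = pls1_fit(X[keep], y[keep], n_comp)
        press += (y[i] - pls1_predict(model, X[i:i + 1])[0]) ** 2
    return press


# Hypothetical calibration set: 24 samples, 6 channels, 2 underlying factors.
rng = np.random.default_rng(0)
n, p = 24, 6
scores = rng.normal(size=(n, 2))
X = scores @ rng.normal(size=(2, p)) + 0.05 * rng.normal(size=(n, p))
y = scores @ np.array([1.0, -0.5]) + 0.05 * rng.normal(size=n)

press = {k: loo_press(X, y, k) for k in range(1, p + 1)}
best_k = min(press, key=press.get)
print("PRESS by number of factors:", {k: round(v, 4) for k, v in press.items()})
print("selected number of factors:", best_k)
```

Taking the minimum of PRESS is the simplest selection rule. In practice a more parsimonious rule, such as the smallest factor count whose PRESS is not significantly worse than the minimum, is often preferred.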

Investigation of Partial Least Squares (PLS) Calibration Performance based on Different Resolutions of Near Infrared Spectra

  • Chung, Hoe-Il;Choi, Seung-Yeol;Choo, Jae-Bum;Lee, Young-Il
    • Bulletin of the Korean Chemical Society, v.25 no.5, pp.647-651, 2004
  • Partial least squares (PLS) calibration performance has been systematically investigated by changing the spectral resolution of near-infrared (NIR) spectra. For this purpose, synthetic samples simulating naphtha were prepared to examine the calibration performance in a complex chemical matrix. These samples were composed of C$_6$-C$_9$ normal paraffin, iso-paraffin, naphthene, and aromatic hydrocarbons. NIR spectra were collected at four different resolutions (4, 8, 16, and 32 cm$^{-1}$), and PLS regression was then performed. For PLS calibration, five different group compositions (such as total paraffin content) and six different pure components (such as benzene concentration) were selected. The overall results showed that a resolution of at least 8 cm$^{-1}$ was required to resolve a complex chemical matrix such as naphtha. The influence of resolution on the PLS calibration was found to vary with the spectral features of each component.

AI Technology Analysis using Partial Least Square Regression

  • Choi, JunHyeog;Jun, Sunghae
    • Journal of the Korea Society of Computer and Information, v.25 no.3, pp.109-115, 2020
  • In this paper, we propose an artificial intelligence (AI) technology analysis using a partial least squares (PLS) regression model. AI technology now affects most areas of our society, so it is necessary to understand this technology. To analyze AI technology, we collect AI-related patent documents from patent databases around the world and extract AI technology keywords from them using text mining techniques. We then analyze the AI keyword data with a PLS regression model. This regression model is based on the partial least squares technique used in advanced analyses in fields such as bioinformatics, social science, and engineering. To show the performance of the proposed method, we carry out experiments using AI patent documents and illustrate how our research can be applied to real problems. The approach is applicable not only to AI technology but also to other technological fields, and PLS regression analysis can likewise contribute to understanding various other technologies.

Pathway and Network Analysis in Glioma with the Partial Least Squares Method

  • Gu, Wen-Tao;Gu, Shi-Xin;Shou, Jia-Jun
    • Asian Pacific Journal of Cancer Prevention, v.15 no.7, pp.3145-3149, 2014
  • Gene expression profiling facilitates the understanding of the biological characteristics of gliomas. Previous studies mainly used regression or variance analysis without considering various background biological and environmental factors. The aim of this study was to investigate gene expression differences between grade III and IV gliomas through partial least squares (PLS) based analysis. The expression data set was obtained from the Gene Expression Omnibus database. PLS-based analysis was performed with the R statistical software. A total of 1,378 differentially expressed genes were identified. Survival analysis identified four pathways, including prion diseases, colorectal cancer, CAMs, and PI3K-Akt signaling, that may be related to the prognosis of the patients. Network analysis identified two hub genes, ELAVL1 and FN1, which have previously been reported to be related to glioma. Our results provide a new understanding of glioma pathogenesis and prognosis, in the hope of offering theoretical support for future therapeutic studies.