• Title/Summary/Keyword: Lasso regression

Search Result 107, Processing Time 0.021 seconds

MP-Lasso chart: a multi-level polar chart for visualizing group Lasso analysis of genomic data

  • Min Song;Minhyuk Lee;Taesung Park;Mira Park
    • Genomics & Informatics
    • /
    • v.20 no.4
    • /
    • pp.48.1-48.7
    • /
    • 2022
  • Penalized regression has been widely used in genome-wide association studies for joint analyses to find genetic associations. Among penalized regression models, the least absolute shrinkage and selection operator (Lasso) method effectively removes some coefficients from the model by shrinking them to zero. To handle group structures, such as genes and pathways, several modified Lasso penalties have been proposed, including group Lasso and sparse group Lasso. Group Lasso ensures sparsity at the level of pre-defined groups, eliminating unimportant groups. Sparse group Lasso performs group selection as in group Lasso, but also performs individual selection as in Lasso. While these sparse methods are useful in high-dimensional genetic studies, interpreting the results with many groups and coefficients is not straightforward. Lasso's results are often expressed as trace plots of regression coefficients. However, few studies have explored the systematic visualization of group information. In this study, we propose a multi-level polar Lasso (MP-Lasso) chart, which can effectively represent the results from group Lasso and sparse group Lasso analyses. An R package to draw MP-Lasso charts was developed. Through a real-world genetic data application, we demonstrated that our MP-Lasso chart package effectively visualizes the results of Lasso, group Lasso, and sparse group Lasso.

Comparison of Lasso Type Estimators for High-Dimensional Data

  • Kim, Jaehee
    • Communications for Statistical Applications and Methods
    • /
    • v.21 no.4
    • /
    • pp.349-361
    • /
    • 2014
  • This paper compares of lasso type estimators in various high-dimensional data situations with sparse parameters. Lasso, adaptive lasso, fused lasso and elastic net as lasso type estimators and ridge estimator are compared via simulation in linear models with correlated and uncorrelated covariates and binary regression models with correlated covariates and discrete covariates. Each method is shown to have advantages with different penalty conditions according to sparsity patterns of regression parameters. We applied the lasso type methods to Arabidopsis microarray gene expression data to find the strongly significant genes to distinguish two groups.

Simple principal component analysis using Lasso (라소를 이용한 간편한 주성분분석)

  • Park, Cheolyong
    • Journal of the Korean Data and Information Science Society
    • /
    • v.24 no.3
    • /
    • pp.533-541
    • /
    • 2013
  • In this study, a simple principal component analysis using Lasso is proposed. This method consists of two steps. The first step is to compute principal components by the principal component analysis. The second step is to regress each principal component on the original data matrix by Lasso regression method. Each of new principal components is computed as the linear combination of original data matrix using the scaled estimated Lasso regression coefficient as the coefficients of the combination. This method leads to easily interpretable principal components with more 0 coefficients by the properties of Lasso regression models. This is because the estimator of the regression of each principal component on the original data matrix is the corresponding eigenvector. This method is applied to real and simulated data sets with the help of an R package for Lasso regression and its usefulness is demonstrated.

An Additive Sparse Penalty for Variable Selection in High-Dimensional Linear Regression Model

  • Lee, Sangin
    • Communications for Statistical Applications and Methods
    • /
    • v.22 no.2
    • /
    • pp.147-157
    • /
    • 2015
  • We consider a sparse high-dimensional linear regression model. Penalized methods using LASSO or non-convex penalties have been widely used for variable selection and estimation in high-dimensional regression models. In penalized regression, the selection and prediction performances depend on which penalty function is used. For example, it is known that LASSO has a good prediction performance but tends to select more variables than necessary. In this paper, we propose an additive sparse penalty for variable selection using a combination of LASSO and minimax concave penalties (MCP). The proposed penalty is designed for good properties of both LASSO and MCP.We develop an efficient algorithm to compute the proposed estimator by combining a concave convex procedure and coordinate descent algorithm. Numerical studies show that the proposed method has better selection and prediction performances compared to other penalized methods.

Evaluating Variable Selection Techniques for Multivariate Linear Regression (다중선형회귀모형에서의 변수선택기법 평가)

  • Ryu, Nahyeon;Kim, Hyungseok;Kang, Pilsung
    • Journal of Korean Institute of Industrial Engineers
    • /
    • v.42 no.5
    • /
    • pp.314-326
    • /
    • 2016
  • The purpose of variable selection techniques is to select a subset of relevant variables for a particular learning algorithm in order to improve the accuracy of prediction model and improve the efficiency of the model. We conduct an empirical analysis to evaluate and compare seven well-known variable selection techniques for multiple linear regression model, which is one of the most commonly used regression model in practice. The variable selection techniques we apply are forward selection, backward elimination, stepwise selection, genetic algorithm (GA), ridge regression, lasso (Least Absolute Shrinkage and Selection Operator) and elastic net. Based on the experiment with 49 regression data sets, it is found that GA resulted in the lowest error rates while lasso most significantly reduces the number of variables. In terms of computational efficiency, forward/backward elimination and lasso requires less time than the other techniques.

Lasso Regression of RNA-Seq Data based on Bootstrapping for Robust Feature Selection (안정적 유전자 특징 선택을 위한 유전자 발현량 데이터의 부트스트랩 기반 Lasso 회귀 분석)

  • Jo, Jeonghee;Yoon, Sungroh
    • KIISE Transactions on Computing Practices
    • /
    • v.23 no.9
    • /
    • pp.557-563
    • /
    • 2017
  • When large-scale gene expression data are analyzed using lasso regression, the estimation of regression coefficients may be unstable due to the highly correlated expression values between associated genes. This irregularity, in which the coefficients are reduced by L1 regularization, causes difficulty in variable selection. To address this problem, we propose a regression model which exploits the repetitive bootstrapping of gene expression values prior to lasso regression. The genes selected with high frequency were used to build each regression model. Our experimental results show that several genes were consistently selected in all regression models and we verified that these genes were not false positives. We also identified that the sign distribution of the regression coefficients of the selected genes from each model was correlated to the real dependent variables.

A study on bias effect of LASSO regression for model selection criteria (모형 선택 기준들에 대한 LASSO 회귀 모형 편의의 영향 연구)

  • Yu, Donghyeon
    • The Korean Journal of Applied Statistics
    • /
    • v.29 no.4
    • /
    • pp.643-656
    • /
    • 2016
  • High dimensional data are frequently encountered in various fields where the number of variables is greater than the number of samples. It is usually necessary to select variables to estimate regression coefficients and avoid overfitting in high dimensional data. A penalized regression model simultaneously obtains variable selection and estimation of coefficients which makes them frequently used for high dimensional data. However, the penalized regression model also needs to select the optimal model by choosing a tuning parameter based on the model selection criterion. This study deals with the bias effect of LASSO regression for model selection criteria. We numerically describes the bias effect to the model selection criteria and apply the proposed correction to the identification of biomarkers for lung cancer based on gene expression data.

Variable Selection Via Penalized Regression

  • Yoon, Young-Joo;Song, Moon-Sup
    • Communications for Statistical Applications and Methods
    • /
    • v.12 no.3
    • /
    • pp.615-624
    • /
    • 2005
  • In this paper, we review the variable-selection properties of LASSO and SCAD in penalized regression. To improve the weakness of SCAD for high noise level, we propose a new penalty function called MSCAD which relaxes the unbiasedness condition of SCAD. In order to compare MSCAD with LASSO and SCAD, comparative studies are performed on simulated datasets and also on a real dataset. The performances of penalized regression methods are compared in terms of relative model error and the estimates of coefficients. The results of experiments show that the performance of MSCAD is between those of LASSO and SCAD as expected.

VARIABLE SELECTION VIA PENALIZED REGRESSION

  • Yoon, Young-Joo;Song, Moon-Sup
    • Proceedings of the Korean Statistical Society Conference
    • /
    • 2005.05a
    • /
    • pp.7-12
    • /
    • 2005
  • In this paper, we review the variable-selection properties of LASSO and SCAD in penalized regression. To improve the weakness of SCAD for high noise level, we propose a new penalty function called MSCAD which relaxes the unbiasedness condition of SCAD. In order to compare MSCAD with LASSO and SCAD, comparative studies are performed on simulated datasets and also on a real dataset. The performances of penalized regression methods are compared in terms of relative model error and the estimates of coefficients. The results of experiments show that the performance of MSCAD is between those of LASSO and SCAD as expected.

  • PDF

Weighted Least Absolute Deviation Lasso Estimator

  • Jung, Kang-Mo
    • Communications for Statistical Applications and Methods
    • /
    • v.18 no.6
    • /
    • pp.733-739
    • /
    • 2011
  • The linear absolute shrinkage and selection operator(Lasso) method improves the low prediction accuracy and poor interpretation of the ordinary least squares(OLS) estimate through the use of $L_1$ regularization on the regression coefficients. However, the Lasso is not robust to outliers, because the Lasso method minimizes the sum of squared residual errors. Even though the least absolute deviation(LAD) estimator is an alternative to the OLS estimate, it is sensitive to leverage points. We propose a robust Lasso estimator that is not sensitive to outliers, heavy-tailed errors or leverage points.