• Title/Summary/Keyword: Ridge regression

Development of Regression Models Resolving High-Dimensional Data and Multicollinearity Problem for Heavy Rain Damage Data (호우피해자료에서의 고차원 자료 및 다중공선성 문제를 해소한 회귀모형 개발)

  • Kim, Jeonghwan;Park, Jihyun;Choi, Changhyun;Kim, Hung Soo
    • KSCE Journal of Civil and Environmental Engineering Research, v.38 no.6, pp.801-808, 2018
  • Fitting a linear regression model is stable under the assumption that the sample size is sufficiently larger than the number of explanatory variables and that there is no serious multicollinearity among the explanatory variables. In this study, we investigated the difficulty of model fitting when this assumption is violated by analyzing real heavy rain damage data, and we proposed using a principal component regression model or a ridge regression model after integrating the data to overcome the difficulty. We evaluated the predictive performance of the proposed models on test data independent of the training data and confirmed that the proposed methods predicted better than the linear regression model.

Development of Ridge Regression Model of Pollutant Load Using Runoff Weighted Value Based on Distributed Curve-Number (분포형 CN 기반 토지피복별 유출가중치를 이용한 오염부하량 능형회귀모형 개발)

  • Song, Chul Min;Kim, Jin Soo
    • Journal of The Korean Society of Agricultural Engineers, v.60 no.1, pp.111-120, 2018
  • The purpose of this study was to develop a ridge regression (RR) model to estimate BOD and TP loads using runoff weighted values. The concept of a runoff weighted value, based on the distributed curve number (CN), was introduced to reflect the impact of land cover on runoff. Runoff depths estimated with the distributed CN were closer to the observed values than those estimated with the area-weighted mean CN. RR is a technique used when the data suffer from multicollinearity. The RR model was developed for five flow duration intervals, with the daily runoff discharge of seven land covers as independent variables and the daily pollutant load as the dependent variable. The RR model was applied to the Heuk River watershed, a subwatershed of the Han River watershed. The variance inflation factors of the RR model decreased to values below 10. The RR model performed well, with Nash-Sutcliffe efficiency (NSE) of 0.73 and 0.87 and Pearson correlation coefficients of 0.88 and 0.93 for BOD and TP, respectively. The results suggest that the methods used in the study can be applied to estimate the pollutant load of watersheds with different land covers using limited data.
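
As a rough illustration of why ridge regression helps when predictors are collinear, the sketch below fits the closed-form ridge estimate on synthetic data with two nearly identical predictors. It is not the model from the paper above: the function name `ridge_fit`, the penalty value `lam`, and the simulated data are all assumptions made for the example.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge estimate (X'X + lam*I)^{-1} X'y on centered data.

    lam = 0 gives the ordinary least squares solution (hypothetical helper).
    """
    Xc = X - X.mean(axis=0)          # center predictors so no intercept is penalized
    yc = y - y.mean()                # center the response
    p = Xc.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)

# Synthetic data (assumption): x2 is almost a copy of x1, so X'X is ill-conditioned.
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(scale=0.5, size=n)

beta_ols = ridge_fit(X, y, lam=0.0)    # unpenalized fit, unstable under collinearity
beta_ridge = ridge_fit(X, y, lam=1.0)  # penalized fit, coefficients shrunk toward zero

print("OLS coefficients:  ", beta_ols)
print("Ridge coefficients:", beta_ridge)
```

For any positive `lam`, the ridge coefficient vector has a smaller Euclidean norm than the least squares one, which is the shrinkage that stabilizes estimates under multicollinearity. In practice `lam` would be chosen by cross-validation or a similar criterion rather than fixed by hand.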

Crop Yield and Crop Production Predictions using Machine Learning

  • Divya Goel;Payal Gulati
    • International Journal of Computer Science & Network Security, v.23 no.9, pp.17-28, 2023
  • Agriculture is a significant contributor to the Indian economy: it accounts for 18% of India's Gross Domestic Product (GDP) and employs half of the nation's workforce. The sector must meet a growing demand for food driven by an increasing population, so yield prediction is carried out in advance. Farmers also benefit, since yield prediction helps them estimate crop yield before cultivation. Many parameters affect crop yield, such as rainfall, temperature, fertilizers, pH level, and other atmospheric conditions, which makes yield hard to predict. Motivated by this, the work prepares a dataset of different states producing different crops in different seasons, pre-processes it, and applies the machine learning techniques Gradient Boosting Regressor, Random Forest Regressor, Decision Tree Regressor, Ridge Regression, Polynomial Regression, and Linear Regression, comparing their results in Python.

Scar formation after lower eyelid incision for reconstruction of the inferior orbital wall related to the lower eyelid crease or ridge in Asians

  • Oh, Seong Jin;Kim, Kwang Seog;Choi, Jun Ho;Hwang, Jae Ha;Lee, Sam Yong
    • Archives of Craniofacial Surgery, v.22 no.6, pp.310-318, 2021
  • Background: Transcutaneous lower eyelid approaches are associated with a risk of postoperative scarring depending on the distance between the incision line and the lower eyelid margin. The lower eyelid crease of Caucasians corresponds to a ridge-shaped fold in young Asians. However, this relationship has not been sufficiently evaluated in the latter. The authors, therefore, investigated the location of the scar and the lower eyelid crease or ridge to find the optimal location for the incision line. Methods: This study included 60 out of 139 patients who underwent inferior orbital wall reconstruction through a lower eyelid skin incision between July 2019 and June 2020. According to the location of the scar, the patients were classified into three groups: group A (≥ 2 mm above the lower eyelid crease or ridge), group B (within the lower eyelid crease or ridge to 2 mm above the lower eyelid crease or ridge), and group C (within the lower eyelid crease or ridge to 2 mm below the lower eyelid crease or ridge). At 6 or 12 months after surgery, the Patient and Observer Scar Assessment Scale (POSAS) score was obtained, the distance between the lower eyelid margin and the scar (DMS) and the distance between the margins of the peripheral pupil and the lower eyelid (DMPE) were measured, and the occurrence of ectropion was evaluated. Results: Group B had the lowest POSAS score (A: 22.7 ± 8.0, B: 20.9 ± 2.4, C: 32.5 ± 4.1, p < 0.001). Linear regression analysis showed that the DMS was positively correlated with the POSAS score (p < 0.001) and that the risk of DMPE widening increased as the DMS decreased (p = 0.029). None of the patients had ectropion. Conclusion: When using the transcutaneous approach for inferior orbital wall reconstruction, the optimal incision site is within the lower eyelid crease or ridge to 2 mm above the lower eyelid crease or ridge.

Forecasting Korea's GDP growth rate based on the dynamic factor model (동적요인모형에 기반한 한국의 GDP 성장률 예측)

  • Kyoungseo Lee;Yaeji Lim
    • The Korean Journal of Applied Statistics, v.37 no.2, pp.255-263, 2024
  • GDP represents the total market value of goods and services produced by all economic entities, including households, businesses, and governments in a country, during a specific time period. It is a representative economic indicator that helps identify the size of a country's economy and influences government policies, so various studies are being conducted on it. This paper presents a GDP growth rate forecasting model based on a dynamic factor model using key macroeconomic indicators of G20 countries. The extracted factors are combined with various regression analysis methodologies to compare results. Additionally, traditional time series forecasting methods such as the ARIMA model and forecasting using common components are also evaluated. Considering the significant volatility of indicators following the COVID-19 pandemic, the forecast period is divided into pre-COVID and post-COVID periods. The findings reveal that the dynamic factor model, incorporating ridge regression and lasso regression, demonstrates the best performance both before and after COVID.

Robust ridge regression for nonlinear mixed effects models with applications to quantitative high throughput screening assay data (비선형 혼합효과모형에서의 로버스트 능형회귀 방법과 정량적 고속 대량 스크리닝 자료에의 응용)

  • Yoo, Jiseon;Lim, Changwon
    • The Korean Journal of Applied Statistics, v.31 no.1, pp.123-137, 2018
  • A nonlinear mixed effects model is mainly used to analyze repeated measurement data in various fields. It consists of two stages: the first-stage individual-level model accounts for intra-individual variation, and the second-stage population model accounts for inter-individual variation. The individual-level model estimates the parameters of a nonlinear regression model. It is the same as the general nonlinear regression model and usually estimates parameters by least squares. However, least squares can yield extremely large parameter estimates and standard errors if the assumed nonlinear function is not clearly revealed by the data. In this paper, a new estimation method is proposed to solve this problem by introducing the ridge regression method recently proposed for the nonlinear regression model into the first-stage individual-level model of the nonlinear mixed effects model. The performance of the proposed estimator is compared with that of the standard estimator through a simulation study. The proposed methodology is also illustrated using quantitative high throughput screening data obtained from the US National Toxicology Program.

CONFLICT AMONG THE SHRINKAGE ESTIMATORS INDUCED BY W, LR AND LM TESTS UNDER A STUDENT'S t REGRESSION MODEL

  • Kibria, B.M.-Golam
    • Journal of the Korean Statistical Society, v.33 no.4, pp.411-433, 2004
  • The shrinkage preliminary test ridge regression estimators (SPTRRE) based on Wald (W), Likelihood Ratio (LR) and Lagrangian Multiplier (LM) tests for estimating the regression parameters of the multiple linear regression model with multivariate Student's t error distribution are considered in this paper. The quadratic biases and risks of the proposed estimators are compared under both null and alternative hypotheses. It is observed that there is conflict among the three estimators with respect to their risks because of certain inequalities that exist among the test statistics. In the neighborhood of the restriction, the SPTRRE based on LM test has the smallest risk followed by the estimators based on LR and W tests. However, the SPTRRE based on W test performs the best followed by the LR and LM based estimators when the parameters move away from the subspace of the restrictions. Some tables for the maximum and minimum guaranteed efficiency of the proposed estimators have been given, which allow us to determine the optimum level of significance corresponding to the optimum estimator among proposed estimators. It is evident that in the choice of the smallest significance level to yield the best estimator the SPTRRE based on Wald test dominates the other two estimators.

Penalized logistic regression models for determining the discharge of dyspnea patients (호흡곤란 환자 퇴원 결정을 위한 벌점 로지스틱 회귀모형)

  • Park, Cheolyong;Kye, Myo Jin
    • Journal of the Korean Data and Information Science Society, v.24 no.1, pp.125-133, 2013
  • In this paper, penalized binary logistic regression models are employed as statistical models for determining the discharge of 668 patients with a chief complaint of dyspnea based on 11 blood test results. Specifically, the ridge model based on the $L^2$ penalty and the Lasso model based on the $L^1$ penalty are considered. For prediction accuracy, these models are compared with the logistic regression model using all 11 explanatory variables and the logistic regression model using variables chosen by a variable selection method. The results show that the ridge logistic regression model has the best prediction accuracy among the four models based on 10-fold cross-validation.
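
For reference, the two penalties named in this abstract can be written in the standard textbook form below. This is a generic sketch, not necessarily the exact formulation used in the paper: the symbols $\ell(\beta_0, \beta)$ for the binomial log-likelihood, $\lambda \ge 0$ for the penalty weight, and $p$ for the number of predictors are notation assumed here.

```latex
% Ridge (L^2) penalized logistic regression
\hat{\beta}^{\text{ridge}}
  = \arg\min_{\beta_0,\,\beta}
    \Bigl\{ -\ell(\beta_0, \beta) + \lambda \sum_{j=1}^{p} \beta_j^{2} \Bigr\}

% Lasso (L^1) penalized logistic regression
\hat{\beta}^{\text{lasso}}
  = \arg\min_{\beta_0,\,\beta}
    \Bigl\{ -\ell(\beta_0, \beta) + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \Bigr\}

% Binomial log-likelihood for responses y_i in {0, 1} and predictors x_i
\ell(\beta_0, \beta)
  = \sum_{i=1}^{n} \Bigl[ y_i (\beta_0 + x_i^{\top}\beta)
    - \log\bigl(1 + e^{\beta_0 + x_i^{\top}\beta}\bigr) \Bigr]
```

The intercept $\beta_0$ is conventionally left unpenalized. The $L^2$ penalty shrinks all coefficients smoothly toward zero, while the $L^1$ penalty can set some coefficients exactly to zero and so also performs variable selection.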

A Study on Regularization Methods to Evaluate the Sediment Trapping Efficiency of Vegetative Filter Strips (식생여과대 유사 저감 효율 산정을 위한 정규화 방안)

  • Bae, JooHyun;Han, Jeongho;Yang, Jae E;Kim, Jonggun;Lim, Kyoung Jae;Jang, Won Seok
    • Journal of The Korean Society of Agricultural Engineers, v.61 no.6, pp.9-19, 2019
  • A vegetative filter strip (VFS) is a best management practice that has been widely used to mitigate water pollutants from agricultural fields by reducing runoff and sediment. This study was conducted to improve an equation for estimating the sediment trapping efficiency of VFS using several different regularization methods (i.e., ordinary least squares analysis, LASSO, ridge regression analysis and elastic net). The four methods were employed to develop the sediment trapping efficiency equation of VFS, and each estimated the sediment trapping efficiency with high accuracy. Among the four, the ridge method gave the most accurate results, with $R^2$, RMSE and MAPE of 0.94, 7.31% and 14.63%, respectively. The equation developed in this study can be applied in watershed-scale hydrological models to estimate the sediment trapping efficiency of VFS in agricultural fields for effective watershed management in Korea.

A Ridge-type Estimator For Generalized Linear Models (일반화 선형모형에서의 능형형태의 추정량)

  • Byoung Jin Ahn
    • The Korean Journal of Applied Statistics, v.7 no.1, pp.75-82, 1994
  • It is known that collinearity among the explanatory variables in generalized linear models inflates the variance of maximum likelihood estimators. A ridge-type estimator is presented using penalized likelihood. A method for choosing the shrinkage parameter is discussed; it is based on a prediction-oriented criterion, which is Mallows' $C_L$ statistic in the linear regression setting.
