• Title/Summary/Keyword: Linear regression analysis

Search Result 2,903, Processing Time 0.024 seconds

An Approach to Applying Multiple Linear Regression Models by Interlacing Data in Classifying Similar Software

  • Lim, Hyun-il
    • Journal of Information Processing Systems
    • /
    • v.18 no.2
    • /
    • pp.268-281
    • /
    • 2022
  • The development of information technology is bringing many changes to everyday life, and machine learning can be used as a technique to solve a wide range of real-world problems. Analysis and utilization of data are essential processes in applying machine learning to real-world problems. As a method of processing data in machine learning, we propose an approach based on applying multiple linear regression models by interlacing data to the task of classifying similar software. Linear regression is widely used in estimation problems to model the relationship between input and output data. In our approach, multiple linear regression models are generated by training on interlaced feature data. A combination of these multiple models is then used as the prediction model for classifying similar software. Experiments are performed to evaluate the proposed approach as compared to conventional linear regression, and the experimental results show that the proposed method classifies similar software more accurately than the conventional model. We anticipate the proposed approach to be applied to various kinds of classification problems to improve the accuracy of conventional linear regression.

Robustness of model averaging methods for the violation of standard linear regression assumptions

  • Lee, Yongsu;Song, Juwon
    • Communications for Statistical Applications and Methods
    • /
    • v.28 no.2
    • /
    • pp.189-204
    • /
    • 2021
  • In a regression analysis, a single best model is usually selected among several candidate models. However, it is often useful to combine several candidate models to achieve better performance, especially, in the prediction viewpoint. Model combining methods such as stacking and Bayesian model averaging (BMA) have been suggested from the perspective of averaging candidate models. When the candidate models include a true model, it is expected that BMA generally gives better performance than stacking. On the other hand, when candidate models do not include the true model, it is known that stacking outperforms BMA. Since stacking and BMA approaches have different properties, it is difficult to determine which method is more appropriate under other situations. In particular, it is not easy to find research papers that compare stacking and BMA when regression model assumptions are violated. Therefore, in the paper, we compare the performance among model averaging methods as well as a single best model in the linear regression analysis when standard linear regression assumptions are violated. Simulations were conducted to compare model averaging methods with the linear regression when data include outliers and data do not include them. We also compared them when data include errors from a non-normal distribution. The model averaging methods were applied to the water pollution data, which have a strong multicollinearity among variables. Simulation studies showed that the stacking method tends to give better performance than BMA or standard linear regression analysis (including the stepwise selection method) in the sense of risks (see (3.1)) or prediction error (see (3.2)) when typical linear regression assumptions are violated.

Fuzzy Local Linear Regression Analysis

  • Hong, Dug-Hun;Kim, Jong-Tae
    • Journal of the Korean Data and Information Science Society
    • /
    • v.18 no.2
    • /
    • pp.515-524
    • /
    • 2007
  • This paper deals with local linear estimation of fuzzy regression models based on Diamond(1998) as a new class of non-linear fuzzy regression. The purpose of this paper is to introduce a use of smoothing in testing for lack of fit of parametric fuzzy regression models.

  • PDF

Quantitative Analysis by Derivative Spectrophotometry (III) -Simultaneous quantitation of vitamin B group and vitamin C in by multiple linear regression analysis-

  • Park, Man-Ki;Cho, Jung-Hwan
    • Archives of Pharmacal Research
    • /
    • v.11 no.1
    • /
    • pp.45-51
    • /
    • 1988
  • The feature of resolution enhancement by derivative operation is linked to one of the multivariate analysis, which is multiple linear regression with two options, all possible and stepwise regression. Examined samples were synthetic mixtures of 5 vitamins, thiamine mononitrate, riboflavin phosphate, nicotinamide, pyridoxine hydrochloride and ascorbic acid. All components in mixture were quantified with reasonably good accuracy and precision. Whole data processing procedure was accomplished on-line by the development of three computer programs written in APPLESOFT BASIC language.

  • PDF

Multicollinarity in Logistic Regression

  • Jong-Han lee;Myung-Hoe Huh
    • Communications for Statistical Applications and Methods
    • /
    • v.2 no.2
    • /
    • pp.303-309
    • /
    • 1995
  • Many measures to detect multicollinearity in linear regression have been proposed in statistics and numerical analysis literature. Among them, condition number and variance inflation factor(VIF) are most popular. In this study, we give new interpretations of condition number and VIF in linear regression, using geometry on the explanatory space. In the same line, we derive natural measures of condition number and VIF for logistic regression. These computer intensive measures can be easily extended to evaluate multicollinearity in generalized linear models.

  • PDF

Efficiency of Aggregate Data in Non-linear Regression

  • Huh, Jib
    • Communications for Statistical Applications and Methods
    • /
    • v.8 no.2
    • /
    • pp.327-336
    • /
    • 2001
  • This work concerns estimating a regression function, which is not linear, using aggregate data. In much of the empirical research, data are aggregated for various reasons before statistical analysis. In a traditional parametric approach, a linear estimation of the non-linear function with aggregate data can result in unstable estimators of the parameters. More serious consequence is the bias in the estimation of the non-linear function. The approach we employ is the kernel regression smoothing. We describe the conditions when the aggregate data can be used to estimate the regression function efficiently. Numerical examples will illustrate our findings.

  • PDF

Comparison of Linear and Nonlinear Regressions and Elements Analysis for Wind Speed Prediction (풍속 예측을 위한 선형회귀분석과 비선형회귀분석 기법의 비교 및 인자분석)

  • Kim, Dongyeon;Seo, Kisung
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.25 no.5
    • /
    • pp.477-482
    • /
    • 2015
  • Linear regressions and evolutionary nonlinear regression based compensation techniques for the short-range prediction of wind speed are investigated. Development of an efficient MOS(Model Output Statistics) is necessary to correct systematic errors of the model, but a linear regression based MOS is hard to manage an irregular nature of weather prediction. In order to solve the problem, a nonlinear and symbolic regression method using GP(Genetic Programming) is suggested for a development of MOS for wind speed prediction. The proposed method is compared to various linear regression methods for prediction of wind speed. Also, statistical analysis of distribution for UM elements for each method is executed. experiments are performed for KLAPS(Korea Local Analysis and Prediction System) re-analysis data from 2007 to 2013 year for Jeju Island and Busan area in South Korea.

A Comparative Study on the Performance of Bayesian Partially Linear Models

  • Woo, Yoonsung;Choi, Taeryon;Kim, Wooseok
    • Communications for Statistical Applications and Methods
    • /
    • v.19 no.6
    • /
    • pp.885-898
    • /
    • 2012
  • In this paper, we consider Bayesian approaches to partially linear models, in which a regression function is represented by a semiparametric additive form of a parametric linear regression function and a nonparametric regression function. We make a comparative study on the performance of widely used Bayesian partially linear models in terms of empirical analysis. Specifically, we deal with three Bayesian methods to estimate the nonparametric regression function, one method using Fourier series representation, the other method based on Gaussian process regression approach, and the third method based on the smoothness of the function and differencing. We compare the numerical performance of three methods by the root mean squared error(RMSE). For empirical analysis, we consider synthetic data with simulation studies and real data application by fitting each of them with three Bayesian methods and comparing the RMSEs.

Comparison of MLR and SVR Based Linear and Nonlinear Regressions - Compensation for Wind Speed Prediction (MLR 및 SVR 기반 선형과 비선형회귀분석의 비교 - 풍속 예측 보정)

  • Kim, Junbong;Oh, Seungchul;Seo, Kisung
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.65 no.5
    • /
    • pp.851-856
    • /
    • 2016
  • Wind speed is heavily fluctuated and quite local than other weather elements. It is difficult to improve the accuracy of prediction only in a numerical prediction model. An MOS (Model Output Statistics) technique is used to correct the systematic errors of the model using a statistical data analysis. The Most of previous MOS has used a linear regression model for weather prediction, but it is hard to manage an irregular nature of prediction of wind speed. In order to solve the problem, a nonlinear regression method using SVR (Support Vector Regression) is introduced for a development of MOS for wind speed prediction. Experiments are performed for KLAPS (Korea Local Analysis and Prediction System) re-analysis data from 2007 to 2013 year for Jeju Island and Busan area in South Korea. The MLR and SVR based linear and nonlinear methods are compared to each other for prediction accuracy of wind speed. Also, the comparison experiments are executed for the variation in the number of UM elements.