• Title/Summary/Keyword: Stepwise regression model

A Prediction Method Combining Clustering Method and Stepwise Regression (군집분석 기법과 단계별 회귀모델을 결합한 예측 방법)

  • Chong Il-gyo;Jun Chi-Hyuck
    • Proceedings of the Korean Operations and Management Science Society Conference / 2002.05a / pp.949-952 / 2002
  • A regression model is used to predict the response variable given predictor variables. However, when the number of predictor variables is large, a regression model suffers from problems such as multicollinearity, difficulty in interpreting the functional relationship between the response and the predictors, and reduced prediction accuracy. A clustering method and stepwise regression can be used to reduce the amount of data by grouping predictors with similar properties and by selecting a subset of predictors, respectively. This paper proposes a prediction method that combines a clustering method with stepwise regression. The proposed method fits a global model and local models, and predicts responses for new observations using both. The paper also compares the performance of the proposed method with that of stepwise regression using a real data sample obtained from a steel process.


Alternative Derivation of Stepwise Multivariate Linear Regression (段階的 多變量 線型回歸에 관하여)

  • 申敏雄;金周成
    • Journal of the Korean Statistical Society / v.7 no.2 / pp.105-108 / 1978
  • Freund, Vail, and Ross; Goldberger and Jochems; and Goldberger have given some results for the stepwise estimation of the parameters of a univariate regression model. D.G. Kabe gave similar results for a multivariate linear regression model. We give here an alternative derivation of some of the results derived by D.G. Kabe.


Analysis of Client Propensity in Cyber Counseling Using Bayesian Variable Selection

  • Pi, Su-Young
    • International Journal of Fuzzy Logic and Intelligent Systems / v.6 no.4 / pp.277-281 / 2006
  • Cyber counseling, one of the types of consultation best suited to the information society, enables people to reveal their mental agonies and private problems anonymously, since it does not require a face-to-face interview between a counsellor and a client. However, although the number of cyber counseling centers has increased greatly, few of them provide high-quality and trustworthy service. This paper therefore aims to enable appropriate consultation for each client by analyzing client propensity using Bayesian variable selection. Bayesian variable selection is superior to stepwise regression analysis in finding a regression model. Stepwise regression analysis, which has generally been used to analyze individual propensity in linear regression models, is inefficient because its inherent defects make it hard to select a proper model. In this paper, based on the case database of current cyber counseling centers on the web, we analyze clients' propensities using Bayesian variable selection to enable individually targeted counseling and to promote cyber counseling programs.

A Comparative Analysis of the Forecasting Performance of Coal and Iron Ore in Gwangyang Port Using Stepwise Regression and Artificial Neural Network Model (단계적 회귀분석과 인공신경망 모형을 이용한 광양항 석탄·철광석 물동량 예측력 비교 분석)

  • Cho, Sang-Ho;Nam, Hyung-Sik;Ryu, Ki-Jin;Ryoo, Dong-Keun
    • Journal of Navigation and Port Research / v.44 no.3 / pp.187-194 / 2020
  • Accurate forecasting of freight volume is very important for establishing major port policies and future operation plans, and related studies are being conducted for this reason. In this paper, stepwise regression analysis and an artificial neural network model were compared in terms of their predictive power for Gwangyang Port, the largest domestic port for coal and iron ore transportation. Data covering a total of 121 months (January 2009-January 2019) were used. Factors affecting coal and iron ore trade volume were selected and classified into supply-related factors and market/economy-related factors. In the stepwise regression analysis, the tonnage of ships entering the port, the coal price, and the dollar exchange rate were selected as the final variables for the Gwangyang Port coal volume forecasting model. In the iron ore volume forecasting model, the tonnage of ships entering the port and the price of iron ore were selected as the final variables. In the analysis using the artificial neural network model, a trial-and-error method was used to select the various hyperparameters affecting model performance and to identify the optimal model. The results showed that the artificial neural network model had better predictive performance than the stepwise regression analysis. The best-performing model was the artificial neural network model for forecasting Gwangyang Port coal volume. When the values forecast by the various models were compared with the measured values, the artificial neural network model came closer to the actual highest and lowest points than the stepwise regression analysis.

A Climate Prediction Method Based on EMD and Ensemble Prediction Technique

  • Bi, Shuoben;Bi, Shengjie;Chen, Xuan;Ji, Han;Lu, Ying
    • Asia-Pacific Journal of Atmospheric Sciences / v.54 no.4 / pp.611-622 / 2018
  • Observed climate data are processed under the assumption that their time series are stationary, as in multi-step temperature and precipitation prediction, which usually leads to low prediction accuracy. If a climate system model is based on a single prediction model, the prediction results contain significant uncertainty. In order to overcome this drawback, this study uses a method that integrates ensemble prediction and a stepwise regression model based on a mean-valued generation function. In addition, it utilizes empirical mode decomposition (EMD), which is a new method of handling time series. First, a non-stationary time series is decomposed into a series of intrinsic mode functions (IMFs), which are stationary and multi-scale. Then, a different prediction model is constructed for each component of the IMF using numerical ensemble prediction combined with stepwise regression analysis. Finally, the results are fit to a linear regression model, and a short-term climate prediction system is established using the Visual Studio development platform. The model is validated using temperature data from February 1957 to 2005 from 88 weather stations in Guangxi, China. The results show that compared to single-model prediction methods, the EMD and ensemble prediction model is more effective for forecasting climate change and abrupt climate shifts when using historical data for multi-step prediction.

Evaluating Variable Selection Techniques for Multivariate Linear Regression (다중선형회귀모형에서의 변수선택기법 평가)

  • Ryu, Nahyeon;Kim, Hyungseok;Kang, Pilsung
    • Journal of Korean Institute of Industrial Engineers / v.42 no.5 / pp.314-326 / 2016
  • The purpose of variable selection techniques is to select a subset of relevant variables for a particular learning algorithm in order to improve both the accuracy and the efficiency of the prediction model. We conduct an empirical analysis to evaluate and compare seven well-known variable selection techniques for the multiple linear regression model, which is one of the most commonly used regression models in practice. The variable selection techniques we apply are forward selection, backward elimination, stepwise selection, genetic algorithm (GA), ridge regression, lasso (Least Absolute Shrinkage and Selection Operator), and elastic net. Based on experiments with 49 regression data sets, we find that GA yields the lowest error rates, while lasso reduces the number of variables most significantly. In terms of computational efficiency, forward selection, backward elimination, and lasso require less time than the other techniques.
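
As an illustration of the forward selection technique named in this abstract, the following is a minimal sketch and not the authors' implementation. It assumes ordinary least squares fitted with NumPy, and it uses a hypothetical stopping rule (`min_gain`, a minimum relative reduction in the residual sum of squares) in place of the F-test or p-value thresholds that statistical packages typically use. The function names and the synthetic data are invented for the example.

```python
import numpy as np

def rss(X, y, cols):
    """Residual sum of squares of an OLS fit on the given columns plus an intercept."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def forward_selection(X, y, min_gain=0.10):
    """Greedy forward selection.

    At each step, add the predictor that lowers the RSS the most. Stop when
    the best candidate reduces the RSS by less than `min_gain` (relative).
    The stopping rule is an assumption made for this sketch.
    """
    selected = []
    remaining = list(range(X.shape[1]))
    current = rss(X, y, selected)  # intercept-only model
    while remaining:
        best_rss, best_j = min((rss(X, y, selected + [j]), j) for j in remaining)
        if current - best_rss < min_gain * current:
            break
        selected.append(best_j)
        remaining.remove(best_j)
        current = best_rss
    return selected

# Synthetic data (hypothetical): only columns 1 and 4 influence the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 1] - 2.0 * X[:, 4] + 0.1 * rng.normal(size=200)

selected = forward_selection(X, y)
print("selected columns:", sorted(selected))
```

Backward elimination would start from the full model and drop variables instead, and stepwise selection alternates between the two moves. Both can reuse the same `rss` helper.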

A study on Estimation of NO2 concentration by Statistical model (통계모형을 이용한 NO2 농도 예측에 관한 연구)

  • Jang Nan-Sim
    • Journal of Environmental Science International / v.14 no.11 / pp.1049-1056 / 2005
  • The $NO_2$ concentration characteristics of Busan metropolitan city were analysed statistically using hourly $NO_2$ concentration data (1998-2000) collected from the city's air quality monitoring sites. Four representative regions were selected among the air quality monitoring sites of the Ministry of Environment. Concentration data for $NO_2$ and 5 air pollutants, and data collected at AWS, were used. Both a stepwise multiple regression model and an ARIMA model were adopted to predict $NO_2$ concentrations, and their results were compared with the observed concentrations. While the ARIMA model was useful for predicting the daily variation of the concentration, it was not satisfactory for predicting either rapid variation or seasonal variation. The multiple regression model gave better estimates of $NO_2$ concentration than the ARIMA model.

A Multivariate Analysis of Korean Professional Players Salary (한국 프로스포츠 선수들의 연봉에 대한 다변량적 분석)

  • Song, Jong-Woo
    • The Korean Journal of Applied Statistics / v.21 no.3 / pp.441-453 / 2008
  • We analyzed the salaries of Korean professional basketball and baseball players under the assumption that salary depends on a player's personal records and contribution to the team in the previous year. We made extensive use of data visualization tools to check the relationships among the variables, to find outliers, and to perform model diagnostics. We used multiple linear regression and regression trees to fit the model, and used cross-validation to find an optimal model. We checked the relationships between variables carefully and chose a set of variables for the stepwise regression instead of using all variables. We found that points per game, number of assists, number of free throw successes, and career are important variables for the basketball players. For the baseball pitchers, career, number of strike-outs per 9 innings, ERA, and number of home runs are important variables. For the baseball hitters, career, number of hits, and FA are important variables.

Parameter Calibration of Storage Function Model and Flood Forecasting (2) Comparative Study on the Flood Forecasting Methods (저류함수모형의 매개변수 보정과 홍수예측 (2) 홍수예측방법의 비교 연구)

  • Kim, Bum Jun;Song, Jae Hyun;Kim, Hung Soo;Hong, Il Pyo
    • KSCE Journal of Civil and Environmental Engineering Research / v.26 no.1B / pp.39-50 / 2006
  • In Korea, the flood control offices of the main rivers have used a storage function model to forecast flood stage, and studies of flood forecasting are still being actively conducted. In this paper, the storage function model used by the flood control offices, regression models, and an artificial neural network model are applied to flood forecasting for the study watershed. The results obtained by each method are analyzed for a comparative study. For the storage function model, this paper uses both the representative parameters of the flood control offices and optimized parameters. Regression coefficients are obtained by regression analysis, and the neural network is trained by the backpropagation algorithm, after selecting four events between 1995 and 2001. The study shows that the optimized parameters are superior to the representative parameters for flood forecasting. The results obtained by the regression methods (multiple, robust, and stepwise regression analysis) show very good forecasts. Although the artificial neural network model shows less exact results than the regression models, it can be an efficient way to produce good forecasts.

Identifying Factors for Corn Yield Prediction Models and Evaluating Model Selection Methods

  • Chang Jiyul;Clay David E.
    • KOREAN JOURNAL OF CROP SCIENCE / v.50 no.4 / pp.268-275 / 2005
  • Early predictions of crop yields can provide information that helps producers take advantage of market opportunities, helps assess national food security, and provides early warning of food shortages. The objectives of this study were to identify the most useful parameters for estimating yields and to compare two model selection methods for finding the 'best' model developed by multiple linear regression. This research was conducted in two 65 ha corn/soybean rotation fields located in east central South Dakota. Data used to develop models were small temporal variability information (STVI: elevation, apparent electrical conductivity $(EC_a)$, slope), large temporal variability information (LTVI: inorganic N, Olsen P, soil moisture), and remote sensing information (green, red, and NIR bands, normalized difference vegetation index (NDVI), and green normalized difference vegetation index (GDVI)). Second-order Akaike's Information Criterion (AICc) and stepwise multiple regression were used to develop the best-fitting equations in each system (information group). Models with $\Delta_i\leq2$ were selected, giving 22 and 37 models at Moody and Brookings, respectively. Based on the results, the most useful variables for estimating corn yield differed between fields. Elevation and $EC_a$ were consistently the most useful variables in both fields and in most of the systems. Model selection differed between fields, and different numbers of variables were selected in different fields. These results might be attributed to the different landscapes and management histories of the study fields. The variables most commonly selected by AICc and by stepwise regression were different. In validation, stepwise regression was slightly better than AICc at Moody, while AICc was slightly better than stepwise regression at Brookings. Results suggest that the AICc approach can be used to identify the most useful information and select the 'best' yield models for production fields.
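
The $\Delta_i\leq2$ rule mentioned in this abstract can be sketched as follows. This is an illustrative example only. It assumes the common least-squares form of the criterion, AICc = n ln(RSS/n) + 2k + 2k(k+1)/(n-k-1), which the abstract itself does not spell out, and it takes $\Delta_i$ to be each model's AICc minus the smallest AICc among the candidates. The candidate models, their RSS values, and the sample size are invented and are not data from the study.

```python
import math

def aicc(rss, n, k):
    """Second-order AIC for a least-squares fit (assumed form, see lead-in).

    rss: residual sum of squares
    n:   number of observations
    k:   number of estimated parameters (including the error variance)
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    aic = n * math.log(rss / n) + 2 * k
    return aic + (2 * k * (k + 1)) / (n - k - 1)

def models_within_delta(scores, threshold=2.0):
    """Return {model: delta} for models whose delta (AICc minus the best AICc) is <= threshold."""
    best = min(scores.values())
    return {name: s - best for name, s in scores.items() if s - best <= threshold}

# Hypothetical candidates: model name -> (RSS, number of parameters k)
candidates = {
    "elev": (120.0, 3),
    "elev+ECa": (100.0, 4),
    "elev+ECa+slope": (98.0, 5),
}
n = 50  # hypothetical sample size

scores = {name: aicc(r, n, k) for name, (r, k) in candidates.items()}
kept = models_within_delta(scores)
for name, delta in sorted(kept.items(), key=lambda item: item[1]):
    print(f"{name}: delta = {delta:.2f}")
```

With these made-up numbers the three-variable model fits slightly better, but the extra parameter costs more than the fit gains, so the two-variable model has the lowest AICc and both remain within the threshold.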