• Title/Summary/Keyword: Regression Analysis

Robustness of model averaging methods for the violation of standard linear regression assumptions

  • Lee, Yongsu;Song, Juwon
    • Communications for Statistical Applications and Methods / v.28 no.2 / pp.189-204 / 2021
  • In regression analysis, a single best model is usually selected from among several candidate models. However, it is often useful to combine several candidate models to achieve better performance, especially from a prediction viewpoint. Model combining methods such as stacking and Bayesian model averaging (BMA) have been proposed for averaging candidate models. When the candidate models include the true model, BMA is generally expected to perform better than stacking; when they do not include the true model, stacking is known to outperform BMA. Because stacking and BMA have different properties, it is difficult to determine which method is more appropriate in other situations. In particular, few papers compare stacking and BMA when regression model assumptions are violated. Therefore, in this paper, we compare the performance of model averaging methods, as well as a single best model, in linear regression analysis when the standard linear regression assumptions are violated. Simulations were conducted to compare model averaging methods with linear regression when the data do and do not include outliers, and when the errors follow a non-normal distribution. The model averaging methods were also applied to water pollution data, which exhibit strong multicollinearity among the variables. The simulation studies showed that the stacking method tends to perform better than BMA or standard linear regression analysis (including stepwise selection) in terms of risk (see (3.1)) or prediction error (see (3.2)) when the typical linear regression assumptions are violated.
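
The contrast between stacking and BMA can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the candidate models are taken to be all non-empty subsets of the predictors, BMA posterior model weights are approximated with the common BIC approximation, and stacking weights are fitted by non-negative least squares on leave-one-out predictions (Breiman-style stacking) and then normalised to sum to one. All function names and the simulated heavy-tailed data are hypothetical.

```python
import itertools

import numpy as np
from scipy.optimize import nnls


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ...]."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return beta


def predict(beta, X):
    return beta[0] + X @ beta[1:]


def candidate_subsets(p):
    """Assumption: every non-empty subset of the p predictors is one candidate model."""
    return [list(s) for k in range(1, p + 1)
            for s in itertools.combinations(range(p), k)]


def bma_weights(X, y, subsets):
    """BIC approximation to BMA: weight_m proportional to exp(-BIC_m / 2)."""
    n = len(y)
    bics = []
    for s in subsets:
        beta = fit_ols(X[:, s], y)
        rss = np.sum((y - predict(beta, X[:, s])) ** 2)
        bics.append(n * np.log(rss / n) + len(beta) * np.log(n))
    bics = np.array(bics)
    w = np.exp(-0.5 * (bics - bics.min()))  # subtract the minimum for numerical stability
    return w / w.sum()


def stacking_weights(X, y, subsets):
    """Stacking: non-negative weights fitted to leave-one-out predictions."""
    n = len(y)
    loo = np.empty((n, len(subsets)))
    for i in range(n):
        keep = np.arange(n) != i
        for j, s in enumerate(subsets):
            beta = fit_ols(X[keep][:, s], y[keep])
            loo[i, j] = predict(beta, X[i:i + 1, s])[0]
    w, _ = nnls(loo, y)
    return w / w.sum()  # normalising to sum to one is an assumption


def averaged_prediction(X_new, X, y, subsets, weights):
    preds = np.column_stack([predict(fit_ols(X[:, s], y), X_new[:, s])
                             for s in subsets])
    return preds @ weights


# Hypothetical simulation: t-distributed (df=3) errors violate the normality assumption.
rng = np.random.default_rng(0)
n, p = 60, 3
X = rng.normal(size=(n, p))
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.standard_t(df=3, size=n)

subsets = candidate_subsets(p)
w_bma = bma_weights(X, y, subsets)
w_stack = stacking_weights(X, y, subsets)
y_hat_stack = averaged_prediction(X, X, y, subsets, w_stack)
```

To compare the methods as in the abstract, one would repeat such a simulation many times and average a risk or out-of-sample prediction error for each method; the exact risk and error definitions used in the paper are not reproduced here.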

Multivariate statistical analysis of the comparative antioxidant activity of the total phenolics and tannins in the water and ethanol extracts of dried goji berry (Lycium chinense) fruits

  • Kim, Joo-Shin;Kimm, Haklin Alex
    • Korean Journal of Food Science and Technology / v.51 no.3 / pp.227-236 / 2019
  • The antioxidant activity of water and ethanol extracts of dried Lycium chinense fruit, attributed to their total phenolic and tannin contents, was measured using a number of chemical and biochemical assays for radical scavenging and inhibition of lipid peroxidation, and the analysis was extended by applying a bootstrapping statistical method. Previous statistical analyses mostly provided linear correlation and regression analyses between antioxidant activity and increasing concentrations of phenolics and tannins in a concentration-dependent manner. The present study showed that multicomponent (multivariate) analysis, using multiple regression analysis or regression planes, was more informative than linear regression analysis of the relationship between the concentration of individual components and antioxidant activity. In this paper, we present a multivariate analysis of the antioxidant activities of the combined phenolic and tannin contents in the water and ethanol extracts, which revealed hidden patterns that were not evident from linear statistical analysis.
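
A regression plane with two predictors, together with a bootstrap of its coefficients, can be sketched as follows. This assumes a simple additive plane $y = b_0 + b_1 x_1 + b_2 x_2$ (phenolics $x_1$, tannins $x_2$) and a nonparametric case-resampling bootstrap with percentile intervals; the data, seeds, and function names are hypothetical, and the study's exact bootstrap procedure is not described in the abstract.

```python
import numpy as np


def fit_plane(x1, x2, y):
    """Least-squares regression plane y = b0 + b1*x1 + b2*x2."""
    X = np.column_stack([np.ones_like(x1), x1, x2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def r_squared(y, y_hat):
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)


def bootstrap_plane(x1, x2, y, n_boot=1000, seed=0):
    """Resample observations with replacement and refit the plane each time."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty((n_boot, 3))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        draws[b] = fit_plane(x1[idx], x2[idx], y[idx])
    # 95% percentile interval for each coefficient
    lower, upper = np.percentile(draws, [2.5, 97.5], axis=0)
    return draws, lower, upper


# Hypothetical data: phenolic (x1) and tannin (x2) concentrations, activity (y).
rng = np.random.default_rng(1)
x1 = rng.uniform(0, 10, size=30)
x2 = rng.uniform(0, 5, size=30)
y = 5.0 + 0.8 * x1 + 1.5 * x2 + rng.normal(0, 0.5, size=30)

beta = fit_plane(x1, x2, y)
X = np.column_stack([np.ones_like(x1), x1, x2])
r2_plane = r_squared(y, X @ beta)

# Simple linear regression on phenolics alone, for comparison.
b_simple = np.polyfit(x1, y, deg=1)
r2_simple = r_squared(y, np.polyval(b_simple, x1))

draws, lower, upper = bootstrap_plane(x1, x2, y)
```

Because the plane nests the single-predictor line, its in-sample $R^2$ can never be lower; whether the extra term is worthwhile is better judged from the bootstrap intervals or out-of-sample error.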

Price Determinant Factors of Artworks and Prediction Model Based on Machine Learning (작품 가격 추정을 위한 기계 학습 기법의 응용 및 가격 결정 요인 분석)

  • Jang, Dongryul;Park, Minjae
    • Journal of Korean Society for Quality Management / v.47 no.4 / pp.687-700 / 2019
  • Purpose: The purpose of this study is to investigate the interaction effects between the price determinants of artworks. We extend the methodology used in art market research by applying machine learning techniques to estimate artwork prices, and we compare linear regression and machine learning in terms of prediction accuracy. Methods: Moderated regression analysis was performed to verify the interaction effects of artistic characteristics on price. The moderating effects were examined by checking the significance of the interaction terms in the derived regression equation. To derive a price estimation model, we use multiple linear regression analysis, a parametric statistical technique, and k-nearest neighbor (kNN) regression, a nonparametric machine learning technique. Results: In most cases, the influence of the price determinants of art differs according to the auction type and the artist's reputation. However, the auction type did not moderate the influence of the genre of the work on price. In terms of prediction accuracy, kNN regression was superior to linear regression analysis. Conclusion: This study provides a theoretical basis for the complex relationships among the price determinants of artworks. In addition, both nonparametric machine learning models and existing parametric models are implemented to estimate artwork prices.
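
The two techniques named above can be sketched side by side. Moderated regression adds a product term, $y = b_0 + b_1 x + b_2 z + b_3 xz$, and a significant $b_3$ indicates that the moderator $z$ changes the effect of $x$; kNN regression predicts the mean response of the $k$ nearest training points. The variables (a reputation score and an auction-type dummy), the Euclidean distance, $k$, and all data below are hypothetical choices for illustration, not the study's specification.

```python
import numpy as np


def fit_moderated(x, z, y):
    """OLS for y = b0 + b1*x + b2*z + b3*(x*z); b3 is the moderating (interaction) effect."""
    X = np.column_stack([np.ones_like(x), x, z, x * z])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def knn_regress(X_train, y_train, X_query, k=3):
    """kNN regression on (n, d) arrays: average y over the k nearest points (Euclidean)."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    preds = np.empty(len(X_query))
    for i, q in enumerate(X_query):
        dist = np.linalg.norm(X_train - q, axis=1)
        nearest = np.argsort(dist)[:k]
        preds[i] = y_train[nearest].mean()
    return preds


def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


# Hypothetical data: x = artist reputation score, z = auction-type dummy (0/1).
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=80)
z = rng.integers(0, 2, size=80).astype(float)
y = 1.0 + 0.5 * x + 2.0 * z + 0.7 * x * z + rng.normal(0, 0.3, size=80)
beta = fit_moderated(x, z, y)

# Hold out the last 20 observations to compare prediction accuracy.
features = np.column_stack([x, z])
train, test = np.arange(60), np.arange(60, 80)
y_knn = knn_regress(features[train], y[train], features[test], k=5)
err_knn = rmse(y[test], y_knn)
```

In practice predictors on different scales would usually be standardised before computing kNN distances, and the significance of $b_3$ would be assessed with its standard error.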

A Study on the Influence of a Sewage Treatment Plant's Operational Parameters using the Multiple Regression Analysis Model

  • Lee, Seung-Pil;Min, Sang-Yun;Kim, Jin-Sik;Park, Jong-Un;Kim, Man-Soo
    • Environmental Engineering Research / v.19 no.1 / pp.31-36 / 2014
  • In this study, the influence of the control and operational parameters of a sewage treatment plant was examined by performing multiple regression analysis on the effluent quality of the treated sewage. The analysis is based on actual 2012 data from a sewage treatment plant that uses a media process. The multiple regression prediction models for chemical oxygen demand ($COD_{Mn}$) and total nitrogen (T-N) in the effluent of the secondary settling tank yielded prediction accuracies of 0.93 and 0.84, respectively, and it was concluded that the models accurately predicted the variation in the actual observed values. If data on the energy consumed under each operating condition can be collected, then the regression model and its standardized regression coefficients can be used to determine the operating parameters that conserve energy without violating the effluent quality standards for COD and T-N. These results can provide appropriate energy-saving operating guidelines to the operators of sewage treatment plants, which consume a large amount of energy.
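
Standardized regression coefficients make parameters measured in different units comparable: $\beta^*_j = b_j \, s_{x_j} / s_y$, which equals the slope obtained by refitting the model on z-scored variables. The sketch below computes them and ranks parameters by absolute standardized effect. The three operating parameters, their distributions, and the effluent values are hypothetical placeholders; the abstract does not list the plant's actual variables.

```python
import numpy as np


def ols(X, y):
    """OLS with intercept; returns [b0, b1, ...]."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return beta


def standardized_coefficients(X, y):
    """beta*_j = b_j * sd(x_j) / sd(y), comparable across predictor units."""
    b = ols(X, y)[1:]
    return b * X.std(axis=0, ddof=1) / y.std(ddof=1)


# Hypothetical operating parameters (columns) and an effluent quality value (y).
rng = np.random.default_rng(3)
n = 100
X = np.column_stack([
    rng.normal(2.0, 0.5, n),    # hypothetical parameter 1
    rng.normal(20.0, 5.0, n),   # hypothetical parameter 2
    rng.normal(15.0, 3.0, n),   # hypothetical parameter 3
])
y = 10.0 - 3.0 * X[:, 0] + 0.1 * X[:, 1] - 0.2 * X[:, 2] + rng.normal(0, 0.5, n)

std_b = standardized_coefficients(X, y)
ranking = np.argsort(-np.abs(std_b))  # column indices, largest standardized effect first
```

The ranking says which parameter moves effluent quality most per standard deviation of change; combining it with energy cost per unit of adjustment, as the abstract proposes, would require the energy data that the authors note still has to be collected.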

Development of Prediction Model for Flexibly-reconfigurable Roll Forming based on Experimental Study (실험적 연구를 통한 비정형롤판재성형 예측 모델 개발)

  • Park, J.W.;Kil, M.G.;Yoon, J.S.;Kang, B.S.;Lee, K.
    • Transactions of Materials Processing / v.26 no.6 / pp.341-347 / 2017
  • Flexibly-reconfigurable roll forming (FRRF) is a novel sheet metal forming technology suited to producing multi-curvature surfaces by controlling the strain distribution along the longitudinal direction. Reconfigurable rollers can be arranged to act as a kind of punch-and-die set, and by using these rollers, a desired curved surface can be formed. Because the FRRF process forms a three-dimensional surface from a two-dimensional curve, the forming result is difficult to predict. In this study, regression analysis was proposed to construct a predictive model for the longitudinal curvature in the FRRF process. The input parameters affecting the longitudinal curvature were chosen as the maximum compression value, the curvature radius in the transverse direction, and the initial blank width. A three-factor, three-level full factorial experimental design was used, and 27 experiments were performed with an FRRF apparatus to obtain sample data for the regression model. Regression analysis was then carried out using the experimental results as sample data, with a quadratic nonlinear regression model. The coefficient of determination and the root mean square error were calculated to confirm the adequacy of the model, and the regression predictive model was verified through a goodness-of-fit test.
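
A three-factor, three-level full factorial design yields $3^3 = 27$ runs, and a full second-order model in three factors has 10 terms: an intercept, three linear, three squared, and three two-factor interaction terms. The sketch below builds that design in coded units $(-1, 0, +1)$ and fits the model by least squares, reporting $R^2$ and RMSE. Whether the study's quadratic model included every interaction term is not stated, so this term set is an assumption, and the curvature response is hypothetical.

```python
import itertools

import numpy as np


def full_factorial_3level(k=3):
    """All 3**k runs of a k-factor, three-level design in coded units (-1, 0, +1)."""
    return np.array(list(itertools.product([-1.0, 0.0, 1.0], repeat=k)))


def quadratic_design(D):
    """Columns: intercept, linear, squared, and two-factor interaction terms."""
    k = D.shape[1]
    cols = [np.ones(len(D))]
    cols += [D[:, j] for j in range(k)]
    cols += [D[:, j] ** 2 for j in range(k)]
    cols += [D[:, i] * D[:, j] for i, j in itertools.combinations(range(k), 2)]
    return np.column_stack(cols)


def fit_quadratic(D, y):
    X = quadratic_design(D)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ beta
    ss_res = np.sum((y - y_hat) ** 2)
    r2 = 1.0 - ss_res / np.sum((y - y.mean()) ** 2)
    rmse = np.sqrt(ss_res / len(y))
    return beta, r2, rmse


# Coded factors: compression value, transverse curvature radius, blank width.
D = full_factorial_3level()  # 27 runs
# Hypothetical curvature response; a real study would use the 27 measured results.
rng = np.random.default_rng(4)
y = (0.02 + 0.010 * D[:, 0] - 0.004 * D[:, 1] + 0.002 * D[:, 2]
     + 0.003 * D[:, 0] ** 2 + 0.001 * D[:, 0] * D[:, 1]
     + rng.normal(0, 0.0005, len(D)))
beta, r2, rmse = fit_quadratic(D, y)
```

Although the fitted surface is curved in the factors, the model is linear in its coefficients, which is why ordinary least squares suffices here.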

Correlation Analysis of Water Quality According to Land Use Types of Reservoir Watershed (유역 토지이용과 저수지 수질의 상관관계 분석)

  • Youn, Dong-Koun;Chung, Sang-Ok
    • Proceedings of the Korean Society of Agricultural Engineers Conference / 2005.10a / pp.614-619 / 2005
  • The objective of this study was to present regression equations for estimating the water quality indicators BOD, COD, T-N, and T-P simply and quickly. The equations were derived by analyzing the relationships between these water quality indicators and land use types in agricultural reservoir watersheds. Multiple linear regression analysis was applied to the study reservoirs, with land use types as the independent variables and BOD, COD, T-N, and T-P as the dependent variables. No regression equation had a multiple correlation coefficient (MCC) above 0.90; 6 equations had an MCC between 0.70 and 0.90, 20 between 0.40 and 0.70, and 4 between 0.20 and 0.40. The results of this study can serve as basic information for evaluating water quality simply and quickly when proposing and designing water quality policy measures.
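
The multiple correlation coefficient of a least-squares fit with an intercept is the correlation between the observed and fitted values, $R = \operatorname{corr}(y, \hat{y})$, and its square is the coefficient of determination. The sketch below computes $R$ for one water quality indicator regressed on land-use shares and bins it into the ranges reported above. The land-use categories, reservoir data, and the handling of band boundaries are assumptions for illustration.

```python
import numpy as np


def multiple_correlation(X, y):
    """Return (R, R^2) for an OLS fit with intercept; R = corr(y, y_hat)."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    y_hat = Xd @ beta
    r = float(np.corrcoef(y, y_hat)[0, 1])
    r2 = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
    return r, float(r2)


def mcc_band(r):
    """Bin R into the ranges used in the abstract (boundary handling is an assumption)."""
    if r > 0.90:
        return "above 0.90"
    if r >= 0.70:
        return "0.70-0.90"
    if r >= 0.40:
        return "0.40-0.70"
    if r >= 0.20:
        return "0.20-0.40"
    return "below 0.20"


# Hypothetical land-use shares (paddy, upland, forest, urban) for 12 reservoir watersheds.
rng = np.random.default_rng(5)
land_use = rng.dirichlet(np.ones(4), size=12)
bod = 1.0 + 4.0 * land_use[:, 3] + 1.5 * land_use[:, 0] + rng.normal(0, 0.3, 12)

# Shares sum to one, so drop one category (forest) to avoid collinearity with the intercept.
r, r2 = multiple_correlation(land_use[:, [0, 1, 3]], bod)
band = mcc_band(r)
```

Dropping one share column is needed because compositional predictors that sum to one are otherwise perfectly collinear with the intercept.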

Turning of Plastic Mold Steel(STAVAX) using Whisker Reinforced Ceramic (단침보강 세라믹 공구를 이용한 플라스틱 금형강(STAVAX)의 선삭가공)

  • Bae, Myung-Il;Lee, Yi-Seon
    • Journal of the Korean Society of Manufacturing Process Engineers / v.11 no.6 / pp.36-41 / 2012
  • In this study, plastic mold steel (STAVAX) was turned with a whisker-reinforced ceramic tool (WA1) at varying cutting speeds, depths of cut, and feed rates. To predict cutting force, the principal, radial, and feed force components were analyzed using multiple regression analysis. The results are as follows. Analysis of variance showed that the factors affecting cutting force were, in order of influence, feed rate, depth of cut, and cutting speed, with cutting speed having a very small effect. Multiple regression analysis yielded regression equations with coefficients of determination ($R^2$) of 0.9, 0.88, and 0.856 for the principal, radial, and feed forces, respectively, indicating that the regression equations are significant. Experimental verification confirmed that the principal, radial, and feed forces can be predicted by the regression equations.
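
In a balanced factorial experiment, analysis of variance ranks factors by their main-effect sums of squares, $SS_A = \sum_{\ell} n_\ell (\bar{y}_\ell - \bar{y})^2$, where $\bar{y}_\ell$ is the mean response at level $\ell$ of factor $A$. The sketch below computes these for cutting speed, depth of cut, and feed rate. The three-level design, the coded units, and the force response are hypothetical; the abstract does not give the actual levels or measurements.

```python
import itertools

import numpy as np


def main_effect_ss(levels, y):
    """Main-effect sum of squares: sum over levels of n_l * (mean_l - grand mean)^2."""
    grand = y.mean()
    return float(sum(np.sum(levels == lv) * (y[levels == lv].mean() - grand) ** 2
                     for lv in np.unique(levels)))


factors = ["cutting_speed", "depth_of_cut", "feed_rate"]
# Hypothetical balanced 3 x 3 x 3 design in coded units (-1, 0, +1).
D = np.array(list(itertools.product([-1.0, 0.0, 1.0], repeat=3)))
rng = np.random.default_rng(8)
# Hypothetical principal-force response, constructed so feed rate dominates.
force = 100 + 0.1 * D[:, 0] + 2.0 * D[:, 1] + 5.0 * D[:, 2] + rng.normal(0, 0.5, len(D))

ss = {name: main_effect_ss(D[:, j], force) for j, name in enumerate(factors)}
ranking = sorted(ss, key=ss.get, reverse=True)  # most influential factor first
```

A full ANOVA table would also divide each sum of squares by its degrees of freedom and compare it with the residual mean square through an F test.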

FORECASTING THE COST AND DURATION OF SCHOOL RECONSTRUCTION PROJECTS USING REGRESSION ANALYSIS

  • Wei Tong Chen;Ying-Hua Huang;Shen-Li Liao
    • International conference on construction engineering and project management / 2005.10a / pp.892-896 / 2005
  • This paper collected data on 132 school reconstruction projects in central Taiwan, the area that suffered the most serious damage from the Chi-Chi Earthquake. Regression analysis was used to build models predicting the cost and duration of the collected projects. Cubic regression models were found capable of predicting the cost and duration of projects contracted by the central agency under the most advantageous tendering (MAT) approach. Power regression models, on the other hand, were capable of predicting the cost and duration of projects contracted through the low bid tendering (LBT) approach. The performance of the regression prediction models was also found to differ depending on the organization that contracted the reconstruction projects.
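
The two model forms compared above can be fitted with standard tools: a cubic model $y = c_3 x^3 + c_2 x^2 + c_1 x + c_0$ by polynomial least squares, and a power model $y = a x^b$ by linear regression on $\log y = \log a + b \log x$. The predictor (here a hypothetical floor area), the log-space fitting of the power model, the MAPE criterion, and the data are all assumptions; the abstract does not state which project attributes or error measure the authors used.

```python
import numpy as np


def fit_cubic(x, y):
    """Cubic polynomial regression; coefficients returned highest power first."""
    return np.polyfit(x, y, deg=3)


def fit_power(x, y):
    """Power regression y = a * x**b via OLS on log y = log a + b log x (x, y > 0)."""
    b, log_a = np.polyfit(np.log(x), np.log(y), deg=1)
    return np.exp(log_a), b


def mape(y, y_hat):
    """Mean absolute percentage error, in percent."""
    return float(np.mean(np.abs((y - y_hat) / y)) * 100)


# Hypothetical projects: x = floor area (thousand m^2), y = cost; not data from the study.
rng = np.random.default_rng(6)
area = rng.uniform(0.5, 5.0, size=40)
cost = 3.0 * area ** 0.9 * np.exp(rng.normal(0, 0.1, size=40))

cubic = fit_cubic(area, cost)
a, b = fit_power(area, cost)
err_cubic = mape(cost, np.polyval(cubic, area))
err_power = mape(cost, a * area ** b)
```

Fitting the power model in log space minimises relative rather than absolute error, which is one reason the two model families can rank differently on different project groups.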

Fuzzy Regression Analysis for Core Competency of Construction Subcontractors (건설협력업체 핵심역량의 퍼지회귀분석)

  • Kim, Seong-Il;Hwang, Seung-Gook
    • Journal of the Korean Institute of Intelligent Systems / v.25 no.3 / pp.203-209 / 2015
  • In this paper, we conducted conventional regression and fuzzy regression analyses of the core competencies of construction subcontractors. The study was undertaken to examine, using these two types of regression, whether core competencies affect the rating of construction subcontractors. The conventional regression results showed that the core competencies related to management and firm contribution had some effect on the rating of construction subcontractors. In the fuzzy regression analysis, on the other hand, the rating of construction subcontractors was examined through the Min and Conjunction problems, which use 100% reliability, among the Min, Max, and Conjunction formulations. Accordingly, the conventional regression could determine the evaluation grade of a construction subcontractor as its dependent variable, while the fuzzy regression analysis provides an estimate of the evaluation grade that includes, or corresponds to, the fuzzy output data.
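
One widely used fuzzy regression formulation is Tanaka-style possibilistic linear regression, whose "Min" problem is a linear program: coefficients are symmetric triangular fuzzy numbers with centers $a_j$ and spreads $c_j \ge 0$, and the total spread $\sum_i c^\top |x_i|$ is minimised subject to every observation lying inside the $h$-level band of the fuzzy output. The sketch below assumes crisp outputs and symmetric coefficients; it is not necessarily the exact formulation used in the paper, and the competency scores and ratings are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def fuzzy_regression_min(X, y, h=0.0):
    """Tanaka-style possibilistic regression, 'Min' problem, for crisp y and 0 <= h < 1.

    Returns (centers, spreads) of symmetric triangular fuzzy coefficients,
    with the intercept first.
    """
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    A = np.abs(Xd)
    p = Xd.shape[1]
    cost = np.concatenate([np.zeros(p), A.sum(axis=0)])  # total spread sum_i c^T |x_i|
    # Inclusion constraints written as A_ub @ [a, c] <= b_ub:
    #   a^T x_i + (1 - h) c^T |x_i| >= y_i   and   a^T x_i - (1 - h) c^T |x_i| <= y_i
    A_ub = np.vstack([np.hstack([-Xd, -(1 - h) * A]),
                      np.hstack([Xd, -(1 - h) * A])])
    b_ub = np.concatenate([-y, y])
    bounds = [(None, None)] * p + [(0, None)] * p  # centers free, spreads non-negative
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p], res.x[p:]


# Hypothetical core-competency scores (management, firm contribution) and ratings.
rng = np.random.default_rng(7)
X = rng.uniform(1, 5, size=(20, 2))
rating = 0.5 + 0.6 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(0, 0.2, 20)
centers, spreads = fuzzy_regression_min(X, rating)
```

Unlike the conventional fit, the output for a new subcontractor is an interval-like fuzzy number (center plus spread) rather than a single predicted grade.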

Study on Accident Prediction Models in Urban Railway Casualty Accidents Using Logistic Regression Analysis Model (로지스틱회귀분석 모델을 활용한 도시철도 사상사고 사고예측모형 개발에 대한 연구)

  • Jin, Soo-Bong;Lee, Jong-Woo
    • Journal of the Korean Society for Railway / v.20 no.4 / pp.482-490 / 2017
  • This study is a statistical study of railway accident investigations that aims to predict and classify accident severity. Linear regression models have difficulty classifying accident severity, but a logistic regression model can overcome this weakness. The logistic regression model is applied to escalator (E/S) accidents at all stations on Lines 5 to 8 of the Seoul Metro, using data mining techniques such as logistic regression analysis. The predictor variables considered for E/S accidents in urban railway stations include passenger age, drinking, the overall situation, behavior, and handrail grip. In the overall accuracy analysis, the logistic regression model achieved an accuracy of 76.7%. These results confirm that, given its accuracy and level of significance, logistic regression analysis is a useful data mining technique for building an accident severity prediction model for urban railway casualty accidents.
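
Logistic regression models the probability of a severe accident as $p = 1/(1 + e^{-(\beta_0 + \beta^\top x)})$ and classifies by thresholding $p$; overall accuracy is the share of correctly classified cases. The sketch below fits the model by Newton's method (iteratively reweighted least squares). The 0.5 threshold, the function names, and the toy data are assumptions for illustration, not the study's model or data.

```python
import numpy as np


def fit_logistic(X, y, n_iter=100, tol=1e-10):
    """Logistic regression by Newton's method (IRLS); returns [b0, b1, ...]."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Xd @ beta)))
        grad = Xd.T @ (y - p)                         # gradient of the log-likelihood
        hess = Xd.T @ (Xd * (p * (1 - p))[:, None])   # negative Hessian (information)
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def predict_class(beta, X, threshold=0.5):
    """Classify as 1 (severe) when the fitted probability reaches the threshold."""
    Xd = np.column_stack([np.ones(len(X)), X])
    p = 1.0 / (1.0 + np.exp(-(Xd @ beta)))
    return (p >= threshold).astype(int)


def accuracy(y, y_pred):
    return float(np.mean(y == y_pred))


# Toy, non-separable data: one hypothetical numeric predictor; y = 1 marks a severe accident.
x = np.array([-3.0, -2.0, -1.0, -0.5, 0.5, 1.0, 2.0, 3.0])
y = np.array([0, 0, 1, 0, 1, 0, 1, 1])
beta = fit_logistic(x[:, None], y)
acc = accuracy(y, predict_class(beta, x[:, None]))
```

With real data, categorical predictors such as drinking or handrail grip would first be encoded as dummy variables, and accuracy would ideally be measured on cases not used for fitting.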