• Title/Summary/Keyword: Multiple-Linear-Regression

Estimating the Total Precipitation Amount with Simulated Precipitation for Ungauged Stations in Jeju Island (미계측 관측 강수 자료 생성을 통한 제주도 지역의 수문총량 추정)

  • Kim, Nam-Won;Um, Myoung-Jin;Chung, Il-Moon;Heo, Jun-Haeng
    • Journal of Korea Water Resources Association / v.45 no.9 / pp.875-885 / 2012
  • In this study, the total precipitation amount on Jeju Island was estimated by spatial precipitation analysis, using simulated precipitation for ungauged stations with missing precipitation data. The missing data were generated with a modified multiple linear regression, and the spatial precipitation analysis was conducted with PRISM (Parameter-elevation Regression on Independent Slope Model). The data generated by the modified multiple linear regression model follow a pattern similar to the original data, so the model shows good applicability for estimating missing data. The difference in annual average precipitation between Case 1 (original data) and Case 2 (modified data) is small, about 1.5%. However, the difference in annual average precipitation by elevation is large, up to 37.4%. As a result, the method of estimating missing data in this study should be useful for calculating the total precipitation amount in areas with low station density and in places with high spatial variation of precipitation.

DETECTION OF OUTLIERS IN WEIGHTED LEAST SQUARES REGRESSION

  • Shon, Bang-Yong;Kim, Guk-Boh
    • Journal of applied mathematics & informatics / v.4 no.2 / pp.501-512 / 1997
  • In the multiple linear regression model, assumptions (independence, normality, variance homogeneity, and so on) are presupposed for the error term. When case weights are given because of variance heterogeneity, the regression parameters can be estimated efficiently using the weighted least squares estimator. Unfortunately, this estimator is sensitive to outliers, like the ordinary least squares estimator. Thus, in this paper we propose some statistics for the detection of outliers in weighted least squares regression.
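The outlier sensitivity mentioned in this abstract can be illustrated with a minimal sketch. It assumes a single predictor rather than the multiple regression setting of the paper, and the function name `wls_line` and the data are hypothetical. It does not implement the detection statistics the authors propose.

```python
def wls_line(x, y, w):
    """Weighted least squares fit of y = b0 + b1 * x with case weights w."""
    sw = sum(w)
    # Weighted means of the predictor and the response
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Weighted cross-product and sum of squares around the weighted means
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1


# Hypothetical data: the last response value is an artificial outlier
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 30.0]

b0_all, b1_all = wls_line(x, y, [1.0] * 5)              # outlier at full weight
b0_clean, b1_clean = wls_line(x[:-1], y[:-1], [1.0] * 4)  # outlier removed
print(b1_all, b1_clean)
```

With equal weights the fit reduces to ordinary least squares, so the same single point distorts both estimators.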

Price Determinant Factors of Artworks and Prediction Model Based on Machine Learning (작품 가격 추정을 위한 기계 학습 기법의 응용 및 가격 결정 요인 분석)

  • Jang, Dongryul;Park, Minjae
    • Journal of Korean Society for Quality Management / v.47 no.4 / pp.687-700 / 2019
  • Purpose: The purpose of this study is to investigate the interaction effects between price determinants of artworks. We expand the methodology used in the art market by applying machine learning techniques to estimate the price of artworks, and we compare linear regression and machine learning in terms of prediction accuracy. Methods: Moderated regression analysis was performed to verify the interaction effects of artistic characteristics on price. The moderating effects were studied by checking the significance level of the interaction terms in the derived regression equation. To derive the price estimation model, we use multiple linear regression analysis, which is a parametric statistical technique, and k-nearest neighbor (kNN) regression, which is a nonparametric statistical technique among machine learning methods. Results: In most cases, the influence of the price determinants of art differs according to the auction type and the artist's reputation. However, the auction type did not moderate the influence of the genre of the work on the price. In the analysis, kNN regression was superior to linear regression in prediction accuracy. Conclusion: This study provides a theoretical basis for the complexity that exists between the price determinants of artworks. In addition, nonparametric models and machine learning techniques, as well as existing parametric models, are implemented to estimate the price of artworks.
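As a rough illustration of the nonparametric technique named in this abstract, the following sketch predicts a value by averaging the targets of the k nearest training points. The function name `knn_predict`, the feature layout, and the data are hypothetical. The abstract does not state the distance metric or the value of k used in the study, so Euclidean distance and k = 3 are assumptions.

```python
import math


def knn_predict(train_X, train_y, query, k=3):
    """Predict a target as the mean target of the k nearest training rows."""
    # Pair each training row's Euclidean distance to the query with its target
    scored = sorted(
        (math.dist(row, query), target) for row, target in zip(train_X, train_y)
    )
    nearest = scored[:k]
    return sum(target for _, target in nearest) / len(nearest)


# Hypothetical two-feature training set and targets
train_X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
train_y = [1.0, 2.0, 3.0, 10.0]

print(knn_predict(train_X, train_y, (0.1, 0.1), k=3))
```

Unlike multiple linear regression, this predictor fits no coefficients, which is why the abstract classifies it as nonparametric.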

MULTIPLE DELETION MEASURES OF TEST STATISTICS IN MULTIVARIATE REGRESSION

  • Jung, Kang-Mo
    • Journal of applied mathematics & informatics / v.26 no.3_4 / pp.679-688 / 2008
  • In multivariate regression analysis there are many influence measures for the regression estimates. However, there seem to be few influence diagnostics for test statistics in hypothesis testing. The case-deletion approach is fundamental for investigating the influence of observations on estimates or statistics. Tang and Fung (1997) derived single case-deletion measures for Wilks' ratio, the Lawley-Hotelling trace, and Pillai's trace for testing a general linear hypothesis on the regression coefficients in multivariate regression. In this paper we derive extended forms of those measures to deal with joint influence among observations. A numerical example is given to illustrate the effect of joint influence on the test statistics.
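The case-deletion idea can be sketched in a much simpler setting than the one in the paper. The example below measures the change in an ordinary least squares slope when single cases or pairs of cases are deleted. It is an assumption-laden stand-in: it uses one predictor and one response, not the multivariate test statistics studied by the author, and the names `ols_slope` and `deletion_influence` and the data are hypothetical.

```python
from itertools import combinations


def ols_slope(x, y):
    """Ordinary least squares slope for a single predictor."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx


def deletion_influence(x, y, size):
    """Change in slope when each subset of `size` cases is deleted."""
    full = ols_slope(x, y)
    changes = {}
    for subset in combinations(range(len(x)), size):
        keep = [i for i in range(len(x)) if i not in subset]
        reduced = ols_slope([x[i] for i in keep], [y[i] for i in keep])
        changes[subset] = reduced - full
    return changes


# Hypothetical data: the last two cases lie close together, away from the trend
x = [1.0, 2.0, 3.0, 4.0, 5.0, 9.0, 9.5]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 2.0, 2.2]

single = deletion_influence(x, y, 1)  # single case deletion
pairs = deletion_influence(x, y, 2)   # joint deletion of two cases
worst_single = max(single, key=lambda s: abs(single[s]))
worst_pair = max(pairs, key=lambda s: abs(pairs[s]))
print(worst_single, worst_pair)
```

Comparing the single and pairwise results is one way to look for joint influence, where two similar cases hide each other when deleted one at a time.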

Correlation Analysis of Water Quality According to Land Use Types of Reservoir Watershed (유역 토지이용과 저수지 수질의 상관관계 분석)

  • Youn, Dong-Koun;Chung, Sang-Ok
    • Proceedings of the Korean Society of Agricultural Engineers Conference / 2005.10a / pp.614-619 / 2005
  • The objective of this study was to present regression equations for obtaining values of the water quality items BOD, COD, T-N, and T-P simply and quickly. The regression equations were obtained by analyzing the relationships between water quality items and land use types in agricultural reservoir watersheds. To derive the regression equations, multiple linear regression analysis was applied to the study reservoirs. In this regression analysis, the independent variables were land use types and the dependent variables were the BOD, COD, T-N, and T-P values. The results showed that no regression equation had a multiple correlation coefficient (MCC) above 0.90; 6 had an MCC from 0.70 to 0.90, 20 had an MCC from 0.40 to 0.70, and 4 had an MCC from 0.20 to 0.40. The results of this study can be used as basic information for simple and quick evaluation of water quality at the proposal and design stages of water quality policy.

Procedures for Detecting Multiple Outliers in Linear Regression Using R

  • Kwon, Soon-Sun;Lee, Gwi-Hyun;Park, Sung-Hyun
    • Proceedings of the Korean Statistical Society Conference / 2005.11a / pp.13-17 / 2005
  • In recent years, many people have used R as a statistics system. R is frequently updated by many R project teams. We are interested in methods of multiple outlier detection and note that R does not supply such methods. In this talk, we review procedures for detecting multiple outliers and provide more efficient procedures that combine direct and indirect methods using R.

Ensemble variable selection using genetic algorithm

  • Seogyoung, Lee;Martin Seunghwan, Yang;Jongkyeong, Kang;Seung Jun, Shin
    • Communications for Statistical Applications and Methods / v.29 no.6 / pp.629-640 / 2022
  • Variable selection is one of the most crucial tasks in supervised learning, such as regression and classification. The best subset selection is straightforward and optimal but not practically applicable unless the number of predictors is small. In this article, we propose directly solving the best subset selection via the genetic algorithm (GA), a popular stochastic optimization algorithm based on the principle of Darwinian evolution. To further improve the variable selection performance, we propose to run multiple GA to solve the best subset selection and then synthesize the results, which we call ensemble GA (EGA). The EGA significantly improves variable selection performance. In addition, the proposed method is essentially the best subset selection and hence applicable to a variety of models with different selection criteria. We compare the proposed EGA to existing variable selection methods under various models, including linear regression, Poisson regression, and Cox regression for survival data. Both simulation and real data analysis demonstrate the promising performance of the proposed method.

A Study on the Estimating Solar Radiation in Korea Using Cloud Cover and Hours of Bright Sunshine (국내 운량과 일조시간에 의한 태양광에너지 예측에 관한 연구)

  • Jo, Dok-Ki;Yun, Chang-Yeol;Kim, Kwang-Deuk;Kang, Young-Heack
    • Journal of the Korean Solar Energy Society / v.32 no.2 / pp.28-34 / 2012
  • Predicting the daily global radiation on a horizontal surface requires estimating regression coefficients, and many different equations have been proposed to evaluate them for particular areas. In this work, a new correlation was developed to predict the solar radiation for 16 different areas over Korea by estimating the regression coefficients while taking into account cloud cover and hours of bright sunshine. In particular, the proposed multiple linear regression model shows reliable results for estimating the global radiation on a horizontal surface, with a monthly average deviation of -0.26 to +0.53% and an annual average deviation at each station of -1.61 to +1.7% from measured values.

Interpretation of Relationship Between Sesame Yield and Its Components under Early Sowing Cropping Condition

  • Shim Kang-Bo;Kang Churl-Whan;Seong Jae-Duck;Hwang Chung-Dong;Suh Duck-Yong
    • KOREAN JOURNAL OF CROP SCIENCE / v.51 no.4 / pp.269-273 / 2006
  • Multiple linear regression analysis was conducted to interpret the relationship between sesame grain yield and its components under early sowing cropping conditions. The t test showed that stem length, number of capsules per plant, 1000-seed weight, and seed weight per plant contributed significantly to sesame grain yield; these variables were therefore assumed to be the components that most influence grain yield. In the stepwise regression analysis, the prediction equation for sesame grain yield per square meter (Y) was Y = -7.900 + 0.150X1 + 0.461X5 + 15.553X6 + 8.543X7. Meanwhile, the F value showed that stem length, number of capsules per plant, and seed weight per plant contributed significantly to sesame grain yield, while 1000-seed weight did not. Based on these results, it is reasonable to assume that high yield potential of sesame under early sowing cropping conditions would be obtained by selecting breeding lines with long stem length, a large number of capsules per plant, and high seed weight per plant. This differs from the result under late sowing cropping conditions, in which days to flowering and maturity were assumed to be the factors that most affect sesame grain yield.
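The prediction equation quoted in this abstract can be evaluated directly, as in the sketch below. The coefficients come from the abstract. The function name `predict_yield` and the input values are hypothetical, and the abstract does not say which yield component each of X1, X5, X6, and X7 denotes or in what units, so the arguments are left with their original labels.

```python
def predict_yield(x1, x5, x6, x7):
    """Evaluate Y = -7.900 + 0.150*X1 + 0.461*X5 + 15.553*X6 + 8.543*X7."""
    return -7.900 + 0.150 * x1 + 0.461 * x5 + 15.553 * x6 + 8.543 * x7


# Hypothetical inputs, chosen only to exercise the function
example = predict_yield(x1=100.0, x5=50.0, x6=3.0, x7=5.0)
print(example)
```

Each coefficient is the change in predicted yield per unit change in its variable with the others held fixed, so X6 and X7 carry the largest per-unit effects in this equation.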

Application of Multiple Linear Regression Analysis and Tree-Based Machine Learning Techniques for Cutter Life Index(CLI) Prediction (커터수명지수 예측을 위한 다중선형회귀분석과 트리 기반 머신러닝 기법 적용)

  • Ju-Pyo Hong;Tae Young Ko
    • Tunnel and Underground Space / v.33 no.6 / pp.594-609 / 2023
  • The TBM (Tunnel Boring Machine) method is gaining popularity in urban and underwater tunneling projects due to its ability to ensure excavation face stability and minimize environmental impact. Among the prominent models for predicting disc cutter life, the NTNU model uses the Cutter Life Index (CLI) as a key parameter, but the complexity of the testing procedures and the rarity of the equipment make measurement challenging. In this study, CLI was predicted from rock properties using multiple linear regression analysis and tree-based machine learning techniques. Through a literature review, a database including rock uniaxial compressive strength, Brazilian tensile strength, equivalent quartz content, and Cerchar abrasivity index was built, and derived variables were added. The multiple linear regression analysis selected input variables based on statistical significance and multicollinearity, while the machine learning prediction model chose variables based on their importance. The data were divided into 80% for training and 20% for testing, a comparative analysis of predictive performance was conducted, and XGBoost was identified as the optimal model. The validity of the multiple linear regression and XGBoost models derived in this study was confirmed by comparing their predictive performance with prior research.
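The 80%/20% division described in this abstract can be sketched as a shuffled split. This is a minimal illustration under assumptions: the abstract does not say whether the split was random or which seed or tool was used, and the function name `train_test_split` and the placeholder rows are hypothetical.

```python
import random


def train_test_split(rows, train_fraction=0.8, seed=0):
    """Shuffle row indices with a fixed seed and split them into two lists."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # local generator, reproducible
    n_train = round(len(rows) * train_fraction)
    train = [rows[i] for i in indices[:n_train]]
    test = [rows[i] for i in indices[n_train:]]
    return train, test


# Placeholder records standing in for rows of a rock property database
rows = list(range(100))
train, test = train_test_split(rows)
print(len(train), len(test))
```

Keeping the test rows out of model fitting and variable selection is what makes the later performance comparison between models meaningful.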