• Title/Summary/Keyword: backward elimination

Geometrical description based on forward selection & backward elimination methods for regression models (다중회귀모형에서 전진선택과 후진제거의 기하학적 표현)

  • Hong, Chong-Sun;Kim, Moung-Jin
    • Journal of the Korean Data and Information Science Society / v.21 no.5 / pp.901-908 / 2010
  • A geometrical description method is proposed to represent the process of forward selection and backward elimination, two of the many variable selection methods for multiple regression models. This graphical method shows the process of forward selection and backward elimination in the first and second quadrants, respectively, of a half circle with unit radius. At each step, the SSR is represented by the norm of a vector, and the extra SSR or partial coefficient of determination is represented by the angle between two vectors. Some lines are dotted when the partial F test results are statistically significant, so that the statistical analysis can be explored. The final regression models based on forward selection and backward elimination can be obtained from this geometrical description, and the goodness-of-fit of the model can also be explored.
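
For reference, the quantities named in this abstract have standard textbook definitions, sketched below. The notation is an assumption and is not taken from the paper: the full model has p predictors plus an intercept, n is the number of observations, and x_j is the variable being added or removed. The paper's mapping of these quantities onto vector norms and angles is not reproduced here.

```latex
\begin{align*}
\mathrm{SSR}(x_j \mid \text{others}) &= \mathrm{SSR}(\text{full}) - \mathrm{SSR}(\text{reduced})
  = \mathrm{SSE}(\text{reduced}) - \mathrm{SSE}(\text{full}), \\
F_j &= \frac{\mathrm{SSR}(x_j \mid \text{others})}{\mathrm{SSE}(\text{full}) / (n - p - 1)}, \\
R^2_{j \mid \text{others}} &= \frac{\mathrm{SSR}(x_j \mid \text{others})}{\mathrm{SSE}(\text{reduced})}.
\end{align*}
```

Under the usual assumptions, F_j is compared against an F distribution with 1 and n - p - 1 degrees of freedom.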

Evaluating Variable Selection Techniques for Multivariate Linear Regression (다중선형회귀모형에서의 변수선택기법 평가)

  • Ryu, Nahyeon;Kim, Hyungseok;Kang, Pilsung
    • Journal of Korean Institute of Industrial Engineers / v.42 no.5 / pp.314-326 / 2016
  • The purpose of variable selection techniques is to select a subset of relevant variables for a particular learning algorithm in order to improve both the accuracy and the efficiency of the prediction model. We conduct an empirical analysis to evaluate and compare seven well-known variable selection techniques for the multiple linear regression model, which is one of the most commonly used regression models in practice. The variable selection techniques we apply are forward selection, backward elimination, stepwise selection, genetic algorithm (GA), ridge regression, lasso (Least Absolute Shrinkage and Selection Operator), and elastic net. Based on experiments with 49 regression data sets, it is found that GA resulted in the lowest error rates, while lasso most significantly reduced the number of variables. In terms of computational efficiency, forward selection, backward elimination, and lasso require less time than the other techniques.
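
As an illustration of one of the techniques compared above, the following is a minimal sketch of backward elimination for ordinary least squares, written in plain Python. It is not the authors' implementation. The function names, the toy data, and the fixed `f_to_remove=4.0` threshold are assumptions made for this example; in practice the partial F statistic would usually be compared against an F critical value or a p-value.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def sse(X, y, cols):
    """Residual sum of squares of an OLS fit (with intercept) on the chosen columns."""
    rows = [[1.0] + [row[c] for c in cols] for row in X]
    p = len(cols) + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


def backward_elimination(X, y, f_to_remove=4.0):
    """Start from all variables; repeatedly drop the one with the smallest partial F.

    Stops when every remaining variable has a partial F of at least f_to_remove
    (an assumed, illustrative threshold).
    """
    n = len(y)
    kept = list(range(len(X[0])))
    while kept:
        full = sse(X, y, kept)
        mse = full / (n - len(kept) - 1)
        partial_f = {
            c: (sse(X, y, [k for k in kept if k != c]) - full) / mse
            for c in kept
        }
        weakest = min(partial_f, key=partial_f.get)
        if partial_f[weakest] >= f_to_remove:
            break
        kept.remove(weakest)
    return kept


# Toy data (hypothetical): y depends on columns 0 and 1 only; column 2 is unrelated.
X = [[i, (i * i) % 7, (3 * i) % 5] for i in range(20)]
noise = [((7 * i) % 11 - 5) * 0.1 for i in range(20)]
y = [2 * row[0] - 3 * row[1] + e for row, e in zip(X, noise)]
kept = backward_elimination(X, y)
```

Forward selection follows the same pattern in reverse: start from the intercept-only model and add the variable with the largest partial F until none exceeds an entry threshold.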

Thermal Error Modeling of a Horizontal Machining Center Using the Fuzzy Logic Strategy (퍼지논리를 이용한 수평 머시닝 센터의 열변형 오차 모델링)

  • Lee, Jae-Ha;Lee, Jin-Hyeon;Yang, Seung-Han
    • Transactions of the Korean Society of Mechanical Engineers A / v.24 no.10 s.181 / pp.2589-2596 / 2000
  • As current manufacturing processes require high spindle speed and precise machining, increasing accuracy by reducing volumetric errors of the machine itself, particularly thermal errors, is very important. Thermal errors can be estimated by many empirical models, for example, an FEM model, a neural network model, a linear regression model, or an engineering judgment model. This paper discusses how to model thermal errors efficiently through backward elimination and a fuzzy logic strategy. The thermal error model based on the fuzzy logic strategy overcomes the accuracy limitations of the linear regression model and the engineering judgment model. The results show that the fuzzy model performs better than the linear regression model, even though it uses fewer thermal variables. The fuzzy model does not require a complex procedure such as multiple regression, nor knowledge of the characteristics of the plant, and the parameters of the model can be calculated mathematically. The fuzzy model can also be applied to any machine, and it delivers greater accuracy and robustness.

Thermal Error Modeling of a Horizontal Machining Center Using the Fuzzy Logic Strategy (퍼지논리를 이용한 수평 머시닝 센터의 열변형 오차 모델링)

  • Lee, Jae-Ha;Yang, Seung-Han
    • Proceedings of the Korean Society of Machine Tool Engineers Conference / 1999.05a / pp.75-80 / 1999
  • As current manufacturing processes require high spindle speed and precise machining, increasing accuracy by reducing volumetric errors of the machine itself, particularly thermal errors, is very important. Thermal errors can be estimated by many empirical models, for example, an FEM model, a neural network model, a linear regression model, or an engineering judgment model. This paper discusses how to model thermal errors efficiently through backward elimination and a fuzzy logic strategy. The thermal error model based on the fuzzy logic strategy overcomes the accuracy limitations of the linear regression model and the engineering judgment model. This model is also compared with the engineering judgment model: it does not require a complex procedure such as the multiple regression analysis used in the engineering judgment model. A fuzzy model does not need knowledge of the characteristics of the plant, and the parameters of the model can be calculated mathematically. Like a regression model, this model can be applied to any machine, but it delivers greater accuracy and robustness.

Estimation of Biological Action of Dioxins by Some Geometric Descriptors (기하학적 변수에 의한 다이옥신의 독성 예측)

  • Hwang, Inchul
    • Environmental Analysis Health and Toxicology / v.14 no.3 / pp.103-111 / 1999
  • To effectively predict the lipophilicity, the aryl hydrocarbon receptor (AhR) affinity, and the toxic equivalency factor (TEF) of dioxins from geometrical descriptors, multiple linear regression with forward selection and backward elimination was employed with statistical validity. The lipophilicity, the Ah receptor binding affinity, and the toxic equivalency factor of dioxins could be predicted using some geometrical descriptors.

Analysis of Stress level of Korean Household Members due to Household Debt (한국국민의 가계 금융부채에 대한 체감도 분석)

  • Oh, Man-Suk;Hyun, Seung-Me
    • The Korean Journal of Applied Statistics / v.22 no.2 / pp.297-307 / 2009
  • Korean household debt is one of the main sources of the current financial crisis. This paper studies the impact of household members' attributes, such as type of housing (owned or rented), education, age, average monthly income of the head of household, and area of residence, on the stress level of household members due to household debt. We analyze a real data set collected by KB Kookmin Bank in 2004. We treat low and high stress levels as a binary response variable and use a logistic regression model with the attributes of household members as explanatory variables. A simple but well-fitting model is selected by the backward elimination method based on the likelihood statistic for the goodness-of-fit test, and the impact of the attributes on the stress level is studied from the parameter estimates of the selected model. We also perform a similar analysis on a binary response variable that distinguishes households with no debt from the rest. According to the analysis, the stress level tends to be low for households with owner-occupied houses, high average monthly income, low education level, and young members.

Wine Quality Prediction by Using Backward Elimination Based on XGBoosting Algorithm

  • Umer Zukaib;Mir Hassan;Tariq Khan;Shoaib Ali
    • International Journal of Computer Science & Network Security / v.24 no.2 / pp.31-42 / 2024
  • Many industries rely on quality certification to promote their products or brands. Obtaining quality certification, especially from human experts, is a difficult task. Machine learning, however, plays a vital role in every aspect of life, and it has many applications in assigning and assessing quality certifications for different products on a macro level. Like other products, wine comes in different brands, and machine learning plays an important role in ensuring wine quality. In this research, we use two datasets that are publicly available in the "UC Irvine machine learning repository" to predict wine quality. The datasets chosen for our experimental study comprise red wine and white wine data, with 1599 records for red wine and 4898 records for white wine. The research study was twofold. First, we used a technique called backward elimination to determine the dependency of the dependent variable on the independent variables and to predict the dependent variable; the technique is useful for identifying which independent variable has the maximum probability of improving the wine quality. Second, we used a robust machine learning algorithm known as "XGBoost" for efficient prediction of wine quality. We evaluate our model on the basis of error measures: root mean square error, mean absolute error, R2 error, and mean square error. We compared the results generated by "XGBoost" with those of other state-of-the-art machine learning techniques; the experimental results show that "XGBoost" outperforms them.
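
The four error measures listed in this abstract can be computed as in the short sketch below. It uses plain Python, and the function name and the sample scores are hypothetical; "R2" is taken here to mean the usual coefficient of determination, which is an assumption about the paper's usage.

```python
import math


def regression_errors(y_true, y_pred):
    """Return MSE, RMSE, MAE and R^2 for paired true and predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)            # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    mse = ss_res / n
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum(abs(r) for r in residuals) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }


# Hypothetical quality scores, for illustration only
scores = regression_errors([3, 5, 7, 9], [2, 5, 8, 9])
```

Lower MSE, RMSE, and MAE indicate a better fit, whereas a higher R2 does, so the measures should not be ranked in the same direction when comparing models.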

Apartment Price Prediction Using Deep Learning and Machine Learning (딥러닝과 머신러닝을 이용한 아파트 실거래가 예측)

  • Hakhyun Kim;Hwankyu Yoo;Hayoung Oh
    • KIPS Transactions on Software and Data Engineering / v.12 no.2 / pp.59-76 / 2023
  • Since the COVID-19 era, the rise in apartment prices has been unconventional. In this uncertain real estate market, price prediction research is very important. In this paper, a model is created to predict the future actual transaction prices of apartments, after building a vast data set of 870,000 from 2015 to 2020 through data collection and crawling on various real estate sites and collecting as many variables as possible. This study first solved the multicollinearity problem by removing and combining variables. After that, a total of five variable selection algorithms were used to extract meaningful independent variables: Forward Selection, Backward Elimination, Stepwise Selection, L1 Regularization, and Principal Component Analysis (PCA). In addition, a total of four machine learning and deep learning algorithms, namely deep neural network (DNN), XGBoost, CatBoost, and Linear Regression, were trained after hyperparameter optimization, and the predictive power of the models was compared. In an additional experiment, the number of nodes and layers of the DNN was varied to find the most appropriate configuration. Finally, the model with the best performance was used to predict the actual transaction prices of apartments in 2021, and the predictions were compared with the actual 2021 data. Through this, we are confident that machine learning and deep learning will help investors make the right decisions when purchasing homes in various economic situations.

Application of Random Forest Algorithm for the Decision Support System of Medical Diagnosis with the Selection of Significant Clinical Test (의료진단 및 중요 검사 항목 결정 지원 시스템을 위한 랜덤 포레스트 알고리즘 적용)

  • Yun, Tae-Gyun;Yi, Gwan-Su
    • The Transactions of The Korean Institute of Electrical Engineers / v.57 no.6 / pp.1058-1062 / 2008
  • In a clinical decision support system (CDSS), unlike rule-based expert methods, an appropriate data-driven machine learning method can easily provide information about each individual feature (clinical test) for disease classification. However, currently developed methods focus on improving the classification accuracy for diagnosis. By analyzing feature importance in classification, one may infer novel clinical test sets that strongly differentiate specific diseases or disease states. Against this background, we introduce a novel CDSS that integrates a classifier and a feature selection module. The random forest algorithm is applied both as the classifier and as the feature importance measure. The system selects the significant clinical tests that discriminate the diseases by examining the classification error during backward elimination of the features. The superior performance of the random forest algorithm in clinical classification was assessed against artificial neural network and decision tree algorithms using the breast cancer, diabetes, and heart disease data in the UCI Machine Learning Repository. Tests with the same data sets show that the proposed system can successfully select the significant clinical test set for each disease.
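
The selection loop described in this abstract (drop features one at a time while tracking the classification error) can be sketched generically as follows. This is not the authors' system: `importance_fn` and `error_fn` are hypothetical callables standing in for a trained classifier's feature importance and its measured error, and the toy functions and feature names exist only to make the example runnable.

```python
def backward_feature_elimination(features, importance_fn, error_fn):
    """Remove the least important feature at each step; keep the lowest-error subset.

    importance_fn(subset) -> dict mapping each feature in subset to a score
    error_fn(subset)      -> classification error of a model trained on subset
    Both callables are placeholders (assumptions) for a real classifier.
    """
    current = list(features)
    best_subset, best_error = list(current), error_fn(current)
    while len(current) > 1:
        scores = importance_fn(current)
        current.remove(min(current, key=lambda f: scores[f]))
        err = error_fn(current)
        if err <= best_error:  # ties favour the smaller subset
            best_subset, best_error = list(current), err
    return best_subset, best_error


# Toy stand-ins: two informative clinical tests and two uninformative ones.
TRUE_WEIGHT = {"test_a": 0.5, "test_b": 0.3, "test_c": 0.0, "test_d": 0.0}


def toy_importance(subset):
    return {f: TRUE_WEIGHT[f] for f in subset}


def toy_error(subset):
    # Error falls as informative tests are included and rises slightly per extra feature.
    return 1.0 - sum(TRUE_WEIGHT[f] for f in subset) + 0.01 * len(subset)


selected, error = backward_feature_elimination(
    list(TRUE_WEIGHT), toy_importance, toy_error
)
```

With a real classifier, `error_fn` would typically be a cross-validated or out-of-bag error, recomputed after each removal.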

A study on the Reason of China's Anti-Dumping inspection against South Korea (중국(中國)의 대한(對韓) 반(反)덤핑조사(調査) 요인(要因)에 관한 실증(實證) 연구(硏究) - 철강(鐵鋼).석유화학(石油化學).제지(製紙) 산업(産業) 중심(中心) -)

  • Sim, Yoon-Soo
    • THE INTERNATIONAL COMMERCE & LAW REVIEW / v.30 / pp.145-174 / 2006
  • Anti-dumping has become the trade policy of choice for developing countries as well as advanced countries; hence it is a pressing issue for export-oriented countries, including Korea. After synthesizing the analysis of trade and industrial policy between Korea and China and the analysis of preceding research, the main causes of anti-dumping were selected as follows: the unemployment rate, real GDP growth rate, and consumer price increase as internal factors, and the trade balance, regional coefficient, and trade specification index as external factors. The study then examined how the above seven variable factors affect the number of anti-dumping measures. For the empirical analysis, the data were reorganized on a quarterly basis. Using correlation analysis, backward elimination in a multiple regression model, and time-series analysis, the unemployment rate was found to be the most important factor in anti-dumping measures, along with the rate of increase of the trade balance. Because a variable such as the unemployment rate is beyond our control, it is appropriate to establish and operate a preemptive monitoring system based on the rate of increase in export volume and the rate of increase in the trade surplus.
