• Title/Abstract/Keywords: backward elimination

36 search results (processing time: 0.018 s)

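Geometrical description based on forward selection & backward elimination methods for regression models

  • 홍종선;김명진
    • Journal of the Korean Data and Information Science Society / Vol. 21, No. 5 / pp. 901-908 / 2010
  • We propose a graphical method that gives a geometric representation of forward selection and backward elimination, two variable selection methods for multiple regression models. The forward selection process is drawn in the first quadrant of a semicircle of radius 1, and the backward elimination process in the second quadrant. At each step the regression sum of squares is represented as a vector, and the extra sum of squares or the partial coefficient of determination is represented as the angle between vectors. When the endpoints of the vectors are joined, a statistically significant step is drawn with a dotted line so that the results of the partial hypothesis tests can be read directly from the plot. With this method, the final models obtained by forward selection and backward elimination can be compared, and the overall goodness of fit of the model can be assessed.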

Evaluating Variable Selection Techniques for Multivariate Linear Regression

  • 류나현;김형석;강필성
    • Journal of the Korean Institute of Industrial Engineers (대한산업공학회지) / Vol. 42, No. 5 / pp. 314-326 / 2016
  • The purpose of variable selection techniques is to select a subset of relevant variables for a particular learning algorithm in order to improve the accuracy and efficiency of the prediction model. We conduct an empirical analysis to evaluate and compare seven well-known variable selection techniques for the multiple linear regression model, one of the most commonly used regression models in practice. The techniques we apply are forward selection, backward elimination, stepwise selection, genetic algorithm (GA), ridge regression, lasso (Least Absolute Shrinkage and Selection Operator), and elastic net. In experiments on 49 regression data sets, GA yielded the lowest error rates, while lasso reduced the number of variables most significantly. In terms of computational efficiency, forward selection, backward elimination, and lasso require less time than the other techniques.
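
As a rough illustration of the backward elimination technique named in this abstract, the following Python sketch starts from the full linear model and repeatedly drops the predictor whose removal lowers AIC the most. The abstract does not state which elimination criterion the authors used, so the AIC criterion, the function names (`solve`, `rss`, `aic`, `backward_elimination`), and the small pure-Python least-squares solver are all assumptions made for this example.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def rss(X, y, cols):
    """Residual sum of squares of an OLS fit (with intercept) on columns `cols`."""
    Z = [[1.0] + [row[c] for c in cols] for row in X]
    p = len(Z[0])
    XtX = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    Xty = [sum(z[i] * t for z, t in zip(Z, y)) for i in range(p)]
    beta = solve(XtX, Xty)  # normal equations
    return sum((t - sum(b * v for b, v in zip(beta, z))) ** 2 for z, t in zip(Z, y))

def aic(X, y, cols):
    """AIC up to an additive constant; assumes the fit is not perfect (RSS > 0)."""
    n = len(y)
    return n * math.log(rss(X, y, cols) / n) + 2 * (len(cols) + 1)

def backward_elimination(X, y):
    """Return the column indices kept after AIC-based backward elimination."""
    cols = list(range(len(X[0])))
    best = aic(X, y, cols)
    while cols:
        # Score every model obtained by dropping exactly one predictor.
        score, drop = min((aic(X, y, [c for c in cols if c != d]), d) for d in cols)
        if score >= best:  # no single removal improves AIC: stop
            break
        best, cols = score, [c for c in cols if c != drop]
    return cols
```

Other stopping rules, such as a partial F-test with a fixed significance level, fit the same loop by swapping out `aic`.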

Thermal Error Modeling of a Horizontal Machining Center Using the Fuzzy Logic Strategy

  • 이재하;이진현;양승한
    • Transactions of the Korean Society of Mechanical Engineers A (대한기계학회논문집A) / Vol. 24, No. 10 / pp. 2589-2596 / 2000
  • As current manufacturing processes require high spindle speeds and precise machining, it is very important to increase accuracy by reducing the volumetric errors of the machine itself, particularly thermal errors. Thermal errors can be estimated by many empirical models, for example, an FEM model, a neural network model, a linear regression model, or an engineering judgment model. This paper discusses how to model thermal errors efficiently using backward elimination and a fuzzy logic strategy. The thermal error model based on the fuzzy logic strategy overcomes the accuracy limitations of the linear regression model and the engineering judgment model. The fuzzy model performs better than the linear regression model even though it uses fewer thermal variables. The fuzzy model does not require a complex procedure such as multiple regression or knowledge of the characteristics of the plant, and its parameters can be calculated mathematically. The fuzzy model can also be applied to any machine while delivering greater accuracy and robustness.

Thermal Error Modeling of a Horizontal Machining Center Using the Fuzzy Logic Strategy

  • 이재하;양승한
    • Proceedings of the Korean Society of Machine Tool Engineers Conference (한국공작기계학회:학술대회논문집) / Proceedings of the 1999 Spring Conference of the Korean Society of Machine Tool Engineers / pp. 75-80 / 1999
  • As current manufacturing processes require high spindle speeds and precise machining, it is very important to increase accuracy by reducing the volumetric errors of the machine itself, particularly thermal errors. Thermal errors can be estimated by many empirical models, for example, an FEM model, a neural network model, a linear regression model, or an engineering judgment model. This paper discusses how to model thermal errors efficiently using backward elimination and a fuzzy logic strategy. The thermal error model based on the fuzzy logic strategy overcomes the accuracy limitations of the linear regression model and the engineering judgment model, and it is compared with the engineering judgment model. It does not require a complex procedure such as the multiple regression analysis used in the engineering judgment model. A fuzzy model does not need knowledge of the characteristics of the plant, and its parameters can be calculated mathematically. Like a regression model, this model can be applied to any machine, but it delivers greater accuracy and robustness.

Estimation of Biological Action of Dioxins by Some Geometric Descriptors

  • Hwang, Inchul
    • Environmental Analysis Health and Toxicology / Vol. 14, No. 3 / pp. 103-111 / 1999
  • To predict effectively the lipophilicity, the aryl hydrocarbon receptor (AhR) binding affinity, and the toxic equivalency factor (TEF) of dioxins from geometrical descriptors, multiple linear regression with forward selection and backward elimination was employed with statistical validity. The lipophilicity, the AhR binding affinity, and the TEF of dioxins could be predicted using a few geometrical descriptors.

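Analysis of Stress level of Korean Household Members due to Household Debt

  • 오만숙;현승미
    • The Korean Journal of Applied Statistics (응용통계연구) / Vol. 22, No. 2 / pp. 297-307 / 2009
  • Using actual data surveyed by Kookmin Bank in 2004, we analyze how the attributes of household members (type of housing tenure, education level of the household head, age of the household head, monthly income, and region of residence) affect the burden they feel from household debt, that is, their perceived stress level due to household debt, which has recently become a factor in the financial crisis. The stress level was dichotomized into low and high perceived burden, and a logistic regression with the household attributes as explanatory variables was fitted. Backward elimination based on the likelihood ratio statistic for goodness of fit was used to select a model that is simple yet fits the data well, and the selected model contains two two-way interactions. The effect of each attribute on the perceived debt burden was analyzed through the coefficient estimates of the selected model. The effect of household attributes on whether a household has debt at all was analyzed in a similar way with a logistic regression model. The results show that the perceived debt burden is lower for homeowners, for higher monthly income, for a lower education level of the household head, and for a younger household head.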

Wine Quality Prediction by Using Backward Elimination Based on XGBoosting Algorithm

  • Umer Zukaib;Mir Hassan;Tariq Khan;Shoaib Ali
    • International Journal of Computer Science & Network Security / Vol. 24, No. 2 / pp. 31-42 / 2024
  • Many industries rely on quality certification to promote their products and brands, but obtaining certification, particularly from human experts, is difficult. Machine learning has many applications in assigning and assessing quality certifications for products at scale, and wine, which like other products comes in many brands, is no exception. In this research, we use two publicly available datasets from the UC Irvine Machine Learning Repository to predict wine quality: a red wine dataset with 1599 records and a white wine dataset with 4898 records. The study is twofold. First, we use backward elimination to examine how the dependent variable depends on the independent variables and to predict the dependent variable; the technique is useful for identifying which independent variables are most likely to improve wine quality. Second, we use XGBoost, a robust machine learning algorithm, for efficient prediction of wine quality. We evaluate our model with the following error measures: root mean square error, mean absolute error, R2, and mean square error. We compare the results of XGBoost with those of other state-of-the-art machine learning techniques, and the experimental results show that XGBoost outperforms them.
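
The four evaluation measures listed in this abstract can be computed as in the short Python sketch below. The function name `regression_errors` and the dictionary layout are hypothetical choices for this example, and R2 is computed with the standard coefficient-of-determination formula, which the abstract does not spell out.

```python
import math

def regression_errors(y_true, y_pred):
    """Return MSE, RMSE, MAE, and R2 for paired true and predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)           # residual sum of squares
    mse = ss_res / n                                 # mean square error
    mae = sum(abs(r) for r in residuals) / n         # mean absolute error
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                       # assumes y_true is not constant
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r2}
```

For example, `regression_errors([5, 6, 7, 6], [5, 5, 7, 7])` compares four hypothetical quality scores with their predictions.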

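Apartment Price Prediction Using Deep Learning and Machine Learning

  • 김학현;유환규;오하영
    • KIPS Transactions on Software and Data Engineering (정보처리학회논문지:소프트웨어 및 데이터공학) / Vol. 12, No. 2 / pp. 59-76 / 2023
  • Since the COVID-19 era, apartment prices have risen abnormally. In such an uncertain real estate market, price prediction research is very important. In this paper, we build a large dataset of 870,000 records covering 2015 to 2020 by collecting and crawling data from various real estate sites, gather as many variables as possible, including apartment information and economic indicators, and build a model that predicts future apartment transaction prices. The study first addresses multicollinearity by removing and combining variables. We then apply five variable selection algorithms to extract meaningful independent variables: forward selection, backward elimination, stepwise selection, L1 regularization, and principal component analysis (PCA). Four machine learning and deep learning algorithms, a deep neural network (DNN), XGBoost, CatBoost, and linear regression, are trained after hyperparameter optimization, and their predictive performance is compared. In an additional experiment, we vary the number of nodes and layers of the DNN to find the most suitable configuration. Finally, the best-performing model is used to predict 2021 apartment transaction prices, and a comparison with the actual 2021 data shows excellent results. We are confident that machine learning and deep learning can help investors make sound decisions when purchasing homes under a variety of economic conditions.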

Application of Random Forest Algorithm for the Decision Support System of Medical Diagnosis with the Selection of Significant Clinical Test

  • 윤태균;이관수
    • The Transactions of the Korean Institute of Electrical Engineers (전기학회논문지) / Vol. 57, No. 6 / pp. 1058-1062 / 2008
  • In a clinical decision support system (CDSS), unlike rule-based expert methods, an appropriate data-driven machine learning method can readily provide information on individual features (clinical tests) for disease classification. However, currently developed methods focus on improving classification accuracy for diagnosis. By analyzing feature importance in classification, one may infer novel sets of clinical tests that strongly differentiate specific diseases or disease states. Against this background, we introduce a novel CDSS that integrates a classifier and a feature selection module. The random forest algorithm serves as both the classifier and the feature importance measure. The system selects the significant clinical tests that discriminate the diseases by examining the classification error during backward elimination of the features. The superior performance of the random forest algorithm in clinical classification was assessed against artificial neural network and decision tree algorithms using the breast cancer, diabetes, and heart disease data in the UCI Machine Learning Repository. Tests on the same data sets show that the proposed system can successfully select the significant clinical test set for each disease.
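
The selection loop described in this abstract, which tracks classification error while features are eliminated backward, might look roughly like the Python sketch below. It is a generic outline under stated assumptions, not the authors' procedure: the function name `importance_backward_elimination`, the `fit_and_score` callback, and the rule of always dropping the least important feature are hypothetical. In practice `fit_and_score` could wrap a random forest and return, for instance, an out-of-bag error together with the per-feature importances.

```python
def importance_backward_elimination(features, fit_and_score):
    """Drop the least important feature one at a time, tracking error.

    fit_and_score(subset) is assumed to train a classifier on `subset` and
    return (classification_error, {feature: importance}).
    Returns (best_subset, best_error, history).
    """
    current = list(features)
    best_subset, best_error = None, float("inf")
    history = []
    while current:
        error, importances = fit_and_score(current)
        history.append((tuple(current), error))
        # "<=" prefers the smaller subset when errors tie, since subsets shrink.
        if error <= best_error:
            best_subset, best_error = list(current), error
        weakest = min(current, key=lambda f: importances[f])
        current.remove(weakest)
    return best_subset, best_error, history
```

Returning the full `history` makes it possible to inspect how the error changes as each clinical test is removed, rather than only the final subset.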

A study on the Reason of China's Anti-Dumping inspection against South Korea: Focusing on the Steel, Petrochemical, and Paper Industries

  • 심윤수
    • The International Commerce & Law Review (무역상무연구) / Vol. 30 / pp. 145-174 / 2006
  • Anti-dumping has become the trade policy of choice for developing and advanced countries alike, and it is therefore a pressing issue for export-oriented countries, including Korea. Based on an analysis of trade and industrial policy between Korea and China and a review of previous research, the main causes of anti-dumping measures were selected as follows: unemployment rate, real GDP growth rate, and consumer price increase as internal factors, and trade balance, regional coefficient, and trade specialization index as external factors. We then examined how the above seven variables affect the number of anti-dumping measures. For the empirical analysis, the data were reorganized on a quarterly basis. Correlation analysis, a multiple regression model with backward elimination, and time-series analysis show that the unemployment rate is the most important factor in anti-dumping measures, in addition to the rate of increase in the trade balance. Because a variable such as the unemployment rate is beyond our control, it is appropriate to establish and operate a preemptive monitoring system based on the rate of increase in the amount of exports and in the trade surplus.
