• Title/Abstract/Keywords: Gradient boosting

203 search results (processing time: 0.03 s)

A study on applying random forest and gradient boosting algorithm for Chl-a prediction of Daecheong lake

  • 이상민;김일규
    • 상하수도학회지 / Vol. 35, No. 6 / pp.507-516 / 2021
  • This study applied machine learning, which has recently been widely used for prediction, to algae forecasting. The study site was the CD (Chudong) point, a representative point of Daecheong Lake, and chlorophyll-a (Chl-a) concentration was used as the target variable. To predict Chl-a concentration, a data set of water quality and quantity factors was compiled, and random forest and gradient boosting algorithms were run in Python. First, the correlation between Chl-a and the water quality and quantity data was analyzed, and the ten most important factors were extracted. In terms of performance indices, gradient boosting achieved an RMSE of 2.72 mg/m3, an MSE of 7.40 (mg/m3)^2, and an R2 of 0.66. Gradient boosting also gave excellent results in the residual analysis and in the overall algorithm run. After hyperparameter tuning, gradient boosting again performed well, with an RMSE of 2.44 mg/m3.
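
The three performance indices reported above are related: RMSE is the square root of MSE (2.72^2 is about 7.40), and R2 compares the residual error with the variance of the observations. The following is a minimal sketch of how they can be computed. The function name `regression_metrics` and the sample values are illustrative assumptions, not code or data from the study.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (mse, rmse, r2) for two equal-length sequences."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error: average squared residual, in squared units of y.
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    # Root mean squared error: back in the units of y.
    rmse = math.sqrt(mse)
    # R^2: one minus residual sum of squares over total sum of squares.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_res = mse * n
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return mse, rmse, r2

# Hypothetical Chl-a values in mg/m3 (not data from the study).
observed = [4.0, 7.5, 12.0, 9.0, 5.5]
predicted = [5.0, 6.5, 10.0, 9.5, 6.0]
mse, rmse, r2 = regression_metrics(observed, predicted)
```

Because RMSE is in the units of the target while MSE is in squared units, RMSE is usually the easier of the two to compare against measured concentrations.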

Cognitive Impairment Prediction Model Using AutoML and Lifelog

  • Hyunchul Choi;Chiho Yoon;Sae Bom Lee
    • 한국컴퓨터정보학회논문지 / Vol. 28, No. 11 / pp.53-63 / 2023
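  • This study developed a cognitive impairment prediction model using automated machine learning (AutoML) as a screening tool for dementia prevention in older adults. The data were the 'wearable lifelog data for groups at high risk of dementia' from the National Information Society Agency of Korea. The analysis was run in Google Colab with PyCaret 3.0.0: the five models with the best classification performance were selected, combined through ensemble learning, and then given a final performance evaluation. Predictive performance was highest for the Voting Classifier, followed by the Gradient Boosting Classifier, Extreme Gradient Boosting, Light Gradient Boosting Machine, Extra Trees Classifier, and Random Forest Classifier. In particular, 'average respiratory rate per minute during sleep' and 'average heart rate per minute during sleep' were identified as the most important features. The results suggest that machine learning and lifelog data merit consideration as a means of managing and preventing cognitive impairment in older adults more effectively.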

A robust approach in prediction of RCFST columns using machine learning algorithm

  • Van-Thanh Pham;Seung-Eock Kim
    • Steel and Composite Structures
    • /
    • Vol. 46, No. 2
    • /
    • pp.153-173
    • /
    • 2023
  • The rectangular concrete-filled steel tubular (RCFST) column, a type of concrete-filled steel tubular (CFST) member, is widely used as a compression member in structures because of its advantages. This paper proposes a robust machine learning-based framework for predicting the ultimate compressive strength of RCFST columns under both concentric and eccentric loading. The gradient boosting neural network (GBNN), an efficient and up-to-date ML algorithm, is used to develop the predictive model in the proposed framework. A total of 890 experimental data points for RCFST columns, divided into concentric and eccentric compression datasets, were carefully collected for training and testing. The accuracy of the proposed model is demonstrated by comparing its performance with that of seven state-of-the-art machine learning methods: decision tree (DT), random forest (RF), support vector machines (SVM), deep learning (DL), adaptive boosting (AdaBoost), extreme gradient boosting (XGBoost), and categorical gradient boosting (CatBoost). Four design codes, the European (EC4), American Concrete Institute (ACI), American Institute of Steel Construction (AISC), and Australian/New Zealand (AS/NZS) codes, are used as references in a further comparison. The results demonstrate that the proposed GBNN method is a robust and powerful approach for obtaining the ultimate strength of RCFST columns.

A gradient boosting regression based approach for energy consumption prediction in buildings

  • Bataineh, Ali S. Al
    • Advances in Energy Research
    • /
    • Vol. 6, No. 2
    • /
    • pp.91-101
    • /
    • 2019
  • This paper proposes an efficient data-driven approach to building models that predict energy consumption in buildings. The data were collected by installing humidity and temperature sensors at different locations in a building. Weather data from a nearby weather station are also included in the dataset to study the impact of weather conditions on energy consumption. One of the main emphases of this research is to make feature selection independent of domain knowledge. To extract useful features from the data, two approaches are therefore tested: feature selection through principal component analysis, and relative importance-based feature selection in the original domain. The regression model is gradient boosting regression, and its optimal parameters are chosen through a two-stage coarse-to-fine search. Model performance is evaluated with metrics such as the R2 score and root mean squared error. The results show that the best performance is achieved when relative importance-based feature selection is used with the gradient boosting regressor. The proposed technique also outperformed the support vector machine and neural network-based approaches tested on the same dataset.
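
A two-stage coarse-to-fine search, as mentioned in the abstract above, can be sketched as follows: evaluate a sparse grid over the full parameter range, then a denser grid around the best coarse value. This is only an illustration under stated assumptions. The helper names, the single tuned parameter, and the toy `validation_error` function (a stand-in for fitting and scoring a real model) are hypothetical and not taken from the paper.

```python
def grid(lo, hi, steps):
    """Return `steps` evenly spaced values from lo to hi inclusive."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]

def coarse_to_fine_search(score, lo, hi, coarse_steps=5, fine_steps=9):
    """Minimise `score` over [lo, hi] with one coarse pass and one fine pass."""
    # Stage 1: evaluate a sparse grid over the whole range.
    best = min(grid(lo, hi, coarse_steps), key=score)
    # Stage 2: evaluate a denser grid between the neighbours of the coarse winner.
    spacing = (hi - lo) / (coarse_steps - 1)
    fine_lo = max(lo, best - spacing)
    fine_hi = min(hi, best + spacing)
    return min(grid(fine_lo, fine_hi, fine_steps), key=score)

# Hypothetical validation error as a function of the learning rate.
def validation_error(learning_rate):
    return (learning_rate - 0.16) ** 2 + 0.5

best_rate = coarse_to_fine_search(validation_error, 0.01, 0.5)
```

The coarse pass keeps the number of model fits low, and the fine pass spends the remaining budget only where the coarse pass found the lowest error.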

Predicting Deformation Behavior of Additively Manufactured Ti-6Al-4V Based on XGB and LGBM

  • 천세호;유진영;김정기;오정석;남태현;이태경
    • 소성∙가공 / Vol. 31, No. 4 / pp.173-178 / 2022
  • The present study employed two machine-learning approaches, extreme gradient boosting (XGB) and light gradient boosting machine (LGBM), to predict the compressive deformation behavior of additively manufactured Ti-6Al-4V. In contrast to artificial neural networks and their variants, such approaches have rarely been verified in the field of metallurgy. XGB and LGBM provided good predictions of elongation to failure under an extrapolated condition of processing parameters, with better accuracy than the response surface method. Furthermore, XGB and LGBM with optimum hyperparameters predicted well the deformation behavior of Ti-6Al-4V additively manufactured under the extrapolated condition. Although the predictive capability of the two methods was comparable, LGBM was superior to XGB in that it trained six times faster. This work also verifies, for the first time, the LGBM approach for a metallurgical problem.

Performance Comparison of Machine-learning Models for Analyzing Weather and Traffic Accident Correlations

  • Li Zi Xuan;Hyunho Yang
    • Journal of information and communication convergence engineering
    • /
    • Vol. 21, No. 3
    • /
    • pp.225-232
    • /
    • 2023
  • Owing to advancements in intelligent transportation systems (ITS) and artificial-intelligence technologies, various machine-learning models can be employed to simulate and predict the number of traffic accidents under different weather conditions. Analyzing the relationship between weather and traffic accidents also makes it possible to assess whether current weather conditions are suitable for travel, which can significantly reduce the risk of traffic accidents. In this study, we used a Pearson correlation heat map to analyze 30,000 traffic flow data points collected by traffic cameras at nearby intersections in Washington, D.C., USA from October 2012 to May 2017. We then applied several machine-learning algorithms commonly used in ITS, including random forest, decision tree, gradient-boosting regression, and support vector regression, to the correlations between continuous features and compared their predictive performance. The experimental results indicated that the gradient-boosting regression model had the best performance.

Investment, Export, and Exchange Rate on Prediction of Employment with Decision Tree, Random Forest, and Gradient Boosting Machine Learning Models

  • 이재득
    • 무역학회지 / Vol. 46, No. 2 / pp.281-299 / 2021
  • This paper analyzes the feasibility of using machine learning methods to forecast employment. Decision tree and artificial neural network models, together with ensemble models such as random forest and gradient boosting regression trees, were used to forecast employment in the Busan regional economy. The main findings of the comparison of their predictive abilities are as follows. First, machine learning methods can forecast employment well. Second, the employment forecasts of the decision tree models differed somewhat depending on tree depth. Third, the artificial neural network model, however, did not show high predictive power. Fourth, the ensemble models, random forest and gradient boosting regression tree, showed higher predictive power. Thus, since machine learning methods can predict employment accurately, they should be used to improve the accuracy of employment forecasting.
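
For readers unfamiliar with the gradient boosting regression tree named in the abstract above, the core idea can be sketched in a few lines: start from the mean of the targets, then repeatedly fit a small tree to the current residuals and add a damped copy of it to the prediction. This is a minimal illustration assuming squared loss, one input feature, and depth-1 trees (stumps). The function names and the data are hypothetical and unrelated to the employment data in the paper.

```python
def fit_stump(x, residuals):
    """Least-squares depth-1 tree on one feature: (threshold, left_mean, right_mean)."""
    best = None
    values = sorted(set(x))
    # Candidate thresholds are midpoints between consecutive distinct x values,
    # so x must contain at least two distinct values.
    for a, b in zip(values, values[1:]):
        threshold = (a + b) / 2
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]

def fit_boosting(x, y, n_rounds=50, learning_rate=0.1):
    """Boost stumps on squared loss; returns (base, learning_rate, stumps)."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss the negative gradient is the residual y - prediction.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        threshold, left_mean, right_mean = fit_stump(x, residuals)
        stumps.append((threshold, left_mean, right_mean))
        pred = [pi + learning_rate * (left_mean if xi <= threshold else right_mean)
                for xi, pi in zip(x, pred)]
    return base, learning_rate, stumps

def predict(model, xi):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(l if xi <= t else r for t, l, r in stumps)

# Hypothetical one-feature data with a step between x = 4 and x = 5.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
model = fit_boosting(x, y)
mean_y = sum(y) / len(y)
baseline_mse = sum((yi - mean_y) ** 2 for yi in y) / len(y)
boosted_mse = sum((yi - predict(model, xi)) ** 2 for xi, yi in zip(x, y)) / len(y)
```

Library implementations add deeper trees, subsampling, and other loss functions, but the residual-fitting loop is the same in spirit.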

Dynamic Caching Routing Strategy for LEO Satellite Nodes Based on Gradient Boosting Regression Tree

  • Yang Yang;Shengbo Hu;Guiju Lu
    • Journal of Information Processing Systems
    • /
    • Vol. 20, No. 1
    • /
    • pp.131-147
    • /
    • 2024
  • A routing strategy based on traffic prediction and dynamic cache allocation for satellite nodes is proposed to address the issues of high propagation delay and overall delay of inter-satellite and satellite-to-ground links in low Earth orbit (LEO) satellite systems. The spatial and temporal correlations of satellite network traffic were analyzed, and the relevant traffic through the target satellite was extracted as raw input for traffic prediction. An improved gradient boosting regression tree algorithm was used for traffic prediction. Based on the traffic prediction results, a dynamic cache allocation routing strategy is proposed. The satellite nodes periodically monitor the traffic load on inter-satellite links (ISLs) and dynamically allocate cache resources for each ISL with neighboring nodes. Simulation results demonstrate that the proposed routing strategy effectively reduces packet loss rate and average end-to-end delay and improves the distribution of services across the entire network.

AN OPTIMAL BOOSTING ALGORITHM BASED ON NONLINEAR CONJUGATE GRADIENT METHOD

  • CHOI, JOOYEON;JEONG, BORA;PARK, YESOM;SEO, JIWON;MIN, CHOHONG
    • Journal of the Korean Society for Industrial and Applied Mathematics
    • /
    • Vol. 22, No. 1
    • /
    • pp.1-13
    • /
    • 2018
  • Boosting, one of the most successful algorithms for supervised learning, searches for the most accurate weighted sum of weak classifiers. The search corresponds to a convex program with non-negativity and affine constraints. In this article, we propose a novel conjugate gradient algorithm with the modified Polak-Ribière-Polyak conjugate direction. We prove the convergence of the algorithm and report its successful application to boosting.
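
As background for the conjugate direction named in the abstract above, the following sketch shows a textbook nonlinear conjugate gradient iteration with the standard PRP+ rule (the Polak-Ribière-Polyak coefficient truncated at zero) and a backtracking line search. It is not the modified direction proposed in the article and it ignores the non-negativity and affine constraints. The function names and the quadratic test problem are assumptions made for illustration.

```python
def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def prp_conjugate_gradient(f, grad, x, iters=200, tol=1e-10):
    """Minimise f by nonlinear CG with beta = max(0, g_new.(g_new - g) / g.g)."""
    g = grad(x)
    d = [-gi for gi in g]
    for _ in range(iters):
        if dot(g, g) < tol:
            break
        # Restart with steepest descent if d is not a descent direction.
        if dot(g, d) >= 0:
            d = [-gi for gi in g]
        # Backtracking (Armijo) line search along d.
        step, fx, slope = 1.0, f(x), dot(g, d)
        while step > 1e-12 and f([xi + step * di for xi, di in zip(x, d)]) > fx + 1e-4 * step * slope:
            step *= 0.5
        x = [xi + step * di for xi, di in zip(x, d)]
        g_new = grad(x)
        # PRP+ coefficient: Polak-Ribiere-Polyak value truncated at zero.
        beta = max(0.0, dot(g_new, [a - b for a, b in zip(g_new, g)]) / dot(g, g))
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new
    return x

# Hypothetical convex test problem with minimiser (1, -2).
def f(x):
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2

def grad(x):
    return [2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)]

solution = prp_conjugate_gradient(f, grad, [0.0, 0.0])
```

The restart check keeps every search direction a descent direction, which the simple backtracking line search relies on.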

Who Gets Government SME R&D Subsidy? Application of Gradient Boosting Model

  • 강성원;강희찬
    • 한국전자거래학회지 / Vol. 25, No. 4 / pp.77-109 / 2020
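  • This paper uses a gradient boosting model to identify the factors that influence government decisions on R&D support for small and medium-sized enterprises (SMEs). Whereas previous studies have focused on ex post analysis of the effect of government R&D support on recipient firms, this paper seeks to identify how the government decides on R&D support and to analyze the incentives that this decision rule gives to firms. To this end, we selected a variety of potential factors affecting the subsidy decision and used a machine learning approach to single out the factors that most reduce estimation error. Specifically, we built a subsidy estimation model by applying a gradient boosting model to data linking the National R&D Survey and Analysis data compiled by the Korea Institute of Science and Technology Evaluation (한국과학기술평가원) with Korea credit rating data. The gradient boosting model reduced root mean squared error by 7.20% relative to applied linear regression models. Analysis of each variable's permutation importance showed that research performance indicators and R&D expenditure contributed most to reducing estimation error. Analysis of each variable's partial dependence plot (PDP) and SHAP (SHapley Additive exPlanations) values showed that firms with better research performance indicators and larger R&D expenditure tend to receive larger R&D subsidies, whereas firms with larger operating profit and higher equity turnover tend to receive smaller subsidies. These results suggest that the current allocation of SME R&D subsidies provides incentives to improve research performance indicators and increase R&D investment, but only weak incentives to improve business performance.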
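
Permutation importance, used in the abstract above to rank variables, measures how much a model's error grows when one feature's values are shuffled across rows. The sketch below is a minimal illustration under stated assumptions. The toy model, the data, and the function names are hypothetical and are not the paper's model or data.

```python
import math
import random

def rmse(model, rows, targets):
    return math.sqrt(sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows))

def permutation_importance(model, rows, targets, n_repeats=10, seed=0):
    """Return, per feature, the mean increase in RMSE after shuffling that column."""
    rng = random.Random(seed)
    baseline = rmse(model, rows, targets)
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving all other columns intact.
            column = [r[j] for r in rows]
            rng.shuffle(column)
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(rmse(model, shuffled, targets) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Hypothetical fitted model: depends on feature 0 only and ignores feature 1.
def toy_model(row):
    return 3.0 * row[0]

rows = [[float(i), float((i * 7) % 5)] for i in range(20)]
targets = [3.0 * r[0] for r in rows]
scores = permutation_importance(toy_model, rows, targets)
```

In this toy setup, shuffling the ignored feature leaves the error unchanged, so its importance is zero, while shuffling the feature the model depends on raises the error.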