• Title/Summary/Keyword: Gradient boosting


A study on applying random forest and gradient boosting algorithm for Chl-a prediction of Daecheong lake (대청호 Chl-a 예측을 위한 random forest와 gradient boosting 알고리즘 적용 연구)

  • Lee, Sang-Min;Kim, Il-Kyu
    • Journal of Korean Society of Water and Wastewater / v.35 no.6 / pp.507-516 / 2021
  • This study applied machine learning, which has recently been widely used for prediction. The study site was the Chudong (CD) point, a representative point of Daecheong Lake. Chlorophyll-a (Chl-a) concentration was used as the target variable for algae prediction. To predict the Chl-a concentration, a data set of water quality and quantity factors was compiled. Random forest and gradient boosting algorithms were run in Python. First, the correlation between Chl-a and the water quality and quantity data was analyzed, and the ten most important factors were extracted. On the performance indices, gradient boosting achieved an RMSE of 2.72 mg/m³, an MSE of 7.40 (mg/m³)², and an R² of 0.66. Gradient boosting also gave excellent results in the residual analysis and in the overall algorithm execution. After hyperparameter tuning, gradient boosting again performed well, with an RMSE of 2.44 mg/m³.
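
The three performance indices named in this abstract (MSE, RMSE, and R2) can be computed as in the minimal sketch below. The function name `regression_metrics` and the sample values are illustrative assumptions, not data from the paper. Because RMSE is the square root of MSE, the reported figures are mutually consistent: 2.72 squared is about 7.40.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (mse, rmse, r2) for two equal-length sequences of numbers."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)          # residual sum of squares
    mse = ss_res / n
    rmse = math.sqrt(mse)                           # same unit as the target
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot                      # coefficient of determination
    return mse, rmse, r2


# Hypothetical Chl-a observations and predictions (mg/m3), not from the paper.
observed = [4.0, 7.5, 12.0, 9.0, 15.5]
predicted = [5.0, 6.5, 10.0, 11.0, 14.5]
mse, rmse, r2 = regression_metrics(observed, predicted)
print(f"MSE={mse:.3f}  RMSE={rmse:.3f}  R2={r2:.3f}")
```

Note that MSE carries the squared unit of the target, while RMSE is in the target's own unit, which is why RMSE is usually the easier figure to interpret.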

GBGNN: Gradient Boosted Graph Neural Networks

  • Eunjo Jang;Ki Yong Lee
    • Journal of Information Processing Systems / v.20 no.4 / pp.501-513 / 2024
  • In recent years, graph neural networks (GNNs) have been extensively used to analyze graph data across various domains because of their powerful capabilities in learning complex graph-structured data. However, recent research has focused on improving the performance of a single GNN with only two or three layers. This is because stacking layers deeply causes the over-smoothing problem of GNNs, which degrades the performance of GNNs significantly. On the other hand, ensemble methods combine individual weak models to obtain better generalization performance. Among them, gradient boosting is a powerful supervised learning algorithm that adds new weak models in the direction of reducing the errors of the previously created weak models. After repeating this process, gradient boosting combines the weak models to produce a strong model with better performance. Until now, most studies on GNNs have focused on improving the performance of a single GNN. In contrast, improving the performance of GNNs using multiple GNNs has not been studied much yet. In this paper, we propose gradient boosted graph neural networks (GBGNN) that combine multiple shallow GNNs with gradient boosting. We use shallow GNNs as weak models and create new weak models using the proposed gradient boosting-based loss function. Our empirical evaluations on three real-world datasets demonstrate that GBGNN performs much better than a single GNN. Specifically, in our experiments using graph convolutional network (GCN) and graph attention network (GAT) as weak models on the Cora dataset, GBGNN achieves performance improvements of 12.3%p and 6.1%p in node classification accuracy compared to a single GCN and a single GAT, respectively.

Cognitive Impairment Prediction Model Using AutoML and Lifelog

  • Hyunchul Choi;Chiho Yoon;Sae Bom Lee
    • Journal of the Korea Society of Computer and Information / v.28 no.11 / pp.53-63 / 2023
  • This study developed a cognitive impairment prediction model, intended as a screening test for preventing dementia in the elderly, using Automated Machine Learning (AutoML). We used the 'Wearable lifelog data for high-risk dementia patients' dataset from the National Information Society Agency and conducted the analysis with PyCaret 3.0.0 in the Google Colaboratory environment. The analysis proceeded as follows. First, five models with excellent classification performance were selected for model development and lifelog data analysis. Next, ensemble learning was used to integrate these models and assess their performance. The Voting Classifier, Gradient Boosting Classifier, Extreme Gradient Boosting, Light Gradient Boosting Machine, Extra Trees Classifier, and Random Forest Classifier models showed high predictive performance, in that order. The findings also highlighted 'Average respiration per minute during sleep' and 'Average heart rate per minute during sleep' as the most important feature variables for accurate prediction. Finally, the results suggest that machine learning and lifelog data could be used to manage and prevent cognitive impairment in the elderly more effectively.
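
The idea behind a voting classifier, which combines the outputs of several fitted models, can be sketched in plain Python as below. The abstract does not state how the study's PyCaret ensemble was configured, so this only illustrates the generic hard-voting and soft-voting rules. The function names and the probabilities are hypothetical.

```python
from collections import Counter


def hard_vote(probabilities):
    """Each model votes for its most probable class; the majority class wins."""
    votes = [max(range(len(p)), key=lambda c: p[c]) for p in probabilities]
    return Counter(votes).most_common(1)[0][0]


def soft_vote(probabilities):
    """Average the class probabilities across models, then take the argmax."""
    n_models = len(probabilities)
    n_classes = len(probabilities[0])
    averaged = [sum(p[c] for p in probabilities) / n_models
                for c in range(n_classes)]
    predicted = max(range(n_classes), key=lambda c: averaged[c])
    return predicted, averaged


# Hypothetical outputs of three models for one subject:
# index 0 = "normal", index 1 = "cognitive impairment".
model_outputs = [
    [0.55, 0.45],
    [0.60, 0.40],
    [0.10, 0.90],
]
hard_label = hard_vote(model_outputs)
soft_label, mean_probs = soft_vote(model_outputs)
print(hard_label, soft_label, mean_probs)
```

In this example the two rules disagree: two of three models lean weakly toward class 0, so hard voting picks 0, while the one confident model pulls the averaged probability toward class 1 under soft voting.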

A robust approach in prediction of RCFST columns using machine learning algorithm

  • Van-Thanh Pham;Seung-Eock Kim
    • Steel and Composite Structures / v.46 no.2 / pp.153-173 / 2023
  • The rectangular concrete-filled steel tubular (RCFST) column, a type of concrete-filled steel tubular (CFST) column, is widely used as a compression member in structures because of its advantages. This paper proposes a robust machine learning-based framework for predicting the ultimate compressive strength of RCFST columns under both concentric and eccentric loading. The gradient boosting neural network (GBNN), an efficient and up-to-date ML algorithm, is used to develop the predictive model in the proposed framework. A total of 890 experimental data points for RCFST columns, categorized into two datasets for concentric and eccentric compression, were carefully collected for training and testing. The accuracy of the proposed model is demonstrated by comparing its performance with seven state-of-the-art machine learning methods: decision tree (DT), random forest (RF), support vector machines (SVM), deep learning (DL), adaptive boosting (AdaBoost), extreme gradient boosting (XGBoost), and categorical gradient boosting (CatBoost). Four available design codes, namely the European (EC4), American Concrete Institute (ACI), American Institute of Steel Construction (AISC), and Australian/New Zealand (AS/NZS) codes, are referred to in a further comparison. The results demonstrate that the proposed GBNN method is a robust and powerful approach for obtaining the ultimate strength of RCFST columns.

Development and Validation of MRI-Based Radiomics Models for Diagnosing Juvenile Myoclonic Epilepsy

  • Kyung Min Kim;Heewon Hwang;Beomseok Sohn;Kisung Park;Kyunghwa Han;Sung Soo Ahn;Wonwoo Lee;Min Kyung Chu;Kyoung Heo;Seung-Koo Lee
    • Korean Journal of Radiology / v.23 no.12 / pp.1281-1289 / 2022
  • Objective: Radiomic modeling using multiple regions of interest in MRI of the brain to diagnose juvenile myoclonic epilepsy (JME) has not yet been investigated. This study aimed to develop and validate radiomics prediction models to distinguish patients with JME from healthy controls (HCs), and to evaluate the feasibility of a radiomics approach using MRI for diagnosing JME. Materials and Methods: A total of 97 JME patients (25.6 ± 8.5 years; female, 45.5%) and 32 HCs (28.9 ± 11.4 years; female, 50.0%) were randomly split (7:3 ratio) into a training set (n = 90) and a test set (n = 39). Radiomic features were extracted from 22 regions of interest in the brain, selected on the basis of clinical evidence, using T1-weighted MRI. Predictive models were trained on the radiomic features in the training set using seven modeling methods: light gradient boosting machine, support vector classifier, random forest, logistic regression, extreme gradient boosting, gradient boosting machine, and decision tree. The performance of the models was validated and compared on the test set. The model with the highest area under the receiver operating characteristic curve (AUROC) was chosen, and important features in the model were identified. Results: The seven tested radiomics models (light gradient boosting machine, support vector classifier, random forest, logistic regression, extreme gradient boosting, gradient boosting machine, and decision tree) showed AUROC values of 0.817, 0.807, 0.783, 0.779, 0.767, 0.762, and 0.672, respectively. The light gradient boosting machine, which had the highest AUROC, albeit without statistically significant differences from the other models in pairwise comparisons, had accuracy, precision, recall, and F1 scores of 0.795, 0.818, 0.931, and 0.871, respectively. Radiomic features, including those of the putamen and ventral diencephalon, were ranked as the most important for suggesting JME. Conclusion: Radiomic models using MRI were able to differentiate JME from HCs.
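
The two kinds of metric reported in this abstract can be illustrated with a short sketch. AUROC is computed here through its rank interpretation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. F1 is the harmonic mean of precision and recall. The labels and scores are hypothetical; only the precision (0.818) and recall (0.931) come from the abstract.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical test cases: 1 = JME, 0 = healthy control.
labels = [1, 1, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.7, 0.2]
toy_auc = auroc(labels, scores)

# Precision and recall as reported in the abstract.
reported_f1 = f1_score(0.818, 0.931)
print(toy_auc, round(reported_f1, 3))
```

Plugging the reported precision and recall into the F1 formula reproduces the reported F1 of 0.871, so those three figures are internally consistent.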

A gradient boosting regression based approach for energy consumption prediction in buildings

  • Bataineh, Ali S. Al
    • Advances in Energy Research / v.6 no.2 / pp.91-101 / 2019
  • This paper proposes an efficient data-driven approach to building models for predicting energy consumption in buildings. The data used in this research were collected by installing humidity and temperature sensors at different locations in a building. In addition, weather data from a nearby weather station were included in the dataset to study the impact of weather conditions on energy consumption. One of the main emphases of this research is to make feature selection independent of domain knowledge. Therefore, two approaches to extracting useful features from the data are tested: feature selection through principal component analysis, and relative importance-based feature selection in the original domain. The regression model used is gradient boosting regression, and its optimal parameters are chosen through a two-stage coarse-to-fine search. To evaluate model performance, metrics such as the R2 score and root mean squared error are used. The results show that the best performance is achieved when relative importance-based feature selection is combined with the gradient boosting regressor. The proposed technique also outperformed the support vector machine and neural network-based approaches tested on the same dataset.
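
A two-stage coarse-to-fine parameter search of the kind mentioned in this abstract can be sketched as follows. The abstract does not give the search ranges or the parameters tuned, so this sketch assumes a single numeric parameter and a hypothetical `validation_error` function standing in for the cross-validated error of a gradient boosting regressor.

```python
def coarse_to_fine_search(objective, low, high, coarse_steps=5, fine_steps=10):
    """Minimize objective(x) on [low, high] in two stages.

    Stage 1 evaluates a coarse, evenly spaced grid over the whole range.
    Stage 2 evaluates a finer grid one coarse cell either side of the winner.
    """
    coarse_width = (high - low) / coarse_steps
    coarse_grid = [low + i * coarse_width for i in range(coarse_steps + 1)]
    best_coarse = min(coarse_grid, key=objective)

    fine_low = max(low, best_coarse - coarse_width)
    fine_high = min(high, best_coarse + coarse_width)
    fine_width = (fine_high - fine_low) / fine_steps
    fine_grid = [fine_low + i * fine_width for i in range(fine_steps + 1)]
    return min(fine_grid, key=objective)


def validation_error(learning_rate):
    """Hypothetical stand-in for a model's validation error (minimum at 0.13)."""
    return (learning_rate - 0.13) ** 2 + 0.05


best_rate = coarse_to_fine_search(validation_error, 0.0, 1.0)
print(best_rate, validation_error(best_rate))
```

Here the coarse stage settles on 0.2 and the fine stage refines it to about 0.12, using 17 evaluations in total. A single grid with the same 0.04 resolution over the whole range would need 26 evaluations.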

Predicting Deformation Behavior of Additively Manufactured Ti-6Al-4V Based on XGB and LGBM (XGB 및 LGBM을 활용한 Ti-6Al-4V 적층재의 변형 거동 예측)

  • Cheon, S.;Yu, J.;Kim, J.G.;Oh, J.S.;Nam, T.H.;Lee, T.
    • Transactions of Materials Processing / v.31 no.4 / pp.173-178 / 2022
  • The present study employed two machine-learning approaches, extreme gradient boosting (XGB) and light gradient boosting machine (LGBM), to predict the compressive deformation behavior of additively manufactured Ti-6Al-4V. Unlike artificial neural networks and their variants, such approaches have rarely been verified in the field of metallurgy. XGB and LGBM provided good predictions of elongation to failure under an extrapolated condition of processing parameters. The prediction accuracy of these methods was better than that of the response surface method. Furthermore, XGB and LGBM with optimum hyperparameters predicted well the deformation behavior of Ti-6Al-4V additively manufactured under the extrapolated condition. Although the predictive capability of the two methods was comparable, LGBM was superior to XGB in that its learning rate was six times higher. It is also noted that this work has verified the LGBM approach on a metallurgical problem for the first time.

Performance Comparison of Machine-learning Models for Analyzing Weather and Traffic Accident Correlations

  • Li Zi Xuan;Hyunho Yang
    • Journal of information and communication convergence engineering / v.21 no.3 / pp.225-232 / 2023
  • Owing to advancements in intelligent transportation systems (ITS) and artificial-intelligence technologies, various machine-learning models can be employed to simulate and predict the number of traffic accidents under different weather conditions. Furthermore, the relationship between weather and traffic accidents can be analyzed to assess whether the current weather conditions are suitable for travel, which can significantly reduce the risk of traffic accidents. In this study, we analyzed 30,000 traffic flow data points collected by traffic cameras at nearby intersections in Washington, D.C., USA, from October 2012 to May 2017, using a Pearson correlation heat map. We then applied several machine-learning algorithms commonly used in ITS, including random forest, decision tree, gradient-boosting regression, and support vector regression, to model the correlations between continuous features, and compared their predictive performance. The experimental results indicated that the gradient-boosting regression model had the best performance.
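
A Pearson heat map is a color-coded rendering of a matrix of pairwise Pearson correlation coefficients. The sketch below computes such a matrix in plain Python and leaves out the plotting step. The feature names and values are hypothetical and are not taken from the study's dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return {name_a: {name_b: r}} for every pair of named columns."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names}
            for a in names}


# Hypothetical weather and accident features.
data = {
    "rainfall_mm": [0.0, 2.0, 4.0, 6.0, 8.0],
    "accidents": [3.0, 4.0, 6.0, 7.0, 10.0],
    "visibility_km": [10.0, 8.0, 6.0, 4.0, 2.0],
}
matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, {other: round(r, 3) for other, r in row.items()})
```

Values near +1 or -1 indicate a strong linear relationship and values near 0 a weak one. The coefficient measures linear association only and does not by itself show that weather causes accidents.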

Investment, Export, and Exchange Rate on Prediction of Employment with Decision Tree, Random Forest, and Gradient Boosting Machine Learning Models (투자와 수출 및 환율의 고용에 대한 의사결정 나무, 랜덤 포레스트와 그래디언트 부스팅 머신러닝 모형 예측)

  • Chae-Deug Yi
    • Korea Trade Review / v.46 no.2 / pp.281-299 / 2021
  • This paper analyzes the feasibility of using machine learning methods to forecast employment. Machine learning methods, namely decision tree, artificial neural network, and ensemble models such as random forest and gradient boosting regression tree, were used to forecast employment in the Busan regional economy. The main findings from comparing their predictive abilities are as follows. First, machine learning methods can predict employment well. Second, the employment forecasts from decision tree models differed somewhat depending on the depth of the trees. Third, the artificial neural network model did not show high predictive power. Fourth, ensemble models such as random forest and gradient boosting regression tree showed higher predictive power. Thus, because machine learning methods can predict employment accurately, they should be used to improve the accuracy of employment forecasts.

Dynamic Caching Routing Strategy for LEO Satellite Nodes Based on Gradient Boosting Regression Tree

  • Yang Yang;Shengbo Hu;Guiju Lu
    • Journal of Information Processing Systems / v.20 no.1 / pp.131-147 / 2024
  • A routing strategy based on traffic prediction and dynamic cache allocation for satellite nodes is proposed to address the issues of high propagation delay and overall delay of inter-satellite and satellite-to-ground links in low Earth orbit (LEO) satellite systems. The spatial and temporal correlations of satellite network traffic were analyzed, and the relevant traffic through the target satellite was extracted as raw input for traffic prediction. An improved gradient boosting regression tree algorithm was used for traffic prediction. Based on the traffic prediction results, a dynamic cache allocation routing strategy is proposed. The satellite nodes periodically monitor the traffic load on inter-satellite links (ISLs) and dynamically allocate cache resources for each ISL with neighboring nodes. Simulation results demonstrate that the proposed routing strategy effectively reduces packet loss rate and average end-to-end delay and improves the distribution of services across the entire network.