• Title/Summary/Keyword: Gradient boosting

Search Result 240, Processing Time 0.027 seconds

Comparison of machine learning algorithms for regression and classification of ultimate load-carrying capacity of steel frames

  • Kim, Seung-Eock;Vu, Quang-Viet;Papazafeiropoulos, George;Kong, Zhengyi;Truong, Viet-Hung
    • Steel and Composite Structures
    • /
    • v.37 no.2
    • /
    • pp.193-209
    • /
    • 2020
  • In this paper, the efficiency of five Machine Learning (ML) methods consisting of Deep Learning (DL), Support Vector Machine (SVM), Random Forest (RF), Decision Tree (DT), and Gradient Tree Booting (GTB) for regression and classification of the Ultimate Load Factor (ULF) of nonlinear inelastic steel frames is compared. For this purpose, a two-story, a six-story, and a twenty-story space frame are considered. An advanced nonlinear inelastic analysis is carried out for the steel frames to generate datasets for the training of the considered ML methods. In each dataset, the input variables are the geometric features of W-sections and the output variable is the ULF of the frame. The comparison between the five ML methods is made in terms of the mean-squared-error (MSE) for the regression models and the accuracy for the classification models, respectively. Moreover, the ULF distribution curve is calculated for each frame and the strength failure probability is estimated. It is found that the GTB method has the best efficiency in both regression and classification of ULF regardless of the number of training samples and the space frames considered.

Machine Learning Methods to Predict Vehicle Fuel Consumption

  • Ko, Kwangho
    • Journal of the Korea Society of Computer and Information
    • /
    • v.27 no.9
    • /
    • pp.13-20
    • /
    • 2022
  • It's proposed and analyzed ML(Machine Learning) models to predict vehicle FC(Fuel Consumption) in real-time. The test driving was done for a car to measure vehicle speed, acceleration, road gradient and FC for training dataset. The various ML models were trained with feature data of speed, acceleration and road-gradient for target FC. There are two kind of ML models and one is regression type of linear regression and k-nearest neighbors regression and the other is classification type of k-nearest neighbors classifier, logistic regression, decision tree, random forest and gradient boosting in the study. The prediction accuracy is low in range of 0.5 ~ 0.6 for real-time FC and the classification type is more accurate than the regression ones. The prediction error for total FC has very low value of about 0.2 ~ 2.0% and regression models are more accurate than classification ones. It's for the coefficient of determination (R2) of accuracy score distributing predicted values along mean of targets as the coefficient decreases. Therefore regression models are good for total FC and classification ones are proper for real-time FC prediction.

A Hybrid Multi-Level Feature Selection Framework for prediction of Chronic Disease

  • G.S. Raghavendra;Shanthi Mahesh;M.V.P. Chandrasekhara Rao
    • International Journal of Computer Science & Network Security
    • /
    • v.23 no.12
    • /
    • pp.101-106
    • /
    • 2023
  • Chronic illnesses are among the most common serious problems affecting human health. Early diagnosis of chronic diseases can assist to avoid or mitigate their consequences, potentially decreasing mortality rates. Using machine learning algorithms to identify risk factors is an exciting strategy. The issue with existing feature selection approaches is that each method provides a distinct set of properties that affect model correctness, and present methods cannot perform well on huge multidimensional datasets. We would like to introduce a novel model that contains a feature selection approach that selects optimal characteristics from big multidimensional data sets to provide reliable predictions of chronic illnesses without sacrificing data uniqueness.[1] To ensure the success of our proposed model, we employed balanced classes by employing hybrid balanced class sampling methods on the original dataset, as well as methods for data pre-processing and data transformation, to provide credible data for the training model. We ran and assessed our model on datasets with binary and multivalued classifications. We have used multiple datasets (Parkinson, arrythmia, breast cancer, kidney, diabetes). Suitable features are selected by using the Hybrid feature model consists of Lassocv, decision tree, random forest, gradient boosting,Adaboost, stochastic gradient descent and done voting of attributes which are common output from these methods.Accuracy of original dataset before applying framework is recorded and evaluated against reduced data set of attributes accuracy. The results are shown separately to provide comparisons. Based on the result analysis, we can conclude that our proposed model produced the highest accuracy on multi valued class datasets than on binary class attributes.[1]

Korean Dependency Parsing Using Various Ensemble Models (다양한 앙상블 알고리즘을 이용한 한국어 의존 구문 분석)

  • Jo, Gyeong-Cheol;Kim, Ju-Wan;Kim, Gyun-Yeop;Park, Seong-Jin;Gang, Sang-U
    • Annual Conference on Human and Language Technology
    • /
    • 2019.10a
    • /
    • pp.543-545
    • /
    • 2019
  • 본 논문은 최신 한국어 의존 구문 분석 모델(Korean dependency parsing model)들과 다양한 앙상블 모델(ensemble model)들을 결합하여 그 성능을 분석한다. 단어 표현은 미리 학습된 워드 임베딩 모델(word embedding model)과 ELMo(Embedding from Language Model), Bert(Bidirectional Encoder Representations from Transformer) 그리고 다양한 추가 자질들을 사용한다. 또한 사용된 의존 구문 분석 모델로는 Stack Pointer Network Model, Deep Biaffine Attention Parser와 Left to Right Pointer Parser를 이용한다. 최종적으로 각 모델의 분석 결과를 앙상블 모델인 Bagging 기법과 XGBoost(Extreme Gradient Boosting) 이용하여 최적의 모델을 제안한다.

  • PDF

LightGBM Based Prediction of East Sea Vertical Temperature Profile Using XBT Data (XBT 데이터를 이용한 LightGBM 기반 동해 수직 수온분포 예측)

  • Kim, Young-Joo;Lee, Soo-Jin
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2022.07a
    • /
    • pp.27-28
    • /
    • 2022
  • 최근 우리나라에서도 인공지능 모델을 이용한 수온 예측 관련 연구가 활발히 진행되고 있으나 한반도 주변 해역의 수온 예측 연구에서는 주로 해수면 온도만을 예측하는데 중점을 두고 있다. 본 논문에서는 XBT(eXpendable Bathy-Thermograph) 데이터와 LightGBM(Light Gradient Boosting Model)을 이용하여 잠수함 작전 및 대잠전(Anti Submarine Warfare)에 있어서 군사적으로 중요한 동해의 수직 수온분포를 예측하였다. 동해 특정해역의 해수면부터 수심 200m까지 측정된 XBT 데이터를 이용하여 모델을 학습시키고 성능 평가지표(MAE, MSE, RMSE)와 수직 수온분포 그래프를 통해 예측 정확도를 평가하였다.

  • PDF

An Ensemble Model for Credit Default Discrimination: Incorporating BERT-based NLP and Transformer

  • Sophot Ky;Ju-Hong Lee
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2023.05a
    • /
    • pp.624-626
    • /
    • 2023
  • Credit scoring is a technique used by financial institutions to assess the creditworthiness of potential borrowers. This involves evaluating a borrower's credit history to predict the likelihood of defaulting on a loan. This paper presents an ensemble of two Transformer based models within a framework for discriminating the default risk of loan applications in the field of credit scoring. The first model is FinBERT, a pretrained NLP model to analyze sentiment of financial text. The second model is FT-Transformer, a simple adaptation of the Transformer architecture for the tabular domain. Both models are trained on the same underlying data set, with the only difference being the representation of the data. This multi-modal approach allows us to leverage the unique capabilities of each model and potentially uncover insights that may not be apparent when using a single model alone. We compare our model with two famous ensemble-based models, Random Forest and Extreme Gradient Boosting.

A Study on Smoker Prediction Using Machine Learning Algorithm (기계학습 알고리즘을 이용한 흡연자 예측 연구)

  • Jongwoo Baek;Joonil Bang;Joowon Lee;Hwajong Kim
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2023.07a
    • /
    • pp.537-538
    • /
    • 2023
  • 본 논문에서는 사람에게서 나타나는 생체 특성과 흡연여부의 상관관계 분석을 위해 랜덤 포레스트와 그래디언트 부스팅 트리의 두 가지 기계학습 알고리즘을 사용하였다. 연구에 사용된 데이터는 국민건강보험공단에서 제공하고 Kaggle에서 취합하여 정리한 건강검진 정보를 사용하였다. 분류 모델의 학습에 있어 혈청 정보가 높은 관계성을 보일 것으로 예상하였으나, 실제 결과는 성별이 가장 큰 영향을 끼치는 것으로 확인되었다.

  • PDF

Harvest Forecasting Improvement Using Federated Learning and Ensemble Model

  • Ohnmar Khin;Jin Gwang Koh;Sung Keun Lee
    • Smart Media Journal
    • /
    • v.12 no.10
    • /
    • pp.9-18
    • /
    • 2023
  • Harvest forecasting is the great demand of multiple aspects like temperature, rain, environment, and their relations. The existing study investigates the climate conditions and aids the cultivators to know the harvest yields before planting in farms. The proposed study uses federated learning. In addition, the additional widespread techniques such as bagging classifier, extra tees classifier, linear discriminant analysis classifier, quadratic discriminant analysis classifier, stochastic gradient boosting classifier, blending models, random forest regressor, and AdaBoost are utilized together. These presented nine algorithms achieved exemplary satisfactory accuracies. The powerful contributions of proposed algorithms can create exact harvest forecasting. Ultimately, we intend to compare our study with the earlier research's results.

Performance Analysis of Building Damage Prediction Models using Earthquake Data (지진 데이터를 이용한 건물 피해 예측 모델의 성능 분석)

  • Songhwa Chae;Yujin Lim
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2023.11a
    • /
    • pp.547-548
    • /
    • 2023
  • 내진 설계가 되어있지 않은 건물의 경우, 지진으로 인해 건물 붕괴 가능성이 높아지며 이로 인해 많은 인명 피해가 발생할 수 있다. 지진으로 인한 건물의 피해를 예측하고 이를 기반으로 취약점을 보완한다면 인명 피해를 줄일 수 있으므로 건물 피해 예측 모델에 대한 연구가 필요하다. 본 논문에서는 2015 년 네팔 대지진으로 인해 손상된 건물 데이터를 활용하여 Random Forest 와 Extreme Gradient Boosting 기계학습 분류 알고리즘을 사용하여 지진 피해 예측 모델의 정확도를 비교하였다.

Comparative studies of different machine learning algorithms in predicting the compressive strength of geopolymer concrete

  • Sagar Paruthi;Ibadur Rahman;Asif Husain
    • Computers and Concrete
    • /
    • v.32 no.6
    • /
    • pp.607-613
    • /
    • 2023
  • The objective of this work is to determine the compressive strength of geopolymer concrete utilizing four distinct machine learning approaches. These techniques are known as gradient boosting machine (GBM), generalized linear model (GLM), extremely randomized trees (XRT), and deep learning (DL). Experimentation is performed to collect the data that is then utilized for training the models. Compressive strength is the response variable, whereas curing days, curing temperature, silica fume, and nanosilica concentration are the different input parameters that are taken into consideration. Several kinds of errors, including root mean square error (RMSE), coefficient of correlation (CC), variance account for (VAF), RMSE to observation's standard deviation ratio (RSR), and Nash-Sutcliffe effectiveness (NSE), were computed to determine the effectiveness of each algorithm. It was observed that, among all the models that were investigated, the GBM is the surrogate model that can predict the compressive strength of the geopolymer concrete with the highest degree of precision.