• Title/Summary/Keyword: GBM model

Nakdong River Estuary Salinity Prediction Using Machine Learning Methods (머신러닝 기법을 활용한 낙동강 하구 염분농도 예측)

  • Lee, Hojun;Jo, Mingyu;Chun, Sejin;Han, Jungkyu
    • Smart Media Journal / v.11 no.2 / pp.31-38 / 2022
  • Promptly predicting changes in river salinity is important for anticipating damage to agriculture and ecosystems caused by salinity intrusion and for establishing disaster prevention measures. Because machine learning (ML) methods have a much lower computational cost than physics-based hydraulic models, they can predict river salinity in a relatively short time, and their shorter training time has made them a complementary technique to physics-based hydraulic models. Machine-learning-based salinity prediction has been studied actively around the world, but few such studies exist in South Korea. Using a large volume of publicly available data, we evaluated the performance of various machine learning techniques for predicting the salinity of the Nakdong River Estuary Basin. The LightGBM algorithm achieved an average RMSE of 0.37 and trained 2-20 times faster than the other algorithms, indicating that machine learning techniques can be applied to predict the salinity of rivers in Korea.
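
For orientation, a minimal sketch of the kind of LightGBM regression workflow this abstract describes; the features (discharge, tide level, etc.) and data below are synthetic stand-ins, not the actual Nakdong River dataset:

```python
# Hypothetical LightGBM regression sketch with synthetic salinity data.
import numpy as np
import lightgbm as lgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
# Assumed features: discharge, tide level, water temperature, precipitation
X = rng.normal(size=(5000, 4))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.3, size=5000)  # stand-in salinity

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

model = lgb.LGBMRegressor(n_estimators=500, learning_rate=0.05)
model.fit(X_tr, y_tr)

rmse = np.sqrt(mean_squared_error(y_te, model.predict(X_te)))
print(f"RMSE: {rmse:.3f}")
```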

Horse race rank prediction using learning-to-rank approaches (Learning-to-rank 기법을 활용한 서울 경마경기 순위 예측)

  • Junhyoung Chung;Donguk Shin;Seyong Hwang;Gunwoong Park
    • The Korean Journal of Applied Statistics / v.37 no.2 / pp.239-253 / 2024
  • This research applies both point-wise and pair-wise learning strategies within the learning-to-rank (LTR) framework to predict horse race rankings in Seoul. For point-wise learning, we employ a linear model and random forest; for pair-wise learning, we utilize tools such as RankNet and LambdaMART (XGBoost Ranker, LightGBM Ranker, and CatBoost Ranker). To enhance predictions, race records are standardized by race distance, and we integrate various datasets, including race information, jockey information, horse training records, and trainer information. Our results empirically demonstrate that pair-wise learning approaches, which can reflect the order information between items, generally outperform point-wise learning approaches, with CatBoost Ranker as the top performer. Through Shapley value analysis, we identified that the important variables for CatBoost Ranker include the performance of a horse, its previous race records, the count of its starting trainings, the total number of starting trainings, and the instances of disease diagnoses for the horse.
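
A hedged sketch of pair-wise learning-to-rank with LightGBM Ranker, one of the LambdaMART-style tools named above; the races, features, and relevance labels are synthetic placeholders:

```python
# Hypothetical LambdaMART-style ranking sketch on synthetic race data.
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(1)
n_races, horses_per_race = 200, 10
X = rng.normal(size=(n_races * horses_per_race, 5))   # e.g. past speed, jockey stats (assumed)
true_score = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=len(X))

# Relevance label per horse within each race: higher label = better expected finish
y = np.concatenate([
    np.argsort(np.argsort(true_score[i * horses_per_race:(i + 1) * horses_per_race]))
    for i in range(n_races)
])
group = [horses_per_race] * n_races                    # group sizes: one group per race

ranker = lgb.LGBMRanker(objective="lambdarank", n_estimators=300)
ranker.fit(X, y, group=group)

# Predicted finishing order for the first race (best horse first)
print(np.argsort(-ranker.predict(X[:horses_per_race])))
```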

Optimal Asset Allocation for Defined Contribution Pension to Minimize Shortfall Risk of Income Replacement Rate (소득대체율 부족 위험 최소화를 위한 확정기여형 퇴직연금제도의 최적자산배분)

  • Dong-Hwa Lee;Kyung-Jin Choi
    • Journal of the Korea Society for Simulation / v.33 no.1 / pp.27-34 / 2024
  • This study proposes an optimal asset allocation that minimizes the risk of the realized income replacement rate falling short of the OECD average. To do this, we define the shortfall risk of the replacement rate and compute the asset allocation that minimizes it as a function of the enrollment period, income level, and additional contributions. We consider stocks and deposits as investment assets, using Monte Carlo simulation with a GBM (geometric Brownian motion) model to generate return distributions for stocks. Our results show that participants with an enrollment period of less than 30 years should invest at least 70-80% of their funds in risky assets to minimize the shortfall risk. However, the proportion that needs to be invested in risky assets declines significantly when participants make additional contributions, an effect that is particularly pronounced among low-income individuals. Therefore, to achieve the OECD average replacement rate, the government needs to incentivize participants to invest more in risky assets while also providing policies that encourage additional contributions, especially for the low-income population.
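
Here GBM refers to geometric Brownian motion. A minimal sketch of this kind of Monte Carlo setup; the drift, volatility, deposit rate, contribution, allocation weight, and target fund level are purely illustrative assumptions:

```python
# Illustrative Monte Carlo shortfall-risk estimate with GBM stock returns.
import numpy as np

rng = np.random.default_rng(42)
mu, sigma = 0.06, 0.18          # assumed annual drift and volatility of stocks
rf = 0.02                       # assumed deposit (risk-free) rate
years, n_paths = 30, 100_000
contribution = 1.0              # annual contribution (normalized)
w = 0.7                         # share of funds in the risky asset

# Annual GBM gross returns: exp((mu - 0.5*sigma^2) + sigma * Z)
Z = rng.standard_normal((n_paths, years))
risky = np.exp((mu - 0.5 * sigma**2) + sigma * Z)
safe = np.full_like(risky, 1.0 + rf)
gross = w * risky + (1.0 - w) * safe     # annually rebalanced portfolio return

# Accumulate yearly contributions under the simulated returns
fund = np.zeros(n_paths)
for t in range(years):
    fund = (fund + contribution) * gross[:, t]

target = 45.0   # hypothetical fund level corresponding to the target replacement rate
print("estimated shortfall risk:", np.mean(fund < target))
```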

Machine Learning Based MMS Point Cloud Semantic Segmentation (머신러닝 기반 MMS Point Cloud 의미론적 분할)

  • Bae, Jaegu;Seo, Dongju;Kim, Jinsoo
    • Korean Journal of Remote Sensing / v.38 no.5_3 / pp.939-951 / 2022
  • The most important factor in designing autonomous driving systems is recognizing the exact location of the vehicle within the surrounding environment. To date, various sensors and navigation systems have been used for autonomous driving systems; however, all have limitations. Therefore, the need for high-definition (HD) maps that provide high-precision infrastructure information for safe and convenient autonomous driving is increasing. HD maps are drawn using three-dimensional point cloud data acquired through a mobile mapping system (MMS). However, this process requires manual work due to the large number of points and drawing layers, increasing the cost and effort associated with HD mapping. The objective of this study was to improve the efficiency of HD mapping by segmenting semantic information in an MMS point cloud into six classes: roads, curbs, sidewalks, medians, lanes, and other elements. Segmentation was performed using various machine learning techniques, including random forest (RF), support vector machine (SVM), k-nearest neighbor (KNN), and gradient-boosting machine (GBM), with 11 variables covering geometry, color, intensity, and other road design features. MMS point cloud data for a 130-m section of a five-lane road near Minam Station in Busan were used to evaluate the segmentation models; the average F1 scores of the models were 95.43% for RF, 92.1% for SVM, 91.05% for GBM, and 82.63% for KNN. The RF model showed the best segmentation performance, with F1 scores of 99.3%, 95.5%, 94.5%, 93.5%, and 90.1% for roads, sidewalks, curbs, medians, and lanes, respectively. The variable importance results of the RF model showed high mean decrease accuracy and mean decrease Gini for the XY dist. and Z dist. variables, respectively, both related to road design. Thus, variables related to road design contributed significantly to the segmentation of semantic information. The results of this study demonstrate the applicability of machine-learning-based segmentation of MMS point cloud data and will help to reduce the cost and effort associated with HD mapping.
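
An illustrative sketch of per-point random forest classification in the spirit of the six-class segmentation above; the 11 features and class labels are mocked rather than taken from MMS data:

```python
# Hypothetical per-point classification with a random forest on mocked features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
n_points = 20_000
X = rng.normal(size=(n_points, 11))   # stand-ins for geometry, color, intensity, road-design features
# Synthetic labels for 6 classes (0: road, 1: curb, 2: sidewalk, 3: median, 4: lane, 5: other)
y = np.digitize(X[:, 0] + 0.5 * X[:, 1], bins=[-2, -1, 0, 1, 2])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

rf = RandomForestClassifier(n_estimators=300, n_jobs=-1, random_state=0)
rf.fit(X_tr, y_tr)

pred = rf.predict(X_te)
print("per-class F1:", f1_score(y_te, pred, average=None))
print("feature importances:", rf.feature_importances_)
```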

Estimation of Ground-level PM10 and PM2.5 Concentrations Using Boosting-based Machine Learning from Satellite and Numerical Weather Prediction Data (부스팅 기반 기계학습기법을 이용한 지상 미세먼지 농도 산출)

  • Park, Seohui;Kim, Miae;Im, Jungho
    • Korean Journal of Remote Sensing / v.37 no.2 / pp.321-335 / 2021
  • Particulate matter (PM10 and PM2.5, with diameters less than 10 and 2.5 ㎛, respectively) can be absorbed by the human body and adversely affect human health. Although most PM monitoring relies on ground-based observations, these are limited to point measurement sites, which leads to uncertainty in PM estimation for regions without observation sites. This spatial limitation can be overcome by using satellite data. In this study, we developed machine-learning-based retrieval algorithms for ground-level PM10 and PM2.5 concentrations using aerosol parameters from the Geostationary Ocean Color Imager (GOCI) satellite and various meteorological parameters from a numerical weather prediction model from January to December 2019. Gradient Boosted Regression Trees (GBRT) and Light Gradient Boosting Machine (LightGBM) were used to estimate PM concentrations. Model performance was examined for two feature sets: all input parameters (Feature set 1) and a subset without meteorological and land-cover parameters (Feature set 2). Both models showed higher accuracy (about 10% higher R2) with Feature set 1 than with Feature set 2. The GBRT model using Feature set 1 was chosen as the final model for further analysis (PM10: R2 = 0.82, nRMSE = 34.9%; PM2.5: R2 = 0.75, nRMSE = 35.6%). The spatial distribution of the seasonal and annual-averaged PM concentrations was similar to the in-situ observations, except for the northeastern part of China with bright surface reflectance, and their seasonal changes matched the in-situ measurements well.
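
A rough sketch of comparing the same gradient-boosted regressor on two feature sets with R2 and nRMSE, as the abstract describes; the predictors and PM values are synthetic placeholders for the GOCI and NWP inputs:

```python
# Hypothetical GBRT comparison of two feature sets on synthetic PM10 data.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error

rng = np.random.default_rng(3)
n = 8000
X = rng.normal(size=(n, 10))   # stand-ins for AOD plus meteorological/land-cover variables
y = 40 + 10 * X[:, 0] + 5 * X[:, 4] + rng.normal(scale=5, size=n)   # stand-in PM10

def fit_and_score(cols):
    X_tr, X_te, y_tr, y_te = train_test_split(X[:, cols], y, test_size=0.2, random_state=0)
    model = GradientBoostingRegressor(n_estimators=300).fit(X_tr, y_tr)
    pred = model.predict(X_te)
    rmse = np.sqrt(mean_squared_error(y_te, pred))
    return r2_score(y_te, pred), 100 * rmse / y_te.mean()   # R2 and nRMSE (%)

print("Feature set 1 (all):    R2=%.2f nRMSE=%.1f%%" % fit_and_score(list(range(10))))
print("Feature set 2 (subset): R2=%.2f nRMSE=%.1f%%" % fit_and_score([0, 1, 2, 3]))
```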

Prediction of Larix kaempferi Stand Growth in Gangwon, Korea, Using Machine Learning Algorithms

  • Hyo-Bin Ji;Jin-Woo Park;Jung-Kee Choi
    • Journal of Forest and Environmental Science / v.39 no.4 / pp.195-202 / 2023
  • In this study, we compared and evaluated the accuracy and predictive performance of machine learning algorithms for estimating the growth of individual Larix kaempferi trees in Gangwon Province, Korea. We employed linear regression, random forest, XGBoost, and LightGBM algorithms to predict tree growth using monitoring data organized by thinning intensity. We then compared the goodness-of-fit of these models using the coefficient of determination (R2), mean absolute error (MAE), and root mean square error (RMSE). The results revealed that XGBoost provided the highest goodness-of-fit, with an R2 value of 0.62 across all thinning intensities, while also yielding the lowest MAE and RMSE, indicating the best model fit. When predicting the growth volume of individual trees after 3 years with the XGBoost model, the agreement was exceptionally high, reaching approximately 97% for all stand sites across the different thinning intensities. Notably, in non-thinned plots, the predicted volumes were approximately 2.1 m³ lower than the actual volumes, yet the agreement remained highly accurate at approximately 99.5%. These findings will contribute to the development of growth prediction models for individual trees using machine learning algorithms.
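
A brief sketch of an XGBoost regression evaluated with R2, MAE, and RMSE, as in the study; the predictors (DBH, height, age, thinning intensity) and growth values are simulated assumptions:

```python
# Hypothetical XGBoost regression with R2/MAE/RMSE evaluation on simulated stand data.
import numpy as np
from xgboost import XGBRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

rng = np.random.default_rng(7)
n = 3000
# Assumed predictors: DBH (cm), height (m), age (yr), thinning intensity class
X = np.column_stack([
    rng.uniform(10, 40, n), rng.uniform(8, 25, n),
    rng.uniform(15, 50, n), rng.integers(0, 3, n),
])
y = 0.02 * X[:, 0] ** 2 * X[:, 1] / 100 + rng.normal(scale=0.05, size=n)  # stand-in volume growth

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = XGBRegressor(n_estimators=500, learning_rate=0.05, max_depth=4)
model.fit(X_tr, y_tr)

pred = model.predict(X_te)
print("R2   =", r2_score(y_te, pred))
print("MAE  =", mean_absolute_error(y_te, pred))
print("RMSE =", np.sqrt(mean_squared_error(y_te, pred)))
```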

A Study on the Prediction of Rock Classification Using Shield TBM Data and Machine Learning Classification Algorithms (쉴드 TBM 데이터와 머신러닝 분류 알고리즘을 이용한 암반 분류 예측에 관한 연구)

  • Kang, Tae-Ho;Choi, Soon-Wook;Lee, Chulho;Chang, Soo-Ho
    • Tunnel and Underground Space / v.31 no.6 / pp.494-507 / 2021
  • With the increasing use of TBMs, research has recently been conducted in Korea to analyze TBM data with machine learning techniques to predict the ground in front of the TBM, the exchange cycle of disk cutters, and the TBM advance rate. In this study, the rock characteristics of slurry shield TBM sites were classified by combining traditional rock classification criteria with machine learning techniques widely used in various fields, applied to machine data collected during TBM excavation. The rock characteristic classification criteria were RQD, uniaxial compressive strength, and elastic wave velocity, and the rock conditions for each criterion were divided into three classes: class 0 (good), 1 (normal), and 2 (poor); machine learning was then performed with six classification algorithms. As a result, the ensemble models showed good performance, and the LightGBM model, which achieved excellent results in learning speed as well as learning performance, was found to be optimal for the ground of the target site. Using the classification models for the three rock characteristics defined in this study, it should be possible to estimate rock conditions for sections where ground information is not provided, which will help during excavation work.
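
A sketch of a three-class LightGBM classifier of the sort that could be trained per rock criterion (e.g., RQD); the TBM machine-data features and class boundaries here are hypothetical:

```python
# Hypothetical three-class (good/normal/poor) LightGBM classifier on mocked TBM machine data.
import numpy as np
import lightgbm as lgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(11)
n = 6000
# Assumed machine data: thrust, torque, advance rate, slurry pressure
X = rng.normal(size=(n, 4))
score = X[:, 0] - 0.8 * X[:, 1] + rng.normal(scale=0.5, size=n)
y = np.digitize(score, bins=[-0.5, 0.5])   # class 0 (good), 1 (normal), 2 (poor)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = lgb.LGBMClassifier(n_estimators=300)
clf.fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te), target_names=["good", "normal", "poor"]))
```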

Robust Multiloop Controller Design of Uncertain Affine TFM(Transfer Function Matrix) System (불확실한 Affine TFM(Transfer Function Matrix) 시스템의 강인한 다중 루프 제어기 설계)

  • Byun Hwang-Woo;Yang Hai-Won
    • The Transactions of the Korean Institute of Electrical Engineers D / v.54 no.1 / pp.17-25 / 2005
  • This paper provides sufficient conditions for the robustness of Affine linear TFM (Transfer Function Matrix) MIMO (Multi-Input Multi-Output) uncertain systems based on Rosenbrock's DNA (Direct Nyquist Array). The parametric uncertainty is modeled through an Affine TFM MIMO description, and the unstructured uncertainty through a bounded perturbation of Affine polynomials. Gershgorin's theorem and the concepts of diagonal dominance and GB (Gershgorin Bands) are extended to include model uncertainty. For this type of parametric robust performance, we show the robustness of the Affine TFM systems using Nyquist diagrams, GB, and the DNA. Multiloop PI/PID controllers can be tuned using a modified version of the Ziegler-Nichols (ZN) relations. Simulation examples show the performance and efficiency of the proposed multiloop design method.
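
A small numerical illustration of the Gershgorin-band idea underlying the DNA approach: checking row diagonal dominance of a transfer function matrix over a frequency grid. The 2x2 plant below is an assumed example, not the system studied in the paper:

```python
# Illustrative row-dominance check (Gershgorin bands / DNA) for an assumed 2x2 TFM.
import numpy as np

def G(s):
    # Hypothetical 2x2 transfer function matrix with first-order entries
    return np.array([
        [1.0 / (s + 1.0),  0.3 / (2.0 * s + 1.0)],
        [0.2 / (s + 2.0),  1.5 / (3.0 * s + 1.0)],
    ])

omega = np.logspace(-2, 2, 400)
dominant = []
for w in omega:
    Gjw = G(1j * w)
    diag = np.abs(np.diag(Gjw))
    offdiag = np.abs(Gjw).sum(axis=1) - diag   # Gershgorin band radius per row
    dominant.append(np.all(diag > offdiag))    # row dominance at this frequency

print("diagonally dominant at all sampled frequencies:", all(dominant))
```

When dominance holds, each loop can be tuned largely independently, which is what justifies the multiloop PI/PID tuning step described in the abstract.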

Controller Design and Stability Analysis of Affine System with Dead-Time (불감시간을 갖는 Affine 시스템의 안정도 해석과 제어기 설계)

  • Yang Hai-Won;Byun Hwang-Woo
    • Journal of Institute of Control, Robotics and Systems / v.11 no.2 / pp.93-102 / 2005
  • The Nyquist robust stability margin is proposed as a measure of robust stability for systems with Affine TFM (Transfer Function Matrix) parametric uncertainty. The parametric uncertainty is modeled through an Affine TFM MIMO (Multi-Input Multi-Output) description with dead-time, and the unstructured uncertainty through a bounded perturbation of Affine polynomials. Gershgorin's theorem and the concepts of diagonal dominance and GB (Gershgorin Bands) are extended to include model uncertainty. Multiloop PI/PID controllers can be tuned using a modified version of the Ziegler-Nichols (ZN) relations. Consequently, this paper provides sufficient conditions for the robustness of Affine TFM MIMO uncertain systems with dead-time based on Rosenbrock's DNA. Simulation examples show the performance and efficiency of the proposed multiloop design method for Affine uncertain systems with dead-time.
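
Extending the previous sketch with an assumed dead-time: each entry of G(jω) is multiplied by e^(-jωτ), which leaves the Gershgorin band radii unchanged (|e^(-jωτ)| = 1) but shifts the phase of the Nyquist loci, and therefore the stability assessment. The dead-time values below are illustrative only:

```python
# Illustrative dead-time factor in the frequency response of an assumed 2x2 TFM.
import numpy as np

tau = np.array([[0.5, 0.2],
                [0.3, 0.4]])   # hypothetical dead-times (s) for each TFM entry

def G_dead(w):
    s = 1j * w
    base = np.array([
        [1.0 / (s + 1.0),  0.3 / (2.0 * s + 1.0)],
        [0.2 / (s + 2.0),  1.5 / (3.0 * s + 1.0)],
    ])
    return base * np.exp(-s * tau)   # dead-time only rotates each entry in the complex plane

Gjw = G_dead(1.0)
diag = np.abs(np.diag(Gjw))
offdiag = np.abs(Gjw).sum(axis=1) - diag
print("row dominant at omega = 1 rad/s:", np.all(diag > offdiag))
```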

Performance comparison between Decision tree model and TabNet for loan repayment prediction (대출 상환 예측을 위한 의사결정나무모델과 TabNet 간 성능 비교)

  • Sujin Han;Hyeoncheol Kim
    • Proceedings of the Korea Information Processing Society Conference / 2023.11a / pp.453-455 / 2023
  • This study proposes a model for predicting whether customers will repay their loans, with the goal of automating risk management at banks. As prediction models, we compare decision-tree-based models that have traditionally performed well on tabular data such as financial data (LightGBM, CatBoost, and XGB) with TabNet, a recently proposed explainable deep-learning model for tabular data. Because the loan repayment data consist of imbalanced classes, resampling is applied; we compare SMOTE, random undersampling, and a hybrid approach and recommend the sampling technique with the best performance. In predicting loan repayment, the TabNet model outperformed the decision-tree-based models, confirming that deep-learning models have the potential to replace decision-tree-based models on tabular data.
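
A rough sketch of the resampling comparison described above (SMOTE, random undersampling, and no resampling), each paired with a LightGBM classifier on a synthetic imbalanced dataset; TabNet is omitted here to keep the dependencies light:

```python
# Hypothetical resampling comparison on a synthetic imbalanced loan dataset.
import lightgbm as lgb
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

# Synthetic stand-in for imbalanced repayment data (5% minority class)
X, y = make_classification(n_samples=20_000, n_features=15, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

samplers = {
    "none": None,
    "SMOTE": SMOTE(random_state=0),
    "undersampling": RandomUnderSampler(random_state=0),
}
for name, sampler in samplers.items():
    Xr, yr = (X_tr, y_tr) if sampler is None else sampler.fit_resample(X_tr, y_tr)
    clf = lgb.LGBMClassifier(n_estimators=300).fit(Xr, yr)
    print(f"{name:>13}: F1 = {f1_score(y_te, clf.predict(X_te)):.3f}")
```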