• Title/Summary/Keyword: Boosting methods

Search Result 211, Processing Time 0.025 seconds

Generalized Partially Linear Additive Models for Credit Scoring

  • Shim, Ju-Hyun;Lee, Young-K.
    • The Korean Journal of Applied Statistics
    • /
    • v.24 no.4
    • /
    • pp.587-595
    • /
    • 2011
  • Credit scoring is an objective and automatic system to assess the credit risk of each customer. The logistic regression model is one of the popular methods of credit scoring to predict the default probability; however, it may not detect possible nonlinear features of predictors despite the advantages of interpretability and low computation cost. In this paper, we propose to use a generalized partially linear model as an alternative to logistic regression. We also introduce modern ensemble technologies such as bagging, boosting and random forests. We compare these methods via a simulation study and illustrate them through a German credit dataset.

Classification of Soil Creep Hazard Class Using Machine Learning (기계학습기법을 이용한 땅밀림 위험등급 분류)

  • Lee, Gi Ha;Le, Xuan-Hien;Yeon, Min Ho;Seo, Jun Pyo;Lee, Chang Woo
    • Journal of Korean Society of Disaster and Security
    • /
    • v.14 no.3
    • /
    • pp.17-27
    • /
    • 2021
  • In this study, classification models were built using machine learning techniques that can classify the soil creep risk into three classes from A to C (A: risk, B: moderate, C: good). A total of six machine learning techniques were used: K-Nearest Neighbor, Support Vector Machine, Logistic Regression, Decision Tree, Random Forest, and Extreme Gradient Boosting and then their classification accuracy was analyzed using the nationwide soil creep field survey data in 2019 and 2020. As a result of classification accuracy analysis, all six methods showed excellent accuracy of 0.9 or more. The methods where numerical data were applied for data training showed better performance than the methods based on character data of field survey evaluation table. Moreover, the methods learned with the data group (R1~R4) reflecting the expert opinion had higher accuracy than the field survey evaluation score data group (C1~C4). The machine learning can be used as a tool for prediction of soil creep if high-quality data are continuously secured and updated in the future.

Prediction of Soil Moisture with Open Source Weather Data and Machine Learning Algorithms (공공 기상데이터와 기계학습 모델을 이용한 토양수분 예측)

  • Jang, Young-bin;Jang, Ik-hoon;Choe, Young-chan
    • Korean Journal of Agricultural and Forest Meteorology
    • /
    • v.22 no.1
    • /
    • pp.1-12
    • /
    • 2020
  • As one of the essential resources in the agricultural process, soil moisture has been carefully managed by predicting future changes and deficits. In recent years, statistics and machine learning based approach to predict soil moisture has been preferred in academia for its generalizability and ease of use in the field. However, little is known that machine learning based soil moisture prediction is applicable in the situation of South Korea. In this sense, this paper aims to examine 1) whether publicly available weather data generated in South Korea has sufficient quality to predict soil moisture, 2) which machine learning algorithm would perform best in the situation of South Korea, and 3) whether a single machine learning model could be generally applicable in various regions. We used various machine learning methods such as Support Vector Machines (SVM), Random Forest (RF), Extremely Randomized Trees (ET), Gradient Boosting Machines (GBM), and Deep Feedforward Network (DFN) to predict future soil moisture in Andong, Boseong, Cheolwon, Suncheon region with open source weather data. As a result, GBM model showed the lowest prediction error in every data set we used (R squared: 0.96, RMSE: 1.8). Furthermore, GBM showed the lowest variance of prediction error between regions which indicates it has the highest generalizability.

Estimating Farmland Prices Using Distance Metrics and an Ensemble Technique (거리척도와 앙상블 기법을 활용한 지가 추정)

  • Lee, Chang-Ro;Park, Key-Ho
    • Journal of Cadastre & Land InformatiX
    • /
    • v.46 no.2
    • /
    • pp.43-55
    • /
    • 2016
  • This study estimated land prices using instance-based learning. A k-nearest neighbor method was utilized among various instance-based learning methods, and the 10 distance metrics including Euclidean distance were calculated in k-nearest neighbor estimation. One distance metric prediction which shows the best predictive performance would be normally chosen as final estimate out of 10 distance metric predictions. In contrast to this practice, an ensemble technique which combines multiple predictions to obtain better performance was applied in this study. We applied the gradient boosting algorithm, a sort of residual-fitting model to our data in ensemble combining. Sales price data of farm lands in Haenam-gun, Jeolla Province were used to demonstrate advantages of instance-based learning as well as an ensemble technique. The result showed that the ensemble prediction was more accurate than previous 10 distance metric predictions.

A Study on Eye Detection by Using Adaboost for Iris Recognition in Mobile Environments (Adaboost를 이용한 모바일 환경에서의 홍채인식을 위한 눈 검출에 관한 연구)

  • Park, Kang-Ryoung;Park, Sung-Hyo;Cho, Dal-Ho
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.45 no.4
    • /
    • pp.1-11
    • /
    • 2008
  • In this paper, we propose the new eye detection method by using adaboost (adaptive boosting) method. Also, to reduce the false alarm rate which identifies the non-eye region as genuine eye that is the Problems of previous method using conventional adaboost, we proposed the post processing methods which used the cornea specular reflection and determined the optimized ratio of eye detecting box. Based on detected eye region by using adaboost, we performed the double circular edge detector for localizing a pupil and an iris region at the same time. Experimental results showed that the accuracy of eye detection was about 98% and the processing time was less than 1 second in mobile device.

Enhancing of Red Tide Blooms Prediction using Ensemble Train (앙상블 학습을 이용한 적조 발생 예측의 성능향상)

  • Park, Sun;Jeong, Min-A;Lee, Seong-Ro
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.49 no.1
    • /
    • pp.41-48
    • /
    • 2012
  • Red tide is a natural phenomenon temporary blooming harmful algal with changing sea color from normal to red, which fish and shellfish die en masse. It also give a bad influence to coastal environment and sea ecosystem. The damage of sea farming by a red tide has been occurred each year which it cost much to prevent disasters of red tide blooms. Red tide damage and prevention cost of red tide disasters can be minimized by means of prediction of red tide blooms. In this paper, we proposed the red tide blooms prediction method using ensemble train. The proposed method use the bagging and boosting ensemble train methods for enhancing red tide prediction and forecast. The experimental results demonstrate that the proposed method achieves a better red tide prediction performance than other single classifiers.

Effective Korean sentiment classification method using word2vec and ensemble classifier (Word2vec과 앙상블 분류기를 사용한 효율적 한국어 감성 분류 방안)

  • Park, Sung Soo;Lee, Kun Chang
    • Journal of Digital Contents Society
    • /
    • v.19 no.1
    • /
    • pp.133-140
    • /
    • 2018
  • Accurate sentiment classification is an important research topic in sentiment analysis. This study suggests an efficient classification method of Korean sentiment using word2vec and ensemble methods which have been recently studied variously. For the 200,000 Korean movie review texts, we generate a POS-based BOW feature and a feature using word2vec, and integrated features of two feature representation. We used a single classifier of Logistic Regression, Decision Tree, Naive Bayes, and Support Vector Machine and an ensemble classifier of Adaptive Boost, Bagging, Gradient Boosting, and Random Forest for sentiment classification. As a result of this study, the integrated feature representation composed of BOW feature including adjective and adverb and word2vec feature showed the highest sentiment classification accuracy. Empirical results show that SVM, a single classifier, has the highest performance but ensemble classifiers show similar or slightly lower performance than the single classifier.

Store Sales Prediction Using Gradient Boosting Model (그래디언트 부스팅 모델을 활용한 상점 매출 예측)

  • Choi, Jaeyoung;Yang, Heeyoon;Oh, Hayoung
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.25 no.2
    • /
    • pp.171-177
    • /
    • 2021
  • Through the rapid developments in machine learning, there have been diverse utilization approaches not only in industrial fields but also in daily life. Implementations of machine learning on financial data, also have been of interest. Herein, we employ machine learning algorithms to store sales data and present future applications for fintech enterprises. We utilize diverse missing data processing methods to handle missing data and apply gradient boosting machine learning algorithms; XGBoost, LightGBM, CatBoost to predict the future revenue of individual stores. As a result, we found that using median imputation onto missing data with the appliance of the xgboost algorithm has the best accuracy. By employing the proposed method, fintech enterprises and customers can attain benefits. Stores can benefit by receiving financial assistance beforehand from fintech companies, while these corporations can benefit by offering financial support to these stores with low risk.

Quality Prediction Model for Manufacturing Process of Free-Machining 303-series Stainless Steel Small Rolling Wire Rods (쾌삭 303계 스테인리스강 소형 압연 선재 제조 공정의 생산품질 예측 모형)

  • Seo, Seokjun;Kim, Heungseob
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.44 no.4
    • /
    • pp.12-22
    • /
    • 2021
  • This article suggests the machine learning model, i.e., classifier, for predicting the production quality of free-machining 303-series stainless steel(STS303) small rolling wire rods according to the operating condition of the manufacturing process. For the development of the classifier, manufacturing data for 37 operating variables were collected from the manufacturing execution system(MES) of Company S, and the 12 types of derived variables were generated based on literature review and interviews with field experts. This research was performed with data preprocessing, exploratory data analysis, feature selection, machine learning modeling, and the evaluation of alternative models. In the preprocessing stage, missing values and outliers are removed, and oversampling using SMOTE(Synthetic oversampling technique) to resolve data imbalance. Features are selected by variable importance of LASSO(Least absolute shrinkage and selection operator) regression, extreme gradient boosting(XGBoost), and random forest models. Finally, logistic regression, support vector machine(SVM), random forest, and XGBoost are developed as a classifier to predict the adequate or defective products with new operating conditions. The optimal hyper-parameters for each model are investigated by the grid search and random search methods based on k-fold cross-validation. As a result of the experiment, XGBoost showed relatively high predictive performance compared to other models with an accuracy of 0.9929, specificity of 0.9372, F1-score of 0.9963, and logarithmic loss of 0.0209. The classifier developed in this study is expected to improve productivity by enabling effective management of the manufacturing process for the STS303 small rolling wire rods.

Fuzzy-based Segment-Boost Method for Effective Face Recognition (퍼지기반 Segment-Boost 방법을 통한 효과적인 얼굴인식)

  • Chang, Won-Suk;Noh, Chang-Hyeon;Lee, Jong-Sik
    • Journal of the Korea Society for Simulation
    • /
    • v.18 no.1
    • /
    • pp.17-25
    • /
    • 2009
  • This paper suggests fuzzy-based Segment-Boost method and an effective method for face recognition using the fuzzy-based Segment-Boost. Fuzzy-based Segment-Boost eliminates the limitations of Segment-Boost, and it guarantees improved learning performance and the stability of the performance. By using the fuzzy theory, fuzzy-based Segment-Boost optimizes the selection number of sub-vectors, and leads the optimized learning performance. The fuzzy controller designed in this paper measures learning performance of the fuzzy-based Segment-Boost, and it controls the selection number of sub-vectors by inferring the optimized selection number. The simulation results show that the fuzzy controller inferred the selection number which is very approximate to the true optimized value. As a result, fuzzy-based Segment-Boost showed higher face recognition rate than compared boosting methods and it preserves the velocity of feature selection as fast as that of Segment-Boost. From the experimental results, it was proved that fuzzy-based Segment-Boost has improved and stable performances of learning, feature selection and face recognition.