• Title/Summary/Keyword: boosting algorithm

Ensemble Learning with Support Vector Machines for Bond Rating (회사채 신용등급 예측을 위한 SVM 앙상블학습)

  • Kim, Myoung-Jong
    • Journal of Intelligence and Information Systems, v.18 no.2, pp.29-45, 2012
  • Bond rating is regarded as an important event for measuring the financial risk of companies and for determining the investment returns of investors. As a result, predicting companies' credit ratings with statistical and machine learning techniques has been a popular research topic. Statistical techniques, including multiple regression, multiple discriminant analysis (MDA), logistic models (LOGIT), and probit analysis, have traditionally been used in bond rating. However, one major drawback is that they rest on strict assumptions, including linearity, normality, independence among predictor variables, and pre-existing functional forms relating the criterion variables and the predictor variables. These strict assumptions have limited the application of traditional statistics to the real world. Machine learning techniques used in bond rating prediction models include decision trees (DT), neural networks (NN), and Support Vector Machines (SVM). In particular, SVM is recognized as a new and promising classification and regression analysis method. SVM learns a separating hyperplane that maximizes the margin between two categories. SVM is simple enough to be analyzed mathematically and leads to high performance in practical applications. SVM implements the structural risk minimization principle and seeks to minimize an upper bound of the generalization error. In addition, the solution of SVM may be a global optimum, and thus overfitting is unlikely to occur with SVM. Furthermore, SVM does not require many data samples for training, since it builds prediction models using only some representative samples near the boundaries, called support vectors. A number of experimental studies have indicated that SVM has been successfully applied in a variety of pattern recognition fields. However, there are three major drawbacks that can degrade SVM's performance.
First, SVM was originally proposed for solving binary-class classification problems. Methods for combining SVMs for multi-class classification, such as One-Against-One and One-Against-All, have been proposed, but they do not improve performance in multi-class classification problems as much as SVM does for binary-class classification. Second, approximation algorithms (e.g., decomposition methods, the sequential minimal optimization algorithm) can be used to reduce computation time in multi-class problems, but they can deteriorate classification performance. Third, multi-class prediction problems are made difficult by the data imbalance problem, which occurs when the number of instances in one class greatly outnumbers the number of instances in another class. Such data sets often cause a default classifier to be built due to a skewed boundary, which reduces the classification accuracy of the classifier. SVM ensemble learning is one of the machine learning methods for coping with the above drawbacks. Ensemble learning is a method for improving the performance of classification and prediction algorithms. AdaBoost is one of the most widely used ensemble learning techniques. It constructs a composite classifier by sequentially training classifiers while increasing the weight on misclassified observations through iterations. Observations that are incorrectly predicted by previous classifiers are chosen more often than observations that are correctly predicted. Thus boosting attempts to produce new classifiers that are better able to predict examples for which the current ensemble's performance is poor. In this way, it can reinforce the training of misclassified observations of the minority class. This paper proposes a multiclass Geometric Mean-based Boosting (MGM-Boost) to resolve the multiclass prediction problem.
Since MGM-Boost introduces the notion of the geometric mean into AdaBoost, its learning process can consider the geometric mean-based accuracy and errors across multiple classes. This study applies MGM-Boost to a real-world bond rating case for Korean companies to examine its feasibility. 10-fold cross-validation is performed three times with different random seeds to ensure that the comparison among the three classifiers does not happen by chance. For each 10-fold cross-validation, the entire data set is first partitioned into ten equal-sized sets, and then each set is in turn used as the test set while the classifier trains on the other nine sets. That is, cross-validated folds have been tested independently of each algorithm. Through these steps, we have obtained results for the classifiers on each of the 30 experiments. In the comparison of arithmetic mean-based prediction accuracy between individual classifiers, MGM-Boost (52.95%) shows higher prediction accuracy than both AdaBoost (51.69%) and SVM (49.47%). MGM-Boost (28.12%) also shows higher prediction accuracy than AdaBoost (24.65%) and SVM (15.42%) in terms of geometric mean-based prediction accuracy. A t-test is used to examine whether the performance of each classifier over the 30 folds is significantly different. The results indicate that the performance of MGM-Boost is significantly different from that of the AdaBoost and SVM classifiers at the 1% level. These results mean that MGM-Boost can provide robust and stable solutions to multi-class problems such as bond rating.
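To make the contrast between arithmetic and geometric mean-based accuracy concrete, the following Python sketch may help. It is not the paper's MGM-Boost: it assumes that geometric mean-based accuracy is the geometric mean of per-class recalls, and it shows one reweighting round of standard binary (discrete) AdaBoost. All function names are hypothetical.

```python
import math
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Return {class: fraction of that class's samples predicted correctly}."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / n for c, n in totals.items()}


def arithmetic_accuracy(y_true, y_pred):
    """Plain accuracy: share of all samples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def geometric_mean_accuracy(y_true, y_pred):
    """Geometric mean of per-class recalls (assumed definition)."""
    recalls = list(per_class_recall(y_true, y_pred).values())
    return math.prod(recalls) ** (1.0 / len(recalls))


def adaboost_reweight(weights, correct):
    """One reweighting round of binary discrete AdaBoost.

    weights: current sample weights; correct: True where the weak learner
    was right. Requires a weighted error strictly between 0 and 1.
    """
    err = sum(w for w, ok in zip(weights, correct) if not ok) / sum(weights)
    alpha = 0.5 * math.log((1.0 - err) / err)
    raised = [w * math.exp(-alpha if ok else alpha)
              for w, ok in zip(weights, correct)]
    total = sum(raised)
    return [w / total for w in raised], alpha


# An imbalanced toy set: a classifier that always predicts the majority class.
y_true = ["A"] * 8 + ["B"] * 2
y_pred = ["A"] * 10
print(arithmetic_accuracy(y_true, y_pred))      # looks acceptable
print(geometric_mean_accuracy(y_true, y_pred))  # collapses: class B is never found

# Misclassified samples gain weight for the next weak learner.
new_weights, alpha = adaboost_reweight([0.25] * 4, [True, True, True, False])
print(new_weights, alpha)
```

Because the geometric mean falls to zero whenever any class is missed entirely, it penalizes a classifier that ignores a minority class, which plain accuracy does not.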

Adaptive image contrast enhancement algorithm based on block approach (블럭방법에 근거한 영상의 적응적 대비증폭 알고리즘)

  • Kim, Yeong-Hwa
    • Journal of the Korean Data and Information Science Society, v.22 no.3, pp.371-380, 2011
  • Noise arising from a variety of sources degrades the quality of the input image when an image reproducing device is used. The basic difficulty in solving this problem is that the noise and the signal are hard to distinguish. Contrast enhancement such as unsharp masking is one of the most important procedures for improving the quality of input images. Conventional unsharp masking enhances images by adding their amplified high-frequency components. The noise component of the input images, however, also tends to be amplified due to the nature of unsharp masking. This paper considers a block approach for detecting noise and image features in the input image so that unsharp masking can be applied adaptively. Simulation results show that the proposed algorithm makes it possible to enhance the contrast of the image without boosting the noisy components.
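The idea of applying unsharp masking block by block can be illustrated with a small Python sketch on a 1-D signal. The abstract does not describe the paper's detection rule, so the rule used here (amplify a block only when its largest high-frequency magnitude reaches a threshold) is an assumption, as are the function names and parameter values.

```python
def box_blur(signal, radius=1):
    """Moving-average low-pass filter with shrinking windows at the edges."""
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def adaptive_unsharp(signal, block=4, gain=1.5, threshold=4.0):
    """Unsharp masking applied only to blocks that look like image features.

    high = signal - blur is the high-frequency component. Blocks whose
    largest |high| is below `threshold` are assumed to contain only noise
    and are left untouched (hypothetical rule, not the paper's detector).
    """
    blurred = box_blur(signal)
    high = [s - b for s, b in zip(signal, blurred)]
    out = list(signal)
    for start in range(0, len(signal), block):
        chunk = high[start:start + block]
        if max(abs(h) for h in chunk) >= threshold:
            for i in range(start, start + len(chunk)):
                out[i] = signal[i] + gain * high[i]
    return out


# First block: small fluctuations (noise-like). Second block: a real edge.
signal = [10, 11, 10, 11, 10, 10, 50, 50]
enhanced = adaptive_unsharp(signal)
print(enhanced)
```

In this toy input the noisy block passes through unchanged while the edge in the second block is steepened. A practical version would also clip the result to the valid intensity range.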

EAR: Enhanced Augmented Reality System for Sports Entertainment Applications

  • Mahmood, Zahid;Ali, Tauseef;Muhammad, Nazeer;Bibi, Nargis;Shahzad, Imran;Azmat, Shoaib
    • KSII Transactions on Internet and Information Systems (TIIS), v.11 no.12, pp.6069-6091, 2017
  • Augmented Reality (AR) overlays virtual information on real-world data, such as displaying useful information on videos or images of a scene. This paper presents an Enhanced AR (EAR) system that displays useful statistical information about players on captured images of a sports game. We focus on the situation where the input image is degraded by strong sunlight. The proposed EAR system includes an image enhancement technique to improve the accuracy of subsequent player and face detection. The image enhancement is followed by player and face detection, face recognition, and display of players' statistics. First, an algorithm based on multi-scale retinex is proposed for image enhancement. Then, to detect players and faces, we use adaptive boosting and Haar features for feature extraction and classification. The player face recognition algorithm uses boosted linear discriminant analysis to select features and a nearest neighbor classifier for classification. The system can be adjusted to work in different types of sports where the input is an image and the desired output is a display of information near the recognized players. Simulations are carried out on 2096 different images that contain players in diverse conditions. The proposed EAR system demonstrates the great potential of computer vision-based approaches for developing AR applications.

Image Retrieval using Distribution Block Signature of Main Colors' Set and Performance Boosting via Relevance feedback (주요 색상의 분포 블록기호를 이용한 영상검색과 유사도 피드백을 통한 이미지 검색)

  • 박한수;유헌우;장동식
    • Journal of KIISE:Software and Applications, v.31 no.2, pp.126-136, 2004
  • This paper proposes a new content-based image retrieval algorithm using color-spatial information. For this purpose, the paper suggests two kinds of indexing keys to prune away images irrelevant to a given query image: MCS (Main Colors' Set), which is related to color information, and DBS (Distribution Block Signature), which is related to spatial information. After successively applying these filters to a database, we obtain a small number of high-potential candidates that are somewhat similar to the query image. We then use a new QM (Quad modeling) and a relevance feedback mechanism to obtain more accurate retrieval. This enhances retrieval effectiveness by dynamically modulating the weights of the color-spatial information. Experiments show that the proposed algorithm can be applied successfully to image retrieval applications.

Comparative characteristic of ensemble machine learning and deep learning models for turbidity prediction in a river (딥러닝과 앙상블 머신러닝 모형의 하천 탁도 예측 특성 비교 연구)

  • Park, Jungsu
    • Journal of Korean Society of Water and Wastewater, v.35 no.1, pp.83-91, 2021
  • The increased turbidity in rivers during flood events has various effects on water environmental management, including drinking water supply systems. Thus, prediction of turbid water is essential for water environmental management. Recently, various advanced machine learning algorithms have been increasingly used in water environmental management. Ensemble machine learning algorithms such as random forest (RF) and gradient boosting decision tree (GBDT) are some of the most popular machine learning algorithms used for water environmental management, along with deep learning algorithms such as recurrent neural networks. In this study, GBDT, an ensemble machine learning algorithm, and gated recurrent unit (GRU), a recurrent neural network algorithm, are used for model development to predict turbidity in a river. The observation intervals of the input data used for the model were 2, 4, 8, 24, 48, 120 and 168 h. The root-mean-square error-observations standard deviation ratio (RSR) ranges from 0.182 to 0.766 for GRU and from 0.400 to 0.683 for GBDT. Both models show similar prediction accuracy, with an RSR of 0.682 for GRU and 0.683 for GBDT. The GRU shows better prediction accuracy when the observation interval is relatively short (i.e., 2, 4, and 8 h), whereas GBDT shows better prediction accuracy when the observation interval is relatively long (i.e., 48, 120, and 168 h). The results suggest that the characteristics of the input data should be considered in developing an appropriate model to predict turbidity.
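For readers unfamiliar with RSR, the following Python sketch computes it under the common definition: RMSE divided by the standard deviation of the observations. The abstract does not spell out the formula, so this definition and the function name are assumptions. Lower is better: 0 is a perfect fit and 1 is no better than predicting the observed mean.

```python
import math


def rsr(observed, predicted):
    """RMSE-observations standard deviation ratio (assumed definition).

    Computed as sqrt(sum((o - p)^2) / sum((o - mean(o))^2)), which equals
    RMSE divided by the population standard deviation of the observations.
    """
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    squared_dev = sum((o - mean_obs) ** 2 for o in observed)
    return math.sqrt(squared_error / squared_dev)


observed = [1.0, 2.0, 3.0, 4.0, 5.0]
print(rsr(observed, observed))    # perfect prediction
print(rsr(observed, [3.0] * 5))   # always predicting the observed mean
```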

Fusion of Blockchain-IoT network to improve supply chain traceability using Ethermint Smart chain: A Review

  • George, Geethu Mary;Jayashree, LS
    • KSII Transactions on Internet and Information Systems (TIIS), v.16 no.11, pp.3694-3722, 2022
  • In today's globalized world, the exchange of data and information between producers and consumers lacks transparency. These exchanges also face many challenges, such as administrative barriers, confidential data leakage, and extensive time delays. To overcome these challenges, we propose a decentralized, secured, and verified smart chain framework using Ethereum Smart Contracts, which employs the InterPlanetary File System (IPFS) and MongoDB as storage systems to automate the process and exchange information in blocks using the Tendermint algorithm. The proposed work promotes complete traceability of the product and ensures data integrity and transparency, in addition to securing customers' personal information using the Lelantos mode of shipping. The Tendermint algorithm helps to speed up the validation and authentication of transactions. Especially in this time of pandemic, it is easier to meet the needs of customers through the Ethermint Smart Chain, which increases customer satisfaction and thus boosts their confidence. Moreover, smart contracts help to exploit more international transaction services and provide instant block finality of around 5 sec using Ethermint. The paper concludes with a description of product storage and distribution adopting the Ethermint technique. The proposed system was executed on the Ethereum-Tendermint Smart chain. Experiments were conducted with variable block sizes and numbers of transactions. The experimental results indicate that the proposed system seems to perform better than existing blockchain-based systems. Two configuration files were used: the first described the storage part, including its topology, and the second was a modified file that included the test rounds that Caliper should execute, including the running time and the workload content. Our findings indicate this is a promising technology for food supply chain storage and distribution.

A Study on the Step-up DC-DC Converter for PV System Application Under Variable Input Voltage Condition (가변 입력 전압 조건하에서 태양광 시스템 적용을 위한 승압형 DC-DC 컨버터 연구)

  • Ju-Yeop Lee;Se-Cheon Oh;Il-Hyeong Jo;Ye-Jin Kim;Yun-Seok Ko
    • The Journal of the Korea institute of electronic communication sciences, v.19 no.4, pp.677-684, 2024
  • In this paper, the design method of a step-up DC-DC converter based on PWM control was studied for solar power system application. The operating principle of the switching-mode step-up DC-DC converter was analyzed and the basic design method was studied. For photovoltaic system application, an output voltage feedback control algorithm based on PWM control was developed to enable the converter's output voltage to follow the target voltage under variable input conditions. To verify the effectiveness of the proposed algorithm, a prototype step-up DC-DC converter with single output voltage feedback was designed and built to boost an input voltage of DC 10 V to DC 30 V. In experiments with the prototype, the output voltage shown on the oscilloscope and LCD accurately followed the target output voltage. In the performance evaluation test, the output voltage stayed within an error rate of 1% of the reference voltage.
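As a rough illustration of how PWM duty ratio relates to the boost from 10 V to 30 V, the Python sketch below uses the textbook ideal relation for a lossless boost converter in continuous conduction, V_out = V_in / (1 - D). The simple integral-style feedback loop is a hypothetical stand-in, not the paper's control algorithm, and the gain, step count, and duty limits are assumed values.

```python
def ideal_boost_duty(v_in, v_out):
    """Duty ratio D for an ideal boost converter: V_out = V_in / (1 - D)."""
    if v_out <= v_in:
        raise ValueError("a boost converter needs v_out > v_in")
    return 1.0 - v_in / v_out


def ideal_boost_output(v_in, duty):
    """Steady-state output of the ideal (lossless) model."""
    return v_in / (1.0 - duty)


def track_target(v_in, v_target, duty=0.5, gain=0.005, steps=200):
    """Hypothetical feedback loop: nudge the duty ratio by the voltage error.

    The duty ratio is clamped to [0, 0.9] so the ideal model stays finite.
    """
    for _ in range(steps):
        error = v_target - ideal_boost_output(v_in, duty)
        duty = min(max(duty + gain * error, 0.0), 0.9)
    return duty, ideal_boost_output(v_in, duty)


print(ideal_boost_duty(10.0, 30.0))   # duty ratio needed for 10 V -> 30 V

# Variable input voltage: the loop settles on a different duty ratio each time.
for v_in in (8.0, 10.0, 15.0):
    duty, v_out = track_target(v_in, 30.0)
    print(v_in, round(duty, 4), round(v_out, 3))
```

A real converter has losses, inductor dynamics, and load dependence, so the duty ratio found by a hardware controller will differ from these ideal values.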

A Recommending System for Care Plan(Res-CP) in Long-Term Care Insurance System (데이터마이닝 기법을 활용한 노인장기요양급여 권고모형 개발)

  • Han, Eun-Jeong;Lee, Jung-Suk;Kim, Dong-Geon;Ka, Im-Ok
    • The Korean Journal of Applied Statistics, v.22 no.6, pp.1229-1237, 2009
  • In the long-term care insurance (LTCI) system, the question of how to provide the most appropriate care has become a major issue for the elderly, their families, and policy makers. To help beneficiaries use LTC services in line with their care needs, the National Health Insurance Corporation (NHIC) provides them with an individualized care plan, named the Long-term Care User Guide. It includes recommendations for the most appropriate type of care for each beneficiary. The purpose of this study is to develop a recommending system for care plan (Res-CP) in the LTCI system. We used the data set for the Long-term Care User Guide from the 3rd long-term care insurance pilot program. To develop the model, we tested four models: a decision-tree model in data mining, a logistic regression model, and bagging and boosting techniques in an ensemble model. A decision-tree model was selected to describe the Res-CP, because it makes the algorithm of Res-CP easy to explain to the working groups. Res-CP might be useful for evidence-based care planning in the LTCI system and may contribute to the efficient use of LTC services.

A study on EPB shield TBM face pressure prediction using machine learning algorithms (머신러닝 기법을 활용한 토압식 쉴드TBM 막장압 예측에 관한 연구)

  • Kwon, Kibeom;Choi, Hangseok;Oh, Ju-Young;Kim, Dongku
    • Journal of Korean Tunnelling and Underground Space Association, v.24 no.2, pp.217-230, 2022
  • Adequate control of TBM face pressure is of vital importance for maintaining face stability by preventing face collapse and surface settlement. An EPB shield TBM excavates the ground by applying face pressure with the excavated soil in the pressure chamber. One of the challenges during EPB shield TBM operation is the control of face pressure, owing to the difficulty of managing the excavated soil. In this study, the face pressure of an EPB shield TBM was predicted using the geological and operational data acquired from a domestic TBM tunnel site. Four machine learning algorithms, KNN (K-Nearest Neighbors), SVM (Support Vector Machine), RF (Random Forest), and XGB (eXtreme Gradient Boosting), were applied to predict the face pressure. The model comparison results showed that the RF model yielded the lowest RMSE (Root Mean Square Error) value of 7.35 kPa. Therefore, the RF model was selected as the optimal machine learning algorithm. In addition, the feature importance of the RF model was analyzed to evaluate the influence of each feature on the face pressure. The water pressure showed the highest influence, and the importance of the geological conditions was generally higher than that of the operational features at the site considered.

Ensemble Learning-Based Prediction of Good Sellers in Overseas Sales of Domestic Books and Keyword Analysis of Reviews of the Good Sellers (앙상블 학습 기반 국내 도서의 해외 판매 굿셀러 예측 및 굿셀러 리뷰 키워드 분석)

  • Do Young Kim;Na Yeon Kim;Hyon Hee Kim
    • KIPS Transactions on Software and Data Engineering, v.12 no.4, pp.173-178, 2023
  • As Korean literature spreads around the world, its position in the overseas publishing market has become important. As demand in the overseas publishing market continues to grow, it is essential to predict future book sales and analyze the characteristics of books that have been highly favored by overseas readers in the past. In this study, we propose an ensemble learning-based prediction model and analyze the characteristics of books published overseas over the past 5 years whose cumulative sales exceeded 5,000 copies, which are classified as good sellers. We applied five ensemble learning models, i.e., XGBoost, Gradient Boosting, AdaBoost, LightGBM, and Random Forest, and compared them with other machine learning algorithms, i.e., Support Vector Machine, Logistic Regression, and Deep Learning. Our experimental results showed that the ensemble algorithms outperform the other approaches in handling imbalanced data. In particular, the LightGBM model obtained an AUC value of 99.86%, which is the best prediction performance. Among the features used for prediction, the most important feature is the author's number of overseas publications, and the second most important feature is publication in countries with the largest publication market size. The number of evaluation participants is also an important feature. In addition, text mining was performed on the reviews of the four best-selling books among the good sellers. Many reviews showed interest in stories, characters, and writers, and support for translation seems to be needed, as the keyword "translation" appears frequently in low-rated reviews.
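Since the abstract reports performance as AUC on imbalanced data, a short Python sketch of what AUC measures may be useful. It uses the pairwise-ranking definition: the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as one half. The function name and the toy labels and scores are illustrative and are not data from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for the positive class (e.g. good seller), 0 otherwise.
    scores: model scores, higher meaning more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))
```

Because AUC depends only on how positives rank against negatives, it does not reward a model for simply predicting the majority class, which is why it is often preferred over plain accuracy when classes are imbalanced. The quadratic loop is fine for illustration; large data sets would call for a rank-based implementation.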