• Title/Summary/Keyword: ensemble learning models

Improving an Ensemble Model by Optimizing Bootstrap Sampling (부트스트랩 샘플링 최적화를 통한 앙상블 모형의 성능 개선)

  • Min, Sung-Hwan
    • Journal of Internet Computing and Services / v.17 no.2 / pp.49-57 / 2016
  • Ensemble classification combines multiple classifiers to obtain more accurate predictions than individual models provide. Ensemble learning techniques are known to be very useful for improving prediction accuracy, and bagging is one of the most popular. Bagging has been shown to increase the prediction accuracy of individual classifiers. It draws bootstrap samples from the training sample, applies the classifier to each bootstrap sample, and then combines the predictions of these classifiers to obtain the final classification result. Bootstrap samples are simple random samples selected from the original training data, so, because of this randomness, not all bootstrap samples are equally informative. In this study, we propose a new method for improving the performance of the standard bagging ensemble by optimizing bootstrap samples. A genetic algorithm is used to optimize the bootstrap samples of the ensemble to improve the prediction accuracy of the ensemble model. The proposed model is applied to a bankruptcy prediction problem using a real dataset from Korean companies. The experimental results show the effectiveness of the proposed model.
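The standard bagging procedure described in this abstract (draw bootstrap samples, fit one classifier per sample, combine by vote) can be sketched as follows. This is a minimal illustration only: the threshold "stump" base learner, the toy data, and all function names are assumptions made for the example, and the genetic-algorithm optimization of bootstrap samples proposed in the paper is not reproduced here.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) items from data uniformly at random, with replacement."""
    return [rng.choice(data) for _ in data]

def train_stump(sample):
    """Hypothetical base learner: a 1-D threshold classifier.

    Tries every x value in the sample as a threshold and keeps the one
    with the fewest training errors, predicting class 1 when x >= threshold.
    """
    best_t, best_err = None, None
    for t, _ in sample:
        err = sum((x >= t) != bool(y) for x, y in sample)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t

def bagging_predict(thresholds, x):
    """Combine the stumps by majority vote over their individual predictions."""
    votes = Counter(int(x >= t) for t in thresholds)
    return votes.most_common(1)[0][0]

# Toy (feature, label) pairs, invented for illustration.
data = [(0.1, 0), (0.4, 0), (0.35, 0), (0.8, 1),
        (0.9, 1), (0.65, 1), (0.2, 0), (0.7, 1)]

rng = random.Random(0)
# One stump per bootstrap sample; an odd count avoids tied votes.
thresholds = [train_stump(bootstrap_sample(data, rng)) for _ in range(25)]

print(bagging_predict(thresholds, 0.95), bagging_predict(thresholds, 0.05))
```

Because each stump sees a different random resample, the learned thresholds differ from one ensemble member to the next, which is the variability that the abstract proposes to optimize rather than leave to chance.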

Harvest Forecasting Improvement Using Federated Learning and Ensemble Model

  • Ohnmar Khin;Jin Gwang Koh;Sung Keun Lee
    • Smart Media Journal / v.12 no.10 / pp.9-18 / 2023
  • Harvest forecasting depends on many factors, such as temperature, rain, and environment, and on the relations among them. The existing study investigates climate conditions and helps cultivators know harvest yields before planting on farms. The proposed study uses federated learning. In addition, widely used techniques such as the bagging classifier, extra trees classifier, linear discriminant analysis classifier, quadratic discriminant analysis classifier, stochastic gradient boosting classifier, blending models, random forest regressor, and AdaBoost are used together. The nine presented algorithms achieved satisfactory accuracies. The proposed algorithms can contribute to accurate harvest forecasting. Finally, we intend to compare our study with the results of earlier research.

Pareto RBF network ensemble using multi-objective evolutionary computation

  • Kondo, Nobuhiko;Hatanaka, Toshiharu;Uosaki, Katsuji
    • Institute of Control, Robotics and Systems: Conference Proceedings / 2005.06a / pp.925-930 / 2005
  • This paper considers an evolutionary multi-objective method for selecting the structure of RBF networks. Candidate RBF network structures are encoded into chromosomes in GAs. They then evolve toward the Pareto-optimal front defined by several objective functions concerning model accuracy and model complexity. An ensemble network constructed from these Pareto-optimal models is also considered. Numerical simulation results indicate that, in the presence of outliers or a lack of data, the ensemble network is much more robust than a single network selected by information criteria.
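The notion of a Pareto-optimal front over accuracy and complexity can be illustrated with a simple non-dominated filter. This sketch assumes both objectives are minimized (error and a complexity count); the candidate values and function names are hypothetical, and the evolutionary search itself is not shown.

```python
def dominates(a, b):
    """Return True if a is no worse than b in every objective and
    strictly better in at least one (all objectives are minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(candidates):
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]

# Hypothetical (error, complexity) pairs for six candidate networks.
models = [(0.10, 12), (0.12, 8), (0.20, 3),
          (0.15, 9), (0.10, 15), (0.25, 3)]

front = pareto_front(models)
print(front)  # [(0.1, 12), (0.12, 8), (0.2, 3)]
```

The three surviving candidates trade error against complexity, and an ensemble in the sense of the abstract would be built from members of such a front.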

Classification for Imbalanced Breast Cancer Dataset Using Resampling Methods

  • Hana Babiker, Nassar
    • International Journal of Computer Science & Network Security / v.23 no.1 / pp.89-95 / 2023
  • Analyzing breast cancer patient files is becoming an exciting area of medical information analysis, especially with the increasing number of patient files. In this paper, breast cancer data is collected from Khartoum state hospital, and the dataset is classified into recurrence and no recurrence. The data is imbalanced, meaning that one of the two classes has more samples than the other. Several pre-processing techniques are applied before classifying this imbalanced data, including resampling, attribute selection, and handling of missing values, and then different classifier models are built. In the first experiment, five classifiers (ANN, REP TREE, SVM, and J48) are used, and in the second experiment, meta-learning algorithms (Bagging, Boosting, and Random subspace) are used. Finally, the ensemble model is used. The best result was obtained from the ensemble model (Boosting with J48), with the highest accuracy of 95.2797% among all the algorithms, followed by Bagging with J48 (90.559%) and Random subspace with J48 (84.2657%). The imbalanced breast cancer dataset was thus classified into recurrence and no recurrence with different classification algorithms, and the best result was obtained from the ensemble model.

Detecting Fake Job Recruitment with a Machine Learning Approach (머신 러닝 접근 방식을 통한 가짜 채용 탐지)

  • Taghiyev Ilkin;Jae Heung Lee
    • Smart Media Journal / v.12 no.2 / pp.36-41 / 2023
  • With the advent of applicant tracking systems, online recruitment has become more popular, and recruitment fraud has become a serious problem. This research aims to develop a reliable model to detect recruitment fraud in online recruitment environments to reduce cost losses and enhance privacy. The main contribution of this paper is to provide an automated methodology that leverages insights gained from exploratory analysis of data to distinguish which job postings are fraudulent and which are legitimate. Using EMSCAD, a recruitment fraud dataset provided by Kaggle, we trained and evaluated various single-classifier and ensemble-classifier-based machine learning models, and found that the ensemble classifier, the random forest classifier, performed best with an accuracy of 98.67% and an F1 score of 0.81.
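A high accuracy alongside a noticeably lower F1 score, as reported here, is typical when one class is rare. The arithmetic can be sketched with a confusion matrix; the counts below are invented for illustration and are not taken from the paper or from the EMSCAD dataset.

```python
def accuracy_and_f1(tp, fp, fn, tn):
    """Compute accuracy and the F1 score of the positive class
    from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1

# Hypothetical counts: 55 fraudulent postings out of 1000.
acc, f1 = accuracy_and_f1(tp=40, fp=5, fn=15, tn=940)
print(round(acc, 2), round(f1, 2))  # 0.98 0.8
```

The many correctly classified legitimate postings dominate accuracy, while F1 reflects only how well the rare fraudulent class is handled.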

Robust Sentiment Classification of Metaverse Services Using a Pre-trained Language Model with Soft Voting

  • Haein Lee;Hae Sun Jung;Seon Hong Lee;Jang Hyun Kim
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.9 / pp.2334-2347 / 2023
  • Metaverse services generate text data in real time, a form of ubiquitous computing data that can be used to analyze user emotions. Analysis of user emotions is an important task in metaverse services. This study aims to classify user sentiments using deep learning and pre-trained language models based on the transformer structure. Previous studies collected data from a single platform, whereas the current study collected review data matching the keyword "Metaverse" from both the YouTube and Google Play Store platforms for general utilization. As a result, the Bidirectional Encoder Representations from Transformers (BERT) and Robustly optimized BERT approach (RoBERTa) models combined by soft voting achieved the highest accuracy, 88.57%. In addition, the area under the curve (AUC) score of the ensemble model comprising RoBERTa, BERT, and A Lite BERT (ALBERT) was 0.9458. The results demonstrate that ensembles that include the RoBERTa model exhibit good performance. Therefore, the RoBERTa model can be applied on platforms that provide metaverse services. The findings contribute to the advancement of natural language processing techniques in metaverse services, which are increasingly important in digital platforms and virtual environments. Overall, this study provides empirical evidence that sentiment analysis using deep learning and pre-trained language models is a promising approach to improving user experiences in metaverse services.
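Soft voting, as generally defined, averages the class probabilities produced by each model and picks the class with the highest mean. A minimal sketch under that assumption follows; the probability values and the function name are hypothetical, and no actual BERT or RoBERTa model is involved.

```python
def soft_vote(prob_lists):
    """Average per-class probabilities across models and return
    (index of the winning class, averaged probabilities)."""
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    avg = [sum(p[k] for p in prob_lists) / n_models
           for k in range(n_classes)]
    label = max(range(n_classes), key=lambda k: avg[k])
    return label, avg

# Hypothetical [negative, positive] probabilities from three models.
probs = [[0.90, 0.10],
         [0.40, 0.60],
         [0.45, 0.55]]

label, avg = soft_vote(probs)
print(label)  # 0
```

In this example two of the three models lean toward class 1, so a hard majority vote would pick class 1, but the one confident model shifts the averaged probabilities toward class 0. That sensitivity to confidence is the main difference between soft and hard voting.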

Forecasting of Iron Ore Prices using Machine Learning (머신러닝을 이용한 철광석 가격 예측에 대한 연구)

  • Lee, Woo Chang;Kim, Yang Sok;Kim, Jung Min;Lee, Choong Kwon
    • Journal of Korea Society of Industrial Information Systems / v.25 no.2 / pp.57-72 / 2020
  • The price of iron ore has continued to fluctuate with high demand and supply from many countries and companies. In this business environment, forecasting the price of iron ore has become important. This study developed a machine learning model that forecasts the price of iron ore one month after the trading events. The forecasting model used a distributed lag model and deep learning models such as MLP (multi-layer perceptron), RNN (recurrent neural network), and LSTM (long short-term memory). When the individual models were compared through metrics, LSTM showed the lowest prediction error. When the models were compared using the ensemble technique, the ensemble of the distributed lag model and LSTM showed the lowest prediction error.

Performance comparison on vocal cords disordered voice discrimination via machine learning methods (기계학습에 의한 후두 장애음성 식별기의 성능 비교)

  • Cheolwoo Jo;Soo-Geun Wang;Ickhwan Kwon
    • Phonetics and Speech Sciences / v.14 no.4 / pp.35-43 / 2022
  • This paper studies how to improve the identification rate for laryngeal disorder speech data using a convolutional neural network (CNN) and ensemble machine learning methods. In general, the amount of laryngeal dysfunction speech data is small, so even if identifiers are constructed by statistical methods, overfitting that depends on the training method can lead to a decrease in the identification rate when the identifiers are exposed to external data. In this work, we combine results from CNN models and machine learning models of varying accuracy in a multi-voting manner to improve classification performance over the originally trained models. The Pusan National University Hospital (PNUH) dataset was used to train and validate the algorithms. The dataset contains normal voices and voice data for benign and malignant tumors. In the experiment, an attempt was made to distinguish among normal voices, benign tumors, and malignant tumors. As a result of the experiment, the random forest method was found to be the best ensemble method and showed an identification rate of 85%.
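Combining several models "in a multi-voting manner" can be read as hard majority voting over predicted labels. The following is a sketch under that assumption, not the authors' exact scheme; the label strings, the tie-breaking rule, and the function name are all chosen for illustration.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by the most models.

    Ties are broken in favor of the tied label that appears first
    in the input order (an arbitrary choice for this sketch).
    """
    counts = Counter(predictions)
    top = max(counts.values())
    for label in predictions:
        if counts[label] == top:
            return label

# Hypothetical outputs of three models for one voice sample.
print(majority_vote(["malignant", "benign", "malignant"]))  # malignant
```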

Tor Network Website Fingerprinting Using Statistical-Based Feature and Ensemble Learning of Traffic Data (트래픽 데이터의 통계적 기반 특징과 앙상블 학습을 이용한 토르 네트워크 웹사이트 핑거프린팅)

  • Kim, Junho;Kim, Wongyum;Hwang, Doosung
    • KIPS Transactions on Software and Data Engineering / v.9 no.6 / pp.187-194 / 2020
  • This paper proposes a website fingerprinting method using ensemble learning over a Tor network that guarantees client anonymity and personal information. We construct a training problem for website fingerprinting from the traffic packets collected in the Tor network, and compare the performance of the website fingerprinting system using tree-based ensemble models. A training feature vector is prepared from the general information, burst, cell sequence length, and cell order that are extracted from the traffic sequence, and the features of each website are represented with a fixed length. For experimental evaluation, we define four learning problems (Wang14, BW, CWT, CWH) according to the use of website fingerprinting, and compare the performance with the support vector machine model using CUMUL feature vectors. In the experimental evaluation, the proposed statistical-based training feature representation is superior to the CUMUL feature representation except for the BW case.

Shield TBM disc cutter replacement and wear rate prediction using machine learning techniques

  • Kim, Yunhee;Hong, Jiyeon;Shin, Jaewoo;Kim, Bumjoo
    • Geomechanics and Engineering / v.29 no.3 / pp.249-258 / 2022
  • A disc cutter is an excavation tool on a tunnel boring machine (TBM) cutterhead; it crushes and cuts rock mass while the machine excavates using the cutterhead's rotational movement. Disc cutter wear occurs naturally. Thus, along with the management of downtime and excavation efficiency, worn disc cutters need to be replaced at the proper time; otherwise, the construction period could be delayed and the cost could increase. The most common prediction models for TBM performance and for disc cutter lifetime have been proposed by the Colorado School of Mines and the Norwegian University of Science and Technology. However, the design parameters of existing models do not correspond well to field values when a TBM encounters complex and difficult ground conditions. Thus, this study proposes a series of machine learning models to predict the disc cutter lifetime of a shield TBM using the excavation (machine) data recorded during operation, which reflect the machine's response to the rock mass. This study utilizes five different machine learning techniques: four types of classification models (i.e., K-Nearest Neighbors (KNN), Support Vector Machine, Decision Tree, and Stacking Ensemble Model) and one artificial neural network (ANN) model. The KNN model was found to be the best of the four classification models, with the highest recall of 81%. The ANN model also predicted the wear rate of disc cutters reasonably well.