• Title/Summary/Keyword: Random forest classification


A Random Forest Model Based Pollution Severity Classification Scheme of High Voltage Transmission Line Insulators

  • Kannan, K.; Shivakumar, R.; Chandrasekar, S.
    • Journal of Electrical Engineering and Technology / v.11 no.4 / pp.951-960 / 2016
  • Tower insulators in electric power transmission networks play a crucial role in preserving the reliability of the system. Electrical utilities frequently face the problem of insulator flashover due to pollution deposition on the insulator surface. Several research works based on leakage current (LC) measurement have already been carried out to develop diagnostic techniques for these insulators. Since the LC signal is highly intermittent in nature, estimating pollution severity from LC measurements over a short period of time will not produce accurate results. Reports on the measurement and analysis of LC signals over a long period of time are scanty. This paper attempts to use a Random Forest (RF) classifier, which produces accurate results on large databases, to analyze the pollution severity of high voltage tower insulators. Leakage current characteristics over a long period of time were measured in the laboratory on a porcelain insulator. Pollution experiments were conducted at 11 kV AC voltage. Time domain analysis and the wavelet transform technique were used to extract both basic features and histogram features of the LC signal. The RF model was trained and tested with a variety of LC signals measured over a lengthy period of time, and the proposed RF-model-based pollution severity classifier was found to be efficient and should be helpful to electrical utilities for real-time implementation.

Obesity Level Prediction Based on Data Mining Techniques

  • Alqahtani, Asma; Albuainin, Fatima; Alrayes, Rana; Al muhanna, Noura; Alyahyan, Eyman; Aldahasi, Ezaz
    • International Journal of Computer Science & Network Security / v.21 no.3 / pp.103-111 / 2021
  • Obesity affects individuals of all genders and ages worldwide; consequently, several studies have worked to define the factors causing it. This study develops an effective method to trace obesity levels based on supervised data mining techniques such as Random Forest and Multi-Layer Perceptron (MLP), so as to tackle this universal epidemic. Notably, the dataset was from countries like Mexico, Peru, and Colombia in the 14-61 year age group, with varying eating habits and physical conditions. The data includes 2111 instances and 17 attributes labelled using NObesity, which facilitates categorization of the data into Insufficient Weight, Normal Weight, Overweight Levels I and II, and Obesity Types I to III. This study found that the Random Forest algorithm achieved higher accuracy than the MLP algorithm, with an overall classification rate of 96.7%.

A Study of Machine Learning Model for Prediction of Swelling Waves Occurrence on East Sea (동해안 너울성 파도 예측을 위한 머신러닝 모델 연구)

  • Kang, Donghoon; Oh, Sejong
    • The Journal of Korean Institute of Information Technology / v.17 no.9 / pp.11-17 / 2019
  • In recent years, damage and loss of life and property have occurred frequently due to swelling waves in the East Sea. Swelling waves are not easy to predict because they are caused by various factors. In this research, we build a model for predicting the occurrence of swelling waves on the East Coast of Korea using machine learning techniques. We collect historical data on unloading interruptions at Pohang Port, along with air pressure, wind speed, wind direction, and water temperature data from offshore of Pohang Port. We select important variables for prediction and test various machine learning prediction algorithms. As a result, tide level, water temperature, and air pressure were selected, and the Random Forest model produced the best performance, with an accuracy of 88.86%.

Research on the modified algorithm for improving accuracy of Random Forest classifier which identifies automatically arrhythmia (부정맥 증상을 자동으로 판별하는 Random Forest 분류기의 정확도 향상을 위한 수정 알고리즘에 대한 연구)

  • Lee, Hyun-Ju; Shin, Dong-Kyoo; Park, Hee-Won; Kim, Soo-Han; Shin, Dong-Il
    • The KIPS Transactions:PartB / v.18B no.6 / pp.341-348 / 2011
  • ECG (electrocardiogram) signals, a type of bio-signal, are generally studied with classification algorithms, most commonly SVM (Support Vector Machine) and MLP (Multilayer Perceptron). This study instead modified the Random Forest algorithm on the basis of signal characteristics and compared the accuracy of the modified algorithm with that of SVM and MLP to demonstrate its capability. The R-R interval extracted from the ECG is used in this study, and the results of previous studies that experimented on the same data are also compared. The modified RF classifier achieved better accuracy than the SVM classifier, the MLP classifier, and the results of the other studies. A band-pass filter is used to extract the R-R interval in the pre-processing stage. However, the wavelet transform, median filter, and finite impulse response filter are also often used alongside the band-pass filter in ECG experiments. Further work is needed on selecting filters that efficiently remove baseline wandering in the pre-processing stage and on methods for correctly extracting the R-R interval.

Using Data Mining Techniques to Predict Win-Loss in Korean Professional Baseball Games (데이터마이닝을 활용한 한국프로야구 승패예측모형 수립에 관한 연구)

  • Oh, Younhak; Kim, Han; Yun, Jaesub; Lee, Jong-Seok
    • Journal of Korean Institute of Industrial Engineers / v.40 no.1 / pp.8-17 / 2014
  • In this research, we employed various data mining techniques to build predictive models for win-loss prediction in Korean professional baseball games. The historical data containing information about players and teams was obtained from official materials provided on the KBO website. From the collected raw data, we prepared two additional types of dataset, in ratio and binary format respectively. The ratio dataset was generated by dividing the away team's records by the corresponding home team's records, while the binary dataset was obtained by comparing the record values. We applied seven classification techniques to the three (raw, ratio, and binary) datasets: decision tree, random forest, logistic regression, neural network, support vector machine, linear discriminant analysis, and quadratic discriminant analysis. Among the 21 (= 3 datasets × 7 techniques) prediction scenarios, the most accurate model was obtained from the random forest technique on the binary dataset, with a prediction accuracy of 84.14%. It was also observed that using the ratio and binary datasets helped to build better prediction models than using the raw data. Using the variable selection capability of decision tree, random forest, and stepwise logistic regression, we found that annual salary, earned runs, strikeouts, pitcher's winning percentage, and walks (four balls) are important winning factors in a game. This research is distinct from existing studies in that we used three different types of data and various data mining techniques for win-loss prediction in Korean professional baseball games.
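
The ratio and binary dataset construction described in this abstract can be illustrated with a minimal sketch. The record names, the sample values, and the choice to encode "away exceeds home" as 1 are assumptions made for illustration; the abstract does not specify the exact fields or the comparison rule the authors used.

```python
from itertools import product


def ratio_features(away, home):
    """Divide each away-team record by the matching home-team record.

    Assumes every home-team value is nonzero.
    """
    return {name: away[name] / home[name] for name in away}


def binary_features(away, home):
    """Encode each record as 1 if the away value exceeds the home value, else 0.

    The direction and encoding of the comparison are assumptions.
    """
    return {name: int(away[name] > home[name]) for name in away}


# Hypothetical season records for one game (not taken from the paper).
away = {"batting_avg": 0.280, "era": 4.50, "strikeouts": 900}
home = {"batting_avg": 0.250, "era": 3.00, "strikeouts": 1000}

ratio = ratio_features(away, home)
binary = binary_features(away, home)

# The 21 prediction scenarios are every (dataset, technique) pairing.
datasets = ["raw", "ratio", "binary"]
techniques = [
    "decision tree",
    "random forest",
    "logistic regression",
    "neural network",
    "support vector machine",
    "linear discriminant analysis",
    "quadratic discriminant analysis",
]
scenarios = list(product(datasets, techniques))
```

Each scenario would then be trained and evaluated separately; the abstract reports that the (binary, random forest) pairing was the most accurate.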

Multivariate Outlier Removing for the Risk Prediction of Gas Leakage based Methane Gas (메탄 가스 기반 가스 누출 위험 예측을 위한 다변량 특이치 제거)

  • Dashdondov, Khongorzul; Kim, Mi-Hye
    • Journal of the Korea Convergence Society / v.11 no.12 / pp.23-30 / 2020
  • In this study, the relationship between natural gas (NG) data and gas-related environmental elements was analyzed using machine learning algorithms to predict the level of gas leakage risk without directly measuring gas leakage data. The study was based on open data provided by a server using the IoT-based remote control Picarro gas sensor specification. When natural gas leaks into the air, it poses a serious problem for air pollution, the environment, and health. The proposed method is a multivariate outlier removal method based on Random Forest (RF) classification for predicting the risk of NG leaks. After unsupervised k-means clustering, the experimental dataset was imbalanced. Therefore, we focused on making the proposed models predict the medium and high risk classes well. We compared the receiver operating characteristic (ROC) curve, accuracy, area under the ROC curve (AUC), and mean standard error (MSE) for each classification model. In our experiments, the accuracy, AUC, and MSE for MOL_RF were 99.71%, 99.57%, and 0.0016, respectively.

Facebook Spam Post Filtering based on Instagram-based Transfer Learning and Meta Information of Posts (인스타그램 기반의 전이학습과 게시글 메타 정보를 활용한 페이스북 스팸 게시글 판별)

  • Kim, Junhong; Seo, Deokseong; Kim, Haedong; Kang, Pilsung
    • Journal of Korean Institute of Industrial Engineers / v.43 no.3 / pp.192-202 / 2017
  • This study develops a text spam filtering system for Facebook based on two variable categories: keywords learned from Instagram and meta-information of Facebook posts. Since there are no explicit labels for spam/ham posts, we utilize hashtags in Instagram to train classification models. In addition, the filtering accuracy is enhanced by considering meta-information of Facebook posts. To verify the proposed filtering system, we conduct an empirical experiment based on a total of 1,795,067 and 761,861 Facebook and Instagram documents, respectively. With random forest as the base classification algorithm, the experimental results show that the proposed filtering system yields 99% and 98% in terms of filtering accuracy and F1-measure, respectively. We expect that the proposed filtering scheme can be applied to other web services that suffer from massive spam posts but have no explicit spam labels available.

Predicting Administrative Issue Designation in KOSDAQ Market Using Machine Learning Techniques (머신러닝을 활용한 코스닥 관리종목지정 예측)

  • Chae, Seung-Il; Lee, Dong-Joo
    • Asia-Pacific Journal of Business / v.13 no.2 / pp.107-122 / 2022
  • Purpose - This study aims to develop machine learning models to predict administrative issue designation in the KOSDAQ Market using financial data. Design/methodology/approach - Applying four classification techniques (logistic regression, support vector machine, random forest, and gradient boosting) to a matched sample of five hundred and thirty-six firms over an eight-year period, the authors develop prediction models and explore the practicality of the models. Findings - The resulting four binary selection models show overall satisfactory classification performance in terms of various measures including AUC (area under the receiver operating characteristic curve), accuracy, F1-score, and top quartile lift, while the ensemble models (random forest and gradient boosting) outperform the others on most measures. Research implications or Originality - Although the assessment of firms' administrative issue potential is critical information for investors and financial institutions, detailed empirical investigation has lagged behind. The current research fills this gap in the literature by proposing parsimonious prediction models based on a few financial variables and validating the applicability of the models.

Comparative Evaluation of Machine Learning Models for Predicting Soccer Injury Types

  • Davronbek Malikov; Jaeho Kim; Jung Kyu Park
    • Journal of the Korean Society of Industry Convergence / v.27 no.2_1 / pp.257-268 / 2024
  • Soccer is a sport that carries a high risk of injury. An injury not only harms the unlucky player's career but can also hurt team performance and finances, since soccer is a team-based game. The duration of recovery from a soccer injury typically depends on its type and severity. Therefore, in this paper we conduct research to predict the probability of a player's injury type using machine learning. Furthermore, we compare different machine learning models to find the best-fitting model. This paper utilizes various supervised classification models, including Decision Tree, Random Forest, K-Nearest Neighbors (KNN), and Naive Bayes. Based on our findings, the KNN and Decision Tree models achieved the highest accuracy at 70%, surpassing the other models. The Random Forest model followed with an accuracy score of 62%. Among the evaluated models, the Naive Bayes model demonstrated the lowest accuracy at 56%. We gathered information about 54 professional soccer players who are playing in the top five European leagues based on their career history.

Ship Type Prediction using Random Forest with Limited Ship Information (제한적 선박 정보와 무작위의 숲 분류기를 이용한 선종 예측)

  • Ho-Kun Jeon; Jae Rim Han
    • Proceedings of the Korean Institute of Navigation and Port Research Conference / 2022.06a / pp.106-107 / 2022
  • The type of each surrounding ship is important information for navigators and VTS officers, since it lets them estimate the maneuverability and near-future route of the ships. However, this information is frequently not provided due to transmission trouble and seafarers' unfamiliarity with AIS. Thus, this study suggests predicting ship types with a Random Forest classifier after preparing training and test datasets that contain ship features and types. AIS data for the Ulsan coast in 2018 was used for this study. The method may encourage navigators and VTS officers to discuss and share their experience of predicting ship types.
