• Title/Summary/Keyword: random forest model

Search Result 532, Processing Time 0.026 seconds

A Random Forest Algorithm-based Accident Prediction to Prevent Marine Pilot Occupational Accidents

  • Gokhan Camliyurt;Won Sik Kang;Daewon Kim;Sangwon Park;Youngsoo Park
    • Proceedings of the Korean Institute of Navigation and Port Research Conference
    • /
    • 2022.06a
    • /
    • pp.415-416
    • /
    • 2022
  • Marine pilot occupational accidents during transfer to/from the ship are at the top of the agenda after several safety campaigns by IMPA and individual attemptsThere is multiple transfer method for the marine pilot, but a most common way is to use the pilot cutter. This paper aims to predict marine pilot occupational accidents before it occurs by using historical data. Since the problem depends on several variables, this paper develops a model by using the random forest method to predict marine pilot accidents before happening with the random forest method by using RStudio software

  • PDF

A Study on Predicting TDI(Trophic Diatom Index) in tributaries of Han river basin using Correlation-based Feature Selection technique and Random Forest algorithm (Correlation-based Feature Selection 기법과 Random Forest 알고리즘을 이용한 한강유역 지류의 TDI 예측 연구)

  • Kim, Minkyu;Yoon, Chun Gyeong;Rhee, Han-Pil;Hwang, Soon-Jin;Lee, Sang-Woo
    • Journal of Korean Society on Water Environment
    • /
    • v.35 no.5
    • /
    • pp.432-438
    • /
    • 2019
  • The purpose of this study is to predict Trophic Diatom Index (TDI) in tributaries of the Han River watershed using the random forest algorithm. The one year (2017) and supplied aquatic ecology health data were used. The data includes water quality(BOD, T-N, $NH_3-N$, T-P, $PO_4-P$, water temperature, DO, pH, conductivity, turbidity), hydraulic factors(water width, average water depth, average velocity of water), and TDI score. Seven factors including water temperature, BOD, T-N, $NH_3-N$, T-P, $PO_4-P$, and average water depth are selected by the Correlation Feature Selection. A TDI prediction model was generated by random forest using the seven factors. To evaluate this model, 2017 data set was used first. As a result of the evaluation, $R^2$, % Difference, NSE(Nash-Sutcliffe Efficiency), RMSE(Root Mean Square Error) and accuracy rate show that this model is compatible with predicting TDI. To be more concrete, $R^2$ is 0.93, % Difference is -0.37, NSE is 0.89, RMSE is 8.22 and accuracy rate is 70.4%. Also, additional evaluation using data set more than 17 times the measured point was performed. The results were similar when the 2017 data set were used. The Wilcoxon Signed Ranks Test shows there was no statistically significant difference between actual and predicted data for the 2017 data set. These results can specify the elements which probably affect aquatic ecology health. Also, these will provide direction relative to water quality management for a watershed that must be continuously preserved.

Factors Influencing Sexual Experiences in Adolescents Using a Random Forest Model: Secondary Data Analysis of the 2019~2021 Korea Youth Risk Behavior Web-based Survey Data (랜덤 포레스트 모델을 활용한 국내 청소년 성경험 영향요인 분석 연구: 2019~2021년 청소년건강행태조사 데이터)

  • Yang, Yoonseok;Kwon, Ju Won;Yang, Youngran
    • Journal of Korean Academy of Nursing
    • /
    • v.54 no.2
    • /
    • pp.193-210
    • /
    • 2024
  • Purpose: The objective of this study was to develop a predictive model for the sexual experiences of adolescents using the random forest method and to identify the "variable importance." Methods: The study utilized data from the 2019 to 2021 Korea Youth Risk Behavior Web-based Survey, which included 86,595 man and 80,504 woman participants. The number of independent variables stood at 44. SPSS was used to conduct Rao-Scott χ2 tests and complex sample t-tests. Modeling was performed using the random forest algorithm in Python. Performance evaluation of each model included assessments of precision, recall, F1-score, receiver operating characteristics curve, and area under the curve calculations derived from the confusion matrix. Results: The prevalence of sexual experiences initially decreased during the COVID-19 pandemic, but later increased. "Variable importance" for predicting sexual experiences, ranked in the top six, included week and weekday sedentary time and internet usage time, followed by ease of cigarette purchase, age at first alcohol consumption, smoking initiation, breakfast consumption, and difficulty purchasing alcohol. Conclusion: Education and support programs for promoting adolescent sexual health, based on the top-ranking important variables, should be integrated with health behavior intervention programs addressing internet usage, smoking, and alcohol consumption. We recommend active utilization of the random forest analysis method to develop high-performance predictive models for effective disease prevention, treatment, and nursing care.

Prediction of fine dust PM10 using a deep neural network model (심층 신경망모형을 사용한 미세먼지 PM10의 예측)

  • Jeon, Seonghyeon;Son, Young Sook
    • The Korean Journal of Applied Statistics
    • /
    • v.31 no.2
    • /
    • pp.265-285
    • /
    • 2018
  • In this study, we applied a deep neural network model to predict four grades of fine dust $PM_{10}$, 'Good, Moderate, Bad, Very Bad' and two grades, 'Good or Moderate and Bad or Very Bad'. The deep neural network model and existing classification techniques (such as neural network model, multinomial logistic regression model, support vector machine, and random forest) were applied to fine dust daily data observed from 2010 to 2015 in six major metropolitan areas of Korea. Data analysis shows that the deep neural network model outperforms others in the sense of accuracy.

Accuracy Measurement of Image Processing-Based Artificial Intelligence Models

  • Jong-Hyun Lee;Sang-Hyun Lee
    • International journal of advanced smart convergence
    • /
    • v.13 no.1
    • /
    • pp.212-220
    • /
    • 2024
  • When a typhoon or natural disaster occurs, a significant number of orchard fruits fall. This has a great impact on the income of farmers. In this paper, we introduce an AI-based method to enhance low-quality raw images. Specifically, we focus on apple images, which are being used as AI training data. In this paper, we utilize both a basic program and an artificial intelligence model to conduct a general image process that determines the number of apples in an apple tree image. Our objective is to evaluate high and low performance based on the close proximity of the result to the actual number. The artificial intelligence models utilized in this study include the Convolutional Neural Network (CNN), VGG16, and RandomForest models, as well as a model utilizing traditional image processing techniques. The study found that 49 red apple fruits out of a total of 87 were identified in the apple tree image, resulting in a 62% hit rate after the general image process. The VGG16 model identified 61, corresponding to 88%, while the RandomForest model identified 32, corresponding to 83%. The CNN model identified 54, resulting in a 95% confirmation rate. Therefore, we aim to select an artificial intelligence model with outstanding performance and use a real-time object separation method employing artificial function and image processing techniques to identify orchard fruits. This application can notably enhance the income and convenience of orchard farmers.

Machine Learning Based Domain Classification for Korean Dialog System (기계학습을 이용한 한국어 대화시스템 도메인 분류)

  • Jeong, Young-Seob
    • Journal of Convergence for Information Technology
    • /
    • v.9 no.8
    • /
    • pp.1-8
    • /
    • 2019
  • Dialog system is becoming a new dominant interaction way between human and computer. It allows people to be provided with various services through natural language. The dialog system has a common structure of a pipeline consisting of several modules (e.g., speech recognition, natural language understanding, and dialog management). In this paper, we tackle a task of domain classification for the natural language understanding module by employing machine learning models such as convolutional neural network and random forest. For our dataset of seven service domains, we showed that the random forest model achieved the best performance (F1 score 0.97). As a future work, we will keep finding a better approach for domain classification by investigating other machine learning models.

A Study on Diabetes Management System Based on Logistic Regression and Random Forest

  • ByungJoo Kim
    • International journal of advanced smart convergence
    • /
    • v.13 no.2
    • /
    • pp.61-68
    • /
    • 2024
  • In the quest for advancing diabetes diagnosis, this study introduces a novel two-step machine learning approach that synergizes the probabilistic predictions of Logistic Regression with the classification prowess of Random Forest. Diabetes, a pervasive chronic disease impacting millions globally, necessitates precise and early detection to mitigate long-term complications. Traditional diagnostic methods, while effective, often entail invasive testing and may not fully leverage the patterns hidden in patient data. Addressing this gap, our research harnesses the predictive capability of Logistic Regression to estimate the likelihood of diabetes presence, followed by employing Random Forest to classify individuals into diabetic, pre-diabetic or nondiabetic categories based on the computed probabilities. This methodology not only capitalizes on the strengths of both algorithms-Logistic Regression's proficiency in estimating nuanced probabilities and Random Forest's robustness in classification-but also introduces a refined mechanism to enhance diagnostic accuracy. Through the application of this model to a comprehensive diabetes dataset, we demonstrate a marked improvement in diagnostic precision, as evidenced by superior performance metrics when compared to other machine learning approaches. Our findings underscore the potential of integrating diverse machine learning models to improve clinical decision-making processes, offering a promising avenue for the early and accurate diagnosis of diabetes and potentially other complex diseases.

Enhancing Internet of Things Security with Random Forest-Based Anomaly Detection

  • Ahmed Al Shihimi;Muhammad R Ahmed;Thirein Myo;Badar Al Baroomi
    • International Journal of Computer Science & Network Security
    • /
    • v.24 no.6
    • /
    • pp.67-76
    • /
    • 2024
  • The Internet of Things (IoT) has revolutionized communication and device operation, but it has also brought significant security challenges. IoT networks are structured into four levels: devices, networks, applications, and services, each with specific security considerations. Personal Area Networks (PANs), Local Area Networks (LANs), and Wide Area Networks (WANs) are the three types of IoT networks, each with unique security requirements. Communication protocols such as Wi-Fi and Bluetooth, commonly used in IoT networks, are susceptible to vulnerabilities and require additional security measures. Apart from physical security, authentication, encryption, software vulnerabilities, DoS attacks, data privacy, and supply chain security pose significant challenges. Ensuring the security of IoT devices and the data they exchange is crucial. This paper utilizes the Random Forest Algorithm from machine learning to detect anomalous data in IoT devices. The dataset consists of environmental data (temperature and humidity) collected from IoT sensors in Oman. The Random Forest Algorithm is implemented and trained using Python, and the accuracy and results of the model are discussed, demonstrating the effectiveness of Random Forest for detecting IoT device data anomalies.

The Predictive QSAR Model for hERG Inhibitors Using Bayesian and Random Forest Classification Method

  • Kim, Jun-Hyoung;Chae, Chong-Hak;Kang, Shin-Myung;Lee, Joo-Yon;Lee, Gil-Nam;Hwang, Soon-Hee;Kang, Nam-Sook
    • Bulletin of the Korean Chemical Society
    • /
    • v.32 no.4
    • /
    • pp.1237-1240
    • /
    • 2011
  • In this study, we have developed a ligand-based in-silico prediction model to classify chemical structures into hERG blockers using Bayesian and random forest modeling methods. These models were built based on patch clamp experimental results. The findings presented in this work indicate that Laplacian-modified naive Bayesian classification with diverse selection is useful for predicting hERG inhibitors when a large data set is not obtained.

Application and evaluation of machine-learning model for fire accelerant classification from GC-MS data of fire residue

  • Park, Chihyun;Park, Wooyong;Jeon, Sookyung;Lee, Sumin;Lee, Joon-Bae
    • Analytical Science and Technology
    • /
    • v.34 no.5
    • /
    • pp.231-239
    • /
    • 2021
  • Detection of fire accelerants from fire residues is critical to determine whether the case was arson or accidental fire. However, to develop a standardized model for determining the presence or absence of fire accelerants was not easy because of high temperature which cause disappearance or combustion of components of fire accelerants. In this study, logistic regression, random forest, and support vector machine models were trained and evaluated from a total of 728 GC-MS analysis data obtained from actual fire residues. Mean classification accuracies of the three models were 63 %, 81 %, and 84 %, respectively, and in particular, mean AU-PR values of the three models were evaluated as 0.68, 0.86, and 0.86, respectively, showing fine performances of random forest and support vector machine models.