• Title/Abstract/Keywords: Random Forest Prediction Model

Search results: 300 items; processing time: 0.027 seconds

Developing a Pedestrian Satisfaction Prediction Model Based on Machine Learning Algorithms

  • 이제승;이현희
    • 국토계획
    • /
    • Vol. 54, No. 3
    • /
    • pp.106-118
    • /
    • 2019
  • To develop a pedestrian navigation service that provides optimal routes based on pedestrian satisfaction levels, a prediction model is required that can estimate a pedestrian's satisfaction level under given conditions. The aim of the present study is therefore to develop a pedestrian satisfaction prediction model based on three machine learning algorithms: Logistic Regression, Random Forest, and Artificial Neural Network. The 2009, 2012, 2013, 2014, and 2015 Pedestrian Satisfaction Survey data for Seoul, Korea are used to train and test the models. The Random Forest model shows the best prediction performance of the three (Accuracy: 0.798, Recall: 0.906, Precision: 0.842, F1 Score: 0.873, AUC: 0.795). The Artificial Neural Network ranks second (Accuracy: 0.773, Recall: 0.917, Precision: 0.811, F1 Score: 0.868, AUC: 0.738) and the Logistic Regression model ranks third (Accuracy: 0.764, Recall: 1.000, Precision: 0.764, F1 Score: 0.868, AUC: 0.575). The precision score of the Random Forest model implies that approximately 84.2% of pedestrians may be satisfied if they walk in the areas suggested by the Random Forest model.
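
The F1 score reported for the Random Forest model above can be checked from its precision and recall, since F1 is their harmonic mean. The following is a minimal sketch of that arithmetic; the function name `f1_score` is an illustrative choice and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for the Random Forest model in the abstract.
rf_f1 = f1_score(precision=0.842, recall=0.906)
print(round(rf_f1, 3))  # prints 0.873
```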

Performance Comparison Analysis of Artificial Intelligence Models for Estimating Remaining Capacity of Lithium-Ion Batteries

  • Kyu-Ha Kim;Byeong-Soo Jung;Sang-Hyun Lee
    • International Journal of Advanced Culture Technology
    • /
    • Vol. 11, No. 3
    • /
    • pp.310-314
    • /
    • 2023
  • The purpose of this study is to predict the remaining capacity of lithium-ion batteries and to evaluate the performance of five artificial intelligence models: linear regression, decision tree, random forest, neural network, and an ensemble model. Measured data (in Excel format) from the CS2 lithium-ion battery were used, and the prediction accuracy of the models was measured using mean square error, mean absolute error, coefficient of determination, and root mean square error. The Root Mean Square Error (RMSE) was 0.045 for the linear regression model, 0.038 for the decision tree model, 0.034 for the random forest model, 0.032 for the neural network model, and 0.030 for the ensemble model. The ensemble model had the best prediction performance, with the neural network model in second place. The decision tree and random forest models also performed quite well, while the linear regression model performed relatively poorly compared with the other models. It was therefore concluded that ensemble and neural network models are the most suitable for predicting the remaining capacity of lithium-ion batteries and should be prioritized in order to improve the efficiency of battery management and energy systems.
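
The four evaluation indicators named in the abstract above can be computed as in the sketch below. The capacity values are hypothetical and are not taken from the CS2 data set; the function name `regression_metrics` is likewise illustrative.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MSE, MAE, RMSE, and R^2 for paired observations and predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    mse = ss_res / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "mse": mse,
        "mae": mae,
        "rmse": math.sqrt(mse),
        "r2": 1 - ss_res / ss_tot,
    }


# Hypothetical remaining-capacity values (Ah), for illustration only.
measured = [1.10, 1.05, 1.00, 0.95]
predicted = [1.08, 1.06, 0.97, 0.96]
metrics = regression_metrics(measured, predicted)
print({name: round(value, 4) for name, value in metrics.items()})
```

Because RMSE is expressed in the same unit as the target, it is the indicator the abstract uses to rank the five models.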

Rice yield prediction in South Korea by using random forest

  • 김준환;이주석;상완규;신평;조현숙;서명철
    • 한국농림기상학회지
    • /
    • Vol. 21, No. 2
    • /
    • pp.75-84
    • /
    • 2019
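  • The purpose of this study is to predict the national average rice yield of South Korea with random forest, using meteorological variables only. Random forest can isolate the effect of each predictor variable used in the prediction, and the time trend isolated in this way showed an abnormal increasing pattern. Because this ultimately degrades predictive ability, the trend needs to be removed; in this study it was removed with a moving average before prediction. Weather and yield data from 1991 to 2005 were used for training, and data from 2006 to 2015 were used for validation. The model predicted quite accurately on the training data but not on the validation data. To analyze the reason, variable importance was computed separately for the training and validation data and compared, which showed that the importance of the monthly weather variables had shifted between the two data sets. This difference arose because the standard transplanting date shifted nationwide between the training and validation periods, so that the rice growing period itself changed. Accurate prediction therefore requires data on regional sowing or transplanting dates, and prediction based on weather data alone appears to be difficult.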
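
The detrending and the year-based split described in the abstract above can be sketched as follows. The abstract does not state the window length or the type of moving average, so the centered 5-year window here is an assumption, and the yield series is hypothetical.

```python
def moving_average(values, window):
    """Centered moving average; the window shrinks at the series edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def detrend(values, window=5):
    """Subtract the moving-average trend, leaving year-to-year deviations."""
    trend = moving_average(values, window)
    return [v - t for v, t in zip(values, trend)]


years = list(range(1991, 2016))
# Hypothetical yields with a steady upward trend (not the study's data).
yields = [450 + 2 * (year - 1991) for year in years]
residuals = detrend(yields, window=5)

# Split by year as in the abstract: 1991-2005 for training, 2006-2015 for validation.
train = [(y, r) for y, r in zip(years, residuals) if y <= 2005]
test = [(y, r) for y, r in zip(years, residuals) if y >= 2006]
print(len(train), len(test))
```

Splitting by year rather than at random keeps the validation period strictly later than the training period, which is what exposes the shift in variable importance that the abstract reports.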

Basic Study on Safety Accident Prediction Model Using Random Forest in Construction Field

  • 강경수;류한국
    • 한국건축시공학회:학술대회논문집
    • /
    • 한국건축시공학회 2018 Autumn Academic Paper Presentation Conference
    • /
    • pp.59-60
    • /
    • 2018
  • The purpose of this study is to predict and classify accident types based on KOSHA (Korea Occupational Safety & Health Agency) data and weather data. We also attempt to suggest management methods for each accident type by deriving feature importance. We designed two models: model (a), based on accident data and weather data, and model (b), based on weather data only. With the random forest method, model (b) lacked prediction accuracy, whereas model (a) produced more accurate predictions. We therefore present a safety management plan based on these results. Future work will extend this study to real-time prediction of accident types, supplemented with real-time accident and weather data, in order to prevent safety accidents.

A Mixed-effects Height-Diameter Model for Pinus densiflora Trees in Gangwon Province, Korea

  • Lee, Young Jin;Coble, Dean W.;Pyo, Jung Kee;Kim, Sung Ho;Lee, Woo Kyun;Choi, Jung Kee
    • 한국산림과학회지
    • /
    • Vol. 98, No. 2
    • /
    • pp.178-182
    • /
    • 2009
  • A new mixed-effects model was developed that predicts individual-tree total height for Pinus densiflora trees in Gangwon province as a function of individual-tree diameter (cm). The mixed-effects model contains two random-effects parameters. Maximum likelihood estimation was used to fit the model to 560 height-diameter observations of individual trees measured throughout Gangwon province in 2007 as part of the National Forest Inventory Program in Korea. The new model is an improvement over fixed-effects models because it can be calibrated to a local area, such as an inventory plot or individual stand. The new model also appears to be an improvement over the Forest Resources Evaluation and Prediction Program for the ten calibration trees used in this study. An example is provided that describes how to estimate the random-effects parameters using ten calibration trees.

A Study on the Performance Evaluation of Machine Learning for Predicting the Number of Movie Audiences

  • 정찬미;민대기
    • 한국전자거래학회지
    • /
    • Vol. 25, No. 2
    • /
    • pp.49-63
    • /
    • 2020
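  • Because film production requires enormous investment while audience demand is highly uncertain, improved demand forecasting can serve as an important decision-making tool for improving profits. This study verifies, in terms of predictive performance, the validity of applying machine learning techniques to forecasting post-release demand for films. The results are summarized as follows. First, statistical tests of the candidate variables confirmed that, along with basic film characteristics (director, cast), time-series data through the second week after release, such as the number of screens, number of screenings, audience numbers, and level of interest in the lead actors, are significant for demand forecasting. Second, when predictive performance was evaluated for classification-based machine learning methods such as Random Forest Classifier and SVM (Support Vector Machine) and for regression-based methods such as Random Forest Regressor and k-NN Regressor, the Random Forest methods showed superior results. Third, for films whose cumulative audience was below the first quantile, the regression-based methods showed low prediction accuracy, whereas the classification-based methods achieved the best results. That is, different machine learning techniques need to be applied depending on the distributional characteristics of film demand.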

Comparison of CT Exposure Dose Prediction Models Using Machine Learning-based Body Measurement Information

  • 홍동희
    • 대한방사선기술학회지:방사선기술과학
    • /
    • Vol. 43, No. 6
    • /
    • pp.503-509
    • /
    • 2020
  • This study aims to develop a patient-specific radiation exposure dose prediction model based on anthropometric data that can be easily measured during CT examination, to serve as basic data for DRL setting and radiation dose management systems in the future. It also identifies which machine learning algorithm is most suitable for predicting exposure dose. The data used were chest CT scan data, and a data set was constructed that includes the patients' anthropometric data. During pre-processing and sample selection, 110 samples were extracted from a total of 250: chest CT scans performed without a contrast agent and including height and weight variables. Of the 110 samples extracted, 66% was used as a training set, and the remaining 44% were used as a test set for verification. The exposure dose was predicted with random forest, linear regression analysis, and SVM algorithms using Orange version 3.26.0, an open software package for machine learning. The prediction accuracy of the algorithm models was R^2 0.840 for random forest, R^2 0.969 for linear regression analysis, and R^2 0.189 for SVM. In verifying the prediction rate of the algorithm models, random forest was the highest with R^2 0.986, followed by linear regression analysis with R^2 0.973 and SVM with R^2 0.204, indicating that the random forest model has the best predictive power.

A Cross-Validation of Seismic Vulnerability Assessment Model: Application to Earthquake of 9.12 Gyeongju and 2017 Pohang

  • 한지혜;김진수
    • 대한원격탐사학회지
    • /
    • Vol. 37, No. 3
    • /
    • pp.649-655
    • /
    • 2021
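  • This study aims to cross-validate the performance of the optimal seismic vulnerability assessment model, derived from a previous study of Gyeongju, by applying it to another region. The test area is Pohang, where the 2017 Pohang Earthquake occurred, and data sets of the same influencing factors and damage status as in the previous study were constructed. The validation data set was built by random sampling and applied to the random forest (RF)-based model for Gyeongju to derive the prediction accuracy. The model (success) and prediction accuracies for Gyeongju are 100% and 94.9%, respectively, and applying the Pohang validation data set yielded a prediction accuracy of 70.4%.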

A Study on Prediction Techniques through Machine Learning of Real-time Solar Radiation in Jeju

  • 이영미;배주현;박정근
    • 한국환경과학회지
    • /
    • Vol. 26, No. 4
    • /
    • pp.521-527
    • /
    • 2017
  • Solar radiation forecasts are important for predicting the amount of ice on roads and the potential solar energy. To improve solar radiation predictability in Jeju, we applied machine learning with various data mining techniques: tree models, conditional inference trees, random forest, support vector machines, and logistic regression. To validate the machine learning models, the simulation results were compared with solar radiation data observed at the Jeju observation site. The model assessment shows that random forest is the most effective method for solar radiation prediction, with an error rate of 17%.

Default Prediction of Automobile Credit Based on Support Vector Machine

  • Chen, Ying;Zhang, Ruirui
    • Journal of Information Processing Systems
    • /
    • Vol. 17, No. 1
    • /
    • pp.75-88
    • /
    • 2021
  • The automobile credit business has developed rapidly in recent years, and defaults occur frequently. Credit default brings great losses to automobile financial institutions, so successful prediction of automobile credit default is of great significance. First, missing values are deleted; then random forest is used for feature selection; and then the sample data are randomly grouped. Finally, six prediction models are constructed: support vector machine (SVM), random forest, k-nearest neighbor (KNN), logistic regression, decision tree, and artificial neural network (ANN). The results show that all six machine learning models can be used to predict automobile credit default. Among them, the decision tree has the highest accuracy (0.79), but SVM has the best overall performance. Random grouping also improves the efficiency of model operation to a certain extent, especially for SVM.
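
Two of the preprocessing steps named in the abstract above, deleting missing values and random grouping, might look like the sketch below. The abstract does not say how the groups are formed, so dealing shuffled records into equal groups is an assumption; the records and field names are hypothetical, and the random forest feature-selection step is omitted.

```python
import random


def drop_missing(rows):
    """Keep only the rows in which no field is None."""
    return [row for row in rows if all(v is not None for v in row.values())]


def random_groups(rows, n_groups, seed=0):
    """Shuffle rows with a fixed seed and deal them into n_groups groups."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_groups] for i in range(n_groups)]


# Hypothetical loan records, for illustration only.
records = [
    {"income": 52000, "loan_amount": 18000, "default": 0},
    {"income": None, "loan_amount": 22000, "default": 1},
    {"income": 31000, "loan_amount": 15000, "default": 1},
    {"income": 78000, "loan_amount": None, "default": 0},
    {"income": 45000, "loan_amount": 9000, "default": 0},
    {"income": 27000, "loan_amount": 20000, "default": 1},
    {"income": 64000, "loan_amount": 12000, "default": 0},
]

clean = drop_missing(records)
groups = random_groups(clean, n_groups=2)
print(len(clean), [len(g) for g in groups])
```

Fixing the seed makes the grouping reproducible, which matters when several models are compared on the same groups.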