• Title/Abstract/Keywords: random forest model

Search results: 555 items (processing time 0.032 s)

Verification of the Suitability of Fine Dust and Air Quality Management Systems Based on Artificial Intelligence Evaluation Models

  • Heungsup Sim
    • 한국컴퓨터정보학회논문지 / Vol. 29, No. 8 / pp.165-170 / 2024
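  • This study aims to verify the accuracy of Yangju City's air quality management system using artificial intelligence evaluation models. Fine dust public data from the Ministry of Environment were compared with data from the Yangju City air quality management system to evaluate the consistency and reliability of the fine dust data. To this end, the completeness, uniqueness, validity, consistency, accuracy, and integrity of the data were analyzed, and exploratory statistical analysis was used to compare the consistency of the two sources. The AI-based data quality index evaluation confirmed that there was no statistically significant difference between the two datasets. Among the AI-based algorithms, the random forest model showed the highest prediction accuracy, with prediction performance evaluated using ROC curves and AUC. In particular, the random forest model was found to be useful for optimizing the air quality management system, and the study confirmed that AI-based model performance evaluation can be used to assess the reliability and suitability of fine dust data.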

Simple Graphs for Complex Prediction Functions

  • Huh, Myung-Hoe;Lee, Yong-Goo
    • Communications for Statistical Applications and Methods / Vol. 15, No. 3 / pp.343-351 / 2008
  • By supervised learning with p predictors, we frequently obtain a prediction function of the form $y = f(x_1, \ldots, x_p)$. When $p \geq 3$, it is not easy to understand the inner structure of f, except when the function is formulated as additive. In this study, we propose using p simple graphs for visual understanding of complex prediction functions produced by supervised learning engines such as LOESS, neural networks, support vector machines, and random forests.

Stochastic Simulation Model of Fire Occurrence in the Republic of Korea

  • 이병두;이요한;이명보
    • 한국산림과학회지 / Vol. 100, No. 1 / pp.70-78 / 2011
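  • In this study, we developed a seasonal stochastic simulation model of forest fire occurrence based on historical forest fire data in Korea. Because the temporal distribution of simulated fire events must agree with the historical record, the model generates events in three stages. First, fire occurrence days within the fire season are drawn at random from a Bernoulli distribution, which decides day by day whether a fire occurs. Second, when a fire day occurs, the number of fires on that day is drawn at random from a geometric multiplicity distribution. Finally, the ignition time of each fire is drawn at random from the hours of the day in which fires can occur, assuming a Poisson distribution. The probability distributions of fire occurrence were estimated from historical fire occurrence data, and maximum likelihood estimation was used to obtain the key parameters of the distributions. Compared with historical fire statistics, the series of fire events generated by the simulation model was statistically consistent in the occurrence frequency distribution, the time interval between fires, and the total number of fires per year. The results are expected to serve as important supporting data for future fire-related resource allocation and suppression planning.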
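The three-stage sampling scheme described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `simulate_season`, the parameter values, and the 10:00 to 18:00 ignition window are hypothetical, and the ignition times are drawn uniformly within the window, which is what a homogeneous Poisson process implies once the number of events is fixed.

```python
import math
import random


def simulate_season(n_days, p_fire_day, p_geom,
                    start_hour=10.0, end_hour=18.0, seed=0):
    """Simulate fire events as (day_index, ignition_hour) tuples.

    All parameter values are illustrative assumptions; the paper estimates
    its distribution parameters by maximum likelihood from historical data.
    Requires 0 < p_geom < 1.
    """
    rng = random.Random(seed)
    events = []
    for day in range(n_days):
        # Stage 1: Bernoulli draw decides whether this is a fire day.
        if rng.random() >= p_fire_day:
            continue
        # Stage 2: number of fires on a fire day, geometric on {1, 2, ...},
        # sampled by inverse transform.
        u = 1.0 - rng.random()  # u lies in (0, 1], so log(u) is defined
        n_fires = 1 + int(math.log(u) / math.log(1.0 - p_geom))
        # Stage 3: ignition times, uniform over the assumed burnable hours.
        times = sorted(rng.uniform(start_hour, end_hour)
                       for _ in range(n_fires))
        events.extend((day, t) for t in times)
    return events


events = simulate_season(n_days=100, p_fire_day=0.3, p_geom=0.6)
print(len(events), "simulated fires; first three:", events[:3])
```

Running many seasons with different seeds and comparing the simulated inter-fire intervals and annual totals with the historical record would mirror the validation the abstract reports.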

Reservoir Water Level Forecasting Using Machine Learning Models

  • 서영민;최은혁;여운기
    • 한국농공학회논문집 / Vol. 59, No. 3 / pp.97-110 / 2017
  • This study investigates the efficiency of machine learning models, including the artificial neural network (ANN), generalized regression neural network (GRNN), adaptive neuro-fuzzy inference system (ANFIS), and random forest (RF), for reservoir water level forecasting at the Chungju Dam, South Korea. The models are assessed using model efficiency indices and graphical comparison. The forecasting results depend on the lead time and the combination of input variables. For lead time t = 1 day, the ANFIS1 and ANN6 models yield better forecasts than the RF6 and GRNN6 models. For lead time t = 5 days, the ANN1 and RF6 models produce better forecasts than the ANFIS1 and GRNN3 models. For lead time t = 10 days, the ANN3 and RF1 models perform better than the ANFIS3 and GRNN3 models. The ANN model is found to yield the best performance for all lead times in terms of model efficiency and graphical comparison. These results indicate that the combination of input variables and forecasting model should be chosen for each lead time, rather than applying a single combination to all lead times.
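How a lead time enters a forecasting setup can be shown with a small Python sketch. It is only an illustration under assumptions: the helper `make_lead_time_pairs` is hypothetical, the inputs are lagged water levels alone (the paper's actual input variable combinations are not listed in the abstract), and the sample values are made up.

```python
def make_lead_time_pairs(levels, n_lags, lead):
    """Pair the last `n_lags` observations with the value `lead` steps ahead."""
    pairs = []
    # t indexes the most recent observation available when forecasting.
    for t in range(n_lags - 1, len(levels) - lead):
        inputs = levels[t - n_lags + 1:t + 1]
        target = levels[t + lead]
        pairs.append((inputs, target))
    return pairs


# Made-up daily water levels, for illustration only.
levels = [10.0, 10.2, 10.5, 10.4, 10.1, 9.9, 9.8]
pairs = make_lead_time_pairs(levels, n_lags=3, lead=2)
for inputs, target in pairs:
    print(inputs, "->", target)
```

Building one such dataset per lead time (1, 5, and 10 days in the abstract) lets each lead time be paired with its own model and input combination.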

Predicting the Invasion Potential of Pink Muhly (Muhlenbergia capillaris) in South Korea

  • Park, Jeong Soo;Choi, Donghui;Kim, Youngha
    • Proceedings of the National Institute of Ecology of the Republic of Korea / Vol. 1, No. 1 / pp.74-82 / 2020
  • Predictions of suitable habitat areas can provide important information for the risk assessment and management of alien plants at an early stage of their establishment. Here, we predict the invasion potential of Muhlenbergia capillaris (pink muhly) in South Korea using five bioclimatic variables. We adopt four models (generalized linear model, generalized additive model, random forest (RF), and artificial neural network) for projection based on 630 presence and 600 pseudo-absence data points. The RF model yielded the highest performance. The presence probability of M. capillaris was highest within an annual temperature range of 12 to 24℃ and with precipitation from 800 to 1,300 mm. The occurrence of M. capillaris was positively associated with the precipitation of the driest quarter. The projection map showed that suitable areas for M. capillaris are mainly concentrated in the southern coastal regions of South Korea, where temperatures and precipitation are higher than in other regions, especially in winter. Based on the habitat suitability map, we conclude that M. capillaris is not currently considered invasive. However, rising temperatures and increasing winter precipitation may accelerate the expansion of this plant on the Korean Peninsula.

1D CNN and Machine Learning Methods for Fall Detection

  • 김인경;김대희;노송;이재구
    • 정보처리학회논문지:소프트웨어 및 데이터공학 / Vol. 10, No. 3 / pp.85-90 / 2021
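  • This paper discusses fall detection using individual wearable devices for the elderly. To design a low-cost wearable device for reliable fall detection, we comprehensively analyze and present two representative types of models. We quantify the fall-detection learning capability of machine learning models, namely decision tree, random forest, and support vector machine (SVM), and of a deep learning model, the one-dimensional (1D) convolutional neural network (CNN). We also evaluate the validity of the examined models by considering the data segmentation, preprocessing, and feature extraction methods applied to the input data. The experimental results show an overall performance improvement and validate the effectiveness of the deep learning model.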
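The data segmentation and feature extraction steps mentioned in this abstract might look like the following Python sketch. It is an assumption-laden illustration, not the paper's pipeline: the window size, step, feature set, and the helpers `sliding_windows` and `window_features` are all hypothetical, and the accelerometer samples are synthetic.

```python
import math
import statistics


def sliding_windows(samples, size, step):
    """Split a sample sequence into fixed-size, possibly overlapping windows."""
    return [samples[i:i + size]
            for i in range(0, len(samples) - size + 1, step)]


def window_features(window):
    """Summary statistics of the acceleration magnitude in one window."""
    mags = [math.sqrt(x * x + y * y + z * z) for x, y, z in window]
    return {
        "mean": statistics.fmean(mags),
        "std": statistics.pstdev(mags),
        "max": max(mags),
        "min": min(mags),
    }


# Synthetic (x, y, z) readings in g: rest, one impact-like spike, rest.
samples = [(0.0, 0.0, 1.0)] * 4 + [(0.0, 0.0, 3.0)] + [(0.0, 0.0, 1.0)] * 3
windows = sliding_windows(samples, size=4, step=2)
features = [window_features(w) for w in windows]
for f in features:
    print(f)
```

Feature dictionaries of this kind would feed classical classifiers such as a decision tree, random forest, or SVM, whereas a 1D CNN would typically take the raw windows as input.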

Using Machine Learning Technique for Analytical Customer Loyalty

  • Mohamed M. Abbassy
    • International Journal of Computer Science & Network Security / Vol. 23, No. 8 / pp.190-198 / 2023
  • To enhance customer satisfaction and raise profits, an e-commerce business can maintain continuous relationships with existing customers while acquiring new ones. Machine learning models that analyse customers' behavioural evidence can give an e-commerce platform a competitive advantage by helping to improve overall satisfaction. These models forecast which customers will churn and the causes of churn, and the forecasts are used to build distinct business strategies and service offers. This work aims to develop a machine learning model that can accurately forecast retainable customers across the entire e-commerce customer data. Developing predictive models that classify imbalanced data effectively is a major challenge for both the collected data and the machine learning algorithms, so the model is built to handle class imbalance while forecasting customers. Satisfaction accuracy is used as the evaluation metric. The paper evaluates different machine learning models for forecasting satisfaction, using three analytical methods selected from various classes of learning. For classifier selection, it compares the efficiency of Random Forest, Logistic Regression, SVM, and the Gradient Boosting Algorithm. The models were applied to a dataset of 8000 records from e-commerce websites and apps. The results indicate that the Gradient Boosting Algorithm achieved the best accuracy in determining the satisfaction class, compared with the Support Vector Machine, Random Forest, and logistic regression algorithms. The best model developed in this paper for forecasting customer satisfaction achieves an accuracy of 88 %.

Prediction of Net Irrigation Water Requirement in Paddy Field Based on Machine Learning

  • 김수진;배승종;장민원
    • 농촌계획 / Vol. 28, No. 4 / pp.105-117 / 2022
  • This study tested SVM (support vector machine), RF (random forest), and ANN (artificial neural network) machine learning models for predicting net irrigation water requirements in paddy fields. For the Jeonju and Jeongeup meteorological stations, the net irrigation water requirement was calculated using K-HAS from 1981 to 2021 and set as the label. For each algorithm, twelve models were constructed based on cumulative precipitation, precipitation, crop evapotranspiration, and month. Compared to the CE model, the R² of the CEP model was higher, and its MAE, RMSE, and MSE were lower. Considering learning performance and learning time together, the RF algorithm is judged to have the best usability, and the predictive power for five days is better than for three days. Connected with weather forecast data, the results of this study are expected to provide the scientific information that on-site water managers need for decision-making. In the future, if the actual amounts of irrigation and supply are measured, a learning model that reflects them will need to be developed.
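The four metrics named in this abstract (R², MAE, RMSE, and MSE) follow standard definitions, sketched below in Python. The function name `regression_metrics` and the sample values are illustrative assumptions and do not come from the paper.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, and R^2 for paired observations and predictions."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


# Made-up values, for illustration only.
y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [3.0, 4.0, 5.0, 8.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

A model is better on these measures when R² is higher and MAE, RMSE, and MSE are lower, which is the pattern the abstract reports for the CEP model relative to the CE model.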

Prediction of Global Industrial Water Demand using Machine Learning

  • Panda, Manas Ranjan;Kim, Yeonjoo
    • 한국수자원학회:학술대회논문집 / 한국수자원학회 2022 Academic Conference / pp.156-156 / 2022
  • Spatially explicit and reliable data on industrial water demand are very important for both policy makers and researchers who need to carry out region-specific analyses of water resources management. However, such data remain scarce, particularly in underdeveloped and developing countries. Current research makes limited use of spatially available socio-economic, climate, and geographical data from different sources to predict industrial water demand at finer resolution. This study proposes a random forest regression (RFR) model to predict industrial water demand at 0.5° × 0.5° spatial resolution by combining various features extracted from multiple data sources. The datasets used here include National Polar-orbiting Partnership (NPP)/Visible Infrared Imaging Radiometer Suite (VIIRS) night-time light (NTL), the Global Power Plant database, AQUASTAT country-wise industrial water use data, elevation, Gross Domestic Product (GDP), road density, cropland, population, precipitation, temperature, and aridity. Compared with traditional regression algorithms, RF offers high prediction accuracy, requires no assumption of a prior probability distribution, and can analyse variable importance. The final RF model was fitted with the parameter settings ntree = 300 and mtry = 2, achieving a coefficient of determination of 0.547. Among the independent variables used to train the RF model, night-time light, elevation, GDP, and population play the major role in predicting industrial water demand.

Predicting the Subsequent Childbirth Intention of Married Women with One Child to Solve the Low Birth Rate Problem in Korea: Application of a Machine Learning Method

  • 전효정
    • 한국보육지원학회지 / Vol. 20, No. 2 / pp.127-143 / 2024
  • Objective: The purpose of this study is to develop a machine learning model to predict the subsequent childbirth intention of married women with one child, aiming to address the low birth rate problem in Korea. The model uses data from the 2021 Family and Childbirth Survey conducted by the Korea Institute for Health and Social Affairs. Methods: A prediction model was developed using the Random Forest algorithm to predict the subsequent childbirth intention of married women with one child. This algorithm was chosen for its advantages in prediction and generalization, and its performance was evaluated. Results: The importance of the variables influencing the Random Forest prediction model was confirmed. With the exception of the presence or absence of leave before and after childbirth, most variables contributed to predicting the intention to have a subsequent child. Notably, the mother's age, the number of children planned at the time of marriage, average monthly household income, the spouse's share of the childcare burden, the mother's weekday housework hours, and the presence or absence of the spouse's maternity leave emerged as relatively important predictors of subsequent childbirth intention.