• Title/Abstract/Keywords: Random Forests

Application of a comparative analysis of random forest programming to predict the strength of environmentally-friendly geopolymer concrete

  • Ying Bi;Yeng Yi
    • Steel and Composite Structures / Vol. 50, No. 4 / pp.443-458 / 2024
  • The construction industry, one of the largest producers of greenhouse gas emissions, is under considerable pressure because of growing concern about how climate change may affect local communities. Because of the environmental issues connected to cement manufacture, geopolymer concrete (GPC) has emerged as a feasible construction material. The findings of this study contribute to the development of machine learning methods for estimating the properties of eco-friendly concrete, which could be used in place of traditional concrete to reduce CO2 emissions in the building industry. In the present work, the compressive strength (fc) of GPC in which natural zeolite (NZ) and silica fume (SF) replace ground granulated blast-furnace slag (GGBFS) is predicted using random forest regression (RFR). A comprehensive set of experimental results on GPC samples, totaling 254 data rows, was compiled from the literature. The RFR model was integrated with the artificial hummingbird algorithm (AHA), the black widow optimization algorithm (BWOA), and the chimp optimization algorithm (ChOA), giving models abbreviated as ARFR, BRFR, and CRFR. All RFR models performed satisfactorily across every evaluation metric. For R², the CRFR model achieved 0.9988 and 0.9981 on the training and test sets, higher than BRFR (0.9982 and 0.9969), followed by ARFR (0.9971 and 0.9956). Other error and distribution metrics showed a roughly 50% improvement for CRFR relative to ARFR.
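
The models above are compared by R² and by unspecified "error and distribution metrics". As a minimal sketch, assuming the standard definitions, the snippet below computes R² (1 - SS_res/SS_tot), RMSE as one example of an error metric (an assumption: the abstract does not name the error metrics used), and the fractional improvement of one model's error over another's. The strength values are toy numbers, not data from the study.

```python
import math
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    # Both sequences must be non-empty and aligned element by element.
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    _check(y_true, y_pred)
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error (one example of an error metric)."""
    _check(y_true, y_pred)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Fractional error reduction of a new model over a baseline (0.5 means 50%)."""
    return (baseline_error - new_error) / baseline_error


# Toy compressive strengths in MPa (hypothetical, not from the study).
observed = [30.0, 40.0, 50.0, 60.0]
predicted = [31.0, 39.0, 50.0, 60.0]
print(r_squared(observed, predicted))
print(rmse(observed, predicted))
```

Using the same metric function for every model keeps comparisons such as CRFR vs. ARFR on an equal footing. A "50% improvement" reads naturally as `relative_improvement(...)` returning about 0.5 for the chosen error metric.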

Simple Graphs for Complex Prediction Functions

  • Huh, Myung-Hoe;Lee, Yong-Goo
    • Communications for Statistical Applications and Methods / Vol. 15, No. 3 / pp.343-351 / 2008
  • In supervised learning with p predictors, we frequently obtain a prediction function of the form $y = f(x_1, \ldots, x_p)$. When $p \geq 3$, it is not easy to understand the inner structure of f unless the function is additive. In this study, we propose using p simple graphs to visually understand complex prediction functions produced by supervised learning engines such as LOESS, neural networks, support vector machines, and random forests.
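
The abstract does not describe how the p graphs are constructed. One standard way to draw one simple graph per predictor for an arbitrary fitted f is partial dependence: fix x_j at a grid value, average f over the observed rows, and repeat across the grid. The sketch below implements that technique. It is a swapped-in illustration and not necessarily the graphs proposed in the paper; the model and data are hypothetical.

```python
from typing import Callable, List, Sequence

Row = Sequence[float]


def partial_dependence(
    f: Callable[[Sequence[float]], float],
    X: Sequence[Row],
    j: int,
    grid: Sequence[float],
) -> List[float]:
    """Average prediction of f over the rows of X with predictor j fixed at each grid value.

    This is partial dependence, used here as an illustration; it is not
    necessarily the authors' construction.
    """
    if not X:
        raise ValueError("X must contain at least one row")
    curve = []
    for v in grid:
        total = 0.0
        for row in X:
            modified = list(row)  # copy so the data are not mutated
            modified[j] = v       # overwrite only the j-th predictor
            total += f(modified)
        curve.append(total / len(X))
    return curve


# Hypothetical fitted model with an interaction between the 2nd and 3rd predictors.
def model(x: Sequence[float]) -> float:
    return x[0] + 2.0 * x[1] * x[2]


data = [[0.0, 1.0, 1.0], [0.0, 2.0, 3.0]]
for j in range(3):
    # One simple graph per predictor: plot the grid against this curve.
    print(j, partial_dependence(model, data, j, [0.0, 1.0, 2.0]))
```

Because only averages of f are needed, the same function works for any of the learners listed above. The trade-off is that interactions, such as the x_2·x_3 term in the toy model, are averaged out of each one-dimensional graph.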

Evaluating the quality of baseball pitch using PITCHf/x

  • 박성민;장원철
    • The Korean Journal of Applied Statistics / Vol. 33, No. 2 / pp.171-184 / 2020
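  • Major League Baseball collects and publishes PITCHf/x, trajectory data for every pitch, using three high-speed cameras that track the baseball. Previous studies used PITCHf/x data to compute the expected total bases allowed on each pitch and evaluated pitch quality on that basis. However, because expected total bases do not always translate into runs, they cannot directly measure each pitch's contribution to winning. In this paper, we combine the concepts of run expectancy and run value to compute the expected run value of each pitch, use it to evaluate pitch quality with a random forest model, and compare the results with an evaluation of pitch quality based on expected total bases.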
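
The expected run value above combines two standard sabermetric ideas. First, the run value of an event is the change in run expectancy between the states before and after it, plus any runs scored. Second, a pitch's expected run value weights each possible outcome's run value by its probability. As a minimal sketch, assuming those definitions and outcome probabilities supplied by some classifier (for instance, a random forest over PITCHf/x features; the abstract does not give the exact formulation), the snippet computes both quantities from the batting team's perspective, so lower values indicate better pitches. All run expectancies, outcome names, and probabilities are hypothetical.

```python
from typing import Dict


def run_value(re_before: float, re_after: float, runs_scored: int) -> float:
    """Run value of an event: change in run expectancy plus runs scored on the play."""
    return re_after - re_before + runs_scored


def expected_run_value(
    outcome_probs: Dict[str, float], outcome_run_values: Dict[str, float]
) -> float:
    """Probability-weighted run value of a pitch over its possible outcomes."""
    if abs(sum(outcome_probs.values()) - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * outcome_run_values[o] for o, p in outcome_probs.items())


# Hypothetical run expectancies: bases empty with 0 outs (before) vs. 1 out (after).
rv_out = run_value(re_before=0.50, re_after=0.27, runs_scored=0)

# Hypothetical outcome probabilities for one pitch (e.g. from a classifier)
# and hypothetical run values for each outcome.
probs = {"called_strike": 0.4, "ball": 0.4, "in_play_out": 0.2}
values = {"called_strike": -0.04, "ball": 0.04, "in_play_out": rv_out}
print(expected_run_value(probs, values))
```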

Language Matters: A Systemic Functional Linguistics-Enhanced Machine Learning Framework for Cyberbullying Detection

  • Raghad Altowairgi;Ala Eshamwi;Lobna Hsairi
    • International Journal of Computer Science & Network Security / Vol. 23, No. 9 / pp.192-198 / 2023
  • Cyberbullying is a growing problem among adolescents and can have serious psychological and emotional consequences for victims. In recent years, machine learning techniques have emerged as a promising approach for detecting cyberbullying in online communication. This paper develops machine learning models for detecting cyberbullying, including support vector machines (SVM), naïve Bayes, and random forests. The study uses a dataset of real-world cyberbullying examples collected from Twitter, extracts features that represent the ideational metafunction, and then evaluates the precision, recall, and F1-score of each algorithm before and after applying the theory of systemic functional linguistics. The results indicate that all three algorithms are effective at detecting cyberbullying, with an accuracy of 92% for naïve Bayes and 93% for both SVM and random forests. However, the study also highlights the challenges of accurately detecting cyberbullying, particularly given the nuanced and context-dependent nature of online communication. The paper concludes by discussing the implications of these findings for future research and for the development of practical tools for cyberbullying prevention and intervention.
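
The precision, recall, F1-score, and accuracy reported above all follow from the confusion counts of a binary classifier. Below is a minimal sketch. The binary label encoding, with 1 marking a cyberbullying tweet, is an assumption, and the predictions are toy values, not the study's data.

```python
from typing import Hashable, Sequence, Tuple


def _check(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> None:
    # Labels and predictions must be non-empty and aligned one-to-one.
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")


def precision_recall_f1(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable], positive: Hashable = 1
) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for the positive class (here: cyberbullying)."""
    _check(y_true, y_pred)
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false alarms
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # missed cases
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Share of predictions that match the true label."""
    _check(y_true, y_pred)
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Toy labels: 1 = cyberbullying, 0 = not (hypothetical).
truth = [1, 1, 0, 0, 1]
preds = [1, 0, 0, 1, 1]
print(precision_recall_f1(truth, preds))
print(accuracy(truth, preds))  # 0.6
```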

A decision-centric assessment of flood risk and supply reliability at a multi-purpose reservoir under climate change

  • 김대하;김은희;이승철;김은지
    • Korea Water Resources Association: Conference Proceedings / Korea Water Resources Association 2022 Annual Conference / pp.112-112 / 2022
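  • This study assessed how vulnerable the 2005-2020 operation of Yongdam Dam is to climate change, focusing on flood-risk and water-supply-reliability indicators. The GR6J rainfall-runoff model was used to simulate inflows, and a Random Forests model was fitted to observed data to extract the dam's operating rules. Daily inflows were simulated by feeding 294 stochastic climate-stress time series into the GR6J model. The Random Forests model then estimated releases and storage, from which the annual maximum daily release and supply reliability were analyzed. Supply reliability was affected mainly by changes in mean precipitation, whereas the annual maximum release was sensitive to changes in both mean precipitation and precipitation variability. For 2021-2040, the supply reliability of Yongdam Dam is projected to rise excessively owing to increased mean precipitation. However, owing to increased precipitation variability, the 20-year return-period annual maximum release is projected to rise steeply, further aggravating flood risk in areas downstream of the dam.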
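
The abstract does not define supply reliability or state how release series were summarized. As a minimal sketch, assuming the common time-based definition (the fraction of time steps in which supply meets demand) and daily series, the snippet below computes supply reliability and the annual maximum daily release. These could be applied to simulated series such as those produced by the Random Forests operating-rule model. The demand and release numbers are hypothetical.

```python
from datetime import date
from typing import Dict, Sequence


def supply_reliability(
    supplied: Sequence[float], demand: Sequence[float], tol: float = 1e-9
) -> float:
    """Time-based reliability: share of time steps in which supply meets demand."""
    if len(supplied) != len(demand) or not demand:
        raise ValueError("series must be non-empty and of equal length")
    met = sum(1 for s, d in zip(supplied, demand) if s + tol >= d)
    return met / len(demand)


def annual_max_daily_release(
    dates: Sequence[date], release: Sequence[float]
) -> Dict[int, float]:
    """Largest daily release in each calendar year."""
    if len(dates) != len(release):
        raise ValueError("dates and release must have the same length")
    annual: Dict[int, float] = {}
    for d, q in zip(dates, release):
        annual[d.year] = max(q, annual.get(d.year, q))
    return annual


# Hypothetical daily supply vs. demand and daily releases.
print(supply_reliability([10.0, 8.0, 10.0, 5.0], [10.0, 10.0, 10.0, 5.0]))  # 0.75
print(annual_max_daily_release(
    [date(2021, 1, 1), date(2021, 6, 1), date(2022, 7, 1)], [100.0, 350.0, 200.0]
))  # {2021: 350.0, 2022: 200.0}
```

A return-period quantity such as the 20-year annual maximum release would additionally require fitting a frequency distribution to these annual maxima. That step is not shown here.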

An Efficient Pedestrian Detection Approach Using a Novel Split Function of Hough Forests

  • Do, Trung Dung;Vu, Thi Ly;Nguyen, Van Huan;Kim, Hakil;Lee, Chongho
    • Journal of Computing Science and Engineering / Vol. 8, No. 4 / pp.207-214 / 2014
  • In pedestrian detection, one of the most popular frameworks, which has received extensive attention in recent years, is the Hough forest (HF). To improve detection accuracy, this paper proposes a novel split function that exploits the statistical information of the training set stored in each node during forest construction. The proposed split function makes the trees in the forest more robust to noise and illumination changes. Moreover, the errors of each stage of forest training are minimized with a global loss function, helping the trees concentrate on harder training samples. Once the forest is trained, the standard HF detector is used to search for and localize instances in the image. Experimental results showed that the proposed framework significantly improved detection performance over the standard HF and the alternating decision forest (ADF) on several public datasets.
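
The abstract does not give the form of the proposed split function, so it cannot be reproduced here. For orientation only, the sketch below shows the standard Hough-forest binary test that such work builds on: compare two pixel values in one feature channel against a threshold tau. It then picks tau by minimizing size-weighted class entropy. Standard Hough forests also use an offset-regression criterion at some nodes, which is omitted. The patch layout, helper names, and toy data are assumptions.

```python
import math
from typing import Hashable, Optional, Sequence, Tuple

# Assumed patch layout: patch[channel][row][col] (e.g. intensity, gradient channels).
Patch = Sequence[Sequence[Sequence[float]]]
Pixel = Tuple[int, int]


def binary_test(patch: Patch, channel: int, p1: Pixel, p2: Pixel, tau: float) -> bool:
    """Standard Hough-forest pixel test: True (go left) if I_c(p1) < I_c(p2) + tau."""
    (r1, c1), (r2, c2) = p1, p2
    return patch[channel][r1][c1] < patch[channel][r2][c2] + tau


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of the class labels reaching a node."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return sum((k / n) * math.log2(n / k) for k in counts.values())


def split_cost(left: Sequence[Hashable], right: Sequence[Hashable]) -> float:
    """Size-weighted entropy of the two children; lower means a purer split."""
    n = len(left) + len(right)
    return (len(left) * entropy(left) + len(right) * entropy(right)) / n


def best_threshold(
    patches: Sequence[Patch],
    labels: Sequence[Hashable],
    channel: int,
    p1: Pixel,
    p2: Pixel,
    taus: Sequence[float],
) -> Optional[Tuple[float, float]]:
    """Try each candidate tau for a fixed pixel pair and keep the lowest-cost split."""
    best = None
    for tau in taus:
        goes_left = [binary_test(x, channel, p1, p2, tau) for x in patches]
        left = [y for y, g in zip(labels, goes_left) if g]
        right = [y for y, g in zip(labels, goes_left) if not g]
        cost = split_cost(left, right)
        if best is None or cost < best[1]:
            best = (tau, cost)
    return best


# Toy one-channel 1x2 patches: 1 = pedestrian, 0 = background (hypothetical).
toy_patches = [[[[0.0, 1.0]]], [[[0.2, 1.0]]], [[[1.0, 0.0]]], [[[0.9, 0.1]]]]
toy_labels = [1, 1, 0, 0]
print(best_threshold(toy_patches, toy_labels, 0, (0, 0), (0, 1), [2.0, 0.0]))
```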

Comparison of Frequencies in Order to Estimate of Tree Species Diversity in Caspian Forests of Iran

  • Mirzaei, Mehrdad;Bahnemiry, Atefeh Karimiyan;Abkenar, Kambiz Taheri
    • Journal of Forest and Environmental Science / Vol. 35, No. 1 / pp.1-5 / 2019
  • Species diversity is one of the most important indices used to evaluate the sustainability of forest communities. In the present study, three variables, namely the number of individuals (species frequency), basal area, and volume of tree species, were compared for estimating tree species diversity in the broadleaf forests of Iran. Using a systematic random design, 30 circular plots of $1000\,\mathrm{m}^2$ were selected. Species type, number of species, diameter at breast height (DBH), and tree height were measured. The Simpson (1-D), Hill ($N_2$), Shannon-Wiener (H'), MacArthur ($N_1$), Smith-Wilson ($E_{var}$), and Margalef ($R_1$) indices were used to estimate tree species diversity, which was calculated for each plot. ANOVA showed a significant difference among the three variables used to estimate species diversity. The number-of-trees variable was more precise than the basal area and volume variables for estimating species diversity. However, Duncan's test revealed significant differences between the basal area and volume variables and the number of trees. Therefore, the basal area and volume variables were selected as the more suitable variables for estimating biodiversity indices in the northern forests of Iran.
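
The indices named above have closed forms that can be computed directly from per-species abundances in a plot. This is a minimal sketch under several assumptions:

- Simpson's D is taken as the sum of squared proportions (some studies use the finite-sample form sum n(n-1) / N(N-1) instead).
- H' uses the natural logarithm.
- Margalef's N is the total abundance.

The function returns Simpson (1-D), Hill N2 = 1/D, Shannon-Wiener H', MacArthur N1 = exp(H'), Smith-Wilson E_var, and Margalef R1. Passing counts, basal areas, or volumes as the abundance vector mirrors the study's comparison of the three variables. The example numbers are hypothetical.

```python
import math
from typing import Dict, Sequence


def diversity_indices(abundances: Sequence[float]) -> Dict[str, float]:
    """Diversity indices for one plot from per-species abundances.

    `abundances` may be individual counts, basal area, or volume per species.
    Assumptions: D = sum p_i^2, natural log for H', N = total abundance.
    """
    x = [a for a in abundances if a > 0]  # species absent from the plot are ignored
    if not x:
        raise ValueError("at least one species must have positive abundance")
    total = sum(x)
    s = len(x)  # species richness S
    p = [a / total for a in x]

    d = sum(pi * pi for pi in p)                   # Simpson's D
    h = sum(pi * math.log(1.0 / pi) for pi in p)   # Shannon-Wiener H'

    # Smith-Wilson E_var: 1 - (2/pi) * arctan(variance of ln abundances).
    ln_x = [math.log(a) for a in x]
    mean_ln = sum(ln_x) / s
    var_ln = sum((v - mean_ln) ** 2 for v in ln_x) / s
    e_var = 1.0 - (2.0 / math.pi) * math.atan(var_ln)

    # Margalef R1 = (S - 1) / ln N; undefined when N <= 1 (possible for basal area).
    margalef = (s - 1) / math.log(total) if total > 1 else float("nan")

    return {
        "simpson_1_minus_D": 1.0 - d,
        "hill_N2": 1.0 / d,
        "shannon_H": h,
        "macarthur_N1": math.exp(h),
        "smith_wilson_E_var": e_var,
        "margalef_R1": margalef,
    }


# Hypothetical plot with four equally abundant species.
print(diversity_indices([10, 10, 10, 10]))
```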

Coreference Resolution for Korean Using Random Forests

  • 정석원;최맹식;김학수
    • KIPS Transactions on Software and Data Engineering / Vol. 5, No. 11 / pp.535-540 / 2016
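  • Coreference resolution identifies the mentions in a document and clusters the mentions that refer to one another. It is an essential step for natural language processing applications such as information extraction, event tracking, and question answering. Recently, various machine-learning-based coreference resolution models have been proposed, and, as is well known, such models require large amounts of training data manually annotated with coreference mention tags. However, no publicly available data exist for training such models in Korean. This paper therefore proposes an efficient coreference resolution model that requires less training data than other machine learning models. The proposed model uses random forests with sieve-guided features to identify coreferent mentions. In experiments on baseball news articles, the proposed model achieved a CoNLL F1 score of 0.6678, higher than that of other machine learning models.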
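
By the convention of the CoNLL-2011/2012 shared tasks, the CoNLL F1 reported above is the unweighted average of the MUC, B³, and CEAF_e F1 scores. As a minimal sketch, the snippet implements B³ (per-mention precision and recall, averaged over mentions) and the final averaging. MUC and CEAF_e are taken as given inputs, and the clusters are hypothetical. The code evaluates coreference output; it does not reproduce the paper's sieve-guided random forest model.

```python
from typing import Hashable, Sequence, Set, Tuple

Cluster = Set[Hashable]


def _f1(p: float, r: float) -> float:
    return 2 * p * r / (p + r) if p + r else 0.0


def b_cubed(key: Sequence[Cluster], response: Sequence[Cluster]) -> Tuple[float, float, float]:
    """B-cubed precision, recall, and F1 between gold (key) and predicted (response) clusters."""
    key_total = sum(len(k) for k in key)
    resp_total = sum(len(r) for r in response)
    if key_total == 0 or resp_total == 0:
        raise ValueError("key and response must each contain at least one mention")
    # overlap[i][j] = |key_i & response_j|
    overlap = [[len(k & r) for r in response] for k in key]
    # Recall: each key mention m scores |K(m) & R(m)| / |K(m)|, summed per key cluster.
    recall = sum(
        sum(o * o for o in row) / len(k) for k, row in zip(key, overlap) if k
    ) / key_total
    # Precision: each response mention m scores |K(m) & R(m)| / |R(m)|.
    precision = sum(
        sum(overlap[i][j] ** 2 for i in range(len(key))) / len(r)
        for j, r in enumerate(response)
        if r
    ) / resp_total
    return precision, recall, _f1(precision, recall)


def conll_f1(muc_f1: float, b3_f1: float, ceafe_f1: float) -> float:
    """CoNLL score: unweighted mean of the MUC, B-cubed, and CEAF_e F1 scores."""
    return (muc_f1 + b3_f1 + ceafe_f1) / 3.0


# Hypothetical gold and predicted mention clusters.
gold = [{"a", "b", "c"}, {"d", "e"}]
pred = [{"a", "b"}, {"c", "d", "e"}]
print(b_cubed(gold, pred))
```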

Classification of Abnormal Temperatures Based on the Meteorological Environment Using Random Forests

  • 김윤수;송광윤;장인홍
    • Journal of Integrative Natural Science / Vol. 17, No. 1 / pp.1-12 / 2024
  • Many abnormal climate events are occurring around the world, and their causes are related to temperature. Factors that affect temperature include excessive emissions of carbon and other greenhouse gases at the global scale and air circulation at the local scale. Because of air circulation, abnormal climate phenomena such as abnormally high and abnormally low temperatures occur in certain areas and can cause serious harm to people. Therefore, abnormal temperature should not be treated merely as an instance of climate change but studied as a new category of climate crisis. In this study, we propose a random forest model for classifying abnormal temperatures. The model uses various meteorological data for each region of Korea from 2018 to 2022, such as longitudinal observations, yellow dust, and ultraviolet radiation. Because the meteorological data were imbalanced, the imbalance was addressed by oversampling. We found that the variables affecting abnormal temperature differ by region. In particular, owing to their regional characteristics, the central and southern regions are influenced by high-pressure systems (the mainland China, Siberian, and North Pacific highs), so pressure-related variables had a significant impact on the classification of abnormal temperature. This suggests that a region-specific approach can be used to predict abnormal temperatures from the surrounding meteorological environment. In addition, when an abnormal temperature event occurs, preventive measures may be taken in advance according to regional characteristics.
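
The abstract says only that the imbalance was handled by oversampling, without naming a variant. As a minimal sketch, assuming plain random oversampling, the function below balances class counts before a random forest is trained by duplicating minority-class rows with replacement. SMOTE, which synthesizes new rows instead, is a common alternative. Oversampling should be applied to the training split only, so that duplicated rows do not leak into the evaluation data. Feature values and labels are hypothetical.

```python
import random
from collections import Counter
from typing import Hashable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_oversample(
    X: Sequence[T], y: Sequence[Hashable], seed: int = 0
) -> Tuple[List[T], List[Hashable]]:
    """Duplicate randomly chosen rows of smaller classes until every class matches the largest."""
    if len(X) != len(y) or not y:
        raise ValueError("X and y must be non-empty and of the same length")
    rng = random.Random(seed)  # fixed seed for reproducibility
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        members = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(members)  # sample with replacement from this class
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


# Hypothetical training rows (one feature each) with a rare "abnormal" class.
X_train = [[20.1], [21.3], [19.8], [22.0], [35.4]]
y_train = ["normal", "normal", "normal", "normal", "abnormal"]
X_bal, y_bal = random_oversample(X_train, y_train)
print(Counter(y_bal))
```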

A simple diagnostic statistic for determining the size of random forest

  • 박철용
    • Journal of the Korean Data and Information Science Society / Vol. 27, No. 4 / pp.855-863 / 2016
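  • This study proposes a simple diagnostic statistic for determining the size of a random forest (RF). The method is based on the margin of victory (MV). Take the classes ranked first and second by the decision trees generated so far; the MV is the difference between their vote shares among infinitely many generated trees. A negative MV therefore means that the current RF and the infinite RF disagree. The proposed method is based on the proportion of observations whose -MV exceeds a fixed small positive number (e.g., 0.03). We derive an appropriate statistic from this approach along with its theoretical distribution. We also conduct a simulation study comparing its performance with that of a recently proposed diagnostic statistic.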
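
The MV quantity can be made concrete for a single observation. Take the two classes ranked first and second by the votes of the trees grown so far, then subtract their vote shares in the infinite forest. The infinite forest is not observable, so this sketch assumes its class vote shares are supplied from outside (for example, approximated by a much larger forest). It is a brute-force illustration of the quantity being diagnosed, not the paper's estimator or its derived distribution. The threshold delta = 0.03 follows the example in the abstract; the votes and shares are hypothetical.

```python
from typing import Sequence


def margin_of_victory(current_votes: Sequence[int], infinite_shares: Sequence[float]) -> float:
    """MV for one observation.

    current_votes[k]: votes for class k from the trees grown so far.
    infinite_shares[k]: vote share of class k in the infinite forest
    (assumed to be supplied externally, e.g. approximated by a much larger forest).
    """
    if len(current_votes) < 2 or len(current_votes) != len(infinite_shares):
        raise ValueError("need at least two classes and matching lengths")
    # Rank classes by current votes; ties keep the lower class index first (stable sort).
    ranked = sorted(range(len(current_votes)), key=lambda k: current_votes[k], reverse=True)
    first, second = ranked[0], ranked[1]
    return infinite_shares[first] - infinite_shares[second]


def mv_diagnostic(
    votes_per_obs: Sequence[Sequence[int]],
    shares_per_obs: Sequence[Sequence[float]],
    delta: float = 0.03,
) -> float:
    """Proportion of observations whose -MV exceeds the small positive threshold delta."""
    if not votes_per_obs or len(votes_per_obs) != len(shares_per_obs):
        raise ValueError("inputs must be non-empty and of equal length")
    mvs = [margin_of_victory(v, s) for v, s in zip(votes_per_obs, shares_per_obs)]
    return sum(1 for mv in mvs if -mv > delta) / len(mvs)


# Hypothetical two-class votes from the current forest and infinite-forest shares.
votes = [[6, 4], [6, 4], [3, 7], [2, 8]]
shares = [[0.70, 0.30], [0.45, 0.55], [0.49, 0.51], [0.52, 0.48]]
print(mv_diagnostic(votes, shares))  # 0.5
```

A value near zero suggests the current forest already agrees with the infinite forest on nearly every observation. A larger value suggests growing more trees.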