• Title/Summary/Keyword: random forest (RF)

Search Result 185, Processing Time 0.024 seconds

Estimation of Water Quality Index for Coastal Areas in Korea Using GOCI Satellite Data Based on Machine Learning Approaches (GOCI 위성영상과 기계학습을 이용한 한반도 연안 수질평가지수 추정)

  • Jang, Eunna;Im, Jungho;Ha, Sunghyun;Lee, Sanggyun;Park, Young-Gyu
    • Korean Journal of Remote Sensing
    • /
    • v.32 no.3
    • /
    • pp.221-234
    • /
    • 2016
  • In Korea, most industrial parks and major cities are located in coastal areas, which results in serious environmental problems in both coastal land and ocean. In order to effectively manage such problems especially in coastal ocean, water quality should be monitored. As there are many factors that influence water quality, the Korean Government proposed an integrated Water Quality Index (WQI) based on in situmeasurements of ocean parameters(bottom dissolved oxygen, chlorophyll-a concentration, secchi disk depth, dissolved inorganic nitrogen, and dissolved inorganic phosphorus) by ocean division identified based on their ecological characteristics. Field-measured WQI, however, does not provide spatial continuity over vast areas. Satellite remote sensing can be an alternative for identifying WQI for surface water. In this study, two schemes were examined to estimate coastal WQI around Korea peninsula using in situ measurements data and Geostationary Ocean Color Imager (GOCI) satellite imagery from 2011 to 2013 based on machine learning approaches. Scheme 1 calculates WQI using estimated water quality-related factors using GOCI reflectance data, and scheme 2 estimates WQI using GOCI band reflectance data and basic products(chlorophyll-a, suspended sediment, colored dissolved organic matter). Three machine learning approaches including Random Forest (RF), Support Vector Regression (SVR), and a modified regression tree(Cubist) were used. Results show that estimation of secchi disk depth produced the highest accuracy among the ocean parameters, and RF performed best regardless of water quality-related factors. However, the accuracy of WQI from scheme 1 was lower than that from scheme 2 due to the estimation errors inherent from water quality-related factors and the uncertainty of bottom dissolved oxygen. In overall, scheme 2 appears more appropriate for estimating WQI for surface water in coastal areas and chlorophyll-a concentration was identified the most contributing factor to the estimation of WQI.

Estimation of Fractional Urban Tree Canopy Cover through Machine Learning Using Optical Satellite Images (기계학습을 이용한 광학 위성 영상 기반의 도시 내 수목 피복률 추정)

  • Sejeong Bae ;Bokyung Son ;Taejun Sung ;Yeonsu Lee ;Jungho Im ;Yoojin Kang
    • Korean Journal of Remote Sensing
    • /
    • v.39 no.5_3
    • /
    • pp.1009-1029
    • /
    • 2023
  • Urban trees play a vital role in urban ecosystems,significantly reducing impervious surfaces and impacting carbon cycling within the city. Although previous research has demonstrated the efficacy of employing artificial intelligence in conjunction with airborne light detection and ranging (LiDAR) data to generate urban tree information, the availability and cost constraints associated with LiDAR data pose limitations. Consequently, this study employed freely accessible, high-resolution multispectral satellite imagery (i.e., Sentinel-2 data) to estimate fractional tree canopy cover (FTC) within the urban confines of Suwon, South Korea, employing machine learning techniques. This study leveraged a median composite image derived from a time series of Sentinel-2 images. In order to account for the diverse land cover found in urban areas, the model incorporated three types of input variables: average (mean) and standard deviation (std) values within a 30-meter grid from 10 m resolution of optical indices from Sentinel-2, and fractional coverage for distinct land cover classes within 30 m grids from the existing level 3 land cover map. Four schemes with different combinations of input variables were compared. Notably, when all three factors (i.e., mean, std, and fractional cover) were used to consider the variation of landcover in urban areas(Scheme 4, S4), the machine learning model exhibited improved performance compared to using only the mean of optical indices (Scheme 1). Of the various models proposed, the random forest (RF) model with S4 demonstrated the most remarkable performance, achieving R2 of 0.8196, and mean absolute error (MAE) of 0.0749, and a root mean squared error (RMSE) of 0.1022. The std variable exhibited the highest impact on model outputs within the heterogeneous land covers based on the variable importance analysis. This trained RF model with S4 was then applied to the entire Suwon region, consistently delivering robust results with an R2 of 0.8702, MAE of 0.0873, and RMSE of 0.1335. The FTC estimation method developed in this study is expected to offer advantages for application in various regions, providing fundamental data for a better understanding of carbon dynamics in urban ecosystems in the future.

Prediction of Non-Genotoxic Carcinogenicity Based on Genetic Profiles of Short Term Exposure Assays

  • Perez, Luis Orlando;Gonzalez-Jose, Rolando;Garcia, Pilar Peral
    • Toxicological Research
    • /
    • v.32 no.4
    • /
    • pp.289-300
    • /
    • 2016
  • Non-genotoxic carcinogens are substances that induce tumorigenesis by non-mutagenic mechanisms and long term rodent bioassays are required to identify them. Recent studies have shown that transcription profiling can be applied to develop early identifiers for long term phenotypes. In this study, we used rat liver expression profiles from the NTP (National Toxicology Program, Research Triangle Park, USA) DrugMatrix Database to construct a gene classifier that can distinguish between non-genotoxic carcinogens and other chemicals. The model was based on short term exposure assays (3 days) and the training was limited to oxidative stressors, peroxisome proliferators and hormone modulators. Validation of the predictor was performed on independent toxicogenomic data (TG-GATEs, Toxicogenomics Project-Genomics Assisted Toxicity Evaluation System, Osaka, Japan). To build our model we performed Random Forests together with a recursive elimination algorithm (VarSelRF). Gene set enrichment analysis was employed for functional interpretation. A total of 770 microarrays comprising 96 different compounds were analyzed and a predictor of 54 genes was built. Prediction accuracy was 0.85 in the training set, 0.87 in the test set and increased with increasing concentration in the validation set: 0.6 at low dose, 0.7 at medium doses and 0.81 at high doses. Pathway analysis revealed gene prominence of cellular respiration, energy production and lipoprotein metabolism. The biggest target of toxicogenomics is accurately predict the toxicity of unknown drugs. In this analysis, we presented a classifier that can predict non-genotoxic carcinogenicity by using short term exposure assays. In this approach, dose level is critical when evaluating chemicals at early time points.

A Smart Farm Environment Optimization and Yield Prediction Platform based on IoT and Deep Learning (IoT 및 딥 러닝 기반 스마트 팜 환경 최적화 및 수확량 예측 플랫폼)

  • Choi, Hokil;Ahn, Heuihak;Jeong, Yina;Lee, Byungkwan
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.12 no.6
    • /
    • pp.672-680
    • /
    • 2019
  • This paper proposes "A Smart Farm Environment Optimization and Yield Prediction Platform based on IoT and Deep Learning" which gathers bio-sensor data from farms, diagnoses the diseases of growing crops, and predicts the year's harvest. The platform collects all the information currently available such as weather and soil microbes, optimizes the farm environment so that the crops can grow well, diagnoses the crop's diseases by using the leaves of the crops being grown on the farm, and predicts this year's harvest by using all the information on the farm. The result shows that the average accuracy of the AEOM is about 15% higher than that of the RF and about 8% higher than the GBD. Although data increases, the accuracy is reduced less than that of the RF or GBD. The linear regression shows that the slope of accuracy is -3.641E-4 for the ReLU, -4.0710E-4 for the Sigmoid, and -7.4534E-4 for the step function. Therefore, as the amount of test data increases, the ReLU is more accurate than the other two activation functions. This paper is a platform for managing the entire farm and, if introduced to actual farms, will greatly contribute to the development of smart farms in Korea.

Modeling of Vegetation Phenology Using MODIS and ASOS Data (MODIS와 ASOS 자료를 이용한 식물계절 모델링)

  • Kim, Geunah;Youn, Youjeong;Kang, Jonggu;Choi, Soyeon;Park, Ganghyun;Chun, Junghwa;Jang, Keunchang;Won, Myoungsoo;Lee, Yangwon
    • Korean Journal of Remote Sensing
    • /
    • v.38 no.5_1
    • /
    • pp.627-646
    • /
    • 2022
  • Recently, the seriousness of climate change-related problems caused by global warming is growing, and the average temperature is also rising. As a result, it is affecting the environment in which various temperature-sensitive creatures and creatures live, and changes in the ecosystem are also being detected. Seasons are one of the important factors influencing the types, distribution, and growth characteristics of creatures living in the area. Among the most popular and easily recognized plant seasonal phenomena among the indicators of the climate change impact evaluation, the blooming day of flower and the peak day of autumn leaves were modeled. The types of plants used in the modeling were forsythia and cherry trees, which can be seen as representative plants of spring, and maple and ginkgo, which can be seen as representative plants of autumn. Weather data used to perform modeling were temperature, precipitation, and solar radiation observed through the ASOS Observatory of the Korea Meteorological Administration. As satellite data, MODIS NDVI was used for modeling, and it has a correlation coefficient of about -0.2 for the flowering date and 0.3 for the autumn leaves peak date. As the model used, the model was established using multiple regression models, which are linear models, and Random Forest, which are nonlinear models. In addition, the predicted values estimated by each model were expressed as isopleth maps using spatial interpolation techniques to express the trend of plant seasonal changes from 2003 to 2020. It is believed that using NDVI with high spatio-temporal resolution in the future will increase the accuracy of plant phenology modeling.

Factors analysis of the cyanobacterial dominance in the four weirs installed in of Nakdong River (낙동강의 중·하류 4개보에서 남조류 우점 환경 요인 분석)

  • Kim, Sung jin;Chung, Se woong;Park, Hyung seok;Cho, Young cheol;Lee, Hee suk
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2019.05a
    • /
    • pp.413-413
    • /
    • 2019
  • 하천과 호수에서 남조류의 이상 과잉증식 문제(이하 녹조문제)는 담수생태계의 생물다양성을 감소시키며, 음용수의 이취미 원인물질을 발생시켜 물 이용에 장해가 된다. 또한 독소를 생산하는 유해남조류가 대량 증식할 경우에는 가축이나 인간의 건강에 치명적 해를 끼치기도 한다. 그 동안 국내에서 녹조문제는 댐 저수지와 하구호와 같은 정체수역에서 간헐적으로 문제를 일으켰으나, 4대강사업(2010-2011)으로 16개의 보가 설치된 이후 낙동강, 금강, 영산강 등 대하천에서도 광범위하게 발생되고 있어 중요한 사회적 환경적 이슈로 대두되었다. 한편, 대하천에 설치된 보 구간에서 빈번히 발생하는 녹조현상의 원인에 대해서는 전 지구적 기온상승에 따른 기후변화의 영향이라는 주장과 유역으로부터 영양염류의 과도한 유입, 가뭄에 따른 유량감소, 보 설치에 따른 체류시간 증가 등 다양한 의견이 제시되고 있으나, 대상 유역과 수체의 특성에 따라 녹조 발생의 원인이 상이하거나 또는 다양한 요인이 복합적으로 작용하기 때문에 보편적 해석(universal interpretation)이 어려운 것이 현실이다. 따라서 각 수계별, 보별 녹조현상에 대한 정확한 원인분석과 효과적인 대책 마련을 위해서는 집중된 실험자료와 데이터마이닝 기법에 근거로 한 보다 과학적이고 객관적인 접근이 이루어져야 한다. 본 연구에서는 2012년 보 설치 이후 남조류에 의한 녹조현상이 빈번히 발생하고 있는 낙동강 4개보(강정고령보, 달성보, 합천창녕보, 창녕함안보)를 대상으로 집중적인 현장조사와 실험분석을 수행하고, 수집된 기상, 수문, 수질, 조류 자료에 대해 통계분석과 다양한 데이터모델링 기법을 적용하여 보별 남조류 우점 환경조건과 이를 제어하기 위한 주요 조절변수를 규명하는데 있다. 연구대상 보 별 수질과 식물플랑크톤의 정성 및 정량 실험은 2017년 5월부터 2018년 11월까지 2년에 걸쳐 실시하였으며, 남조류 세포수 밀도와 환경요인과의 상관성 분석을 실시하고, 단계적 다중회귀모델(Step-wise Multiple Linear Regressions, SMLR), 랜덤포레스트(Random Forests, RF) 모델과 재귀적 변수 제거 기법(Recursive Feature Elimination using Random Forest, RFE-RF)을 이용한 변수중요도 평가, 의사결정나무(Decision Tree, DT), 주성분분석(Principal Component Analysis, PCA) 기법 등 다양한 모수적 및 비모수적 데이터마이닝 결과를 바탕으로 각 보별 남 조류 우점 환경요인을 종합적으로 해석하였다.

  • PDF

Experimental Comparison of Network Intrusion Detection Models Solving Imbalanced Data Problem (데이터의 불균형성을 제거한 네트워크 침입 탐지 모델 비교 분석)

  • Lee, Jong-Hwa;Bang, Jiwon;Kim, Jong-Wouk;Choi, Mi-Jung
    • KNOM Review
    • /
    • v.23 no.2
    • /
    • pp.18-28
    • /
    • 2020
  • With the development of the virtual community, the benefits that IT technology provides to people in fields such as healthcare, industry, communication, and culture are increasing, and the quality of life is also improving. Accordingly, there are various malicious attacks targeting the developed network environment. Firewalls and intrusion detection systems exist to detect these attacks in advance, but there is a limit to detecting malicious attacks that are evolving day by day. In order to solve this problem, intrusion detection research using machine learning is being actively conducted, but false positives and false negatives are occurring due to imbalance of the learning dataset. In this paper, a Random Oversampling method is used to solve the unbalance problem of the UNSW-NB15 dataset used for network intrusion detection. And through experiments, we compared and analyzed the accuracy, precision, recall, F1-score, training and prediction time, and hardware resource consumption of the models. Based on this study using the Random Oversampling method, we develop a more efficient network intrusion detection model study using other methods and high-performance models that can solve the unbalanced data problem.

Applicability of Image Classification Using Deep Learning in Small Area : Case of Agricultural Lands Using UAV Image (딥러닝을 이용한 소규모 지역의 영상분류 적용성 분석 : UAV 영상을 이용한 농경지를 대상으로)

  • Choi, Seok-Keun;Lee, Soung-Ki;Kang, Yeon-Bin;Seong, Seon-Kyeong;Choi, Do-Yeon;Kim, Gwang-Ho
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography
    • /
    • v.38 no.1
    • /
    • pp.23-33
    • /
    • 2020
  • Recently, high-resolution images can be easily acquired using UAV (Unmanned Aerial Vehicle), so that it is possible to produce small area observation and spatial information at low cost. In particular, research on the generation of cover maps in crop production areas is being actively conducted for monitoring the agricultural environment. As a result of comparing classification performance by applying RF(Random Forest), SVM(Support Vector Machine) and CNN(Convolutional Neural Network), deep learning classification method has many advantages in image classification. In particular, land cover classification using satellite images has the advantage of accuracy and time of classification using satellite image data set and pre-trained parameters. However, UAV images have different characteristics such as satellite images and spatial resolution, which makes it difficult to apply them. In order to solve this problem, we conducted a study on the application of deep learning algorithms that can be used for analyzing agricultural lands where UAV data sets and small-scale composite cover exist in Korea. In this study, we applied DeepLab V3 +, FC-DenseNet (Fully Convolutional DenseNets) and FRRN-B (Full-Resolution Residual Networks), the semantic image classification of the state-of-art algorithm, to UAV data set. As a result, DeepLab V3 + and FC-DenseNet have an overall accuracy of 97% and a Kappa coefficient of 0.92, which is higher than the conventional classification. The applicability of the cover classification using UAV images of small areas is shown.

Prediction of cyanobacteria harmful algal blooms in reservoir using machine learning and deep learning (머신러닝과 딥러닝을 이용한 저수지 유해 남조류 발생 예측)

  • Kim, Sang-Hoon;Park, Jun Hyung;Kim, Byunghyun
    • Journal of Korea Water Resources Association
    • /
    • v.54 no.spc1
    • /
    • pp.1167-1181
    • /
    • 2021
  • In relation to the algae bloom, four types of blue-green algae that emit toxic substances are designated and managed as harmful Cyanobacteria, and prediction information using a physical model is being also published. However, as algae are living organisms, it is difficult to predict according to physical dynamics, and not easy to consider the effects of numerous factors such as weather, hydraulic, hydrology, and water quality. Therefore, a lot of researches on algal bloom prediction using machine learning have been recently conducted. In this study, the characteristic importance of water quality factors affecting the occurrence of Cyanobacteria harmful algal blooms (CyanoHABs) were analyzed using the random forest (RF) model for Bohyeonsan Dam and Yeongcheon Dam located in Yeongcheon-si, Gyeongsangbuk-do and also predicted the occurrence of harmful blue-green algae using the machine learning and deep learning models and evaluated their accuracy. The water temperature and total nitrogen (T-N) were found to be high in common, and the occurrence prediction of CyanoHABs using artificial neural network (ANN) also predicted the actual values closely, confirming that it can be used for the reservoirs that require the prediction of harmful cyanobacteria for algal management in the future.

Development of machine learning prediction model for weight loss rate of chestnut (Castanea crenata) according to knife peeling process (밤의 칼날식 박피공정에 따른 머신 러닝 기반 중량감모율 예측 모델 개발)

  • Tae Hyong Kim;Ah-Na Kim;Ki Hyun Kwon
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.17 no.4
    • /
    • pp.236-244
    • /
    • 2024
  • A representative problem in domestic chestnut industry is the high loss of flesh due to excessive knife peeling in order to increase the peeling rate, resulting in a decrease in production efficiency. In this study, a prediction model for weight loss rate of chestnut by stage of knife peeling process was developed as undergarment study to optimize conditions of the machine. 51 control conditions of the two-stage blade peeler used in the experiment were derived and repeated three times to obtain a total of 153 data. Machine learning(ML) models including artificial neural network (ANN) and random forest (RF) were implemented to predict the weight loss rate by chestnut peel stage (after 1st peeling, 2nd peeling, and after final discharge). The performance of the models were evaluated by calculating the values of coefficient of determination (R), normalized root mean square error (nRMSE), and mean absolute error (MAE). After all peeling stages, RF model have better prediction accuracy with higher R values and low prediction error with lower nRMSE and MAE values, compared to ANN model. The final selected RF prediction model showed excellent performance with insignificant error between the experimental and predicted values. As a result, the proposed model can be useful to set optimum condition of knife peeling for the purpose of minimizing the weight loss of domestic chestnut flesh with maximizing peeling rate.