• Title/Summary/Keyword: random forest (RF)

Estimation of Chlorophyll-a Concentration in Nakdong River Using Machine Learning-Based Satellite Data and Water Quality, Hydrological, and Meteorological Factors (머신러닝 기반 위성영상과 수질·수문·기상 인자를 활용한 낙동강의 Chlorophyll-a 농도 추정)

  • Soryeon Park;Sanghun Son;Jaegu Bae;Doi Lee;Dongju Seo;Jinsoo Kim
    • Korean Journal of Remote Sensing
    • /
    • v.39 no.5_1
    • /
    • pp.655-667
    • /
    • 2023
  • Algal bloom outbreaks are frequently reported around the world, and serious water pollution problems arise every year in Korea. It is necessary to protect the aquatic ecosystem through continuous management and rapid response. Many studies using satellite images are being conducted to estimate the concentration of chlorophyll-a (Chl-a), an indicator of algal bloom occurrence. However, machine learning models have recently been used because it is difficult to accurately calculate Chl-a due to the spectral characteristics and atmospheric correction errors that change depending on the water system. It is necessary to consider the factors affecting algal bloom as well as the satellite spectral index. Therefore, this study constructed a dataset by combining water quality, hydrological, and meteorological factors with Sentinel-2 images. Two representative ensemble models, random forest and extreme gradient boosting (XGBoost), were used to predict the concentration of Chl-a in eight weirs located on the Nakdong River over the past five years. R-squared (R2), root mean square error (RMSE), and mean absolute error (MAE) were used as model evaluation indicators, and XGBoost achieved an R2 of 0.80, an RMSE of 6.612, and an MAE of 4.457. Shapley additive explanations (SHAP) analysis showed that water quality factors, suspended solids, biochemical oxygen demand, dissolved oxygen, and the band ratio using red edge bands were of high importance in both models. Various input data were confirmed to help improve model performance, and the approach appears applicable to domestic and international algal bloom detection.
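
The three evaluation indicators named in this abstract (R2, RMSE, MAE) can be computed as in the following minimal sketch. It assumes R2 means the coefficient of determination, which the abstract does not state explicitly. The function name `regression_metrics` is hypothetical and is not taken from the paper.

```python
import math


def regression_metrics(observed, predicted):
    """Return (R2, RMSE, MAE) for paired observed and predicted values.

    Assumption: R2 is the coefficient of determination, 1 - SS_res / SS_tot.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    mean_obs = sum(observed) / n
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observations are equal")
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    return r2, rmse, mae
```

RMSE and MAE are in the units of the predicted quantity, while R2 is dimensionless, so the three values are not directly comparable with one another.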

Projecting the Potential Distribution of Abies koreana in Korea Under the Climate Change Based on RCP Scenarios (RCP 기후변화 시나리오에 따른 우리나라 구상나무 잠재 분포 변화 예측)

  • Koo, Kyung Ah;Kim, Jaeuk;Kong, Woo-seok;Jung, Huicheul;Kim, Geunhan
    • Journal of the Korean Society of Environmental Restoration Technology
    • /
    • v.19 no.6
    • /
    • pp.19-30
    • /
    • 2016
  • The projection of climate-related range shift is critical information for conservation planning of Korean fir (Abies koreana E. H. Wilson). We first modeled the distribution of Korean fir under current climate conditions using five single-model species distribution models (SDMs) and the pre-evaluation weighted ensemble method and then predicted the distributions under future climate conditions projected with HadGEM2-AO under four $CO_2$ emission scenarios, the Representative Concentration Pathways (RCP) 2.6, 4.5, 6.0 and 8.5. We also investigated the predictive uncertainty stemming from the five individual algorithms and four $CO_2$ emission scenarios for better interpretation of SDM projections. The five individual algorithms were the generalized linear model (GLM), generalized additive model (GAM), multivariate adaptive regression splines (MARS), generalized boosted model (GBM) and random forest (RF). The results showed high variation in model performance among individual SDMs and a wide range of diverging predictions of future distributions of Korean fir in response to RCPs. The ensemble model presented the highest predictive accuracy (TSS = 0.97, AUC = 0.99) and predicted that the climate habitat suitability of Korean fir would increase under climate change. Accordingly, the fir distribution could expand under future climate conditions. Increasing precipitation may account for increases in the distribution of Korean fir, because it compensates for the negative effects of increasing temperature. However, the future distribution of Korean fir is also affected by other ecological processes, such as interactions with co-existing species, adaptation and dispersal limitation, and other environmental factors, such as extreme weather events and land-use changes. Therefore, further ecological research and the development of mechanistic and process-based distribution models are needed to improve predictive accuracy.

An Application of Support Vector Machines to Customer Loyalty Classification of Korean Retailing Company Using R Language

  • Nguyen, Phu-Thien;Lee, Young-Chan
    • The Journal of Information Systems
    • /
    • v.26 no.4
    • /
    • pp.17-37
    • /
    • 2017
  • Purpose: Customer loyalty is the most important factor in customer relationship management (CRM), especially in the retailing industry, where customers have many options for where to spend their money. Classifying loyal customers from customer data can help retailing companies build more efficient marketing strategies and gain competitive advantages. This study aims to construct classification models that distinguish the loyal customers of a Korean retailing company using data mining techniques in the R language. Design/methodology/approach: To classify retailing customers, we used a combination of support vector machines (SVMs) and other machine learning (ML) classification algorithms, supported by recursive feature elimination (RFE). In particular, we first cleaned the dataset to remove outliers and impute missing values. We then used an RFE framework to select the most significant predictors. Finally, we constructed models with the classification algorithms, tuned their parameters, and compared their performance. Findings: The results reveal that ML classification techniques can work well with CRM data in the Korean retailing industry. Moreover, customer loyalty is affected not only by a single factor such as the net promoter score but also by purchase habits such as a preference for expensive goods or multi-branch visiting. We also show that, on the retailing customer dataset, the model constructed with the SVM algorithm performed better than the others. We expect that the models in this study can be used by other retailing companies to classify their customers so that they can focus on serving this potential VIP group. We also hope that the results of these ML algorithms in the R language will be useful to other researchers in selecting appropriate ML algorithms.

Prediction of Daily PM10 Concentration for Air Korea Stations Using Artificial Intelligence with LDAPS Weather Data, MODIS AOD, and Chinese Air Quality Data

  • Jeong, Yemin;Youn, Youjeong;Cho, Subin;Kim, Seoyeon;Huh, Morang;Lee, Yangwon
    • Korean Journal of Remote Sensing
    • /
    • v.36 no.4
    • /
    • pp.573-586
    • /
    • 2020
  • PM (particulate matter) is of interest to everyone because it can have adverse effects on human health by infiltrating from the respiratory system into internal organs. To date, many studies have attempted to predict PM10 and PM2.5 concentrations. Unlike previous studies, we predicted tomorrow's PM10 concentration for the Air Korea stations using Chinese PM10 data in addition to the satellite AOD and weather variables. We constructed 230,639 matchups from over 3 million raw data records and built an RF (random forest) model from the matchups to cope with the complexity and nonlinearity. The validation statistics from the blind test showed excellent accuracy, with an RMSE (root mean square error) of 9.905 ㎍/㎥ and a CC (correlation coefficient) of 0.918. Moreover, our prediction model showed stable performance regardless of season or the level of PM10 concentration. However, some coastal areas had relatively low accuracy, which implies that a dedicated model for coastal areas will be necessary. Additional input variables such as wind direction, precipitation, and air stability should also be incorporated into the prediction model as future work.

Discriminant analysis of grain flours for rice paper using fluorescence hyperspectral imaging system and chemometric methods

  • Seo, Youngwook;Lee, Ahyeong;Kim, Bal-Geum;Lim, Jongguk
    • Korean Journal of Agricultural Science
    • /
    • v.47 no.3
    • /
    • pp.633-644
    • /
    • 2020
  • Rice paper is an element of Vietnamese cuisine that can be used to wrap vegetables and meat. Rice and starch are the main ingredients of rice paper and their mixing ratio is important for quality control. In a commercial factory, assessment of food safety and quantitative supply is a challenging issue. A rapid and non-destructive monitoring system is therefore necessary in commercial production systems to ensure the food safety of rice and starch flour for the rice paper wrap. In this study, fluorescence hyperspectral imaging technology was applied to classify grain flours. Using the 3D hyper cube of fluorescence hyperspectral imaging (fHSI, 420 - 730 nm), spectral and spatial data and chemometric methods were applied to detect and classify flours. Eight flours (rice: 4, starch: 4) were prepared and hyperspectral images were acquired in a 5 (L) × 5 (W) × 1.5 (H) cm container. Linear discriminant analysis (LDA), partial least square discriminant analysis (PLSDA), support vector machine (SVM), classification and regression tree (CART), and random forest (RF) with a few preprocessing methods (multivariate scatter correction [MSC], 1st and 2nd derivative and moving average) were applied to classify grain flours and the accuracy was compared using a confusion matrix (accuracy and kappa coefficient). LDA with moving average showed the highest accuracy at A = 0.9362 (K = 0.9270). 1D convolutional neural network (CNN) demonstrated a classification result of A = 0.94 and showed improved classification results between mimyeon flour (MF)1 and MF2 of 0.72 and 0.87, respectively. In this study, the potential of non-destructive detection and classification of grain flours using fHSI technology and machine learning methods was demonstrated.

Potential Impact of Climate Change on Distribution of Hedera rhombea in the Korean Peninsula (기후변화에 따른 송악의 잠재서식지 분포 변화 예측)

  • Park, Seon Uk;Koo, Kyung Ah;Seo, Changwan;Kong, Woo-Seok
    • Journal of Climate Change Research
    • /
    • v.7 no.3
    • /
    • pp.325-334
    • /
    • 2016
  • We projected the distribution of Hedera rhombea, an evergreen broad-leaved climbing plant, under current climate conditions and predicted its future distributions under global warming. In addition, we explained model uncertainty by employing nine single species distribution models (SDMs) to model the distribution of Hedera rhombea. The nine single SDMs were constructed with 736 presence/absence records and three temperature and three precipitation variables. The uncertainty of each SDM was assessed with the TSS (true skill statistic) and AUC (area under the curve) values of ROC (receiver operating characteristic) analyses. To reduce model uncertainty, we combined the nine single SDMs, weighted by TSS, into an ensemble forecast (a TSS-weighted ensemble). We predicted future distributions of Hedera rhombea under future climate conditions for the 2050 period (2040~2060), which were estimated with HadGEM2-AO. The RF (random forest), GBM (generalized boosted model) and TSS-weighted ensemble models showed higher prediction accuracies (AUC > 0.95, TSS > 0.80) than the other SDMs. Based on the projections of the TSS-weighted ensemble, potential habitats under current climate conditions showed a discrepancy with actual habitats, especially at the northern distribution limit. The observed northern boundary of Hedera rhombea is Ulsan in the eastern Korean Peninsula, but the projected limit was the eastern coast of Gangwon Province. Geomorphological conditions and dispersal limitations mediated by birds, namely the lack of bird habitats on the eastern coast of Gangwon Province, account for this discrepancy. In general, potential habitats of Hedera rhombea expanded under future climate conditions, but the extent of expansion depends on the RCP scenario. The potential habitat of Hedera rhombea expanded into the Jeolla inland area under RCP 4.5, and into Chungnam and Wonsan under RCP 8.5. Our results provide fundamental information for understanding the potential effects of climate change on the distribution of Hedera rhombea.
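
The TSS and a TSS-weighted ensemble, as mentioned in this abstract, can be illustrated with the sketch below. The abstract does not give the exact weighting scheme, so this sketch assumes the common form in which each model's predicted suitability is averaged with weights proportional to its TSS. The function names and input layout are hypothetical.

```python
def true_skill_statistic(tp, fn, tn, fp):
    """TSS = sensitivity + specificity - 1, from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of presences predicted as presence
    specificity = tn / (tn + fp)  # fraction of absences predicted as absence
    return sensitivity + specificity - 1.0


def tss_weighted_ensemble(predictions, tss_scores):
    """Average per-site predictions across models, weighted by each model's TSS.

    predictions: one list of per-site suitability values per model.
    tss_scores:  one TSS value per model (assumed positive here).
    """
    if len(predictions) != len(tss_scores) or not predictions:
        raise ValueError("need one TSS score per model")
    total = sum(tss_scores)
    if total <= 0:
        raise ValueError("sum of TSS weights must be positive")
    n_sites = len(predictions[0])
    return [
        sum(w * model[i] for w, model in zip(tss_scores, predictions)) / total
        for i in range(n_sites)
    ]
```

With this weighting, a model with a higher TSS contributes proportionally more to the ensemble value at every site.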

Landslide susceptibility assessment using feature selection-based machine learning models

  • Liu, Lei-Lei;Yang, Can;Wang, Xiao-Mi
    • Geomechanics and Engineering
    • /
    • v.25 no.1
    • /
    • pp.1-16
    • /
    • 2021
  • Machine learning models have been widely used for landslide susceptibility assessment (LSA) in recent years. The large number of inputs or conditioning factors for these models, however, can reduce the computation efficiency and increase the difficulty in collecting data. Feature selection is a good tool to address this problem by selecting the most important features among all factors to reduce the size of the input variables. However, two important questions need to be solved: (1) how do feature selection methods affect the performance of machine learning models? and (2) which feature selection method is the most suitable for a given machine learning model? This paper aims to address these two questions by comparing the predictive performance of 13 feature selection-based machine learning (FS-ML) models and 5 ordinary machine learning models on LSA. First, five commonly used machine learning models (i.e., logistic regression, support vector machine, artificial neural network, Gaussian process and random forest) and six typical feature selection methods in the literature are adopted to constitute the proposed models. Then, fifteen conditioning factors are chosen as input variables and 1,017 landslides are used as recorded data. Next, feature selection methods are used to obtain the importance of the conditioning factors to create feature subsets, based on which 13 FS-ML models are constructed. For each of the machine learning models, the best-optimized FS-ML model is selected according to the area under curve value. Finally, five optimal FS-ML models are obtained and applied to the LSA of the studied area. The predictive abilities of the FS-ML models on LSA are verified and compared through the receiver operating characteristic curve and statistical indicators such as sensitivity, specificity and accuracy. The results showed that different feature selection methods have different effects on the performance of LSA machine learning models. FS-ML models generally outperform the ordinary machine learning models. The best FS-ML model is the recursive feature elimination (RFE) optimized RF, and RFE is an optimal method for feature selection.

Predicting the Potential Distribution of an Invasive Species, Solenopsis invicta Buren (Hymenoptera: Formicidae), under Climate Change using Species Distribution Models

  • SUNG, Sunyong;KWON, Yong-Su;LEE, Dong Kun;CHO, Youngho
    • Entomological Research
    • /
    • v.48 no.6
    • /
    • pp.505-513
    • /
    • 2018
  • The red imported fire ant is considered one of the most notorious invasive species because of its adverse impact on both humans and ecosystems. Public concern regarding red imported fire ants has been increasing, as they have been found seven times in South Korea. Even if red imported fire ants are not yet colonized in South Korea, a proper quarantine plan is necessary to prevent their widespread distribution. As a basis for quarantine planning, we modeled the potential distribution of the red imported fire ant under current climate conditions using six different species distribution models (SDMs) and then selected the random forest (RF) model for modeling the potential distribution under climate change. We acquired occurrence data from the Global Biodiversity Information Facility (GBIF) and bioclimatic data from WorldClim. We modeled at the global scale to project the potential distribution under the current climate and then applied models at the local scale to project the potential distribution of the red imported fire ant under climate change. Modeled results successfully represent the current distribution of red imported fire ants. The potential distribution area for red imported fire ants increased to include major harbors and airports in South Korea under the climate change scenario (RCP 8.5). Thus, we are able to provide a potential distribution of red imported fire ant that is necessary to establish a proper quarantine plan for their management to minimize adverse impacts of climate change.

Comparative characteristic of ensemble machine learning and deep learning models for turbidity prediction in a river (딥러닝과 앙상블 머신러닝 모형의 하천 탁도 예측 특성 비교 연구)

  • Park, Jungsu
    • Journal of Korean Society of Water and Wastewater
    • /
    • v.35 no.1
    • /
    • pp.83-91
    • /
    • 2021
  • The increased turbidity in rivers during flood events has various effects on water environmental management, including drinking water supply systems. Thus, prediction of turbid water is essential for water environmental management. Recently, various advanced machine learning algorithms have been increasingly used in water environmental management. Ensemble machine learning algorithms such as random forest (RF) and gradient boosting decision tree (GBDT) are some of the most popular machine learning algorithms used for water environmental management, along with deep learning algorithms such as recurrent neural networks. In this study, GBDT, an ensemble machine learning algorithm, and gated recurrent unit (GRU), a recurrent neural network algorithm, are used for model development to predict turbidity in a river. The observation frequencies of input data used for the model were 2, 4, 8, 24, 48, 120 and 168 h. The root-mean-square error-observations standard deviation ratio (RSR) of GRU and GBDT ranges from 0.182 to 0.766 and from 0.400 to 0.683, respectively. Both models show similar prediction accuracy with RSR of 0.682 for GRU and 0.683 for GBDT. The GRU shows better prediction accuracy when the observation frequency is relatively short (i.e., 2, 4, and 8 h), whereas GBDT shows better prediction accuracy when the observation frequency is relatively long (i.e., 48, 120, and 168 h). The results suggest that the characteristics of input data should be considered to develop an appropriate model to predict turbidity.
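
The RSR used in this abstract is the RMSE divided by the standard deviation of the observations. The following is a minimal sketch under the usual definition, in which both terms share the same sample-size factor so that it cancels. The function name `rsr` is hypothetical.

```python
import math


def rsr(observed, simulated):
    """RMSE-observations standard deviation ratio.

    RSR = sqrt(sum((obs - sim)^2)) / sqrt(sum((obs - mean(obs))^2)).
    Lower is better; 0 indicates a perfect fit.
    """
    if len(observed) != len(simulated) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    error_ss = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    deviation_ss = sum((o - mean_obs) ** 2 for o in observed)
    if deviation_ss == 0:
        raise ValueError("RSR is undefined when all observations are equal")
    return math.sqrt(error_ss) / math.sqrt(deviation_ss)
```

Because the error is normalized by the spread of the observations, RSR values can be compared across series with different turbidity ranges.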

Real-time prediction on the slurry concentration of cutter suction dredgers using an ensemble learning algorithm

  • Han, Shuai;Li, Mingchao;Li, Heng;Tian, Huijing;Qin, Liang;Li, Jinfeng
    • International conference on construction engineering and project management
    • /
    • 2020.12a
    • /
    • pp.463-481
    • /
    • 2020
  • Cutter suction dredgers (CSDs) are widely used in various dredging constructions such as channel excavation, wharf construction, and reef construction. During a CSD construction, the main operation is to control the swing speed of the cutter to keep the slurry concentration in a proper range. However, the slurry concentration cannot be monitored in real time, i.e., there is a "time-lag effect" in the log of slurry concentration, making it difficult for operators to make optimal control decisions. To address this issue, this research proposes a solution scheme that uses real-time monitored indicators to predict the current slurry concentration. The characteristics of the CSD monitoring data are first studied, and a set of preprocessing methods is presented. Then we put forward the concept of "index class" to select the important indices. Finally, an ensemble learning algorithm is set up to fit the relationship between the slurry concentration and the indices of the index classes. In the experiment, log data covering more than seven days of a practical dredging construction were collected. For comparison, the Deep Neural Network (DNN), Long Short-Term Memory (LSTM), Support Vector Machine (SVM), Random Forest (RF), Gradient Boosting Decision Tree (GBDT), and Bayesian Ridge algorithms are tried. The results show that our method has the best performance, with an R2 of 0.886 and a mean square error (MSE) of 5.538. This research provides an effective way to predict the slurry concentration of CSDs in real time and can help to improve the stationarity and production efficiency of dredging construction.
