• Title/Summary/Keyword: support vector regression machine

GeoAI-Based Forest Fire Susceptibility Assessment with Integration of Forest and Soil Digital Map Data

  • Kounghoon Nam;Jong-Tae Kim;Chang-Ju Lee;Gyo-Cheol Jeong
    • The Journal of Engineering Geology / v.34 no.1 / pp.107-115 / 2024
  • This study assesses forest fire susceptibility in Gangwon-do, South Korea, which hosts the largest forested area in the nation and constitutes ~21% of the country's forested land. With 81% of its terrain forested, Gangwon-do is particularly susceptible to wildfires, as evidenced by the fact that seven out of the ten most extensive wildfires in Korea have occurred in this region, with significant ecological and economic implications. Here, we analyze 480 historical wildfire occurrences in Gangwon-do between 2003 and 2019 using 17 predictor variables of wildfire occurrence. We utilized three machine learning algorithms—random forest, logistic regression, and support vector machine—to construct wildfire susceptibility prediction models and identify the best-performing model for Gangwon-do. Forest and soil map data were integrated as important indicators of wildfire susceptibility and enhanced the precision of the three models in identifying areas at high risk of wildfires. Of the three models examined, the random forest model showed the best predictive performance, with an area-under-the-curve value of 0.936. The findings of this study, especially the maps generated by the models, are expected to offer important guidance to local governments in formulating effective management and conservation strategies. These strategies aim to ensure the sustainable preservation of forest resources and to enhance the well-being of communities situated in areas adjacent to forests. Furthermore, the outcomes of this study are anticipated to contribute to the safeguarding of forest resources and biodiversity and to the development of comprehensive plans for forest resource protection, biodiversity conservation, and environmental management.

Estimation of Water Quality Index for Coastal Areas in Korea Using GOCI Satellite Data Based on Machine Learning Approaches (GOCI 위성영상과 기계학습을 이용한 한반도 연안 수질평가지수 추정)

  • Jang, Eunna;Im, Jungho;Ha, Sunghyun;Lee, Sanggyun;Park, Young-Gyu
    • Korean Journal of Remote Sensing / v.32 no.3 / pp.221-234 / 2016
  • In Korea, most industrial parks and major cities are located in coastal areas, which results in serious environmental problems in both coastal land and ocean. To manage such problems effectively, especially in the coastal ocean, water quality should be monitored. As many factors influence water quality, the Korean Government proposed an integrated Water Quality Index (WQI) based on in situ measurements of ocean parameters (bottom dissolved oxygen, chlorophyll-a concentration, Secchi disk depth, dissolved inorganic nitrogen, and dissolved inorganic phosphorus) by ocean division identified based on their ecological characteristics. Field-measured WQI, however, does not provide spatial continuity over vast areas. Satellite remote sensing can be an alternative for identifying WQI for surface water. In this study, two schemes were examined to estimate coastal WQI around the Korean Peninsula using in situ measurement data and Geostationary Ocean Color Imager (GOCI) satellite imagery from 2011 to 2013, based on machine learning approaches. Scheme 1 calculates WQI from water quality-related factors estimated using GOCI reflectance data, and scheme 2 estimates WQI directly using GOCI band reflectance data and basic products (chlorophyll-a, suspended sediment, colored dissolved organic matter). Three machine learning approaches were used: Random Forest (RF), Support Vector Regression (SVR), and a modified regression tree (Cubist). Results show that estimation of Secchi disk depth produced the highest accuracy among the ocean parameters, and RF performed best regardless of water quality-related factors. However, the accuracy of WQI from scheme 1 was lower than that from scheme 2 due to the estimation errors inherent in the water quality-related factors and the uncertainty of bottom dissolved oxygen. Overall, scheme 2 appears more appropriate for estimating WQI for surface water in coastal areas, and chlorophyll-a concentration was identified as the factor contributing most to the estimation of WQI.

Prediction of Amyloid β-Positivity with both MRI Parameters and Cognitive Function Using Machine Learning (뇌 MRI와 인지기능평가를 이용한 아밀로이드 베타 양성 예측 연구)

  • Hye Jin Park;Ji Young Lee;Jin-Ju Yang;Hee-Jin Kim;Young Seo Kim;Ji Young Kim;Yun Young Choi
    • Journal of the Korean Society of Radiology / v.84 no.3 / pp.638-652 / 2023
  • Purpose: To investigate the MRI markers for the prediction of amyloid β (Aβ)-positivity in mild cognitive impairment (MCI) and Alzheimer's disease (AD), and to evaluate the differences in MRI markers between Aβ-positive (Aβ [+]) and -negative groups using the machine learning (ML) method. Materials and Methods: This study included 139 patients with MCI and AD who underwent amyloid PET-CT and brain MRI. Patients were divided into Aβ (+) (n = 84) and Aβ-negative (n = 55) groups. Visual analysis was performed with the Fazekas scale of white matter hyperintensity (WMH) and cerebral microbleeds (CMB) scores. The WMH volume and regional brain volume were quantitatively measured. Multivariable logistic regression and ML methods (support vector machine and logistic regression) were used to identify the best MRI predictors of Aβ-positivity. Results: The Fazekas scale of WMH (p = 0.02) and CMB scores (p = 0.04) were higher in Aβ (+). The volumes of the hippocampus, entorhinal cortex, and precuneus were smaller in Aβ (+) (p < 0.05). The third ventricle volume was larger in Aβ (+) (p = 0.002). The ML logistic regression model showed good accuracy (81.1%) with the mini-mental state examination (MMSE) and regional brain volumes. Conclusion: The application of ML using the MMSE, third ventricle volume, and hippocampal volume is helpful in predicting Aβ-positivity with good accuracy.

Ensemble Learning with Support Vector Machines for Bond Rating (회사채 신용등급 예측을 위한 SVM 앙상블학습)

  • Kim, Myoung-Jong
    • Journal of Intelligence and Information Systems / v.18 no.2 / pp.29-45 / 2012
  • Bond rating is regarded as an important event for measuring the financial risk of companies and for determining the investment returns of investors. As a result, predicting companies' credit ratings by applying statistical and machine learning techniques has been a popular research topic. Statistical techniques, including multiple regression, multiple discriminant analysis (MDA), logistic models (LOGIT), and probit analysis, have traditionally been used in bond rating. However, one major drawback is that they rest on strict assumptions. These include linearity, normality, independence among predictor variables, and pre-existing functional forms relating the criterion variables and the predictor variables. These strict assumptions have limited the application of traditional statistics to the real world. Machine learning techniques used in bond rating prediction models include decision trees (DT), neural networks (NN), and the support vector machine (SVM). In particular, SVM is recognized as a new and promising classification and regression analysis method. SVM learns a separating hyperplane that maximizes the margin between two categories. SVM is simple enough to be analyzed mathematically and leads to high performance in practical applications. SVM implements the structural risk minimization principle and seeks to minimize an upper bound of the generalization error. In addition, the solution of SVM may be a global optimum, and thus overfitting is unlikely to occur with SVM. Furthermore, SVM does not require many training samples, since it builds prediction models using only some representative samples near the boundaries, called support vectors. A number of experimental studies have indicated that SVM has been successfully applied in a variety of pattern recognition fields. However, there are three major drawbacks that can degrade SVM's performance.
First, SVM was originally proposed for solving binary-class classification problems. Methods for combining SVMs for multi-class classification, such as One-Against-One and One-Against-All, have been proposed, but they do not improve performance in multi-class classification problems as much as SVM does for binary-class classification. Second, approximation algorithms (e.g., decomposition methods and the sequential minimal optimization algorithm) can be used to reduce computation time in multi-class computation, but they can deteriorate classification performance. Third, a difficulty in multi-class prediction problems is the data imbalance problem, which occurs when the number of instances in one class greatly outnumbers the number of instances in another class. Such data sets often cause a default classifier to be built due to a skewed boundary, which reduces the classification accuracy of such a classifier. SVM ensemble learning is one of the machine learning methods for coping with the above drawbacks. Ensemble learning is a method for improving the performance of classification and prediction algorithms. AdaBoost is one of the most widely used ensemble learning techniques. It constructs a composite classifier by sequentially training classifiers while increasing the weight on misclassified observations through iterations. Observations that are incorrectly predicted by previous classifiers are chosen more often than those that are correctly predicted. Thus, boosting attempts to produce new classifiers that are better able to predict examples for which the current ensemble's performance is poor. In this way, it can reinforce the training of the misclassified observations of the minority class. This paper proposes a multiclass Geometric Mean-based Boosting (MGM-Boost) to resolve the multiclass prediction problem.
Since MGM-Boost introduces the notion of geometric mean into AdaBoost, its learning process can take into account the geometric mean-based accuracy and errors across multiple classes. This study applies MGM-Boost to a real-world bond rating case for Korean companies to examine its feasibility. 10-fold cross validation is performed three times with different random seeds to ensure that the comparison among the three classifiers does not happen by chance. For each 10-fold cross validation, the entire data set is first partitioned into ten equal-sized sets, and then each set is in turn used as the test set while the classifier trains on the other nine sets. That is, cross-validated folds have been tested independently of each algorithm. Through these steps, we obtained results for the classifiers on each of the 30 experiments. In the comparison of arithmetic mean-based prediction accuracy between individual classifiers, MGM-Boost (52.95%) shows higher prediction accuracy than both AdaBoost (51.69%) and SVM (49.47%). MGM-Boost (28.12%) also shows higher prediction accuracy than AdaBoost (24.65%) and SVM (15.42%) in terms of geometric mean-based prediction accuracy. A t-test is used to examine whether the performance of each classifier over the 30 folds is significantly different. The results indicate that the performance of MGM-Boost is significantly different from that of the AdaBoost and SVM classifiers at the 1% level. These results mean that MGM-Boost can provide robust and stable solutions to multi-class problems such as bond rating.
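The gap between the arithmetic and geometric mean-based accuracies reported above can be illustrated with a small sketch. It assumes both figures are means of per-class recall, which the abstract does not state. The function names and the toy labels are hypothetical, and this is not the MGM-Boost algorithm itself, only the two summary measures.

```python
import math

def per_class_recall(y_true, y_pred):
    """Return {class: fraction of that class's samples predicted correctly}."""
    totals, hits = {}, {}
    for t, p in zip(y_true, y_pred):
        totals[t] = totals.get(t, 0) + 1
        hits[t] = hits.get(t, 0) + (1 if t == p else 0)
    return {c: hits[c] / totals[c] for c in totals}

def arithmetic_mean_accuracy(y_true, y_pred):
    """Arithmetic mean of per-class recall (assumed definition)."""
    recalls = list(per_class_recall(y_true, y_pred).values())
    return sum(recalls) / len(recalls)

def geometric_mean_accuracy(y_true, y_pred):
    """Geometric mean of per-class recall; one badly served class drags it down."""
    recalls = list(per_class_recall(y_true, y_pred).values())
    return math.prod(recalls) ** (1.0 / len(recalls))

# Hypothetical imbalanced labels: 8 samples of "A", 2 of "B", 2 of "C".
y_true = ["A"] * 8 + ["B"] * 2 + ["C"] * 2
# A classifier biased toward the majority class "A".
y_pred = ["A"] * 8 + ["A", "B"] + ["A", "C"]

print("arithmetic:", arithmetic_mean_accuracy(y_true, y_pred))
print("geometric: ", geometric_mean_accuracy(y_true, y_pred))
```

Because the geometric mean multiplies the per-class recalls, it falls to zero whenever any class is never predicted correctly, which is why it penalizes neglect of minority classes more strongly than the arithmetic mean.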

Practical evaluation of encrypted traffic classification based on a combined method of entropy estimation and neural networks

  • Zhou, Kun;Wang, Wenyong;Wu, Chenhuang;Hu, Teng
    • ETRI Journal / v.42 no.3 / pp.311-323 / 2020
  • Encrypted traffic classification plays a vital role in cybersecurity as network traffic encryption becomes prevalent. First, we briefly introduce three traffic encryption mechanisms: IPsec, SSL/TLS, and SRTP. After evaluating the performance of support vector machine, random forest, naïve Bayes, and logistic regression for traffic classification, we propose a combined approach of entropy estimation and artificial neural networks. First, network traffic is classified as encrypted or plaintext with entropy estimation. Encrypted traffic is then further classified using neural networks. We propose using packet sizes, packet inter-arrival times, and direction as the neural network's input. Our combined approach was evaluated with the dataset obtained from the Canadian Institute for Cybersecurity. Results show that precision improved by 1 to 7 percentage points, and some application classification metrics improved by nearly 30 percentage points.
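The first stage described above, separating encrypted from plaintext traffic by entropy, can be sketched as follows. This is a minimal illustration that assumes Shannon entropy over payload bytes and a fixed cutoff. The abstract does not specify the estimator or threshold, so `ENTROPY_THRESHOLD`, `byte_entropy`, and `looks_encrypted` are hypothetical.

```python
import math
from collections import Counter

def byte_entropy(payload: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not payload:
        return 0.0
    n = len(payload)
    counts = Counter(payload)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical cutoff: payloads at or above this are treated as encrypted.
ENTROPY_THRESHOLD = 7.0

def looks_encrypted(payload: bytes) -> bool:
    """Flag a payload as encrypted when its byte entropy is near the 8-bit maximum."""
    return byte_entropy(payload) >= ENTROPY_THRESHOLD

# Plaintext protocol data uses a small alphabet, so its entropy is low.
plaintext = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n" * 20
# Stand-in for ciphertext: every byte value equally frequent (maximum entropy).
uniform_bytes = bytes(range(256)) * 16

print(byte_entropy(plaintext), looks_encrypted(plaintext))
print(byte_entropy(uniform_bytes), looks_encrypted(uniform_bytes))
```

Compressed data also has near-maximal byte entropy, so a threshold of this kind cannot by itself tell compression from encryption.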

Application of Machine Learning to Predict Web-warping in Flexible Roll Forming Process (머신러닝을 활용한 가변 롤포밍 공정 web-warping 예측모델 개발)

  • Woo, Y.Y.;Moon, Y.H.
    • Transactions of Materials Processing / v.29 no.5 / pp.282-289 / 2020
  • Flexible roll forming is an advanced sheet-metal-forming process that allows the production of parts with various cross-sections. During the flexible process, material is subjected to three-dimensional deformation such as transverse bending, inhomogeneous elongation, or contraction. Because of the effects of process variables on the quality of roll-formed products, the approaches used to investigate the roll-forming process have depended largely on experience and trial-and-error methods. Web-warping is one of the major shape defects encountered in flexible roll forming. In this study, an SVR model was developed to predict web-warping during the flexible roll forming process. In developing the SVR model, three process parameters, namely the forming-roll speed condition, leveling-roll height, and bend angle, were considered as the model inputs, and the web-warping height was used as the response variable for three blank shapes: rectangular, concave, and convex. MATLAB software was used to train the SVR model and optimize three hyperparameters (λ, ε, and γ). To evaluate the SVR model's performance, statistical analysis was carried out based on three indicators: the root-mean-square error, mean absolute error, and relative root-mean-square error.
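The three evaluation indicators named above can be written out as a short sketch. The abstract does not define the relative root-mean-square error, so this assumes RMSE divided by the mean of the measured values. The sample numbers are invented for illustration and are not from the study.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def relative_rmse(y_true, y_pred):
    """RMSE normalized by the mean measured value (assumed definition)."""
    mean_observed = sum(y_true) / len(y_true)
    return rmse(y_true, y_pred) / abs(mean_observed)

# Hypothetical web-warping heights: measured vs. model-predicted.
measured = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]

print("RMSE: ", rmse(measured, predicted))
print("MAE:  ", mae(measured, predicted))
print("rRMSE:", relative_rmse(measured, predicted))
```

RMSE weights large errors more heavily than MAE, and the relative form makes errors comparable across responses with different magnitudes.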

Predictive Models for Sasang Constitution Types Using Genetic Factors (유전지표를 활용한 사상체질 분류모델)

  • Ban, Hyo-Jeong;Lee, Siwoo;Jin, Hee-Jeong
    • Journal of Sasang Constitutional Medicine / v.32 no.2 / pp.10-21 / 2020
  • Objectives: Genome-wide association study (GWAS) is a useful method for identifying genetic associations with various phenotypes. The purpose of this study was to develop predictive models for Sasang constitution types using genetic factors. Methods: Genotyping of the 1,999 subjects was performed using the Axiom Precision Medicine Research Array (PMRA) by Life Technologies. All participants were prescribed Sasang constitution-specific herbal remedies for treatment and showed improvement of their original symptoms, as confirmed by a Korean medicine doctor. The genotypes were imputed using the IMPUTE program. Association analysis was conducted using a logistic regression model to discover single nucleotide polymorphisms (SNPs), adjusting for age, sex, and BMI. Results & Conclusions: We developed models to predict Korean medicine constitution types from the identified genetic factors together with sex, age, and BMI, using Random Forest (RF), Support Vector Machine (SVM), and Neural Network (NN). The maximum Area Under the Curve (AUC) values for Taeeum, Soeum, and Soyang were 0.894, 0.868, and 0.767, respectively. The AUC of each model was 6~17% higher than that of the models without genetic factors. By developing the predictive models, we confirmed the usefulness of genetic factors related to constitution types. This demonstrates a mechanism for more accurate prediction through genetic factors related to constitution type.

Default Prediction for Real Estate Companies with Imbalanced Dataset

  • Dong, Yuan-Xiang;Xiao, Zhi;Xiao, Xue
    • Journal of Information Processing Systems / v.10 no.2 / pp.314-333 / 2014
  • When analyzing default predictions in real estate companies, the number of non-defaulted cases always greatly exceeds the defaulted ones, which creates the two-class imbalance problem. This lowers the ability of prediction models to distinguish the default sample. In order to avoid this sample selection bias and to improve the prediction model, this paper applies a minority sample generation approach to create new minority samples. Logistic regression, support vector machine (SVM) classification, and neural network (NN) classification trained on the imbalanced dataset were used as benchmarks against single prediction models that used a balanced dataset corrected by the minority sample generation approach. Instead of using prediction-oriented tests and the overall accuracy, the true positive rate (TPR), the true negative rate (TNR), G-mean, and F-score are used to measure the performance of default prediction models for imbalanced datasets. In this paper, we describe an empirical experiment that used a sample of 14 default and 315 non-default listed real estate companies in China and report that most single prediction models generated better results with a balanced dataset than with an imbalanced dataset.
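The reason for preferring TPR, TNR, G-mean, and F-score over overall accuracy can be seen in a short sketch. It uses the class sizes from the abstract (14 default, 315 non-default), but the two confusion matrices are invented for illustration. It assumes "default" is the positive class and that F-score means the F1 score.

```python
import math

def imbalance_metrics(tp, fn, tn, fp):
    """Compute accuracy, TPR, TNR, G-mean, and F1 from confusion-matrix counts."""
    total = tp + fn + tn + fp
    tpr = tp / (tp + fn) if (tp + fn) else 0.0        # recall on the positive class
    tnr = tn / (tn + fp) if (tn + fp) else 0.0        # recall on the negative class
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f_score = (2 * precision * tpr / (precision + tpr)) if (precision + tpr) else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "tpr": tpr,
        "tnr": tnr,
        "g_mean": math.sqrt(tpr * tnr),
        "f_score": f_score,
    }

# Hypothetical model that labels every company as non-default.
always_negative = imbalance_metrics(tp=0, fn=14, tn=315, fp=0)
# Hypothetical model that finds most defaults at the cost of some false alarms.
balanced_model = imbalance_metrics(tp=10, fn=4, tn=300, fp=15)

for name, m in [("always_negative", always_negative), ("balanced_model", balanced_model)]:
    print(name, {k: round(v, 3) for k, v in m.items()})
```

The always-negative model reaches the higher overall accuracy while identifying no defaults at all, which G-mean and F-score expose by dropping to zero.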

Spatial Prediction of Soil Carbon Using Terrain Analysis in a Steep Mountainous Area and the Associated Uncertainties (지형분석을 이용한 산지토양 탄소의 분포 예측과 불확실성)

  • Jeong, Gwanyong
    • Journal of The Geomorphological Association of Korea / v.23 no.3 / pp.67-78 / 2016
  • Soil carbon (C) is an essential property for characterizing soil quality. Understanding of the spatial patterns of soil C is particularly limited for mountain areas. This study aims to predict the spatial pattern of soil C using terrain analysis in a steep mountainous area. Specifically, model performance and prediction uncertainty were investigated as a function of the number of resampling repetitions. Further, important predictors of soil C were identified. Finally, the spatial distribution of uncertainty was analyzed. A total of 91 soil samples were collected via conditioned Latin hypercube sampling, and a digital soil C map was developed using support vector regression, a powerful machine learning method. Results showed no distinct differences in model performance depending on the number of repetitions, except for 10-fold cross validation. For soil C, elevation and surface curvature were selected as important predictors by recursive feature elimination. Soil C showed higher values at higher elevations and on concave slopes. The spatial pattern of soil C may reflect lateral movement of water and materials along the surface configuration of the study area. The higher uncertainty at higher elevations and on concave slopes might be related to the geomorphological characteristics of the research area and the sampling design. This study is believed to provide a better understanding of the relationship between geomorphology and soil C in the mountainous ecosystem.

Improvement of WRF-Hydro streamflow prediction using Machine Learning Methods (머신러닝기법을 이용한 WRF-Hydro 하천수 흐름 예측 개선)

  • Cho, Kyeungwoo;Kim, Yeonjoo
    • Proceedings of the Korea Water Resources Association Conference / 2019.05a / pp.115-115 / 2019
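  • Most studies on streamflow prediction use process-based modeling systems such as WRF-Hydro. Process-based modeling systems consist of equations that generalize physical phenomena. These generalized equations carry uncertainty and do not reflect regional characteristics. In particular, the input data used in the equations are measured values and therefore contain errors. Consequently, the predictions of process-based modeling systems contain both systematic and random errors. Parameter calibration is currently used to improve the predictions, but it has limits. To overcome these limits, this study built a complementary data-driven model to improve the results of the process-based modeling system. Two machine learning techniques, instance-based weighting (IBW) and support vector regression (SVR), were used to build the data-driven model. The model was validated by predicting streamflow at major reservoirs and lakes on the Korean Peninsula. For validation, WRF-Hydro was run as the process-based modeling system. The input data were the Korea Meteorological Administration's local numerical prediction model data (LDAPS), the HydroSHEDS digital elevation model (DEM), and the National Geographic Information Institute's continuous digital topographic map of reservoirs and lakes. The data-driven model built in this study suggests that machine learning can improve on the error-correction limits of existing process-based modeling systems.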
