• Title/Summary/Keyword: support vector regression.

Search Result 554, Processing Time 0.026 seconds

Classification Analysis for the Prediction of Underground Cultural Assets (매장문화재 예측을 위한 통계적 분류 분석)

  • Yu, Hye-Kyung;Lee, Jin-Young;Na, Jong-Hwa
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.14 no.3
    • /
    • pp.106-113
    • /
    • 2009
  • Various statistical classification methods have been used to establish prediction model of underground cultural assets in our country. Among them, linear discriminant analysis, logistic regression, decision tree, neural network, and support vector machines are used in this paper. We introduced the basic concepts of above-mentioned classification methods and applied these to the analyses of real data of I city. As a results, five different prediction models are suggested. And also model comparisons are executed by suggesting correct classification rates of the fitted models. To see the applicability of the suggested models for a new data set, simulations are carried out. R packages and programs are used in real data analyses and simulations. Especially, the detailed executing processes by R are provided for the other analyser of related area.

A Supervised Feature Selection Method for Malicious Intrusions Detection in IoT Based on Genetic Algorithm

  • Saman Iftikhar;Daniah Al-Madani;Saima Abdullah;Ammar Saeed;Kiran Fatima
    • International Journal of Computer Science & Network Security
    • /
    • v.23 no.3
    • /
    • pp.49-56
    • /
    • 2023
  • Machine learning methods diversely applied to the Internet of Things (IoT) field have been successful due to the enhancement of computer processing power. They offer an effective way of detecting malicious intrusions in IoT because of their high-level feature extraction capabilities. In this paper, we proposed a novel feature selection method for malicious intrusion detection in IoT by using an evolutionary technique - Genetic Algorithm (GA) and Machine Learning (ML) algorithms. The proposed model is performing the classification of BoT-IoT dataset to evaluate its quality through the training and testing with classifiers. The data is reduced and several preprocessing steps are applied such as: unnecessary information removal, null value checking, label encoding, standard scaling and data balancing. GA has applied over the preprocessed data, to select the most relevant features and maintain model optimization. The selected features from GA are given to ML classifiers such as Logistic Regression (LR) and Support Vector Machine (SVM) and the results are evaluated using performance evaluation measures including recall, precision and f1-score. Two sets of experiments are conducted, and it is concluded that hyperparameter tuning has a significant consequence on the performance of both ML classifiers. Overall, SVM still remained the best model in both cases and overall results increased.

Intelligent prediction of engineered cementitious composites with limestone calcined clay cement (LC3-ECC) compressive strength based on novel machine learning techniques

  • Enming Li;Ning Zhang;Bin Xi;Vivian WY Tam;Jiajia Wang;Jian Zhou
    • Computers and Concrete
    • /
    • v.32 no.6
    • /
    • pp.577-594
    • /
    • 2023
  • Engineered cementitious composites with calcined clay limestone cement (LC3-ECC) as a kind of green, low-carbon and high toughness concrete, has recently received significant investigation. However, the complicated relationship between potential influential factors and LC3-ECC compressive strength makes the prediction of LC3-ECC compressive strength difficult. Regarding this, the machine learning-based prediction models for the compressive strength of LC3-ECC concrete is firstly proposed and developed. Models combine three novel meta-heuristic algorithms (golden jackal optimization algorithm, butterfly optimization algorithm and whale optimization algorithm) with support vector regression (SVR) to improve the accuracy of prediction. A new dataset about LC3-ECC compressive strength was integrated based on 156 data from previous studies and used to develop the SVR-based models. Thirteen potential factors affecting the compressive strength of LC3-ECC were comprehensively considered in the model. The results show all hybrid SVR prediction models can reach the Coefficient of determination (R2) above 0.95 for the testing set and 0.97 for the training set. Radar and Taylor plots also show better overall prediction performance of the hybrid SVR models than several traditional machine learning techniques, which confirms the superiority of the three proposed methods. The successful development of this predictive model can provide scientific guidance for LC3-ECC materials and further apply to such low-carbon, sustainable cement-based materials.

A comparison of ATR-FTIR and Raman spectroscopy for the non-destructive examination of terpenoids in medicinal plants essential oils

  • Rahul Joshi;Sushma Kholiya;Himanshu Pandey;Ritu Joshi;Omia Emmanuel;Ameeta Tewari;Taehyun Kim;Byoung-Kwan Cho
    • Korean Journal of Agricultural Science
    • /
    • v.50 no.4
    • /
    • pp.675-696
    • /
    • 2023
  • Terpenoids, also referred to as terpenes, are a large family of naturally occurring chemical compounds present in the essential oils extracted from medicinal plants. In this study, a nondestructive methodology was created by combining ATR-FT-IR (attenuated total reflectance-Fourier transform infrared), and Raman spectroscopy for the terpenoids assessment in medicinal plants essential oils from ten different geographical locations. Partial least squares regression (PLSR) and support vector regression (SVR) were used as machine learning methodologies. However, a deep learning based model called as one-dimensional convolutional neural network (1D CNN) were also developed for models comparison. With a correlation coefficient (R2) of 0.999 and a lowest RMSEP (root mean squared error of prediction) of 0.006% for the prediction datasets, the SVR model created for FT-IR spectral data outperformed both the PLSR and 1 D CNN models. On the other hand, for the classification of essential oils derived from plants collected from various geographical regions, the created SVM (support vector machine) classification model for Raman spectroscopic data obtained an overall classification accuracy of 0.997% which was superior than the FT-IR (0.986%) data. Based on the results we propose that FT-IR spectroscopy, when coupled with the SVR model, has a significant potential for the non-destructive identification of terpenoids in essential oils compared with destructive chemical analysis methods.

Research Trend analysis for Seismic Data Interpolation Methods using Machine Learning (머신러닝을 사용한 탄성파 자료 보간법 기술 연구 동향 분석)

  • Bae, Wooram;Kwon, Yeji;Ha, Wansoo
    • Geophysics and Geophysical Exploration
    • /
    • v.23 no.3
    • /
    • pp.192-207
    • /
    • 2020
  • We acquire seismic data with regularly or irregularly missing traces, due to economic, environmental, and mechanical problems. Since these missing data adversely affect the results of seismic data processing and analysis, we need to reconstruct the missing data before subsequent processing. However, there are economic and temporal burdens to conducting further exploration and reconstructing missing parts. Many researchers have been studying interpolation methods to accurately reconstruct missing data. Recently, various machine learning technologies such as support vector regression, autoencoder, U-Net, ResNet, and generative adversarial network (GAN) have been applied in seismic data interpolation. In this study, by reviewing these studies, we found that not only neural network models, but also support vector regression models that have relatively simple structures can interpolate missing parts of seismic data effectively. We expect that future research can improve the interpolation performance of these machine learning models by using open-source field data, data augmentation, transfer learning, and regularization based on conventional interpolation technologies.

A study on the development of severity-adjusted mortality prediction model for discharged patient with acute stroke using machine learning (머신러닝을 이용한 급성 뇌졸중 퇴원 환자의 중증도 보정 사망 예측 모형 개발에 관한 연구)

  • Baek, Seol-Kyung;Park, Jong-Ho;Kang, Sung-Hong;Park, Hye-Jin
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.19 no.11
    • /
    • pp.126-136
    • /
    • 2018
  • The purpose of this study was to develop a severity-adjustment model for predicting mortality in acute stroke patients using machine learning. Using the Korean National Hospital Discharge In-depth Injury Survey from 2006 to 2015, the study population with disease code I60-I63 (KCD 7) were extracted for further analysis. Three tools were used for the severity-adjustment of comorbidity: the Charlson Comorbidity Index (CCI), the Elixhauser comorbidity index (ECI), and the Clinical Classification Software (CCS). The severity-adjustment models for mortality prediction in patients with acute stroke were developed using logistic regression, decision tree, neural network, and support vector machine methods. The most common comorbid disease in stroke patients were hypertension, uncomplicated (43.8%) in the ECI, and essential hypertension (43.9%) in the CCS. Among the CCI, ECI, and CCS, CCS had the highest AUC value. CCS was confirmed as the best severity correction tool. In addition, the AUC values for variables of CCS including main diagnosis, gender, age, hospitalization route, and existence of surgery were 0.808 for the logistic regression analysis, 0.785 for the decision tree, 0.809 for the neural network and 0.830 for the support vector machine. Therefore, the best predictive power was achieved by the support vector machine technique. The results of this study can be used in the establishment of health policy in the future.

A Study on the Number of Domestic Food Delivery Services (국내 배달음식 이용건수 분석 및 예측)

  • Kwon, Jaeyoung;Kim, Sinae;Park, Eungee;Song, Jongwoo
    • The Korean Journal of Applied Statistics
    • /
    • v.28 no.5
    • /
    • pp.977-990
    • /
    • 2015
  • Food delivery services are well developed in the Republic of Korea, The increase of one person households and the success of app applications influence delivery services these days. We consider a prediction model for the food delivery service based on weather and dates to predict the number of food delivery services in 2014 using various data mining techniques. We use linear regression, random forest, gradient boosting, support vector machines, neural networks, and logistic regression to find the best prediction model. There are four categories of food delivery services and we consider two methods. For the first method, we estimate the total number of delivery services and the posterior probabilities of each delivery service. For the second method, we use different models for each category and combine them to estimate the total number of delivery services. The neural network and linear regression model perform best in the first method, this is followed by the neural network which is the best for the second method. The result shows that we can estimate the number of deliveries accurately based on dates and weather information.

Performance Analysis of Photovoltaic Power System in Saudi Arabia (사우디아라비아 태양광 발전 시스템의 성능 분석)

  • Oh, Wonwook;Kang, Soyeon;Chan, Sung-Il
    • Journal of the Korean Solar Energy Society
    • /
    • v.37 no.1
    • /
    • pp.81-90
    • /
    • 2017
  • We have analyzed the performance of 58 kWp photovoltaic (PV) power systems installed in Jeddah, Saudi Arabia. Performance ratio (PR) of 3 PV systems with 3 desert-type PV modules using monitoring data for 1 year showed 85.5% on average. Annual degradation rate of 5 individual modules achieved 0.26%, the regression model using monitoring data for the specified interval of one year showed 0.22%. Root mean square error (RMSE) of 6 big data analysis models for power output prediction in May 2016 was analyzed 2.94% using a support vector regression model.

A cautionary note on the use of Cook's distance

  • Kim, Myung Geun
    • Communications for Statistical Applications and Methods
    • /
    • v.24 no.3
    • /
    • pp.317-324
    • /
    • 2017
  • An influence measure known as Cook's distance has been used for judging the influence of each observation on the least squares estimate of the parameter vector. The distance does not reflect the distributional property of the change in the least squares estimator of the regression coefficients due to case deletions: the distribution has a covariance matrix of rank one and thus it has a support set determined by a line in the multidimensional Euclidean space. As a result, the use of Cook's distance may fail to correctly provide information about influential observations, and we study some reasons for the failure. Three illustrative examples will be provided, in which the use of Cook's distance fails to give the right information about influential observations or it provides the right information about the most influential observation. We will seek some reasons for the wrong or right provision of information.

A Study on the Sentiment analysis of Google Play Store App Comment Based on WPM(Word Piece Model) (WPM(Word Piece Model)을 활용한 구글 플레이스토어 앱의 댓글 감정 분석 연구)

  • Park, jae Hoon;Koo, Myong-wan
    • 한국어정보학회:학술대회논문집
    • /
    • 2016.10a
    • /
    • pp.291-295
    • /
    • 2016
  • 본 논문에서는 한국어 기본 유니트 단위로 WPM을 활용한 구글 플레이 스토어 앱의 댓글 감정분석을 수행하였다. 먼저 자동 띄어쓰기 시스템을 적용한 후, 어절단위, 형태소 분석기, WPM을 각각 적용하여 모델을 생성하고, 로지스틱 회귀(Logistic Regression), 소프트맥스 회귀(Softmax Regression), 서포트 벡터머신(Support Vector Machine, SVM)등의 알고리즘을 이용하여 댓글 감정(긍정과 부정)을 비교 분석하였다. 그 결과 어절단위, 형태소 분석기보다 WPM이 최대 25%의 향상된 결과를 얻었다. 또한 분류 과정에서 로지스틱회귀, 소프트맥스 회귀보다는 SVM 성능이 우수했으며, SVM의 기본 파라미터({'kernel':('linear'), 'c':[4]})보다 최적의 파라미터를 적용({'kernel': ('linear','rbf', 'sigmoid', 'poly'), 'C':[0.01, 0.1, 1.4.5]} 하였을 때, 최대 91%의 성능이 나타났다.

  • PDF