• Title/Summary/Keyword: Support Vector Machine-Recursive Feature Elimination

Search Result 13, Processing Time 0.026 seconds

Gene Selection Based on Support Vector Machine using Bootstrap (붓스트랩 방법을 활용한 SVM 기반 유전자 선택 기법)

  • Song, Seuck-Heun;Kim, Kyoung-Hee;Park, Chang-Yi;Koo, Ja-Yong
    • The Korean Journal of Applied Statistics
    • /
    • v.20 no.3
    • /
    • pp.531-540
    • /
    • 2007
  • The recursive feature elimination for support vector machine is known to be useful in selecting relevant genes. Since the criterion for choosing relevant genes is the absolute value of a coefficient, the recursive feature elimination may suffer from a scaling problem. We propose a modified version of the recursive feature elimination algorithm using bootstrap. In our method, the criterion for determining relevant genes is the absolute value of a coefficient divided by its standard error, which accounts for statistical variability of the coefficient. Through numerical examples, we illustrate that our method is effective in gene selection.

Combining Support Vector Machine Recursive Feature Elimination and Intensity-dependent Normalization for Gene Selection in RNAseq (RNAseq 빅데이터에서 유전자 선택을 위한 밀집도-의존 정규화 기반의 서포트-벡터 머신 병합법)

  • Kim, Chayoung
    • Journal of Internet Computing and Services
    • /
    • v.18 no.5
    • /
    • pp.47-53
    • /
    • 2017
  • In past few years, high-throughput sequencing, big-data generation, cloud computing, and computational biology are revolutionary. RNA sequencing is emerging as an attractive alternative to DNA microarrays. And the methods for constructing Gene Regulatory Network (GRN) from RNA-Seq are extremely lacking and urgently required. Because GRN has obtained substantial observation from genomics and bioinformatics, an elementary requirement of the GRN has been to maximize distinguishable genes. Despite of RNA sequencing techniques to generate a big amount of data, there are few computational methods to exploit the huge amount of the big data. Therefore, we have suggested a novel gene selection algorithm combining Support Vector Machines and Intensity-dependent normalization, which uses log differential expression ratio in RNAseq. It is an extended variation of support vector machine recursive feature elimination (SVM-RFE) algorithm. This algorithm accomplishes minimum relevancy with subsets of Big-Data, such as NCBI-GEO. The proposed algorithm was compared to the existing one which uses gene expression profiling DNA microarrays. It finds that the proposed algorithm have provided as convenient and quick method than previous because it uses all functions in R package and have more improvement with regard to the classification accuracy based on gene ontology and time consuming in terms of Big-Data. The comparison was performed based on the number of genes selected in RNAseq Big-Data.

Variable Selection of Feature Pattern using SVM-based Criterion with Q-Learning in Reinforcement Learning (SVM-기반 제약 조건과 강화학습의 Q-learning을 이용한 변별력이 확실한 특징 패턴 선택)

  • Kim, Chayoung
    • Journal of Internet Computing and Services
    • /
    • v.20 no.4
    • /
    • pp.21-27
    • /
    • 2019
  • Selection of feature pattern gathered from the observation of the RNA sequencing data (RNA-seq) are not all equally informative for identification of differential expressions: some of them may be noisy, correlated or irrelevant because of redundancy in Big-Data sets. Variable selection of feature pattern aims at differential expressed gene set that is significantly relevant for a special task. This issues are complex and important in many domains, for example. In terms of a computational research field of machine learning, selection of feature pattern has been studied such as Random Forest, K-Nearest and Support Vector Machine (SVM). One of most the well-known machine learning algorithms is SVM, which is classical as well as original. The one of a member of SVM-criterion is Support Vector Machine-Recursive Feature Elimination (SVM-RFE), which have been utilized in our research work. We propose a novel algorithm of the SVM-RFE with Q-learning in reinforcement learning for better variable selection of feature pattern. By comparing our proposed algorithm with the well-known SVM-RFE combining Welch' T in published data, our result can show that the criterion from weight vector of SVM-RFE enhanced by Q-learning has been improved by an off-policy by a more exploratory scheme of Q-learning.

Development of suspended solid concentration measurement technique based on multi-spectral satellite imagery in Nakdong River using machine learning model (기계학습모형을 이용한 다분광 위성 영상 기반 낙동강 부유 물질 농도 계측 기법 개발)

  • Kwon, Siyoon;Seo, Il Won;Beak, Donghae
    • Journal of Korea Water Resources Association
    • /
    • v.54 no.2
    • /
    • pp.121-133
    • /
    • 2021
  • Suspended Solids (SS) generated in rivers are mainly introduced from non-point pollutants or appear naturally in the water body, and are an important water quality factor that may cause long-term water pollution by being deposited. However, the conventional method of measuring the concentration of suspended solids is labor-intensive, and it is difficult to obtain a vast amount of data via point measurement. Therefore, in this study, a model for measuring the concentration of suspended solids based on remote sensing in the Nakdong River was developed using Sentinel-2 data that provides high-resolution multi-spectral satellite images. The proposed model considers the spectral bands and band ratios of various wavelength bands using a machine learning model, Support Vector Regression (SVR), to overcome the limitation of the existing remote sensing-based regression equations. The optimal combination of variables was derived using the Recursive Feature Elimination (RFE) and weight coefficients for each variable of SVR. The results show that the 705nm band belonging to the red-edge wavelength band was estimated as the most important spectral band, and the proposed SVR model produced the most accurate measurement compared with the previous regression equations. By using the RFE, the SVR model developed in this study reduces the variable dependence compared to the existing regression equations based on the single spectral band or band ratio and provides more accurate prediction of spatial distribution of suspended solids concentration.

An Application of Support Vector Machines to Customer Loyalty Classification of Korean Retailing Company Using R Language

  • Nguyen, Phu-Thien;Lee, Young-Chan
    • The Journal of Information Systems
    • /
    • v.26 no.4
    • /
    • pp.17-37
    • /
    • 2017
  • Purpose Customer Loyalty is the most important factor of customer relationship management (CRM). Especially in retailing industry, where customers have many options of where to spend their money. Classifying loyal customers through customers' data can help retailing companies build more efficient marketing strategies and gain competitive advantages. This study aims to construct classification models of distinguishing the loyal customers within a Korean retailing company using data mining techniques with R language. Design/methodology/approach In order to classify retailing customers, we used combination of support vector machines (SVMs) and other classification algorithms of machine learning (ML) with the support of recursive feature elimination (RFE). In particular, we first clean the dataset to remove outlier and impute the missing value. Then we used a RFE framework for electing most significant predictors. Finally, we construct models with classification algorithms, tune the best parameters and compare the performances among them. Findings The results reveal that ML classification techniques can work well with CRM data in Korean retailing industry. Moreover, customer loyalty is impacted by not only unique factor such as net promoter score but also other purchase habits such as expensive goods preferring or multi-branch visiting and so on. We also prove that with retailing customer's dataset the model constructed by SVMs algorithm has given better performance than others. We expect that the models in this study can be used by other retailing companies to classify their customers, then they can focus on giving services to these potential vip group. We also hope that the results of this ML algorithm using R language could be useful to other researchers for selecting appropriate ML algorithms.

Classification method for failure modes of RC columns based on key characteristic parameters

  • Yu, Bo;Yu, Zecheng;Li, Qiming;Li, Bing
    • Structural Engineering and Mechanics
    • /
    • v.84 no.1
    • /
    • pp.1-16
    • /
    • 2022
  • An efficient and accurate classification method for failure modes of reinforced concrete (RC) columns was proposed based on key characteristic parameters. The weight coefficients of seven characteristic parameters for failure modes of RC columns were determined first based on the support vector machine-recursive feature elimination. Then key characteristic parameters for classifying flexure, flexure-shear and shear failure modes of RC columns were selected respectively. Subsequently, a support vector machine with key characteristic parameters (SVM-K) was proposed to classify three types of failure modes of RC columns. The optimal parameters of SVM-K were determined by using the ten-fold cross-validation and the grid-search algorithm based on 270 sets of available experimental data. Results indicate that the proposed SVM-K has high overall accuracy, recall and precision (e.g., accuracy>95%, recall>90%, precision>90%), which means that the proposed SVM-K has superior performance for classification of failure modes of RC columns. Based on the selected key characteristic parameters for different types of failure modes of RC columns, the accuracy of SVM-K is improved and the decision function of SVM-K is simplified by reducing the dimensions and number of support vectors.

Runoff Prediction from Machine Learning Models Coupled with Empirical Mode Decomposition: A case Study of the Grand River Basin in Canada

  • Parisouj, Peiman;Jun, Changhyun;Nezhad, Somayeh Moghimi;Narimani, Roya
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2022.05a
    • /
    • pp.136-136
    • /
    • 2022
  • This study investigates the possibility of coupling empirical mode decomposition (EMD) for runoff prediction from machine learning (ML) models. Here, support vector regression (SVR) and convolutional neural network (CNN) were considered for ML algorithms. Precipitation (P), minimum temperature (Tmin), maximum temperature (Tmax) and their intrinsic mode functions (IMF) values were used for input variables at a monthly scale from Jan. 1973 to Dec. 2020 in the Grand river basin, Canada. The support vector machine-recursive feature elimination (SVM-RFE) technique was applied for finding the best combination of predictors among input variables. The results show that the proposed method outperformed the individual performance of SVR and CNN during the training and testing periods in the study area. According to the correlation coefficient (R), the EMD-SVR model outperformed the EMD-CNN model in both training and testing even though the CNN indicated a better performance than the SVR before using IMF values. The EMD-SVR model showed higher improvement in R value (38.7%) than that from the EMD-CNN model (7.1%). It should be noted that the coupled models of EMD-SVR and EMD-CNN represented much higher accuracy in runoff prediction with respect to the considered evaluation indicators, including root mean square error (RMSE) and R values.

  • PDF

Landslide susceptibility assessment using feature selection-based machine learning models

  • Liu, Lei-Lei;Yang, Can;Wang, Xiao-Mi
    • Geomechanics and Engineering
    • /
    • v.25 no.1
    • /
    • pp.1-16
    • /
    • 2021
  • Machine learning models have been widely used for landslide susceptibility assessment (LSA) in recent years. The large number of inputs or conditioning factors for these models, however, can reduce the computation efficiency and increase the difficulty in collecting data. Feature selection is a good tool to address this problem by selecting the most important features among all factors to reduce the size of the input variables. However, two important questions need to be solved: (1) how do feature selection methods affect the performance of machine learning models? and (2) which feature selection method is the most suitable for a given machine learning model? This paper aims to address these two questions by comparing the predictive performance of 13 feature selection-based machine learning (FS-ML) models and 5 ordinary machine learning models on LSA. First, five commonly used machine learning models (i.e., logistic regression, support vector machine, artificial neural network, Gaussian process and random forest) and six typical feature selection methods in the literature are adopted to constitute the proposed models. Then, fifteen conditioning factors are chosen as input variables and 1,017 landslides are used as recorded data. Next, feature selection methods are used to obtain the importance of the conditioning factors to create feature subsets, based on which 13 FS-ML models are constructed. For each of the machine learning models, a best optimized FS-ML model is selected according to the area under curve value. Finally, five optimal FS-ML models are obtained and applied to the LSA of the studied area. The predictive abilities of the FS-ML models on LSA are verified and compared through the receive operating characteristic curve and statistical indicators such as sensitivity, specificity and accuracy. The results showed that different feature selection methods have different effects on the performance of LSA machine learning models. FS-ML models generally outperform the ordinary machine learning models. The best FS-ML model is the recursive feature elimination (RFE) optimized RF, and RFE is an optimal method for feature selection.

Classification of Fall Crops Using Unmanned Aerial Vehicle Based Image and Support Vector Machine Model - Focusing on Idam-ri, Goesan-gun, Chungcheongbuk-do - (무인기 기반 영상과 SVM 모델을 이용한 가을수확 작물 분류 - 충북 괴산군 이담리 지역을 중심으로 -)

  • Jeong, Chan-Hee;Go, Seung-Hwan;Park, Jong-Hwa
    • Journal of Korean Society of Rural Planning
    • /
    • v.28 no.1
    • /
    • pp.57-69
    • /
    • 2022
  • Crop classification is very important for estimating crop yield and figuring out accurate cultivation area. The purpose of this study is to classify crops harvested in fall in Idam-ri, Goesan-gun, Chungcheongbuk-do by using unmanned aerial vehicle (UAV) images and support vector machine (SVM) model. The study proceeded in the order of image acquisition, variable extraction, model building, and evaluation. First, RGB and multispectral image were acquired on September 13, 2021. Independent variables which were applied to Farm-Map, consisted gray level co-occurrence matrix (GLCM)-based texture characteristics by using RGB images, and multispectral reflectance data. The crop classification model was built using texture characteristics and reflectance data, and finally, accuracy evaluation was performed using the error matrix. As a result of the study, the classification model consisted of four types to compare the classification accuracy according to the combination of independent variables. The result of four types of model analysis, recursive feature elimination (RFE) model showed the highest accuracy with an overall accuracy (OA) of 88.64%, Kappa coefficient of 0.84. UAV-based RGB and multispectral images effectively classified cabbage, rice and soybean when the SVM model was applied. The results of this study provided capacity usefully in classifying crops using single-period images. These technologies are expected to improve the accuracy and efficiency of crop cultivation area surveys by supplementing additional data learning, and to provide basic data for estimating crop yields.

A MA-plot-based Feature Selection by MRMR in SVM-RFE in RNA-Sequencing Data

  • Kim, Chayoung
    • The Journal of Korean Institute of Information Technology
    • /
    • v.16 no.12
    • /
    • pp.25-30
    • /
    • 2018
  • It is extremely lacking and urgently required that the method of constructing the Gene Regulatory Network (GRN) from RNA-Sequencing data (RNA-Seq) because of Big-Data and GRN in Big-Data has obtained substantial observation as the interactions among relevant featured genes and their regulations. We propose newly the computational comparative feature patterns selection method by implementing a minimum-redundancy maximum-relevancy (MRMR) filter the support vector machine-recursive feature elimination (SVM-RFE) with Intensity-dependent normalization (DEGSEQ) as a preprocessor for emphasizing equal preciseness in RNA-seq in Big-Data. We found out the proposed algorithm might be more scalable and convenient because of all libraries in R package and be more improved in terms of the time consuming in Big-Data and minimum-redundancy maximum-relevancy of a set of feature patterns at the same time.