• Title/Summary/Keyword: Sequential Forward Selection

Search Result 23, Processing Time 0.02 seconds

Wine Quality Assessment Using a Decision Tree with the Features Recommended by the Sequential Forward Selection

  • Lee, Seunghan;Kang, Kyungtae;Noh, Dong Kun
    • Journal of the Korea Society of Computer and Information
    • /
    • v.22 no.2
    • /
    • pp.81-87
    • /
    • 2017
  • Nowadays wine is increasingly enjoyed by a wider range of consumers, and wine certification and quality assessment are key elements in supporting the wine industry to develop new technologies for both wine making and selling processes. There have been many attempts to construct a more methodical approach to the assessment of wines, but most of them rely on objective decision rather than subjective judgement. In this paper, we propose a data mining approach to predict human wine taste preferences that is based on easily available analytical tests at the certification step. We used sequential forward selection and decision tree for this purpose. Experiments with the wine quality dataset from the UC Irvine Machine Learning Repository demonstrate the accuracies of 76.7% and 78.7% for red and white wines respectively.

Comparison of Feature Selection Processes for Image Retrieval Applications

  • Choi, Young-Mee;Choo, Moon-Won
    • Journal of Korea Multimedia Society
    • /
    • v.14 no.12
    • /
    • pp.1544-1548
    • /
    • 2011
  • A process of choosing a subset of original features, so called feature selection, is considered as a crucial preprocessing step to image processing applications. There are already large pools of techniques developed for machine learning and data mining fields. In this paper, basically two methods, non-feature selection and feature selection, are investigated to compare their predictive effectiveness of classification. Color co-occurrence feature is used for defining image features. Standard Sequential Forward Selection algorithm are used for feature selection to identify relevant features and redundancy among relevant features. Four color spaces, RGB, YCbCr, HSV, and Gaussian space are considered for computing color co-occurrence features. Gray-level image feature is also considered for the performance comparison reasons. The experimental results are presented.

Feature Selection Algorithm for Intrusions Detection System using Sequential Forward Search and Random Forest Classifier

  • Lee, Jinlee;Park, Dooho;Lee, Changhoon
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.11 no.10
    • /
    • pp.5132-5148
    • /
    • 2017
  • Cyber attacks are evolving commensurate with recent developments in information security technology. Intrusion detection systems collect various types of data from computers and networks to detect security threats and analyze the attack information. The large amount of data examined make the large number of computations and low detection rates problematic. Feature selection is expected to improve the classification performance and provide faster and more cost-effective results. Despite the various feature selection studies conducted for intrusion detection systems, it is difficult to automate feature selection because it is based on the knowledge of security experts. This paper proposes a feature selection technique to overcome the performance problems of intrusion detection systems. Focusing on feature selection, the first phase of the proposed system aims at constructing a feature subset using a sequential forward floating search (SFFS) to downsize the dimension of the variables. The second phase constructs a classification model with the selected feature subset using a random forest classifier (RFC) and evaluates the classification accuracy. Experiments were conducted with the NSL-KDD dataset using SFFS-RF, and the results indicated that feature selection techniques are a necessary preprocessing step to improve the overall system performance in systems that handle large datasets. They also verified that SFFS-RF could be used for data classification. In conclusion, SFFS-RF could be the key to improving the classification model performance in machine learning.

Feature Selection for Multi-Class Genre Classification using Gaussian Mixture Model (Gaussian Mixture Model을 이용한 다중 범주 분류를 위한 특징벡터 선택 알고리즘)

  • Moon, Sun-Kuk;Choi, Tack-Sung;Park, Young-Cheol;Youn, Dae-Hee
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.32 no.10C
    • /
    • pp.965-974
    • /
    • 2007
  • In this paper, we proposed the feature selection algorithm for multi-class genre classification. In our proposed algorithm, we developed GMM separation score based on Gaussian mixture model for measuring separability between two genres. Additionally, we improved feature subset selection algorithm based on sequential forward selection for multi-class genre classification. Instead of setting criterion as entire genre separability measures, we set criterion as worst genre separability measure for each sequential selection step. In order to assess the performance proposed algorithm, we extracted various features which represent characteristics such as timbre, rhythm, pitch and so on. Then, we investigate classification performance by GMM classifier and k-NN classifier for selected features using conventional algorithm and proposed algorithm. Proposed algorithm showed improved performance in classification accuracy up to 10 percent for classification experiments of low dimension feature vector especially.

Automated Classification of Audio Genre using Sequential Forward Selection Method

  • Lee Jong Hak;Yoon Won lung;Lee Kang Kyu;Park Kyu Sik
    • Proceedings of the IEEK Conference
    • /
    • 2004.08c
    • /
    • pp.768-771
    • /
    • 2004
  • In this paper, we propose a content-based audio genre classification algorithm that automatically classifies the query audio into five genres such as Classic, Hiphop, Jazz, Rock, Speech using digital signal processing approach. From the 20 second query audio file, 54 dimensional feature vectors, including Spectral Centroid, Rolloff, Flux, LPC, MFCC, is extracted from each query audio. For the classification algorithm, k-NN, Gaussian, GMM classifier is used. In order to choose optimum features from the 54 dimension feature vectors, SFS (Sequential Forward Selection) method is applied to draw 10 dimension optimum features and these are used for the genre classification algorithm. From the experimental result, we verify the superior performance of the SFS method that provides near $90{\%}$ success rate for the genre classification which means $10{\%}$-$20{\%}$ improvements over the previous methods

  • PDF

Speech Feature Selection of Normal and Autistic children using Filter and Wrapper Approach

  • Akhtar, Muhammed Ali;Ali, Syed Abbas;Siddiqui, Maria Andleeb
    • International Journal of Computer Science & Network Security
    • /
    • v.21 no.5
    • /
    • pp.129-132
    • /
    • 2021
  • Two feature selection approaches are analyzed in this study. First Approach used in this paper is Filter Approach which comprises of correlation technique. It provides two reduced feature sets using positive and negative correlation. Secondly Approach used in this paper is the wrapper approach which comprises of Sequential Forward Selection technique. The reduced feature set obtained by positive correlation results comprises of Rate of Acceleration, Intensity and Formant. The reduced feature set obtained by positive correlation results comprises of Rasta PLP, Log energy, Log power and Zero Crossing Rate. Pitch, Rate of Acceleration, Log Power, MFCC, LPCC is the reduced feature set yield as a result of Sequential Forwarding Selection.

Self-optimizing feature selection algorithm for enhancing campaign effectiveness (캠페인 효과 제고를 위한 자기 최적화 변수 선택 알고리즘)

  • Seo, Jeoung-soo;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.4
    • /
    • pp.173-198
    • /
    • 2020
  • For a long time, many studies have been conducted on predicting the success of campaigns for customers in academia, and prediction models applying various techniques are still being studied. Recently, as campaign channels have been expanded in various ways due to the rapid revitalization of online, various types of campaigns are being carried out by companies at a level that cannot be compared to the past. However, customers tend to perceive it as spam as the fatigue of campaigns due to duplicate exposure increases. Also, from a corporate standpoint, there is a problem that the effectiveness of the campaign itself is decreasing, such as increasing the cost of investing in the campaign, which leads to the low actual campaign success rate. Accordingly, various studies are ongoing to improve the effectiveness of the campaign in practice. This campaign system has the ultimate purpose to increase the success rate of various campaigns by collecting and analyzing various data related to customers and using them for campaigns. In particular, recent attempts to make various predictions related to the response of campaigns using machine learning have been made. It is very important to select appropriate features due to the various features of campaign data. If all of the input data are used in the process of classifying a large amount of data, it takes a lot of learning time as the classification class expands, so the minimum input data set must be extracted and used from the entire data. In addition, when a trained model is generated by using too many features, prediction accuracy may be degraded due to overfitting or correlation between features. Therefore, in order to improve accuracy, a feature selection technique that removes features close to noise should be applied, and feature selection is a necessary process in order to analyze a high-dimensional data set. Among the greedy algorithms, SFS (Sequential Forward Selection), SBS (Sequential Backward Selection), SFFS (Sequential Floating Forward Selection), etc. are widely used as traditional feature selection techniques. It is also true that if there are many risks and many features, there is a limitation in that the performance for classification prediction is poor and it takes a lot of learning time. Therefore, in this study, we propose an improved feature selection algorithm to enhance the effectiveness of the existing campaign. The purpose of this study is to improve the existing SFFS sequential method in the process of searching for feature subsets that are the basis for improving machine learning model performance using statistical characteristics of the data to be processed in the campaign system. Through this, features that have a lot of influence on performance are first derived, features that have a negative effect are removed, and then the sequential method is applied to increase the efficiency for search performance and to apply an improved algorithm to enable generalized prediction. Through this, it was confirmed that the proposed model showed better search and prediction performance than the traditional greed algorithm. Compared with the original data set, greed algorithm, genetic algorithm (GA), and recursive feature elimination (RFE), the campaign success prediction was higher. In addition, when performing campaign success prediction, the improved feature selection algorithm was found to be helpful in analyzing and interpreting the prediction results by providing the importance of the derived features. This is important features such as age, customer rating, and sales, which were previously known statistically. Unlike the previous campaign planners, features such as the combined product name, average 3-month data consumption rate, and the last 3-month wireless data usage were unexpectedly selected as important features for the campaign response, which they rarely used to select campaign targets. It was confirmed that base attributes can also be very important features depending on the type of campaign. Through this, it is possible to analyze and understand the important characteristics of each campaign type.

Genetic Algorithm Based Feature Selection Method Development for Pattern Recognition (패턴 인식문제를 위한 유전자 알고리즘 기반 특징 선택 방법 개발)

  • Park Chang-Hyun;Kim Ho-Duck;Yang Hyun-Chang;Sim Kwee-Bo
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.16 no.4
    • /
    • pp.466-471
    • /
    • 2006
  • IAn important problem of pattern recognition is to extract or select feature set, which is included in the pre-processing stage. In order to extract feature set, Principal component analysis has been usually used and SFS(Sequential Forward Selection) and SBS(Sequential Backward Selection) have been used as a feature selection method. This paper applies genetic algorithm which is a popular method for nonlinear optimization problem to the feature selection problem. So, we call it Genetic Algorithm Feature Selection(GAFS) and this algorithm is compared to other methods in the performance aspect.

Analysis of Voice Quality Features and Their Contribution to Emotion Recognition (음성감정인식에서 음색 특성 및 영향 분석)

  • Lee, Jung-In;Choi, Jeung-Yoon;Kang, Hong-Goo
    • Journal of Broadcast Engineering
    • /
    • v.18 no.5
    • /
    • pp.771-774
    • /
    • 2013
  • This study investigates the relationship between voice quality measurements and emotional states, in addition to conventional prosodic and cepstral features. Open quotient, harmonics-to-noise ratio, spectral tilt, spectral sharpness, and band energy were analyzed as voice quality features, and prosodic features related to fundamental frequency and energy are also examined. ANOVA tests and Sequential Forward Selection are used to evaluate significance and verify performance. Classification experiments show that using the proposed features increases overall accuracy, and in particular, errors between happy and angry decrease. Results also show that adding voice quality features to conventional cepstral features leads to increase in performance.