• Title/Summary/Keyword: feature reduction

Development of a Clustering Model for Automatic Knowledge Classification (지식 분류의 자동화를 위한 클러스터링 모형 연구)

  • 정영미;이재윤
    • Journal of the Korean Society for Information Management
    • /
    • v.18 no.2
    • /
    • pp.203-230
    • /
    • 2001
  • The purpose of this study is to develop a document clustering model for automatic classification of knowledge. Two test collections, one of newspaper article texts and one of journal article abstracts, are built for the clustering experiment. Various feature reduction criteria and term weighting methods are applied to the term sets of the test collections, and cosine and Jaccard coefficients are used as similarity measures. The performance of the complete linkage and K-means clustering algorithms is compared using different feature selection methods and various term weights. Complete linkage clustering was found to outperform the K-means algorithm, and feature reduction up to almost 10% of the total feature set does not lower the performance of document clustering to any significant extent.
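
The two similarity measures named in this abstract, together with one simple feature reduction criterion, can be sketched as follows. This is a minimal illustration rather than the authors' setup: the binary (set-based) form of the Jaccard coefficient, the document-frequency criterion in `reduce_features`, and the toy term weights are all assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_coefficient(a, b):
    """Binary Jaccard: shared terms divided by all terms (weights ignored)."""
    terms_a, terms_b = set(a), set(b)
    union = terms_a | terms_b
    if not union:
        return 0.0
    return len(terms_a & terms_b) / len(union)


def reduce_features(docs, keep_fraction):
    """Keep the top fraction of terms by document frequency (assumed criterion)."""
    df = {}
    for doc in docs:
        for term in doc:
            df[term] = df.get(term, 0) + 1
    n_keep = max(1, int(len(df) * keep_fraction))
    # Sort by descending document frequency, breaking ties alphabetically.
    ranked = sorted(df, key=lambda t: (-df[t], t))
    kept = set(ranked[:n_keep])
    return [{t: w for t, w in doc.items() if t in kept} for doc in docs]


# Toy documents as {term: weight} dicts (made-up values).
docs = [
    {"cluster": 2.0, "document": 1.0, "term": 1.0},
    {"cluster": 1.0, "document": 2.0, "weight": 1.0},
    {"speech": 3.0, "term": 1.0},
]
reduced = reduce_features(docs, keep_fraction=0.5)
```

Either similarity function can then feed a pairwise similarity matrix for a clustering algorithm such as complete linkage.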

A Study on Trend Sharing in Segmental-feature HMM (분절 특징 은닉 마코프 모델에서의 경향 공유에 관한 연구)

  • 윤영선
    • The Journal of the Acoustical Society of Korea
    • /
    • v.21 no.7
    • /
    • pp.641-647
    • /
    • 2002
  • In this paper, we propose a method for reducing the number of parameters in the segmental-feature HMM (SFHMM) using trend quantization. The proposed method shares the trend information of the polynomial trajectories through quantization. A trajectory is obtained from the sequence of feature vectors of a speech signal and can be decomposed into trend and location information. The trend indicates the variation of consecutive frame features, while the location indicates the positional difference between trajectories. Since the trend accounts for a large portion of the SFHMM, sharing the trend may decrease the number of parameters. To evaluate the proposed system, experiments are performed on the TIMIT corpus. The experimental results show that the performance of the proposed system is roughly similar to that of the previous system. Therefore, the proposed system can be considered a parameter reduction method.

Face Recognition Using A New Methodology For Independent Component Analysis (새로운 독립 요소 해석 방법론에 의한 얼굴 인식)

  • 류재흥;고재흥
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 2000.11a
    • /
    • pp.305-309
    • /
    • 2000
  • In this paper, we present a new methodology for face recognition after analysing the conventional ICA (Independent Component Analysis) based approach. In the literature, we found that ICA-based methods have followed the same procedure without exception: first, PCA (Principal Component Analysis) is used for feature extraction; next, an ICA learning method is applied for feature enhancement in the reduced dimension. However, it is contradictory that features to be extracted using higher-order moments depend on variance, a second-order statistic. This procedure does not consider that a necessary component may be located in the discarded feature space. In the new methodology, features are extracted using the magnitude of kurtosis (4th-order central moment or cumulant). This corresponds to PCA-based feature extraction using the eigenvalue (2nd-order central moment or variance). The synergy of PCA and ICA can be achieved if PCA is used as a noise reduction filter. The ICA methodology is analysed using SVD (Singular Value Decomposition): PCA performs whitening and noise reduction, and ICA performs the feature extraction. Simulation results show the effectiveness of the methodology compared to the conventional ICA approach.
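
The idea of ranking components by the magnitude of their kurtosis, instead of by variance, can be sketched as below. This is only an illustration under stated assumptions: it uses excess kurtosis (the fourth standardized moment minus 3), the helper names are hypothetical, and the three toy components are made up; the abstract does not specify the exact estimator the authors use.

```python
def excess_kurtosis(values):
    """Fourth standardized central moment minus 3 (zero for a Gaussian)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        return 0.0
    return m4 / (m2 * m2) - 3.0


def rank_by_kurtosis(components):
    """Return component indices sorted by |excess kurtosis|, largest first."""
    scores = [abs(excess_kurtosis(c)) for c in components]
    return sorted(range(len(components)), key=lambda i: -scores[i])


# Toy component activations (made-up values), one list per component.
components = [
    [-1.0, 1.0, -1.0, 1.0, -1.0, 1.0],  # two-valued, strongly sub-Gaussian
    [0.0, 0.0, 0.0, 0.0, 0.0, 6.0],     # sparse with one spike
    [-2.0, -1.0, 0.0, 0.0, 1.0, 2.0],   # spread out, closest to Gaussian here
]
order = rank_by_kurtosis(components)
```

Components at the front of `order` are the most non-Gaussian under this measure, which is the property ICA-style feature extraction looks for.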

Comparative Analysis of Dimensionality Reduction Techniques for Advanced Ransomware Detection with Machine Learning (기계학습 기반 랜섬웨어 공격 탐지를 위한 효과적인 특성 추출기법 비교분석)

  • Kim Han Seok;Lee Soo Jin
    • Convergence Security Journal
    • /
    • v.23 no.1
    • /
    • pp.117-123
    • /
    • 2023
  • To detect advanced ransomware attacks with machine learning-based models, the classification model must be trained on data with a high-dimensional feature space, in which case the 'curse of dimensionality' is likely to occur. Therefore, dimensionality reduction of the features must be performed first in order to increase the accuracy of the learning model and improve execution speed while avoiding the 'curse of dimensionality'. In this paper, we classified ransomware by applying three machine learning models and two feature extraction techniques to two datasets whose feature spaces differ greatly in dimension. In the experiments, the dimensionality reduction techniques did not significantly improve performance in binary classification, and the same held in multi-class classification when the dimension of the feature space was small. However, when the dataset had a high-dimensional feature space, LDA (Linear Discriminant Analysis) performed very well.

Effective Dimensionality Reduction of Payload-Based Anomaly Detection in TMAD Model for HTTP Payload

  • Kakavand, Mohsen;Mustapha, Norwati;Mustapha, Aida;Abdullah, Mohd Taufik
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.10 no.8
    • /
    • pp.3884-3910
    • /
    • 2016
  • An Intrusion Detection System (IDS) generally processes a large amount of data that is highly redundant and irrelevant. This trait causes slow training and assessment procedures, high resource consumption and a poor detection rate. Due to their expensive computational requirements during both training and detection, IDSs are mostly ineffective for real-time anomaly detection. This paper proposes a dimensionality reduction technique based on Principal Component Analysis (PCA) that is able to enhance the performance of IDSs up to constant time O(1). Furthermore, the present study offers a feature selection approach for identifying major components in real time. The PCA algorithm transforms high-dimensional feature vectors into a low-dimensional feature space, which is used to determine the optimum volume of factors. The proposed approach was assessed using the HTTP packet payloads of the ISCX 2012 IDS and DARPA 1999 datasets. The experimental outcome demonstrated that the proposed anomaly detection achieved promising results, with a 97% detection rate and a 1.2% false positive rate for the ISCX 2012 dataset and a 100% detection rate and a 0.06% false positive rate for the DARPA 1999 dataset. The proposed anomaly detection also achieved comparable performance in terms of computational complexity when compared to three state-of-the-art anomaly detection systems.
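
The PCA step described here, mapping high-dimensional feature vectors onto a low-dimensional space, can be illustrated with a small standard-library sketch. It finds only the first principal component by power iteration on the sample covariance matrix; the function names and the two-dimensional toy data are assumptions for illustration and are unrelated to the payload features used in the paper.

```python
import math


def center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def first_principal_component(cov, iterations=200):
    """Approximate the dominant eigenvector of cov by power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v


def project(centered, direction):
    """Reduce each row to a single score along the given direction."""
    return [sum(x * u for x, u in zip(row, direction)) for row in centered]


# Toy data lying close to the line y = 2x (made-up values).
rows = [[2.0, 4.1], [1.0, 1.9], [3.0, 6.0], [4.0, 8.1], [0.0, 0.1]]
centered = center(rows)
pc = first_principal_component(covariance(centered))
scores = project(centered, pc)
```

Here `scores` is the one-dimensional representation of the five two-dimensional points; a practical system would typically keep several components and use a numerical library for the eigendecomposition.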

A Novel Approach of Feature Extraction for Analog Circuit Fault Diagnosis Based on WPD-LLE-CSA

  • Wang, Yuehai;Ma, Yuying;Cui, Shiming;Yan, Yongzheng
    • Journal of Electrical Engineering and Technology
    • /
    • v.13 no.6
    • /
    • pp.2485-2492
    • /
    • 2018
  • The rapid development of large-scale integrated circuits has brought great challenges to circuit testing and diagnosis. Owing to the lack of exact fault models, inaccurate analog component tolerances, and some nonlinear factors, analog circuit fault diagnosis is still regarded as an extremely difficult problem. To cope with the difficulty of extracting fault features effectively from masses of original data of the nonlinear continuous analog circuit output signal, a novel approach to feature extraction and dimension reduction for analog circuit fault diagnosis based on wavelet packet decomposition, local linear embedding algorithm, and clone selection algorithm (WPD-LLE-CSA) is proposed. The proposed method can identify faulty components in complicated analog circuits with an accuracy above 99%. Compared with existing feature extraction methods, the proposed method can significantly reduce the number of features in less time while maintaining a high diagnosis rate; the ratio of dimensionality reduction is also discussed. Several groups of experiments are conducted to demonstrate the efficiency of the proposed method.

Rough Entropy-based Knowledge Reduction using Rough Set Theory (러프집합 이론을 이용한 러프 엔트로피 기반 지식감축)

  • Park, In-Kyoo
    • Journal of Digital Convergence
    • /
    • v.12 no.6
    • /
    • pp.223-229
    • /
    • 2014
  • To retrieve useful information for efficient decision making in a large knowledge system, refined feature selection is generally necessary and important. Rough set theory has difficulty generating optimal reducts and classifying boundary objects. In this paper, we propose a quick reduction algorithm that generates optimal features through rough entropy analysis of condition and decision attributes to overcome these limitations. We define a new conditional information entropy for efficient feature extraction and describe a feature selection procedure that classifies the significance of features. Through simulation on 5 datasets from the UCI repository, we compare our rough set-based feature selection approach with other selection methods. The results show that our method outperforms the previous methods in classification accuracy.
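
To make the notion of attribute significance concrete, the sketch below scores a condition attribute by how much the conditional entropy of the decision attribute rises when that attribute is dropped. It uses the standard Shannon conditional entropy H(D | C) as a stand-in; the paper defines its own conditional information entropy, which the abstract does not spell out, so the functions and the toy decision table are assumptions.

```python
import math
from collections import Counter


def conditional_entropy(rows, condition_cols, decision_col):
    """Shannon conditional entropy H(D | C) in bits over a decision table."""
    n = len(rows)
    # Size of each equivalence class induced by the condition attributes.
    block_sizes = Counter(tuple(r[c] for c in condition_cols) for r in rows)
    # Joint counts of (equivalence class, decision value).
    joint = Counter(
        (tuple(r[c] for c in condition_cols), r[decision_col]) for r in rows
    )
    h = 0.0
    for (block, _decision), count in joint.items():
        h -= (count / n) * math.log2(count / block_sizes[block])
    return h


def significance(rows, condition_cols, decision_col, attribute):
    """Increase in H(D | C) when `attribute` is removed from C."""
    reduced = [c for c in condition_cols if c != attribute]
    return conditional_entropy(rows, reduced, decision_col) - conditional_entropy(
        rows, condition_cols, decision_col
    )


# Toy decision table (made-up): the decision "d" depends only on "a".
table = [
    {"a": 0, "b": 0, "d": "no"},
    {"a": 0, "b": 1, "d": "no"},
    {"a": 1, "b": 0, "d": "yes"},
    {"a": 1, "b": 1, "d": "yes"},
]
sig_a = significance(table, ["a", "b"], "d", "a")
sig_b = significance(table, ["a", "b"], "d", "b")
```

In this table, removing `a` costs one full bit of information about the decision while removing `b` costs nothing, so a reduct would keep `a` and drop `b`.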

Discriminative Manifold Learning Network using Adversarial Examples for Image Classification

  • Zhang, Yuan;Shi, Biming
    • Journal of Electrical Engineering and Technology
    • /
    • v.13 no.5
    • /
    • pp.2099-2106
    • /
    • 2018
  • This study presents a novel approach to learning discriminative feature vectors based on manifold learning, using a nonlinear dimension reduction (DR) technique to improve the loss function and combining it with adversarial examples to regularize the objective function for image classification. Traditional convolutional neural networks (CNNs) with many new regularization approaches have been used successfully for image classification tasks and have achieved good results, but at a high cost in computation space and time. Distinct from traditional CNNs, we discriminate the feature vectors of objects without empirically tuned parameters. These discriminative features are intended to preserve, in the lower-dimensional space, the relationships of the corresponding high-dimensional manifold after the image feature vectors are projected from high dimension to lower dimension. We optimize the constraints for preserving local features based on the manifold, which pull the mapped features of the same class together and push different classes apart. Using adversarial examples, the improved loss function with an additional regularization term is intended to boost the robustness and generalization of the neural network. Experimental results indicate that the approach based on discriminative manifold-learning features is not only valid but also more efficient in image classification tasks. Furthermore, the proposed approach achieves competitive classification performance on three benchmark datasets: MNIST, CIFAR-10, and SVHN.

Wind-induced vibration characteristics and parametric analysis of large hyperbolic cooling towers with different feature sizes

  • Ke, Shitang;Ge, Yaojun;Zhao, Lin;Tamura, Yukio
    • Structural Engineering and Mechanics
    • /
    • v.54 no.5
    • /
    • pp.891-908
    • /
    • 2015
  • For a systematic study of the wind-induced vibration characteristics of large hyperbolic cooling towers with different feature sizes, pressure measurement tests are carried out on rigid body models of three representative cooling towers with heights of 155 m, 177 m and 215 m, respectively. Using a refined frequency-domain algorithm for wind-induced responses, the wind-induced average response, resonant response, background response, coupling response and wind vibration coefficients of large cooling towers with different feature sizes are obtained. Based on the calculated results, a parametric analysis of the wind-induced vibration of cooling towers is carried out with respect to feature size, damping ratio and the interference effect of surrounding buildings. The discussion shows that an increase in feature size makes the wind-induced average response and fluctuating response correspondingly larger, and the proportion of resonant response also gradually increases, but it has little effect on the wind vibration coefficient. An increase in damping ratio markedly decreases the resonant response and the wind vibration coefficient but has no effect on the average response and background response. The interference effect of surrounding buildings significantly increases the fluctuating response and the wind vibration coefficient; furthermore, the increase in resonant response is greater than that in background response.

Relevancy contemplation in medical data analytics and ranking of feature selection algorithms

  • P. Antony Seba;J. V. Bibal Benifa
    • ETRI Journal
    • /
    • v.45 no.3
    • /
    • pp.448-461
    • /
    • 2023
  • This article performs a detailed scrutiny of a chronic kidney disease (CKD) dataset to select efficient instances and relevant features. Data relevancy is investigated using feature extraction, hybrid outlier detection, and handling of missing values. Data instances that do not influence the target are removed using data envelopment analysis to enable reduction of rows. Column reduction is achieved by ranking the attributes through feature selection methodologies, namely, extra-trees classifier, recursive feature elimination, chi-squared test, analysis of variance, and mutual information. These methodologies are ranked via Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) using weight optimization to identify the optimal features for model building from the CKD dataset to facilitate better prediction while diagnosing the severity of the disease. An efficient hybrid ensemble and novel similarity-based classifiers are built using the pruned dataset, and the results are thereafter compared with random forest, AdaBoost, naive Bayes, k-nearest neighbors, and support vector machines. The hybrid ensemble classifier yields a better prediction accuracy of 98.31% for the features selected by the extra-trees classifier (ETC), which is ranked as the best by TOPSIS.
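
The TOPSIS ranking step mentioned in this abstract can be sketched in its textbook form as follows. The decision matrix, the equal weights, and the choice of criteria (a benefit criterion such as accuracy and a cost criterion such as the number of selected features) are made-up assumptions; the paper's optimized weights and actual criteria are not given in the abstract.

```python
import math


def topsis(matrix, weights, benefit):
    """Return a closeness score in [0, 1] per alternative (higher is better).

    matrix[i][j] is the value of alternative i on criterion j; benefit[j] is
    True when larger values are better. Columns are assumed to be non-zero.
    """
    n_crit = len(matrix[0])
    # Vector-normalise each column, then apply the criterion weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    weighted = [
        [weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix
    ]
    columns = list(zip(*weighted))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]
    scores = []
    for row in weighted:
        d_best = math.dist(row, ideal)   # distance to the ideal solution
        d_worst = math.dist(row, anti)   # distance to the anti-ideal solution
        total = d_best + d_worst
        scores.append(d_worst / total if total > 0 else 0.0)
    return scores


# Hypothetical alternatives: three feature selection methods scored on
# (accuracy, number of selected features). Values are illustrative only.
matrix = [[0.9, 5.0], [0.8, 8.0], [0.7, 12.0]]
scores = topsis(matrix, weights=[0.5, 0.5], benefit=[True, False])
ranking = sorted(range(len(scores)), key=lambda i: -scores[i])
```

The alternative whose weighted profile lies closest to the ideal point and farthest from the anti-ideal point receives the highest score and is ranked first.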