• Title/Summary/Keyword: Feature Filtering

Feature Filtering Methods for Web Documents Clustering (웹 문서 클러스터링에서의 자질 필터링 방법)

  • Park Heum;Kwon Hyuk-Chul
    • The KIPS Transactions: Part B / v.13B no.4 s.107 / pp.489-498 / 2006
  • Clustering results vary with the dataset, and performance degrades even on web documents that were manually processed by an indexer. Statistical feature selection methods can identify representative clusters for a feature, but they do not eliminate irrelevant features (i.e., non-obvious features and those that appear in general documents). These irrelevant features should be removed to improve clustering performance. This paper therefore proposes three feature-filtering algorithms that consider feature values per document set, together with the distribution, frequency, and weights of features per document set: (1) a feature-filtering algorithm in a document (FFID), (2) a feature-filtering algorithm in a document matrix (FFIM), and (3) a hybrid method combining FFID and FFIM (HFF). We tested clustering performance with feature selection using term frequency and expanded co-link information, and with feature filtering using FFID, FFIM, and HFF. In our experiments, HFF performed best, and FFIM performed better than FFID.
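
The abstract does not spell out how FFID, FFIM, or HFF work, so the following is only a minimal sketch of the general idea of filtering features by their distribution over a document set. It drops terms that occur in almost every document (general terms) or in too few documents. The function name `filter_features` and the thresholds `min_df` and `max_df_ratio` are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter


def filter_features(docs, min_df=2, max_df_ratio=0.8):
    """Keep terms whose document frequency is neither too low nor too high.

    docs is a list of token lists. This is a plain document-frequency
    filter, not the FFID/FFIM/HFF algorithms of the paper.
    """
    n_docs = len(docs)
    # Count in how many documents each term appears (not how often).
    df = Counter(term for doc in docs for term in set(doc))
    keep = {
        term
        for term, count in df.items()
        if count >= min_df and count / n_docs <= max_df_ratio
    }
    filtered = [[term for term in doc if term in keep] for doc in docs]
    return filtered, keep


docs = [
    ["home", "page", "python", "clustering"],
    ["home", "page", "python", "filter"],
    ["home", "page", "speech", "noise"],
    ["home", "page", "speech", "emotion"],
]
filtered, vocabulary = filter_features(docs)
print(sorted(vocabulary))
```

In this toy set, "home" and "page" appear in every document and are removed as general terms, while terms that occur only once are removed as too rare.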

Feature Vector Processing for Speech Emotion Recognition in Noisy Environments (잡음 환경에서의 음성 감정 인식을 위한 특징 벡터 처리)

  • Park, Jeong-Sik;Oh, Yung-Hwan
    • Phonetics and Speech Sciences / v.2 no.1 / pp.77-85 / 2010
  • This paper proposes an efficient feature vector processing technique to protect a Speech Emotion Recognition (SER) system against a variety of noises. In the proposed approach, emotional feature vectors are extracted from speech processed by comb filtering. These vectors are then used to construct a robust model based on feature vector classification. We modify conventional comb filtering by using speech presence probability to minimize the drawbacks caused by incorrect pitch estimation under background noise. The modified comb filtering correctly enhances the harmonics, which are an important factor in SER. The feature vector classification technique categorizes feature vectors as either discriminative or non-discriminative based on a log-likelihood criterion. This method successfully selects the discriminative vectors while preserving correct emotional characteristics, so robust emotion models can be constructed using only those vectors. In SER experiments on an emotional speech corpus contaminated by various noises, our approach outperformed the baseline system.
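
As a rough illustration of why comb filtering enhances harmonics, the sketch below applies a conventional feed-forward comb filter whose delay equals one pitch period. It is not the modified filter of the paper: the speech-presence-probability weighting is omitted, and the pitch period is assumed to be known exactly. The names `comb_filter`, `period`, and `gain` are illustrative.

```python
import math


def comb_filter(signal, period, gain=1.0):
    """Feed-forward comb filter: y[n] = (x[n] + gain * x[n - period]) / (1 + gain).

    Components that repeat every `period` samples add up in phase;
    components halfway between harmonics cancel.
    """
    out = []
    for n, x in enumerate(signal):
        delayed = signal[n - period] if n >= period else 0.0
        out.append((x + gain * delayed) / (1.0 + gain))
    return out


fs = 8000            # sampling rate in Hz (assumed)
f0 = 200             # pitch in Hz (assumed known)
period = fs // f0    # 40 samples

harmonic = [math.sin(2 * math.pi * f0 * n / fs) for n in range(400)]
between = [math.sin(2 * math.pi * 1.5 * f0 * n / fs) for n in range(400)]

kept = comb_filter(harmonic, period)
removed = comb_filter(between, period)
print(max(abs(v) for v in removed[period:]))
```

The 200 Hz tone passes unchanged once the delay line is filled, while the 300 Hz tone, which lies midway between two harmonics, is cancelled. This also shows why a wrong pitch estimate is harmful: the filter would then attenuate the true harmonics.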

Feature point extraction using scale-space filtering and Tracking algorithm based on comparing texturedness similarity (스케일-스페이스 필터링을 통한 특징점 추출 및 질감도 비교를 적용한 추적 알고리즘)

  • Park, Yong-Hee;Kwon, Oh-Seok
    • Journal of Internet Computing and Services / v.6 no.5 / pp.85-95 / 2005
  • This study proposes a method of feature point extraction using scale-space filtering and a feature point tracking algorithm based on a texturedness similarity comparison. With well-defined operators one can select a scale parameter for feature point extraction; this affects the selection and localization of the feature points and also the performance of the tracking algorithm. This study suggests a feature extraction method using scale-space filtering. With a change in the camera's point of view or movement of an object in sequential images, the window of a feature point undergoes an affine transform. Traditionally, it is difficult to measure the similarity between corresponding points, and tracking errors often occur. This study also suggests a tracking algorithm that extends Shi-Tomasi-Kanade's tracking algorithm with texturedness similarity.
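
The abstract does not define its texturedness similarity measure. In Shi and Tomasi's formulation, a window is a good feature when the smaller eigenvalue of its 2x2 gradient matrix is large, and that quantity is what the sketch below computes. The function name `min_eigenvalue`, the central-difference gradients, and the 3x3 window are assumptions for illustration; the scale-space filtering step is not shown.

```python
import math


def min_eigenvalue(image, row, col, half=1):
    """Smaller eigenvalue of the gradient matrix summed over a square window.

    image is a list of rows of grey values; the window must lie at least
    one pixel inside the image so that central differences are defined.
    """
    gxx = gxy = gyy = 0.0
    for r in range(row - half, row + half + 1):
        for c in range(col - half, col + half + 1):
            gx = (image[r][c + 1] - image[r][c - 1]) / 2.0
            gy = (image[r + 1][c] - image[r - 1][c]) / 2.0
            gxx += gx * gx
            gxy += gx * gy
            gyy += gy * gy
    # Closed-form eigenvalues of the symmetric matrix [[gxx, gxy], [gxy, gyy]].
    trace = gxx + gyy
    spread = math.sqrt((gxx - gyy) ** 2 + 4.0 * gxy * gxy)
    return (trace - spread) / 2.0


size = 7
flat = [[0.0] * size for _ in range(size)]
edge = [[1.0 if c >= 3 else 0.0 for c in range(size)] for r in range(size)]
corner = [[1.0 if r >= 3 and c >= 3 else 0.0 for c in range(size)] for r in range(size)]

scores = {name: min_eigenvalue(img, 3, 3)
          for name, img in (("flat", flat), ("edge", edge), ("corner", corner))}
print(scores)
```

Only the corner window has gradients in two directions, so only it gets a positive score; flat and edge windows score zero and would be hard to track reliably.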

A Novel Statistical Feature Selection Approach for Text Categorization

  • Fattah, Mohamed Abdel
    • Journal of Information Processing Systems / v.13 no.5 / pp.1397-1409 / 2017
  • In text categorization, selecting distinctive text features is important because the feature space is high-dimensional. Reducing the dimension of the feature space decreases processing time and increases accuracy. In this study, we introduce a novel statistical feature selection approach for text categorization. The approach measures the term distribution across all documents in the collection, the term distribution within a given category, and the term distribution in a given class relative to other classes. The results show that the proposed method is superior to traditional feature selection methods.

Filtering of Filter-Bank Energies for Robust Speech Recognition

  • Jung, Ho-Young
    • ETRI Journal / v.26 no.3 / pp.273-276 / 2004
  • We propose a novel feature processing technique that provides a cepstral liftering effect in the log-spectral domain. Cepstral liftering aims to equalize the variance of cepstral coefficients for distance-based speech recognizers and, as a result, provides robustness to additive noise and speaker variability. However, in the popular hidden Markov model based framework, cepstral liftering has no effect on recognition performance. We derive a filtering method in the log-spectral domain that corresponds to cepstral liftering. The proposed method performs high-pass filtering based on the decorrelation of filter-bank energies. We show that in noisy speech recognition, the proposed method reduces the error rate by 52.7% relative to the conventional feature.
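
For readers unfamiliar with cepstral liftering, the sketch below applies one widely used sinusoidal lifter, w[n] = 1 + (L/2) sin(pi n / L), to a cepstral vector. This shows only conventional liftering in the cepstral domain. The paper's log-spectral high-pass filter is not reproduced here because the abstract does not give its coefficients. The function name `lifter` and the default `L = 22` are assumptions.

```python
import math


def lifter(cepstrum, L=22):
    """Scale cepstral coefficient n by 1 + (L / 2) * sin(pi * n / L).

    Higher-order coefficients, which usually have small variance, are
    boosted so that all coefficients contribute more evenly to a distance.
    """
    return [
        c * (1.0 + (L / 2.0) * math.sin(math.pi * n / L))
        for n, c in enumerate(cepstrum)
    ]


# Liftering a vector of ones exposes the weights themselves.
weights = lifter([1.0] * 23, L=22)
print([round(w, 2) for w in weights[:4]])
```

The weight is 1 at n = 0, peaks at 1 + L/2 for n = L/2, and returns to 1 at n = L.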

Adult Contents Filtering using Voice Information and DTW (음성 정보와 DTW 알고리즘을 활용한 성인 컨텐츠 필터링)

  • Cho, Jung-Ik;Lee, Yill-Byung
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2008.04a / pp.432-434 / 2008
  • This paper applies the dynamic time warping (DTW) algorithm to content filtering in order to improve the filtering rate. Content filtering here is a technique that identifies content by using voice features, classifying it as either general content or adult content. The proposed method extracts voice information that contributes to better content filtering; in other words, we propose that the identification rate can be improved by using the DTW algorithm. We test the accuracy of the proposed feature on example content and observe a difference in characteristics between general and adult content. In future work, we will use this difference to improve the filtering rate.
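
The abstract names DTW but gives no details of the voice features or the matching setup. The sketch below shows the classic DTW recurrence on one-dimensional sequences with an absolute-difference local cost; in practice each element would be a feature vector extracted from the audio. The function name `dtw_distance` and the toy sequences are illustrative.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences.

    cost[i][j] holds the cheapest alignment of a[:i] with b[:j]; each step
    may advance in a, in b, or in both.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])  # time-stretched copy
shifted = dtw_distance([1, 2, 3], [2, 3, 4])        # different values
print(same_shape, shifted)
```

A time-stretched copy of a sequence has distance 0, which is why DTW suits speech segments spoken at different rates. A content clip could then be labeled by its nearest reference template under this distance.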

Document Classification of Small Size Documents Using Extended Relief-F Algorithm (확장된 Relief-F 알고리즘을 이용한 소규모 크기 문서의 자동분류)

  • Park, Heum
    • The KIPS Transactions: Part B / v.16B no.3 / pp.233-238 / 2009
  • This paper presents an approach to classifying small documents using the instance-based feature filtering algorithm Relief-F. Document classification does not always perform well on small documents that contain only a few features: the total number of features in the document set is large, but the feature count of each document is relatively very small, so similarities between documents are very low under general similarity measures and classifiers. Performance is especially poor when classifying web documents in a directory service and when classifying sectors that cannot be linked to their original file after hard-disk recovery. We therefore propose the Extended Relief-F (ERelief-F) algorithm, which builds on Relief-F and addresses its problems, as a preprocessing step for classification. For the performance comparison, we tested information gain, odds ratio, and Relief-F for feature filtering and for obtaining feature values, and used kNN and SVM classifiers. In the experimental results, ERelief-F performed best on all of the datasets compared with the other methods and removed many irrelevant features from the document sets.
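
To make the instance-based idea concrete, here is a minimal sketch of the basic Relief weighting scheme for two classes, visiting every instance once. It is deliberately simpler than Relief-F (which averages over k nearest neighbours and handles multiple classes) and is not the ERelief-F extension proposed in the paper. The function name `relief_weights` and the toy data are assumptions.

```python
def relief_weights(X, y):
    """Basic Relief: reward features that differ on the nearest miss
    and penalise features that differ on the nearest hit."""
    n, d = len(X), len(X[0])
    # Value range per feature, used to normalise differences to [0, 1].
    span = [(max(row[j] for row in X) - min(row[j] for row in X)) or 1.0
            for j in range(d)]

    def diff(j, a, b):
        return abs(a[j] - b[j]) / span[j]

    def dist(a, b):
        return sum(diff(j, a, b) for j in range(d))

    weights = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        hit = min(hits, key=lambda k: dist(X[i], X[k]))
        miss = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            weights[j] += (diff(j, X[i], X[miss]) - diff(j, X[i], X[hit])) / n
    return weights


# Feature 0 separates the classes; feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.9, 0.8], [1.0, 0.2], [0.8, 0.5]]
y = [0, 0, 0, 1, 1, 1]
weights = relief_weights(X, y)
print([round(w, 3) for w in weights])
```

Features with low or negative weight are candidates for removal before classification. With very short documents most feature values are zero, so nearest hits and misses become unreliable, which is consistent with the motivation the abstract gives for extending Relief-F.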

Removing Non-informative Features by Robust Feature Wrapping Method for Microarray Gene Expression Data (유전자 알고리즘과 Feature Wrapping을 통한 마이크로어레이 데이타 중복 특징 소거법)

  • Lee, Jae-Sung;Kim, Dae-Won
    • Journal of KIISE: Software and Applications / v.35 no.8 / pp.463-478 / 2008
  • Because of the high dimensionality of microarray gene expression datasets, machine learning algorithms have typically relied on feature selection techniques to classify them effectively. However, the large number of features relative to the number of samples makes feature selection computationally prohibitive and prone to errors. One traditional feature selection approach is feature filtering, which measures one gene per step. Feature filtering is therefore a univariate approach that cannot validate multivariate correlations. In this paper, we propose a function that measures both class separability and correlations. With this approach, we solve the problem associated with feature filtering.

A Feature Vector Generation Technique through Gradient Correction of an Outline in the Mouth Region (입 영역에서 외곽선의 기울기 보정을 통한 특징벡터 생성 기법)

  • Park, Jung Hwan;Jung, Jong Jin;Kim, Guk Boh
    • Journal of Korea Multimedia Society / v.17 no.10 / pp.1141-1149 / 2014
  • Various methods for effectively eliminating noise have recently been studied in image processing. However, conventional noise filtering techniques, which remove most of the noise, are less effective at detecting the noise that remains after filtering because they do not exploit facial feature information. In this paper, we propose a feature vector generation technique for the mouth region that identifies and corrects the remaining noise through gradient correction when the outline is extracted after noise filtering.

Forensic Decision of Median Filtering by Pixel Value's Gradients of Digital Image (디지털 영상의 픽셀값 경사도에 의한 미디언 필터링 포렌식 판정)

  • RHEE, Kang Hyeon
    • Journal of the Institute of Electronics and Information Engineers / v.52 no.6 / pp.79-84 / 2015
  • The distribution of digital images that have been altered by a forger is a serious problem. To address it, this paper proposes a median filtering (MF) forensic decision algorithm that uses a feature vector derived from the gradients of pixel values. In the proposed algorithm, autoregressive (AR) coefficients are computed from the pixel-value gradients of the original image, and the 1st- to 6th-order coefficients form six features. A reconstructed image is then produced by solving Poisson's equation with the gradients. From the difference image between the original and its reconstruction, four further features are extracted: the average value, the maximum value, and the coordinates (i, j) of the maximum value. The two sets are combined into a 10-dimensional feature vector, which is used to train a support vector machine (SVM) classifier as the MF detector for altered images. Compared with the median filter residual (MFR) scheme, which uses a feature vector of the same 10 dimensions, the performance of the proposed algorithm is excellent on unaltered, average-filtered ($3 \times 3$), and JPEG (QF=90) images, and lower on Gaussian-filtered ($3 \times 3$) images. However, across all measured items, the area under the curve (AUC) of sensitivity against 1-specificity approaches 1. The proposed algorithm is therefore graded 'Excellent (A)'.
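
The abstract compares against the MFR scheme without describing it. As commonly defined, the median filter residual is the difference between a median-filtered image and the image itself. The sketch below computes that residual with a 3x3 window for interior pixels only. It does not implement the paper's gradient, Poisson-reconstruction, or AR features, and the function name `median_filter_residual` and the border handling are assumptions.

```python
from statistics import median


def median_filter_residual(image):
    """Return median3x3(image) - image for all interior pixels.

    image is a list of rows of grey values; the output is two rows and
    two columns smaller because border pixels are skipped.
    """
    rows, cols = len(image), len(image[0])
    residual = []
    for r in range(1, rows - 1):
        out_row = []
        for c in range(1, cols - 1):
            window = [image[r + dr][c + dc]
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            out_row.append(median(window) - image[r][c])
        residual.append(out_row)
    return residual


# A flat image with a single bright pixel in the centre.
image = [[0] * 5 for _ in range(5)]
image[2][2] = 9
residual = median_filter_residual(image)
print(residual)
```

The isolated bright pixel is removed by the median, so the residual there is -9 and zero elsewhere. An image that has already been median filtered tends to change less under a second pass, which is the kind of statistical trace such detectors exploit.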