• Title, Summary, Keyword: Dimensionality Reduction


Dimensionality reduction for pattern recognition based on difference of distribution among classes

  • Nishimura, Masaomi;Hiraoka, Kazuyuki;Mishima, Taketoshi
    • Proceedings of the IEEK Conference
    • /
    • /
    • pp.1670-1673
    • /
    • 2002
  • For pattern recognition on high-dimensional data such as images, dimensionality reduction is an effective preprocessing step. It lets us (1) reduce storage requirements and computation, and (2) avoid the "curse of dimensionality" and improve classification performance. Popular tools for dimensionality reduction are Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and, more recently, Independent Component Analysis (ICA). Among them, only LDA takes the class labels into consideration. Nevertheless, it has been reported that classification performance with ICA is better than with LDA, because LDA restricts the number of dimensions after reduction. To overcome this dilemma, we propose a new dimensionality reduction technique based on an information-theoretic measure of the difference between distributions. It takes the class labels into consideration and yet places no restriction on the number of dimensions after reduction. The improvement in classification performance has been confirmed experimentally.
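
The restriction on LDA mentioned in this abstract follows from the standard between-class scatter matrix. A brief sketch in the usual textbook notation (the symbols are assumptions and are not taken from the paper): with $C$ classes, class sizes $n_c$, class means $\mu_c$, overall mean $\mu$, and within-class scatter $S_W$,

```latex
S_B = \sum_{c=1}^{C} n_c\,(\mu_c - \mu)(\mu_c - \mu)^{\top},
\qquad
\sum_{c=1}^{C} n_c\,(\mu_c - \mu) = 0
\;\Longrightarrow\;
\operatorname{rank}(S_B) \le C - 1 .
```

Because the LDA directions are the eigenvectors of $S_W^{-1} S_B$ with nonzero eigenvalues, at most $C - 1$ discriminant dimensions are available, whatever the input dimensionality.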


Data Visualization using Linear and Non-linear Dimensionality Reduction Methods

  • Kim, Junsuk;Youn, Joosang
    • Journal of the Korea Society of Computer and Information
    • /
    • v.23 no.12
    • /
    • pp.21-26
    • /
    • 2018
  • As large amounts of data can now be stored efficiently, methods for extracting meaningful features from big data have become important. In particular, techniques for converting high-dimensional data to low-dimensional data are crucial for data visualization. In this study, principal component analysis (PCA; a linear dimensionality reduction technique) and Isomap (a non-linear dimensionality reduction technique) are introduced and applied to neural big data obtained by functional magnetic resonance imaging (fMRI). First, we investigate how well the physical properties of the stimuli are preserved after dimensionality reduction. We then compare the residual variance of the two methods to quantify the amount of information left unexplained. As a result, the representation produced by Isomap retains more information than that produced by PCA. Our results demonstrate that it is necessary to consider not only linear but also nonlinear characteristics in big data analysis.
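
The abstract does not define residual variance. A common definition in the Isomap literature is one minus the squared correlation between the pairwise distances before and after reduction. The sketch below assumes that definition; the function name and the toy distance lists are hypothetical.

```python
import math

def residual_variance(d_original, d_embedded):
    """Return 1 - r^2, where r is the Pearson correlation between two
    equally ordered lists of pairwise distances (assumed definition)."""
    n = len(d_original)
    mean_o = sum(d_original) / n
    mean_e = sum(d_embedded) / n
    cov = sum((a - mean_o) * (b - mean_e)
              for a, b in zip(d_original, d_embedded))
    var_o = sum((a - mean_o) ** 2 for a in d_original)
    var_e = sum((b - mean_e) ** 2 for b in d_embedded)
    r = cov / math.sqrt(var_o * var_e)
    return 1.0 - r * r

# Toy pairwise distances: an embedding that only rescales distances
# leaves nothing unexplained; one that reorders them does.
rv_scaled = residual_variance([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
rv_shuffled = residual_variance([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0])
```

A lower value means the low-dimensional distances explain more of the original (for Isomap, geodesic) distances.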

A Novel Speech/Music Discrimination Using Feature Dimensionality Reduction

  • Keum, Ji-Soo;Lee, Hyon-Soo;Hagiwara, Masafumi
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.10 no.1
    • /
    • pp.7-11
    • /
    • 2010
  • In this paper, we propose an improved speech/music discrimination method based on a feature combination and dimensionality reduction approach. To improve discrimination ability, we use a feature based on spectral duration analysis and employ the hierarchical dimensionality reduction (HDR) method to reduce the effect of correlated features. Experiments on speech and music show that the proposed method achieves high discrimination performance compared with conventional methods.

Design of Gas Identification System with Hierarchically Identifiable Rule base using GAs and Rough Sets (유전알고리즘과 러프집합을 이용한 계층적 식별 규칙을 갖는 가스 식별 시스템의 설계)

  • Haibo, Zhao;Bang, Young-Keun;Lee, Chul-Heui
    • Journal of Industrial Technology
    • /
    • v.31 no.B
    • /
    • pp.37-43
    • /
    • 2011
  • In pattern analysis, dimensionality reduction and the generation of reasonable identification rules are both very important. This paper performs dimensionality reduction effectively by grouping sensors whose measured patterns are similar to each other, with genetic algorithms used for the combinatorial optimization. To identify the gas type, a hierarchically identifiable rule base with two frames is constructed using rough set theory. The first frame captures the measurement characteristics of each sensor, and the second reflects the identification patterns of each group. The proposed method thus achieves effective dimensionality reduction as well as accurate gas identification. In simulation, its effectiveness is demonstrated by identifying five types of gases.


Dimensionality Reduction of RNA-Seq Data

  • Al-Turaiki, Isra
    • International Journal of Computer Science & Network Security
    • /
    • v.21 no.3
    • /
    • pp.31-36
    • /
    • 2021
  • RNA sequencing (RNA-Seq) is a technology that facilitates transcriptome analysis using next-generation sequencing (NGS) tools. Information on the quantity and sequences of RNA is vital to relate our genomes to functional protein expression. RNA-Seq data are characterized as being high-dimensional in that the number of variables (i.e., transcripts) far exceeds the number of observations (e.g., experiments). Given the wide range of dimensionality reduction techniques, it is not clear which is best for RNA-Seq data analysis. In this paper, we study the effect of three dimensionality reduction techniques on the classification of an RNA-Seq dataset. In particular, we use PCA, SVD, and SOM to obtain a reduced feature space. We built nine classification models for a cancer dataset and compared their performance. Our experimental results indicate that better classification performance is obtained with PCA and SOM. Overall, the combinations PCA+KNN, SOM+RF, and SOM+KNN produce the best results.

Boosting Multifactor Dimensionality Reduction Using Pre-evaluation

  • Hong, Yingfu;Lee, Sangbum;Oh, Sejong
    • ETRI Journal
    • /
    • v.38 no.1
    • /
    • pp.206-215
    • /
    • 2016
  • The detection of gene-gene interactions in genetic studies of common human diseases is important, and the technique of multifactor dimensionality reduction (MDR) has been widely applied to this end. However, this technique is not free from the "curse of dimensionality": it works well for two- or three-way interactions but requires a long execution time and extensive computing resources to detect, for example, a 10-way interaction. Here, we propose a boosting method to reduce MDR execution time. With the use of pre-evaluation measurements, gene sets with low levels of interaction can be removed prior to the application of MDR. The problem space is thereby reduced, and considerable time can be saved in the execution of MDR.

Dimensionality Reduction in Speech Recognition by Principal Component Analysis (음성인식에서 주 성분 분석에 의한 차원 저감)

  • Lee, Chang-Young
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.8 no.9
    • /
    • pp.1299-1305
    • /
    • 2013
  • In this paper, we investigate a method of reducing the computational cost of speech recognition through dimensionality reduction of MFCC feature vectors. Eigendecomposition of the feature vectors yields a linear transformation that orders the vector components by variance. The first component has the largest variance and hence serves as the most important one in the relevant pattern classification. We therefore consider reducing the computational cost, without degrading recognition performance, by excluding the least-variance components. Experimental results show that the number of MFCC components can be reduced by about half without a significant adverse effect on the recognition error rate.
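
The variance-ordered transformation described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the eigendecomposition is applied to the sample covariance matrix of the feature vectors, uses power iteration with deflation in plain Python so that it runs without third-party packages, and the function name `pca_reduce` and the toy data are hypothetical.

```python
import math

def pca_reduce(X, k, iters=500):
    """Project the rows of X onto the k highest-variance directions.

    Returns (Z, eigvals): the reduced vectors and the variance along each
    kept direction, largest first. Assumes well-separated eigenvalues.
    """
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - mean[j] for j in range(d)] for row in X]
    # Sample covariance matrix of the centered data.
    C = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)]
         for i in range(d)]
    eigvals, eigvecs = [], []
    for _ in range(k):
        v = [1.0 / math.sqrt(d)] * d              # fixed start vector
        for _ in range(iters):                    # power iteration
            w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient: variance along the direction just found.
        lam = sum(v[i] * C[i][j] * v[j] for i in range(d) for j in range(d))
        eigvals.append(lam)
        eigvecs.append(v)
        # Deflation: subtract the found component so the next pass
        # converges to the next-largest variance direction.
        for i in range(d):
            for j in range(d):
                C[i][j] -= lam * v[i] * v[j]
    Z = [[sum(r[j] * v[j] for j in range(d)) for v in eigvecs] for r in Xc]
    return Z, eigvals

# Toy 3-dimensional "feature vectors"; keep the 2 highest-variance components.
X = [[2.0, 0.5, 0.01], [-2.0, -0.4, -0.02], [4.0, -0.6, 0.02],
     [-4.0, 0.5, -0.01], [1.0, 0.3, 0.0], [-1.0, -0.3, 0.0]]
Z, eigvals = pca_reduce(X, 2)
```

Dropping the trailing components discards the directions with the least variance, which is the exclusion step the abstract refers to. In practice a linear algebra library would replace the hand-written eigen-solver.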

An Effective Method for Dimensionality Reduction in High-Dimensional Space (고차원 공간에서 효과적인 차원 축소 기법)

  • Jeong Seung-Do;Kim Sang-Wook;Choi Byung-Uk
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.43 no.4
    • /
    • pp.88-102
    • /
    • 2006
  • In multimedia information retrieval, multimedia data are represented as vectors in a high-dimensional space. To search these vectors effectively, a variety of indexing methods have been proposed. However, the performance of these indexing methods degrades dramatically with increasing dimensionality, a problem known as the dimensionality curse. To resolve it, dimensionality reduction methods have been proposed that map feature vectors in a high-dimensional space to vectors in a low-dimensional space before the data are indexed. This paper proposes a method for dimensionality reduction based on a function approximating the Euclidean distance, which makes use of the norm and angle components of a vector. First, we identify the causes of the errors in angle estimation for approximating the Euclidean distance, and discuss basic directions to reduce those errors. Then, we propose a novel method for dimensionality reduction that composes a set of subvectors from a feature vector and maintains only the norm and the estimated angle for every subvector. The selection of a good reference vector is important for accurate estimation of the angle component. We present criteria for a good reference vector, and propose a method that chooses one by using the Levenberg-Marquardt algorithm. We also define a novel distance function and formally prove that it lower-bounds the Euclidean distance. This implies that our approach does not incur any false dismissals while reducing the dimensionality effectively. Finally, we verify the superiority of the proposed method via performance evaluation with extensive experiments.
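
The abstract does not give the distance function itself. The sketch below illustrates the general idea under stated assumptions: each subvector is stored as its norm and its angle to a reference vector, and the law of cosines with the angle difference gives a value that never exceeds the true Euclidean distance. The all-ones reference vector and all function names are hypothetical; the paper selects its reference vector with the Levenberg-Marquardt algorithm, which is not reproduced here.

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def angle_to(v, ref):
    """Angle in radians between v and ref (0.0 if either is the zero vector)."""
    nv, nr = norm(v), norm(ref)
    if nv == 0.0 or nr == 0.0:
        return 0.0
    c = sum(a * b for a, b in zip(v, ref)) / (nv * nr)
    return math.acos(max(-1.0, min(1.0, c)))

def reduce_vector(v, n_sub):
    """Keep only (norm, angle) per subvector; len(v) must be divisible by n_sub."""
    size = len(v) // n_sub
    ref = [1.0] * size                    # hypothetical reference vector
    return [(norm(v[s * size:(s + 1) * size]),
             angle_to(v[s * size:(s + 1) * size], ref))
            for s in range(n_sub)]

def lower_bound_distance(ra, rb):
    """Lower bound on the Euclidean distance from reduced representations.

    The true angle between two subvectors is at least the difference of
    their angles to the reference, so the law of cosines with that
    difference cannot overestimate the distance.
    """
    total = 0.0
    for (na, ta), (nb, tb) in zip(ra, rb):
        total += na * na + nb * nb - 2.0 * na * nb * math.cos(ta - tb)
    return math.sqrt(max(0.0, total))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a = [1.0, 2.0, 0.5, 4.0]
b = [3.0, 1.0, 2.0, 2.0]
bound = lower_bound_distance(reduce_vector(a, 2), reduce_vector(b, 2))
exact = euclidean(a, b)                   # bound <= exact always holds
```

Because the bound never exceeds the true distance, a range query filtered on the reduced representation cannot discard a true match, which is the no-false-dismissal property the abstract states.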

EFMDR-Fast: An Application of Empirical Fuzzy Multifactor Dimensionality Reduction for Fast Execution

  • Leem, Sangseob;Park, Taesung
    • Genomics & Informatics
    • /
    • v.16 no.4
    • /
    • pp.37.1-37.3
    • /
    • 2018
  • Gene-gene interaction is a key factor in explaining missing heritability, and many methods have been proposed to identify such interactions. Multifactor dimensionality reduction (MDR) is a well-known method that detects gene-gene interactions by reducing the genotypes of single-nucleotide polymorphism combinations to a binary variable with a value of high risk or low risk. The method has been extended in many ways, each with its own specific objective. Among those extensions, fuzzy-MDR uses fuzzy set theory for the membership of high risk or low risk and increases the detection rate of gene-gene interactions. Empirical fuzzy MDR (EFMDR) extends fuzzy-MDR with a maximum likelihood estimator as a new membership function. However, EFMDR is relatively slow because it is implemented in the R scripting language. Therefore, in this study, we implemented EFMDR using Rcpp (a C++ package) for faster execution. Our implementation, called EFMDR-Fast, is about 800 times faster than EFMDR written in R script only.
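
The reduction step of classic MDR described in this abstract, from a multi-locus genotype combination to a binary high-risk/low-risk variable, can be sketched as follows. This illustrates plain MDR only, not fuzzy-MDR or EFMDR. The function name, the 0/1/2 genotype coding, and the choice of the overall case/control ratio as the threshold are assumptions made for the example.

```python
from collections import defaultdict

def mdr_reduce(genotypes, labels, loci):
    """Map each sample to 1 (high risk) or 0 (low risk).

    genotypes: list of tuples of SNP genotypes (coded 0/1/2 here)
    labels:    list of 0 (control) / 1 (case); both classes must be present
    loci:      indices of the SNPs whose combination is evaluated
    """
    counts = defaultdict(lambda: [0, 0])      # cell -> [controls, cases]
    for g, y in zip(genotypes, labels):
        counts[tuple(g[i] for i in loci)][y] += 1
    n_cases = sum(labels)
    n_controls = len(labels) - n_cases
    threshold = n_cases / n_controls          # overall case/control ratio
    # A genotype cell is high risk when its own case/control ratio
    # meets or exceeds the threshold (written without division).
    high_risk = {cell for cell, (ctrl, case) in counts.items()
                 if case >= threshold * ctrl}
    return [1 if tuple(g[i] for i in loci) in high_risk else 0
            for g in genotypes]

# Toy data: two SNPs, eight samples.
genotypes = [(0, 1), (0, 1), (0, 1), (2, 0), (2, 0), (2, 0), (1, 1), (1, 1)]
labels = [1, 1, 0, 0, 0, 1, 1, 0]
risk = mdr_reduce(genotypes, labels, loci=(0, 1))
```

The resulting one-dimensional variable can then be scored against the case/control labels. Fuzzy variants replace the hard 0/1 assignment with a membership degree.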