• Title/Summary/Keyword: dimension reduction method

Search Result 252, Processing Time 0.02 seconds

Audio Fingerprint Retrieval Method Based on Feature Dimension Reduction and Feature Combination

  • Zhang, Qiu-yu;Xu, Fu-jiu;Bai, Jian
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.2
    • /
    • pp.522-539
    • /
    • 2021
  • In order to solve the problems of the existing audio fingerprint method when extracting audio fingerprints from long speech segments, such as too large fingerprint dimension, poor robustness, and low retrieval accuracy and efficiency, a robust audio fingerprint retrieval method based on feature dimension reduction and feature combination is proposed. Firstly, the Mel-frequency cepstral coefficient (MFCC) and linear prediction cepstrum coefficient (LPCC) of the original speech are extracted respectively, and the MFCC feature matrix and LPCC feature matrix are combined. Secondly, the feature dimension reduction method based on information entropy is used for column dimension reduction, and the feature matrix after dimension reduction is used for row dimension reduction based on energy feature dimension reduction method. Finally, the audio fingerprint is constructed by using the feature combination matrix after dimension reduction. When speech's user retrieval, the normalized Hamming distance algorithm is used for matching retrieval. Experiment results show that the proposed method has smaller audio fingerprint dimension and better robustness for long speech segments, and has higher retrieval efficiency while maintaining a higher recall rate and precision rate.

Dimension Reduction Methods on High Dimensional Streaming Data with Concept Drift (개념 변동 고차원 스트리밍 데이터에 대한 차원 감소 방법)

  • Park, Cheong Hee
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.5 no.8
    • /
    • pp.361-368
    • /
    • 2016
  • While dimension reduction methods on high dimensional data have been widely studied, research on dimension reduction methods for high dimensional streaming data with concept drift is limited. In this paper, we review incremental dimension reduction methods and propose a method to apply dimension reduction efficiently in order to improve classification performance on high dimensional streaming data with concept drift.

MBRDR: R-package for response dimension reduction in multivariate regression

  • Heesung Ahn;Jae Keun Yoo
    • Communications for Statistical Applications and Methods
    • /
    • v.31 no.2
    • /
    • pp.179-189
    • /
    • 2024
  • In multivariate regression with a high-dimensional response Y ∈ ℝr and a relatively low-dimensional predictor X ∈ ℝp (where r ≥ 2), the statistical analysis of such data presents significant challenges due to the exponential increase in the number of parameters as the dimension of the response grows. Most existing dimension reduction techniques primarily focus on reducing the dimension of the predictors (X), not the dimension of the response variable (Y). Yoo and Cook (2008) introduced a response dimension reduction method that preserves information about the conditional mean E(Y | X). Building upon this foundational work, Yoo (2018) proposed two semi-parametric methods, principal response reduction (PRR) and principal fitted response reduction (PFRR), then expanded these methods to unstructured principal fitted response reduction (UPFRR) (Yoo, 2019). This paper reviews these four response dimension reduction methodologies mentioned above. In addition, it introduces the implementation of the mbrdr package in R. The mbrdr is a unique tool in the R community, as it is specifically designed for response dimension reduction, setting it apart from existing dimension reduction packages that focus solely on predictors.

Incremental Linear Discriminant Analysis for Streaming Data Using the Minimum Squared Error Solution (스트리밍 데이터에 대한 최소제곱오차해를 통한 점층적 선형 판별 분석 기법)

  • Lee, Gyeong-Hoon;Park, Cheong Hee
    • Journal of KIISE
    • /
    • v.45 no.1
    • /
    • pp.69-75
    • /
    • 2018
  • In the streaming data where data samples arrive sequentially in time, it is difficult to apply the dimension reduction method based on batch learning. Therefore an incremental dimension reduction method for the application to streaming data has been studied. In this paper, we propose an incremental linear discriminant analysis method using the least squared error solution. Instead of computing scatter matrices directly, the proposed method incrementally updates the projective direction for dimension reduction by using the information of a new incoming sample. The experimental results demonstrate that the proposed method is more efficient compared with previously proposed incremental dimension reduction methods.

A Classification Method Using Data Reduction

  • Uhm, Daiho;Jun, Sung-Hae;Lee, Seung-Joo
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.12 no.1
    • /
    • pp.1-5
    • /
    • 2012
  • Data reduction has been used widely in data mining for convenient analysis. Principal component analysis (PCA) and factor analysis (FA) methods are popular techniques. The PCA and FA reduce the number of variables to avoid the curse of dimensionality. The curse of dimensionality is to increase the computing time exponentially in proportion to the number of variables. So, many methods have been published for dimension reduction. Also, data augmentation is another approach to analyze data efficiently. Support vector machine (SVM) algorithm is a representative technique for dimension augmentation. The SVM maps original data to a feature space with high dimension to get the optimal decision plane. Both data reduction and augmentation have been used to solve diverse problems in data analysis. In this paper, we compare the strengths and weaknesses of dimension reduction and augmentation for classification and propose a classification method using data reduction for classification. We will carry out experiments for comparative studies to verify the performance of this research.

Method-Free Permutation Predictor Hypothesis Tests in Sufficient Dimension Reduction

  • Lee, Kyungjin;Oh, Suji;Yoo, Jae Keun
    • Communications for Statistical Applications and Methods
    • /
    • v.20 no.4
    • /
    • pp.291-300
    • /
    • 2013
  • In this paper, we propose method-free permutation predictor hypothesis tests in the context of sufficient dimension reduction. Different from an existing method-free bootstrap approach, predictor hypotheses are evaluated based on p-values; therefore, usual statistical practitioners should have a potential preference. Numerical studies validate the developed theories, and real data application is provided.

Tutorial: Methodologies for sufficient dimension reduction in regression

  • Yoo, Jae Keun
    • Communications for Statistical Applications and Methods
    • /
    • v.23 no.2
    • /
    • pp.105-117
    • /
    • 2016
  • In the paper, as a sequence of the first tutorial, we discuss sufficient dimension reduction methodologies used to estimate central subspace (sliced inverse regression, sliced average variance estimation), central mean subspace (ordinary least square, principal Hessian direction, iterative Hessian transformation), and central $k^{th}$-moment subspace (covariance method). Large-sample tests to determine the structural dimensions of the three target subspaces are well derived in most of the methodologies; however, a permutation test (which does not require large-sample distributions) is introduced. The test can be applied to the methodologies discussed in the paper. Theoretical relationships among the sufficient dimension reduction methodologies are also investigated and real data analysis is presented for illustration purposes. A seeded dimension reduction approach is then introduced for the methodologies to apply to large p small n regressions.

Comparative Study of Dimension Reduction Methods for Highly Imbalanced Overlapping Churn Data

  • Lee, Sujee;Koo, Bonhyo;Jung, Kyu-Hwan
    • Industrial Engineering and Management Systems
    • /
    • v.13 no.4
    • /
    • pp.454-462
    • /
    • 2014
  • Retention of possible churning customer is one of the most important issues in customer relationship management, so companies try to predict churn customers using their large-scale high-dimensional data. This study focuses on dealing with large data sets by reducing the dimensionality. By using six different dimension reduction methods-Principal Component Analysis (PCA), factor analysis (FA), locally linear embedding (LLE), local tangent space alignment (LTSA), locally preserving projections (LPP), and deep auto-encoder-our experiments apply each dimension reduction method to the training data, build a classification model using the mapped data and then measure the performance using hit rate to compare the dimension reduction methods. In the result, PCA shows good performance despite its simplicity, and the deep auto-encoder gives the best overall performance. These results can be explained by the characteristics of the churn prediction data that is highly correlated and overlapped over the classes. We also proposed a simple out-of-sample extension method for the nonlinear dimension reduction methods, LLE and LTSA, utilizing the characteristic of the data.

Note on response dimension reduction for multivariate regression

  • Yoo, Jae Keun
    • Communications for Statistical Applications and Methods
    • /
    • v.26 no.5
    • /
    • pp.519-526
    • /
    • 2019
  • Response dimension reduction in a sufficient dimension reduction (SDR) context has been widely ignored until Yoo and Cook (Computational Statistics and Data Analysis, 53, 334-343, 2008) founded theories for it and developed an estimation approach. Recent research in SDR shows that a semi-parametric approach can outperform conventional non-parametric SDR methods. Yoo (Statistics: A Journal of Theoretical and Applied Statistics, 52, 409-425, 2018) developed a semi-parametric approach for response reduction in Yoo and Cook (2008) context, and Yoo (Journal of the Korean Statistical Society, 2019) completes the semi-parametric approach by proposing an unstructured method. This paper theoretically discusses and provides insightful remarks on three versions of semi-parametric approaches that can be useful for statistical practitioners. It is also possible to avoid numerical instability by presenting the results for an orthogonal transformation of the response variables.

DR-LSTM: Dimension reduction based deep learning approach to predict stock price

  • Ah-ram Lee;Jae Youn Ahn;Ji Eun Choi;Kyongwon Kim
    • Communications for Statistical Applications and Methods
    • /
    • v.31 no.2
    • /
    • pp.213-234
    • /
    • 2024
  • In recent decades, increasing research attention has been directed toward predicting the price of stocks in financial markets using deep learning methods. For instance, recurrent neural network (RNN) is known to be competitive for datasets with time-series data. Long short term memory (LSTM) further improves RNN by providing an alternative approach to the gradient loss problem. LSTM has its own advantage in predictive accuracy by retaining memory for a longer time. In this paper, we combine both supervised and unsupervised dimension reduction methods with LSTM to enhance the forecasting performance and refer to this as a dimension reduction based LSTM (DR-LSTM) approach. For a supervised dimension reduction method, we use methods such as sliced inverse regression (SIR), sparse SIR, and kernel SIR. Furthermore, principal component analysis (PCA), sparse PCA, and kernel PCA are used as unsupervised dimension reduction methods. Using datasets of real stock market index (S&P 500, STOXX Europe 600, and KOSPI), we present a comparative study on predictive accuracy between six DR-LSTM methods and time series modeling.