• Title/Summary/Keyword: Speech/Music Classification

Search Result 28, Processing Time 0.019 seconds

Adaptive Kernel Function of SVM for Improving Speech/Music Classification of 3GPP2 SMV

  • Lim, Chung-Soo;Chang, Joon-Hyuk
    • ETRI Journal
    • /
    • v.33 no.6
    • /
    • pp.871-879
    • /
    • 2011
  • Because a wide variety of multimedia services are provided through personal wireless communication devices, the demand for efficient bandwidth utilization becomes stronger. This demand naturally results in the introduction of the variable bitrate speech coding concept. One exemplary work is the selectable mode vocoder (SMV) that supports speech/music classification. However, because it has severe limitations in its classification performance, a couple of works to improve speech/music classification by introducing support vector machines (SVMs) have been proposed. While these approaches significantly improved classification accuracy, they did not consider correlations commonly found in speech and music frames. In this paper, we propose a novel and orthogonal approach to improve the speech/music classification of SMV codec by adaptively tuning SVMs based on interframe correlations. According to the experimental results, the proposed algorithm yields improved results in classifying speech and music within the SMV framework.

Speech/Music Classification Based on the Higher-Order Moments of Subband Energy

  • Seo, Jiin Soo
    • Journal of Korea Multimedia Society
    • /
    • v.21 no.7
    • /
    • pp.737-744
    • /
    • 2018
  • This paper presents a study on the performance of the higher-order moments for speech/music classification. For a successful speech/music classifier, extracting features that allow direct access to the relevant speech or music specific information is crucial. In addition to the conventional variance-based features, we utilize the higher-order moments of features, such as skewness and kurtosis. Moreover, we investigate the subband decomposition parameters in extracting features, which improves classification accuracy. Experiments on two speech/music datasets, which are publicly available, were performed and show that the higher-order moment features can improve classification accuracy when combined with the conventional variance-based features.

Improvement of Speech/Music Classification Based on RNN in EVS Codec for Hearing Aids (EVS 코덱에서 보청기를 위한 RNN 기반의 음성/음악 분류 성능 향상)

  • Kang, Sang-Ick;Lee, Sang Min
    • Journal of rehabilitation welfare engineering & assistive technology
    • /
    • v.11 no.2
    • /
    • pp.143-146
    • /
    • 2017
  • In this paper, a novel approach is proposed to improve the performance of speech/music classification using the recurrent neural network (RNN) in the enhanced voice services (EVS) of 3GPP for hearing aids. Feature vectors applied to the RNN are selected from the relevant parameters of the EVS for efficient speech/music classification. The performance of the proposed algorithm is evaluated under various conditions and large speech/music data. The proposed algorithm yields better results compared with the conventional scheme implemented in the EVS.

Speech/Music Signal Classification Based on Spectrum Flux and MFCC For Audio Coder (오디오 부호화기를 위한 스펙트럼 변화 및 MFCC 기반 음성/음악 신호 분류)

  • Sangkil Lee;In-Sung Lee
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.16 no.5
    • /
    • pp.239-246
    • /
    • 2023
  • In this paper, we propose an open-loop algorithm to classify speech and music signals using the spectral flux parameters and Mel Frequency Cepstral Coefficients(MFCC) parameters for the audio coder. To increase responsiveness, the MFCC was used as a short-term feature parameter and spectral fluxes were used as a long-term feature parameters to improve accuracy. The overall voice/music signal classification decision is made by combining the short-term classification method and the long-term classification method. The Gaussian Mixed Model (GMM) was used for pattern recognition and the optimal GMM parameters were extracted using the Expectation Maximization (EM) algorithm. The proposed long-term and short-term combined speech/music signal classification method showed an average classification error rate of 1.5% on various audio sound sources, and improved the classification error rate by 0.9% compared to the short-term single classification method and 0.6% compared to the long-term single classification method. The proposed speech/music signal classification method was able to improve the classification error rate performance by 9.1% in percussion music signals with attacks and 5.8% in voice signals compared to the Unified Speech Audio Coding (USAC) audio classification method.

Analysis and Implementation of Speech/Music Classification for 3GPP2 SMV Codec Employing SVM Based on Discriminative Weight Training (SMV코덱의 음성/음악 분류 성능 향상을 위한 최적화된 가중치를 적용한 입력벡터 기반의 SVM 구현)

  • Kim, Sang-Kyun;Chang, Joon-Hyuk;Cho, Ki-Ho;Kim, Nam-Soo
    • The Journal of the Acoustical Society of Korea
    • /
    • v.28 no.5
    • /
    • pp.471-476
    • /
    • 2009
  • In this paper, we apply a discriminative weight training to a support vector machine (SVM) based speech/music classification for the selectable mode vocoder (SMV) of 3GPP2. In our approach, the speech/music decision rule is expressed as the SVM discriminant function by incorporating optimally weighted features of the SMV based on a minimum classification error (MCE) method which is different from the previous work in that different weights are assigned to each the feature of SMV. The performance of the proposed approach is evaluated under various conditions and yields better results compared with the conventional scheme in the SVM.

Efficient Implementation of SVM-Based Speech/Music Classification on Embedded Systems (SVM 기반 음성/음악 분류기의 효율적인 임베디드 시스템 구현)

  • Lim, Chung-Soo;Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea
    • /
    • v.30 no.8
    • /
    • pp.461-467
    • /
    • 2011
  • Accurate classification of input signals is the key prerequisite for variable bit-rate coding, which has been introduced in order to effectively utilize limited communication bandwidth. Especially, recent surge of multimedia services elevate the importance of speech/music classification. Among many speech/music classifier, the ones based on support vector machine (SVM) have a strong selling point, high classification accuracy, but their computational complexity and memory requirement hinder their way into actual implementations. Therefore, techniques that reduce the computational complexity and the memory requirement is inevitable, particularly for embedded systems. We first analyze implementation of an SVM-based classifier on embedded systems in terms of execution time and energy consumption, and then propose two techniques that alleviate the implementation requirements: One is a technique that removes support vectors that have insignificant contribution to the final classification, and the other is to skip processing some of input signals by virtue of strong correlations in speech/music frames. These are post-processing techniques that can work with any other optimization techniques applied during the training phase of SVM. With experiments, we validate the proposed algorithms from the perspectives of classification accuracy, execution time, and energy consumption.

Enhancement of Speech/Music Classification for 3GPP2 SMV Codec Employing Discriminative Weight Training (변별적 가중치 학습을 이용한 3GPP2 SVM의 실시간 음성/음악 분류 성능 향상)

  • Kang, Sang-Ick;Chang, Joon-Hyuk;Lee, Seong-Ro
    • The Journal of the Acoustical Society of Korea
    • /
    • v.27 no.6
    • /
    • pp.319-324
    • /
    • 2008
  • In this paper, we propose a novel approach to improve the performance of speech/music classification for the selectable mode vocoder (SMV) of 3GPP2 using the discriminative weight training which is based on the minimum classification error (MCE) algorithm. We first present an effective analysis of the features and the classification method adopted in the conventional SMV. And then proposed the speech/music decision rule is expressed as the geometric mean of optimally weighted features which are selected from the SMV. The performance of the proposed algorithm is evaluated under various conditions and yields better results compared with the conventional scheme of the SMV.

Improving SVM with Second-Order Conditional MAP for Speech/Music Classification (음성/음악 분류 향상을 위한 2차 조건 사후 최대 확률기법 기반 SVM)

  • Lim, Chung-Soo;Chang, Joon-Hyuk
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.48 no.5
    • /
    • pp.102-108
    • /
    • 2011
  • Support vector machines are well known for their outstanding performance in pattern recognition fields. One example of their applications is music/speech classification for a standardized codec such as 3GPP2 selectable mode vocoder. In this paper, we propose a novel scheme that improves the speech/music classification of support vector machines based on the second-order conditional maximum a priori. While conventional support vector machine optimization techniques apply during training phase, the proposed technique can be adopted in classification phase. In this regard, the proposed approach can be developed and employed in parallel with conventional optimizations, resulting in synergistic boost in classification performance. According to experimental results, the proposed algorithm shows its compatibility and potential for improving the performance of support vector machines.

Fine-tuning SVM for Enhancing Speech/Music Classification (SVM의 미세조정을 통한 음성/음악 분류 성능향상)

  • Lim, Chung-Soo;Song, Ji-Hyun;Chang, Joon-Hyuk
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.48 no.2
    • /
    • pp.141-148
    • /
    • 2011
  • Support vector machines have been extensively studied and utilized in pattern recognition area for years. One of interesting applications of this technique is music/speech classification for a standardized codec such as 3GPP2 selectable mode vocoder. In this paper, we propose a novel approach that improves the speech/music classification of support vector machines. While conventional support vector machine optimization techniques apply during training phase, the proposed technique can be adopted in classification phase. In this regard, the proposed approach can be developed and employed in parallel with conventional optimizations, resulting in synergistic boost in classification performance. We first analyze the impact of kernel width parameter on the classifications made by support vector machines. From this analysis, we observe that we can fine-tune outputs of support vector machines with the kernel width parameter. To make the most of this capability, we identify strong correlation among neighboring input frames, and use this correlation information as a guide to adjusting kernel width parameter. According to the experimental results, the proposed algorithm is found to have potential for improving the performance of support vector machines.

Analysis and Implementation of Speech/Music Classification for 3GPP2 SMV Codec Based on Support Vector Machine (SMV코덱의 음성/음악 분류 성능 향상을 위한 Support Vector Machine의 적용)

  • Kim, Sang-Kyun;Chang, Joon-Hyuk
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.45 no.6
    • /
    • pp.142-147
    • /
    • 2008
  • In this paper, we propose a novel a roach to improve the performance of speech/music classification for the selectable mode vocoder (SMV) of 3GPP2 using the support vector machine (SVM). The SVM makes it possible to build on an optimal hyperplane that is separated without the error where the distance between the closest vectors and the hyperplane is maximal. We first present an effective analysis of the features and the classification method adopted in the conventional SMV. And then feature vectors which are a lied to the SVM are selected from relevant parameters of the SMV for the efficient speech/music classification. The performance of the proposed algorithm is evaluated under various conditions and yields better results compared with the conventional scheme of the SMV.