• Title/Summary/Keyword: cepstral

Search Result 293, Processing Time 0.023 seconds

Evaluation of Frequency Warping Based Features and Spectro-Temporal Features for Speaker Recognition (화자인식을 위한 주파수 워핑 기반 특징 및 주파수-시간 특징 평가)

  • Choi, Young Ho;Ban, Sung Min;Kim, Kyung-Wha;Kim, Hyung Soon
    • Phonetics and Speech Sciences
    • /
    • v.7 no.1
    • /
    • pp.3-10
    • /
    • 2015
  • In this paper, different frequency scales in cepstral feature extraction are evaluated for the text-independent speaker recognition. To this end, mel-frequency cepstral coefficients (MFCCs), linear frequency cepstral coefficients (LFCCs), and bilinear warped frequency cepstral coefficients (BWFCCs) are applied to the speaker recognition experiment. In addition, the spectro-temporal features extracted by the cepstral-time matrix (CTM) are examined as an alternative to the delta and delta-delta features. Experiments on the NIST speaker recognition evaluation (SRE) 2004 task are carried out using the Gaussian mixture model-universal background model (GMM-UBM) method and the joint factor analysis (JFA) method, both based on the ALIZE 3.0 toolkit. Experimental results using both the methods show that BWFCC with appropriate warping factor yields better performance than MFCC and LFCC. It is also shown that the feature set including the spectro-temporal information based on the CTM outperforms the conventional feature set including the delta and delta-delta features.

Spectral and Cepstral Analyses of Esophageal Speakers (식도발성화자 음성의 spectral & cepstral 분석)

  • Shim, Hee-Jeong;Jang, Hyo-Ryung;Shin, Hee-Baek;Ko, Do-Heung
    • Phonetics and Speech Sciences
    • /
    • v.6 no.2
    • /
    • pp.47-54
    • /
    • 2014
  • The purpose of this study was to analyze spectral versus cepstral measurements in esophageal speakers. The comparison between the measurements in thirteen male esophageal speakers was compared with the control group of thirteen normal speakers using the sustained vowel /a/. The main results can be summarized as below: (a) the CPP and L/H ratio of the esophageal group were significantly lower than those of the control group (b) the CPP was significantly correlated with the spectral parameters such as jitter, shimmer, NHR and VTI, and (c) the ROC analysis showed that the threshold of 10.25dB for the CPP achieved a good classification for esophageal speakers, with 100% perfect sensitivity and specificity. Thus, it was known that cepstral-based acoustic measures such as CPP, may be more reliable predictors than other spectral-based acoustic measures such as jitter and shimmer. And it was found that cepstral-based acoustic measures were effective in distinguishing esophageal voice quality from normal voice quality. This research will contribute to establishing a baseline related to speech characteristics in voice rehabilitation with laryngectomees.

Performance assessments of feature vectors and classification algorithms for amphibian sound classification (양서류 울음 소리 식별을 위한 특징 벡터 및 인식 알고리즘 성능 분석)

  • Park, Sangwook;Ko, Kyungdeuk;Ko, Hanseok
    • The Journal of the Acoustical Society of Korea
    • /
    • v.36 no.6
    • /
    • pp.401-406
    • /
    • 2017
  • This paper presents the performance assessment of several key algorithms conducted for amphibian species sound classification. Firstly, 9 target species including endangered species are defined and a database of their sounds is built. For performance assessment, three feature vectors such as MFCC (Mel Frequency Cepstral Coefficient), RCGCC (Robust Compressive Gammachirp filterbank Cepstral Coefficient), and SPCC (Subspace Projection Cepstral Coefficient), and three classifiers such as GMM(Gaussian Mixture Model), SVM(Support Vector Machine), DBN-DNN(Deep Belief Network - Deep Neural Network) are considered. In addition, i-vector based classification system which is widely used for speaker recognition, is used to assess for this task. Experimental results indicate that, SPCC-SVM achieved the best performance with 98.81 % while other methods also attained good performance with above 90 %.

Performance Improvements for Silence Feature Normalization Method by Using Filter Bank Energy Subtraction (필터 뱅크 에너지 차감을 이용한 묵음 특징 정규화 방법의 성능 향상)

  • Shen, Guanghu;Choi, Sook-Nam;Chung, Hyun-Yeol
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.35 no.7C
    • /
    • pp.604-610
    • /
    • 2010
  • In this paper we proposed FSFN (Filter bank sub-band energy subtraction based CLSFN) method to improve the recognition performance of the existing CLSFN (Cepstral distance and Log-energy based Silence Feature Normalization). The proposed FSFN reduces the energy of noise components in filter bank sub-band domain when extracting the features from speech data. This leads to extract the enhanced cepstral features and thus improves the accuracy of speech/silence classification using the enhanced cepstral features. Therefore, it can be expected to get improved performance comparing with the existing CLSFN. Experimental results conducted on Aurora 2.0 DB showed that our proposed FSFN method improves the averaged word accuracy of 2% comparing with the conventional CLSFN method, and FSFN combined with CMVN (Cepstral Mean and Variance Normalization) also showed the best recognition performance comparing with others.

Robust Speech Parameters for the Emotional Speech Recognition (감정 음성 인식을 위한 강인한 음성 파라메터)

  • Lee, Guehyun;Kim, Weon-Goo
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.22 no.6
    • /
    • pp.681-686
    • /
    • 2012
  • This paper studied the speech parameters less affected by the human emotion for the development of the robust emotional speech recognition system. For this purpose, the effect of emotion on the speech recognition system and robust speech parameters of speech recognition system were studied using speech database containing various emotions. In this study, mel-cepstral coefficient, delta-cepstral coefficient, RASTA mel-cepstral coefficient, root-cepstral coefficient, PLP coefficient and frequency warped mel-cepstral coefficient in the vocal tract length normalization method were used as feature parameters. And CMS (Cepstral Mean Subtraction) and SBR(Signal Bias Removal) method were used as a signal bias removal technique. Experimental results showed that the HMM based speaker independent word recognizer using frequency warped RASTA mel-cepstral coefficient in the vocal tract length normalized method, its derivatives and CMS as a signal bias removal showed the best performance.

Optimally Weighted Cepstral Distance Measure for Speech Recognition (음성 인식을 위한 최적 가중 켑스트랄 거리 측정 방법)

  • 김원구
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1994.06c
    • /
    • pp.133-137
    • /
    • 1994
  • In this paper, a method for designing an optimal weight function for the weighted cepstral distance measure is proposed. A conventional weight function or cepstral lifter is obtained eperimentally depending on the spectral components to be emphasized. The proposed method minimizes the error between word reference patterns and the traning data. To compare the proposed optimal weight function with conventional function, speech recognition systems based on Dpynamic Time Warping and Hidden Markov Models were constructed to conduct speaker independent isolated word necogination eperiment. Results show that the proposed method gives better performance than conventional weight functions.

  • PDF

Detection of Laryngeal Pathology in Speech Using Multilayer Perceptron Neural Networks (다층 퍼셉트론 신경회로망을 이용한 후두 질환 음성 식별)

  • Kang Hyun Min;Kim Yoo Shin;Kim Hyung Soon
    • Proceedings of the KSPS conference
    • /
    • 2002.11a
    • /
    • pp.115-118
    • /
    • 2002
  • Neural networks have been known to have great discriminative power in pattern classification problems. In this paper, the multilayer perceptron neural networks are employed to automatically detect laryngeal pathology in speech. Also new feature parameters are introduced which can reflect the periodicity of speech and its perturbation. These parameters and cepstral coefficients are used as input of the multilayer perceptron neural networks. According to the experiment using Korean disordered speech database, incorporation of new parameters with cepstral coefficients outperforms the case with only cepstral coefficients.

  • PDF

Harmonic Structure Features for Robust Speaker Diarization

  • Zhou, Yu;Suo, Hongbin;Li, Junfeng;Yan, Yonghong
    • ETRI Journal
    • /
    • v.34 no.4
    • /
    • pp.583-590
    • /
    • 2012
  • In this paper, we present a new approach for speaker diarization. First, we use the prosodic information calculated on the original speech to resynthesize the new speech data utilizing the spectrum modeling technique. The resynthesized data is modeled with sinusoids based on pitch, vibration amplitude, and phase bias. Then, we use the resynthesized speech data to extract cepstral features and integrate them with the cepstral features from original speech for speaker diarization. At last, we show how the two streams of cepstral features can be combined to improve the robustness of speaker diarization. Experiments carried out on the standardized datasets (the US National Institute of Standards and Technology Rich Transcription 04-S multiple distant microphone conditions) show a significant improvement in diarization error rate compared to the system based on only the feature stream from original speech.

Zero-Crossing-Based Source Direction Estimation Using a Cepstral Prefiltering Technique (영교차점과 켑스트럼 전처리 기술을 이용한 반향환경에서의 음원방향 추정)

  • Park, Yong-Jin;Lee, Soo-Yeon;Park, Hyung-Min
    • MALSORI
    • /
    • no.67
    • /
    • pp.121-133
    • /
    • 2008
  • To estimate directions of multi-sound sources, we consider an approach based on zero crossings which provided more robust results to diffuse noise than the conventional cross-correlation-based method [6][7]. In reverberant environments, the performance of source direction estimation can be improved by using signal components through direct paths from sources to microphones. Since a cepstral prefiltering technique [8] removes the effect of reverberation, we propose a source direction estimation method which can find out intervals of the direct-path components by comparing original and cepstral-prefiltered envelopes. Simulations demonstrate that the proposed method can improve the performance of source direction estimation in reverberant environments.

  • PDF

Snorer-Dependent Snore Recognition Using LPC Cepstral Coefficients (LPC 켑스트럼 계수를 이용한 특정인의 코골이 인식)

  • 최호선;장원규;이경중
    • The Transactions of the Korean Institute of Electrical Engineers D
    • /
    • v.52 no.9
    • /
    • pp.554-559
    • /
    • 2003
  • In this paper the possibility of snorer-dependent snore recognition using cepstral coefficients was suggested. We assumed that snore and speech sounds have some similarities and we used cepstral coefficients which are widely used for speech recognition. Snoring data were acquired from 18 persons including 5 patients diagnosed as snore patient. To evaluate the performance of proposed method, the distance ratio based on LPC cepstral coefficients was selected as an index for snorer-dependent snore recognition. As a result, distance ratio of 3 was selected as optimal value showing the most efficient snorer-dependent snore recognition, which is high accuracy of 95.05% on average. In conclusion, the proposed method showed the possibilities to be applied in clinical applications for snorer-dependent snore recognition.