• Title/Summary/Keyword: cepstral


Performance Improvement of SPLICE-based Noise Compensation for Robust Speech Recognition (강인한 음성인식을 위한 SPLICE 기반 잡음 보상의 성능향상)

  • Kim, Hyung-Soon;Kim, Doo-Hee
    • Speech Sciences / v.10 no.3 / pp.263-277 / 2003
  • One of the major problems in speech recognition is performance degradation due to the mismatch between the training and test environments. Recently, Stereo-based Piecewise LInear Compensation for Environments (SPLICE), a frame-based bias removal algorithm for cepstral enhancement that uses stereo training data and models noisy speech as a mixture of Gaussians, was proposed and showed good performance in noisy environments. In this paper, we propose several methods to improve the conventional SPLICE. First, we apply Cepstral Mean Subtraction (CMS) as a preprocessor to SPLICE instead of as a postprocessor. Second, to compensate for residual distortion after SPLICE processing, we propose a two-stage SPLICE. Third, we employ phonetic information for training the SPLICE model. In experiments on the Aurora 2 database, the proposed methods outperformed the conventional SPLICE, and we achieved a 50% decrease in word error rate over the Aurora baseline system.
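
Cepstral Mean Subtraction, which the abstract above applies before SPLICE rather than after it, can be illustrated with a minimal sketch. The function name and the list-of-frames layout are assumptions made for illustration and do not come from the paper; SPLICE itself is not shown.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean over all frames of one utterance.

    frames: list of cepstral vectors (lists of floats), one per frame.
    This layout is an illustrative assumption, not the paper's format.
    """
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    # Mean of each cepstral coefficient across the utterance.
    means = [sum(f[d] for f in frames) / n_frames for d in range(n_coeffs)]
    # Removing the mean cancels a constant (channel-like) offset in the cepstral domain.
    return [[f[d] - means[d] for d in range(n_coeffs)] for f in frames]


normalized = cepstral_mean_subtraction([[1.0, 10.0], [3.0, 20.0]])
```

After this step each coefficient has zero mean over the utterance, so a stationary offset shared by all frames no longer reaches the later compensation stage.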


Speech/Music Discrimination Using Multi-dimensional MMCD (다차원 MMCD를 이용한 음성/음악 판별)

  • Choi, Mu-Yeol;Song, Hwa-Jeon;Park, Seul-Han;Kim, Hyung-Soon
    • Proceedings of the KSPS conference / 2006.11a / pp.142-145 / 2006
  • Discrimination between speech and music is important in many multimedia applications. Previously, we proposed a new parameter for speech/music discrimination, the mean of minimum cepstral distances (MMCD), which outperformed conventional parameters. One weakness is that its performance depends on the range of candidate frames used to compute the minimum cepstral distance, so the optimal range must be selected experimentally. In this paper, to alleviate this problem, we propose a multi-dimensional MMCD parameter that consists of multiple MMCDs with different ranges of candidate frames. Experimental results show that the multi-dimensional MMCD parameter yields an error rate reduction of 22.5% compared with the optimally chosen one-dimensional MMCD parameter.
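
The idea of stacking several MMCDs computed over different candidate-frame ranges can be sketched as follows. The abstract does not define the range precisely, so this sketch assumes that the candidates for frame t are the frames lo to hi steps ahead of it and that the cepstral distance is Euclidean; the function names are hypothetical.

```python
import math


def cepstral_distance(a, b):
    """Euclidean distance between two cepstral vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mmcd(frames, lo, hi):
    """Mean over frames of the minimum distance to the candidate frames.

    Assumption: the candidates for frame t are frames t+lo .. t+hi.
    Frames with no candidates (near the end of the segment) are skipped.
    """
    minima = []
    for t, frame in enumerate(frames):
        candidates = frames[t + lo : t + hi + 1]
        if candidates:
            minima.append(min(cepstral_distance(frame, c) for c in candidates))
    return sum(minima) / len(minima) if minima else 0.0


def multi_dimensional_mmcd(frames, ranges):
    """One MMCD value per (lo, hi) range, stacked into a feature vector."""
    return [mmcd(frames, lo, hi) for lo, hi in ranges]


features = multi_dimensional_mmcd([[0.0], [1.0], [3.0], [6.0]], [(1, 1), (2, 3)])
```

Using several ranges at once removes the need to pick a single range by experiment, which is the weakness the abstract describes.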


A Study on Comfortableness Evaluation Technique of Chairs using Electroencephalogram (뇌파를 이용한 의자의 쾌적성 평가 기술에 관한 연구)

  • 김동준
    • The Transactions of the Korean Institute of Electrical Engineers D / v.52 no.12 / pp.702-707 / 2003
  • This study describes a new technique for human sensibility evaluation using the electroencephalogram (EEG). EEG production is assumed to be linear. The linear predictor coefficients and the linear cepstral coefficients of the EEG are used as feature parameters of sensibility, and their pattern classification performance is compared. Using the better parameter, a human sensibility evaluation algorithm is designed. The results are as follows. The linear predictor coefficients showed better pattern classification performance than the linear cepstral coefficients. Then, using the linear predictor coefficients as the feature parameter, a human sensibility evaluation algorithm was developed based on a multi-layer neural network. This algorithm showed 90% accuracy in comfortableness evaluation in spite of fluctuations in the statistics of the EEG signal.
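
The relation between the two feature sets compared above can be sketched with the standard recursion that converts linear predictor coefficients into cepstral coefficients. This assumes that "linear cepstral coefficients" means the cepstrum of the all-pole model H(z) = 1 / (1 - sum_k a_k z^-k); the abstract does not state this, and the function name is hypothetical.

```python
def lpc_to_cepstrum(a, n_ceps):
    """Cepstral coefficients c_1..c_n_ceps of an all-pole model.

    a: predictor coefficients a_1..a_p, with the sign convention
       x[n] ~ sum_k a_k * x[n-k] (assumed, not taken from the paper).
    """
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] (the gain term) is not computed here
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        # c_n = a_n + sum_{k} (k/n) * c_k * a_{n-k}, with 1 <= n-k <= p
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


cepstrum = lpc_to_cepstrum([0.5], 3)
```

For a single-pole model with a_1 = a, the recursion reproduces the closed form c_n = a^n / n, which is a convenient sanity check.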

A Study on the Pattern Recognition of EMG Signals for Head Motion Recognition (머리 움직임 인식을 위한 근전도 신호의 패턴 인식 기법에 관한 연구)

  • 이태우;전창익;이영석;유세근;김성환
    • The Transactions of the Korean Institute of Electrical Engineers D / v.53 no.2 / pp.103-110 / 2004
  • This paper proposes a new EMG AR (autoregressive) modeling method for pattern recognition of various head motions. The proper electrode placement for applying AR or cepstral coefficients to EMG signature discrimination is investigated. EMG signals are measured for 10 different motions with two electrode arrangements simultaneously. Electrode pairs are located separately on dominant muscles (S-type arrangement), because the bandwidth of signals obtained from S-type placement is wider than that from C-type placement (closely spaced in the region between muscles). In the EMG pattern recognition test, the proposed mIAR (modified integrated mean autoregressive model) technique improves the recognition rate by around 17-21% compared with the other AR and cepstral methods.

Robust Speech Recognition Using Real-Time Higher Order Statistics Normalization (고차통계 정규화를 이용한 강인한 음성인식)

  • Jeong, Ju-Hyun;Song, Hwa-Jeon;Kim, Hyung-Soon
    • MALSORI / no.54 / pp.63-72 / 2005
  • The performance of a speech recognition system is degraded by the mismatch between training and test environments. Many methods have been proposed to compensate for noise components in the cepstral domain. Recently, a higher order cepstral moment normalization method was introduced to improve recognition accuracy. In this paper, we present a real-time higher order moment normalization method with a post-processing smoothing filter to reduce the parameter estimation error in higher order moment computation. In experiments using the Aurora2 database, the proposed algorithm achieved an error rate reduction of 44.7% in comparison with the baseline system.


Discriminative Weight Training for Gender Identification (변별적 가중치 학습을 적용한 성별인식 알고리즘)

  • Kang, Sang-Ick;Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea / v.27 no.5 / pp.252-255 / 2008
  • In this paper, we apply discriminative weight training to support vector machine (SVM) based gender identification. In our approach, the gender decision rule is expressed as the SVM of optimally weighted mel-frequency cepstral coefficients (MFCC), with the weights obtained by a minimum classification error (MCE) method. This differs from previous work in that a different weight is assigned to each MFCC filter bank, which is considered more realistic. According to the experimental results, the proposed approach is effective for gender identification using SVM.

Speaker Independent Recognition Algorithm based on Parameter Extraction by MFCC applied Wiener Filter Method (위너필터법이 적용된 MFCC의 파라미터 추출에 기초한 화자독립 인식알고리즘)

  • Choi, Jae-Seung
    • Journal of the Korea Institute of Information and Communication Engineering / v.21 no.6 / pp.1149-1154 / 2017
  • To obtain good recognition performance from a speech recognition system under background noise, it is very important to select appropriate speech feature parameters. The feature parameter used in this paper is the Mel frequency cepstral coefficient (MFCC), which reflects human auditory characteristics, combined with the Wiener filter method. That is, the proposed feature parameter is extracted from the clean speech signal obtained after removing background noise. The proposed method implements speaker recognition by inputting the proposed modified MFCC feature parameter into a multi-layer perceptron network. In the experiments, speaker independent recognition was performed using MFCC feature parameters of the 14th order. The average speaker independent recognition rate for noisy speech with added white noise is 94.48%, which is an effective result. Compared with the existing methods, the proposed speaker recognition performs better owing to the modified MFCC feature parameter.

Authentication Performance Optimization for Smart-phone based Multimodal Biometrics (스마트폰 환경의 인증 성능 최적화를 위한 다중 생체인식 융합 기법 연구)

  • Moon, Hyeon-Joon;Lee, Min-Hyung;Jeong, Kang-Hun
    • Journal of Digital Convergence / v.13 no.6 / pp.151-156 / 2015
  • In this paper, we propose a personal multimodal biometric authentication system based on face detection, face recognition, and speaker verification for the smart-phone environment. The proposed system detects the face with the Modified Census Transform algorithm and then finds the eye positions in the face using a Gabor filter and the k-means algorithm. After preprocessing the detected face and eye positions, we perform recognition with the Linear Discriminant Analysis algorithm. Then, in the speaker verification process, we extract features from the end points of the speech data and the Mel Frequency Cepstral Coefficients. We verify the speaker with the Dynamic Time Warping algorithm because the speech features change in real time. The proposed multimodal biometric system fuses the face and speech features (optimizing the internal operation by integer representation) for smart-phone based real-time face detection, recognition, and speaker verification. As noted, the multimodal biometric system can form a reliable system by estimating reasonable performance.
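
The Dynamic Time Warping comparison used for speaker verification above can be sketched in its textbook form. This is not the paper's integer-optimized implementation; the function name, the Euclidean local distance, and the unconstrained step pattern are assumptions made for illustration.

```python
import math


def dtw_distance(seq_a, seq_b):
    """Classic DTW cost between two sequences of feature vectors.

    seq_a, seq_b: lists of equal-dimension vectors (e.g. MFCC frames).
    Allowed steps: insertion, deletion, match; no window constraint.
    """
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    # cost[i][j] = best accumulated cost aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.sqrt(
                sum((x - y) ** 2 for x, y in zip(seq_a[i - 1], seq_b[j - 1]))
            )
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in seq_a only
                cost[i][j - 1],      # advance in seq_b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


score = dtw_distance([[0.0], [1.0], [2.0]], [[0.0], [1.0], [1.0], [2.0]])
```

Because the alignment may repeat frames, two utterances that differ only in speaking rate receive a low cost; a verification decision would then compare this cost against a threshold chosen on enrollment data.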

Usefulness of Cepstral Peak Prominence (CPP) in Unilateral Vocal Fold Paralysis Dysphonia Evaluation (일측성 성대마비 환자 평가에서 Cepstral Peak Prominence의 유용성)

  • Lee, Chang-Yoon;Jeong, Hee Seok;Son, Hee Young
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.28 no.2 / pp.84-88 / 2017
  • Background and Objectives: The purpose of this study was to compare the usefulness of cepstral peak prominence (CPP) with the parameters of the Multi-Dimensional Voice Program (MDVP) in evaluating unilateral vocal fold paralysis patients with subjective voice impairment. Materials and Methods: From July 2014 to August 2016, 37 patients who had been diagnosed with unilateral vocal fold paralysis and had received two or more voice tests before and after the diagnosis were evaluated for maximum phonation time (MPT), MDVP, and CPP. Voice tests were performed with the short vowel /a/ and paragraph reading. Results: CPP-a (CPP with vowel /a/) and CPP-s (CPP with paragraph reading) of the cepstrum were statistically negatively correlated with G, R, B, and A before voice therapy. Jitter, Shimmer, and NHR of the MDVP were positively correlated with G, R, and B. Jitter, Shimmer, and NHR of the MDVP were significantly correlated with the cepstrum index. G, B, and A showed a statistically significant negative correlation with CPP-a and CPP-s, with somewhat higher correlation coefficients between 0.5 and 0.78. On the other hand, among the MDVP indices, only Jitter showed a positive correlation with G and B, with a coefficient of 0.4. Conclusion: CPP can be an important tool for evaluating speech in unilateral vocal cord paralysis when speech energy changes or the cycle is not constant during speech.
