• Title/Abstract/Keywords: cepstral

293 results found (processing time: 0.027 s)

Analysis of Voice Quality Features and Their Contribution to Emotion Recognition

  • 이정인;최정윤;강홍구
    • 방송공학회논문지 / Vol. 18, No. 5 / pp.771-774 / 2013
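  • This study examines the relationship between emotional states and voice quality characteristics, and additionally performs emotion recognition by combining voice quality features with cepstral features. Features including open quotient, harmonic-to-noise ratio, spectral tilt, and spectral sharpness were applied to capture voice quality, along with the commonly used prosodic features based on pitch and energy. The validity of each feature vector was examined through ANOVA, and the final emotion recognition performance was analyzed using sequential forward selection. As a result, the proposed features improved performance, and errors were reduced for anger and joy in particular. Recognition performance also improved when the voice quality features were combined with cepstral features.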

A Cepstral Analysis of Breathy Voice with Vocal Fold Paralysis

  • 강영애;성철재
    • 말소리와 음성과학 / Vol. 4, No. 2 / pp.89-94 / 2012
  • The aim of this study is to investigate the usefulness of the parameters CPP (cepstral peak prominence) and LTAS (long-term average spectrum) band energy for the analysis of breathy voice with vocal fold paralysis. Thirty-four female subjects who had vocal fold paralysis after thyroidectomy participated in this study. According to the perceptual judgements of three speech pathologists and one phonetic scholar, the subjects were divided into two groups: a breathy voice group (n = 21) and a non-breathy voice group (n = 13). A maximum sustained phonation task was recorded for acoustic analysis. CPP-related (i.e. mean F0, mean CPP, and mean CPPs) and LTAS-related (i.e. minimum, maximum, and mean) parameters were used, and an independent samples t-test was conducted. Regarding CPP, there are significant differences in mean CPP and mean CPPs between groups: the values of mean CPP and CPPs in the non-breathy voice group are higher than those in the breathy voice group. CPP can therefore be regarded as a useful parameter for breathy voice analysis in the clinic. As for LTAS, the energy from 0 to 2 kHz is significantly different between groups. The minimum value of the non-breathy group is lower than that of the breathy group, whereas the maximum value of the non-breathy group is higher. The frequency band below 2 kHz seems to be related to breathy voice.
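As a rough illustration of what a CPP measurement involves, the sketch below computes a simplified, CPP-like value for a single frame: the cepstral peak in the expected pitch range minus a regression line fitted over the same quefrency range. The function name `cpp_and_f0`, the 60 to 300 Hz search range, the window, and the synthetic test signals are all assumptions made for this example; the abstract does not state how the study computed CPP or CPPs.

```python
import numpy as np

def cpp_and_f0(frame, fs, f0_min=60.0, f0_max=300.0):
    """Return (CPP-like value in dB, F0 estimate in Hz) for one frame.

    Simplified illustration: the frame must be longer than 2 * fs / f0_min samples.
    """
    windowed = frame * np.hanning(len(frame))
    log_spectrum = np.log(np.abs(np.fft.fft(windowed)) + 1e-12)
    # Cepstrum magnitude in dB; index n corresponds to quefrency n / fs seconds.
    cepstrum_db = 20.0 * np.log10(np.abs(np.fft.ifft(log_spectrum)) + 1e-12)

    lo = int(np.ceil(fs / f0_max))   # shortest pitch period searched (samples)
    hi = int(np.floor(fs / f0_min))  # longest pitch period searched (samples)
    quefrencies = np.arange(lo, hi + 1)
    segment = cepstrum_db[lo:hi + 1]
    peak = lo + int(np.argmax(segment))

    # Regression line over the search range; CPP is the peak height above it.
    slope, intercept = np.polyfit(quefrencies, segment, 1)
    cpp = cepstrum_db[peak] - (slope * peak + intercept)
    return float(cpp), fs / peak

# Toy signals (assumed, not clinical data): a strongly periodic tone complex
# versus white noise standing in for an aperiodic, noise-dominated signal.
fs = 16000
t = np.arange(1024) / fs
rng = np.random.default_rng(0)
periodic = sum(np.sin(2 * np.pi * 200 * k * t) / k for k in range(1, 40))
periodic = periodic + 0.01 * rng.standard_normal(t.size)
noise = rng.standard_normal(t.size)

cpp_periodic, f0_periodic = cpp_and_f0(periodic, fs)
cpp_noise, _ = cpp_and_f0(noise, fs)
print(cpp_periodic, f0_periodic, cpp_noise)
```

The periodic signal should yield the larger value, which is consistent in direction with the abstract's finding that the non-breathy group has higher CPP.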

Design of a Quantization Algorithm of the Speech Feature Parameters for the Distributed Speech Recognition

  • 이준석;윤병식;강상원
    • 한국음향학회지 / Vol. 24, No. 4 / pp.217-223 / 2005
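  • In this paper, we propose a BC-TCQ quantizer with a predictive structure for quantizing the mel-cepstral coefficients used in distributed speech recognition systems. To design an efficient mel-cepstral coefficient quantizer for distributed speech recognition, a first-order AR prediction filter that exploits the high correlation between adjacent frames is applied. The prediction error vector produced by the prediction filter is then quantized using BC-TCQ. The proposed predictive BC-TCQ mel-cepstral coefficient quantizer performs much better in terms of cepstral distortion (CD) than the split VQ mel-cepstral coefficient quantization scheme used in the ETSI standard for distributed speech recognition, and it is also more favorable in encoding complexity and memory requirements.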
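The predictive structure described in this abstract can be sketched as a closed-loop first-order predictor followed by a quantizer for the prediction error. In the sketch below a plain uniform scalar quantizer stands in for BC-TCQ, which is not implemented here; the function name `predictive_quantize`, the prediction coefficient, the step size, and the synthetic AR(1) data are assumptions for illustration only.

```python
import numpy as np

def predictive_quantize(frames, a=0.9, step=0.1):
    """Closed-loop first-order predictive quantization of feature vectors.

    frames: array of shape (n_frames, n_coeffs).
    A uniform scalar quantizer is used as a stand-in for BC-TCQ.
    """
    indices = np.zeros(frames.shape, dtype=int)
    recon = np.zeros(frames.shape, dtype=float)
    prev = np.zeros(frames.shape[1])  # previous reconstructed frame
    for t, x in enumerate(frames):
        pred = a * prev                              # first-order AR prediction
        idx = np.round((x - pred) / step).astype(int)  # quantize prediction error
        prev = pred + idx * step                     # what the decoder rebuilds
        indices[t] = idx
        recon[t] = prev
    return indices, recon

# Synthetic, strongly frame-correlated "cepstral" trajectories (assumed data).
rng = np.random.default_rng(0)
frames = np.zeros((200, 12))
for t in range(1, 200):
    frames[t] = 0.95 * frames[t - 1] + 0.1 * rng.standard_normal(12)

step = 0.05
indices, recon = predictive_quantize(frames, a=0.95, step=step)
direct = np.round(frames / step).astype(int)  # quantizing without prediction

print("max reconstruction error:", np.max(np.abs(frames - recon)))
print("mean |index| predictive :", np.abs(indices).mean())
print("mean |index| direct     :", np.abs(direct).mean())
```

Because the predictor runs on reconstructed frames, the encoder and decoder stay in sync, and the reconstruction error is bounded by half a quantizer step. The smaller index magnitudes in the predictive case show why exploiting inter-frame correlation lowers the bit rate needed for a given distortion.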

A 3-Level Endpoint Detection Algorithm for Isolated Speech Using Time and Frequency-based Features

  • Eng, Goh Kia;Ahmad, Abdul Manan
    • 제어로봇시스템학회: Conference Proceedings / 제어로봇시스템학회 ICCAS 2004 / pp.1291-1295 / 2004
  • This paper proposes a new approach to endpoint detection for isolated speech that significantly improves detection performance. The proposed algorithm relies on the root mean square (RMS) energy, the zero crossing rate, and the spectral characteristics of the speech signal, where a Euclidean distance measure over cepstral coefficients is used to detect the endpoints of isolated speech accurately. The algorithm offers better performance than the traditional energy-based algorithm. The vocabulary for the experiment consists of the English digits from one to nine, and the experiments were conducted on 360 utterances from a male speaker. Experimental results show that the accuracy of the algorithm is quite acceptable. Moreover, the computational overhead of the algorithm is low, since the cepstral coefficients are reused later in the feature extraction stage of the speech recognition procedure.
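The three kinds of features named in this abstract can be computed per frame as in the sketch below. It only extracts the features and does not reproduce the paper's three-level decision logic, which the abstract does not describe. The frame sizes, the use of the first five frames as a background reference, and the synthetic signal are assumptions.

```python
import numpy as np

def frame_features(signal, frame_len=256, hop=128, n_ceps=12):
    """Per-frame RMS energy, zero crossing rate, and low-order real cepstrum."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    rms = np.empty(n_frames)
    zcr = np.empty(n_frames)
    ceps = np.empty((n_frames, n_ceps))
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len]
        rms[i] = np.sqrt(np.mean(frame ** 2))
        # Fraction of adjacent sample pairs whose sign differs.
        zcr[i] = np.mean(np.abs(np.diff(np.sign(frame))) > 0)
        log_mag = np.log(np.abs(np.fft.rfft(frame * window)) + 1e-10)
        ceps[i] = np.fft.irfft(log_mag)[1 : n_ceps + 1]
    return rms, zcr, ceps

# Synthetic utterance (assumed): background noise, a tonal burst, background noise.
fs = 8000
rng = np.random.default_rng(0)
t = np.arange(1600) / fs
burst = sum(0.5 * np.sin(2 * np.pi * f * t) for f in (300, 600, 900))
signal = 0.01 * rng.standard_normal(4800)
signal[1600:3200] += burst

rms, zcr, ceps = frame_features(signal)
# Euclidean cepstral distance from a background template (first five frames).
background = ceps[:5].mean(axis=0)
cep_dist = np.linalg.norm(ceps - background, axis=1)
print(np.round(rms, 3))
print(np.round(cep_dist, 2))
```

Frames inside the burst show both higher RMS energy and a larger cepstral distance from the background template, so a detector could threshold either quantity or combine them.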

Speech/Music Discrimination Using Mel-Cepstrum Modulation Energy

  • 김봉완;최대림;이용주
    • 대한음성학회지:말소리 / No. 64 / pp.89-103 / 2007
  • In this paper, we introduce mel-cepstrum modulation energy (MCME) as a feature for discriminating speech from music. MCME is a mel-cepstrum domain extension of modulation energy (ME): MCME is extracted from the time trajectory of mel-frequency cepstral coefficients, while ME is based on the spectrum. As cepstral coefficients are mutually uncorrelated, we expect MCME to perform better than ME. To find the best modulation frequency for MCME, we perform experiments with modulation frequencies from 4 Hz to 20 Hz. To show the effectiveness of the proposed feature, we compare its discrimination accuracy with the results obtained from ME and cepstral flux.

Quantitative Measure of Speaker Specific Information in Human Voice: From the Perspective of Information Theoretic Approach

  • Kim Samuel;Seo Jung Tae;Kang Hong Goo
    • The Journal of the Acoustical Society of Korea / Vol. 24, No. 1E / pp.16-20 / 2005
  • A novel scheme for measuring the speaker information in a speech signal is proposed. We develop a theory for the quantitative measurement of speaker characteristics from an information-theoretic point of view and connect it to the classification error rate. Features based on homomorphic analysis, such as mel-frequency cepstral coefficients (MFCC), linear prediction cepstral coefficients (LPCC), and linear frequency cepstral coefficients (LFCC), are studied by computing mutual information to measure the speaker-specific information contained in each feature set. The theory and experimental results provide a quantitative measure of the speaker information in a speech signal.
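To make the idea of measuring speaker information with mutual information concrete, the sketch below uses a simple plug-in (histogram) estimate of I(speaker; feature) for a discretized scalar feature. This is a generic estimator chosen for illustration, not the estimation procedure of the paper; the function name `mutual_information` and the toy data are assumptions.

```python
import numpy as np

def mutual_information(labels, symbols):
    """Plug-in estimate of I(labels; symbols) in bits from paired discrete samples."""
    labels = np.asarray(labels).ravel()
    symbols = np.asarray(symbols).ravel()
    _, li = np.unique(labels, return_inverse=True)
    _, si = np.unique(symbols, return_inverse=True)
    # Joint histogram of (label, symbol) pairs, normalized to a probability table.
    joint = np.zeros((li.max() + 1, si.max() + 1))
    np.add.at(joint, (li, si), 1.0)
    joint /= joint.sum()
    p_label = joint.sum(axis=1, keepdims=True)
    p_symbol = joint.sum(axis=0, keepdims=True)
    nonzero = joint > 0
    ratio = joint[nonzero] / (p_label @ p_symbol)[nonzero]
    return float(np.sum(joint[nonzero] * np.log2(ratio)))

# Toy example (assumed): two "speakers" whose scalar feature distributions overlap.
rng = np.random.default_rng(0)
speakers = np.repeat([0, 1], 5000)
feature = rng.standard_normal(10000) + 1.5 * speakers
quantized = np.digitize(feature, bins=np.linspace(-3.0, 4.5, 16))

print("I(speaker; feature) in bits:", mutual_information(speakers, quantized))
print("fully informative feature  :", mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))
print("uninformative feature      :", mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))
```

With two equally likely speakers the estimate is bounded above by 1 bit, and a feature set carrying more speaker information sits closer to that bound.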

Development of Software for Machinery Diagnostics by Adaptive Noise Cancelling Method (1st: Cepstrum Analysis)

  • Lee, Jung-Chul;Oh, Jae-Eung;Yum, Sung-Ha
    • 제어로봇시스템학회: Conference Proceedings / 제어로봇시스템학회 1987 Korea Automatic Control Conference Proceedings (Korea-Japan Joint Session); 한국과학기술대학, Chungnam; 16-17 Oct. 1987 / pp.836-841 / 1987
  • Many condition monitoring techniques have been studied. This study investigates the possibility of tracking fault trends in ball bearings, one of the elements of rotating machinery, by applying cepstral analysis combined with the adaptive noise cancelling (ANC) method. A computer simulation is conducted in order to clarify the physical meaning of ANC. The optimal adaptation gain of the adaptive filter is estimated, and the performance of ANC as a function of the signal-to-noise ratio and the convergence of the LMS algorithm are examined by simulation. It is verified that cepstral analysis using the ANC method is more effective than conventional cepstral analysis in bearing fault diagnosis.

Sound Reinforcement Based on Context Awareness for Hearing Impaired

  • 최재훈;장준혁
    • 대한전자공학회논문지SP / Vol. 48, No. 5 / pp.109-114 / 2011
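  • In this paper, we propose a sound reinforcement algorithm for the hearing impaired that uses acoustic data and is built on a context awareness system based on a Gaussian Mixture Model (GMM). Mel-Frequency Cepstral Coefficient (MFCC) feature vectors are extracted from the acoustic signal to build the GMM, and sound reinforcement is applied when the context awareness result indicates a dangerous sound. Experimental results show that the proposed context-awareness-based sound reinforcement algorithm performs well in a variety of acoustic environments.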

Speech Quality Measure in a Mobile Communication System using PLP Cepstral Distance with CMS

  • 윤종진;박상욱;박영철;안동순;윤대희
    • 한국통신학회논문지 / Vol. 25, No. 12B / pp.2046-2051 / 2000
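  • In this paper, we propose PLP-CMS (Perceptual Linear Predictive-Cepstral Mean Subtraction), a new speech quality measure that not only outperforms existing measures but also performs consistently for speech signals carried over various channel paths. PLP-CMS, which can effectively predict the subjective quality of speech signals in a CDMA PCS mobile telephone environment, raises the correlation with subjective quality by using perceptual linear predictive (PLP) analysis, and the cepstral mean subtraction (CMS) step was confirmed to give consistent performance regardless of the PSTN path.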

Robust Speech Recognition Using Real-Time High Order Statistics Normalization and Smoothing Filter

  • 정주현;송화전;김형순
    • 대한음성학회: Conference Proceedings / 대한음성학회 2005 Spring Conference Proceedings / pp.91-94 / 2005
  • The performance of speech recognition is degraded by the mismatch between training and test environments. Many methods have been presented to compensate for additive noise and channel effects in the cepstral domain, and Cepstral Mean Subtraction (CMS) is the representative method among them. Recently, a high-order cepstral moment normalization method has been introduced to improve recognition accuracy. In this paper, we apply the high-order moment normalization method and a smoothing filter for real-time processing. In experiments using the Aurora2 DB, we obtained an error rate reduction of 49.7% with the proposed algorithm in comparison with the baseline system.
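CMS, the baseline mentioned in this abstract, amounts to subtracting the per-utterance mean of each cepstral coefficient, which removes any constant offset that a stationary channel adds in the cepstral domain. The sketch below shows CMS together with mean and variance normalization as the simplest step up in moment order. It is a batch illustration only: the paper's real-time high-order moment normalization and smoothing filter are not reproduced, and the function names and random data are assumptions.

```python
import numpy as np

def cepstral_mean_subtraction(cepstra):
    """Subtract the per-coefficient mean over time; cepstra is (n_frames, n_coeffs)."""
    return cepstra - cepstra.mean(axis=0, keepdims=True)

def mean_variance_normalization(cepstra, eps=1e-8):
    """Normalize the first two moments of each coefficient trajectory."""
    centered = cepstral_mean_subtraction(cepstra)
    return centered / (centered.std(axis=0, keepdims=True) + eps)

# Assumed toy data: random "clean" cepstra and a fixed channel offset per coefficient.
rng = np.random.default_rng(0)
clean = rng.standard_normal((300, 13))
channel_offset = rng.standard_normal(13)
observed = clean + channel_offset  # a stationary channel adds a constant in the cepstral domain

print(np.allclose(cepstral_mean_subtraction(observed),
                  cepstral_mean_subtraction(clean)))
normalized = mean_variance_normalization(3.0 * observed)
print(np.round(normalized.mean(axis=0)[:3], 6), np.round(normalized.std(axis=0)[:3], 6))
```

After CMS the offset and clean versions coincide, which is the sense in which the method compensates for channel mismatch. Variance normalization additionally removes a per-coefficient scale factor.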
