• Title/Summary/Keyword: cepstral

Search Result 293, Processing Time 0.021 seconds

Emotion Recognition using Robust Speech Recognition System (강인한 음성 인식 시스템을 사용한 감정 인식)

  • Kim, Weon-Goo
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.18 no.5
    • /
    • pp.586-591
    • /
    • 2008
  • This paper studied the emotion recognition system combined with robust speech recognition system in order to improve the performance of emotion recognition system. For this purpose, the effect of emotional variation on the speech recognition system and robust feature parameters of speech recognition system were studied using speech database containing various emotions. Final emotion recognition is processed using the input utterance and its emotional model according to the result of speech recognition. In the experiment, robust speech recognition system is HMM based speaker independent word recognizer using RASTA mel-cepstral coefficient and its derivatives and cepstral mean subtraction(CMS) as a signal bias removal. Experimental results showed that emotion recognizer combined with speech recognition system showed better performance than emotion recognizer alone.

Multiple octave-band based genre classification algorithm for music recommendation (음악추천을 위한 다중 옥타브 밴드 기반 장르 분류기)

  • Lim, Shin-Cheol;Jang, Sei-Jin;Lee, Seok-Pil;Kim, Moo-Young
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.15 no.7
    • /
    • pp.1487-1494
    • /
    • 2011
  • In this paper, a novel genre classification algorithm is proposed for music recommendation system. Especially, to improve the classification accuracy, the band-pass filter for octave-based spectral contrast (OSC) feature is designed considering the psycho-acoustic model and actual frequency range of musical instruments. The GTZAN database including 10 genres was used for 10-fold cross validation experiments. The proposed multiple-octave based OSC produces better accuracy by 2.26% compared with the conventional OSC. The combined feature vector based on the proposed OSC and mel-frequency cepstral coefficient (MFCC) gives even better accuracy.

Digital Audio Watermarking in The Cepstrum Domain (켑스트럼 영역에서의 오디오 워터마킹 방법)

  • 이상광;호요성
    • Journal of Broadcast Engineering
    • /
    • v.6 no.1
    • /
    • pp.13-20
    • /
    • 2001
  • In this paper, we propose a new digital audio watermarking scheme In the cepstrum domain. We insert a digital watermark signal Into the cepstral components of the audio signal using a technique analogous to spread spectrum Communications, hiding a narrow band signal in a wade band channel. In our proposed method, we use pseudo-random sequences to watermark the audio signal. The watermark Is then weighted in the cepstrum domain according to the distribution of cepstral coefficients and the frequency masking characteristics of the human auditory system. The proposed watermark embedding scheme minimizes audibility of the watermark signal. and the embedded watermark is robust to mu1tip1e watermarks, MPEG audio ceding and additive noose.

  • PDF

A Study on Speaker Recognition Algorithm Through Wire/Wireless Telephone (유무선 전화를 통한 화자인식 알고리즘에 관한 연구)

  • 김정호;정희석;강철호;김선희
    • The Journal of the Acoustical Society of Korea
    • /
    • v.22 no.3
    • /
    • pp.182-187
    • /
    • 2003
  • In this thesis, we propose the algorithm to improve the performance of speaker verification that is mapping feature parameters by using RBF neural network. There is a big difference between wire vector region and wireless one which comes from the same speaker. For wire/wireless speakers model production, speaker verification system should distinguish the wire/wireless channel that based on speech recognition system. And the feature vector of untrained channel models is mapped to the feature vector(LPC Cepstrum) of trained channel model by using RBF neural network. As a simulation result, the proposed algorithm makes 0.6%∼10.5% performance improvement compared to conventional method such as cepstral mean subtraction.

Implementation of HMM Based Speech Recognizer with Medium Vocabulary Size Using TMS320C6201 DSP (TMS320C6201 DSP를 이용한 HMM 기반의 음성인식기 구현)

  • Jung, Sung-Yun;Son, Jong-Mok;Bae, Keun-Sung
    • The Journal of the Acoustical Society of Korea
    • /
    • v.25 no.1E
    • /
    • pp.20-24
    • /
    • 2006
  • In this paper, we focused on the real time implementation of a speech recognition system with medium size of vocabulary considering its application to a mobile phone. First, we developed the PC based variable vocabulary word recognizer having the size of program memory and total acoustic models as small as possible. To reduce the memory size of acoustic models, linear discriminant analysis and phonetic tied mixture were applied in the feature selection process and training HMMs, respectively. In addition, state based Gaussian selection method with the real time cepstral normalization was used for reduction of computational load and robust recognition. Then, we verified the real-time operation of the implemented recognition system on the TMS320C6201 EVM board. The implemented recognition system uses memory size of about 610 kbytes including both program memory and data memory. The recognition rate was 95.86% for ETRI 445DB, and 96.4%, 97.92%, 87.04% for three kinds of name databases collected through the mobile phones.

Study of Spectral Factorization using Circulant Matrix Factorization to Design the FIR/IIR Lattice Filters (FIR/IIR Lattice 필터의 설계를 위한 Circulant Matrix Factorization을 사용한 Spectral Factorization에 관한 연구)

  • 김상태;박종원
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.7 no.3
    • /
    • pp.437-447
    • /
    • 2003
  • We propose the methods to design the finite impulse response (FIR) and the infinite impulse response (IIR) lattice filters using Schur algorithm through the spectral factorization of the covariance matrix by circulant matrix factorization (CMF). Circulant matrix factorization is also very powerful tool used fur spectral factorization of the covariance polynomial in matrix domain to obtain the minimum phase polynomial without the polynomial root finding problem. Schur algorithm is the method for a fast Cholesky factorization of Toeplitz matrix, which easily determines the lattice filter parameters. Examples for the case of the FIR Inter and for the case of the IIR filter are included, and performance of our method check by comparing of our method and another methods (polynomial root finding and cepstral deconvolution).

A Study on the Algorithm Development for Speech Recognition of Korean and Japanese (한국어와 일본어의 음성 인식을 위한 알고리즘 개발에 관한 연구)

  • Lee, Sung-Hwa;Kim, Hyung-Lae
    • Journal of IKEEE
    • /
    • v.2 no.1 s.2
    • /
    • pp.61-67
    • /
    • 1998
  • In this thesis, experiment have performed with the speaker recognition using multilayer feedforward neural network(MFNN) model using Korean and Japanese digits . The 5 adult males and 5 adult females pronounciate form 0 to 9 digits of Korean, Japanese 7 times. And then, they are extracted characteristics coefficient through Pitch deletion algorithm, LPC analysis, and LPC Cepstral analysis to generate input pattern of MFNN. 5 times among them are used to train a neural network, and 2 times is used to measure the performance of neural network. Both Korean and Japanese, Pitch coefficients is about 4%t more enhanced than LPC or LPC Cepstral coefficients.

  • PDF

Voice Quality of Dysarthric Speakers in Connected Speech (연결발화에서 마비말화자의 음질 특성)

  • Seo, Inhyo;Seong, Cheoljae
    • Phonetics and Speech Sciences
    • /
    • v.5 no.4
    • /
    • pp.33-41
    • /
    • 2013
  • This study investigated the perceptual and cepstral/spectral characteristics of phonation and their relationships in dysarthria in connected speech. Twenty-two participants were divided into two groups; the eleven dysarthric speakers were paired with matching age and gender healthy control participants. A perceptual evaluation was performed by three speech pathologists using the GRBAS scale to measure the cepstrual/spectral characteristics of phonation between the two groups' connected speech. Correlations showed dysarthric speakers scored significantly worse (with a higher rating) with severities in G (overall dysphonia grade), B (breathiness), and S (strain), while the smoothed prominence of the cepstral peak (CPPs) was significantly lower. The CPPs were significantly correlated with the perceptual ratings, including G, B, and S. The utility of CPPs is supported by its high relationship with perceptually rated dysphonia severity in dysarthric speakers. The receiver operating characteristic (ROC) analysis showed that the threshold of 5.08 dB for the CPPs achieved a good classification for dysarthria, with 63.6% sensitivity and the perfect specificity (100%). Those results indicate the CPPs reliably distinguished between healthy controls and dysarthric speakers. However, the CPP frequency (CPP F0) and low-high spectral ratio (L/H ratio) were not significantly different between the two groups.

Classification of Doppler Audio Signals for Moving Target Using Hidden Markov Model in Pulse Doppler Radar (펄스 도플러 레이더에서 HMM을 이용한 이동표적의 도플러 오디오 신호 식별)

  • Sim, Jae-Hun;Lee, Jung-Ho;Bae, Keun-Sung
    • Journal of IKEEE
    • /
    • v.22 no.3
    • /
    • pp.624-629
    • /
    • 2018
  • Classification of moving targets in Pulse Doppler Radar(PDR) for surveillance and reconnaissance purposes is generally carried out based on listening and training experience of Doppler audio signals by radar operator. In this paper, we proposed the automatic classification method to identify the class of moving target with Doppler audio signals using the Mel Frequency Cepstral Coefficients(MFCC) and the Hidden Markov Model(HMM) algorithm which are widely used in speech recognition and the classification performance was analyzed and verified by simulations.

Characteristics of voice quality on clear versus casual speech in individuals with Parkinson's disease (명료발화와 보통발화에서 파킨슨병환자 음성의 켑스트럼 및 스펙트럼 분석)

  • Shin, Hee-Baek;Shim, Hee-Jeong;Jung, Hun;Ko, Do-Heung
    • Phonetics and Speech Sciences
    • /
    • v.10 no.2
    • /
    • pp.77-84
    • /
    • 2018
  • The purpose of this study is to examine the acoustic characteristics of Parkinsonian speech, with respect to different utterance conditions, by employing acoustic/auditory-perceptual analysis. The subjects of the study were 15 patients (M=7, F=8) with Parkinson's disease who were asked to read out sentences under different utterance conditions (clear/casual). The sentences read out by each subject were recorded, and the recorded speech was subjected to cepstrum and spectrum analysis using Analysis of Dysphonia in Speech and Voice (ADSV). Additionally, auditory-perceptual evaluation of the recorded speech was conducted with respect to breathiness and loudness. Results indicate that in the case of clear speech, there was a statistically significant increase in the cepstral peak prominence (CPP), and a decrease in the L/H ratio SD (ratio of low to high frequency spectral energy SD) and CPP F0 SD values. In the auditory-perceptual evaluation, a decrease in breathiness and an increase in loudness were noted. Furthermore, CPP was found to be highly correlated to breathiness and loudness. This provides objective evidence of the immediate usefulness of clear speech intervention in improving the voice quality of Parkinsonian speech.