• Title/Abstract/Keywords: cepstrum coefficients

68 search results (processing time: 0.026 s)

Analysis of Speech Signals Depending on the Microphone and Microphone Distance

  • Son, Jong-Mok
    • The Journal of the Acoustical Society of Korea
    • /
    • Vol. 17, No. 4E
    • /
    • pp.41-47
    • /
    • 1998
  • The microphone is the first link in a speech recognition system. Depending on its type and mounting position, the microphone can significantly distort the spectrum and affect the performance of the speech recognition system. In this paper, the characteristics of the speech signal for different microphones and microphone distances are investigated in both the time and frequency domains. In the time-domain analysis, the average signal-to-noise ratio is measured on the database we collected, for each microphone and microphone distance. Mel-frequency spectral coefficients and the mel-frequency cepstrum are computed to examine the spectral characteristics. The analysis results are discussed together with our findings, and the results of recognition experiments are given.
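
The abstract above does not say how the average signal-to-noise ratio was computed. The following is only a minimal sketch of one common definition (mean speech power over mean noise power, in dB); the function name `snr_db` and the toy frames are assumptions for illustration, not the authors' procedure.

```python
import math

def snr_db(signal_frames, noise_frames):
    """Return 10*log10(mean signal power / mean noise power) in dB."""
    def mean_power(frames):
        # Flatten all frames and average the squared sample values.
        samples = [s for frame in frames for s in frame]
        return sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_power(signal_frames) / mean_power(noise_frames))

# Hypothetical toy data: speech samples have 10x the amplitude of noise samples.
speech = [[1.0, -1.0, 1.0, -1.0]]
noise = [[0.1, -0.1, 0.1, -0.1]]
print(round(snr_db(speech, noise), 1))
```

In practice the speech and noise frames would come from a voice-activity segmentation of each recording, which is outside this sketch.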

A Study on EMG Functional Recognition Using Reduced-Connection Network

  • 조정호;최윤호
    • The Korean Society of Medical and Biological Engineering: Journal of Biomedical Engineering Research
    • /
    • Vol. 11, No. 2
    • /
    • pp.249-256
    • /
    • 1990
  • In this study, LPC cepstrum coefficients extracted from an AR model of the EMG signal are used as the feature vector, and a reduced-connection network, which has fewer connections between nodes, is constructed to classify and recognize EMG functional classes. The proposed network reduces learning time and improves system stability. It is therefore shown that the proposed network is appropriate for recognizing the function of EMG signals.
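
As a rough illustration of how LPC cepstrum coefficients can be derived from AR (LPC) coefficients, the sketch below uses the standard recursion for the all-pole model H(z) = 1 / (1 - Σ a_k z^-k). The sign convention, the function name `lpc_to_cepstrum`, and the example coefficient are assumptions; the abstract does not specify which variant the authors used.

```python
def lpc_to_cepstrum(a, n_ceps):
    """Convert LPC coefficients to cepstrum coefficients c_1..c_n_ceps.

    a[k-1] holds a_k for the model H(z) = 1 / (1 - sum_k a_k z^-k).
    The gain term c_0 is not computed here.
    """
    p = len(a)
    c = []
    for n in range(1, n_ceps + 1):
        # Direct term a_n exists only while n <= p.
        acc = a[n - 1] if n <= p else 0.0
        # Recursive term: sum over k of (k/n) * c_k * a_{n-k}, with 1 <= n-k <= p.
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k - 1] * a[n - k - 1]
        c.append(acc)
    return c

# Sanity check on a first-order model, where c_n should equal a^n / n.
print(lpc_to_cepstrum([0.5], 3))
```

For a first-order model the cepstrum has the closed form c_n = a^n / n, which makes a convenient check on the recursion.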

Speaker-Dependent Emotion Recognition For Audio Document Indexing

  • Hung LE Xuan;QUENOT Georges;CASTELLI Eric
    • The Institute of Electronics Engineers of Korea: Conference Proceedings
    • /
    • The Institute of Electronics Engineers of Korea, 2004 ICEIC The International Conference on Electronics Informations and Communications
    • /
    • pp.92-96
    • /
    • 2004
  • Research on emotions is currently of great interest in speech processing as well as in the human-machine interaction domain. In recent years, more and more research on emotion synthesis and emotion recognition has been carried out for different purposes. Each approach uses its own methods and its own parameters measured on the speech signal. In this paper, we propose using a short-time parameter, MFCC (Mel-Frequency Cepstrum Coefficients), and a simple but efficient classification method, Vector Quantization (VQ), for speaker-dependent emotion recognition. Many other features (energy, pitch, zero crossing, phonetic rate, LPC...) and their derivatives are also tested and combined with the MFCC coefficients in order to find the best combination. Other models, GMM and HMM (Discrete and Continuous Hidden Markov Models), are studied as well, in the hope that the use of continuous distributions and the temporal behaviour of this set of features will improve the quality of emotion recognition. The maximum accuracy in recognizing five different emotions exceeds $88\%$ when using only MFCC coefficients with the VQ model. This is a simple but efficient approach; the result is even much better than that obtained on the same database in a human evaluation by listening and judging, with no permission to go back and no comparison between sentences [8]. This result also compares favourably with the other approaches.
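
The VQ classification idea described above can be sketched as follows: each emotion has its own codebook, and an utterance is assigned to the emotion whose codebook quantizes its feature frames with the lowest average distortion. This is a minimal sketch under assumptions: the codebooks are taken as already trained, and the labels, two-dimensional vectors, and function names are hypothetical stand-ins for real MFCC frames.

```python
def avg_distortion(frames, codebook):
    """Mean squared Euclidean distance from each frame to its nearest codeword."""
    total = 0.0
    for x in frames:
        total += min(sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in codebook)
    return total / len(frames)

def classify(frames, codebooks):
    """Return the label whose codebook gives the smallest average distortion."""
    return min(codebooks, key=lambda label: avg_distortion(frames, codebooks[label]))

# Hypothetical pre-trained codebooks (real ones would hold MFCC codewords).
codebooks = {
    "neutral": [[0.0, 0.0], [1.0, 0.0]],
    "anger": [[5.0, 5.0], [6.0, 4.0]],
}
utterance = [[4.8, 5.1], [5.9, 4.2]]
print(classify(utterance, codebooks))
```

Codebook training (for example with k-means or the LBG algorithm) is left out to keep the sketch short.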

Automatic Detection of Cow's Oestrus in Audio Surveillance System

  • Chung, Y.;Lee, J.;Oh, S.;Park, D.;Chang, H.H.;Kim, S.
    • Asian-Australasian Journal of Animal Sciences
    • /
    • Vol. 26, No. 7
    • /
    • pp.1030-1037
    • /
    • 2013
  • Early detection of anomalies is an important issue in the management of group-housed livestock. In particular, failure to detect oestrus in a timely and accurate way can become a limiting factor in achieving efficient reproductive performance. Although a rich variety of methods has been introduced for the detection of oestrus, a more accurate and practical method is still required. In this paper, we propose an efficient data mining solution for the detection of oestrus, using the sound data of Korean native cows (Bos taurus coreanae). In this method, we extract the mel frequency cepstrum coefficients from the sound data, apply feature dimension reduction, and use support vector data description as an early anomaly detector. Our experimental results show that this method can detect oestrus both economically (even with a cheap microphone) and accurately (over 94% accuracy), either as a standalone solution or to complement known methods.

Spectrum Representation Based on LPC Cepstral VQ for Low Bit Rate CELP Coder

  • 정재호
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • Vol. 19, No. 4
    • /
    • pp.761-771
    • /
    • 1994
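  • This paper examines how to represent spectral information effectively when a CELP speech coder is used in speech communication environments that require very low bit rates. Specifically, the LPC parameters that carry the spectral information are converted to the cepstrum, and a method for efficiently vector-quantizing the resulting LPC cepstrum coefficients is presented. To design the codebook used for vector quantization, three cepstral distance measures with different meanings in the frequency domain were tried, and the performance of each was analyzed. Simulations show that the proposed LPC cepstral vector quantization scheme can represent spectral information very effectively.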

HMM-based Speech Recognition using FSVQ, Fuzzy Concept and Doubly Spectral Feature

  • 정의봉
    • Journal of the Korea Computer Industry Society
    • /
    • Vol. 5, No. 4
    • /
    • pp.491-502
    • /
    • 2004
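  • This paper studies speaker-independent isolated word recognition and proposes an HMM (hidden Markov model) that uses FSVQ (first section vector quantization), fuzzy theory, and doubly spectral features. In the proposed method, the LPC cepstrum and the regression coefficients of the LPC cepstrum are used as the two feature parameters. The training data are divided into several sections. After a codebook is built for the first section, multiple observation sequences are obtained from that codebook by introducing a fuzzy concept, in descending order of probability value. The observation sequences of the first section are then trained, and the word whose probability value is obtained in the same way is recognized. In addition to recognition experiments with the proposed method, experiments with other methods were carried out under the same conditions and on the same data for comparison. The experimental results show that the proposed method achieves a higher recognition rate than the other methods.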
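
The abstract does not give the formula for the regression coefficients of the LPC cepstrum. A commonly used definition is the delta (linear regression) coefficient over a window of ±K frames, sketched below; the window size `K=2`, the edge handling (repeating the first and last frame), and the function name `delta` are assumptions, not details from the paper.

```python
def delta(frames, K=2):
    """Regression (delta) coefficients of a sequence of feature vectors.

    d_t = sum_{k=1..K} k * (c_{t+k} - c_{t-k}) / (2 * sum_{k=1..K} k^2),
    with frames beyond the edges replaced by the first or last frame.
    """
    T = len(frames)
    denom = 2 * sum(k * k for k in range(1, K + 1))
    out = []
    for t in range(T):
        row = []
        for d in range(len(frames[0])):
            num = 0.0
            for k in range(1, K + 1):
                nxt = frames[min(T - 1, t + k)][d]   # clamp at the last frame
                prev = frames[max(0, t - k)][d]      # clamp at the first frame
                num += k * (nxt - prev)
            row.append(num / denom)
        out.append(row)
    return out

# A linear ramp has slope 1 per frame, so interior deltas should be 1.0.
ramp = [[0.0], [1.0], [2.0], [3.0], [4.0]]
print(delta(ramp)[2])
```

Stacking each cepstrum vector with its delta vector gives a two-part feature of the kind the abstract calls a doubly spectral feature.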

Performance Comparison of Automatic Detection of Laryngeal Diseases by Voice

  • 강현민;김수미;김유신;김형순;조철우;양병곤;왕수건
    • The Korean Society of Phonetic Sciences and Speech Technology: Malsori
    • /
    • No. 45
    • /
    • pp.35-45
    • /
    • 2003
  • Laryngeal diseases cause significant changes in the quality of speech production. Automatic detection of laryngeal diseases by voice is attractive because of its nonintrusive nature. In this paper, we apply speech recognition techniques to the detection of laryngeal cancer, and investigate which feature parameters and classification methods are appropriate for this purpose. Linear Predictive Cepstral Coefficients (LPCC) and Mel-Frequency Cepstral Coefficients (MFCC) are examined as feature parameters, and parameters reflecting the periodicity of speech and its perturbation are also considered. As classifiers, multilayer perceptron neural networks and Gaussian Mixture Models (GMM) are employed. According to our experiments, higher-order LPCC combined with the periodic information parameters yields the best performance.

HMM-Based Automatic Speech Recognition using EMG Signal

  • Lee Ki-Seung
    • The Korean Society of Medical and Biological Engineering: Journal of Biomedical Engineering Research
    • /
    • Vol. 27, No. 3
    • /
    • pp.101-109
    • /
    • 2006
  • It is known that there is a strong relationship between human voices and the movements of the articulatory facial muscles. In this paper, we use this knowledge to implement an automatic speech recognition scheme that relies solely on surface electromyogram (EMG) signals. The EMG signals were acquired from three articulatory facial muscles. As a preliminary study, 10 Korean digits were used as the recognition vocabulary. Various feature parameters, including filter bank outputs, linear predictive coefficients, and cepstrum coefficients, were evaluated to find the appropriate parameters for EMG-based speech recognition. The sequence of EMG signals for each word is modelled within a hidden Markov model (HMM) framework. A continuous word recognition approach was investigated in this work. Hence, the model for each word is obtained by concatenating the subword models, and embedded re-estimation techniques were employed in the training stage. The findings indicate that such a system may be able to recognize speech with an accuracy of up to 90% when the mel-filter bank output is used as the feature parameter for recognition.

Effective Combination of Temporal Information and Linear Transformation of Feature Vector in Speaker Verification

  • 서창우;조미화;임영환;전성채
    • Phonetics and Speech Sciences
    • /
    • Vol. 1, No. 4
    • /
    • pp.127-132
    • /
    • 2009
  • The feature vectors used in conventional speaker recognition (SR) systems may be strongly correlated with their neighbors. To improve SR performance, many researchers have adopted linear transformation methods such as principal component analysis (PCA). In general, the linear transformation of the feature vectors is based on the concatenated form of the static features and their dynamic features. However, a linear transformation based on both the static features and their dynamic features is more complex than one based on the static features alone, because of the higher order of the features. To overcome these problems, we propose an efficient method that applies a linear transformation and temporal information of the features to reduce complexity and improve performance in speaker verification (SV). The proposed method first performs a linear transformation with PCA coefficients. The delta parameters for temporal information are then obtained from the transformed features. The proposed method requires a covariance matrix only 1/4 the size of the one needed when the static and dynamic features are combined for the PCA coefficients. Also, the delta parameters are extracted from the linearly transformed features after the dimension of the static features has been reduced. Compared with the PCA and conventional methods in terms of equal error rate (EER) in SV, the proposed method shows better performance while requiring less storage space and lower complexity.
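
The 1/4 figure quoted above follows from simple counting if the dynamic (delta) features are assumed to have the same dimension as the static features: a d-dimensional static vector needs a d × d covariance matrix, while the concatenated 2d-dimensional vector needs a 2d × 2d one. The sketch below checks that arithmetic; the dimension 12 is a hypothetical value, not one taken from the paper.

```python
def covariance_entries(dim):
    """Number of entries in a dim x dim covariance matrix."""
    return dim * dim

static_dim = 12                      # hypothetical static feature dimension
concat_dim = 2 * static_dim          # static + delta, assumed equal in size

ratio = covariance_entries(static_dim) / covariance_entries(concat_dim)
print(ratio)
```

The ratio is independent of the chosen dimension, since d² / (2d)² = 1/4 for any d.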

A Study on the Algorithm Development for Speech Recognition of Korean and Japanese

  • 이성화;김병래
    • Journal of IKEEE (Institute of Korean Electrical and Electronics Engineers)
    • /
    • Vol. 2, No. 1
    • /
    • pp.61-67
    • /
    • 1998
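  • In this study, recognition experiments on Korean and Japanese spoken digits were carried out using a multilayer feedforward neural network (MFNN) model. Five male and five female Korean speakers each pronounced the ten digits from 0 to 9 seven times, and two of those utterances were used in the recognition experiments. Pitch coefficients, linear prediction coefficients, and linear prediction cepstrum coefficients extracted from these speech data were each fed to the neural network as input patterns, and the recognition performance was measured. In both the Korean and the Japanese experiments, using the pitch coefficients gave about 4% better performance than using the other coefficients.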
