• Title/Summary/Keyword: Speech Feature Analysis


The Pattern Recognition Methods for Emotion Recognition with Speech Signal (음성신호를 이용한 감성인식에서의 패턴인식 방법)

  • Park Chang-Hyeon;Sim Gwi-Bo
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2006.05a / pp.347-350 / 2006
  • In this paper, we apply several pattern recognition algorithms to a speech-based emotion recognition system and compare the results. First, emotional speech databases are required, and the speech features used for emotion recognition are determined in the database analysis step. Second, recognition algorithms are applied to these speech features. The algorithms we try are an artificial neural network, Bayesian learning, Principal Component Analysis, and the LBG algorithm. The performance gap among these methods is then presented in the experimental results section. Emotion recognition is not yet a mature technique: the selection of emotion features and the selection of a relevant classification method are both still open to dispute. We therefore hope this paper can serve as a reference in those disputes.
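Of the algorithms listed in this abstract, the LBG algorithm is the easiest to sketch compactly. The following is a minimal illustration of codebook training by splitting, not the authors' implementation: the function names, the additive split size `eps`, the iteration count, and the toy two-dimensional "feature vectors" are all assumptions made for the example.

```python
def nearest(codebook, v):
    """Index of the codeword closest to v (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], v)))


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def lbg(data, size, eps=0.01, iters=20):
    """Grow a codebook by repeated splitting; size should be a power of two."""
    codebook = [centroid(data)]
    while len(codebook) < size:
        # Split every codeword into two slightly shifted copies.
        codebook = [[c + s * eps for c in cw]
                    for cw in codebook for s in (+1, -1)]
        # Lloyd refinement: assign each vector, then recompute centroids.
        for _ in range(iters):
            cells = [[] for _ in codebook]
            for v in data:
                cells[nearest(codebook, v)].append(v)
            codebook = [centroid(cell) if cell else cw
                        for cell, cw in zip(cells, codebook)]
    return codebook


# Hypothetical feature vectors forming two well-separated groups.
data = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0),
        (10.0, 10.1), (9.8, 10.0), (10.2, 9.9)]
codebook = sorted(lbg(data, size=2))
```

In a recognizer of this kind, one codebook would typically be trained per emotion class, and a test utterance assigned to the class whose codebook gives the lowest average distortion. That usage is a common convention, not something the abstract states.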


Statistical Extraction of Speech Features Using Independent Component Analysis and Its Application to Speaker Identification

  • 장길진;오영환
    • The Journal of the Acoustical Society of Korea / v.21 no.4 / pp.156-156 / 2002
  • We apply independent component analysis (ICA) for extracting an optimal basis to the problem of finding efficient features for representing speech signals of a given speaker. The speech segments are assumed to be generated by a linear combination of the basis functions, thus the distribution of speech segments of a speaker is modeled by adapting the basis functions so that each source component is statistically independent. The learned basis functions are oriented and localized in both space and frequency, bearing a resemblance to Gabor wavelets. These features are speaker-dependent characteristics, and to assess their efficiency we performed speaker identification experiments and compared our results with the conventional Fourier basis. Our results show that the proposed method is more efficient than the conventional Fourier-based features in that it obtains a higher speaker identification rate.
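The linear generative model described in this abstract can be written in the usual ICA notation. The symbols below are assumptions chosen for illustration and do not come from the paper: x is a speech segment, a_i are the basis functions (columns of A), s_i are the source coefficients, and W is the learned unmixing matrix.

```latex
\mathbf{x} = \sum_{i=1}^{N} s_i \,\mathbf{a}_i = A\mathbf{s},
\qquad
\mathbf{u} = W\mathbf{x}, \quad W \approx A^{-1},
\qquad
p(\mathbf{u}) \approx \prod_{i=1}^{N} p_i(u_i)
```

Adapting the basis then amounts to adjusting W so that the components of u are as statistically independent as possible.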

Independent Component Analysis on a Subband Domain for Robust Speech Recognition (음성의 특징 단계에 독립 요소 해석 기법의 효율적 적용을 통한 잡음 음성 인식)

  • Park, Hyeong-Min;Jeong, Ho-Yeong;Lee, Tae-Won;Lee, Su-Yeong
    • Journal of the Institute of Electronics Engineers of Korea CI / v.37 no.6 / pp.22-31 / 2000
  • In this paper, we propose a method for removing noise components in the feature extraction process for robust speech recognition. This method is based on blind separation using independent component analysis (ICA). Given two noisy speech recordings, the algorithm linearly separates speech from the unwanted noise signal. To apply ICA as closely as possible to the feature level for recognition, a new spectral analysis is presented. It modifies the computation of band energies by first averaging out fast Fourier transform (FFT) points in several divided ranges within one mel-scaled band. A simple analysis using sample variances of band energies of speech and noise, together with recognition experiments, showed its noise robustness. For noisy speech signals recorded in real environments, the proposed method, which applies ICA to the new spectral analysis, improved recognition performance to a considerable extent and was particularly effective for low signal-to-noise ratios (SNRs). This method gives some insights into applying ICA to feature levels and appears useful for robust speech recognition.


Recognition of Korean Connected Digit Telephone Speech Using the Training Data Based Temporal Filter (훈련데이터 기반의 temporal filter를 적용한 4연숫자 전화음성 인식)

  • Jung, Sung-Yun;Bae, Keun-Sung
    • MALSORI / no.53 / pp.93-102 / 2005
  • The performance of a speech recognition system is generally degraded in a telephone environment because of distortions caused by background noise and various channel characteristics. In this paper, data-driven temporal filters are investigated to improve the performance of a specific recognition task such as telephone speech. Three different temporal filtering methods are presented with recognition results for Korean connected-digit telephone speech. Filter coefficients are derived from the cepstral domain feature vectors using principal component analysis. According to the experimental results, the proposed temporal filtering method performs slightly better than the previous ones.


An Analysis Method of Strange Attractor for the Feature Extraction (음성 특징 추출을 위한 스트레인지 어트랙터의 분석 방법)

  • Kim, Tae-Sik
    • Speech Sciences / v.9 no.2 / pp.147-155 / 2002
  • In speech processing, raw signals have usually been presented in a 2D format. Such presentations limit the characteristics that can be extracted from the signal, and in general not much information can be detected from a 2D signal. The strange attractor from chaos theory provides a 3D presentation method. In recognition problems the signal presentation is very important, because good features can be detected from a good presentation. This paper discusses a new feature extraction method that extracts features from one cycle of the strange attractor. A neural network is used to check whether the method extracts suitable features. The results are favorable and suggest that the method can be applied to some areas of signal processing.
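A common way to obtain a 3D attractor picture from a one-dimensional signal is time-delay embedding. The abstract does not say how the attractor is built, so the sketch below is an assumption: the function name `delay_embed`, the delay of 12 samples, and the toy sinusoid standing in for a voiced frame are all hypothetical.

```python
import math


def delay_embed(signal, delay, dim=3):
    """Return points (x[n], x[n + delay], ..., x[n + (dim - 1) * delay])."""
    span = (dim - 1) * delay
    return [tuple(signal[n + k * delay] for k in range(dim))
            for n in range(len(signal) - span)]


# Toy periodic signal standing in for one voiced frame (hypothetical data).
signal = [math.sin(2 * math.pi * n / 50) for n in range(200)]
points = delay_embed(signal, delay=12)
```

Each tuple in `points` is one position on the reconstructed trajectory. For a periodic input the trajectory closes on itself after one period, which is one plausible reading of "a cycle of the strange attractor".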


A Study on the Classification of the Korean Consonants in the VCV Speech Chain (VCV 연쇄음성상에 존재하는 한국어 자음의 분류에 관한 연구)

  • 최윤석;김기석;김원준;황희영
    • The Transactions of the Korean Institute of Electrical Engineers / v.39 no.6 / pp.607-615 / 1990
  • In this paper, we propose experimental models to classify the consonants in the Vowel-Consonant-Vowel (VCV) speech chain into four phonemic groups: nasals, liquids, plosives, and others. To classify fuzzy patterns such as speech, it is necessary to analyze the distribution of the acoustic features of many training data. The classification rules are polynomial functions of at most fourth order obtained by regression analysis, which contribute collectively to the result. The final result shows a success rate of about 87% on data spoken by one man.

Speech signal processing in the auditory system (청각 계통에서의 음성신호처리)

  • 이재혁;심재성;백승화;박상희
    • Proceedings of the Institute of Control, Robotics and Systems Conference (제어로봇시스템학회:학술대회논문집) / 1987.10b / pp.680-683 / 1987
  • Speech signal processing in the auditory system can be analyzed based on two representations: average discharge rate and temporal discharge pattern. However, the average discharge rate representation is restricted by a narrow dynamic range because of rate saturation and the two-tone suppression phenomena, and the temporal discharge pattern representation needs a sophisticated frequency analysis and synchrony measure. In this paper, a simple representation is proposed: using a model that considers the interaction of cochlear fluid and basilar membrane (BM) movement together with a hair cell model, the features of speech signals (formant frequency and pitch of vowels) are easily estimated in the Average Synchronized Rate.


A Study on Korean Speech Analysis using Walsh Transform (Walsh변환을 이용한 한국어 숫자음 음성분석에 관한 연구)

  • 김계현;김준현
    • The Transactions of the Korean Institute of Electrical Engineers / v.37 no.4 / pp.251-256 / 1988
  • This work describes a speech analysis of Korean numbers ('1'-'10') spoken by several speakers, using the Fast Walsh Transform (FWHT) method. FWHT involves only addition and subtraction operations, and is therefore faster and needs less memory than FFT (Fast Fourier Transform) or LPC (Linear Predictive Coding) analysis. We investigated whether the FWHT method can find speaker-independent features (features that give the same cue for a word regardless of the speaker). In the experiment, 70% of the utterances of the same word (Korean number '2') spoken by several speakers had similar patterns.
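The claim that the transform needs only additions and subtractions is easy to see in code. The following is a minimal sketch of the in-place butterfly form of the fast Walsh-Hadamard transform in natural (Hadamard) ordering, without normalization. The ordering, the function name `fwht`, and the sample input are assumptions for the example; the paper may use a sequency-ordered variant.

```python
def fwht(x):
    """Fast Walsh-Hadamard transform, natural order, unnormalized.

    The input length must be a power of two.
    """
    a = list(x)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                # Butterfly: one addition and one subtraction, no multiplies.
                u, v = a[i], a[i + h]
                a[i], a[i + h] = u + v, u - v
        h *= 2
    return a


coeffs = fwht([1, 0, 1, 0, 0, 1, 1, 0])
restored = [c // 8 for c in fwht(coeffs)]  # the transform is its own inverse up to 1/n
```

For a frame of n samples the loops perform n log2(n) additions and subtractions in total, which is the source of the speed and memory advantage the abstract mentions.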


Development of an Optimized Feature Extraction Algorithm for Throat Signal Analysis

  • Jung, Young-Giu;Han, Mun-Sung;Lee, Sang-Jo
    • ETRI Journal / v.29 no.3 / pp.292-299 / 2007
  • In this paper, we present a speech recognition system using a throat microphone. The use of this kind of microphone minimizes the impact of environmental noise. Due to the absence of high frequencies and the partial loss of formant frequencies, previous systems using throat microphones have shown a lower recognition rate than systems which use standard microphones. To develop a high performance automatic speech recognition (ASR) system using only a throat microphone, we propose two methods. First, based on Korean phonological feature theory and a detailed throat signal analysis, we show that it is possible to develop an ASR system using only a throat microphone, and propose conditions of the feature extraction algorithm. Second, we optimize the zero-crossing with peak amplitude (ZCPA) algorithm to guarantee the high performance of the ASR system using only a throat microphone. For ZCPA optimization, we propose an intensification of the formant frequencies and a selection of cochlear filters. Experimental results show that this system yields a performance improvement of about 4% and a reduction in time complexity of 25% when compared to the performance of a standard ZCPA algorithm on throat microphone signals.
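The core idea of zero-crossing with peak amplitude (ZCPA) can be sketched for a single channel. This is a simplified illustration of the standard technique, not the optimized algorithm from the paper: it omits the cochlear filterbank, and the function name, the bin edges, and the test tone are assumptions. Each interval between successive upward zero crossings yields a frequency estimate, and the peak amplitude inside that interval, log-compressed, is added to the matching frequency bin.

```python
import math


def zcpa_histogram(signal, fs, band_edges):
    """Accumulate log-compressed peak amplitudes into frequency bins."""
    # Indices where the signal crosses zero going upward.
    ups = [n for n in range(1, len(signal))
           if signal[n - 1] < 0 <= signal[n]]
    hist = [0.0] * (len(band_edges) - 1)
    for a, b in zip(ups, ups[1:]):
        freq = fs / (b - a)        # inverse of the crossing interval
        peak = max(signal[a:b])    # peak amplitude inside the interval
        for k in range(len(hist)):
            if band_edges[k] <= freq < band_edges[k + 1]:
                hist[k] += math.log(1.0 + peak)
                break
    return hist


# Hypothetical input: a 100 Hz tone sampled at 8 kHz, with a small phase
# offset so that no sample falls exactly on zero.
fs = 8000
tone = [math.sin(2 * math.pi * 100 * n / fs + 0.3) for n in range(800)]
hist = zcpa_histogram(tone, fs, band_edges=[0, 50, 150, 400, 1000])
```

In a full ZCPA front end this procedure would run on the output of every filter in the bank and the per-channel histograms would be summed. Choosing which filters to keep is the "selection of cochlear filters" that the abstract refers to.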


Variation Analysis of Feature Parameters According to the Channel Distortion of Korean Telephone Digit Speech (한국어 숫자음 전화음성의 채널왜곡에 따른 특징파라미터의 변이 분석)

  • 정성윤;손종목;김민성;배건성
    • Proceedings of the IEEK Conference / 2002.06d / pp.191-194 / 2002
  • The ultimate aim of this paper is to enhance the speech recognition rate in a telephone environment that is matched between training data and test data. To analyze the effect of the distortion of the telephone channel, which changes on every call, MFCC is used as the feature parameter, and CMN, RTCN, and RASTA are used as channel compensation techniques. For each case, the variation of the feature parameters of all phones is analyzed. We also obtain recognition rates for each compensation method using a continuous HMM recognizer and examine the relationship between variation and recognition rate.
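Of the compensation techniques named in this abstract, cepstral mean normalization (CMN) is the simplest to illustrate. A fixed linear channel multiplies the spectrum, so it adds a constant vector to every cepstral frame, and subtracting the per-utterance mean removes that constant. The sketch below uses hypothetical two-coefficient frames and a made-up channel offset; it is not the paper's experimental setup.

```python
def cmn(frames):
    """Subtract the per-utterance mean from each cepstral coefficient."""
    n = len(frames)
    means = [sum(col) / n for col in zip(*frames)]
    return [[c - m for c, m in zip(frame, means)] for frame in frames]


# Hypothetical 3-frame, 2-coefficient utterance and a fixed channel offset.
clean = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]]
offset = [0.5, -1.5]
distorted = [[c + o for c, o in zip(frame, offset)] for frame in clean]

normalized_clean = cmn(clean)
normalized_distorted = cmn(distorted)
```

After normalization the two utterances coincide up to rounding, which is why CMN suppresses call-to-call channel variation when the channel is stationary over the utterance.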
