• Title/Summary/Keyword: Noise speech data


A Study on a Model Parameter Compensation Method for Noise-Robust Speech Recognition (잡음환경에서의 음성인식을 위한 모델 파라미터 변환 방식에 관한 연구)

  • Chang, Yuk-Hyeun;Chung, Yong-Joo;Park, Sung-Hyun;Un, Chong-Kwan
    • The Journal of the Acoustical Society of Korea / v.16 no.5 / pp.112-121 / 1997
  • In this paper, we study model parameter compensation methods for noise-robust speech recognition. Compensation is performed sentence by sentence, and no other information is used. Parallel model combination (PMC), a well-known model parameter compensation algorithm, is implemented and used as a reference for performance comparison. We also propose a modified PMC method that tunes the model parameters with an association factor controlling the average variability of the Gaussian mixtures and the variability of a single Gaussian mixture per state, for more robust modeling. We obtain a re-estimation solution for the environmental variables based on the expectation-maximization (EM) algorithm in the cepstral domain. To evaluate the model compensation methods, we perform experiments on speaker-independent isolated word recognition. The noise sources are white Gaussian noise and driving car noise. Corrupted speech is obtained by adding noise to clean speech at various signal-to-noise ratios (SNRs). The noise mean and variance are modeled from 3 frames of noise data. In the experiments, the vector Taylor series (VTS) approach outperforms the other methods. The zero-order VTS approach resembles the modified PMC method in that it adapts only the mean vector, yet its recognition rate is higher than that of the PMC and modified PMC methods based on the log-normal approximation.

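The abstract above describes corrupting clean speech by adding noise at various SNRs. A minimal sketch of that step follows, assuming the SNR is defined over the whole utterance as the ratio of average powers; the function name `add_noise_at_snr` and the synthetic test signals are illustrative and do not come from the paper.

```python
import numpy as np

def add_noise_at_snr(clean, noise, snr_db):
    """Return clean + scaled noise so that the mixture has the requested SNR (dB)."""
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noise, dtype=float)
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[: len(clean)]
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # Choose the gain so that clean_power / (gain**2 * noise_power) == 10**(snr_db / 10).
    gain = np.sqrt(clean_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return clean + gain * noise

# Illustrative signals only: a sine tone stands in for speech, white Gaussian noise for the corruption.
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
white = rng.standard_normal(16000)
noisy = add_noise_at_snr(clean, white, snr_db=10.0)
```

Scaling the noise rather than the speech keeps the clean signal level fixed across SNR conditions, which is one common convention; the paper does not state which convention it used.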

A Study on the Room Acoustics in Churches (교회 건축물의 실내음향 특성에 관한 연구)

  • 주진수
    • Journal of KSNVE / v.9 no.4 / pp.681-686 / 1999
  • In a church, speech intelligibility is very important, together with reverberance for musical activities. To obtain basic data for the acoustical design of churches, existing records were consulted and churches in Europe and Japan were measured. Based on the measurements, the churches were then evaluated by subjective listening tests. The results show that the room acoustics of churches differ from country to country and that a reverberation time of two seconds was preferred for speech intelligibility. For music, however, longer echoes were preferred, although individual differences were observed.


The Effect of the Telephone Channel to the Performance of the Speaker Verification System (전화선 채널이 화자확인 시스템의 성능에 미치는 영향)

  • 조태현;김유진;이재영;정재호
    • The Journal of the Acoustical Society of Korea / v.18 no.5 / pp.12-20 / 1999
  • In this paper, we compare speaker verification performance on speech data collected in a clean environment and over a telephone channel. To improve verification performance on channel speech, we study feature parameters and preprocessing that are effective in the channel environment. The speech database for the experiments consists of Korean number doublets, chosen with a text-prompted system in mind. The speech features analyzed are LPCC (Linear Predictive Cepstral Coefficients), MFCC (Mel-Frequency Cepstral Coefficients), PLP (Perceptual Linear Prediction), and LSP (Line Spectrum Pairs). Filtering as a preprocessing step to remove channel noise is also studied. To remove or compensate for the channel effect in the extracted features, cepstral weighting, CMS (Cepstral Mean Subtraction), and RASTA (RelAtive SpecTrAl) processing are applied. We also report speech recognition performance for each feature and processing method and compare it with speaker verification performance. HTK (HMM Tool Kit) 2.0 is used to evaluate the features and processing methods. Using different thresholds for male and female speakers, we compare the EER (Equal Error Rate) on the clean speech data and the channel data. Our simulation results show that the best speaker verification performance in terms of EER is achieved by removing low-band and high-band channel noise with a band-pass filter (150-3800 Hz) in preprocessing and extracting MFCC from the filtered speech.

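Of the channel compensation techniques named in the abstract above, CMS is the simplest to illustrate: a stationary convolutional channel appears as an approximately constant additive offset in the cepstral domain, so subtracting the per-utterance mean of each coefficient removes it. The sketch below assumes a feature matrix of shape (frames, coefficients); the random "cepstra" and the offset are synthetic stand-ins, not data from the paper.

```python
import numpy as np

def cepstral_mean_subtraction(cepstra):
    """Subtract the per-utterance mean of each cepstral coefficient.

    cepstra: array of shape (n_frames, n_coefficients).
    """
    cepstra = np.asarray(cepstra, dtype=float)
    return cepstra - cepstra.mean(axis=0, keepdims=True)

# Synthetic illustration: the same features with and without a constant channel offset.
rng = np.random.default_rng(0)
clean_cepstra = rng.standard_normal((200, 13))
channel_offset = np.linspace(-1.0, 1.0, 13)      # hypothetical channel effect
channel_cepstra = clean_cepstra + channel_offset

normalized_clean = cepstral_mean_subtraction(clean_cepstra)
normalized_channel = cepstral_mean_subtraction(channel_cepstra)
```

After normalization the two feature sets coincide, which is why CMS helps when training and test data pass through different channels.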

Distribution of the Slopes of Autocovariances of Speech Signals in Frequency Bands (음성 신호의 주파수 대역별 자기 공분산 기울기 분포)

  • Kim, Seonil
    • Journal of the Korea Institute of Information and Communication Engineering / v.17 no.5 / pp.1076-1082 / 2013
  • We identify the frequency bands that maximize the slopes of the autocovariances of speech signals in the frequency domain, in order to make it easier to separate speech signals from background noise. A speech signal is divided into blocks, each containing multiple samples, and each block is transformed to the frequency domain using the Fast Fourier Transform (FFT). The autocovariance coefficients within the blocks of a given frequency band are fitted with a linear equation by linear regression. The slope of this linear equation, called the slope of autocovariance, varies from band to band according to the characteristics of the speech signal. Using 200 speech files from a male speaker, the slopes of autocovariances are analyzed and compared from band to band.
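The processing chain in this abstract (block, FFT, autocovariance within a band, linear regression, slope) can be sketched as follows. The abstract does not say exactly how the autocovariance coefficients are formed, so this sketch makes assumptions: the autocovariance is taken over the FFT magnitudes of the bins inside one band of one block, for lags 0 to `max_lag`, and the line is fitted against the lag index. The names `autocovariance`, `autocovariance_slope`, and the band limits are hypothetical.

```python
import numpy as np

def autocovariance(x, max_lag):
    """Biased sample autocovariance of x for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    n = len(centered)
    return np.array([np.dot(centered[: n - k], centered[k:]) / n
                     for k in range(max_lag + 1)])

def autocovariance_slope(block, band, max_lag=8):
    """Slope of a least-squares line fitted to the band's autocovariance coefficients.

    band: (low_bin, high_bin) indices into the one-sided FFT (assumed convention).
    """
    spectrum = np.abs(np.fft.rfft(block))
    low_bin, high_bin = band
    acov = autocovariance(spectrum[low_bin:high_bin], max_lag)
    slope, _intercept = np.polyfit(np.arange(max_lag + 1), acov, 1)
    return slope

# Illustrative block of random samples; real use would pass a block of speech samples.
rng = np.random.default_rng(0)
block = rng.standard_normal(256)
slope = autocovariance_slope(block, band=(10, 60))
```

Comparing this slope across several bands, and across speech and noise blocks, would mirror the band-by-band analysis the abstract describes.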

Robust Multi-channel Wiener Filter for Suppressing Noise in Microphone Array Signal (마이크로폰 어레이 신호의 잡음 제거를 위한 강인한 다채널 위너 필터)

  • Jung, Junyoung;Kim, Gibak
    • Journal of Broadcast Engineering / v.23 no.4 / pp.519-525 / 2018
  • This paper deals with noise suppression for multi-channel data captured by a microphone array, using a multi-channel Wiener filter. The multi-channel Wiener filter does not rely on information about the direction of the target speech and can be partitioned into an MVDR (Minimum Variance Distortionless Response) spatial filter and a single-channel spectral filter. The acoustic transfer function between the single speech source and the microphones can be estimated by subspace decomposition of the multi-channel Wiener filter. Errors in estimating the correlation matrices cause errors in the estimated acoustic transfer function, which in turn result in speech distortion in the MVDR filter. To alleviate this distortion, diagonal loading is applied. In the experiments, a database recorded with seven microphones was used, and the MFCC distance was measured to demonstrate the effectiveness of diagonal loading.
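Diagonal loading in an MVDR filter can be illustrated with the textbook form w = (R + δI)⁻¹d / (dᴴ(R + δI)⁻¹d), where R is the noise correlation matrix and d the acoustic transfer function (steering) vector. This is a generic sketch for one frequency bin, not the paper's implementation: scaling δ by the average diagonal power, the random covariance, and the idealized steering vector are all assumptions made here.

```python
import numpy as np

def mvdr_weights(noise_cov, steering, loading=0.0):
    """MVDR weights with optional diagonal loading for a single frequency bin."""
    noise_cov = np.asarray(noise_cov, dtype=complex)
    steering = np.asarray(steering, dtype=complex)
    n_mics = noise_cov.shape[0]
    # Loading level relative to the average diagonal power (an assumed convention).
    delta = loading * np.real(np.trace(noise_cov)) / n_mics
    loaded_cov = noise_cov + delta * np.eye(n_mics)
    r_inv_d = np.linalg.solve(loaded_cov, steering)
    return r_inv_d / (steering.conj() @ r_inv_d)

# Synthetic seven-microphone example with a covariance estimated from few snapshots.
rng = np.random.default_rng(0)
n_mics = 7
snapshots = rng.standard_normal((n_mics, 50)) + 1j * rng.standard_normal((n_mics, 50))
noise_cov = snapshots @ snapshots.conj().T / 50
steering = np.exp(-1j * np.pi * 0.3 * np.arange(n_mics))   # hypothetical transfer function

w_plain = mvdr_weights(noise_cov, steering)
w_loaded = mvdr_weights(noise_cov, steering, loading=0.1)
```

Both weight vectors keep unit response toward `steering`, while the loaded one has a smaller norm and is therefore less sensitive to errors in the estimated covariance and transfer function.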

The relevancy between Physical index and subjective appraisal of classrooms (강의실 내의 물리지표와 주관적 평가와의 상관관계)

  • Lee, Chai-Bong;Kim, Yong-Man
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference / 2002.11b / pp.743-748 / 2002
  • The ultimate purpose of this research is to establish optimum standards for the acoustic environment using not only physical characteristics but also subjective appraisals. We measured the basic physical data needed to establish such standards in campus buildings: TSP was used to measure sound levels, reverberation times, clearness indexes, and the speech transmission index. In addition, questionnaires were given to university students to collect subjective appraisals, for instance questions about the volume or clearness of lectures. The relevance between the physical characteristics and the subjective appraisals was then studied.


Performance Improvement of Speech Recognition Based on Independent Component Analysis (독립성분분석법을 이용한 음성인식기의 성능향상)

  • 김창근;한학용;허강인
    • Proceedings of the Korea Institute of Convergence Signal Processing / 2001.06a / pp.285-288 / 2001
  • In this paper, we propose a new speech feature extraction method using ICA (Independent Component Analysis), which minimizes the dependency and correlation among speech signals in order to separate each component in the speech signal. ICA removes redundancy in the data after finding the axis direction with the greatest variance in the input dimension. Training and recognition experiments using HMMs verified that ICA features improve speech recognition performance compared with conventional mel-cepstrum features. We also observed that ICA mitigated the decline in recognition performance caused by environmental noise.


A Discrete Feature Vector for Endpoint Detection of Speech with Hidden Markov Model (숨은마코프모형을 이용하는 음성 끝점 검출을 위한 이산 특징벡터)

  • Lee, Jei-Ky;Oh, Chang-Hyuck
    • The Korean Journal of Applied Statistics / v.21 no.6 / pp.959-967 / 2008
  • The purpose of this paper is to propose a discrete feature vector for detecting speech segments that is robust across various noise levels and computationally inexpensive, and to demonstrate these properties with real speech data. The proposed feature is a one-dimensional vector that represents the slope of short-term energies and is discretized into three values to reduce the computational burden in the HMM. In experiments with speech data, the method using the proposed feature vector performed well even in noisy environments.
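A three-valued feature built from the slope of short-term energies could look like the sketch below. The abstract gives neither the frame size nor the quantization rule, so several details are assumptions: the slope is taken as the difference between consecutive frame log-energies, and a symmetric threshold maps it to 0 (falling), 1 (flat), or 2 (rising). All names and parameter values are hypothetical.

```python
import numpy as np

def frame_log_energy(signal, frame_len=256, hop=128):
    """Log of the short-term energy of each frame."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop: i * hop + frame_len] for i in range(n_frames)])
    return np.log(np.sum(frames ** 2, axis=1) + 1e-10)

def discrete_energy_slope(log_energy, threshold=0.5):
    """Quantize the frame-to-frame energy slope into three symbols.

    0 = falling, 1 = flat, 2 = rising (assumed coding; threshold is illustrative).
    """
    slope = np.diff(log_energy)
    symbols = np.ones(len(slope), dtype=int)
    symbols[slope > threshold] = 2
    symbols[slope < -threshold] = 0
    return symbols

# Synthetic example: low-level noise with a louder burst standing in for speech.
rng = np.random.default_rng(0)
signal = 0.01 * rng.standard_normal(4096)
signal[1536:2560] += np.sin(2 * np.pi * 0.05 * np.arange(1024))
symbols = discrete_energy_slope(frame_log_energy(signal))
```

A discrete HMM over such symbols needs only a three-entry emission table per state, which is consistent with the low computational cost the abstract emphasizes.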

A Study on the Features for Building Korean Digit Recognition System Based on Multilayer Perceptron (다층 퍼셉트론에 기반한 한국어 숫자음 인식시스템 구현을 위한 특징 연구)

  • 김인철;김대영
    • Journal of Korea Society of Industrial Information Systems / v.6 no.4 / pp.81-88 / 2001
  • In this paper, a Korean digit recognition system based on a multilayer perceptron is implemented. We also investigate the performance of widely used speech features, such as Mel-scale filterbank, MFCC, LPCC, and PLP coefficients, by applying them as inputs to the proposed recognition system. To build a robust speech system, experiments are carried out on both clean and corrupted data. In experiments on recognizing 20 Korean digits, we found that the Mel-scale filterbank coefficients perform best in terms of recognition accuracy on both the speaker-dependent and speaker-independent databases, even when considerable noise is added.


A Data-Driven Jacobian Adaptation Method for the Noisy Speech Recognition (잡음음성인식을 위한 데이터 기반의 Jacobian 적응방식)

  • Chung Young-Joo
    • The Journal of the Acoustical Society of Korea / v.25 no.4 / pp.159-163 / 2006
  • In this paper, we propose a data-driven method to improve the performance of Jacobian adaptation (JA) for noisy speech recognition. Instead of constructing the reference HMM with a model composition method such as parallel model combination (PMC), we propose training the reference HMM directly on noisy speech. This is motivated by the idea that a directly trained reference HMM will model the acoustic variations due to noise better than a composite HMM. The Baum-Welch algorithm is employed during training to estimate the Jacobian matrices. Recognition experiments show that the proposed method improves on conventional Jacobian adaptation as well as other model compensation methods.