• Title/Summary/Keyword: Noisy Speech


The Development of a Speech Recognition Method Robust to Channel Distortions and Noisy Environments for an Audio Response System (ARS) (잡음환경및 채널왜곡에 강인한 ARS용 전화음성인식 방식 연구)

  • Ahn, Jung-Mo;Yim, Kye-Jong;Kay, Young-Chul;Koo, Myoung-Wan
    • The Journal of the Acoustical Society of Korea
    • /
    • v.16 no.2
    • /
    • pp.41-48
    • /
    • 1997
  • This paper proposes methods for improving the recognition rate of an ARS equipped with speech recognition capability. Telephone speech, the input to the ARS, is usually affected by the announcements from the system, channel noise, and channel distortion, so directly applying a recognition algorithm developed for clean speech to noisy telephone speech brings significant performance degradation. To cope with this problem, this paper proposes three methods: 1) accurate detection of the instant at which speech input begins, so that the system's announcement can be turned off immediately at that instant; 2) effective end-point detection of the noisy telephone speech on the basis of Teager energy; and 3) SDCN-based compensation of the channel distortion. Experiments on speaker-independent noisy telephone speech reveal that the combination of the three proposed methods greatly improves the recognition rate over the conventional method, achieving about 77% in contrast to only 23%.

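
The Teager-energy-based end-point detection mentioned in the abstract can be sketched as follows. The frame size, threshold factor, noise-floor estimate, and test signal are illustrative assumptions, not values from the paper:

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]**2 - x[n-1]*x[n+1]."""
    x = np.asarray(x, dtype=float)
    psi = np.zeros_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    return psi

def speech_frames(x, frame=160, factor=3.0):
    """Flag frames whose mean Teager energy exceeds factor * noise floor."""
    te = teager_energy(x)
    n_frames = len(te) // frame
    frame_te = te[: n_frames * frame].reshape(n_frames, frame).mean(axis=1)
    noise_floor = np.median(frame_te)      # crude noise-floor estimate
    return frame_te > factor * noise_floor

# A 1 kHz tone embedded in weak noise at 8 kHz sampling (illustrative).
rng = np.random.default_rng(0)
t = np.arange(8000) / 8000.0
x = 0.01 * rng.standard_normal(8000)
x[3200:4800] += np.sin(2 * np.pi * 1000.0 * t[3200:4800])
active = speech_frames(x)
```

Because the Teager operator tracks both amplitude and frequency of the signal, frame-averaged Teager energy rises sharply at speech onsets even when conventional energy is masked by noise, which is what makes it attractive for end-point detection.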

Noisy Speech Recognition Based on Spectral Mapping Techniques (스펙트럼사상기법을 기초로 한 잡음음성인식)

  • Lee, Ki-Young
    • The Journal of the Acoustical Society of Korea
    • /
    • v.14 no.1E
    • /
    • pp.39-45
    • /
    • 1995
  • This paper presents a noisy speech recognition method based on the spectral mapping technique of speaker adaptation. In the presented method, spectral mapping training reduces the spectral distortion of noisy speech, and, for more accurate spectral mapping, the slope of the adjustment window is adapted to the various word lengths. Recognition experiments show that the recognition rate is higher than that of the conventional method using VQ and DTW without noise processing. Even at an SNR of 0 dB, the recognition rate is 10 times that of the conventional method. It is confirmed that the speaker adaptation technique using spectral mapping training can improve recognition performance for noisy speech.

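
The conventional VQ/DTW baseline that the abstract compares against rests on dynamic time warping to align noisy input frames with reference templates. A minimal DTW (without the paper's adjustment window or adaptive slope, which are not reproduced here) looks like this:

```python
import numpy as np

def dtw(ref, test):
    """Minimal dynamic time warping between two feature sequences.

    Returns the accumulated distance and the warping path that maps
    test frames onto reference frames.
    """
    n, m = len(ref), len(test)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(ref[i - 1] - test[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],
                                 cost[i, j - 1],
                                 cost[i - 1, j - 1])
    # Backtrack the optimal alignment.
    path, i, j = [], n, m
    while (i, j) != (0, 0):
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return cost[n, m], path[::-1]

ref = np.array([[0.0], [1.0], [2.0]])
dist, path = dtw(ref, ref)
```

Spectral mapping training would average noisy/clean spectral pairs along such an alignment path; constraining the path with a word-length-adaptive window, as the paper proposes, keeps the alignment from drifting on short or long words.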

A Data-Driven Jacobian Adaptation Method for the Noisy Speech Recognition (잡음음성인식을 위한 데이터 기반의 Jacobian 적응방식)

  • Chung Young-Joo
    • The Journal of the Acoustical Society of Korea
    • /
    • v.25 no.4
    • /
    • pp.159-163
    • /
    • 2006
  • In this paper, a data-driven method to improve the performance of Jacobian adaptation (JA) for noisy speech recognition is proposed. Instead of constructing the reference HMM by a model composition method like parallel model combination (PMC), we propose to train the reference HMM directly on the noisy speech. This was motivated by the idea that the directly trained reference HMM will model the acoustic variations due to the noise better than the composite HMM. For the estimation of the Jacobian matrices, the Baum-Welch algorithm is employed during training. Recognition experiments show the improved performance of the proposed method over Jacobian adaptation as well as other model compensation methods.
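
The first-order idea behind Jacobian adaptation can be sketched on a single mean vector, assuming log-spectral features and a diagonal Jacobian; the paper itself works with HMM parameters and Baum-Welch-estimated matrices, so the vectors below are purely illustrative:

```python
import numpy as np

# Jacobian adaptation approximates the noisy-speech mean as
#   mu_target ≈ mu_ref + J (n_target - n_ref),
# where J = dC(s, n)/dn evaluated at the reference noise.  For
# log-spectral features, C(s, n) = log(exp(s) + exp(n)) and the
# Jacobian is diagonal with entries exp(n) / (exp(s) + exp(n)).

def combine(s, n):
    """Log-spectral combination of clean speech s and noise n."""
    return np.log(np.exp(s) + np.exp(n))

def jacobian(s, n_ref):
    """Diagonal of dC/dn at the reference noise."""
    return np.exp(n_ref) / (np.exp(s) + np.exp(n_ref))

s = np.array([2.0, 1.0, 0.5])        # clean-speech log spectrum (illustrative)
n_ref = np.array([0.0, 0.0, 0.0])    # reference-environment noise
n_new = np.array([0.2, -0.1, 0.3])   # noise in the target environment

mu_ref = combine(s, n_ref)
mu_adapted = mu_ref + jacobian(s, n_ref) * (n_new - n_ref)
mu_exact = combine(s, n_new)
```

The approximation is accurate only near the reference noise, which is why the choice of reference model matters; the paper's contribution is to estimate that reference directly from noisy training speech rather than composing it with PMC.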

SPEECH ENHANCEMENT BY FREQUENCY-WEIGHTED BLOCK LMS ALGORITHM

  • Cho, D.H.
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1985.10a
    • /
    • pp.87-94
    • /
    • 1985
  • In this paper, enhancement of speech corrupted by additive white or colored noise is studied. The unconstrained frequency-domain block least-mean-square (UFBLMS) adaptation algorithm and its frequency-weighted version are newly applied to speech enhancement. For enhancement of speech degraded by white noise, the performance of the UFBLMS algorithm is superior to the spectral subtraction method or the Wiener filtering technique by more than 3 dB in segmental frequency-weighted signal-to-noise ratio (FWSNRSEG) when the SNR of the speech is in the range of 0 to 10 dB. For enhancement of noisy speech corrupted by colored noise, the UFBLMS algorithm is superior to the spectral subtraction method by about 3 to 5 dB in FWSNRSEG. It also yields better performance, by about 2 dB in FWSNR and FWSNRSEG, than the time-domain least-mean-square (TLMS) adaptive prediction filter (APF). In view of computational complexity and the improvement in speech quality and intelligibility, the frequency-weighted UFBLMS algorithm appears to yield the best performance among the various algorithms in enhancing noisy speech corrupted by white or colored noise.

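
A simplified frequency-domain block LMS in the overlap-save form can be sketched as below. This is not the paper's algorithm: the per-bin power normalization, step size, and the 3-tap identification demo are assumptions added to keep the sketch stable and self-contained, and the frequency weighting is omitted:

```python
import numpy as np

def fd_block_lms(x, d, L=32, mu=0.5, eps=1e-6):
    """Power-normalized frequency-domain block LMS (overlap-save).

    x : input signal, d : desired signal, L : block length.
    Returns the filter output, block by block.
    """
    N = 2 * L
    W = np.zeros(N, dtype=complex)            # frequency-domain weights
    prev = np.zeros(L)                        # previous block (overlap)
    y_out = np.zeros(len(d))
    for b in range(len(x) // L):
        blk = x[b * L:(b + 1) * L]
        X = np.fft.fft(np.concatenate([prev, blk]))
        prev = blk
        y = np.real(np.fft.ifft(W * X))[L:]   # last L samples are valid
        e = d[b * L:(b + 1) * L] - y
        E = np.fft.fft(np.concatenate([np.zeros(L), e]))
        W += mu * np.conj(X) * E / (np.abs(X) ** 2 + eps)  # normalized update
        y_out[b * L:(b + 1) * L] = y
    return y_out

# Identify an unknown 3-tap FIR system from its input/output (illustrative).
rng = np.random.default_rng(1)
x = rng.standard_normal(4096)
h = np.array([0.5, -0.3, 0.1])
d = np.convolve(x, h)[: len(x)]
y = fd_block_lms(x, d)
```

The "unconstrained" variant in the paper skips the gradient constraint (zeroing the second half of the time-domain weight update), trading a small amount of circular-convolution leakage for roughly half the FFTs per block.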

Optimized Wiener Filter for Noise Reduction in VoIP Environments (VoIP 환경에서의 잡음제거를 위한 최적화된 위너 필터)

  • Jeong, Sang-Bae;Lee, Sung-Doke;Hahn, Min-Soo
    • MALSORI
    • /
    • no.64
    • /
    • pp.105-119
    • /
    • 2007
  • Noise reduction technologies are indispensable for achieving acceptable speech quality in VoIP systems. This paper proposes a Wiener filter optimized to the estimated SNR of noisy speech for noise reduction in VoIP environments. The proposed noise canceller is applied as a pre-processor before speech encoding. The performance of the proposed method is evaluated by PESQ in various noisy conditions. In this paper, the proposed algorithm is applied to G.711, G.723.1, and G.729A, which are all VoIP speech codecs. The PESQ results show that our proposed noise reduction scheme outperforms the noise suppression in the IS-127 EVRC and the ETSI standard for the advanced distributed speech recognition front-end.

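
The core of any SNR-driven Wiener filter is the per-bin gain G = SNR/(1 + SNR). A minimal single-frame sketch, with an assumed power-subtraction SNR estimate and gain floor (the paper's optimization of the gain to the estimated SNR is not reproduced):

```python
import numpy as np

def wiener_gain(noisy_psd, noise_psd, floor=0.1):
    """Per-bin Wiener gain G = snr / (1 + snr), snr by power subtraction.

    `floor` bounds the attenuation (an assumed value, not from the paper).
    """
    snr = np.maximum(noisy_psd / np.maximum(noise_psd, 1e-12) - 1.0, 0.0)
    return np.maximum(snr / (1.0 + snr), floor)

def denoise_frame(frame, noise_psd):
    """Apply the gain in the frequency domain and resynthesize."""
    spec = np.fft.rfft(frame)
    gain = wiener_gain(np.abs(spec) ** 2, noise_psd)
    return np.fft.irfft(gain * spec, n=len(frame))

# Illustrative demo: a tone at an exact FFT bin plus white noise, with the
# noise PSD estimated from noise-only frames.
rng = np.random.default_rng(2)
n = 256
clean = np.sin(2 * np.pi * 8 * np.arange(n) / n)
noise_psd = np.mean(
    np.abs(np.fft.rfft(0.3 * rng.standard_normal((100, n)), axis=1)) ** 2,
    axis=0)
noisy = clean + 0.3 * rng.standard_normal(n)
denoised = denoise_frame(noisy, noise_psd)
```

As a pre-processor before a codec such as G.729A, such a filter runs frame by frame on the signal the encoder is about to consume, which is why its quality is naturally scored with PESQ on the decoded output.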

Feature Vector Processing for Speech Emotion Recognition in Noisy Environments (잡음 환경에서의 음성 감정 인식을 위한 특징 벡터 처리)

  • Park, Jeong-Sik;Oh, Yung-Hwan
    • Phonetics and Speech Sciences
    • /
    • v.2 no.1
    • /
    • pp.77-85
    • /
    • 2010
  • This paper proposes an efficient feature vector processing technique to guard the Speech Emotion Recognition (SER) system against a variety of noises. In the proposed approach, emotional feature vectors are extracted from speech processed by comb filtering. These extracted vectors are then used to construct a robust model based on feature vector classification. We modify conventional comb filtering by using the speech presence probability to minimize drawbacks due to incorrect pitch estimation under background noise conditions. The modified comb filtering correctly enhances the harmonics, which are an important factor in SER. The feature vector classification technique categorizes feature vectors into either discriminative or non-discriminative vectors based on a log-likelihood criterion. This method successfully selects the discriminative vectors while preserving correct emotional characteristics, so robust emotion models can be constructed using only such discriminative vectors. In SER experiments using an emotional speech corpus contaminated by various noises, our approach exhibited performance superior to the baseline system.

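
A plain (unmodified) comb filter enhances harmonics by averaging the signal with copies shifted by the pitch period, so period-synchronous components add coherently while noise is averaged down. A minimal sketch with an assumed known pitch period (the paper's speech-presence-probability modification is not reproduced):

```python
import numpy as np

def comb_enhance(x, period):
    """Average x with its +/- one-pitch-period shifted copies.

    Harmonic (period-synchronous) content passes unchanged while
    uncorrelated noise power is reduced by roughly a factor of 3.
    """
    return (x + np.roll(x, period) + np.roll(x, -period)) / 3.0

# Illustrative demo: a periodic "voiced" signal plus white noise.  The
# signal length is a multiple of the period so np.roll wraps cleanly.
rng = np.random.default_rng(4)
period, n = 40, 4000
clean = np.sin(2 * np.pi * np.arange(n) / period)
noisy = clean + 0.5 * rng.standard_normal(n)
enhanced = comb_enhance(noisy, period)
```

The filter's weakness is exactly what the abstract targets: if the pitch estimate is wrong, the shifted copies no longer align with the harmonics and the averaging smears the speech itself, which motivates gating the filter with a speech presence probability.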

Voice Activity Detection Using Global Speech Absence Probability Based on Teager Energy in Noisy Environments (잡음환경에서 Teager Energy 기반의 전역 음성부재확률을 이용하는 음성검출)

  • Park, Yun-Sik;Lee, Sang-Min
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.49 no.1
    • /
    • pp.97-103
    • /
    • 2012
  • In this paper, we propose a novel voice activity detection (VAD) algorithm to effectively distinguish speech from nonspeech in various noisy environments. The global speech absence probability (GSAP) derived from the likelihood ratio (LR) based on a statistical model is widely used as the feature parameter for VAD. However, the feature parameter based on the conventional GSAP is not sufficient to distinguish speech from noise at low SNRs (signal-to-noise ratios). The presented VAD algorithm utilizes GSAP based on Teager energy (TE) as the feature parameter to improve decision performance for speech segments in noisy environments. The performance of the proposed VAD algorithm is evaluated by objective tests under various environments, and better results than the conventional methods are obtained.
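
The conventional LR-based GSAP that this paper improves on can be sketched as follows, assuming the Gaussian statistical model and a fixed a priori SNR (a simplification; practical VADs estimate it recursively, and the paper replaces the spectral input with Teager energy):

```python
import numpy as np

def global_sap(noisy_psd, noise_psd, xi=1.0, prior_speech=0.5):
    """Global speech absence probability from per-bin likelihood ratios.

    Under the Gaussian model, the per-bin likelihood ratio is
    LR_k = exp(gamma_k * xi / (1 + xi)) / (1 + xi), with a posteriori
    SNR gamma_k and a priori SNR xi (held fixed here as a simplifying
    assumption).  The bins are combined through the mean log-LR.
    """
    gamma = noisy_psd / np.maximum(noise_psd, 1e-12)
    log_lr = gamma * xi / (1.0 + xi) - np.log1p(xi)
    ratio = prior_speech / (1.0 - prior_speech)
    return 1.0 / (1.0 + ratio * np.exp(np.mean(log_lr)))

noise_psd = np.ones(64)
gsap_noise = global_sap(noise_psd, noise_psd)          # gamma = 1 everywhere
gsap_speech = global_sap(10.0 * noise_psd, noise_psd)  # gamma = 10 everywhere
```

A frame is declared speech when the GSAP falls below a threshold; the low-SNR weakness noted in the abstract shows up here as gamma hovering near 1, leaving the GSAP close to its noise-only value.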

Feature Compensation Combining SNR-Dependent Feature Reconstruction and Class Histogram Equalization

  • Suh, Young-Joo;Kim, Hoi-Rin
    • ETRI Journal
    • /
    • v.30 no.5
    • /
    • pp.753-755
    • /
    • 2008
  • In this letter, we propose a new histogram equalization technique for feature compensation in speech recognition under noisy environments. The proposed approach combines a signal-to-noise-ratio-dependent feature reconstruction method and the class histogram equalization technique to effectively reduce the acoustic mismatch present in noisy speech features. Experimental results from the Aurora 2 task confirm the superiority of the proposed approach for acoustic feature compensation.

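
Histogram equalization for feature compensation maps each test feature value through the test distribution's CDF and the inverse of the reference CDF. A minimal order-statistic (quantile-mapping) sketch, without the paper's class separation or SNR-dependent reconstruction stage:

```python
import numpy as np

def histogram_equalize(test_feat, ref_feat):
    """Quantile mapping: the k-th ranked test value is replaced by the
    reference value at the same empirical quantile, i.e. an estimate of
    F_ref^{-1}(F_test(x))."""
    ranks = np.argsort(np.argsort(test_feat))
    quantiles = (ranks + 0.5) / len(test_feat)
    return np.quantile(ref_feat, quantiles)

# Illustrative demo: noisy features are a shifted and scaled version of
# the clean reference distribution; equalization should undo the mismatch.
rng = np.random.default_rng(5)
ref = rng.normal(0.0, 1.0, 2000)
test = 3.0 + 2.0 * rng.normal(0.0, 1.0, 2000)
equalized = histogram_equalize(test, ref)
```

Because the mapping is monotone, the rank order of the features is preserved; only their marginal distribution is pulled onto the reference, which is what reduces the acoustic mismatch between training and noisy test conditions.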

A Study of Speech Enhancement through Wavelet Analysis Using the Auditory Mechanism (인간의 청각 메커니즘을 적용한 웨이블렛 분석을 통한 음성 향상에 대한 연구)

  • 이준석;길세기;홍준표;홍승홍
    • Proceedings of the IEEK Conference
    • /
    • 2002.06d
    • /
    • pp.397-400
    • /
    • 2002
  • This paper studies a speech enhancement method for noisy environments. To this end, we model the human auditory mechanism and apply the wavelet transform. The multi-resolution property of the wavelet transform makes multiband spectrum analysis possible, in a way similar to the human ear. The method is verified to be very effective for enhancing noisy speech.

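
Wavelet-based enhancement typically decomposes the signal into subbands and shrinks the detail coefficients, where broadband noise concentrates. A minimal single-level Haar sketch with soft thresholding; this is a generic illustration, not the authors' auditory-weighted method, and the threshold value is an assumption:

```python
import numpy as np

def haar_decompose(x):
    """One level of the orthonormal Haar transform (approx, detail)."""
    x = np.asarray(x, dtype=float)
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return a, d

def haar_reconstruct(a, d):
    """Exact inverse of haar_decompose."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2.0)
    x[1::2] = (a - d) / np.sqrt(2.0)
    return x

def soft_threshold(c, t):
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

def denoise(x, threshold):
    a, d = haar_decompose(x)
    return haar_reconstruct(a, soft_threshold(d, threshold))

# Illustrative demo: a slowly varying signal plus white noise.
rng = np.random.default_rng(6)
n = 1024
clean = np.sin(2 * np.pi * 2 * np.arange(n) / n)
noisy = clean + 0.5 * rng.standard_normal(n)
denoised = denoise(noisy, threshold=1.0)
```

Cascading the decomposition on the approximation band yields the octave-spaced multiband analysis the abstract likens to the ear's frequency resolution.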

Speech Recognition by Integrating Audio, Visual and Contextual Features Based on Neural Networks (신경망 기반 음성, 영상 및 문맥 통합 음성인식)

  • 김명원;한문성;이순신;류정우
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.41 no.3
    • /
    • pp.67-77
    • /
    • 2004
  • Recent research has focused on the fusion of audio and visual features for reliable speech recognition in noisy environments. In this paper, we propose a neural network based model of robust speech recognition that integrates audio, visual, and contextual information. The Bimodal Neural Network (BMNN) is a multi-layer perceptron of 4 layers, each of which performs a certain level of abstraction of the input features. In BMNN, the third layer combines the audio and visual features of speech to compensate for the loss of audio information caused by noise. In order to improve the accuracy of speech recognition in noisy environments, we also propose a post-processing step based on contextual information, namely the sequential patterns of words spoken by a user. Our experimental results show that our model outperforms any single-modality model. In particular, when we use the contextual information, we obtain over 90% recognition accuracy even in noisy environments, a significant improvement over the state of the art in speech recognition. Our research demonstrates that diverse sources of information need to be integrated to improve the accuracy of speech recognition, particularly in noisy environments.
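
The mid-level fusion described above (separate abstraction of each modality, then a combining layer) can be sketched as a forward pass. The layer sizes and random untrained weights are illustrative assumptions, not the paper's architecture details:

```python
import numpy as np

rng = np.random.default_rng(3)

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Illustrative sizes, not from the paper.
n_audio, n_visual, n_hidden, n_words = 12, 8, 16, 5
Wa = rng.standard_normal((n_hidden, n_audio)) * 0.1       # audio branch
Wv = rng.standard_normal((n_hidden, n_visual)) * 0.1      # visual branch
Wf = rng.standard_normal((n_hidden, 2 * n_hidden)) * 0.1  # fusion layer
Wo = rng.standard_normal((n_words, n_hidden)) * 0.1       # word output layer

def bimodal_forward(audio, visual):
    ha = relu(Wa @ audio)                        # modality-specific abstraction
    hv = relu(Wv @ visual)
    fused = relu(Wf @ np.concatenate([ha, hv]))  # third layer: A/V fusion
    return softmax(Wo @ fused)                   # word posteriors

probs = bimodal_forward(rng.standard_normal(n_audio),
                        rng.standard_normal(n_visual))
```

The contextual post-processing would then rescore these word posteriors against the sequential word patterns of the user, which is where the reported gain beyond the bimodal network alone comes from.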