Search | Korea Science

Performance Improvement of Speech Recognizer in Noisy Environments Based on Auditory Modeling (청각 구조를 이용한 잡음 음성의 인식 성능 향상)

Jung, Ho-Young;Kim, Do-Yeong;Un, Chong-Kwan;Lee, Soo-Young
- The Journal of the Acoustical Society of Korea / v.14 no.5 / pp.51-57 / 1995
This paper studies a noise-robust feature extraction method for speech signals based on auditory modeling. The auditory model consists of a basilar membrane model, a hair cell model, and a spectrum output stage. The basilar membrane model describes how the membrane responds to vibrations in the speech wave and is represented as a band-pass filter bank. The hair cell model describes neural transduction driven by displacements of the basilar membrane; it responds adaptively to relative values of the input and plays an important role in noise robustness. The spectrum output stage constructs a mean-rate spectrum from the average firing rate of each channel, and feature vectors are extracted from this mean-rate spectrum. Simulation results show that auditory-based feature extraction improves speech recognition performance in noisy environments compared with other feature extraction methods.

A Study on the First Order Plus Time Delay Model Identification from Noisy Step Responses (노이즈가 있는 계단응답으로부터 일차시간지연모델 확인에 관한 연구)

Ju, Seungmin;Kim, Sung Jin;Byeon, Jeonguk;Chun, Daewoong;Sung, Su Whan;Lee, Jietae
- Korean Chemical Engineering Research / v.46 no.5 / pp.949-957 / 2008
Estimating a first-order-plus-time-delay (FOPTD) model from step responses is widely used in industry for tuning PID controllers. Although various model identification methods have been proposed, ranging from simple graphical approaches to complicated approaches based on the least squares method, simple approaches that can handle noisy step responses are rarely available. In this research, we compare and analyze recent approaches that use integrals of the step response and develop an improved identification method that handles real situations more effectively.
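As a rough illustration of integral-based identification, the sketch below implements the classic textbook "area method" for FOPTD models. It is not the improved method developed in the paper, whose details the abstract does not give. The function names, the `tail_fraction` parameter, and the assumptions that the step is applied at t = 0 and that the output is in deviation variables (starting at zero) are all illustrative choices.

```python
import math
import numpy as np


def trapezoid(y, t):
    """Trapezoidal-rule integral of samples y over time points t."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))


def simulate_foptd(t, gain, tau, delay, du=1.0):
    """Noise-free FOPTD response to a step of size du applied at t = 0."""
    y = np.zeros_like(t)
    active = t >= delay
    y[active] = gain * du * (1.0 - np.exp(-(t[active] - delay) / tau))
    return y


def identify_foptd(t, y, du=1.0, tail_fraction=0.1):
    """Estimate (gain, tau, delay) with the classic area method.

    Assumes the response has settled within the record, so the mean of
    the last `tail_fraction` of the samples approximates the final value.
    """
    n_tail = max(1, int(len(t) * tail_fraction))
    y_ss = float(np.mean(y[-n_tail:]))
    gain = y_ss / du
    # Area between the final value and the response equals y_ss * (tau + delay).
    a0 = trapezoid(y_ss - y, t)
    t_ar = a0 / y_ss
    # Area under the response up to t = tau + delay equals y_ss * tau / e.
    head = t <= t_ar
    a1 = trapezoid(y[head], t[head])
    tau = math.e * a1 / y_ss
    delay = t_ar - tau
    return gain, tau, delay


t = np.linspace(0.0, 60.0, 6001)
y = simulate_foptd(t, gain=2.0, tau=3.0, delay=1.5)
print(identify_foptd(t, y))
```

Because both quantities are integrals of the data, zero-mean measurement noise is partly averaged out, which is the usual motivation for this family of methods. The estimate of the final value remains sensitive to noise and to records that end before the response settles.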

Speech Recognition based on Environment Adaptation using SNR Mapping (SNR 매핑을 이용한 환경적응 기반 음성인식)

Chung, Yong-Joo
- The Journal of the Korea Institute of Electronic Communication Sciences / v.9 no.5 / pp.543-548 / 2014
The multiple-model based speech recognition (MMSR) framework is known to be very successful in speech recognition. Because it uses multiple hidden Markov models (HMMs) that correspond to various noise types and signal-to-noise ratio (SNR) values, the selected acoustic model can closely match the noisy test speech. However, since the number of HMM sets is limited in practical use, acoustic mismatch remains a problem. In this study, we experimentally determined the optimal SNR mapping between the noisy test speech and the HMM set to mitigate the mismatch between them. Employing the SNR mapping instead of the SNR estimated from the noisy test speech improved performance. When the proposed method was applied to MMSR, experimental results on the Aurora 2 database showed relative word error rate reductions of 6.3% and 9.4% compared to conventional MMSR and multi-condition training (MTR), respectively.
https://doi.org/10.13067/JKIECS.201.9.5.543

Feature Compensation Method Based on Parallel Combined Mixture Model (병렬 결합된 혼합 모델 기반의 특징 보상 기술)

김우일;이흥규;권오일;고한석
- The Journal of the Acoustical Society of Korea / v.22 no.7 / pp.603-611 / 2003
This paper proposes an effective feature compensation scheme based on a speech model for robust speech recognition. Conventional model-based methods require off-line training with a noisy speech database and are not suitable for online adaptation. In the proposed scheme, the need for off-line training with a noisy speech database is relaxed by employing the parallel model combination technique to estimate the correction factors. Applying the model combination process to the mixture model alone, as opposed to the entire HMM, makes online model combination possible. Exploiting the availability of a noise model from off-line sources, we accomplish online adaptation via MAP (maximum a posteriori) estimation. In addition, an online channel estimation procedure is derived within the proposed framework. For more efficient implementation, we propose a selective model combination that reduces the computational complexity. Representative experimental results indicate that the suggested algorithm is effective in realizing robust speech recognition under the combined adverse conditions of additive background noise and channel distortion.

Single-Channel Non-Causal Speech Enhancement to Suppress Reverberation and Background Noise

Song, Myung-Suk;Kang, Hong-Goo
- The Journal of the Acoustical Society of Korea / v.31 no.8 / pp.487-506 / 2012
This paper proposes a speech enhancement algorithm that improves speech intelligibility by suppressing both reverberation and background noise. The algorithm adopts a non-causal single-channel minimum variance distortionless response (MVDR) filter to exploit the additional information contained in the noisy-reverberant signals of subsequent frames. The noisy-reverberant signals are decomposed into the desired signal and an interference component that is uncorrelated with the desired signal. The filter equation is then derived from the MVDR criterion to minimize the residual interference without introducing speech distortion. The estimate of the correlation parameter, which plays an important role in determining the overall performance of the system, is derived mathematically from the general statistical reverberation model. Furthermore, practical implementation methods are developed for estimating the sub-parameters needed to estimate the correlation parameter. The efficiency of the proposed enhancement algorithm is verified by performance evaluation. The results show that the proposed algorithm achieves significant performance improvement in all studied conditions and is especially superior in severely noisy and strongly reverberant environments.
https://doi.org/10.7776/ASK.2012.31.8.487

Efficient Edge Detection in Noisy Images using Robust Rank-Order Test (잡음영상에서 로버스트 순위-순서 검정을 이용한 효과적인 에지검출)

Lim, Dong-Hoon
- The Korean Journal of Applied Statistics / v.20 no.1 / pp.147-157 / 2007
Edge detection is widely used in computer vision and image processing. We describe a new edge detector based on the robust rank-order test, a useful alternative to the Wilcoxon test. To perform effectively on noisy images, our method detects pixel intensity changes between two neighborhoods within an $r \times r$ window using an edge-height model. Experiments comparing our robust rank-order detector with several existing edge detectors are carried out on both synthetic and real images, with and without noise.
https://doi.org/10.5351/KJAS.2007.20.1.147
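To make the two-neighborhood comparison concrete, the sketch below computes the Fligner-Policello robust rank-order statistic for two groups of pixel intensities. It covers only the test statistic; the edge-height model and the decision thresholds used in the paper are not described in the abstract and are not reproduced here. The function name, the half-count treatment of ties, and the sample patch values are illustrative assumptions.

```python
import numpy as np


def robust_rank_order(x, y):
    """Fligner-Policello robust rank-order statistic for two samples.

    Positive values indicate that x tends to be larger than y.
    """
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    # Placements: how many values of the other sample lie below each value
    # (ties are counted as one half, an illustrative convention).
    p = np.array([np.sum(y < xi) + 0.5 * np.sum(y == xi) for xi in x])
    q = np.array([np.sum(x < yj) + 0.5 * np.sum(x == yj) for yj in y])
    vx = np.sum((p - p.mean()) ** 2)
    vy = np.sum((q - q.mean()) ** 2)
    num = p.sum() - q.sum()
    denom = 2.0 * np.sqrt(vx + vy + p.mean() * q.mean())
    if denom == 0.0:
        # Completely separated samples: the statistic is unbounded.
        return float(np.sign(num)) * float("inf")
    return float(num / denom)


# Hypothetical left and right neighborhoods taken from a small window.
left = np.array([[10.0, 11.0], [12.0, 13.0]])
right = np.array([[1.0, 2.0], [3.0, 12.5]])
print(robust_rank_order(left, right))
```

A large absolute value suggests an intensity change between the two neighborhoods. Unlike the Wilcoxon test, this statistic does not require the two groups to have equal variance, which is one reason it is attractive for noisy images.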

A Noisy Videos Background Subtraction Algorithm Based on Dictionary Learning

Xiao, Huaxin;Liu, Yu;Tan, Shuren;Duan, Jiang;Zhang, Maojun
- KSII Transactions on Internet and Information Systems (TIIS) / v.8 no.6 / pp.1946-1963 / 2014
Most background subtraction methods focus on dynamic and complex scenes without considering robustness against noise. This paper proposes a background subtraction algorithm based on dictionary learning and sparse coding for handling low-light conditions. The proposed method formulates background modeling as a sparse linear combination of atoms in the dictionary. Background subtraction is treated as the difference between the sparse representations of the current frame and the background model. The assumption that the projection of the noise onto the dictionary is irregular and random guarantees the adaptability of the approach in scenes with large noise. Experimental results, divided into simulated large-noise and realistic low-light conditions, show the promising robustness of the proposed approach compared with other competing methods.
https://doi.org/10.3837/tiis.2014.06.008

Voice Activity Detection Using Global Speech Absence Probability Based on Teager Energy in Noisy Environments (잡음환경에서 Teager Energy 기반의 전역 음성부재확률을 이용하는 음성검출)

Park, Yun-Sik;Lee, Sang-Min
- Journal of the Institute of Electronics Engineers of Korea SP / v.49 no.1 / pp.97-103 / 2012
In this paper, we propose a novel voice activity detection (VAD) algorithm to effectively distinguish speech from nonspeech in various noisy environments. The global speech absence probability (GSAP), derived from the likelihood ratio (LR) based on a statistical model, is widely used as the feature parameter for VAD. However, the feature parameter based on the conventional GSAP is not sufficient to distinguish speech from noise at low signal-to-noise ratios (SNRs). The presented VAD algorithm uses a GSAP based on Teager energy (TE) as the feature parameter to improve the decision performance for speech segments in noisy environments. The performance of the proposed VAD algorithm is evaluated by objective tests under various environments, and it yields better results than the conventional methods.
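For readers unfamiliar with Teager energy, the sketch below shows the standard discrete Teager-Kaiser energy operator, psi[x(n)] = x(n)^2 - x(n-1) x(n+1). How the paper combines TE with the GSAP is not stated in the abstract, so only the operator itself is illustrated; the function name and the test tone are assumptions.

```python
import numpy as np


def teager_energy(x):
    """Discrete Teager-Kaiser energy: x[n]^2 - x[n-1] * x[n+1].

    The output is two samples shorter than the input because the
    operator needs one neighbor on each side.
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


# For a pure tone A*cos(omega*n + phi) the operator returns the constant
# A^2 * sin(omega)^2, so it reflects both amplitude and frequency.
n = np.arange(200)
tone = 2.0 * np.cos(0.3 * n + 0.1)
print(teager_energy(tone)[:3])
```

Because the operator uses only three adjacent samples, it reacts quickly to changes in the signal, which is a common reason for using it in speech detection features.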

Voice Activity Detection Based on SNR and Non-Intrusive Speech Intelligibility Estimation

An, Soo Jeong;Choi, Seung Ho
- International Journal of Internet, Broadcasting and Communication / v.11 no.4 / pp.26-30 / 2019
This paper proposes a new voice activity detection (VAD) method based on SNR and non-intrusive speech intelligibility estimation. Conventional SNR-based VAD methods obtain the voice activity probability by estimating the frame-wise SNR at each spectral component. However, these methods perform poorly in various noisy environments. We devise a hybrid VAD method that uses non-intrusive speech intelligibility estimation as well as SNR estimation, where the speech intelligibility score is estimated by a deep neural network. To train the model parameters of the deep neural network, we use the MFCC vector as input and the intrusive speech intelligibility score, STOI (Short-Time Objective Intelligibility), as output. We developed a speech presence measure that classifies each noisy frame as voice or non-voice by calculating the weighted average of the estimated STOI value and the conventional SNR-based VAD value at each frame. Experimental results show that the proposed method performs better than the conventional VAD method in various noisy environments, especially when the SNR is very low.
https://doi.org/10.7236/IJIBC.2019.11.4.26
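The frame-wise fusion step described in the abstract can be sketched as a weighted average followed by a threshold. The weight, the threshold, the function name, and the example values below are hypothetical, since the abstract does not report them. The deep neural network that produces the STOI estimate is assumed to exist upstream and is not shown.

```python
import numpy as np


def speech_presence(stoi_est, snr_vad, weight=0.5, threshold=0.5):
    """Fuse per-frame scores and classify each frame as voice or non-voice.

    stoi_est: estimated STOI value per frame (assumed to lie in [0, 1]).
    snr_vad:  conventional SNR-based VAD value per frame (assumed in [0, 1]).
    weight and threshold are illustrative values, not taken from the paper.
    """
    stoi_est = np.asarray(stoi_est, dtype=float)
    snr_vad = np.asarray(snr_vad, dtype=float)
    score = weight * stoi_est + (1.0 - weight) * snr_vad
    return score, score >= threshold


score, is_voice = speech_presence([0.9, 0.2, 0.6], [0.8, 0.1, 0.3])
print(score, is_voice)
```

In practice the weight would be tuned on held-out data, for example to give the intelligibility estimate more influence when the SNR-based value is unreliable.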

Recursive Estimation using the Hidden Filter Model for Enhancing Noisy Speech

Kang, Yeong-Tae
- The Journal of the Acoustical Society of Korea / v.15 no.3E / pp.27-30 / 1996
A recursive estimation method for enhancing speech contaminated by white noise is proposed. The method is based on the Kalman filter with a time-varying parametric model for the clean speech signal, and hidden filter models are used to model the clean speech signal. An improvement of approximately 4-5 dB in SNR is achieved at input SNRs of 5 and 10 dB, respectively.
