Search | Korea Science

Energy Feature Normalization for Robust Speech Recognition in Noisy Environments

Lee, Yoon-Jae;Ko, Han-Seok
Speech Sciences, v.13 no.1, pp.129-139, 2006
In this paper, we propose two effective energy feature normalization methods for robust speech recognition in noisy environments. In the first method, we estimate the noise energy and remove it from the noisy speech energy. In the second method, we propose a modified algorithm for the Log-energy Dynamic Range Normalization (ERN) method. In the ERN method, the log energy of the training data in a clean environment is transformed into the log energy in noisy environments. If the minimum log energy of the test data is outside of a pre-defined range, the log energy of the test data is also transformed. Since the ERN method has several weaknesses, we propose a modified transform scheme designed to reduce the residual mismatch that it produces. In the evaluation conducted on the Aurora2.0 database, we obtained a significant performance improvement.
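The first method in the abstract above (estimate the noise energy, then remove it from the noisy speech energy) can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes the leading frames of an utterance contain only noise, and the function name and the `noise_frames` and `floor_ratio` parameters are hypothetical choices made for illustration.

```python
import math


def denoised_log_energy(energies, noise_frames=5, floor_ratio=0.01):
    """Subtract an estimated noise energy from per-frame energies, then take the log.

    energies     -- linear-domain frame energies (sum of squared samples per frame)
    noise_frames -- ASSUMPTION: this many leading frames are treated as noise only
    floor_ratio  -- ASSUMPTION: energies are floored at this fraction of the noise
                    estimate so the logarithm stays defined
    """
    if not energies:
        return []
    n = min(noise_frames, len(energies))
    noise = sum(energies[:n]) / n              # average energy of the assumed noise frames
    floor = max(floor_ratio * noise, 1e-12)    # keep the log argument strictly positive
    return [math.log(max(e - noise, floor)) for e in energies]


log_e = denoised_log_energy([1.0] * 5 + [11.0, 101.0])
```

The subtraction is done in the linear energy domain and the logarithm is taken afterwards, because noise and speech energies add approximately in the linear domain rather than in the log domain.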

Improvement of Speech Reconstructed from MFCC Using GMM (GMM을 이용한 MFCC로부터 복원된 음성의 개선)

Choi, Won-Young;Choi, Mu-Yeol;Kim, Hyung-Soon
MALSORI, no.53, pp.129-141, 2005
The goal of this research is to improve the quality of reconstructed speech in a Distributed Speech Recognition (DSR) system. For the extended DSR, we estimate the variable Maximum Voiced Frequency (MVF) from Mel-Frequency Cepstral Coefficients (MFCCs) using a Gaussian Mixture Model (GMM), in order to implement a realistic harmonic-plus-noise model for the excitation signal. For the standard DSR, we also make the voiced/unvoiced decision from MFCCs using a GMM, because pitch information is not available in that case. A perceptual test reveals that speech reconstructed by the proposed method is preferred to speech reconstructed by the conventional methods.

Performance Improvement of SPLICE-based Noise Compensation for Robust Speech Recognition (강인한 음성인식을 위한 SPLICE 기반 잡음 보상의 성능향상)

Kim, Hyung-Soon;Kim, Doo-Hee
Speech Sciences, v.10 no.3, pp.263-277, 2003
One of the major problems in speech recognition is performance degradation due to the mismatch between the training and test environments. Recently, Stereo-based Piecewise LInear Compensation for Environments (SPLICE), a frame-based bias removal algorithm for cepstral enhancement that uses stereo training data and models noisy speech as a mixture of Gaussians, was proposed and showed good performance in noisy environments. In this paper, we propose several methods to improve the conventional SPLICE. First, we apply Cepstral Mean Subtraction (CMS) as a preprocessor to SPLICE instead of as a postprocessor. Second, to compensate for the residual distortion after SPLICE processing, we propose a two-stage SPLICE. Third, we employ phonetic information for training the SPLICE model. In experiments on the Aurora 2 database, the proposed method outperformed the conventional SPLICE, and we achieved a 50% decrease in word error rate relative to the Aurora baseline system.
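For readers unfamiliar with the preprocessing step named in the abstract above, the following is a minimal sketch of textbook Cepstral Mean Subtraction over a single utterance. It is not taken from the paper; the function name and the list-of-lists data layout are assumptions, and the SPLICE stage itself is not shown.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean over an utterance from every frame.

    frames -- list of cepstral vectors (one list of floats per frame), all of
              the same dimension; this layout is an illustrative assumption.
    """
    if not frames:
        return []
    n = len(frames)
    dim = len(frames[0])
    # Mean of each cepstral coefficient across all frames of the utterance.
    means = [sum(frame[d] for frame in frames) / n for d in range(dim)]
    return [[frame[d] - means[d] for d in range(dim)] for frame in frames]


normalized = cepstral_mean_subtraction([[1.0, 2.0], [3.0, 6.0]])
```

A stationary channel appears as an additive constant in the cepstral domain, so removing the utterance-level mean cancels it; that is the usual motivation for CMS.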

Prediction of Prosodic Boundaries Using Dependency Relation

Kim, Yeon-Jun;Oh, Yung-Hwan
The Journal of the Acoustical Society of Korea, v.18 no.4E, pp.26-30, 1999
This paper introduces a prosodic phrasing method for Korean that improves the naturalness of speech synthesis, especially in text-to-speech conversion. Prosodic phrasing requires understanding the structure of a sentence through language processing procedures such as part-of-speech (POS) tagging and parsing, since syntactic structure correlates with the prosodic structure of speech better than other factors do. In this paper, the prosodic phrasing procedure is treated in two parts: dependency parsing, and prosodic phrasing using dependency relations. This approach is appropriate for Ural-Altaic languages, since a prosodic boundary in speech usually coincides with the governor of a dependency relation. In experiments on a speech corpus consisting of 300 sentences uttered by 3 speakers, the proposed method achieved a 12% improvement in prosodic boundary prediction accuracy.

Speech Recognition in Noisy Environments using Wiener Filtering (Wiener Filtering을 이용한 잡음환경에서의 음성인식)

Kim, Jin-Young;Eom, Ki-Wan;Choi, Hong-Sub
Speech Sciences, v.1, pp.277-283, 1997
In this paper, we present a robust recognition algorithm based on the Wiener filtering method as a research tool for developing a Korean speech recognition system. We apply Wiener filtering in the cepstral domain, because the frequency-domain method is computationally expensive and complex. The effectiveness of this method was evaluated on speaker-independent isolated Korean digit recognition tasks using discrete HMM speech recognition systems. In these tasks, we used 12th-order weighted cepstral coefficients as the feature vector and added computer-simulated white Gaussian noise at different levels to clean speech signals for recognition experiments under noisy conditions. Experimental results show that the presented algorithm improves recognition by 5% to 20% compared with the spectral subtraction method.
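The abstract above does not give the cepstral-domain formulation, so it is not reproduced here. As background only, the following minimal sketch shows the textbook frequency-domain Wiener gain, computed per frequency bin from a noisy power spectrum and a noise power estimate. The function name and inputs are hypothetical, and the clean-speech power is approximated by simple power subtraction.

```python
def wiener_gain(noisy_power, noise_power):
    """Per-bin Wiener gain G = Ps / (Ps + Pn).

    noisy_power -- power spectrum of the noisy frame, one value per bin
    noise_power -- estimated noise power per bin
    ASSUMPTION: the clean-speech power Ps is approximated as
    max(noisy - noise, 0), which is a simple power-subtraction estimate.
    """
    gains = []
    for py, pn in zip(noisy_power, noise_power):
        ps = max(py - pn, 0.0)          # crude clean-speech power estimate
        denom = ps + pn
        gains.append(ps / denom if denom > 0.0 else 0.0)
    return gains


gains = wiener_gain([4.0, 1.0], [1.0, 2.0])
```

Each gain lies between 0 and 1 and multiplies the corresponding bin of the noisy spectrum: bins dominated by noise are attenuated strongly, while bins dominated by speech pass almost unchanged.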

The Effect of Microteaching on Self Efficacy and Speech Anxiety for Student Health Educators (마이크로티칭이 예비보건교육사의 자기효능감 및 발표불안에 미치는 효과)

Kwon, Myung-Soon;Cho, Hae-Ryun
Journal of Korean Public Health Nursing, v.26 no.1, pp.91-100, 2012
Purpose: The purpose of this study was to examine the effects of microteaching on student health educators. Methods: Subjects included 43 students in the nursing department of "H" university. Subjects in the experimental group underwent training in microteaching, and self-efficacy and speech anxiety were measured at pre-treatment and post-treatment. Results: No significant difference in self-efficacy or speech anxiety was observed between subjects in the experimental group and those in the control group. However, self-confidence, a sub-domain of self-efficacy, showed a significant increase from pretest to post-test. In addition, speech anxiety measured at post-test showed improvement compared with the pretest. Conclusion: The findings of this study provide preliminary evidence that microteaching may result in improved speech behavior.
https://doi.org/10.5932/JKPHN.2012.26.1.091

Speech Recognition Using Noise Processing in Spectral Dimension (스펙트럴 차원의 잡음처리를 이용한 음성인식)

Lee, Gwang-seok
Proceedings of the Korean Institute of Information and Communication Sciences Conference, 2009.10a, pp.738-741, 2009
This research aims to improve speech recognition results for noisy speech. Spectral subtraction and recovery of the valleys in the spectral envelope obtained from noisy speech have been found to be effective for improving recognition. In this research, the averaged spectral envelope obtained from vowel spectra is used to emphasize the valleys. The vocalic spectral information in the lower frequency range is emphasized, while the spectrum obtained from consonants is left unchanged. In simulation, the emphasis coefficients are varied in the cepstral domain. The method is applied to the recognition of noisy digits and improves recognition performance.

KMSAV: Korean multi-speaker spontaneous audiovisual dataset

Kiyoung Park;Changhan Oh;Sunghee Dong
ETRI Journal, v.46 no.1, pp.71-81, 2024
Recent advances in deep learning for speech and visual recognition have accelerated the development of multimodal speech recognition, yielding many innovative results. We introduce a Korean audiovisual speech recognition corpus. This dataset comprises approximately 150 h of manually transcribed and annotated audiovisual data, supplemented with an additional 2000 h of untranscribed videos collected from YouTube under the Creative Commons License. The dataset is intended to be freely accessible for unrestricted research purposes. Along with the corpus, we propose an open-source framework for automatic speech recognition (ASR) and audiovisual speech recognition (AVSR). We validate the effectiveness of the corpus with evaluations using state-of-the-art ASR and AVSR techniques, capitalizing on both pretrained models and fine-tuning processes. After fine-tuning, ASR and AVSR achieve character error rates of 11.1% and 18.9%, respectively. This error difference highlights the need for improvement in AVSR techniques. We expect that our corpus will be an instrumental resource to support improvements in AVSR.
https://doi.org/10.4218/etrij.2023-0352
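The character error rate reported in the abstract above is conventionally the character-level edit distance between the reference and the recognized text, divided by the reference length. The following is a minimal sketch of that standard definition. The function names are hypothetical, and any text normalization used in the paper (for example, how spaces are handled) is not stated in the abstract and is not modeled here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference character
                cur[j - 1] + 1,          # insertion of a hypothesis character
                prev[j - 1] + (r != h),  # substitution, or match at zero cost
            ))
        prev = cur
    return prev[-1]


def character_error_rate(ref, hyp):
    """CER = edit distance / number of reference characters."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


cer = character_error_rate("안녕하세요", "안녕하세오")  # one substitution in five characters
```

Because Python strings are sequences of Unicode code points, each precomposed Hangul syllable counts as one character here; scoring at the jamo level instead would require decomposing the strings first.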

Relationship between Speech Perception in Noise and Phonemic Restoration of Speech in Noise in Individuals with Normal Hearing

Vijayasarathy, Srikar;Barman, Animesh
Journal of Audiology & Otology, v.24 no.4, pp.167-173, 2020
Background and Objectives: Top-down restoration of distorted speech, tapped as phonemic restoration of speech in noise, may be a useful tool for understanding the robustness of perception in adverse listening situations. However, the relationship between phonemic restoration and speech perception in noise is not empirically clear. Subjects and Methods: Twenty adults (40-55 years) with normal audiometric findings took part in the study. Sentence perception in noise was studied at various signal-to-noise ratios (SNRs) to estimate the SNR yielding a 50% score. Performance was also measured for sentences interrupted with silence and for those interrupted by speech noise at -10, -5, 0, and 5 dB SNRs. The performance score in the silence interruption condition was subtracted from the score in the noise interruption condition to determine the phonemic restoration magnitude. Results: Fairly robust improvements in speech intelligibility were found when the sentences were interrupted with speech noise instead of silence. Improvement with increasing noise levels was non-monotonic and reached a maximum at -10 dB SNR. A significant correlation was found between speech perception in noise and phonemic restoration of sentences interrupted with -10 dB SNR speech noise. Conclusions: It is possible that perception of speech in noise is associated with top-down processing of speech, tapped as phonemic restoration of interrupted speech. More research with a larger sample size is indicated, since the restoration is affected by the type of speech material and noise used, age, working memory, and linguistic proficiency, and has large individual variability.
https://doi.org/10.7874/jao.2019.00472
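The phonemic restoration magnitude described in the abstract above is a difference between two scores: the score for sentences interrupted by speech noise minus the score for sentences interrupted by silence. The following minimal sketch shows that arithmetic. The function name is hypothetical, and the scores are made-up placeholder values, not data from the study.

```python
def restoration_magnitude(noise_scores, silence_score):
    """Noise-interrupted score minus silence-interrupted score, per SNR.

    noise_scores  -- dict mapping SNR in dB to the score (%) with noise-filled gaps
    silence_score -- score (%) with silent gaps; ASSUMPTION: a single value,
                     since silence has no SNR
    """
    return {snr: score - silence_score for snr, score in noise_scores.items()}


# Placeholder values for illustration only; they are NOT results from the study.
example = restoration_magnitude({-10: 70.0, -5: 65.0, 0: 60.0, 5: 55.0}, 50.0)
```

A positive value means that filling the gaps with noise improved intelligibility relative to leaving them silent, which is the effect the study refers to as phonemic restoration.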
