Search | Korea Science

The Magnitude Distribution method of U/V decision (음성신호의 전폭분포를 이용한 유/무성음 검출에 대한 연구)

배성근
- Proceedings of the Acoustical Society of Korea Conference
- /
- 1993.06a
- /
- pp.249-252
- /
- 1993
In speech signal processing, The accurate detection of the voiced/unvoiced is important for robust word recognition and analysis. This algorithm is based on the MD in the frame of speech signals that does not require statistical information about either signal or background-noise to decide a voiced/unvoiced. This paper presents a method of estimation the Characteristic of Magnitude Distribution from noisy speech and also of estimation the optimal threshold based on the MD of the voiced/unvoiced decision. The performances of this detectors is evaluated and compared to that obtained from classifying other paper.
PDF

Otsu's method for speech endpoint detection (Otsu 방법을 이용한 음성 종결점 탐색 알고리즘)

Gao, Yu;Zang, Xian;Chong, Kil-To
- Proceedings of the IEEK Conference
- /
- 2009.05a
- /
- pp.40-42
- /
- 2009
This paper presents an algorithm, which is based on Otsu's method, for accurate and robust endpoint detection for speech recognition under noisy environments. The features are extracted in time domain, and then an optimal threshold is selected by minimizing the discriminant criterion, so as to maximize the separability of the speech part and environment part. The simulation results show that the method play a good performance in detection accuracy.
PDF

Distant-talking of Speech Interface for Humanoid Robots (휴머노이드 로봇을 위한 원거리 음성 인터페이스 기술 연구)

Lee, Hyub-Woo;Yook, Dong-Suk
- Proceedings of the KSPS conference
- /
- 2007.05a
- /
- pp.39-40
- /
- 2007
For efficient interaction between human and robots, speech interface is a core problem especially in noisy and reverberant conditions. This paper analyzes main issues of spoken language interface for humanoid robots, such as sound source localization, voice activity detection, and speaker recognition.
PDF

Performance Assessment of Speech Recogniger using Lombard Speech (롬바드 음성을 이용한 음성인식기의 성능 평가)

Jung, Sung-Yun;Chung, Hyun-Yeol;Kim, Kyung-Tae
- The Journal of the Acoustical Society of Korea
- /
- v.13 no.5
- /
- pp.59-68
- /
- 1994
This paper describes the performance assessment test and analysis of test results on a Korean speech recognizer which recognizes Lombard effect received speech in noisy environment, as a basic performance assessment research. In the assessement test, standard speech data were first manipulated close to speech uttered in a noisy environment, and then performance assessment tests were carried out along with the assessment items (the type of noise, SNR) in two ways-one with Lombard effect received speech(LES), the other with not received(NLES). As a result, when 90% of recognition rate is set to be a recognition limit, it was achieved at 10dB SNR point with LES, while at 30dB with NLES. This 20dB of SNR difference indicates Lombard effect should be considered in real world assessment test. The type of noises didn't affect performance of recognizers in out tests. ANOVA analysis, in evaluating several kinds of recognizers, showed every assessment item affecting the recognition performance could be quantified.
PDF

Method for Spectral Enhancement by Binary Mask for Speech Recognition Enhancement Under Noise Environment (잡음환경에서 음성인식 성능향상을 위한 바이너리 마스크를 이용한 스펙트럼 향상 방법)

Choi, Gab-Keun;Kim, Soon-Hyob
- The Journal of the Acoustical Society of Korea
- /
- v.29 no.7
- /
- pp.468-474
- /
- 2010
The major factor that disturbs practical use of speech recognition is distortion by the ambient and channel noises. Generally, the ambient noise drops the performance and restricts places to use. DSR (Distributed Speech Recognition) based speech recognition also has this problem. Various noise cancelling algorithms are applied to solve this problem, but loss of spectrum and remaining noise by incorrect noise estimation at low SNR environments cause drop of recognition rate. This paper proposes methods for speech enhancement. This method uses MMSE-STSA for noise cancelling and ideal binary mask to compensate damaged spectrum. According to experiments at noisy environment (SNR 15 dB ~ 0 dB), the proposed methods showed better spectral results and recognition performance.
https://doi.org/10.7776/ASK.2010.29.7.468 인용 PDF KSCI

Korean Broadcast News Transcription Using Morpheme-based Recognition Units

Kwon, Oh-Wook;Alex Waibel
- The Journal of the Acoustical Society of Korea
- /
- v.21 no.1E
- /
- pp.3-11
- /
- 2002
Broadcast news transcription is one of the hardest tasks in speech recognition because broadcast speech signals have much variability in speech quality, channel and background conditions. We developed a Korean broadcast news speech recognizer. We used a morpheme-based dictionary and a language model to reduce the out-of·vocabulary (OOV) rate. We concatenated the original morpheme pairs of short length or high frequency in order to reduce insertion and deletion errors due to short morphemes. We used a lexicon with multiple pronunciations to reflect inter-morpheme pronunciation variations without severe modification of the search tree. By using the merged morpheme as recognition units, we achieved the OOV rate of 1.7% comparable to European languages with 64k vocabulary. We implemented a hidden Markov model-based recognizer with vocal tract length normalization and online speaker adaptation by maximum likelihood linear regression. Experimental results showed that the recognizer yielded 21.8% morpheme error rate for anchor speech and 31.6% for mostly noisy reporter speech.
PDF KSCI

Harmonics-based Spectral Subtraction and Feature Vector Normalization for Robust Speech Recognition

Beh, Joung-Hoon;Lee, Heung-Kyu;Kwon, Oh-Il;Ko, Han-Seok
- Speech Sciences
- /
- v.11 no.1
- /
- pp.7-20
- /
- 2004
In this paper, we propose a two-step noise compensation algorithm in feature extraction for achieving robust speech recognition. The proposed method frees us from requiring a priori information on noisy environments and is simple to implement. First, in frequency domain, the Harmonics-based Spectral Subtraction (HSS) is applied so that it reduces the additive background noise and makes the shape of harmonics in speech spectrum more pronounced. We then apply a judiciously weighted variance Feature Vector Normalization (FVN) to compensate for both the channel distortion and additive noise. The weighted variance FVN compensates for the variance mismatch in both the speech and the non-speech regions respectively. Representative performance evaluation using Aurora 2 database shows that the proposed method yields 27.18% relative improvement in accuracy under a multi-noise training task and 57.94% relative improvement under a clean training task.
PDF

CASA-based Front-end Using Two-channel Speech for the Performance Improvement of Speech Recognition in Noisy Environments (잡음환경에서의 음성인식 성능 향상을 위한 이중채널 음성의 CASA 기반 전처리 방법)

Park, Ji-Hun;Yoon, Jae-Sam;Kim, Hong-Kook
- Proceedings of the IEEK Conference
- /
- 2007.07a
- /
- pp.289-290
- /
- 2007
In order to improve the performance of a speech recognition system in the presence of noise, we propose a noise robust front-end using two-channel speech signals by separating speech from noise based on the computational auditory scene analysis (CASA). The main cues for the separation are interaural time difference (ITD) and interaural level difference (ILD) between two-channel signal. As a result, we can extract 39 cepstral coefficients are extracted from separated speech components. It is shown from speech recognition experiments that proposed front-end has outperforms the ETSI front-end with single-channel speech.
PDF

Eigenvoice Adaptation of Classification Model for Binary Mask Estimation (Eigenvoice를 이용한 이진 마스크 분류 모델 적응 방법)

Kim, Gibak
- Journal of Broadcast Engineering
- /
- v.20 no.1
- /
- pp.164-170
- /
- 2015
This paper deals with the adaptation of classification model in the binary mask approach to suppress noise in the noisy environment. The binary mask estimation approach is known to improve speech intelligibility of noisy speech. However, the same type of noisy data for the test data should be included in the training data for building the classification model of binary mask estimation. The eigenvoice adaptation is applied to the noise-independent classification model and the adapted model is used as noise-dependent model. The results are reported in Hit rates and False alarm rates. The experimental results confirmed that the accuracy of classification is improved as the number of adaptation sentences increases.
https://doi.org/10.5909/JBE.2015.20.1.164 인용 PDF KSCI KPUBS HTML

Fast Speaker Adaptation in Noisy Environment using Environment Clustering (잡음 환경하에서 환경 군집화를 이용한 고속화자 적응)

Kim, Young-Kuk;Song, Hwa-Jeon;Kim, Hyung-Soon
- Proceedings of the KSPS conference
- /
- 2007.05a
- /
- pp.33-36
- /
- 2007
In this paper, we investigate a fast speaker adaptation method based on eigenvoice in several noisy environments. In order to overcome its weakness against noise, we propose a noisy environment clustering method which divides the noisy adaptation utterances into utterance groups with similar environments by the vector quantization based clustering using a cepstral mean as a feature vector. Then each utterance group is used for adaptation to make an environment dependent model. According to our experiment, we obtained 19-37 % relative improvement in error rate compared with the simultaneous speaker adaptation and environmental compensation method
PDF

Search Result 395, Processing Time 0.023 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)