• Title/Summary/Keyword: Noise speech data

Search Result 144

A Speech Recognition System based on a New Endpoint Estimation Method jointly using Audio/Video Informations (음성/영상 정보를 이용한 새로운 끝점추정 방식에 기반을 둔 음성인식 시스템)

  • 이동근;김성준;계영철
    • Journal of Broadcast Engineering
    • /
    • v.8 no.2
    • /
    • pp.198-203
    • /
    • 2003
  • We develop a method of estimating the endpoints of speech by jointly using the lip motion (visual speech) and the audio speech contained in multimedia data, and then propose a new speech recognition system (SRS) based on that method. The endpoints of noisy speech are estimated as follows: for each test word, two kinds of endpoints are detected, from visual speech and from clean speech, respectively. Their difference is then added to the endpoints of the visual speech to estimate those of the noisy speech. This endpoint (i.e., speech interval) estimation method is applied to form a new SRS. The SRS differs from the conventional one in that each word model in the recognizer is provided with an interval of speech that is not identical for all words but estimated separately for the corresponding word. Simulation results show that the proposed method enables the endpoints to be accurately estimated regardless of the amount of noise and consequently achieves an 8% improvement in recognition rate.
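The offset-based estimation described in the abstract can be sketched as follows. This is an illustrative reading, not the paper's code: a per-word offset between acoustic and visual endpoints is learned in clean conditions, then added to the visual endpoints detected on the noisy utterance (lip motion being unaffected by acoustic noise). All function and variable names are assumptions.

```python
def learn_offset(visual_ep_clean, clean_ep):
    """Per-word difference between acoustic and visual endpoints,
    measured in clean conditions. Endpoints are (start, end) frames."""
    return (clean_ep[0] - visual_ep_clean[0],
            clean_ep[1] - visual_ep_clean[1])

def estimate_noisy_endpoints(visual_ep_noisy, offset):
    """Apply the learned audio-visual offset to the visual endpoints
    detected on the noisy utterance to estimate its speech interval."""
    return (visual_ep_noisy[0] + offset[0],
            visual_ep_noisy[1] + offset[1])

off = learn_offset(visual_ep_clean=(10, 52), clean_ep=(12, 50))
est = estimate_noisy_endpoints(visual_ep_noisy=(20, 62), offset=off)
```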

PCMM-Based Feature Compensation Method Using Multiple Model to Cope with Time-Varying Noise (시변 잡음에 대처하기 위한 다중 모델을 이용한 PCMM 기반 특징 보상 기법)

  • 김우일;고한석
    • The Journal of the Acoustical Society of Korea
    • /
    • v.23 no.6
    • /
    • pp.473-480
    • /
    • 2004
  • In this paper we propose an effective feature compensation scheme based on a speech model in order to achieve robust speech recognition. The proposed feature compensation method is based on the parallel combined mixture model (PCMM). Previous PCMM work requires a highly sophisticated procedure for estimating the combined mixture model in order to reflect the time-varying noisy conditions at every utterance. The proposed schemes cope with time-varying background noise by interpolating among multiple mixture models. We apply the 'data-driven' method to PCMM for more reliable model combination and introduce a frame-synchronized version for a posteriori estimation of the environment. In order to reduce the computational complexity due to the multiple models, we propose a mixture-sharing technique: statistically similar Gaussian components are selected, and smoothed versions are generated for sharing. The performance is examined on Aurora 2.0 and on a speech corpus recorded while driving a car. The experimental results indicate that the proposed schemes are effective in realizing robust speech recognition and in reducing computational complexity under both simulated environments and real-life conditions.
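The interpolation of multiple mixture models mentioned in the abstract can be sketched minimally: environment posteriors blend the mixture weights of several per-condition models into one combined model. The two-model setup, the weight values, and all names here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def interpolate_mixtures(model_weights, env_posteriors):
    """model_weights: (K, M) mixture weights of K condition models.
    env_posteriors: (K,) posterior probability of each condition."""
    w = np.asarray(env_posteriors)[:, None] * np.asarray(model_weights)
    combined = w.sum(axis=0)
    return combined / combined.sum()  # renormalize to a valid mixture

# Two noise-condition models, each with two mixture components,
# blended with equal environment posteriors.
combined = interpolate_mixtures([[0.7, 0.3], [0.2, 0.8]], [0.5, 0.5])
```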

A Study on the Improvement of Isolated Word Recognition for Telephone Speech (전화음성의 격리단어인식 개선에 관한 연구)

  • Do, Sam-Joo;Un, Chong-Kwan
    • The Journal of the Acoustical Society of Korea
    • /
    • v.9 no.4
    • /
    • pp.66-76
    • /
    • 1990
  • In this work, the effect of noise and distortion of a telephone channel on speech recognition is studied, and methods to improve the recognition rate are proposed. Computer simulation is done using 100-word test data made by pronouncing 100 phonetically balanced Korean isolated words ten times each in a speaker-dependent mode. First, a spectral subtraction method is suggested to improve noisy speech recognition. Then, the effect of bandwidth limiting and channel distortion is studied. It has been found that bandwidth limiting and amplitude distortion lower the recognition rate significantly, but phase distortion has little effect. To reduce the channel effect, we modify the reference pattern according to some training data. When both channel noise and distortion exist, the recognition rate without the proposed method is merely 7.7~26.4%, but with the proposed method it increases drastically to 76.2~92.3%.

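Spectral subtraction, the noise-reduction step this paper applies, is a standard operation: subtract an estimate of the noise magnitude spectrum from each frame's magnitude spectrum and floor the result. This is a generic textbook sketch, not the paper's code; the frame data and flooring at zero are assumptions.

```python
import numpy as np

def spectral_subtract(noisy_mag, noise_mag, floor=0.0):
    """noisy_mag: (frames, bins) magnitude spectra of the noisy speech.
    noise_mag: (bins,) noise magnitude estimate, e.g. averaged over
    non-speech frames. Negative results are floored to avoid
    nonsensical negative magnitudes."""
    return np.maximum(noisy_mag - noise_mag, floor)

clean = spectral_subtract(np.array([[3.0, 1.0], [2.0, 5.0]]),
                          np.array([2.0, 2.0]))
```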

A New Feature for Speech Segments Extraction with Hidden Markov Models (숨은마코프모형을 이용하는 음성구간 추출을 위한 특징벡터)

  • Hong, Jeong-Woo;Oh, Chang-Hyuck
    • Communications for Statistical Applications and Methods
    • /
    • v.15 no.2
    • /
    • pp.293-302
    • /
    • 2008
  • In this paper we propose a new feature, average power, for speech segment extraction with hidden Markov models, based on the mel frequencies of speech signals. The average power is compared with the mel-frequency cepstral coefficients (MFCC) and the power coefficient. To compare the performance of the three types of features, speech data were collected for words with plosives, which are generally known to be hard to detect. Experiments show that the average power is more accurate and efficient than MFCC and the power coefficient for speech segment extraction in environments with various levels of noise.
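A per-frame power feature derived from mel-scaled spectra, as the abstract describes, might look like the following sketch. The exact definition of "average power" is the paper's; averaging the mel filterbank energies per frame and taking the log is an illustrative stand-in, and the input values are made up.

```python
import numpy as np

def average_power(mel_energies, eps=1e-10):
    """mel_energies: (frames, n_mels) filterbank energies per frame.
    Returns one log-power value per frame; eps avoids log(0)."""
    return np.log(np.mean(mel_energies, axis=1) + eps)

feat = average_power(np.array([[1.0, 3.0], [2.0, 2.0]]))
```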

Adaptive Channel Normalization Based on Infomax Algorithm for Robust Speech Recognition

  • Jung, Ho-Young
    • ETRI Journal
    • /
    • v.29 no.3
    • /
    • pp.300-304
    • /
    • 2007
  • This paper proposes a new data-driven method for high-pass approaches, which suppress slowly varying noise components. Conventional high-pass approaches are based on the idea of decorrelating the feature vector sequence, and lack adaptability to various conditions. The proposed method is based on temporal local decorrelation using information-maximization theory for each utterance. It is performed on an utterance-by-utterance basis, which provides an adaptive channel normalization filter for each condition. The performance of the proposed method is evaluated by isolated-word recognition experiments with channel distortion. Experimental results show that the proposed method yields an outstanding improvement for channel-distorted speech recognition.

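The conventional baseline such high-pass approaches build on, cepstral mean normalization, can be sketched in a few lines; the paper replaces this fixed per-utterance operation with an infomax-learned adaptive filter, which is not reproduced here. The array shapes and values are illustrative.

```python
import numpy as np

def cepstral_mean_normalize(cepstra):
    """cepstra: (frames, dims) cepstral feature sequence. Removing the
    per-utterance mean suppresses a stationary (slowly varying) channel
    component, which is the high-pass effect the paper generalizes."""
    return cepstra - cepstra.mean(axis=0, keepdims=True)

norm = cepstral_mean_normalize(np.array([[1.0, 2.0], [3.0, 4.0]]))
```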

A Study on Improving G.723.1 VAD Performance for Bit-Rate Reduction in Noisy Environments (잡음 환경에서의 전송율 감소를 위한 G.723.1 VAD 성능개선에 관한 연구)

  • 김정진;박영호;배명진
    • Proceedings of the IEEK Conference
    • /
    • 2000.06d
    • /
    • pp.98-101
    • /
    • 2000
  • The G.723.1 6.3 kbps/5.3 kbps dual-rate speech codec, a CELP-type vocoder developed for Internet telephony and videoconferencing, uses VAD (Voice Activity Detection) and CNG (Comfort Noise Generator) to reduce the bit rate during silence periods. To reduce the bit rate more effectively, in this paper we first set a boundary condition on the energy threshold to prevent the consumption of unnecessary processing time, and use three decision rules, based on energy, pitch gain, and LSP distance, to detect an active frame. To evaluate the performance of the proposed algorithm we use silence-inserted speech data at SNRs of 0, 5, 10, and 20 dB. As a result, when the SNR is over 5 dB, the bit rate is reduced by up to about 40% without speech degradation, and the processing time is additionally decreased.

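The three-rule active-frame decision described in the abstract can be sketched as a simple conjunction of threshold tests. The threshold values, the all-rules-must-pass combination, and the function name are assumptions for illustration; the paper's actual thresholds and rule logic may differ.

```python
def is_active_frame(energy, pitch_gain, lsp_distance,
                    e_thr=0.1, g_thr=0.3, d_thr=0.05):
    """Declare a frame active (speech) only if its energy, pitch gain,
    and LSP distance from the background model all exceed thresholds;
    otherwise it is treated as silence and coded with CNG."""
    return (energy > e_thr and
            pitch_gain > g_thr and
            lsp_distance > d_thr)

active = is_active_frame(energy=0.5, pitch_gain=0.6, lsp_distance=0.2)
silent = is_active_frame(energy=0.05, pitch_gain=0.6, lsp_distance=0.2)
```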

The Study of Correlation between HNR and NNE (HNR과 NNE와의 상관관계 연구)

  • Kim, Kyung-Yuel;Shin, Myung-Sun;An, Jong-Bok;Jeong, Ok-Ran
    • Speech Sciences
    • /
    • v.8 no.3
    • /
    • pp.235-241
    • /
    • 2001
  • The purpose of this study was to determine the correlations between HNR and NNE and between EGG HNR and EGG NNE, the measures most closely related to noise in the voice. Dr. Speech was utilized to obtain acoustic and physiologic measurements simultaneously. In addition, no normative data for HNR, NNE, EGG HNR, and EGG NNE are currently available for Korean subjects. The Pearson correlation coefficient was used to find the correlation between HNR and NNE and between EGG HNR and EGG NNE. The results of the study were as follows: first, there was no correlation between HNR and NNE; second, there was a negative correlation between EGG HNR and EGG NNE. Finally, the mean HNR of normal Korean adults was $28.8\pm2.7$ dB, the mean NNE $-11.7\pm3.9$ dB, the mean EGG HNR $32.9\pm4.6$ dB, and the mean EGG NNE $-30.3\pm4.6$ dB.

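The study's statistic, the Pearson correlation coefficient, computed directly from its definition; the data values below are made up for illustration and are not the study's measurements.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance of x and y divided by the
    product of their standard deviations (here via sums of squares)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Perfectly negatively correlated toy data, as the study found
# (qualitatively) for EGG HNR vs. EGG NNE.
r = pearson_r([1.0, 2.0, 3.0], [6.0, 4.0, 2.0])
```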

Production of English final stops by Korean speakers

  • Kim, Jungyeon
    • Phonetics and Speech Sciences
    • /
    • v.10 no.4
    • /
    • pp.11-17
    • /
    • 2018
  • This study reports on a production experiment designed to investigate how Korean-speaking learners of English produce English forms ending in stops. In a repetition experiment, Korean participants listened to English nonce words ending in a stop and repeated what they heard. English speakers were recruited for the same task as a control group. Transcriptions of the Korean productions by English native speakers showed vowel insertion in only 3% of productions, although the noise intervals after the closure of final stops were significantly longer for Korean speakers than for English speakers. This finding is inconsistent with the loanword data, where 49% of words showed vowel insertion. It is also not compatible with the perceptual similarity approach, which predicts that because Korean speakers accurately perceive an English final stop as a final consonant, they will insert a vowel to make the English sound more similar to the Korean sound.

Wavelet-based Algorithm for Signal Reconstruction (신호 복원을 위한 웨이브렛기반 알고리즘)

  • Bae, Sang-Bum;Kim, Nam-Ho
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.11 no.1
    • /
    • pp.150-156
    • /
    • 2007
  • Noise arises from several causes when a signal is processed; it introduces errors in data transmission and decreases the recognition ratio of image and speech data. Therefore, a variety of methods for reconstructing a signal after eliminating such noise have been researched. Recently, the wavelet transform, which offers time-frequency localization and makes multiresolution analysis possible, has been applied in many fields of technology, and threshold- and correlation-based methods have been proposed for removing noise. However, conventional methods mistake much of the noise for edges and cannot remove additive white Gaussian noise (AWGN) and impulse noise at the same time. Therefore, in this paper we propose a new wavelet-based algorithm for reconstructing a signal degraded by noise and compare it with conventional methods.
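The threshold-based denoising the paper compares against typically amounts to soft thresholding of wavelet coefficients: coefficients smaller than the threshold (presumed noise) are zeroed, larger ones are shrunk. This sketch omits the wavelet transform itself, and the coefficient values and threshold are illustrative.

```python
import numpy as np

def soft_threshold(coeffs, thr):
    """Shrink wavelet coefficients toward zero by thr; coefficients
    with magnitude below thr (mostly noise) are removed entirely."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - thr, 0.0)

den = soft_threshold(np.array([-3.0, 0.5, 2.0]), thr=1.0)
```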

Voice Recognition Based on Adaptive MFCC and Neural Network (적응 MFCC와 Neural Network 기반의 음성인식법)

  • Bae, Hyun-Soo;Lee, Suk-Gyu
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.5 no.2
    • /
    • pp.57-66
    • /
    • 2010
  • In this paper, we propose an enhanced voice recognition algorithm using adaptive MFCC (Mel Frequency Cepstral Coefficients) and a neural network. Although it is very important to extract voice data from the raw data to enhance the voice recognition ratio, conventional algorithms tend to deteriorate the voice data when they eliminate noise within a specific frequency band. Unlike conventional MFCC, the proposed algorithm imposes larger weights on some specified frequency regions and uses a non-overlapping filterbank to enhance the recognition ratio without deteriorating the voice data. In simulation results, the proposed algorithm shows better performance than conventional MFCC since it is robust to variation of the environment.
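The abstract's idea of emphasizing specified frequency regions before the cepstral step can be sketched as scaling selected mel filterbank channels. The band indices, weight value, and function name below are assumptions for illustration; the paper's actual weighting scheme and non-overlapping filterbank design are not reproduced.

```python
import numpy as np

def weight_filterbank(energies, band, weight):
    """energies: (frames, n_mels) filterbank energies.
    band: slice selecting the emphasized mel channels.
    Channels in the band are scaled up before the cepstral transform."""
    out = energies.copy()
    out[:, band] *= weight
    return out

# One frame of three mel channels; emphasize the middle channel.
w = weight_filterbank(np.array([[1.0, 1.0, 1.0]]), slice(1, 2), 2.0)
```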