Search | Korea Science

Pre-Processing for Performance Enhancement of Speech Recognition in Digital Communication Systems (디지털 통신 시스템에서의 음성 인식 성능 향상을 위한 전처리 기술)

Seo, Jin-Ho;Park, Ho-Chong
- The Journal of the Acoustical Society of Korea / v.24 no.7 / pp.416-422 / 2005
Speech recognition in digital communication systems performs poorly because of the spectral distortion introduced by speech codecs. In this paper, the spectral distortion caused by speech codecs is analyzed, and a pre-processing method that compensates for this distortion is proposed to improve speech recognition performance. Three standard speech codecs, IS-127 EVRC, ITU G.729 CS-ACELP, and IS-96 QCELP, are considered for algorithm development and evaluation, and a single method that can be applied to all three codecs is developed. The proposed method is evaluated on the three codecs: using speech features extracted from the compensated spectrum improves the recognition rate by up to $15.6\%$ compared with using the degraded speech features.
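The abstract does not describe the compensation method itself. As a hedged illustration of the general idea only, the sketch below assumes that codec distortion can be approximated by a fixed per-frequency offset in the log power spectrum. It learns that offset from paired clean/decoded training frames, subtracts it, and measures log-spectral distortion before and after. The function names and the synthetic "codec tilt" are assumptions, not the authors' method.

```python
import numpy as np

def log_power_spectrum(x, frame_len=256, hop=128):
    """Hann-windowed log power spectrum, one row per frame."""
    win = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop: i * hop + frame_len] * win for i in range(n_frames)])
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) ** 2 + 1e-10)

def estimate_codec_bias(clean_log, coded_log):
    """Average per-bin log-spectral offset introduced by the codec (paired training data)."""
    return np.mean(coded_log - clean_log, axis=0)

def compensate(coded_log, bias):
    """Remove the estimated codec offset before feature extraction."""
    return coded_log - bias

def log_spectral_distortion_db(ref_log, test_log):
    """RMS log-spectral distortion in dB, averaged over frames."""
    diff_db = (10.0 / np.log(10.0)) * (ref_log - test_log)
    return float(np.mean(np.sqrt(np.mean(diff_db ** 2, axis=1))))

# Usage with a synthetic stand-in for a codec: a fixed spectral tilt in the log domain.
rng = np.random.default_rng(0)
clean_log = log_power_spectrum(rng.standard_normal(4000))
tilt = np.linspace(0.0, -2.0, clean_log.shape[1])
coded_log = clean_log + tilt
bias = estimate_codec_bias(clean_log, coded_log)
restored_log = compensate(coded_log, bias)
```

A real codec's distortion is signal-dependent, so a single offset is only a first-order model. It does show where a compensation step sits: between the decoded spectrum and feature extraction.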

Robust End Point Detection for Robot Speech Recognition Using Double Talk Detection (음성인식 로봇을 위한 동시통화검출 기반의 강인한 음성 끝점 검출)

Moon, Sung-Kyu;Park, Jin-Soo;Ko, Han-Seok
- The Journal of the Acoustical Society of Korea / v.31 no.3 / pp.161-169 / 2012
This paper presents a robust speech end-point detector that uses double-talk detection for a speech recognition robot operating in echoic conditions. The proposed method combines the result of a conventional end-point detector with the result of a double-talk detector. We tested the proposed method on an isolated-word recognition system in an echoic environment. The proposed algorithm outperforms existing techniques by 30% in speech recognition rate.
https://doi.org/10.7776/ASK.2012.31.3.161
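Below is a minimal sketch of combining an end-point detector (EPD) with a double-talk detector (DTD). Both detectors and the fusion rule are assumptions, not the paper's: a frame-energy EPD, the classic Geigel double-talk detector, and a hypothetical rule that trusts the DTD while the robot's own loudspeaker signal (`far`) is active and the EPD otherwise.

```python
import numpy as np

def to_frames(x, frame_len):
    n = len(x) // frame_len
    return np.asarray(x[: n * frame_len], dtype=float).reshape(n, frame_len)

def energy_epd(x, frame_len, threshold):
    """Conventional energy-based EPD: True where the mean frame energy exceeds threshold."""
    return np.mean(to_frames(x, frame_len) ** 2, axis=1) > threshold

def geigel_dtd(mic, far, frame_len, window=64, t=0.5):
    """Geigel DTD: near-end talk where max|far| over the last `window` samples < t * |mic|."""
    mic, far = np.abs(np.asarray(mic, float)), np.abs(np.asarray(far, float))
    far_max = np.array([far[max(0, n - window + 1): n + 1].max() for n in range(len(far))])
    per_sample = far_max < t * mic
    return to_frames(per_sample, frame_len).any(axis=1)

def detect_endpoints(mic, far, frame_len, speech_thr, playback_thr):
    """Hypothetical fusion: use the DTD while the robot plays audio, the EPD otherwise."""
    epd = energy_epd(mic, frame_len, speech_thr)
    dtd = geigel_dtd(mic, far, frame_len)
    playing = energy_epd(far, frame_len, playback_thr)
    speech = np.where(playing, dtd, epd)
    idx = np.flatnonzero(speech)
    return None if idx.size == 0 else (int(idx[0]), int(idx[-1]))
```

The motivation for the fusion is that a plain energy EPD fires on the robot's own echo. The DTD only reports frames where the microphone is much louder than the recent loudspeaker signal.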

Language Specific CTC Projection Layers on Wav2Vec2.0 for Multilingual ASR (다국어 음성인식을 위한 언어별 출력 계층 구조 Wav2Vec2.0)

Lee, Won-Jun;Lee, Geun-Bae
- Annual Conference on Human and Language Technology / 2021.10a / pp.414-418 / 2021
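Multilingual speech recognition is more difficult than monolingual speech recognition. To perform multilingual speech recognition with a single model, recognition performance can be improved by enabling the model to learn the phonetic characteristics shared by different languages. This study presents a method that modifies the architecture of the deep-learning speech recognition model Wav2Vec2.0 so that a single model is trained on both Korean and English speech. In the Wav2Vec2.0 architecture, which uses the CTC (Connectionist Temporal Classification) loss function, a separate CTC output layer is placed for each language and a language-specific lexicon is applied, which fundamentally prevents speech input from being confused with another language. Using the proposed Wav2Vec2.0 architecture, we resolve the problem of degraded recognition rates caused by misclassifying Korean and English. We also confirm that, on the Korean speech dataset (KsponSpeech), a model trained jointly on Korean and English achieves a higher recognition rate than a model trained on Korean alone. Finally, we improved recognition performance with a language model using prefix decoding.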
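The language-specific output layers can be sketched as separate linear CTC projections over shared encoder features. Each projection has its own vocabulary, so the Korean head can never emit English characters and vice versa. This NumPy sketch uses greedy (best-path) CTC decoding rather than the prefix decoding with a language model mentioned above. The class name, the blank-at-index-0 convention, and the toy vocabularies are assumptions.

```python
import numpy as np

BLANK = 0  # index 0 of every vocabulary is the CTC blank (an assumption of this sketch)

def ctc_greedy_decode(logits, vocab):
    """Best-path CTC decoding: argmax per frame, collapse repeats, then drop blanks."""
    out, prev = [], None
    for i in logits.argmax(axis=-1):
        if i != prev and i != BLANK:
            out.append(vocab[i])
        prev = i
    return "".join(out)

class LanguageSpecificCTCHeads:
    """One linear CTC projection per language on top of shared encoder features."""

    def __init__(self, hidden_dim, vocabs, seed=0):
        rng = np.random.default_rng(seed)
        self.vocabs = vocabs
        self.heads = {lang: rng.normal(0.0, 0.02, (hidden_dim, len(v)))
                      for lang, v in vocabs.items()}

    def decode(self, features, lang):
        # features: (frames, hidden_dim) from the shared encoder (e.g. Wav2Vec2.0)
        return ctc_greedy_decode(features @ self.heads[lang], self.vocabs[lang])

# Usage with toy weights so that the output is predictable.
vocabs = {"ko": ["<blank>", "가", "나"], "en": ["<blank>", "a", "b"]}
model = LanguageSpecificCTCHeads(hidden_dim=3, vocabs=vocabs)
model.heads = {lang: np.eye(3) for lang in vocabs}
feats = np.eye(3)[[1, 2, 2, 0, 2]]
```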

Robust Speech Enhancement Using HMM and $H_\infty$ Filter (HMM과 $H_\infty$필터를 이용한 강인한 음성 향상)

이기용;김준일
- The Journal of the Acoustical Society of Korea / v.23 no.7 / pp.540-547 / 2004
Speech enhancement algorithms based on Kalman/Wiener filters require a priori knowledge of the noise and focus on minimizing the variance of the estimation error between the clean and estimated speech signals. As a result, a small error in the estimated noise statistics may lead to a large estimation error. The $H_\infty$ filter, by contrast, requires no assumptions about, or a priori knowledge of, the noise statistics. Instead, it searches for the best estimate among all possible estimates by applying a least upper bound on the estimation error, which makes it more robust to variations in the noise statistics than the Kalman/Wiener filter. In this paper, we propose a speech enhancement method using an HMM and multiple $H_\infty$ filters. First, the HMM parameters are estimated from the training data. Second, the speech is filtered with multiple $H_\infty$ filters. Finally, the clean speech estimate is obtained as the weighted sum of the filter outputs. Experimental results show an SNR improvement of about 1 to 2 dB over the Kalman filter method, with a slight increase in computation.
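The filtering and combination steps can be sketched as follows. The sketch assumes each HMM state is represented by an autoregressive (AR) speech model in companion state-space form and uses a standard game-theoretic discrete-time $H_\infty$ filter, which reduces to the Kalman filter when `theta = 0`. The weights (for example HMM state posteriors) are supplied by the caller; computing them is not shown. The names, the identity error weighting `S`, and the AR models are assumptions. The filter exists only while $P^{-1} - \theta S + H^T R^{-1} H$ stays positive definite, so `theta` must be kept small.

```python
import numpy as np

def ar_state_space(a):
    """Companion form of an AR(p) model s[n] = sum_i a[i] * s[n-1-i] + w[n]."""
    a = np.asarray(a, dtype=float)
    p = a.size
    F = np.zeros((p, p))
    F[0, :] = a                      # newest sample is the AR prediction
    F[1:, :-1] = np.eye(p - 1)       # older samples shift down by one
    H = np.zeros((1, p))
    H[0, 0] = 1.0                    # we observe the newest sample plus noise
    return F, H

def hinf_filter(y, a, q, r, theta):
    """Discrete-time H-infinity filter; theta = 0 reduces it to the Kalman filter."""
    F, H = ar_state_space(a)
    p = F.shape[0]
    Q = np.zeros((p, p))
    Q[0, 0] = q                      # process noise drives only the newest sample
    R_inv = np.array([[1.0 / r]])
    S = np.eye(p)                    # error weighting matrix (identity is an assumption)
    x = np.zeros((p, 1))
    P = np.eye(p)
    est = np.empty(len(y))
    for k, yk in enumerate(y):
        M = np.linalg.inv(np.eye(p) - theta * S @ P + H.T @ R_inv @ H @ P)
        K = P @ M @ H.T @ R_inv                    # (p, 1) filter gain
        x_post = x + K * (yk - (H @ x).item())     # a posteriori state estimate
        est[k] = x_post[0, 0]                      # enhanced speech sample
        x = F @ x_post                             # predict the next state
        P = F @ P @ M @ F.T + Q
    return est

def hmm_weighted_hinf(y, ar_models, weights, q, r, theta):
    """Weighted sum of one H-infinity filter output per HMM state."""
    outs = np.stack([hinf_filter(y, a, q, r, theta) for a in ar_models])  # (K, N)
    w = np.asarray(weights, dtype=float)
    if w.ndim == 1:
        w = w[:, None]               # one fixed weight per state for every sample
    return np.sum(w * outs, axis=0)  # (K, N) weights, e.g. per-sample posteriors, also work
```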

Word Boundary Detection of Voice Signal Using Recurrent Fuzzy Associative Memory (순환 퍼지연상기억장치를 이용한 음성경계 추출)

Ma, Chang-Su;Kim, Gye-Young
- Journal of KIISE:Software and Applications / v.31 no.9 / pp.1171-1179 / 2004
We describe a word boundary detection method that extracts the boundaries between speech and non-speech. The proposed method uses two features. The first is the normalized root mean square (RMS) of the speech signal, which is insensitive to white noise and represents temporal information. The second is the normalized mel-frequency band energy of the voice signal, which represents frequency information. Our method detects word boundaries using a recurrent fuzzy associative memory (RFAM), which extends FAM by adding recurrent nodes. Hebbian learning is employed to establish the degree of association between inputs and outputs, and an error back-propagation algorithm is used to learn the weights between the consequent layer and the recurrent layer. To confirm its effectiveness, we applied the proposed system to voice data obtained from KAIST.
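The two RFAM input features can be computed roughly as below. The abstract does not specify how the features are normalized, so the frame sizes, the triangular mel filterbank, and the min-max normalization over the utterance are all assumptions. The RFAM classifier itself is not shown.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular mel filters with shape (n_filters, n_fft // 2 + 1)."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        lo, c, hi = bins[m - 1], bins[m], bins[m + 1]
        fb[m - 1, lo:c] = (np.arange(lo, c) - lo) / max(c - lo, 1)   # rising edge
        fb[m - 1, c:hi] = (hi - np.arange(c, hi)) / max(hi - c, 1)   # falling edge
    return fb

def minmax(v):
    return (v - v.min()) / (v.max() - v.min() + 1e-12)

def boundary_features(x, sr, frame_len=400, hop=160, n_fft=512, n_filters=20):
    """Per-frame normalized RMS and normalized log mel-band energy."""
    n = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop: i * hop + frame_len] for i in range(n)])
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    power = np.abs(np.fft.rfft(frames * np.hamming(frame_len), n=n_fft, axis=1)) ** 2
    mel_energy = np.log((power @ mel_filterbank(n_filters, n_fft, sr).T).sum(axis=1) + 1e-10)
    return minmax(rms), minmax(mel_energy)
```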

Speech Basis Matrix Using Noise Data and NMF-Based Speech Enhancement Scheme (잡음 데이터를 활용한 음성 기저 행렬과 NMF 기반 음성 향상 기법)

Kwon, Kisoo;Kim, Hyung Young;Kim, Nam Soo
- The Journal of Korean Institute of Communications and Information Sciences / v.40 no.4 / pp.619-627 / 2015
This paper presents a speech enhancement method using non-negative matrix factorization (NMF). In the training phase, a basis matrix for each source signal is obtained from an appropriate database, and these basis matrices are then used for source separation. The performance of speech enhancement therefore depends heavily on the basis matrices. In the proposed method, the speech basis matrix is trained so that it yields a high reconstruction error for noise signals. This method performs better than standard NMF, in which each basis matrix is trained independently. For comparison, we also propose an alternative method and evaluate a previously published method. Performance is evaluated in terms of perceptual evaluation of speech quality (PESQ) and signal-to-distortion ratio (SDR), and the proposed method outperforms the other methods.
https://doi.org/10.7840/kics.2015.40.4.619
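For reference, the standard NMF baseline that the proposed method improves on can be sketched as follows. A speech basis and a noise basis are trained separately. Only the activations are then fitted on the noisy magnitude spectrogram, and a Wiener-like mask is applied. The multiplicative updates are the Lee-Seung updates for the Euclidean cost. The paper's modified training of the speech basis (high reconstruction error for noise) is not implemented here, and all names are hypothetical.

```python
import numpy as np

EPS = 1e-10

def nmf(V, rank, n_iter=200, seed=0):
    """Euclidean NMF with Lee-Seung multiplicative updates: V ~= W @ H, all non-negative."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + EPS
    H = rng.random((rank, V.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
        W *= (V @ H.T) / (W @ H @ H.T + EPS)
    return W, H

def enhance(V_noisy, W_speech, W_noise, n_iter=200, seed=0):
    """Keep the trained bases fixed, fit activations only, then apply a Wiener-like mask."""
    W = np.hstack([W_speech, W_noise])
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V_noisy.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V_noisy) / (W.T @ W @ H + EPS)
    k = W_speech.shape[1]
    S = W_speech @ H[:k]             # speech part of the reconstruction
    N = W_noise @ H[k:]              # noise part of the reconstruction
    return S / (S + N + EPS) * V_noisy
```

Because the speech and noise bases are trained without reference to each other, nothing stops the speech basis from also modelling noise well. That weakness is the one the proposed training targets.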

A Study on the Realization of Wireless Home Network System Using High-performance Speech Recognition in Variable Position (가변위치 고음성인식 기술을 이용한 무선 홈 네트워크 시스템 구현에 관한 연구)

Yoon, Jun-Chul;Choi, Sang-Bang;Park, Chan-Sub;Kim, Se-Yong;Kim, Ki-Man;Kang, Suk-Youb
- Journal of the Korea Institute of Information and Communication Engineering / v.14 no.4 / pp.991-998 / 2010
When speech recognition is used to control a wireless home network system indoors, background noise and reverberation are the two main causes of degraded recognition performance. In this study, we implement a home network system that is robust to reverberation and background noise by using a voice section detection method based on spectral entropy. Spectral subtraction can reduce the effect of reverberation and remove noise that is independent of the speech signal by removing the reverberation-distorted components in the spectral domain. Effective spectral subtraction requires accurate separation of voice sections from silent sections, so a voice section detection method based on spectral entropy is applied. Experiments and tests in an indoor environment were carried out to measure the command recognition rate. The results show that using the spectral-entropy-based voice section detection method improved the command recognition rate in both static and reverberant room conditions.
https://doi.org/10.6109/jkiice.2010.14.4.991
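Here is a minimal sketch of spectral-entropy voice section detection followed by power spectral subtraction. The noise spectrum is averaged over the frames classified as non-speech. The frame size, entropy threshold ratio, over-subtraction factor `alpha`, and spectral floor are assumed values, not the paper's.

```python
import numpy as np

def stft(x, frame_len=256, hop=128):
    win = np.hanning(frame_len)
    n = 1 + (len(x) - frame_len) // hop
    return np.fft.rfft(np.stack([x[i * hop: i * hop + frame_len] * win for i in range(n)]), axis=1)

def spectral_entropy(spec):
    """Entropy of each frame's normalized power spectrum (low for speech, high for noise)."""
    p = np.abs(spec) ** 2
    p = p / (p.sum(axis=1, keepdims=True) + 1e-12)
    return -np.sum(p * np.log(p + 1e-12), axis=1)

def entropy_vad(spec, ratio=0.7):
    """Speech where entropy is below `ratio` times the maximum entropy log(n_bins)."""
    return spectral_entropy(spec) < ratio * np.log(spec.shape[1])

def spectral_subtraction(spec, is_speech, alpha=1.0, floor=0.01):
    """Power spectral subtraction using the noise PSD averaged over non-speech frames."""
    if is_speech.all():
        raise ValueError("need at least one non-speech frame to estimate the noise")
    noise_psd = np.mean(np.abs(spec[~is_speech]) ** 2, axis=0)
    power = np.abs(spec) ** 2
    clean_power = np.maximum(power - alpha * noise_psd, floor * power)
    return np.sqrt(clean_power) * np.exp(1j * np.angle(spec))

# Usage on a synthetic signal: noise, then a tone standing in for speech, then noise.
rng = np.random.default_rng(0)
x = 0.05 * rng.standard_normal(12000)
x[4000:8000] += np.sin(2 * np.pi * 0.0625 * np.arange(4000))
spec = stft(x)
is_speech = entropy_vad(spec)
enhanced = spectral_subtraction(spec, is_speech)
```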

Statistical Voice Activity Detector Based on Signal Subspace Model (신호 준공간 모델에 기반한 통계적 음성 검출기)

Ryu, Kwang-Chun;Kim, Dong-Kook
- The Journal of the Acoustical Society of Korea / v.27 no.7 / pp.372-378 / 2008
Voice activity detectors (VADs) are important in wireless communication and speech signal processing. In conventional VAD methods, an expression for the likelihood ratio test (LRT) based on statistical models is derived in the discrete Fourier transform (DFT) domain, and speech or noise is then decided by comparing the value of this expression with a threshold. This paper presents a new statistical VAD method based on a signal subspace approach. Probabilistic principal component analysis (PPCA) is employed to obtain a signal subspace model that incorporates a probabilistic model of the noisy signal into the signal subspace method. The proposed approach provides a novel LRT-based decision rule in the signal subspace domain. Experimental results show that the proposed signal-subspace-model-based VAD outperforms VADs based on the widely used Gaussian distribution in the DFT domain.
https://doi.org/10.7776/ASK.2008.27.7.372
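The conventional DFT-domain baseline mentioned above can be sketched as a Gaussian likelihood ratio test in the style of Sohn et al. With a posteriori SNR $\gamma_k$ and a priori SNR $\xi_k$ in bin $k$, the per-bin log-likelihood ratio is $\log\Lambda_k = \frac{\gamma_k \xi_k}{1+\xi_k} - \log(1+\xi_k)$. A frame is declared speech when the mean of $\log\Lambda_k$ over bins exceeds a threshold. This is the baseline, not the proposed PPCA subspace detector. The maximum-likelihood estimate of $\xi_k$, the noise PSD taken from leading frames, and the threshold `eta` are assumptions.

```python
import numpy as np

def stft(x, frame_len=256, hop=128):
    win = np.hanning(frame_len)
    n = 1 + (len(x) - frame_len) // hop
    return np.fft.rfft(np.stack([x[i * hop: i * hop + frame_len] * win for i in range(n)]), axis=1)

def lrt_vad(spec, noise_psd, eta=0.5):
    """Gaussian LRT per frame in the DFT domain; returns decisions and mean log-LR scores."""
    gamma = np.abs(spec) ** 2 / noise_psd             # a posteriori SNR per bin
    xi = np.maximum(gamma - 1.0, 1e-3)                # ML a priori SNR estimate, floored
    log_lr = gamma * xi / (1.0 + xi) - np.log1p(xi)   # log-likelihood ratio per bin
    score = log_lr.mean(axis=1)
    return score > eta, score

# Usage on a synthetic signal: noise, then a tone standing in for speech, then noise.
rng = np.random.default_rng(0)
x = 0.05 * rng.standard_normal(12000)
x[4000:8000] += np.sin(2 * np.pi * 0.0625 * np.arange(4000))
spec = stft(x)
noise_psd = np.mean(np.abs(spec[:25]) ** 2, axis=0)   # leading frames assumed noise-only
is_speech, score = lrt_vad(spec, noise_psd)
```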

Reserved Slot Allocation Scheme for Voice Service in WATM MAC (무선 비동기 전송모드 매체 접근제어에서 음성서비스를 위한 예약 슬롯 할당 알고리즘)

김관웅;배성환;전병실
- The Journal of the Acoustical Society of Korea / v.20 no.7 / pp.101-108 / 2001
In this paper, we focus on a dynamic reservation slot allocation scheme that supports the QoS of voice traffic in the wireless ATM (WATM) MAC. Voice traffic is the most important real-time traffic type, so we propose a new MAC protocol for voice traffic over WATM networks in a multimedia environment. Because voice traffic alternates repeatedly between silent and active states, the new protocol dynamically allocates reservation slots according to the number of silent voice sources, whose starting times are stored in a state table at the base station (BS). Simulation results show that the proposed protocol outperforms slotted ALOHA in average access delay and collision rate, and outperforms NC-PRMA (Non-Collision Packet Reservation Multiple Access) in bandwidth efficiency. They also show that the protocol can meet a given level of QoS through its slot assignment even as the number of voice terminals increases.
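As a heavily hedged illustration of the base-station state table, the sketch below records the start frame of each silent voice source. It reserves enough slots for the expected number of sources that resume talking in the next frame. The allocation rule, the exponential (memoryless) silence model, and all names are hypothetical. Under the memoryless assumption the stored start times do not affect the count, and the abstract does not say how the paper uses them.

```python
import math
from dataclasses import dataclass, field

@dataclass
class VoiceStateTable:
    """Hypothetical BS table: terminal id -> frame in which its silent period started."""
    silent_since: dict = field(default_factory=dict)

    def on_silence(self, terminal, frame):
        self.silent_since[terminal] = frame

    def on_talkspurt(self, terminal):
        self.silent_since.pop(terminal, None)

    def reservation_slots(self, mean_silence_frames, max_slots):
        """Slots for the expected number of silent sources resuming in the next frame."""
        # Exponential silences: each silent source resumes within one frame
        # with probability 1 - exp(-1 / mean_silence_frames).
        p_resume = 1.0 - math.exp(-1.0 / mean_silence_frames)
        return min(max_slots, math.ceil(len(self.silent_since) * p_resume))
```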

Reduction Algorithm of Environmental Noise by Multi-band Filter (멀티밴드필터에 의한 환경잡음억압 알고리즘)

Choi, Jae-Seung
- Journal of the Korea Society of Computer and Information / v.17 no.8 / pp.91-97 / 2012
This paper first proposes a speech recognition algorithm that detects speech and noise sections in each frame. It then proposes an environmental noise reduction algorithm using a multi-band filter, which removes background noise in each frame according to the detected speech and noise sections. After extracting features from the speech data, the proposed algorithm reduces background noise in the filter-bank sub-band domain. The proposed multi-band filter noise reduction algorithm was evaluated frame by frame on speech and noise data. Spectral distortion measurements confirm that the proposed algorithm is effective for noise-corrupted speech.
https://doi.org/10.9708/jksci.2012.17.8.091
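Below is a minimal sketch of frame-wise multi-band noise suppression. The spectrum is split into equal-width sub-bands. Each band's noise level is updated only in frames labelled as noise, and each band receives a spectral-subtraction gain. The band layout, smoothing constant, gain floor, and the assumption that the first frame is noise-only are not from the paper.

```python
import numpy as np

def stft(x, frame_len=256, hop=128):
    win = np.hanning(frame_len)
    n = 1 + (len(x) - frame_len) // hop
    return np.fft.rfft(np.stack([x[i * hop: i * hop + frame_len] * win for i in range(n)]), axis=1)

def multiband_suppress(spec, is_noise, n_bands=8, smoothing=0.9, min_gain=0.1):
    """Track noise per band in noise frames, then apply a per-band subtraction gain."""
    power = np.abs(spec) ** 2
    edges = np.linspace(0, spec.shape[1], n_bands + 1).astype(int)
    widths = np.diff(edges)
    out = np.empty_like(spec)
    noise = None
    for t in range(spec.shape[0]):
        band_pow = np.array([power[t, edges[b]:edges[b + 1]].mean() for b in range(n_bands)])
        if noise is None:
            noise = band_pow.copy()              # assumption: the first frame is noise only
        elif is_noise[t]:
            noise = smoothing * noise + (1.0 - smoothing) * band_pow
        power_gain = np.clip(1.0 - noise / (band_pow + 1e-12), min_gain ** 2, 1.0)
        out[t] = spec[t] * np.repeat(np.sqrt(power_gain), widths)
    return out

# Usage: noise, a tone standing in for speech, noise; frame labels are known here.
rng = np.random.default_rng(0)
x = 0.05 * rng.standard_normal(12000)
x[4000:8000] += np.sin(2 * np.pi * 0.0625 * np.arange(4000))
spec = stft(x)
idx = np.arange(spec.shape[0])
out = multiband_suppress(spec, (idx <= 29) | (idx >= 63))
```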
