Search | Korea Science

Energy Feature Normalization for Robust Speech Recognition in Noisy Environments

Lee, Yoon-Jae;Ko, Han-Seok
- Speech Sciences / v.13 no.1 / pp.129-139 / 2006
In this paper, we propose two effective energy feature normalization methods for robust speech recognition in noisy environments. In the first method, we estimate the noise energy and remove it from the noisy speech energy. In the second method, we propose a modified algorithm for the Log-energy Dynamic Range Normalization (ERN) method. In the ERN method, the log energy of the training data in a clean environment is transformed into the log energy in noisy environments. If the minimum log energy of the test data is outside of a pre-defined range, the log energy of the test data is also transformed. Since the ERN method has several weaknesses, we propose a modified transform scheme designed to reduce the residual mismatch that it produces. In the evaluation conducted on the Aurora2.0 database, we obtained a significant performance improvement.
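The first method (estimate the noise energy, then remove it from the noisy speech energy) can be sketched as follows. This is only an illustration: the abstract does not say how the noise energy is estimated, so the leading-frames estimate, the function name `subtract_noise_energy`, and the parameters `n_noise_frames` and `floor_ratio` are all assumptions.

```python
import math

def subtract_noise_energy(frame_energies, n_noise_frames=10, floor_ratio=0.01):
    """Return log energies after removing an estimated noise energy.

    Assumption (not from the paper): the first `n_noise_frames` frames
    contain only noise, and their mean serves as the noise estimate.
    """
    if not frame_energies:
        return []
    lead = frame_energies[:n_noise_frames]
    noise = sum(lead) / len(lead)
    log_energies = []
    for e in frame_energies:
        # Subtract in the linear domain, then floor so the log stays defined.
        cleaned = max(e - noise, floor_ratio * noise, 1e-12)
        log_energies.append(math.log(cleaned))
    return log_energies
```

The floor keeps frames whose energy is at or below the noise estimate from producing a logarithm of zero or of a negative number.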

A Log-Energy Feature Normalization Method Using ARMA Filter (ARMA 필터를 이용한 로그 에너지 특징의 정규화 방법)

Shen, Guang-Hu;Jung, Ho-Youl;Chung, Hyun-Yeol
- Journal of Korea Multimedia Society / v.11 no.10 / pp.1325-1337 / 2008
Environmental mismatch between training and recognition is the major cause of degradation in speech recognition performance. To resolve this mismatch, various noise processing methods have been studied. Among them, ERN (log-Energy dynamic Range Normalization) and SEN (Silence Energy Normalization), which normalize log energy features, perform better than others. However, these methods can hardly normalize the relatively high values of the log energy features, and the resulting environmental mismatch grows, especially in low-SNR environments. To solve this problem, we propose applying an ARMA filter as a post-processing step that smooths the log energy features by computing a moving average in an autoregressive scheme. In recognition experiments on the Aurora 2.0 DB, the proposed method improved recognition results compared with conventional methods.
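A minimal sketch of ARMA smoothing applied to a log energy track is given below. The abstract does not give the filter equation, so the form used here (average of the previous M outputs with the current and next M inputs), the function name `arma_smooth`, and the edge handling are assumptions rather than the authors' exact filter.

```python
def arma_smooth(x, order=2):
    """Smooth a sequence with a simple ARMA filter.

    Assumed form, with M = order:
        y[t] = (y[t-M] + ... + y[t-1] + x[t] + ... + x[t+M]) / (2M + 1)
    Frames too close to either edge are copied unchanged (an assumption).
    """
    n = len(x)
    y = list(x)
    for t in range(order, n - order):
        past_outputs = sum(y[t - order:t])        # autoregressive part
        future_inputs = sum(x[t:t + order + 1])   # moving-average part
        y[t] = (past_outputs + future_inputs) / (2 * order + 1)
    return y
```

Because each output reuses earlier outputs, a single large log energy value is spread over neighbouring frames instead of standing out as a spike.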

Cepstral Distance and Log-Energy Based Silence Feature Normalization for Robust Speech Recognition (강인한 음성인식을 위한 켑스트럼 거리와 로그 에너지 기반 묵음 특징 정규화)

Shen, Guang-Hu;Chung, Hyun-Yeol
- The Journal of the Acoustical Society of Korea / v.29 no.4 / pp.278-285 / 2010
The mismatch between training and test environments is one of the major causes of performance degradation in noisy speech recognition, and many silence feature normalization methods have been proposed to address it. Conventional silence feature normalization classifies well at high SNR, but its performance degrades at low SNR because speech/silence classification becomes inaccurate. Cepstral distance, on the other hand, represents the characteristic distribution of speech and silence (or noise) well at low SNR. In this paper, we propose a Cepstral distance and Log-energy based Silence Feature Normalization (CLSFN) method, which uses both log energy and cepstral Euclidean distance to classify speech and silence. Because the proposed method combines the merit of log energy, which is less affected by noise at high SNR, with the merit of cepstral distance, which discriminates speech from silence accurately at low SNR, the classification accuracy is expected to improve. Experimental results showed that the proposed CLSFN improved recognition performance compared with the conventional SFN-I/II and CSFN methods in all noisy environments tested.
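The idea of combining the two cues can be illustrated with the short sketch below. The abstract does not state the actual decision rule, so the "either cue fires" rule, the noise reference cepstrum, both thresholds, and the name `classify_frames` are hypothetical and should not be read as the CLSFN algorithm itself.

```python
import math

def classify_frames(log_energies, cepstra, noise_cepstrum,
                    energy_threshold, distance_threshold):
    """Label each frame True (speech) or False (silence).

    Hypothetical rule: a frame counts as speech when its log energy
    exceeds `energy_threshold` OR its cepstral vector lies farther than
    `distance_threshold` from a reference noise cepstrum.
    """
    labels = []
    for log_e, cep in zip(log_energies, cepstra):
        dist = math.dist(cep, noise_cepstrum)  # Euclidean cepstral distance
        labels.append(log_e > energy_threshold or dist > distance_threshold)
    return labels
```

For example, a low-energy frame whose cepstrum is far from the noise reference is still labeled speech, which is the kind of case the cepstral cue is meant to catch at low SNR.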

Performance Improvements for Silence Feature Normalization Method by Using Filter Bank Energy Subtraction (필터 뱅크 에너지 차감을 이용한 묵음 특징 정규화 방법의 성능 향상)

Shen, Guanghu;Choi, Sook-Nam;Chung, Hyun-Yeol
- The Journal of Korean Institute of Communications and Information Sciences / v.35 no.7C / pp.604-610 / 2010
In this paper, we propose the FSFN (Filter bank sub-band energy subtraction based CLSFN) method to improve the recognition performance of the existing CLSFN (Cepstral distance and Log-energy based Silence Feature Normalization). The proposed FSFN reduces the energy of noise components in the filter bank sub-band domain when extracting features from speech data. This yields enhanced cepstral features, which in turn improve the accuracy of speech/silence classification. The method is therefore expected to outperform the existing CLSFN. Experimental results on the Aurora 2.0 DB showed that the proposed FSFN method improves the average word accuracy by 2% compared with the conventional CLSFN method, and that FSFN combined with CMVN (Cepstral Mean and Variance Normalization) achieved the best recognition performance among the methods compared.

Identification of Tetrachloroethylene Sorption Behaviors in Natural Sorbents Via Sorption Models

Al Masud, Md Abdullah;Choi, Jiyeon;Shin, Won Sik
- Journal of Soil and Groundwater Environment / v.27 no.6 / pp.47-57 / 2022
A number of different methods have been used for modeling the sorption of volatile organic chlorinated compounds such as tetrachloroethylene/perchloroethylene (PCE). In this study, PCE was sorbed onto several natural sorbents, i.e., Pahokee peat, vermicompost, BionSoil®, and natural soil, in batch experiments. Several sorption models, namely the linear, Freundlich, solubility-normalized Freundlich, and Polanyi-Manes (PMM) models, were used to analyze the sorption isotherms. The relationship between sorption model parameters, organic carbon content (f_oc), and elemental C/N ratio was studied. The organic carbon normalized partition coefficient values (log K_oc = 1.50-3.13) in the four sorbents were less than the logarithm of the octanol-water partition coefficient (log K_ow = 3.40) of PCE due to high organic carbon contents. The log K_oc decreased linearly with log f_oc and log C/N ratio, but increased linearly with log O/C, log H/C, and log (N+O)/C ratio. Both log K_F,oc or log K_F,oc decreased linearly with log f_oc (R² = 0.88-0.92) and log C/N ratio (R² = 0.57-0.76), but increased linearly with log (N+O)/C (R² = 0.93-0.95). The log q_max,oc decreased linearly as log f_oc and log C/N increased, whereas it increased with log O/C, log H/C, and log (N+O)/C ratios. The log q_max,oc increased linearly with (N+O)/C, indicating a strong dependence of q_max,oc on the polarity index. The results showed that PCE sorption behaviors were strongly correlated with the physicochemical properties of soil organic matter (SOM).
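Two of the quantities mentioned above can be illustrated with a short sketch: fitting the Freundlich isotherm q = K_F · C^n through its log-linear form, and normalizing a linear partition coefficient K_d by organic carbon content (K_oc = K_d / f_oc). The function names and the ordinary least-squares fit are assumptions for illustration; the abstract does not describe the fitting procedure actually used.

```python
import math

def fit_freundlich(concentrations, sorbed):
    """Fit log10 q = log10 K_F + n * log10 C by ordinary least squares.

    Returns (K_F, n). Inputs must be positive and hold at least two
    distinct concentrations.
    """
    xs = [math.log10(c) for c in concentrations]
    ys = [math.log10(q) for q in sorbed]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    n = sxy / sxx                      # Freundlich exponent (slope)
    log_kf = mean_y - n * mean_x       # intercept
    return 10 ** log_kf, n

def log_koc(kd, f_oc):
    """Return log10 of the organic carbon normalized coefficient K_d / f_oc."""
    return math.log10(kd / f_oc)
```

Since K_oc divides by f_oc, a sorbent with a high organic carbon fraction gets a lower log K_oc for the same K_d.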
https://doi.org/10.7857/JSGE.2022.27.6.047

An Isolated Word Recognition Using the Mellin Transform (Mellin 변환을 이용한 격리 단어 인식)

김진만;이상욱;고세문
- Journal of the Korean Institute of Telematics and Electronics / v.24 no.5 / pp.905-913 / 1987
This paper presents a speaker-dependent isolated digit recognition algorithm using the Mellin transform. Because the Mellin transform converts scale information into phase information, attempts have been made to exploit this scale-invariance property to alleviate the time-normalization procedure required for speech recognition. Good results were obtained by applying the Mellin transform to features such as the ZCR, log energy, normalized autocorrelation coefficients, the first predictor coefficient, and the normalized prediction error. A difference function was employed to evaluate the similarity between two patterns. When the proposed algorithm was tested on Korean digit words, a recognition rate of 83.3% was obtained. This recognition accuracy is not comparable to that of other techniques such as LPC distance; however, it is believed that the Mellin transform can effectively perform the time normalization required for speech recognition.
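The scale-to-phase property mentioned above follows from the standard definition of the Mellin transform. The short derivation below uses generic symbols (f, g, a, s = c + jω) that are not taken from the paper.

```latex
\mathcal{M}_f(s) = \int_0^\infty f(t)\, t^{s-1}\, dt,
\qquad g(t) = f(at),\quad a > 0.

% Substitute u = at:
\mathcal{M}_g(s) = \int_0^\infty f(at)\, t^{s-1}\, dt
                 = a^{-s} \int_0^\infty f(u)\, u^{s-1}\, du
                 = a^{-s}\, \mathcal{M}_f(s).

% With s = c + j\omega:
a^{-s} = a^{-c}\, e^{-j\omega \ln a}
\quad\Longrightarrow\quad
\lvert \mathcal{M}_g(s) \rvert = a^{-c}\, \lvert \mathcal{M}_f(s) \rvert .
```

A time-scale change by a factor a therefore appears only as a phase term and a constant gain a^{-c}, which is why the transform magnitude can stand in for explicit time normalization.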

Speech Recognition in Noisy Environments Using Modified Gain Function (변형된 이득함수를 이용한 잡음 환경에서의 음성인식)

Jin, Ho-Sung;Lee, Sang-Ho;Hong, Jae-Keun
- Proceedings of the KAIS Fall Conference / 2010.05a / pp.119-123 / 2010
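In this paper, we improve speech enhancement over the existing method by adjusting the gain function of the harmonic regeneration noise reduction method, which uses the gain function of the two-step noise reduction method, and we apply the speech enhanced by the proposed method to speech recognition. With the existing method, in the transition from a silence segment to a speech segment, the spectral gain function is computed from the estimated speech signal of the previous frame, so distortion occurs at speech onsets. To mitigate this, we propose a speech enhancement method that adjusts the gain function by comparing the gain function of the two-step noise reduction method with the estimated a priori SNR, and by comparing the gain function of the two-step noise reduction method with that of the harmonic regeneration method. For feature vector extraction in speech recognition, we apply a log energy normalization method (Log Energy Normalization) that normalizes the log energy of the speech enhanced by the proposed method.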


