Search | Korea Science

Cepstral Distance and Log-Energy Based Silence Feature Normalization for Robust Speech Recognition (강인한 음성인식을 위한 켑스트럼 거리와 로그 에너지 기반 묵음 특징 정규화)

Shen, Guang-Hu;Chung, Hyun-Yeol
- The Journal of the Acoustical Society of Korea / v.29 no.4 / pp.278-285 / 2010
A mismatch between training and test environments is one of the major causes of performance degradation in noisy speech recognition, and many silence feature normalization methods have been proposed to reduce this mismatch. The conventional silence feature normalization method classifies well at high SNR, but its performance degrades at low SNR because speech/silence classification becomes inaccurate. Cepstral distance, on the other hand, represents the characteristic distribution of speech and silence (or noise) well at low SNR. In this paper, we propose a Cepstral distance and Log-energy based Silence Feature Normalization (CLSFN) method, which uses both log-energy and cepstral Euclidean distance to classify speech and silence. Because the proposed method combines the merit of log-energy, which is less affected by noise at high SNR, with the merit of cepstral distance, which discriminates speech from silence accurately at low SNR, classification accuracy is expected to improve. Experimental results showed that the proposed CLSFN improved recognition performance over the conventional SFN-I/II and CSFN methods in all kinds of noisy environments.
https://doi.org/10.7776/ASK.2010.29.4.278
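
The abstract above names two cues, log-energy and cepstral Euclidean distance, but does not give the decision rule. The following is only a minimal sketch of how the two cues might be combined for frame-level speech/silence labeling. The noise reference (mean of the leading frames), the thresholds, and the "either cue" rule are all assumptions made for illustration and are not taken from the paper; the normalization step itself is not shown.

```python
import math

def cepstral_distance(c, ref):
    """Euclidean distance between a frame's cepstral vector and a reference."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c, ref)))

def classify_frames(cepstra, log_energy, n_noise=2, dist_thresh=1.0, energy_margin=2.0):
    """Label each frame True (speech) or False (silence).

    Assumptions (not from the paper): the first n_noise frames are noise-only,
    and a frame counts as speech if EITHER cue exceeds its threshold.
    """
    dim = len(cepstra[0])
    # Noise reference: mean cepstrum and mean log-energy of the leading frames.
    ref = [sum(f[k] for f in cepstra[:n_noise]) / n_noise for k in range(dim)]
    e_ref = sum(log_energy[:n_noise]) / n_noise
    labels = []
    for c, e in zip(cepstra, log_energy):
        by_distance = cepstral_distance(c, ref) > dist_thresh
        by_energy = e - e_ref > energy_margin
        labels.append(by_distance or by_energy)
    return labels

# Toy data: four frames of 2-dimensional cepstra with their log-energies.
cepstra = [[0.0, 0.1], [0.1, 0.0], [2.0, 1.5], [0.05, 0.05]]
log_energy = [1.0, 1.2, 5.0, 1.1]
labels = classify_frames(cepstra, log_energy)
```

In this toy input only the third frame is far from the noise reference in both cues, so it is the only frame labeled as speech.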

Harmonics-based Spectral Subtraction and Feature Vector Normalization for Robust Speech Recognition

Beh, Joung-Hoon;Lee, Heung-Kyu;Kwon, Oh-Il;Ko, Han-Seok
- Speech Sciences / v.11 no.1 / pp.7-20 / 2004
In this paper, we propose a two-step noise compensation algorithm in feature extraction for achieving robust speech recognition. The proposed method frees us from requiring a priori information on noisy environments and is simple to implement. First, in frequency domain, the Harmonics-based Spectral Subtraction (HSS) is applied so that it reduces the additive background noise and makes the shape of harmonics in speech spectrum more pronounced. We then apply a judiciously weighted variance Feature Vector Normalization (FVN) to compensate for both the channel distortion and additive noise. The weighted variance FVN compensates for the variance mismatch in both the speech and the non-speech regions respectively. Representative performance evaluation using Aurora 2 database shows that the proposed method yields 27.18% relative improvement in accuracy under a multi-noise training task and 57.94% relative improvement under a clean training task.
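
For orientation, the sketch below shows plain magnitude spectral subtraction with a spectral floor. It is not the Harmonics-based Spectral Subtraction (HSS) described in the abstract, whose harmonic weighting is not specified here. The parameter names `alpha` (over-subtraction factor) and `beta` (floor) and their default values are illustrative assumptions.

```python
def spectral_subtraction(noisy_mag, noise_mag, alpha=1.0, beta=0.01):
    """Subtract a noise magnitude estimate from one frame's magnitude spectrum.

    alpha (over-subtraction factor) and beta (spectral floor) are illustrative
    values, not settings from the paper.
    """
    cleaned = []
    for y, n in zip(noisy_mag, noise_mag):
        diff = y - alpha * n
        # The floor keeps each bin positive when the noise estimate is too large.
        cleaned.append(max(diff, beta * y))
    return cleaned

# Toy magnitude spectra for a single frame with three frequency bins.
noisy = [1.0, 0.5, 2.0]
noise = [0.4, 0.6, 0.5]
cleaned = spectral_subtraction(noisy, noise)
```

The second bin shows why the floor matters: the noise estimate exceeds the observed magnitude there, so the bin is clamped to `beta * y` instead of going negative.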

An Improved Normalization Method for Haar-like Features for Real-time Object Detection (실시간 객체 검출을 위한 개선된 Haar-like Feature 정규화 방법)

Park, Ki-Yeong;Hwang, Sun-Young
- The Journal of Korean Institute of Communications and Information Sciences / v.36 no.8C / pp.505-515 / 2011
This paper describes a normalization method for Haar-like features used in object detection. The previous method, which performs variance normalization on Haar-like features, requires many calculations because it uses an additional integral image to compute the standard deviation of pixel intensities in a candidate window, and it increases the possibility of false detection in areas where the variance of brightness is small. The proposed normalization method runs much faster than the previous method because it does not use an additional integral image, and classifiers trained with the proposed normalization method show robust performance under various lighting conditions. Experimental results show that an object detector using the proposed method is 26% faster than one using the previous method. The detection rate also improves by 5% without an increase in the false alarm rate, and by 45% for samples whose brightness varies significantly.
https://doi.org/10.7840/KICS.2011.36C.8.505
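
To make the cost of the previous method concrete, the sketch below computes the standard deviation of a candidate window from two integral images, one over intensities and one over squared intensities. This illustrates the conventional variance normalization that the abstract describes as expensive, not the proposed method, whose details the abstract does not give. Function names and the toy image are hypothetical.

```python
import math

def integral_image(img, power=1):
    """Summed-area table with a zero row/column of padding."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x] ** power
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def window_sum(ii, top, left, height, width):
    """Sum over a rectangle using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

def window_std(ii, ii_sq, top, left, height, width):
    """Standard deviation of pixel intensities in a window."""
    n = height * width
    mean = window_sum(ii, top, left, height, width) / n
    mean_sq = window_sum(ii_sq, top, left, height, width) / n
    return math.sqrt(max(mean_sq - mean * mean, 0.0))

img = [[1, 2], [3, 4]]
ii = integral_image(img)
ii_sq = integral_image(img, power=2)  # the additional integral image
std = window_std(ii, ii_sq, 0, 0, 2, 2)
```

In conventional variance normalization a Haar-like feature value is divided by this window standard deviation, which is also why windows with very small brightness variance can amplify noise.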

Feature-Vector Normalization for SVM-based Music Genre Classification (SVM에 기반한 음악 장르 분류를 위한 특징벡터 정규화 방법)

Lim, Shin-Cheol;Jang, Sei-Jin;Lee, Seok-Pil;Kim, Moo-Young
- Journal of the Institute of Electronics Engineers of Korea SC / v.48 no.5 / pp.31-36 / 2011
In this paper, Mel-Frequency Cepstral Coefficient (MFCC), Decorrelated Filter Bank (DFB), Octave-based Spectral Contrast (OSC), Zero-Crossing Rate (ZCR), and Spectral Contract/Roll-Off are combined as a set of multiple feature vectors for a music genre classification system based on the Support Vector Machine (SVM) classifier. In the conventional system, feature vectors are normalized over all genre classes for SVM model training and classification. In this paper, by contrast, only the feature vectors of the classes being compared by each One-Against-One (OAO) SVM classifier are used for normalization. Using OSC as a single feature vector and using the multiple feature vectors, we obtain genre classification rates of 60.8% and 77.4%, respectively, with the conventional normalization method. With the proposed normalization method, the classification rates increase by 8.2% and 3.3% for OSC and the multiple feature vectors, respectively.
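
The idea of normalizing with only the two classes that a one-against-one classifier compares can be sketched as follows. The abstract does not state which normalization is used, so the z-score (mean and standard deviation) here is an assumption, and the class labels and feature values are made-up toy data.

```python
import math
from itertools import combinations

def zscore_stats(vectors):
    """Per-dimension mean and standard deviation of a list of feature vectors."""
    n, dim = len(vectors), len(vectors[0])
    means = [sum(v[k] for v in vectors) / n for k in range(dim)]
    stds = []
    for k in range(dim):
        var = sum((v[k] - means[k]) ** 2 for v in vectors) / n
        stds.append(math.sqrt(var) or 1.0)  # guard against zero variance
    return means, stds

def pairwise_stats(data):
    """Normalization statistics for every class pair of a one-against-one setup.

    data maps a class label to its feature vectors. Each pair gets statistics
    computed only from the two classes that its binary classifier compares.
    """
    return {
        (a, b): zscore_stats(data[a] + data[b])
        for a, b in combinations(sorted(data), 2)
    }

def normalize(vector, stats):
    """Apply z-score normalization with the given (means, stds) pair."""
    means, stds = stats
    return [(x - m) / s for x, m, s in zip(vector, means, stds)]

# Toy data: the "pop" class has a very different range in dimension 0.
data = {
    "jazz": [[1.0, 10.0], [3.0, 10.0]],
    "rock": [[5.0, 30.0], [7.0, 30.0]],
    "pop": [[100.0, 20.0], [102.0, 20.0]],
}
stats = pairwise_stats(data)
jazz_vs_rock = normalize([1.0, 10.0], stats[("jazz", "rock")])
```

With global statistics, the outlying "pop" values would compress the jazz and rock vectors into a narrow range; the pairwise statistics for ("jazz", "rock") are unaffected by them.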

Representative Batch Normalization for Scene Text Recognition

Sun, Yajie;Cao, Xiaoling;Sun, Yingying
- KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.7 / pp.2390-2406 / 2022
Scene text recognition has important application value and has attracted the interest of many researchers. At present, many methods achieve good results, but most existing approaches attempt to improve the performance of scene text recognition at the image level. They work well on regular scene text. However, there are still many obstacles to recognizing text in low-quality images, such as those with curved, occluded, or blurred text. Uneven image quality makes feature extraction more difficult. In addition, model test results depend heavily on the training data, so there is still room for improvement in scene text recognition methods. In this work, we present a natural scene text recognizer that improves recognition performance at the feature level, covering both feature representation and feature enhancement. For feature representation, we propose an efficient feature extractor that combines Representative Batch Normalization with ResNet. It reduces the dependence of the model on training data and improves the feature representation ability for different instances. For feature enhancement, we use a feature enhancement network to expand the receptive field of the feature maps, so that the feature maps contain rich feature information. The enhanced feature representation capability helps to improve the recognition performance of the model. We conducted experiments on 7 benchmarks, which show that this method is highly competitive in recognizing both regular and irregular text. The method achieved top-1 recognition accuracy on four benchmarks: IC03, IC13, IC15, and SVTP.
https://doi.org/10.3837/tiis.2022.07.015

On-Line Blind Channel Normalization for Noise-Robust Speech Recognition

Jung, Ho-Young
- IEIE Transactions on Smart Processing and Computing / v.1 no.3 / pp.143-151 / 2012
A new data-driven method is proposed for the design of a blind modulation frequency filter that suppresses slowly varying noise components. The proposed method is based on the temporal local decorrelation of the feature vector sequence and is applied on an utterance-by-utterance basis. Although conventional modulation frequency filtering approaches use the same form regardless of the task and environment conditions, the proposed method can provide an adaptive modulation frequency filter that outperforms conventional methods for each utterance. In addition, the method ultimately performs channel normalization in the feature domain, with applications to log-spectral parameters. The performance was evaluated in speaker-independent isolated-word recognition experiments under additive noise environments. The proposed method achieved outstanding improvement for speech recognition in environments with significant noise and was also effective across a range of feature representations.

Line feature extraction in a noisy image

Lee, Joon-Woong;Oh, Hak-Seo;Kweon, In-So
- Institute of Control, Robotics and Systems: Conference Proceedings (제어로봇시스템학회:학술대회논문집) / 1996.10a / pp.137-140 / 1996
Finding line segments in an intensity image has been one of the most fundamental issues in computer vision. In complex scenes, it is hard to detect the locations of point features. Line features are more robust and provide greater positional accuracy. In this paper we present a robust line feature extraction algorithm that extracts line features in a single pass without using any assumptions or constraints. Our algorithm consists of five steps: (1) edge scanning, (2) edge normalization, (3) line-blob extraction, (4) line-feature computation, and (5) line linking. Edge scanning drastically reduces the computational complexity caused by too many edge pixels. Edge normalization reduces the local quantization error induced by the gradient space partitioning and minimizes perturbations in edge orientation. We also analyze the effects of edge processing, and of the least squares-based method and the principal axis-based method, on the computation of line orientation. We show the algorithm's efficiency with some real images.

Energy Feature Normalization for Robust Speech Recognition in Noisy Environments

Lee, Yoon-Jae;Ko, Han-Seok
- Speech Sciences / v.13 no.1 / pp.129-139 / 2006
In this paper, we propose two effective energy feature normalization methods for robust speech recognition in noisy environments. In the first method, we estimate the noise energy and remove it from the noisy speech energy. In the second method, we propose a modified algorithm for the Log-energy Dynamic Range Normalization (ERN) method. In the ERN method, the log energy of the training data in a clean environment is transformed into the log energy in noisy environments. If the minimum log energy of the test data is outside of a pre-defined range, the log energy of the test data is also transformed. Since the ERN method has several weaknesses, we propose a modified transform scheme designed to reduce the residual mismatch that it produces. In the evaluation conducted on the Aurora2.0 database, we obtained a significant performance improvement.

Applying feature normalization based on pole filtering to short-utterance speech recognition using deep neural network (심층신경망을 이용한 짧은 발화 음성인식에서 극점 필터링 기반의 특징 정규화 적용)

Han, Jaemin;Kim, Min Sik;Kim, Hyung Soon
- The Journal of the Acoustical Society of Korea / v.39 no.1 / pp.64-68 / 2020
In a conventional speech recognition system using Gaussian Mixture Model-Hidden Markov Model (GMM-HMM), the cepstral feature normalization method based on pole filtering was effective in improving the performance of recognition of short utterances in noisy environments. In this paper, the usefulness of this method for the state-of-the-art speech recognition system using Deep Neural Network (DNN) is examined. Experimental results on AURORA 2 DB show that the cepstral mean and variance normalization based on pole filtering improves the recognition performance of very short utterances compared to that without pole filtering, especially when there is a large mismatch between the training and test conditions.
https://doi.org/10.7776/ASK.2020.39.1.064
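
As background for the abstract above, the sketch below shows standard per-utterance cepstral mean and variance normalization (CMVN). The pole filtering step that the paper builds on is not shown, and the small `eps` constant and the toy feature values are assumptions added for illustration.

```python
import math

def cmvn(frames, eps=1e-8):
    """Per-utterance cepstral mean and variance normalization.

    frames is a list of cepstral vectors (one per frame). Each dimension is
    shifted to zero mean and scaled to (approximately) unit variance.
    """
    n, dim = len(frames), len(frames[0])
    means = [sum(f[k] for f in frames) / n for k in range(dim)]
    stds = [
        math.sqrt(sum((f[k] - means[k]) ** 2 for f in frames) / n)
        for k in range(dim)
    ]
    # eps avoids division by zero for a constant dimension.
    return [
        [(f[k] - means[k]) / (stds[k] + eps) for k in range(dim)]
        for f in frames
    ]

# Toy utterance: three frames of 2-dimensional cepstral features.
utterance = [[1.0, 4.0], [3.0, 8.0], [5.0, 6.0]]
normalized = cmvn(utterance)
```

Because the mean and variance are estimated from the utterance itself, very short utterances yield unreliable statistics, which is the setting the abstract addresses.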

Vocal Tract Normalization Using The Power Spectrum Warping (파워 스펙트럼 warping을 이용한 성도 정규화)

Yu, Il-Su;Kim, Dong-Ju;No, Yong-Wan;Hong, Gwang-Seok
- Proceedings of the KIEE Conference / 2003.11b / pp.215-218 / 2003
Vocal tract normalization is known as a successful method for improving the accuracy of speech recognition. A frequency warping procedure based on low complexity and maximum likelihood has generally been applied for vocal tract normalization. In this paper, we propose a new power spectrum warping procedure that improves vocal tract normalization performance over the frequency warping procedure. The method can be implemented simply by modifying the power spectrum of the filter bank in Mel-frequency cepstrum feature (MFCC) analysis. An experimental study compared the proposed method with the well-known frequency warping method. The results show that power spectrum warping gives about 50% better recognition performance than frequency warping.
