Noise Robust Automatic Speech Recognition Scheme with Histogram of Oriented Gradient Features

Park, Taejin;Beack, SeungKwan;Lee, Taejin;

doi:10.5573/IEIESPC.2014.3.5.259

IEIE Transactions on Smart Processing and Computing

Vol. 3, No. 5 / pp. 259-266 / 2014 / eISSN 2287-5255

The Institute of Electronics and Information Engineers

Park, Taejin (Audio Research Laboratory, Electronics and Telecommunications Research Institute) ;
Beack, SeungKwan (Audio Research Laboratory, Electronics and Telecommunications Research Institute) ;
Lee, Taejin (Audio Research Laboratory, Electronics and Telecommunications Research Institute)

Received: 2014.01.15
Reviewed: 2014.07.28
Published: 2014.10.31

https://doi.org/10.5573/IEIESPC.2014.3.5.259

Abstract

In this paper, we propose a novel technique for noise-robust automatic speech recognition (ASR). Advances in ASR have made it possible to recognize isolated words with near-perfect word recognition accuracy (WRA). In a highly noisy environment, however, the mismatch between the training speech and the test data significantly degrades the WRA. Unlike conventional ASR systems, which employ Mel-frequency cepstral coefficients (MFCCs) and a hidden Markov model (HMM), this study applies histogram of oriented gradient (HOG) features and a support vector machine (SVM) to ASR tasks to overcome this problem. The proposed ASR system is less vulnerable to external interference noise and achieves a higher WRA than a conventional ASR system equipped with MFCCs and an HMM. The performance of the proposed system was evaluated using a phonetically balanced word (PBW) set mixed with artificially added noise.
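As a rough illustration of the kind of front end the abstract names, the sketch below computes HOG-style features from a small time-frequency matrix treated as an image. It is not the authors' implementation: the abstract does not give the cell size, the number of orientation bins, or the normalization scheme, so the function name `hog_features`, the 4x4 cells, the 8 unsigned orientation bins, and the per-cell L2 normalization are all assumptions chosen for readability. The SVM classifier stage is omitted.

```python
import math


def hog_features(image, cell=4, bins=8):
    """HOG-style descriptor of a 2-D array given as a list of rows.

    Assumed settings (not taken from the paper): non-overlapping
    cell x cell regions, `bins` unsigned orientation bins over [0, pi),
    and L2 normalization of each cell histogram.
    """
    rows, cols = len(image), len(image[0])
    feats = []
    for r0 in range(0, rows - cell + 1, cell):
        for c0 in range(0, cols - cell + 1, cell):
            hist = [0.0] * bins
            for r in range(r0, r0 + cell):
                for c in range(c0, c0 + cell):
                    # Differences between neighbouring samples, clamped at the borders.
                    gy = image[min(r + 1, rows - 1)][c] - image[max(r - 1, 0)][c]
                    gx = image[r][min(c + 1, cols - 1)] - image[r][max(c - 1, 0)]
                    magnitude = math.hypot(gx, gy)
                    # Fold the gradient direction into [0, pi): unsigned orientation.
                    angle = math.atan2(gy, gx) % math.pi
                    b = min(int(angle / math.pi * bins), bins - 1)
                    # Each sample votes for its orientation bin, weighted by magnitude.
                    hist[b] += magnitude
            norm = math.sqrt(sum(h * h for h in hist)) + 1e-12
            feats.extend(h / norm for h in hist)
    return feats


# Toy 8x8 "spectrograms": energy rising along one axis only.
ramp_along_columns = [[float(c) for c in range(8)] for _ in range(8)]
ramp_along_rows = [[float(r) for _ in range(8)] for r in range(8)]

features_a = hog_features(ramp_along_columns)
features_b = hog_features(ramp_along_rows)
```

With these settings an 8x8 input yields four cells of eight bins each. The two toy inputs concentrate their gradient energy in different orientation bins, which is the property a classifier such as an SVM would then exploit.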
