A Study on Numeral Speech Recognition Using Integration of Speech and Visual Parameters under Noisy Environments

Lee, Sang-Won; Park, In-Jung

Journal of the Institute of Electronics Engineers of Korea CI (전자공학회논문지CI)

Volume 38 Issue 3 / Pages 61-67 / 2001 / 1229-6376 (pISSN)

The Institute of Electronics and Information Engineers (대한전자공학회)

잡음환경에서 음성-영상 정보의 통합 처리를 사용한 숫자음 인식에 관한 연구

Lee, Sang-Won (Dept. of Electronic Engineering, Dankook Univ.) ;
Park, In-Jung (Dept. of Electronic Engineering, Dankook Univ.)

이상원 (Dept. of Electronic Engineering, Dankook Univ.) ;
박인정 (Dept. of Electronic Engineering, Dankook Univ.)

Published : 2001.05.25


Abstract

This paper proposes applying the linear prediction (LP) algorithm used for speech to images, so that both speech and image information can be used to recognize Korean numeral speech. The input speech signal is pre-emphasized with a coefficient of 0.95 and analyzed for 13th-order LP coefficients using a Hamming window, autocorrelation analysis, and the Levinson-Durbin algorithm. Likewise, the gray-level image signal is analyzed for 2-dimensional LP coefficients using autocorrelation and the Levinson-Durbin algorithm, as for speech. These parameters are used as inputs to a neural network trained with the back-propagation algorithm. Recognition experiments were carried out at each noise level, and recognition of three numerals, '3', '5', and '9', improved. Thus, recognizing speech with the 2-dimensional LP parameters yields a high recognition rate and a small parameter size, and keeps the algorithm simple because no additional feature extraction algorithm is needed.

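This paper uses both speech and image information to recognize Korean numeral speech, and proposes applying to images the linear prediction coefficient algorithm that is used for speech. The input speech signal is pre-emphasized with a parameter of 0.95 to boost its high-frequency components, and 13th-order linear prediction coefficients are obtained using a Hamming window, autocorrelation analysis, and the Levinson-Durbin algorithm. Likewise, 13th-order 2-dimensional linear prediction coefficients are obtained for the gray image signal using the same autocorrelation analysis and Levinson-Durbin algorithm as for speech. These linear prediction coefficients of the speech and image signals were applied to a multilayer neural network for training. When speech signals mixed with noise at each level were applied, the numerals '3', '5', and '9' were recognized far better than with speech alone. Consequently, when the 2-dimensional linear prediction coefficients of the image signal are used for speech recognition, no additional feature extraction algorithm needs to be devised, because the method for extracting speech feature coefficients can be used as is; the study also shows that this is an efficient method that improves data size and recognition rate in noisy environments.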
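The speech front end named in the abstract (pre-emphasis with 0.95, Hamming window, autocorrelation, Levinson-Durbin, order 13) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the use of NumPy, the single-frame processing, and the sign convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p are all assumptions, and the 2-dimensional image case is not covered.

```python
import numpy as np

def pre_emphasize(signal, alpha=0.95):
    """Boost high frequencies: y[n] = x[n] - alpha * x[n-1] (first sample kept)."""
    signal = np.asarray(signal, dtype=float)
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one frame (unnormalized)."""
    n = len(frame)
    return np.array([np.dot(frame[:n - k], frame[k:]) for k in range(max_lag + 1)])

def levinson_durbin(r, order):
    """Solve the LP normal equations recursively.

    Returns (a, error) where a[0] == 1 and the prediction error filter is
    e[n] = x[n] + a[1] x[n-1] + ... + a[order] x[n-order].
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    error = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / error
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        error *= 1.0 - k * k
    return a, error

def lp_coefficients(frame, order=13, alpha=0.95):
    """Pre-emphasize, window, and return the `order` LP coefficients of one frame."""
    emphasized = pre_emphasize(frame, alpha)
    windowed = emphasized * np.hamming(len(emphasized))
    r = autocorrelation(windowed, order)
    a, _ = levinson_durbin(r, order)
    return a[1:]

if __name__ == "__main__":
    # Hypothetical input: one 240-sample frame of synthetic noise.
    rng = np.random.default_rng(0)
    frame = rng.standard_normal(240)
    print(lp_coefficients(frame).shape)
```

In a recognizer of the kind described, a vector like the one returned by `lp_coefficients` would be computed for each speech frame and passed to the neural network as input; the frame length and any framing or overlap scheme are not given in the abstract and are left open here.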

