A New Endpoint Detection Method Based on Chaotic System Features for Digital Isolated Word Recognition System

An Endpoint Detection Technique Based on Chaotic System Features for Speech Recognition

  • Zang, Xian (Control and Instrumentation Department, Chonbuk National University) ;
  • Chong, Kil-To (Electronics and Information Department, Chonbuk National University)
  • 장한 (전북대학교 제어계측공학과) ;
  • 정길도 (전북대학교 전자정보공학부)
  • Published : 2009.09.25

Abstract

In speech recognition, accurately locating the endpoints of an utterance in the presence of background noise is of great importance. Noise present during recording introduces disturbances, which complicates matters because the goal is to obtain the stationary parameters corresponding to each speech section. One major cause of error in automatic recognition of isolated words is inaccurate detection of the beginning and end boundaries of the test and reference templates; hence the need for an effective method of removing the unnecessary regions of a speech signal. Conventional methods for speech endpoint detection are based on two linear time-domain measurements: the short-time energy and the short-time zero-crossing rate. They perform well for clean speech, but their precision is not guaranteed when noise is present, since the high energy and zero-crossing rate of the noise are mistaken for part of the uttered speech. This paper proposes a novel approach that finds a clear threshold between noise and speech based on Lyapunov exponents (LEs). The proposed method uses nonlinear features to analyze the chaotic characteristics of the speech signal instead of depending on energy, an unreliable factor. Its advantage over the conventional methods lies in detecting the endpoints from the nonlinearity of the speech signal, which we believe is an important characteristic that the conventional methods have neglected. The proposed method extracts its features from the time-domain waveform of the speech signal alone, which keeps its complexity low. Simulations showed that the proposed method performs effectively in a noisy environment, with an average recognition rate of up to 92.85% for unspecified speakers.

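In speech recognition research, finding the start and end points of an utterance in the presence of noise is very important. Errors in conventional speech recognition systems mostly occur when the start and end points of the reference template cannot be found automatically because of disturbances or noise. A method is therefore needed that can remove the unnecessary parts of a speech signal. Conventional methods for finding speech endpoints include time-domain measurement, short-time energy analysis, and the zero-crossing rate method. These methods cannot guarantee precision under the influence of low-frequency signal noise. This paper therefore proposes an endpoint detection algorithm that uses Lyapunov exponents in the time domain. Comparison with conventional methods shows the superior performance of the proposed method, and simulations and experiments show that speech endpoint detection is possible even in noisy environments.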
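
For orientation, the two conventional time-domain measurements named in the abstract (short-time energy and short-time zero-crossing rate) can be sketched as below. This is a minimal illustration of the baseline only, not the Lyapunov-exponent method the paper proposes. The function names, the frame length, the hop size, and the relative energy threshold `ratio` are all assumptions chosen for the example and are not taken from the paper.

```python
def frames(signal, frame_len, hop):
    """Split a sample sequence into overlapping frames (a trailing partial frame is dropped)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Sum of squared sample values in one frame."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def energy_endpoints(signal, frame_len=200, hop=100, ratio=0.1):
    """Return (first, last) indices of frames whose energy exceeds
    ratio * (maximum frame energy), or None if no frame qualifies.

    The relative threshold is an illustrative assumption, not the paper's rule.
    """
    energies = [short_time_energy(f) for f in frames(signal, frame_len, hop)]
    if not energies:
        return None
    threshold = ratio * max(energies)
    active = [i for i, e in enumerate(energies) if e > threshold]
    return (active[0], active[-1]) if active else None
```

On a clean signal, a fixed energy threshold of this kind separates silence from speech well. As the abstract notes, high-energy noise frames can also pass such a threshold, which is the weakness the proposed method targets.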
