Endpoint Detection of Speech Signal Using Lyapunov Exponent

리아프노프 지수를 이용한 음성신호 종점 탐색 방법

  • Zang, Xian (장한; Control and Instrumentation Department, Chonbuk National University) ;
  • Kim, Jeong-Yeon (김정연; Control and Instrumentation Department, Chonbuk National University) ;
  • Chong, Kil-To (정길도; Electronics and Information Department, Chonbuk National University)
  • Published : 2009.01.25

Abstract

In speech recognition research, locating the beginning and end of a speech utterance against background noise is of great importance. Conventional methods for speech endpoint detection rely on two simple time-domain measurements, short-time energy and short-time zero-crossing rate, which cannot guarantee precise results in low signal-to-noise ratio environments. This paper proposes a novel approach that uses the Lyapunov exponent of the time-domain waveform to distinguish the start and end of speech. The proposed method does not need to compute frequency-domain parameters for the endpoint detection process, e.g. the Mel-scale features introduced in other work. Accordingly, the algorithm has low complexity and is suitable for a digital isolated word recognition system. To verify its performance, the method was applied to spoken Arabic-numeral words, and the results confirmed that it significantly increases the recognition rate.
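
For orientation, the conventional time-domain baseline named in the abstract (short-time energy and short-time zero-crossing rate) can be sketched as follows. This is a minimal illustration and not the proposed Lyapunov-exponent method. The function names, the frame length, the hop size and the simple relative energy threshold are all assumptions made for this example and do not come from the paper.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    # Row k holds samples [k*hop, k*hop + frame_len).
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def short_time_energy(frames):
    """Sum of squared samples in each frame."""
    return np.sum(frames ** 2, axis=1)


def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs in each frame whose signs differ."""
    signs = np.sign(frames)
    signs[signs == 0] = 1  # treat exact zeros as positive
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)


def detect_endpoints(x, frame_len=256, hop=128, energy_ratio=0.1):
    """Return (start, end) sample indices of the energetic region, or None.

    Assumed rule for this sketch: a frame is 'active' when its energy
    exceeds energy_ratio times the maximum frame energy.
    """
    frames = frame_signal(x, frame_len, hop)
    energy = short_time_energy(frames)
    active = np.flatnonzero(energy > energy_ratio * energy.max())
    if active.size == 0:
        return None
    start = int(active[0]) * hop
    end = int(active[-1]) * hop + frame_len
    return start, end


if __name__ == "__main__":
    # Synthetic example: a 440 Hz tone burst embedded in weak noise.
    rng = np.random.default_rng(0)
    n = np.arange(8000)
    x = 0.01 * rng.standard_normal(8000)
    x[2000:6000] += np.sin(2 * np.pi * 440 * n[2000:6000] / 8000)
    print(detect_endpoints(x))
```

A fixed relative threshold like this works only while the utterance is much louder than the background. As the noise level approaches the speech level, frame energies of noise and speech overlap, which is the low signal-to-noise weakness the abstract attributes to these measurements.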
