Implementation of Embedded Speech Recognition System for Supporting Voice Commander to Control an Audio and a Video on Telematics Terminals

Implementation of an Embedded Speech Recognition System for Audio/Video Command Processing in Telematics Terminals

  • Kwon Oh-il (Hyundai Autonet Co., Ltd.)
  • Lee Heung-kyu (MediaZen Inc., Speech Recognition/Synthesis Research Laboratory)
  • Published : 2005.11.01

Abstract

In this paper, we implement an embedded speech recognition system that supports application services such as audio and video control through a speech recognition interface in cars. The system is implemented on and ported to a DSP board, because hardware factors such as microphone type and speech codec affect recognition accuracy. We also optimize the simulation and test environment so that real in-car noise can be removed effectively. To increase recognition accuracy in the car, we apply noise suppression and feature compensation algorithms, and we use context-dependent tied-mixture acoustic modeling. The performance evaluation showed high accuracy for the proposed system both in an office environment and in a real car environment.

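This paper implements an embedded speech recognition system for handling application services such as audio and video through a speech recognition interface in the vehicle. The embedded speech recognition system is built on and ported to a DSP board, because the recognition rate is affected by hardware such as the microphone and the speech codec. We also set up an environment suited to removing in-vehicle noise efficiently and optimize the test environment accordingly. For reliable speech recognition in the vehicle, the proposed system applies noise suppression and feature compensation techniques, and it applies context-dependent tied-mixture acoustic modeling to improve speed and performance in the embedded environment. Performance was verified through the recognition rate in an ordinary laboratory environment and through tests in an actual vehicle.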
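
The abstract names tied-mixture acoustic modeling without giving details. As a rough illustration of the general idea only (not the authors' implementation), a tied-mixture model lets all HMM states share one pool of Gaussians, so each Gaussian is evaluated once per frame and every state only applies its own mixture weights. The codebook, state names, and weights below are hypothetical toy values, and the diagonal-covariance form is an assumption.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def codebook_likelihoods(x, codebook):
    """Evaluate every shared Gaussian once for the current frame."""
    return [math.exp(log_gaussian_diag(x, mean, var)) for mean, var in codebook]

def state_likelihood(weights, shared):
    """Emission likelihood of one state: weighted sum over the shared pool."""
    return sum(w * p for w, p in zip(weights, shared))

# Hypothetical shared codebook: (mean, variance) per Gaussian.
CODEBOOK = [([0.0, 0.0], [1.0, 1.0]), ([3.0, 3.0], [1.0, 1.0])]

# Hypothetical context-dependent states, each with its own mixture weights.
STATE_WEIGHTS = {"a-b+c": [0.9, 0.1], "d-b+e": [0.2, 0.8]}

def best_state_for(frame):
    """Score all states for one frame, reusing the shared Gaussian values."""
    shared = codebook_likelihoods(frame, CODEBOOK)
    scores = {s: state_likelihood(w, shared) for s, w in STATE_WEIGHTS.items()}
    return max(scores, key=scores.get)

print(best_state_for([0.1, -0.2]))
print(best_state_for([2.9, 3.1]))
```

The cost of the Gaussian evaluations grows with the codebook size rather than with the number of states, which is one plausible reason such models suit embedded targets. A practical decoder would normally keep these scores in the log domain to avoid underflow.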
