Reference Channel Input-Based Speech Enhancement for Noise-Robust Recognition in Intelligent TV Applications

Reference Noise-Based Speech Enhancement for Speech Recognition on Intelligent TVs

  • Received : 2012.07.30
  • Accepted : 2012.09.05
  • Published : 2013.02.28

Abstract

This paper proposes a noise reduction system for the speech interface in intelligent TV applications. Sound from the TV speakers is a serious noise source that degrades recognition performance. To suppress it, a noise reduction algorithm that uses the direct TV sound as a reference noise input is implemented. In the proposed algorithm, transfer functions are estimated to compensate for the difference between the direct TV sound and the TV sound recorded by the microphone installed on the TV frame. The noise power spectrum in the received signal is then calculated and used for Wiener filter-based noise cancellation. Additionally, a post-processing step is applied to reduce residual noise. Experimental results show that the proposed algorithm achieves an 88% recognition rate for isolated Korean words at an input SNR of 5 dB.

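This paper proposes a noise reduction system for the speech interface of intelligent TVs. To remove TV sound, which severely degrades speech recognition performance, a noise reduction algorithm that uses the TV sound itself as the reference noise is implemented. The proposed algorithm estimates the transfer functions between the TV speaker and the multichannel equipment. The noise power spectrum is then estimated to drive the Wiener filter. In addition, a post-processing step is applied to remove residual noise. In experiments, the proposed algorithm achieved a speech recognition rate of 88% at an input SNR of 5 dB.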
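The pipeline outlined in the abstract (transfer function estimation, noise power spectrum calculation, Wiener-type gain) can be illustrated with a minimal per-frequency-bin sketch. This is not the authors' implementation. It assumes the reference and microphone signals are already available as short-time spectra (lists of complex values per frame), that the transfer function is fitted by least squares over noise-only frames, and that the Wiener gain is approximated as 1 - N/|Y|^2 with a lower bound. The names `estimate_transfer`, `wiener_enhance`, and `gain_floor` are hypothetical, and the gain floor is only a simple guard, not the paper's post-processing step.

```python
def estimate_transfer(ref_frames, mic_frames, eps=1e-12):
    """Least-squares transfer function per frequency bin (assumed estimator).

    ref_frames, mic_frames: lists of frames; each frame is a list of complex
    spectral values. Frames are assumed to contain TV sound only (no speech).
    Returns H with H[k] = sum_t Y_t[k] * conj(R_t[k]) / sum_t |R_t[k]|^2.
    """
    n_bins = len(ref_frames[0])
    transfer = []
    for k in range(n_bins):
        num = sum(y[k] * r[k].conjugate() for r, y in zip(ref_frames, mic_frames))
        den = sum(abs(r[k]) ** 2 for r in ref_frames)
        transfer.append(num / (den + eps))
    return transfer


def wiener_enhance(ref_frame, mic_frame, transfer, gain_floor=0.1, eps=1e-12):
    """Apply a Wiener-type gain to one microphone frame.

    The noise power in each bin is predicted from the reference channel as
    |H[k] * R[k]|^2, and the gain is max(1 - noise_power / |Y[k]|^2, gain_floor).
    """
    enhanced = []
    for r, y, h in zip(ref_frame, mic_frame, transfer):
        noise_power = abs(h * r) ** 2      # predicted TV noise power at the mic
        mic_power = abs(y) ** 2            # observed power (speech + TV noise)
        gain = max(1.0 - noise_power / (mic_power + eps), gain_floor)
        enhanced.append(gain * y)
    return enhanced


# Usage with synthetic spectra (two bins, three noise-only training frames).
true_transfer = [0.5 + 0.2j, -0.3 + 0.4j]
train_ref = [[1 + 1j, 2 - 1j], [0.5 - 2j, -1 + 0.5j], [3 + 0j, 0 + 1j]]
train_mic = [[h * r for h, r in zip(true_transfer, f)] for f in train_ref]
H = estimate_transfer(train_ref, train_mic)

ref = [1 - 1j, 2 + 2j]
noise_only_mic = [h * r for h, r in zip(true_transfer, ref)]
speech_mic = [y + 100 for y in noise_only_mic]   # strong "speech" component

suppressed = wiener_enhance(ref, noise_only_mic, H)
preserved = wiener_enhance(ref, speech_mic, H)
```

In this toy setup, bins that contain only TV sound are attenuated down to the gain floor, while bins dominated by the speech component pass almost unchanged.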
