Application of Preemphasis FIR Filtering to Speech Detection and Phoneme Segmentation

  • Chang-Young Lee (이창영), Department of Industrial and Management Engineering, Dongseo University
  • Received : 2013.04.02
  • Accepted : 2013.05.20
  • Published : 2013.05.31

Abstract

In this paper, we propose a new method for speech detection and phoneme segmentation. Conventional speech detection uses the energy profile to discriminate the signal from background noise; we investigate the effect of applying preemphasis FIR filtering to the speech signal before this step. With this procedure, only the speech sections in which both energy (amplitude) and frequency are low become distinct in the energy profile. We verify experimentally that the silence/speech boundary becomes sharper with the filtering than with the conventional method. Applying the same procedure is also found to make phoneme segmentation considerably easier.
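The pipeline described in the abstract (preemphasis filtering followed by energy-based detection) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the first-order filter form y[n] = x[n] - alpha * x[n-1], the coefficient alpha = 0.97, the frame length, the hop size, and the relative threshold are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

def preemphasis(x, alpha=0.97):
    """First-order FIR preemphasis: y[n] = x[n] - alpha * x[n-1].

    alpha = 0.97 is a commonly used value, assumed here; it is not
    taken from the paper.
    """
    x = np.asarray(x, dtype=float)
    y = np.empty_like(x)
    y[0] = x[0]                      # no previous sample for n = 0
    y[1:] = x[1:] - alpha * x[:-1]
    return y

def frame_energy(x, frame_len=256, hop=128):
    """Short-time energy (sum of squares) of each frame."""
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    return np.array([
        np.sum(x[i * hop : i * hop + frame_len] ** 2)
        for i in range(n_frames)
    ])

def detect_speech(x, alpha=0.97, frame_len=256, hop=128, ratio=0.1):
    """Flag frames whose post-filter energy exceeds ratio * max energy.

    The relative threshold is a hypothetical, simplistic decision rule.
    """
    energy = frame_energy(preemphasis(x, alpha), frame_len, hop)
    return energy > ratio * energy.max()

# Synthetic example: 2048 samples of silence, then a 2 kHz tone at 8 kHz sampling.
fs = 8000
n = np.arange(2048)
signal = np.concatenate([np.zeros(2048), np.sin(2 * np.pi * 2000 * n / fs)])
flags = detect_speech(signal)
```

In this sketch, `flags` holds one boolean per frame. Comparing the energy profile with and without the `preemphasis` call is one simple way to observe how the filter changes the silence/speech boundary on real recordings.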
