Intelligibility Improvement of Low Bit-Rate Speech Coder Using Stochastic Spectral Equalizer

  • Lee, Jeong Hun (Department of Electronic Engineering, Seoul National University of Science and Technology)
  • Yun, Deokgyu (Department of Electronic Engineering, Seoul National University of Science and Technology)
  • Choi, Seung Ho (Department of Electronic and IT Media Engineering, Seoul National University of Science and Technology)
  • Received : 2016.09.26
  • Accepted : 2016.10.18
  • Published : 2016.10.31

Abstract

Low bit-rate speech coders in digital speech communications synthesize speech from the parameters of a speech production model. Because very few bits are allocated to these parameters, the spectrum of the synthesized speech can be severely distorted, which degrades speech intelligibility. In this paper, we propose an intelligibility improvement method based on a stochastic spectral equalizer. For each speech coder, the method statistically derives a weight vector from the spectral ratios between the original and synthesized speech, then applies that weight vector to the synthesized speech. Objective speech intelligibility tests show that the proposed method outperforms the conventional method.
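The abstract does not state how the weight vector is computed from the spectral ratios, so the following is only a minimal sketch under stated assumptions: magnitude spectra are already available per frame, the weight for each frequency bin is taken as the geometric mean of the original-to-synthesized magnitude ratio over the training frames, and the weights are applied as per-bin gains. The names `estimate_weights` and `equalize` are hypothetical and do not come from the paper.

```python
import math

EPS = 1e-12  # guards against division by zero and log(0)


def estimate_weights(original_frames, synthesized_frames):
    """Estimate one gain per frequency bin for a given speech coder.

    Each argument is a list of frames; each frame is a list of magnitude
    values, one per frequency bin. The gain is the geometric mean of the
    per-bin ratio original/synthesized (an assumption, not the paper's
    stated rule).
    """
    if not original_frames or len(original_frames) != len(synthesized_frames):
        raise ValueError("need the same non-zero number of frames")
    n_bins = len(original_frames[0])
    log_sum = [0.0] * n_bins
    for orig, synth in zip(original_frames, synthesized_frames):
        for k in range(n_bins):
            # Accumulate the log spectral ratio for bin k.
            log_sum[k] += math.log((orig[k] + EPS) / (synth[k] + EPS))
    n_frames = len(original_frames)
    return [math.exp(s / n_frames) for s in log_sum]


def equalize(synth_frame, weights):
    """Apply the per-bin gains to one synthesized magnitude frame."""
    return [m * w for m, w in zip(synth_frame, weights)]


# Toy data: a coder that halves the energy in the last bin.
original = [[1.0, 2.0, 4.0], [2.0, 2.0, 2.0]]
synthesized = [[1.0, 2.0, 2.0], [2.0, 2.0, 1.0]]
weights = estimate_weights(original, synthesized)
restored = equalize([3.0, 3.0, 3.0], weights)
```

In this toy case the estimated gain for the last bin is close to 2, so the equalizer boosts exactly the region the coder attenuates. Averaging in the log domain is one possible design choice; it keeps a few frames with extreme ratios from dominating the estimate.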
