Intelligibility Improvement of Low Bit-Rate Speech Coder Using Stochastic Spectral Equalizer

Lee, Jeong Hun; Yun, Deokgyu; Choi, Seung Ho

doi:10.7840/kics.2016.41.10.1183

The Journal of Korean Institute of Communications and Information Sciences (한국통신학회논문지)

Volume 41, Issue 10 / Pages 1183-1185 / 2016 / 1226-4717 (pISSN) / 2287-3880 (eISSN)

The Korean Institute of Communications and Information Sciences (한국통신학회)

Lee, Jeong Hun (Department of Electronic Engineering, Seoul National University of Science and Technology) ;
Yun, Deokgyu (Department of Electronic Engineering, Seoul National University of Science and Technology) ;
Choi, Seung Ho (Department of Electronic and IT Media Engineering, Seoul National University of Science and Technology)

Received : 2016.09.26
Accepted : 2016.10.18
Published : 2016.10.31

https://doi.org/10.7840/kics.2016.41.10.1183

Abstract

Low bit-rate speech coders in digital speech communications synthesize speech from the parameters of a vocal tract (speech production) model. Because very few bits are allocated to these parameters, the spectrum of the synthesized speech can be heavily distorted, which degrades speech intelligibility. In this paper, we propose a speech intelligibility improvement method using a stochastic spectral equalizer. For each speech coder, the method statistically estimates a weight vector from the spectral ratios between the original and the synthesized speech, then applies this weight vector to the synthesized speech. In objective speech intelligibility tests, the proposed method performed better than the conventional method.
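The idea of deriving a per-coder weight vector from spectral ratios and applying it to synthesized speech can be illustrated with a minimal sketch. The abstract does not say which statistic the authors use, so this example assumes the weight for each frequency bin is the geometric mean of the original-to-synthesized magnitude ratio over paired training frames. The function names `estimate_weights` and `equalize`, the `eps` guard, and the toy spectra are all hypothetical and are not taken from the paper.

```python
import math


def estimate_weights(original_frames, synthesized_frames, eps=1e-12):
    """Estimate one equalizer weight per frequency bin.

    Assumption (not from the paper): the weight of bin k is the geometric
    mean, over all paired frames, of |original[k]| / |synthesized[k]|.
    Each frame is a list of magnitude-spectrum values of equal length.
    """
    if not original_frames or len(original_frames) != len(synthesized_frames):
        raise ValueError("need the same non-zero number of frames in both sets")
    num_bins = len(original_frames[0])
    log_sums = [0.0] * num_bins
    for orig, synth in zip(original_frames, synthesized_frames):
        for k in range(num_bins):
            # eps keeps the ratio finite when a bin magnitude is zero
            log_sums[k] += math.log((orig[k] + eps) / (synth[k] + eps))
    num_frames = len(original_frames)
    return [math.exp(total / num_frames) for total in log_sums]


def equalize(synthesized_frame, weights):
    """Scale each bin of a synthesized magnitude spectrum by its weight."""
    return [w * m for w, m in zip(weights, synthesized_frame)]


# Toy magnitude spectra (3 bins, 2 frames) standing in for training data
# collected from one speech coder.
original = [[1.0, 2.0, 4.0], [2.0, 2.0, 8.0]]
synthesized = [[1.0, 1.0, 1.0], [2.0, 1.0, 2.0]]

weights = estimate_weights(original, synthesized)
equalized = equalize([3.0, 1.0, 1.0], weights)
```

In this toy case the coder attenuates the second and third bins by constant factors, so the estimated weights boost exactly those bins. A real system would compute the magnitude spectra from framed, windowed speech and would train a separate weight vector for each coder, as the abstract describes.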

Cited by

Recognition and influence analysis of overlapped sounds based on a wideband spectrogram and deep neural networks, vol.23, pp.3, 2016, https://doi.org/10.5909/jbe.2018.23.3.421
