Improvement of Synthetic Speech Quality using a New Spectral Smoothing Technique

Journal of KIISE:Software and Applications (한국정보과학회논문지:소프트웨어및응용)

Volume 30, Issue 11 / Pages 1037-1043 / 2003 / 1229-6848 (pISSN)

Korean Institute of Information Scientists and Engineers (한국정보과학회)

새로운 스펙트럼 완만화에 의한 합성 음질 개선

장효종 (Department of Computer Science, Soongsil University);
최형일 (School of Media, Soongsil University)

Published : 2003.12.01


Abstract

This paper describes a speech synthesis technique that uses the diphone as the unit phoneme. Speech is synthesized by concatenating unit phonemes, and the main problem with this approach is the discontinuity that arises at the junction between adjacent units. To address it, we propose a new spectral smoothing technique that reflects not only formant trajectories but also the distribution characteristics of the spectrum and the characteristics of human hearing. Specifically, the technique first determines the amount and extent of smoothing at the junction by taking human auditory characteristics into account, and then smooths the spectra on either side of the diphone boundary using weights that vary along the time axis. This reduces the discontinuity while minimizing the distortion that smoothing itself can introduce. To evaluate the technique, we ran experiments on about 500 diphones extracted from some 20 sentences, drawn from ETRI Voice DB samples and from individually self-recorded samples.
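The idea of smoothing with time-dependent weights at a diphone boundary can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not give the weighting formula, so the linear weight, the fixed `span` parameter, and the function name `smooth_boundary` are all assumptions made for illustration. In the proposed technique the amount and extent of smoothing are derived from auditory characteristics rather than fixed by hand. Each frame is assumed to be a list of spectral values, such as a magnitude envelope.

```python
def smooth_boundary(left, right, span):
    """Spread the spectral jump between two units over `span` frames per side.

    left, right: lists of spectral frames (each a list of floats).
    span: hypothetical fixed smoothing extent, in frames (assumption).
    Returns new (left, right) frame lists; the inputs are not modified.
    """
    if span < 1:
        raise ValueError("span must be at least 1")
    # Per-bin discontinuity between the last left frame and the first right frame.
    jump = [r - l for l, r in zip(left[-1], right[0])]
    out_left = [list(frame) for frame in left]
    out_right = [list(frame) for frame in right]

    # k = 0 is the frame touching the boundary; the weight falls off
    # linearly with distance in time (an assumed weighting, not the paper's).
    for k in range(min(span, len(left))):
        w = 1.0 - k / span
        frame = out_left[len(left) - 1 - k]
        for b, d in enumerate(jump):
            frame[b] += 0.5 * w * d  # move left frames up toward the midpoint
    for k in range(min(span, len(right))):
        w = 1.0 - k / span
        frame = out_right[k]
        for b, d in enumerate(jump):
            frame[b] -= 0.5 * w * d  # move right frames down toward the midpoint
    return out_left, out_right


if __name__ == "__main__":
    left = [[1.0, 2.0] for _ in range(4)]
    right = [[3.0, 6.0] for _ in range(4)]
    new_left, new_right = smooth_boundary(left, right, span=2)
    print(new_left)
    print(new_right)
```

With this scheme the two frames that meet at the boundary become identical, and frames more than `span` frames away are left untouched, which limits how far the smoothing can distort each unit.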

