Separation of Voiced Sounds and Unvoiced Sounds for Corpus-based Korean Text-To-Speech

Voiced/Unvoiced Separation of Synthesis Units for Improving the Performance of a Korean Speech Synthesizer

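  • 홍문기 (Department of Computer Science, Seokyeong University) ;
  • 신지영 (Department of Korean Language and Literature, Korea University) ;
  • 강선미 (Department of Computer Science, Seokyeong University)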
  • Published : 2003.06.01

Abstract

Predicting the right prosodic elements is a key factor in improving the quality of synthesized speech. Prosodic elements include breaks, pitch, duration, and loudness. Pitch, which is realized as the fundamental frequency (F0), is the element that most affects the quality of synthesized speech. However, the previous method for predicting F0 has several problems. If voiced and unvoiced sounds are not classified correctly, the result is incorrect pitch prediction, selection of the wrong triphone unit when synthesizing voiced and unvoiced sounds, and audible clicks or vibration. These errors commonly occur at transitions from a voiced sound to an unvoiced sound or from an unvoiced sound to a voiced sound. The problem cannot be resolved by grammar-based methods, and it strongly affects the synthesized sound. Therefore, to obtain correct pitch values reliably, this paper proposes a new model that predicts and classifies voiced and unvoiced sounds using the CART tool.
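
To illustrate the kind of classification a CART (classification and regression tree) performs, the following is a minimal sketch in Python, not the model described in this paper. The two input features (zero-crossing rate and normalized energy), the toy data values, and all function names are assumptions introduced only for illustration; the abstract does not state which features or CART settings the authors used.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (feature_index, threshold) that lowers weighted Gini, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        # Candidate thresholds are midpoints between adjacent distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Grow a binary tree; a leaf is a label, a node is (feature, threshold, left, right)."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # majority-class leaf
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (
        f,
        t,
        build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    )

def predict(tree, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

# Hypothetical toy data: (zero_crossing_rate, normalized_energy) per frame.
rows = [
    (0.05, 0.80), (0.08, 0.70), (0.10, 0.90), (0.07, 0.60),
    (0.40, 0.10), (0.35, 0.20), (0.50, 0.15), (0.45, 0.05),
]
labels = ["voiced"] * 4 + ["unvoiced"] * 4

tree = build_tree(rows, labels)
print(predict(tree, (0.06, 0.75)))
print(predict(tree, (0.42, 0.12)))
```

The sketch splits on the threshold that most reduces Gini impurity, which is the standard CART criterion for classification. A production system would more likely rely on an established CART implementation and on features chosen from the speech corpus itself.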

Keywords