• Title/Summary/Keyword: Corpus-based synthesis

Speech Synthesis Based on CVC Speech Segments Extracted from Continuous Speech

  • 김재홍;조관선;이철희
    • The Journal of the Acoustical Society of Korea / v.18 no.7 / pp.10-16 / 1999
  • In this paper, we propose a concatenation-based speech synthesizer using CVC (consonant-vowel-consonant) speech segments extracted from a continuous speech corpus that was not specially designed for synthesis. Natural synthetic speech can be generated by properly modelling coarticulation effects between phonemes and by using natural prosodic variations. In general, CVC synthesis units show less acoustic degradation of speech quality because the concatenation points lie in consonant regions, and they can properly model the coarticulation of vowels affected by the surrounding consonants. We analyze the characteristics and the number of required synthesis units for 4 types of speech synthesis methods that use CVC synthesis units. We then compare the speech quality of the 4 types and propose a new synthesis method based on the type that is most promising in terms of speech quality and implementability. We implement the method using the speech corpus and synthesize various examples. CVC speech segments that are not in the speech corpus are substituted with demisyllable speech segments. Experiments demonstrate that CVC speech segments extracted from a continuous speech corpus of about 100 MB can produce high-quality synthetic speech.

Unit Generation Based on Phrase Break Strength and Pruning for Corpus-Based Text-to-Speech

  • Kim, Sang-Hun;Lee, Young-Jik;Hirose, Keikichi
    • ETRI Journal / v.23 no.4 / pp.168-176 / 2001
  • This paper discusses two important issues in corpus-based synthesis: synthesis unit generation based on phrase break strength information, and pruning of redundant synthesis unit instances. First, a new sentence set for recording was designed to build an efficient synthesis database that reflects the characteristics of the Korean language. To obtain prosodic-context-sensitive units, we graded major prosodic phrases into 5 distinct levels according to pause length and then distinguished intra-word triphones by these levels. Synthetic speech was generated using the synthesis units with phrase break strength information and evaluated subjectively. Second, a new pruning method based on weighted vector quantization (WVQ) was proposed to eliminate redundant synthesis unit instances from the synthesis database. WVQ takes the relative importance of each instance into account when clustering similar instances with the vector quantization (VQ) technique. The proposed method was compared with two conventional pruning methods through objective and subjective evaluations of synthetic speech quality: one that simply limits the maximum number of instances, and one based on ordinary VQ-based clustering. For the same reduction rate in the number of instances, the proposed method performed best. At a reduction rate of 45%, the synthetic speech showed almost no perceptible degradation compared with synthetic speech produced without instance reduction.
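
The break-strength grading described above amounts to binning pause lengths and folding the resulting level into the unit label. The abstract does not give the pause boundaries between the 5 levels, so the thresholds below are hypothetical placeholders, and the label format `l-c+r/B<level>` is likewise an assumption made for illustration. A minimal Python sketch:

```python
from bisect import bisect_right

# Hypothetical pause boundaries in milliseconds (not from the paper).
# Level 0 covers pauses below 20 ms, level 1 covers 20-79 ms, ...,
# level 4 covers pauses of 400 ms or more: 5 levels in total.
PAUSE_BOUNDS_MS = [20, 80, 200, 400]


def break_level(pause_ms: float) -> int:
    """Grade a pause into one of 5 break-strength levels (0 = weakest, 4 = strongest)."""
    return bisect_right(PAUSE_BOUNDS_MS, pause_ms)


def unit_label(left: str, center: str, right: str, level: int) -> str:
    """Make the break level part of the triphone label, so the same triphone
    followed by breaks of different strength is stored as distinct units."""
    return f"{left}-{center}+{right}/B{level}"


if __name__ == "__main__":
    print([break_level(p) for p in (0, 50, 150, 300, 900)])  # [0, 1, 2, 3, 4]
    print(unit_label("k", "a", "n", 2))  # k-a+n/B2
```

In a working system the boundaries would have to be chosen from the pause-length distribution of the recorded corpus rather than fixed by hand as here.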

Corpus-based Korean Text-to-speech Conversion System

  • Kim, Sang-hun;Park, Jun;Lee, Young-jik
    • The Journal of the Acoustical Society of Korea / v.20 no.3 / pp.24-33 / 2001
  • This paper describes a baseline implementation of a corpus-based Korean TTS system. Conventional TTS systems that use small speech databases still generate machine-like synthetic speech. To overcome this problem, we introduce a corpus-based TTS system that can generate natural synthetic speech without prosodic modification. The corpus should contain the natural prosody of the source speech and multiple instances of each synthesis unit. To build phone-level synthesis units, we train a speech recognizer on the target speech and then perform automatic phoneme segmentation. We also detect fine pitch periods using laryngograph signals, which are used for prosodic feature extraction. For break strength allocation, 4 levels of break indices are determined by pause length and attached to phones to reflect prosodic variation at phrase boundaries. To predict break strength from text, we use statistical information on POS (part-of-speech) sequences. The best triphone sequences are selected by a Viterbi search that minimizes the accumulated Euclidean distance of concatenation distortion. To obtain synthetic speech of a quality suitable for commercial use, we introduce a domain-specific database. Adding the domain-specific database to the general-domain database greatly improves the quality of synthetic speech in that domain. In a subjective evaluation, the new corpus-based Korean TTS system showed better naturalness than a conventional demisyllable-based system.
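
The selection step can be sketched as a standard Viterbi (dynamic-programming) search over the candidate instances of each target triphone. This is a minimal sketch, assuming each instance is summarized by one feature vector at each boundary and that the join cost is the Euclidean distance between the left unit's end features and the right unit's start features. The `Instance` type and the function names are hypothetical, and no target cost is included because the abstract mentions only concatenation distortion.

```python
import math
from dataclasses import dataclass


@dataclass
class Instance:
    """One recorded instance of a synthesis unit (hypothetical structure)."""
    name: str
    start_feat: tuple[float, ...]  # spectral features at the unit's left boundary
    end_feat: tuple[float, ...]    # spectral features at the unit's right boundary


def join_cost(prev: Instance, nxt: Instance) -> float:
    """Concatenation distortion: Euclidean distance across the join."""
    return math.dist(prev.end_feat, nxt.start_feat)


def select_units(candidates: list[list[Instance]]) -> tuple[list[Instance], float]:
    """Viterbi search for the instance sequence with minimum accumulated join cost.

    candidates[t] lists the available instances for the t-th target triphone.
    """
    cost = [0.0] * len(candidates[0])  # best accumulated cost ending at each candidate
    backptrs: list[list[int]] = []     # backptrs[t-1][j]: best predecessor of candidate j at t
    for t in range(1, len(candidates)):
        prev, cur = candidates[t - 1], candidates[t]
        new_cost, ptr = [], []
        for inst in cur:
            totals = [cost[i] + join_cost(prev[i], inst) for i in range(len(prev))]
            best = min(range(len(prev)), key=totals.__getitem__)
            new_cost.append(totals[best])
            ptr.append(best)
        cost = new_cost
        backptrs.append(ptr)
    # Trace back from the cheapest final candidate.
    j = min(range(len(cost)), key=cost.__getitem__)
    total = cost[j]
    path = [j]
    for ptr in reversed(backptrs):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return [candidates[t][j] for t, j in enumerate(path)], total
```

Because each position keeps only the best partial path into every candidate, the search needs O(N·K²) distance evaluations for N target units with K candidates each, instead of enumerating all K^N sequences.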

An Optimization of Speech Database in Corpus-based Speech Synthesis System

  • Jang Kyung-Ae;Chung Min-Hwa
    • Proceedings of the KSPS conference / 2002.11a / pp.209-213 / 2002
  • This paper describes how to reduce the database (DB) of a corpus-based Korean speech synthesizer without degrading speech quality. First, we propose that the frequency of each unit in the reduced DB should reflect the frequency of that unit in the Korean language; the target population of each unit is therefore set proportional to its frequency in a large Korean corpus (780K sentences, 45 million phonemes). Second, instances that are frequently used during synthesis should also be retained in the reduced DB. Finally, we propose that the frequency of each instance be reflected in the clustering criterion and used as a criterion for selecting representative instances. In evaluation, the proposed methods yielded better quality than conventional methods.

A Reduction of Speech Database in Corpus-based Speech Synthesis System

  • Jang Kyung-Ae;Chung Min-Hwa;Kim Jae-In;Koo Myoung-Wan
    • MALSORI / no.44 / pp.145-156 / 2002
  • This paper describes how to reduce the DB of a corpus-based Korean speech synthesizer without degrading speech quality. First, we propose that the frequency of each unit in the reduced DB reflect the frequency of that unit in the Korean language; the target population of each unit is therefore set proportional to its frequency in a large Korean corpus (780K sentences, 45 million phones). Secondly, instances that are frequently used during synthesis should also be retained in the reduced DB. Finally, we propose that the frequency of each instance be reflected in the clustering criteria and used as another important criterion for selecting representative instances. In evaluation, the proposed methods yielded better quality than conventional methods.
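
The first criterion, setting each unit's target population in proportion to its corpus frequency, can be sketched as a proportional split of a fixed instance budget. The `budget` parameter and the largest-remainder rounding are assumptions for illustration; the abstract does not say how fractional targets are rounded.

```python
def target_populations(unit_freq: dict[str, int], budget: int) -> dict[str, int]:
    """Split an instance budget across units in proportion to corpus frequency.

    Largest-remainder rounding (an assumption) makes the targets sum to `budget`.
    """
    total = sum(unit_freq.values())
    targets: dict[str, int] = {}
    remainders: dict[str, int] = {}
    for unit, freq in unit_freq.items():
        # Exact integer quota: budget * freq / total = quotient + remainder / total
        targets[unit], remainders[unit] = divmod(budget * freq, total)
    leftover = budget - sum(targets.values())
    # Give the leftover slots to the units with the largest fractional parts.
    for unit in sorted(remainders, key=remainders.__getitem__, reverse=True)[:leftover]:
        targets[unit] += 1
    return targets
```

For example, with unit frequencies of 50, 30 and 20 and a budget of 7 instances, the exact quotas 3.5, 2.1 and 1.4 become targets of 4, 2 and 1. Using integer `divmod` keeps the arithmetic exact, so the targets always add up to the budget.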

Implementation and Evaluation of an HMM-Based Speech Synthesis System for the Tagalog Language

  • Mesa, Quennie Joy;Kim, Kyung-Tae;Kim, Jong-Jin
    • MALSORI / v.68 / pp.49-63 / 2008
  • This paper describes the development and assessment of a hidden Markov model (HMM)-based speech synthesis system for Tagalog, the most widely spoken indigenous language of the Philippines. Several aspects of the design process are discussed. To build the synthesizer, a speech database is recorded and phonetically segmented. The resulting speech corpus contains approximately 89 minutes of Tagalog speech in 596 spoken utterances. Contextual information is also determined. The quality of the synthesized speech is assessed by subjective tests with 25 native Tagalog speakers as respondents. Experimental results show that the new system achieves a MOS of 3.29, which indicates that it can produce highly intelligible, neutral Tagalog speech of stable quality even when only a small amount of speech data is used for HMM training.

A New Pruning Method for Synthesis Database Reduction Using Weighted Vector Quantization

  • Kim, Sanghun;Lee, Youngjik;Keikichi Hirose
    • The Journal of the Acoustical Society of Korea / v.20 no.4E / pp.31-38 / 2001
  • A large-scale synthesis database for unit-selection-based synthesis usually retains redundant synthesis unit instances that contribute nothing to synthetic speech quality. To eliminate such instances from the synthesis database, we propose a new pruning method called weighted vector quantization (WVQ). WVQ reflects the relative importance of each synthesis unit instance when clustering similar instances using the vector quantization (VQ) technique. The proposed method was compared with two conventional pruning methods through objective and subjective evaluations of synthetic speech quality: one that simply limits the maximum number of instances, and one based on ordinary VQ-based clustering. The proposed method performed best at reduction rates below 50%. Above 50%, synthetic speech quality degraded perceptibly, though not seriously. With the proposed method, the synthesis database can be reduced efficiently without serious degradation of synthetic speech quality.
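
The core idea of WVQ, letting each instance's importance weight pull the cluster centroids during vector quantization, can be sketched as weighted k-means. This is a minimal sketch under stated assumptions, not the authors' algorithm: the weights (for example, how often an instance is selected when synthesizing a text corpus), the initialization at the most important instances, and keeping the real instance nearest each centroid are all assumptions.

```python
import math


def weighted_vq_prune(points, weights, k, iters=20):
    """Return indices of k representative instances chosen by weighted k-means.

    points  -- feature vectors (sequences of floats) for the instances of one unit
    weights -- non-negative importance of each instance (hypothetical, e.g. usage counts)
    """
    k = min(k, len(points))
    dim = len(points[0])
    # Assumption: start the centroids at the k most important instances.
    order = sorted(range(len(points)), key=lambda i: weights[i], reverse=True)
    centroids = [list(points[i]) for i in order[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each instance goes to its nearest centroid (Euclidean).
        assign = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: weighted mean, so heavy instances pull their centroid toward themselves.
        for c in range(k):
            members = [i for i, a in enumerate(assign) if a == c]
            wsum = sum(weights[i] for i in members)
            if wsum > 0:  # leave empty or all-zero-weight clusters unchanged
                centroids[c] = [sum(weights[i] * points[i][d] for i in members) / wsum
                                for d in range(dim)]
    # Keep the real instance closest to each weighted centroid; prune the rest.
    kept = []
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        if members:
            kept.append(min(members, key=lambda i: math.dist(points[i], centroids[c])))
    return sorted(kept)
```

With points (0,0), (0,1), (10,10), (10,11), weights 5, 1, 1, 5 and k = 2, the weighted centroids are pulled toward the heavy instances, so indices 0 and 3 are kept; with equal weights, each centroid would instead sit midway between its two members.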

A Study on Speech Synthesizer Using Distributed System

  • Kim, Jin-Woo;Min, So-Yeon;Na, Deok-Su;Bae, Myung-Jin
    • The Journal of the Acoustical Society of Korea / v.29 no.3 / pp.209-215 / 2010
  • Recently, portable terminals have attracted attention owing to wireless networks and large-capacity ROM, and as a result TTS (text-to-speech) systems have been built into them. Although high-quality synthesis is difficult to achieve on a portable terminal, users demand it. In this paper, we propose a distributed TTS (DTTS) system composed of a server and a terminal. Built on corpus-based speech synthesis, the DTTS can achieve high-quality synthesis. The synthesis system on the server searches the database, generates optimized speech concatenation information, and transmits it to the terminal. The synthesis system on the terminal then produces high-quality synthetic speech with low computation, using the concatenation information transmitted from the server. The proposed method can reduce complexity and power consumption and allows efficient maintenance.

A Study on the Voice Conversion with HMM-based Korean Speech Synthesis

  • Kim, Il-Hwan;Bae, Keun-Sung
    • MALSORI / v.68 / pp.65-74 / 2008
  • Statistical parametric speech synthesis based on hidden Markov models (HMMs) has grown in popularity over the last few years because, compared with corpus-based unit-concatenation text-to-speech (TTS) systems, it requires less memory and lower computational complexity and is therefore suitable for embedded systems. It also has the advantage that the voice characteristics of the synthetic speech can easily be modified by transforming the HMM parameters appropriately. In this paper, we present experimental results of voice characteristic conversion using an HMM-based Korean speech synthesis system. The results show that voice characteristics can be converted using only a few sentences uttered by a target speaker. Synthetic speech generated from models adapted with only ten sentences was very close to that generated from speaker-dependent models trained on 646 sentences.

Phospholipids from Bombycis corpus and Their Neurotrophic Effects

  • Kwon, Hak-Cheol;Jung, I-Yeon;Cho, Se-Yeon;Cho, Ock-Ryun;Yang, Min-Cheol;Lee, Sung-Ok;Hur, Jin-Young;Kim, Sun-Yeou;Yang, Jong-Beom;Lee, Kang-Ro
    • Archives of Pharmacal Research / v.26 no.6 / pp.471-477 / 2003
  • Three phospholipids (4-6) and three aromatic amines (1-3) were obtained from the methanol extract of Bombycis corpus. Based on spectral data, their structures were elucidated as nicotinamide (1), cytidine (2), adenine (3), 1-O-(9Z-octadecenoyl)-2-O-(8Z,11Z-octadecadienoyl)-sn-glycero-3-phosphorylcholine (4), 1,2-di-O-hexadecanoyl-sn-glycero-3-phosphorylcholine (5), and 1,2-di-O-9Z-octadecenoyl-sn-glycero-3-phosphorylcholine (6). We examined the effects of these compounds on NGF synthesis in cultured astrocytes. RT-PCR analysis showed that expression of NGF mRNA in serum-starved astrocytes increased after the addition of phospholipid (10 μM). The NGF content in the culture medium was significantly increased by compound 5 compared with the control. These results suggest that the three phospholipids isolated from the methanol extract of Bombycis corpus may exert neurotrophic effects by stimulating NGF synthesis in astrocytes.