• Title/Summary/Keyword: speech resynthesis

Search Result 4, Processing Time 0.02 seconds

Harmonic Structure Features for Robust Speaker Diarization

  • Zhou, Yu;Suo, Hongbin;Li, Junfeng;Yan, Yonghong
    • ETRI Journal
    • /
    • v.34 no.4
    • /
    • pp.583-590
    • /
    • 2012
  • In this paper, we present a new approach for speaker diarization. First, we use the prosodic information calculated on the original speech to resynthesize the new speech data utilizing the spectrum modeling technique. The resynthesized data is modeled with sinusoids based on pitch, vibration amplitude, and phase bias. Then, we use the resynthesized speech data to extract cepstral features and integrate them with the cepstral features from original speech for speaker diarization. At last, we show how the two streams of cepstral features can be combined to improve the robustness of speaker diarization. Experiments carried out on the standardized datasets (the US National Institute of Standards and Technology Rich Transcription 04-S multiple distant microphone conditions) show a significant improvement in diarization error rate compared to the system based on only the feature stream from original speech.

Voice quality transform using jitter synthesis (Jitter 합성에 의한 음질변환에 관한 연구)

  • Jo, Cheolwoo
    • Phonetics and Speech Sciences
    • /
    • v.10 no.4
    • /
    • pp.121-125
    • /
    • 2018
  • This paper describes procedures of changing and measuring voice quality in terms of jitter. Jitter synthesis method was applied to the TD-PSOLA analysis system of the Praat software. The jitter component is synthesized based on a Gaussian random noise model. The TD-PSOLA re-synthesize process is used to synthesize the modified voice with artificial jitter. Various vocal jitter parameters are used to measure the change in quality caused by artificial systematic jitter change. Synthetic vowels, natural vowels and short sentences are used to check the change in voice quality through the synthesizer model. The results shows that the suggested method is useful for voice quality control in a limited way and can be used to alter the jitter component of voice.

F0 Contour Model based on Temporal Decomposition (시간적 분해에 기반한 F0 궤적 모델에 관한 연구)

  • 변효진;김연준;오영환
    • The Journal of the Acoustical Society of Korea
    • /
    • v.18 no.8
    • /
    • pp.75-83
    • /
    • 1999
  • This paper proposes a new F0 contour model for intonation control in speech synthesis. We assume that the F0 contour of an utterance can be described using a sequence of time-overlapping events, which determine the fluctuation of a given F0 contour, described by asymmetric Gaussian functions. In addition, We propose a parameter estimation algorithm for the proposed model. The proposed model is not developed with a particular phonological theory in mind, and can be used in both F0 contour analysis and synthesis. For testing our F0 model, we collected 500 sentences from various genres and built a corresponding speech corpus uttered by a professional female announcer. As n result of F0 resynthesis experiment using the proposed model, the RMSE was 7.87Hz for given speech corpus.

  • PDF

Perceptual Boundary on a Synthesized Korean Vowel /o/-/u/ Continuum by Chinese Learners of Korean Language (/오/-/우/ 합성모음 연속체에 대한 중국인 한국어 학습자의 청지각적 경계)

  • Yun, Jihyeon;Kim, EunKyung;Seong, Cheoljae
    • Phonetics and Speech Sciences
    • /
    • v.7 no.4
    • /
    • pp.111-121
    • /
    • 2015
  • The present study examines the auditory boundary between Korean /o/ and /u/ on a synthesized vowel continuum by Chinese learners of Korean language. Preceding researches reported that the Chinese learners have difficulty pronouncing Korean monophthongs /o/ and /u/. In this experiment, a nine-step continuum was resynthesized using Praat from a vowel token from a recording of a male announcer who produced it in isolated form. F1 and F2 were synchronously shifted in equal steps in qtone (quarter tone), while F3 and F4 values were held constant for the entire stimuli. A forced choice identification task was performed by the advanced learners who speak Mandarin Chinese as their native language. Their experiment data were compared to a Korean native group. ROC (Receiver Operating Characteristic) analysis and logistic regression were performed to estimate the perceptual boundary. The result indicated the learner group has a different auditory criterion on the continuum from the Korean native group. This suggests that more importance should be placed on hearing and listening training in order to acquire the phoneme categories of the two vowels.