Search | Korea Science

A Method of Intonation Modeling for Corpus-Based Korean Speech Synthesizer (코퍼스 기반 한국어 합성기의 억양 구현 방안)

Kim, Jin-Young;Park, Sang-Eon;Eom, Ki-Wan;Choi, Seung-Ho
Speech Sciences, v.7 no.2, pp.193-208, 2000
This paper describes a multi-step method of intonation modeling for a corpus-based Korean speech synthesizer. We selected 1833 sentences covering various syntactic structures and built a corresponding speech corpus uttered by a female announcer. We detected pitch from laryngograph signals, manually marked prosodic boundaries on the recorded speech, and carried out part-of-speech tagging and syntactic analysis on the text. The detected pitch was separated into three frequency bands (low, mid, and high), which correspond to the baseline, the word tone, and the syllable tone, respectively. We predicted them using the CART method and the Viterbi search algorithm with a word-tone dictionary. Of the collected sentences, 1500 were used for training and 333 for testing. For word tone modeling, we compared two methods: one predicts the word tone (the mid-frequency component) directly, and the other predicts it by multiplying the ratio of the word tone to the baseline by the baseline. The former resulted in a mean error of 12.37 Hz and the latter in 12.41 Hz, which are similar. For syllable tone modeling, the mean error rate was less than 8.3% relative to the announcer's mean pitch of 193.56 Hz, so performance was relatively good.
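
The three-band split described in this abstract can be illustrated with a minimal sketch. The abstract does not say which filters were used, so the moving-average smoothers, the window sizes, the function names, and the sample contour below are all assumptions chosen only for illustration, not the authors' method.

```python
def moving_average(values, window):
    """Centered moving average; the window is truncated at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def split_bands(f0, slow_window=9, fast_window=3):
    """Split an F0 contour (Hz) into three additive components.

    Assumed mapping (hypothetical, for illustration only):
      low band  -> baseline      (heavily smoothed contour)
      mid band  -> word tone     (lightly smoothed minus heavily smoothed)
      high band -> syllable tone (original minus lightly smoothed)
    """
    slow = moving_average(f0, slow_window)
    fast = moving_average(f0, fast_window)
    baseline = slow
    word_tone = [f - s for f, s in zip(fast, slow)]
    syllable_tone = [x - f for x, f in zip(f0, fast)]
    return baseline, word_tone, syllable_tone


# Hypothetical F0 samples in Hz, not data from the paper.
f0_contour = [190.0, 205.0, 198.0, 210.0, 185.0, 192.0, 180.0, 176.0]
baseline, word_tone, syllable_tone = split_bands(f0_contour)

# The three components add back up to the original contour.
rebuilt = [b + w + s for b, w, s in zip(baseline, word_tone, syllable_tone)]
```

Because each band is defined as a difference between successive smoothing levels, the components are additive by construction, which makes it possible to predict each layer separately and sum the predictions.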

A Study of Fundamental Frequency for Focused Word Spotting in Spoken Korean (한국어 발화음성에서 중점단어 탐색을 위한 기본주파수에 대한 연구)

Kwon, Soon-Il;Park, Ji-Hyung;Park, Neung-Soo
The KIPS Transactions:PartB, v.15B no.6, pp.595-602, 2008
The focused word of each sentence helps in recognizing and understanding spoken Korean. To find a method for spotting focused words in speech signals, we analyzed the mean and variance of the fundamental frequency (F0) and the average energy of focused words and of the other words in a sentence, using speech data from 100 spoken sentences. The results showed that focused words have either a higher relative mean F0 or a higher relative variance of F0 than other words. Our findings contribute to characterizing the prosody of spoken Korean and to keyword extraction based on natural language processing.
https://doi.org/10.3745/KIPSTB.2008.15-B.6.595
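
The relative F0 statistics mentioned in this abstract can be sketched as follows. The abstract does not define its normalization or any decision rule, so dividing by the per-sentence average, the threshold of 1.0, the function names, and the sample data are assumptions made for illustration only.

```python
from statistics import mean, pvariance


def relative_f0_stats(words):
    """Return {word: (relative mean F0, relative F0 variance)}.

    `words` maps each word to its list of F0 samples in Hz. Each value is
    divided by the average over all words in the sentence (an assumed
    normalization; the abstract does not specify one).
    """
    means = {w: mean(samples) for w, samples in words.items()}
    variances = {w: pvariance(samples) for w, samples in words.items()}
    sentence_mean = mean(list(means.values()))
    sentence_var = mean(list(variances.values()))
    return {
        w: (means[w] / sentence_mean, variances[w] / sentence_var)
        for w in words
    }


def focus_candidates(words, threshold=1.0):
    """Words whose relative mean OR relative variance exceeds the threshold."""
    stats = relative_f0_stats(words)
    return [w for w, (rel_mean, rel_var) in stats.items()
            if rel_mean > threshold or rel_var > threshold]


# Hypothetical sentence with three words, not data from the paper.
sentence = {
    "na": [180.0, 182.0, 181.0],
    "OJE": [220.0, 240.0, 230.0],
    "gasseo": [175.0, 178.0, 176.0],
}
candidates = focus_candidates(sentence)
```

With this made-up data only `"OJE"` exceeds the sentence average on either measure, so it is the sole candidate. A real system would also need the energy feature the abstract mentions and a tuned threshold.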

Pronunciation of the Korean diphthong /jo/: Phonetic realizations and acoustic properties (한국어 /ㅛ/의 발음 양상 연구: 발음형 빈도와 음향적 특징을 중심으로)

Hyangwon Lee
Phonetics and Speech Sciences, v.15 no.1, pp.9-17, 2023
The purpose of this study is to determine how the Korean diphthong /jo/ shows phonetic variation in various linguistic environments. The pronunciation of /jo/ is discussed, focusing on the relationship between phonetic variation and the distribution range of vowels. The location in a word (monosyllable, word-initial, word-medial, word-final) and word class (content word, function word) were analyzed using the speech of 10 female speakers from the Seoul Corpus. An analysis of the frequency of /jo/ in each environment showed that pronunciation type and word class were affected by the location in a word. In the acoustic analysis, frequent phonetic reduction was observed in /jo/ in function words. Word class did not change the average phonetic values of /jo/, but it changed the distribution of individual tokens. These results indicate that the linguistic environment affects the phonetic distribution of vowels.
https://doi.org/10.13064/KSSS.2023.15.1.009

A Study of Fundamental Frequency about Voice Imitation (모방발화의 기본주파수 연구)

Park, Mi-Young;Shin, Ji-Young;Kang, Sun-Mee
Proceedings of the KSPS conference, 2004.05a, pp.199-204, 2004
The purpose of this paper is to find prosodic characteristics of voice imitation. Speakers change various phonetic features when imitating a voice, and in most cases they change their pitch range. The pitch range is especially important for word conditions. As imitators change their voice, the average F0 value tends toward the high-frequency end rather than the low or middle range.

A study on the change of prosodic units by speech rate and frequency of turn-taking (발화 속도와 말차례 교체 빈도에 따른 운율 단위 변화에 관한 연구)

Won, Yugwon
Phonetics and Speech Sciences, v.14 no.2, pp.29-38, 2022
This study aimed to analyze the speech appearing in the National Institute of Korean Language's Daily Conversation Speech Corpus (2020) and reveal how the speech rate and the frequency of turn-taking affect the change in prosody units. The analysis results showed a positive correlation between intonation phrase, word phrase frequency, and speaking duration as the speech speed increased; however, the correlation was low, and the suitability of the regression model of the speech rate was 3%-11%, which was weak in explanatory power. There was a significant difference in the mean speech rate according to the frequency of the turn-taking, and the speech rate decreased as the frequency of the turn-taking increased. In addition, as the frequency of turn-taking increased, the frequency of intonation phrases, the frequency of word phrases, and the speaking duration decreased; there was a high negative correlation. The suitability of the regression model of the turn-taking frequency was calculated as 27%-32%. The frequency of turn-taking functions as a factor in changing the speech rate and prosodic units. It is presumed that this can be influenced by the disfluency of the dialogue, the characteristics of turn-taking, and the active interaction between the speakers.
https://doi.org/10.13064/KSSS.2022.14.2.029

Unit Generation Based on Phrase Break Strength and Pruning for Corpus-Based Text-to-Speech

Kim, Sang-Hun;Lee, Young-Jik;Hirose, Keikichi
ETRI Journal, v.23 no.4, pp.168-176, 2001
This paper discusses two important issues of corpus-based synthesis: synthesis unit generation based on phrase break strength information and pruning redundant synthesis unit instances. First, the new sentence set for recording was designed to make an efficient synthesis database, reflecting the characteristics of the Korean language. To obtain prosodic context sensitive units, we graded major prosodic phrases into 5 distinctive levels according to pause length and then discriminated intra-word triphones using the levels. Using the synthesis unit with phrase break strength information, synthetic speech was generated and evaluated subjectively. Second, a new pruning method based on weighted vector quantization (WVQ) was proposed to eliminate redundant synthesis unit instances from the synthesis database. WVQ takes the relative importance of each instance into account when clustering similar instances using vector quantization (VQ) technique. The proposed method was compared with two conventional pruning methods through objective and subjective evaluations of synthetic speech quality: one to simply limit the maximum number of instances, and the other based on normal VQ-based clustering. For the same reduction rate of instance number, the proposed method showed the best performance. The synthetic speech with reduction rate 45% had almost no perceptible degradation as compared to the synthetic speech without instance reduction.

A Study on the change of Korean rhythm patterns - with focus on two syllable words - (한국어 리듬패턴 변화에 관한 연구 -2음절 낱말을 중심으로-)

Kim Sun Ju
MALSORI, no.39, pp.1-14, 2000
In Korean, it is well known that vowel length plays an important role in differentiating word meanings. However, the distinction between long and short vowels is often ignored by the younger generation. The purpose of this paper is to investigate the change in rhythm patterns. It also examines whether this change has resulted in differences in prosodic features between young and old groups. This study is based on H. B. Lee's 'rhythm pattern theory'. Based on his assumption, it is suggested that the loss of original vowel length has caused the place of accent to move from the first to the second syllable.

A Study on the Vowel lengthening and a Morphophonological Interpretation for its function (홀소리 길이의 늘어짐(Vowel lengthening)의 기능 및 형태음운론적 해석)

Kim, Chong-Dok
Proceedings of the KSPS conference, 2005.04a, pp.9-13, 2005
The aim of this paper is to analyze vowel lengthening in Korean, whose function is distinctive at the word level. I examined two acoustic parameters, vowel length and formants (F1 and F2), to distinguish a long vowel from its short counterpart, for example /a:/ and /a/. Based on the results of the experimental analysis and a discussion of vowel length and its influence on the Korean phonological system, I treat vowel lengthening as a prosodeme, that is, as a prosodic element in the Korean phonological system.

Gradient Reduction of $C_1$ in /pk/ Sequences

Son, Min-Jung
Speech Sciences, v.15 no.4, pp.43-60, 2008
Instrumental studies (e.g., aerodynamic, EPG, and EMMA) have shown that the first of two stops in sequence can sometimes be articulatorily reduced in time and space, either gradiently or categorically. The current EMMA study examines possible factors, both linguistic (e.g., speech rate, word boundary, and prosodic boundary) and paralinguistic (e.g., natural context and repetition), that induce gradient reduction of $C_1$ in /pk/ cluster sequences. EMMA data were collected from five Seoul Korean speakers. The results show that gradient reduction of lip aperture seldom occurs, being quite restricted both in speaker frequency and in token frequency. The results also suggest that place assimilation is not a lexical process, implying that speakers have not fully developed this process to the point of phonologizing it at the abstract level.

Considering Dynamic Non-Segmental Phonetics

Fujino, Yoshinari
Proceedings of the KSPS conference, 2000.07a, pp.312-320, 2000
This presentation aims to explore some possibilities of non-segmental phonetics, which is usually ignored in phonetics education. In pedagogical phonetics, especially ESL/EFL-oriented phonetics, speech sounds tend to be classified under two criteria: 1) 'pronunciation', which deals with segments, and 2) 'prosody' or 'suprasegmentals', which deals with non-segmental elements such as stress and intonation. However, speech involves more dynamic processing. It is non-linear and multi-dimensional in spite of the linear sequence of symbols in phonetic/phonological transcriptions. No word is without pitch or voice quality apart from its segmental characteristics, whether it is spoken in isolation or cut out from continuous speech. This simply shows that the dichotomy of pronunciation and prosody is merely a useful convention. There is some room to consider dynamic non-segmental phonetics. Examples of non-segmental phonetic investigation are examined, including some analyses conducted within the framework of Firthian Prosodic Analysis, especially of the relation between vowel variants and foot types, and we see what kind of auditory phonetic training is required to understand the impressionistic transcriptions that lie behind non-segmental phonetics.
