• Title/Summary/Keyword: Prosodic phrase

A Pre-Selection of Candidate Units Using Accentual Characteristic In a Unit Selection Based Japanese TTS System (일본어 악센트 특징을 이용한 합성단위 선택 기반 일본어 TTS의 후보 합성단위의 사전선택 방법)

  • Na, Deok-Su;Min, So-Yeon;Lee, Kwang-Hyoung;Lee, Jong-Seok;Bae, Myung-Jin
    • The Journal of the Acoustical Society of Korea / v.26 no.4 / pp.159-165 / 2007
  • In this paper, we propose a new method for pre-selecting candidate units that is suited to a unit-selection-based Japanese TTS system. The general pre-selection method calculates a context-dependent cost within an IP (intonation phrase). Unlike other languages, however, Japanese has an accent expressed as relative pitch height, and several words form a single accentual phrase (AP). Moreover, prosody in Japanese changes in units of accentual phrases. Reflecting this prosodic change in pre-selection can improve the quality of synthesized speech. Furthermore, calculating the context-dependent cost within an accentual phrase rather than within an intonation phrase can speed up synthesis. The proposed method defines the AP, analyzes the AP in context, and performs pre-selection using accentual phrase matching, which calculates the CCL (connected context length) of the candidates for each phoneme to be synthesized in each accentual phrase. The baseline system used in the proposed method is VoiceText, a synthesizer from Voiceware. Evaluations were made on perceptual errors (intonation error, concatenation mismatch error) and synthesis time. Experimental results showed that the proposed method improved the quality of synthesized speech and shortened the synthesis time.

A prosodic cue representing scopes of wh-phrases in Korean: Focusing on North Gyeongsang Korean (한국어 의문사 작용역을 나타내는 운율 단서: 경북 방언을 중심으로)

  • Yun, Weonhee;Kim, Ki-tae;Park, Sunwoo
    • Phonetics and Speech Sciences / v.12 no.3 / pp.41-53 / 2020
  • A wh-phrase in an embedded sentence may have either an embedded or a matrix scope. Interpretation of a wh-phrase with a matrix scope has tended to be syntactically unacceptable unless the sentence is read with a wh-intonation. Previous studies have found two differences in prosodic characteristics between sentences with matrix and embedded scopes. Firstly, peak F0s in wh-phrases produced with an F0 compression wh-intonation are higher than those in indirect questions, and peak F0s in matrix verbs are lower than those in sentences with embedded scope. Secondly, a substantial F0 drop is found at the end of embedded sentences in indirect questions, whereas no F0 reduction at the same point is noticed in sentences with a matrix scope produced with a high plateau wh-intonation. However, these characteristics were not found in our experiment. Instead, our results showed that a more compelling difference exists in the values obtained by subtracting the F0 at the end of each word (or a word plus an ending or case marker) from the peak F0 of that word. Specifically, the gap between the peak F0 in a word composed with an embedded verb and the F0 at the end of the word, which is a complementizer in Korean, is large in embedded wh-scope sentences and small in matrix wh-scope sentences.
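
The subtraction measure described in this abstract (peak F0 of a word minus the F0 at the end of that word) can be illustrated with a minimal sketch. The function name `f0_gap`, the list-of-frames input, and the use of `None` for unvoiced frames are assumptions made for illustration; the abstract does not say how the authors extracted or stored F0 values.

```python
from typing import Optional, Sequence


def f0_gap(f0_track_hz: Sequence[Optional[float]]) -> float:
    """Return peak F0 minus word-final F0 for one word.

    f0_track_hz holds one F0 value per analysis frame, in time order.
    Unvoiced frames are assumed to be marked with None (an assumption of
    this sketch, not something stated in the abstract).
    """
    voiced = [f for f in f0_track_hz if f is not None]
    if not voiced:
        raise ValueError("no voiced frames in this word")
    # The last voiced frame stands in for "the F0 at the end of the word".
    return max(voiced) - voiced[-1]
```

Under the abstract's finding, a larger value on the embedded verb plus complementizer would point toward an embedded-scope reading, and a smaller value toward a matrix-scope reading.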

The Implicational Meaning and Prosody of Conjunctive Marker '-ko' in Korean (한국어 대등적 연결어미 '-고'의 함축 의미와 운율)

  • Kim, Mi-Ran
    • Speech Sciences / v.8 no.4 / pp.289-305 / 2001
  • The conjunctive marker '-ko' in Korean can be interpreted as meaning either conjunctive 'and' or ordering 'and then'. The interpretation of '-ko' is ambiguous in written texts but not in spoken texts. This is because the meaning of an utterance is determined by the combination of the text with its prosody. The two meanings of '-ko' can be explained by the theory of implicature, which was introduced by Grice (1973, 1981). This paper examines the meaning of the marker '-ko' with respect to the relation between its meaning and prosody. The results of the experiments in this paper showed that prosodic phrasing in Korean influences the interpretation of the marker '-ko'. When two constituents combined by '-ko' are realized in the same accentual phrase, the marker can be interpreted as meaning 'exactly be orderly'. This meaning can be classified as a Particularized Conversational Implicature (PCI) in Gricean theory. In the other cases of phrasing, the marker '-ko' can mean either 'conjunctive' or 'be orderly' by Generalized Conversational Implicature (GCI). The fact that phrasing determines the interpretation of the marker '-ko' can be seen as supporting the view that prosody interacts with various levels of linguistic phenomena, from phonology to pragmatics.

Prosodic Phrasing and Focus in Korean

  • Baek, Judy Yoo-Kyung
    • Proceedings of the KSPS conference / 1996.10a / pp.246-246 / 1996
  • Purpose: Some properties of prosodic phrasing, and some acoustic and phonological effects of contrastive focus on the tonal pattern of Seoul Korean, are explored on the basis of a brief experiment analyzing the fundamental frequency (F0) contour of the author's speech.
    Data Base and Analysis Procedures: The examples were chosen to contain mostly nasal and liquid consonants, since it is difficult to track the formants of stops and fricatives during their consonantal intervals, and the burst of a stop into the following vowel may cause an unwanted increase in the F0 value. All examples were recorded three times and the spectrum of the most stable repetition was generated, from which the F0 contour of each sentence was obtained; peaks with a value higher than 250 Hz were interpreted as a high tone (H). The results are then discussed within the prosodic hierarchy framework of Selkirk (1986) and compared with the tonal pattern of the Northern Kyungsang (N.K.) dialect of Korean reported in Kenstowicz & Sohn (1996).
    Prosodic Phrasing: In N.K. Korean, H never appears on both the object and the verb in a neutral sentence, which indicates that the object and the verb form a single Phonological Phrase ($\phi$), given that there is only one pitch peak per $\phi$. In Seoul Korean, however, both the object and the verb have an H of their own, indicating that they are not contained in one $\phi$. This violates the Optimality-theoretic constraint Wrap-XP (=Enclose a lexical head and its arguments in one $\phi$), while N.K. Korean obeys the constraint by grouping a VP in a single $\phi$. This asymmetry can be resolved through a constraint that favors the separate grouping of each lexical category and is ranked above Wrap-XP in Seoul Korean but below it in N.K. Korean: Align-$X^{lex}$ (=Align the left edge of a lexical category with that of a $\phi$).
    (1) nuna-ka manIl-Il mEk-nIn-ta ('sister-NOM garlic-ACC eat-PRES-DECL')
        a. (LLH) (LLH) (HLL)    Seoul Korean
        b. (LLH) (LLL LHL)      N.K. Korean
    Focus and Phrasing: Two major effects of contrastive focus on phonological phrasing are found in Seoul Korean: (a) the peak of an Intonational Phrase (IP) falls on the focused element; and (b) focus has the effect of deleting all the following prosodic structures. A focused element always attracts the peak of IP, showing an increase of approximately 30 Hz compared with the peak of a non-focused IP. When a subject is focused, no H appears on either the object or the verb, and a focused object is never followed by a verb with H. The post-focus deletion of prosodic boundaries is forced by the interaction of StressFocus (=If F is a focus and DF is its semantic domain, the highest prominence in DF will be within F) and Rightmost-IP (=The peak of an IP projects from the rightmost $\phi$). First, StressFocus requires the peak of IP to fall on the focused element. Then, to avoid violating Rightmost-IP, all the boundaries after the focused element must delete, minimizing the number of $\phi$'s intervening before the right edge of IP. (2) (omitted)
    Conclusion: In general, there seem to be no direct alignment constraints between the syntactically focused element and the edge of a $\phi$ determined in phonology; all the alignment effects come from a single requirement that the peak of IP projects from the rightmost $\phi$, as proposed in Truckenbrodt (1995).
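
The 250 Hz criterion mentioned in the analysis procedure can be written as a small sketch. The function name `label_tones` and the per-syllable peak list are hypothetical, and labeling every remaining peak as "L" is an assumption suggested by the (LLH)-style patterns in example (1); the abstract itself only defines when a peak counts as H.

```python
from typing import List, Sequence


def label_tones(peak_f0_hz: Sequence[float], threshold_hz: float = 250.0) -> List[str]:
    """Label each syllable peak as 'H' when it is strictly above the threshold.

    Peaks at or below the threshold are labeled 'L' (an assumption of this
    sketch). The 250 Hz default is the value reported for the author's own
    voice and would need adjusting for other speakers.
    """
    return ["H" if peak > threshold_hz else "L" for peak in peak_f0_hz]
```

For example, `label_tones([210.0, 225.0, 262.0])` gives `['L', 'L', 'H']`, which corresponds to one (LLH) group.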

Production and Perception from Perspective of Focus

  • Noh, Bo-Kyung
    • Language and Information / v.6 no.1 / pp.105-121 / 2002
  • This paper investigates the effect of semantic argument structure on the comprehension and production of sentences by observing the prosodic realizations of English secondary predications. Specifically, the goal of this study is to show how the theory of predication, argument structure, and focus semantically interact to account for similarities and differences between English resultative and depictive predications. To address this issue, production and comprehension tests were performed. In the fixed focus domain (verb phrase), subjects were asked to utter and to comprehend ambiguous sentences in context monologues. The experimental results were largely consistent with general linguistic analyses: in the resultative constructions, secondary subject NPs tend to be accented, as in other argument-head constructions, while in the depictive constructions, secondary predicates tend to be accented, as in other adjunct-head constructions.

Minimization of Prediction System of Prosodic Phrase Boundaries (경량화 운율구 경계 예측 시스템 개발)

  • Kim, Minho;Jung, Youngim;Kwon, Hyuk-Chul
    • Proceedings of the Korea Information Processing Society Conference / 2010.11a / pp.744-747 / 2010
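  • Prosodic phrase boundary prediction is an essential technology for a TTS (Text-To-Speech) engine to produce accurate and natural speech synthesis. However, because it demands substantial software and hardware resources, it is heavily constrained by the execution environment. This paper describes the development process and results of a lightweight prosodic phrase boundary prediction system that runs stably even in constrained environments such as small electronic devices. We introduce a lightweight version of the morphological analyzer, an essential component of a prosodic phrase boundary prediction system, and a prosodic phrase boundary prediction technique that, unlike traditional rule-based techniques, requires neither part-of-speech analysis nor syntactic parsing.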

Patterns of categorical perception and response times in the matrix scope interpretation of embedded wh-phrases in Gyeongsang Korean (경상 방언 내포문 의문사의 작용역 범주 지각 양상과 반응 속도 연구)

  • Weonhee Yun
    • Phonetics and Speech Sciences / v.15 no.2 / pp.1-11 / 2023
  • This study investigated the response time and patterns of categorical perception of the wh-scope of an embedded clause with the non-bridge verb, "gung-geum hada 'wonder'," in the matrix verb phrase in Gyeongsang Korean. Using the same procedure as Yun (2022), 72 responses and response times for each stimulus were collected from 24 participants over the course of three trials. The stimuli were recorded readings of 40 speakers (20 male, 20 female). Context was provided to induce a matrix scope interpretation of the embedded wh-phrase in the target sentence. We sorted the 40 stimuli according to the number of matrix scope responses each received, and charted the response times for each stimulus. Although there was considerable overlap for the different types of wh-scope interpretations, there was a clear difference in categorical perception between the matrix and embedded scopes. The 24 participants also differed in their categorical perceptions. The results suggested that response time and wh-scope interpretation were not directly related and that two main weighted factors affected wh-scope interpretation: morpho-syntactic constraints and prosodic structural integrity. The weighting of each of these factors was inversely correlated and varied among subjects.

Corpus-based Korean Text-to-speech Conversion System (콜퍼스에 기반한 한국어 문장/음성변환 시스템)

  • Kim, Sang-hun;Park, Jun;Lee, Young-jik
    • The Journal of the Acoustical Society of Korea / v.20 no.3 / pp.24-33 / 2001
  • This paper describes a baseline implementation of a corpus-based Korean TTS system. Conventional TTS systems that use a small amount of speech data still generate machine-like synthetic speech. To overcome this problem, we introduce a corpus-based TTS system that can generate natural synthetic speech without prosodic modification. The corpus should contain source speech with natural prosody and multiple instances of each synthesis unit. To build phone-level synthesis units, we train a speech recognizer on the target speech and then perform automatic phoneme segmentation. We also detect fine pitch periods using laryngograph signals, which are used for prosodic feature extraction. For break strength allocation, four levels of break index are determined from pause length and attached to phones to reflect prosodic variation at phrase boundaries. To predict break strength from text, we use statistical information on POS (part-of-speech) sequences. The best triphone sequences are selected by a Viterbi search that minimizes the cumulative Euclidean distance of concatenation distortion. To obtain synthetic speech of high enough quality for commercial use, we introduce a domain-specific database. Adding the domain-specific database to the general-domain database greatly improves the quality of synthetic speech in that domain. In subjective evaluation, the new Korean corpus-based TTS system showed better naturalness than the conventional demisyllable-based one.
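
The Viterbi search mentioned in this abstract can be illustrated with a generic sketch. The abstract does not specify the features or cost terms, so the representation here is an assumption: each candidate unit is a hypothetical pair `(left_edge, right_edge)` of feature vectors, and the join cost is the Euclidean distance between one unit's right edge and the next unit's left edge. The name `select_units` is likewise illustrative, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]
Unit = Tuple[Vector, Vector]  # (left_edge, right_edge): assumed representation


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_units(candidates: Sequence[Sequence[Unit]]) -> Tuple[List[int], float]:
    """Pick one candidate per position so the summed join cost is minimal.

    candidates[t] lists the candidate units for target position t.
    Returns (chosen index per position, cumulative join cost).
    """
    if not candidates or any(len(c) == 0 for c in candidates):
        raise ValueError("every position needs at least one candidate")

    # cost[t][j]: cheapest cumulative cost of any path ending in candidate j at t
    cost = [[0.0] * len(candidates[0])]
    back: List[List[int]] = [[-1] * len(candidates[0])]

    for t in range(1, len(candidates)):
        row_cost, row_back = [], []
        for left, _ in candidates[t]:
            # Best predecessor: previous cumulative cost plus join distance.
            best_i, best_c = min(
                ((i, cost[t - 1][i] + euclidean(prev_right, left))
                 for i, (_, prev_right) in enumerate(candidates[t - 1])),
                key=lambda pair: pair[1],
            )
            row_cost.append(best_c)
            row_back.append(best_i)
        cost.append(row_cost)
        back.append(row_back)

    # Backtrack from the cheapest final candidate.
    j = min(range(len(cost[-1])), key=lambda k: cost[-1][k])
    total = cost[-1][j]
    path = [j]
    for t in range(len(candidates) - 1, 0, -1):
        j = back[t][j]
        path.append(j)
    path.reverse()
    return path, total
```

A real unit-selection system would normally add a target cost per candidate as well; this sketch covers only the concatenation term that the abstract names.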

A Neural Network Based Korean Segmental Duration Modeling Using Tonal Information of Phonemes (음소별 성조 정보를 이용한 신경망 기반의 한국어 음소 지속시간 모델링)

  • 김은경;이상호;오영환
    • The Journal of the Acoustical Society of Korea / v.18 no.6 / pp.84-88 / 1999
  • Accurate estimation of segmental duration is crucial for natural-sounding text-to-speech synthesis. To predict Korean segmental durations, conventional methods have used phonemic context, part-of-speech context, and positional information within the prosodic phrase. In this paper, the tonal information of phonemes is employed for more accurate prediction. After defining two non-boundary tones and six boundary tones, we annotated each syllable of 400 sentences with a tonal label. To predict segmental duration using tonal information, we constructed neural networks with a real-valued output node predicting phonemic duration and trained them with the backpropagation algorithm. Experimental results showed that the proposed features are effective for predicting Korean segmental durations: the correlation coefficient between observed and predicted durations was 0.863.
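
The evaluation figure in this abstract is a correlation coefficient between observed and predicted durations. The sketch below computes the Pearson coefficient, assuming that is the measure meant (the abstract does not name the variant); `pearson_r` is an illustrative name.

```python
import math
from typing import Sequence


def pearson_r(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson correlation between observed and predicted durations."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_o * var_p)
```

A value near 1 means the predicted durations rise and fall with the observed ones. It does not by itself show that the absolute errors are small.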
