Search | Korea Science

PROSODY IN SPEECH TECHNOLOGY - National project and some of our related works -

Hirose Keikichi
- Proceedings of the Acoustical Society of Korea Conference / spring / pp.15-18 / 2002
Prosodic features of speech are known to play an important role in the transmission of linguistic information in human conversation. Their role in the transmission of para- and non-linguistic information is even greater. Despite this importance, engineering research has focused mainly on segmental features rather than on prosodic features. To promote research on prosody, a research project, 'Prosody and Speech Processing', is now under way. The paper first gives a rough sketch of the project. It then introduces several prosody-related research efforts under way in our laboratory: corpus-based fundamental frequency contour generation, speech rate control for dialogue-like speech synthesis, analysis of prosodic features of emotional speech, reply speech generation in spoken dialogue systems, and language modeling with prosodic boundaries.

'Because of Doing' and 'Because of Happening': A Corpus-based Analysis of Korean Causal Conjunctives, -nula(ko) and -nun palamey

Oh, Sang-Suk
- Language and Information / v.8 no.2 / pp.131-147 / 2004
the two Korean causal conjunctive suffixes, -nula(ko) and -nun palamey, based on corpus linguistic analysis. Many of the linguistic accounts available, both in pedagogical reference and in the literature on linguistics, provide incomplete analyses of these suffixes, based on fabricated linguistic data. Using naturally occurring, real linguistic data, this paper examines the syntactic and semantic structures of the two causal suffixes through a consideration of three areas of corpus linguistic analysis: token frequencies, collocations, and semantic prosody. An analysis based on concordance data reveals that the two causal connectives, -nula(ko) and -nun palamey, have more differences than similarities in terms of syntactic and semantic constraints. The idiosyncratic structures of the two suffixes are discussed in terms of same subject condition, verb selection, same agent condition, synchronicity condition, and negative semantic prosody.

Comparison of prosodic characteristics by question type in left- and right-hemisphere-injured stroke patients (좌반구 손상과 우반구 손상 뇌졸중 환자의 의문문 유형에 따른 운율 특성 비교)

Yu, Youngmi;Seong, Cheoljae
- Phonetics and Speech Sciences / v.13 no.3 / pp.1-13 / 2021
This study examined the characteristics of linguistic prosody in terms of cerebral lateralization in three groups: 9 healthy speakers and 14 speakers with a history of stroke (7 with left hemisphere damage (LHD), 7 with right hemisphere damage (RHD)). Specifically, prosodic characteristics related to speech rate, duration, pitch, and intensity were examined in three types of interrogative sentences (wh-questions, yes-no questions, alternative questions) with auditory perceptual evaluation. The statistically significant key variables indicated deficits in the production of linguistic prosody in the speakers with LHD. These variables were produced less adequately for wh-questions than for yes-no and alternative questions. This trend was particularly noticeable in variables related to pitch and speech rate. This result suggests that when Korean speakers process linguistic prosody, such as that of lexico-semantic and syntactic information in interrogative sentences, the left hemisphere seems to be superior to the right hemisphere.
https://doi.org/10.13064/KSSS.2021.13.3.001

Chinese Prosody Generation Based on C-ToBI Representation for Text-to-Speech (음성합성을 위한 C-ToBI기반의 중국어 운율 경계와 F0 contour 생성)

Kim, Seung-Won;Zheng, Yu;Lee, Gary-Geunbae;Kim, Byeong-Chang
- MALSORI / no.53 / pp.75-92 / 2005
Prosody modeling is critical in developing text-to-speech (TTS) systems, where speech synthesis is used to automatically generate natural speech. In this paper, we present a prosody generation architecture based on Chinese Tone and Break Index (C-ToBI) representation. ToBI is a multi-tier representation system based on linguistic knowledge to transcribe events in an utterance. A TTS system that adopts ToBI as an intermediate representation is known to exhibit higher flexibility, modularity and domain/task portability than TTS systems that generate prosody directly. However, corpus preparation is very expensive for practical-level performance, because a ToBI-labeled corpus has to be constructed manually by many prosody experts and a large amount of data is normally required for accurate statistical prosody modeling. This paper proposes a new method which transcribes the C-ToBI labels automatically in Chinese speech. We model Chinese prosody generation as a classification problem and apply conditional Maximum Entropy (ME) classification to this problem. We empirically verify the usefulness of various natural language and phonology features in order to build well-integrated features for the ME framework.

Improvements on Phrase Breaks Prediction Using CRF (Conditional Random Fields) (CRF를 이용한 운율경계추성 성능개선)

Kim Seung-Won;Lee Geun-Bae;Kim Byeong-Chang
- MALSORI / no.57 / pp.139-152 / 2006
In this paper, we present a phrase break prediction method using CRF (Conditional Random Fields), which performs well on classification problems. The phrase break prediction problem was mapped into a classification problem in our research. We trained the CRF using various linguistic features extracted from the POS (Part Of Speech) tag, lexicon, word length, and word position in the sentence. Combined linguistic features were used in the experiments, and we identified some linguistic features that yield good performance in phrase break prediction. The experimental results show that the proposed method improves on previous methods. Additionally, because the linguistic features are independent of each other in our research, the proposed method has higher flexibility than other methods.

The Implicational Meaning and Prosody of Conjunctive Marker '-ko' in Korean (한국어 대등적 연결어미 '-고'의 함축 의미와 운율)

Kim, Mi-Ran
- Speech Sciences / v.8 no.4 / pp.289-305 / 2001
The conjunctive marker '-ko' in Korean can be interpreted as meaning either conjunctive 'and' or ordering 'and then'. The interpretation of '-ko' is ambiguous in written texts but not in spoken texts, because the meaning of an utterance is determined by the combination of the text with its prosody. The two meanings of '-ko' can be explained by the theory of implicature, which was introduced by Grice (1973, 1981). This paper examines the meaning of the marker '-ko' with respect to the relation between its meaning and prosody. The results of the experiments in this paper showed that prosodic phrasing in Korean influences the interpretation of the marker '-ko'. When two constituents combined by '-ko' are realized in the same accentual phrase, the marker can be interpreted as meaning 'exactly be orderly'. This meaning can be classified as a Particularized Conversational Implicature (PCI) in Gricean theory. In the other cases of phrasing, the marker '-ko' can mean either 'conjunctive' or 'be orderly' by Generalized Conversational Implicature (GCI). The fact that phrasing determines the interpretation of the marker '-ko' can be seen as supporting the view that prosody interacts with various levels of linguistic phenomena from phonology to pragmatics.

Utilizing Prosodic Information on the Sentence Comprehension in Children with High Functioning Autism

Chung, Chan-Hee;Lee, Hee-Ran;Kim, Jin-Dong
- Biomedical Science Letters / v.23 no.4 / pp.362-371 / 2017
The purpose of this study is to investigate difficulties in using prosodic information to identify the meaning of ambiguous sentences in children with high functioning autism (HFA). Fifteen children with HFA and fifteen children matched for chronological age (CA) participated in this study. We compared the performance of the two groups on syntactically and affectively ambiguous sentence comprehension (SASC and AASC) tasks. The results show that in both tasks the difference between the two groups was statistically significant in each condition, and the performance of the children with HFA was significantly lower. In a correlation analysis of major variables, the CA-matched children showed a correlation between prosody-only (PO) and AASC, while the children with HFA showed a correlation between PO and MO (morpheme-only). Children with HFA used grammatical morpheme information to understand general sentences. We found that the ability to use prosodic information in children with HFA is significantly lower than that of normally developed children. Considering the relevance of prosody to linguistic, non-linguistic and emotional aspects of communication, improving prosodic perception is thought to be a way to mediate deficits in the comprehension of ambiguous sentences in children with HFA.
https://doi.org/10.15616/BSL.2017.23.4.362

The Study on Korean Prosody Generation using Artificial Neural Networks (인공 신경망의 한국어 운율 발생에 관한 연구)

Min Kyung-Joong;Lim Un-Cheon
- Proceedings of the Acoustical Society of Korea Conference / spring / pp.337-340 / 2004
Accurately reproduced prosody is one of the key factors affecting the naturalness of speech synthesized by a TTS system. In general, rules about prosody have been gathered either from linguistic knowledge or by analyzing the prosodic information in natural speech. But these rules cannot be perfect, and some of them may be incorrect. We therefore propose artificial neural networks (ANNs) that can be trained to learn the prosody of natural speech and generate it. In the learning phase, the ANNs learn the pitch and energy contours of the center phoneme: a string of phonemes from a sentence is applied to the ANNs, the output pattern is compared with the target pattern, and the weights are adjusted to minimize the mean square error between them. In the test phase, the estimation rates were computed. We found that ANNs could generate the prosody of a sentence.

Prosodic Contour Generation for Korean Text-To-Speech System Using Artificial Neural Networks

Lim, Un-Cheon
- The Journal of the Acoustical Society of Korea / v.28 no.2E / pp.43-50 / 2009
To obtain more natural synthetic speech from a Korean TTS (Text-To-Speech) system, we have to know all the possible prosodic rules of spoken Korean. These rules have to be found from linguistic and phonetic information or from real speech. In general, all of these rules should be integrated into the prosody-generation algorithm of a TTS system. But such an algorithm cannot cover all the possible prosodic rules of a language and is not perfect, so the naturalness of synthesized speech cannot be as good as we expect. ANNs (Artificial Neural Networks) can be trained to learn the prosodic rules of spoken Korean. To train and test ANNs, we need to prepare the prosodic patterns of all the phonemic segments in a prosodic corpus. A prosodic corpus should include meaningful sentences that represent all the possible prosodic rules. Sentences in the corpus were made by picking a series of words from the list of PB (Phonetically Balanced) isolated words. These sentences were read by speakers, recorded, and collected as a speech database. By analyzing the recorded speech, we can extract the prosodic pattern of each phoneme and assign these patterns as target and test patterns for the ANNs. ANNs can learn prosody from natural speech and, when the phoneme strings of a sentence are given as input, generate as output the prosodic patterns of the central phonemic segment of each phoneme string.

Modelling Duration In Text-to-Speech Systems

Chung Hyunsong
- MALSORI / no.49 / pp.159-174 / 2004
The development of the durational component of prosody modelling in text-to-speech conversion of spoken English and Korean was reviewed and discussed, showing the strengths and weaknesses of each approach. The possibility of integrating linguistic feature effects into the duration modelling of TTS systems was also investigated. This paper claims that current approaches to language timing synthesis still require an understanding of how segmental duration is affected by context. Three modelling approaches were discussed: sequential rule systems, Classification and Regression Tree (CART) models and Sums-of-Products (SoP) models. The CART and SoP models perform well in predicting segment duration in English, which is not the case for the SoP modelling of spoken Korean.
