• Title/Abstract/Keywords: speech factors

352 search results (processing time: 0.023 s)

An end-to-end synthesis method for Korean text-to-speech systems

  • 최연주;정영문;김영관;서영주;김회린
    • Phonetics and Speech Sciences / Vol. 10, No. 1 / pp. 39-48 / 2018
  • A typical statistical parametric speech synthesis (text-to-speech, TTS) system consists of separate modules, such as a text analysis module, an acoustic modeling module, and a speech synthesis module. This causes two problems: 1) expert knowledge of each module is required, and 2) errors generated in each module accumulate as they pass through the subsequent modules. An end-to-end TTS system can avoid these problems by synthesizing voice signals directly from an input string. In this study, we implemented an end-to-end Korean TTS system using Google's Tacotron, an end-to-end TTS system based on a sequence-to-sequence model with an attention mechanism. We used 4,392 utterances spoken by a Korean female speaker, an amount corresponding to 37% of the dataset Google used to train Tacotron. Our system obtained a mean opinion score (MOS) of 2.98 and a degradation mean opinion score (DMOS) of 3.25. We discuss the factors that affected training of the system. Experiments demonstrate that the post-processing network needs to be designed with the output language and input characters in mind, and that the maximum n of the n-grams modeled by the encoder should be kept small enough for the amount of training data.
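
The n-gram remark refers to Tacotron's encoder, which applies a bank of 1-D convolutions with kernel widths 1 through K, so a width-k filter sees k consecutive characters and K bounds the longest n-gram modeled. The sketch below is a simplified illustration under stated assumptions, not the authors' code: it uses one scalar feature per character and fixed averaging kernels (hypothetical stand-ins for learned filters), and the names `conv1d_same`, `conv_bank`, and `max_n` are invented for the example.

```python
from typing import List

def conv1d_same(x: List[float], kernel: List[float]) -> List[float]:
    """Cross-correlate x with kernel, zero-padding so the output length equals len(x)."""
    k = len(kernel)
    left = (k - 1) // 2       # zeros added before the sequence
    right = k - 1 - left      # zeros added after the sequence
    padded = [0.0] * left + x + [0.0] * right
    return [sum(padded[i + j] * kernel[j] for j in range(k)) for i in range(len(x))]

def conv_bank(x: List[float], max_n: int) -> List[List[float]]:
    """Row n-1 holds width-n filter outputs: one value per position, each covering an n-character window.

    Hypothetical averaging kernels stand in for the learned filters of a real encoder.
    """
    return [conv1d_same(x, [1.0 / n] * n) for n in range(1, max_n + 1)]

# Toy one-dimensional features for a four-character input (illustrative values).
features = [1.0, 2.0, 3.0, 4.0]
bank = conv_bank(features, max_n=3)
for n, row in enumerate(bank, start=1):
    print(f"{n}-gram filter:", [round(v, 3) for v in row])
```

In a trained encoder every extra width adds filters whose weights must be learned, which is one way to read the abstract's finding that, with limited data, the maximum n should stay small.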

Compensation Ability in Speech Motor Control in Children with and without Articulation Disorders

  • 송윤경;심현섭
    • Speech Sciences / Vol. 15, No. 3 / pp. 183-201 / 2008
  • This study attempted to reveal the physiologic etiology of, or related factors associated with, speech processing by comparing the compensation ability in speech motor control of children with and without articulation disorders. Subjects were 35 children with articulation disorders and 35 children without articulation disorders, aged 5 to 6 years. They were asked to rapidly repeat /pʰa/, /tʰa/, /kʰa/, and /pʰatʰakʰa/ (diadochokinetic movement) with the mandible free and with the mandible stabilized by a bite block. Children with articulation disorders showed a significantly greater difference in the elapsed time of diadochokinetic movement between the mandible-free and mandible-stabilized conditions than children without articulation disorders. However, in the articulation disorder group, the percentage of consonants correct showed no correlation with compensation ability in speech motor control. These results indicate that children with articulation disorders have poorer compensation ability in speech motor control than children without articulation disorders, but that this poorer ability is not related to the severity of the articulation disorder. The results may reflect either general or individual characteristics of children with articulation disorders.
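
The comparison above can be expressed as a per-child difference score plus a correlation. This is a minimal sketch that assumes (the abstract gives no formula) compensation is measured as elapsed time with the bite block minus elapsed time with the mandible free, and that the correlation with percentage of consonants correct (PCC) is a Pearson coefficient; `compensation_cost`, `pearson_r`, and all numbers are hypothetical, and a real analysis would also need a significance test.

```python
from math import sqrt
from typing import Sequence

def compensation_cost(free_s: float, bite_block_s: float) -> float:
    """Extra elapsed time (s) with the mandible stabilized; larger means poorer compensation."""
    return bite_block_s - free_s

def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(vx * vy)

# Hypothetical per-child elapsed times (s) for a fixed number of repetitions.
free = [4.0, 4.5, 5.0]
bite = [4.8, 5.1, 6.2]
costs = [compensation_cost(f, b) for f, b in zip(free, bite)]
pcc = [80.0, 85.0, 70.0]  # hypothetical percentage of consonants correct
print(costs, round(pearson_r(pcc, costs), 3))
```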

Evaluation of Synthetic Voice which is Agreeable to the Ear Using Sensibility Ergonomics Method

  • 박용국;김재국;전용웅;조암
    • Journal of the Ergonomics Society of Korea / Vol. 21, No. 1 / pp. 51-65 / 2002
  • As information services become increasingly multimedia, synthetic voice is used not only in CTI (Computer Telephony Integration) and information services for the blind but also in Internet applications. However, properties of synthetic voice such as speech rate, pitch, and timbre are adjusted to providers' preferences rather than customers'. To take customers' preferences into account, this study proposed four subjective factors of voice through voice evaluation using the sensibility ergonomics method. The relationship between the agreeableness of synthetic voice and emotional images was then formulated as a fuzzy model. Based on this, the study proposed the speech rate and pitch of synthetic voice that is agreeable to the ear.
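
The abstract does not say what kind of fuzzy model was used. Purely as an illustration of the idea, the sketch below fuzzifies a speech rate with triangular membership functions, an assumed and common choice; the set names, the `RATE_SETS` breakpoints (in syllables per second), and the helper names are hypothetical and not taken from the study.

```python
from typing import Dict

def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership: 0 at or below a, rising to 1 at b, back to 0 at or above c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical fuzzy sets over speech rate (syllables/second); breakpoints are illustrative.
RATE_SETS = {
    "slow": (2.0, 3.0, 5.0),
    "moderate": (3.0, 5.0, 7.0),
    "fast": (5.0, 7.0, 8.0),
}

def fuzzify(rate: float) -> Dict[str, float]:
    """Degree to which a speech rate belongs to each hypothetical set."""
    return {name: triangular(rate, *abc) for name, abc in RATE_SETS.items()}

print(fuzzify(4.0))
```

A full fuzzy model would add rules linking such memberships (and pitch memberships) to ratings of agreeableness; that rule base is not described in the abstract.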

A sociophonetic study on high/mid back vowels in Korean

  • 이향원;신우봉;신지영
    • Phonetics and Speech Sciences / Vol. 9, No. 2 / pp. 39-51 / 2017
  • The current study investigates the effect of sociolinguistic factors such as region, generation, and gender on the acoustic properties of Korean high and mid back vowels. We analyzed the vowel productions of 128 subjects from the Korean Standard Speech Database, chosen to represent the different possible combinations of region, generation, and gender. The results reveal a chain-like shift in the back vowels. Unlike previous studies that have reported /o/ and /u/ becoming closer as a result of a decreasing F1 in /o/, we found that the distance between the two vowels is determined more by the changing F2 of /u/. In addition, the F2 of /u/ and /ɯ/, and the F2 of /ʌ/ and the F1 of /o/, appear to move in tandem. Lastly, this study suggests that the vowel changes may differ across genders and regional dialects because they are all converging on Standard Korean.
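
The /o/-/u/ distance discussed above is usually measured in the F1-F2 plane. This is a minimal sketch assuming Euclidean distance on mean formant values in Hz (the abstract states neither the metric nor any normalization); splitting the squared distance into its F1 and F2 terms shows which formant drives the distance. The `Vowel` type and the formant values are placeholders, not the study's measurements.

```python
from math import hypot
from typing import NamedTuple, Tuple

class Vowel(NamedTuple):
    f1: float  # first formant (Hz); lower F1 corresponds to a higher vowel
    f2: float  # second formant (Hz); lower F2 corresponds to a more back vowel

def formant_distance(v1: Vowel, v2: Vowel) -> float:
    """Euclidean distance between two vowels in the F1-F2 plane (Hz)."""
    return hypot(v1.f1 - v2.f1, v1.f2 - v2.f2)

def contributions(v1: Vowel, v2: Vowel) -> Tuple[float, float]:
    """Squared F1 and F2 differences; their sum is the squared distance."""
    return ((v1.f1 - v2.f1) ** 2, (v1.f2 - v2.f2) ** 2)

# Placeholder mean formants for one speaker group (illustrative only).
o = Vowel(f1=450.0, f2=800.0)
u = Vowel(f1=350.0, f2=1100.0)
print(round(formant_distance(o, u), 1), contributions(o, u))
```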

Implementation of Wideband Waveform Interpolation Coder for TTS DB Compression

  • 양희식;한민수
    • Malsori (Journal of the Korean Society of Phonetic Sciences and Speech Technology) / Vol. 55 / pp. 143-158 / 2005
  • An adequate compression algorithm is essential for a high-quality embedded TTS system. In this paper, after investigating many speech coders, we propose a waveform interpolation coder for TTS corpus compression. Unlike for speech coders in communication systems, compression rate and quality are more important than other performance criteria in TTS DB compression. We therefore chose the waveform interpolation algorithm, because it provides good speech quality at high compression rates at the cost of higher complexity. The implemented coder operates at a bit rate of 6 kbps with a quality degradation of 0.47. This performance indicates that, with some further study, waveform interpolation is suitable for TTS DB compression.
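
To put the 6 kbps figure in perspective, the arithmetic below compares it with uncompressed PCM. It assumes wideband speech stored as 16 kHz, 16-bit mono PCM, a common wideband configuration that the abstract does not state; the helper names are hypothetical.

```python
def pcm_bitrate_bps(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Bit rate of uncompressed linear PCM in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

def storage_bytes(bitrate_bps: float, seconds: float) -> float:
    """Bytes needed to store `seconds` of audio at the given bit rate."""
    return bitrate_bps * seconds / 8

# Assumption: wideband speech stored as 16 kHz, 16-bit mono PCM.
pcm_bps = pcm_bitrate_bps(16_000, 16)   # 256,000 bit/s
coded_bps = 6_000                        # 6 kbps coder rate from the abstract
ratio = pcm_bps / coded_bps
one_hour_pcm_mb = storage_bytes(pcm_bps, 3600) / 1e6
one_hour_coded_mb = storage_bytes(coded_bps, 3600) / 1e6
print(round(ratio, 1), one_hour_pcm_mb, one_hour_coded_mb)
```

Under this assumption the coder shrinks the corpus by a factor of roughly 43 (about 115.2 MB versus 2.7 MB per hour of speech).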

A Study on the Vowel Duration of the Buckeye Corpus

  • 정혜정;윤규철
    • Phonetics and Speech Sciences / Vol. 7, No. 4 / pp. 103-110 / 2015
  • The purpose of this study is to assess vowel properties by examining the durations of American English vowels in the Buckeye corpus[6]. Vowel durations were analyzed in terms of various linguistic factors, including the number of syllables in the word containing the vowel, the location of the vowel within the word, stress type, function versus content word, word frequency in the corpus, and speech rate calculated over three consecutive words. The findings mostly agreed with those of earlier studies, with some exceptions. The relationship between speech rate and vowel duration proved to be non-linear.
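
The abstract computes speech rate from three consecutive words without giving the formula. One plausible reading, sketched here as an assumption, is a sliding window of the target word plus its two neighbours, with rate in syllables per second; the `Word` fields, the shrinking window at utterance edges, and counting pauses inside the window toward its duration are all hypothetical choices.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Word:
    text: str
    syllables: int
    start: float  # seconds
    end: float    # seconds

def local_rates(words: List[Word]) -> List[float]:
    """Syllables per second over each word and its immediate neighbours.

    The window spans from the first word's start to the last word's end, so pauses
    inside it count toward the duration; at the edges it shrinks to two words.
    """
    rates = []
    for i in range(len(words)):
        window = words[max(0, i - 1): i + 2]
        duration = window[-1].end - window[0].start
        rates.append(sum(w.syllables for w in window) / duration)
    return rates

# Hypothetical three-word utterance.
utt = [Word("a", 1, 0.0, 0.5), Word("b", 2, 0.5, 1.0), Word("c", 1, 1.0, 1.5)]
print([round(r, 2) for r in local_rates(utt)])
```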

Vowel Fundamental Frequency in Manner Differentiation of Korean Stops and Affricates

  • Jang, Tae-Yeoub
    • Speech Sciences / Vol. 7, No. 1 / pp. 217-232 / 2000
  • In this study, I investigate the role of post-consonantal fundamental frequency (F0) as a cue for the automatic distinction of types of Korean stops and affricates. Rather than examining data obtained by restricting contexts to a minimum to prevent interference from irrelevant factors, a relatively natural, speaker-independent speech corpus is analysed. Automatic and statistical approaches are adopted to annotate the data, to minimise speaker variability, and to evaluate the results. Despite a possible loss of information during these automatic analyses, the statistics obtained suggest that vowel F0 is a useful cue for distinguishing manners of articulation of Korean non-continuant obstruents with the same place of articulation, especially lax and aspirated stops and affricates. On the basis of these statistics, automatic classification of the relevant consonants is attempted in a specific context where micro-prosodic effects appear to be maximised. The results confirm the usefulness of this effect for Korean phone recognition.
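
The abstract mentions minimising speaker variability but not the method. Per-speaker z-score normalization of F0 is one common approach and is used here as an assumption; the function name, speaker IDs, and F0 values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List, Tuple

def zscore_by_speaker(samples: List[Tuple[str, float]]) -> List[float]:
    """Normalize each (speaker, f0_hz) sample by that speaker's mean and standard deviation."""
    by_speaker: Dict[str, List[float]] = defaultdict(list)
    for spk, f0 in samples:
        by_speaker[spk].append(f0)
    stats = {spk: (mean(v), pstdev(v)) for spk, v in by_speaker.items()}
    out = []
    for spk, f0 in samples:
        mu, sd = stats[spk]
        # A speaker with a single distinct value has no spread; map it to 0.
        out.append((f0 - mu) / sd if sd > 0 else 0.0)
    return out

# Hypothetical vowel-onset F0 values for a lower- and a higher-pitched speaker.
data = [("A", 100.0), ("A", 200.0), ("B", 200.0), ("B", 300.0)]
print(zscore_by_speaker(data))
```

Normalized values let F0 after different consonant types be pooled across speakers whose absolute pitch ranges differ.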

The Boundary Tones in Korean Intonational Phrases

  • 한선희;오미라
    • Speech Sciences / Vol. 5, No. 2 / pp. 109-129 / 1999
  • A study of boundary tones, which are realized on the final syllable of an Intonational Phrase, is important because sentential meaning in Korean is often differentiated solely by the choice of boundary tone. The purposes of this paper are threefold. First, it aims to identify how the characteristics of boundary tones differ between a designed corpus and natural speech. Second, it shows that gender and dialectal differences are crucial factors in determining the realization of boundary tones. Finally, it aims to provide a basis for better speech synthesis and speech recognition through an analysis of the morphemes on which boundary tones are realized. This study has shown that nine different kinds of boundary tones are realized depending on contextual, gender, and dialectal differences. In addition to the boundary tones suggested in Jun (1993), three more boundary tones are introduced: L-%, H-%, and LHLH%.

A Study on the Voice Onset Times of the Buckeye Corpus Stops

  • 박수희;윤규철
    • Phonetics and Speech Sciences / Vol. 8, No. 1 / pp. 9-17 / 2016
  • The purpose of this work is to examine the voice onset times (VOTs) of the voiceless and voiced stops produced by the ten young male speakers of the Buckeye corpus[9]. Factors known to affect VOT were also extracted, including place of articulation, height of the following vowel, location within the word, presence of a preceding [s], content versus function status of the target word, presence of syllabic stress, word frequency, and speech rate. The findings mostly agreed with those of earlier studies on English, with some exceptions and new discoveries. We hope that this work can contribute to understanding the nature and properties of spontaneous English speech.
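
VOT is the interval between the release burst of a stop and the onset of vocal-fold vibration; it is negative when voicing starts before the release (prevoicing). The sketch below assumes burst and voicing-onset times are already annotated in seconds, computes VOT, and averages it by one factor. The token keys (`burst`, `voicing`, `after_s`) and the helper names are hypothetical, not the Buckeye corpus format.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Hashable, List

def vot_ms(burst_s: float, voicing_onset_s: float) -> float:
    """VOT in ms: voicing onset minus stop release; negative means prevoicing."""
    return (voicing_onset_s - burst_s) * 1000.0

def mean_vot_by(tokens: List[dict], factor: str) -> Dict[Hashable, float]:
    """Average VOT for each level of one factor, e.g. a hypothetical 'after_s' flag."""
    groups: Dict[Hashable, List[float]] = defaultdict(list)
    for tok in tokens:
        groups[tok[factor]].append(vot_ms(tok["burst"], tok["voicing"]))
    return {level: mean(vots) for level, vots in groups.items()}

# Hypothetical annotated tokens (times in seconds).
toks = [
    {"burst": 0.10, "voicing": 0.16, "after_s": False},
    {"burst": 0.50, "voicing": 0.52, "after_s": True},
]
print(mean_vot_by(toks, "after_s"))
```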

The Study on the Exponential Smoothing Method of the Concatenation Parts in the Speech Waveform

  • 박찬수
    • Acoustical Society of Korea: Conference Proceedings / Proceedings of the 1991 Conference of the Acoustical Society of Korea / pp. 7-10 / 1991
  • In a text-to-speech system, sound units (phonemes, words, phrases, etc.) can be concatenated to produce the required utterance. The quality of the resulting speech depends on factors including the phonological/prosodic contour, the quality of the basic concatenation units, and how well the units join together. Thus, even when each basic sound unit is of high quality, discontinuities at the concatenation points degrade the synthesized speech. To solve this problem, a smoothing operation should be carried out at the concatenation points. However, a major problem is that no parameter-smoothing method is yet available for joining the segments together. In this paper, we therefore propose a new algorithm that smooths the unnatural discontinuities that can occur in speech waveform editing. The algorithm uses the exponential smoothing method.
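
The abstract names exponential smoothing but gives no formula. Seeded with the last sample of the first segment, the exponential smoother s[n] = αx[n] + (1 − α)s[n − 1] approaches a constant input along x + (s[−1] − x)(1 − α)^(n+1). The sketch below borrows that decaying term and adds it to the second segment, so the jump at the join decays geometrically without low-pass filtering the rest of the waveform. This is a hedged adaptation of exponential smoothing, not the author's published algorithm; `smooth_join` and `alpha` are hypothetical.

```python
from typing import List

def smooth_join(a: List[float], b: List[float], alpha: float = 0.1) -> List[float]:
    """Concatenate a and b, spreading the jump at the join over b with exponential decay.

    The correction (a[-1] - b[0]) * (1 - alpha) ** (n + 1) is the transient of an
    exponential smoother seeded with a[-1]; alpha = 1 gives plain concatenation.
    """
    if not a or not b:
        return a + b
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    jump = a[-1] - b[0]
    decay = 1.0 - alpha
    smoothed_b = [x + jump * decay ** (n + 1) for n, x in enumerate(b)]
    return a + smoothed_b

# A step from 0 to 1 at the join is eased in over the second segment.
print(smooth_join([0.0, 0.0], [1.0, 1.0, 1.0], alpha=0.5))
```

For a constant second segment the output matches the recursive smoother exactly; for real speech, only the offset is smoothed, which keeps the second unit's waveform shape intact.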
