Integrated Search | Korea Science

A Study on Voice Color Control Rules for Speech Synthesis System

김진영;엄기완
- Speech Sciences / Vol. 2 / pp. 25-44 / 1997
When listening to the various speech synthesis systems developed and in use in Korea, we find that although their quality has improved, they lack naturalness. Moreover, because the voice color of each system is limited to a single recorded speech database (DB), a new speech DB must be recorded to create a different voice color. 'Voice color' is an abstract concept that characterizes voice personality, so speech synthesis systems need a voice color control function to create various voices. The aim of this study is to examine the factors of voice color control rules for a text-to-speech system that produces natural and varied voice types in synthetic speech. To derive such rules from natural speech, glottal source parameters and the frequency characteristics of the vocal tract were studied for several voice colors. In this paper, voice colors were catalogued as deep, sonorous, thick, soft, harsh, high-toned, shrill, and weak. The LF-model was used for the voice source, and formant frequencies, bandwidths, and amplitudes were used for the frequency characteristics of the vocal tract. These acoustic parameters were tested through multiple regression analysis to establish the general relationship between the parameters and voice colors.
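The abstract relates acoustic parameters to voice colors through multiple regression. A minimal sketch of that kind of fit, assuming each voice sample has already been reduced to a numeric parameter vector (for example LF-model or formant values) and paired with a perceptual rating for one voice color; the function names, feature layout, and rating scale are hypothetical, not the authors' data or code:

```python
from typing import List, Sequence


def fit_multiple_regression(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares with an intercept, solved via the normal equations.

    X: one row per voice sample, one column per acoustic parameter (hypothetical features).
    y: rating of one voice color (e.g. "deep") for each sample (hypothetical scale).
    Returns [b0, b1, ..., bp] so that y ~ b0 + b1*x1 + ... + bp*xp.
    """
    rows = [[1.0, *map(float, r)] for r in X]  # prepend the intercept column
    p = len(rows[0])
    # Normal equations A b = c with A = X^T X and c = X^T y.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    c = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix [A | c].
    M = [A[i] + [c[i]] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        M[col], M[pivot] = M[pivot], M[col]
        piv = M[col][col]
        M[col] = [v / piv for v in M[col]]
        for r in range(p):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][p] for i in range(p)]


def predict(coefs: Sequence[float], x: Sequence[float]) -> float:
    """Evaluate the fitted regression for one acoustic parameter vector x."""
    return coefs[0] + sum(b * xi for b, xi in zip(coefs[1:], x))
```

Solving the normal equations directly keeps the example dependency-free; with many correlated acoustic parameters, a QR- or SVD-based solver would be numerically safer.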

Diction Problem of Student Singers Based on the Vocal Tract Resonance

김선숙
- Speech Sciences / Vol. 7, No. 4 / pp. 59-72 / 2000
Vocal tract resonances are of paramount importance to voice sounds. Resonance frequencies determine vowel quality and the personal voice timbre. The aim of this study was to develop an effective diction program for professional voice users based on tuning formant frequencies by adjusting the vocal tract shape. Twelve male and eleven female student singers participated in this study. The subjects repeated five simple vowels /a, e, i, o, u/ in normal speech and in singing. The formant frequencies and the singer's formant of the spoken and sung vowels were measured using CSL and the DSP Sona-Graph, and the Plot Formants program was used separately to draw the vowel chart. The results were as follows. (1) In singing, the overall formant frequencies of the female singers were 11% higher than those of the male singers. (2) F1 and F3 of the sung vowels increased compared with those of the spoken vowels, whereas F2 of the sung vowels decreased compared with that of the spoken vowels. (3) The posterior vowel /u/ moved anteriorly, which seemed to be due to head voice singing training. (4) The singer's formant frequency of the student singers varied by voice part: 2560 Hz for baritone, 2760 Hz for tenor, 2821 Hz for mezzo-soprano, and 3420 Hz for soprano.

Resonant Voice in Singers

진성민
- Korean Society of Logopedics and Phoniatrics: Conference Proceedings / The 19th Conference of the Korean Society of Logopedics and Phoniatrics, 2003 / pp. 156-158 / 2003
The human phonatory system can be broadly divided into the breathing apparatus, which squeezes out air; the vocal folds, which produce the source sound; and the vocal tract, which resonates and filters the source sound produced by the vocal folds to give the sound its characteristic shape.

The Role of the Electroglottography on the Laryngeal Articulation of Speech

홍기환;박병암;양윤수;서수영;김현기
- Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / Vol. 8, No. 1 / pp. 18-26 / 1997
There are two types of phonetic study, acoustic and physiologic, for differentiating the three manner categories of Korean stop consonants. Physiologic studies include endoscopic, electromyographic (EMG), electroglottographic (EGG), and aerodynamic studies. In this study, I investigated general features of Korean stops using EGG, examining the open quotient of the vocal folds and the baseline shift during speech, and using aerodynamic measures, examining subglottal air pressure, airflow, and glottal resistance at the consonants. In the aerodynamic study, the glottalized and aspirated stops may be characterized by increased subglottal pressure compared with the lenis stop at the consonant. Airflow was largest for the aspirated stops, followed by the lenis and then the glottalized stops. Glottal airway resistance (GAR) during consonant production was highest for the glottalized stops, followed by the lenis, and lowest for the aspirated; during vowel production it was highest for the aspirated and low for the glottalized and lenis stops. Glottal resistance at the consonant showed significant differences among consonants and a significant interaction between subject and consonant type. Glottal resistance at the vowel also showed significant differences among consonants, and an interaction occurred between subject and consonant type. Electroglottography (EGG) has been used to investigate the functioning of the vocal folds during vibration. The EGG signal should be related to the patterns of vocal fold vibration during phonation, characterizing the temporal pattern of each vibratory cycle. The purpose of this study is to investigate the dynamic changes of EGG waveforms during continuous speech.
The dynamic changes of EGG waveforms for the three-way distinction of Korean stops were as follows: in the initial portion of vocal fold vibration, the aspirated stop appears to be characterized by the largest open quotient and the smallest glottal contact area of the vocal folds; the lenis stop by a moderate open quotient and glottal contact area; and the glottalized stop by the smallest open quotient and the largest glottal contact area. There may be a close relationship between the OQ (open quotient) at the initial voice onset and the glottal width at the time of consonant production: a larger glottal width just before vocal fold vibration results in a smaller OQ at the initial voice onset. The EGG baseline shift during continuous speech showed different patterns for the three types of Korean consonants. Small, less steep baseline shifts were found for the lenis and glottalized stops, and the largest and steepest shift for the aspirated stop. The baseline shift at the initial voice onset showed patterns similar to those during consonant production, with a larger change for the aspirated stop; for the lenis and glottalized stops at the initial voice onset, the three subjects showed individual differences. I suggest that these characteristics are strongly related to the articulatory activity of the vocal tract during consonant production, especially for the aspirated stop. The factors suspected to affect EGG waveforms during connected speech are glottal width, vertical laryngeal movement, and intrapharyngeal pressure on neighboring tissue.
The EGG may therefore be a useful method for describing laryngeal activity and classifying the pulsing conditions of the larynx during speech production, and EGG research can serve as a control for monitoring vocal tract articulation, although the factors above may play a role in the vocal fold vibratory behavior observed during consonant production.
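The abstract compares the three stop types by the open quotient (OQ) of the EGG signal. A minimal sketch of one common way to estimate OQ, the criterion-level method, assuming the input holds exactly one glottal cycle and that larger EGG values mean more vocal fold contact; the 35% level is one convention found in the EGG literature and an assumption here, since the abstract does not state the study's exact OQ definition:

```python
from typing import Sequence


def egg_open_quotient(cycle: Sequence[float], level: float = 0.35) -> float:
    """Estimate the open quotient of one EGG cycle with a criterion-level threshold.

    Assumptions (not from the study): `cycle` spans exactly one glottal cycle and
    higher values mean more contact. A sample counts as "closed" when it exceeds
    min + level * (max - min); the contact quotient CQ is the closed fraction, and
    OQ = 1 - CQ.
    """
    if not cycle:
        raise ValueError("empty cycle")
    lo, hi = min(cycle), max(cycle)
    if hi == lo:
        raise ValueError("flat signal: no contact variation")
    threshold = lo + level * (hi - lo)
    closed = sum(1 for v in cycle if v > threshold)
    contact_quotient = closed / len(cycle)
    return 1.0 - contact_quotient
```

Under this definition, a cycle with a short contact phase (as reported for the aspirated stop at voice onset) yields a larger OQ than one with a long contact phase (as reported for the glottalized stop).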

A Study on Extraction of Vocal Tract Characteristic After Canceling the Vocal Cord Property Using the Line Spectrum Pairs

민소연;장경아;배명진
- The Journal of the Acoustical Society of Korea / Vol. 21, No. 7 / pp. 665-670 / 2002
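The general form of the pre-emphasis filter is y(n) = s(n) - A·s(n-1), where A lies between 0.9 and 1.0 for voiced speech. The value of A reflects the slope of the pre-emphasis, and the conventional method uses the autocorrelation ratio R(1)/R(0). In this paper, we propose a new flattening technique to compensate for the attenuation of high-frequency components caused by glottal characteristics. First, the spacing information of the LSP parameters is used to estimate the formant frequencies. The slope and inverse slope are then obtained by linear interpolation between the detected formant frequencies, and the flattening is performed. Experimental results show that the proposed method flattens the spectrum better than the conventional method. In short, this paper uses the LSP spacing information as the flattening factor when compensating the attenuated high-frequency components.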
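The conventional pre-emphasis described above, with A set per frame to R(1)/R(0), can be sketched as follows. This shows only the baseline the paper compares against, not the proposed LSP-spacing flattening; the function names and the assumption s(-1) = 0 at the frame start are illustrative choices:

```python
from typing import List, Sequence, Tuple


def autocorr(s: Sequence[float], lag: int) -> float:
    """Unnormalized autocorrelation R(lag) = sum over n of s[n] * s[n - lag]."""
    return sum(s[n] * s[n - lag] for n in range(lag, len(s)))


def adaptive_preemphasis(s: Sequence[float]) -> Tuple[float, List[float]]:
    """Apply y(n) = s(n) - A*s(n-1) with A = R(1)/R(0).

    Assumes s(-1) = 0 at the start of the frame. Returns (A, filtered frame).
    """
    r0 = autocorr(s, 0)
    if r0 == 0.0:
        return 0.0, list(s)  # silent or empty frame: nothing to emphasize
    a = autocorr(s, 1) / r0
    y = [s[0]] + [s[n] - a * s[n - 1] for n in range(1, len(s))]
    return a, y
```

For a slowly varying voiced frame, neighboring samples are highly correlated, so R(1)/R(0) approaches 1 and the filter strongly boosts high frequencies, consistent with the 0.9 to 1.0 range quoted above.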

A Study on the Aerodynamic and Acoustic Characteristics in Dysarthria Speakers' Diadochokinesis by Articulation Valves in Vocal Tract

박희준;권순복;왕수건;정옥란
- Speech Sciences / Vol. 15, No. 2 / pp. 177-189 / 2008
This study investigated the diadochokinetic (DDK) rate, regularity, and mean airflow rate of the articulation valves in dysarthria. DDK rate, mean airflow rate (MFR), and regularity of DDK syllable repetitions were measured for vocal function /ihi/, tongue function /ta/, velopharyngeal function /bm/, and labial function /pa/ in 24 normal and dysarthric speakers. Aerophone Ⅱ and the Motor Speech Profile were used for data recording and analysis. The results were as follows. First, DDK rate differed significantly between the dysarthric and normal groups; it was lowest in ataxic dysarthria, followed by spastic, flaccid, and hypokinetic dysarthria. Second, DDK regularity differed significantly between the dysarthric and normal groups. Third, DDK MFR differed significantly between the dysarthric groups and the normal group. Finally, DDK airflow tracking differed significantly between the four dysarthria groups and the normal group. These results can serve as guidelines for normal DDK rate, regularity, and flow rate in dysarthric groups. In addition, differential diagnosis and description based on DDK characteristics are important for decisions on the medical and behavioral management of individuals with these disorders.
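DDK rate and regularity can both be derived from the syllable onset times of a repetition task. A minimal sketch, assuming onsets have already been detected; rate as 1 / mean period and regularity as the coefficient of variation of the periods are illustrative definitions, not necessarily the exact measures computed by the Motor Speech Profile software:

```python
import statistics
from typing import Sequence, Tuple


def ddk_rate_and_regularity(onsets_s: Sequence[float]) -> Tuple[float, float]:
    """Return (rate in syllables/s, coefficient of variation of periods in %).

    onsets_s: syllable onset times in seconds for one task such as /pa/ (assumed input).
    Rate = 1 / mean period; regularity = SD / mean * 100 of the periods,
    so lower values mean steadier repetitions.
    """
    if len(onsets_s) < 3:
        raise ValueError("need at least three onsets for two periods")
    periods = [b - a for a, b in zip(onsets_s, onsets_s[1:])]
    mean_p = statistics.mean(periods)
    rate = 1.0 / mean_p
    cv = statistics.stdev(periods) / mean_p * 100.0
    return rate, cv
```

Separating rate from variability matters here because a speaker can be slow but steady, or near-normal in average rate but irregular.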

A Study on Comparison of Pronunciation Accuracy of Soprano Singers

Song, Uk-Jin;Park, Hyungwoo;Bae, Myung-Jin
- International Journal of Advanced Smart Convergence / Vol. 6, No. 2 / pp. 59-64 / 2017
There are three types of female singing voice according to vocal range: soprano, mezzo-soprano, and contralto. Among them, the soprano has the highest vocal range. According to the voice generation model, the voice is produced through the human vocal tract, so it is strongly influenced by the vocal tract. The structure of the vocal organs differs from person to person, and the formant characteristics of vocalization differ accordingly. A formant characteristic is a frequency band that appears distinctly because of resonance in the vocal tract during vocalization. Formant characteristics include both the phonological properties of phonemes and personal traits arising from the throat, jaw, lips, and teeth. The first formant is attributed to resonance in the throat, the second to the jaw, and the third and fourth to the lips and teeth. Accordingly, pronunciation is influenced not only by phonological information but also by the jaw, lips, and teeth: when the mouth opening is small or the jaw is stiff, pronunciation becomes unclear. Therefore, the more accurate the pronunciation, the more clearly the formant characteristics appear in the spectrogram. However, many soprano singers cannot open their mouths fully because they keep their jaws, lips, teeth, and facial muscles rigid to sustain high notes, which makes their pronunciation, and thus their formant characteristics, unclear. In this paper, to assess the pronunciation accuracy of soprano singers, five Korean sopranos (A, B, C, D, and E) were selected as the experimental group; their spectrograms were analyzed, and a MOS (mean opinion score) test of pronunciation recognition was conducted. As a result, soprano B showed clearly recognizable formants from F1 to F5 and the highest MOS recognition score, 4.6 points.
For sopranos A, C, and D, formants F1 to F3 appeared, but formants above 2 kHz were difficult to find. Finally, formants were difficult to find overall for soprano E, whose MOS test showed the lowest recognition score, 2.1 points. We therefore confirmed that soprano B, whose spectrogram exhibits the most distinct formant characteristics, has the best pronunciation accuracy.
https://doi.org/10.7236/IJASC.2017.6.2.59

A Study on the Voice Conversion Algorithm with High Quality

박형빈;배명진
- The Institute of Electronics Engineers of Korea: Conference Proceedings / Proceedings of the 13th Signal Processing Joint Conference, 2000 / pp. 157-160 / 2000
Voice conversion has generally used VQ (vector quantization) to partition the spectral feature space, performing conversion by adding an appropriate offset vector to the source speaker's spectral vector. However, because the transformed parameters are discrete, the target speaker's varied characteristics are not represented. In this paper, these problems are solved by using LMR (linear multivariate regression), which determines the relationship between the source and target speakers' vocal tract characteristics, instead of a mapping codebook. We also propose a method that resolves the discontinuity caused by applying the conversion to parameters time-aligned with Dynamic Time Warping in time- or pitch-scale-modified speech. To overcome these transitional discontinuities, the proposed algorithm leaves the time and pitch scales unchanged and uses LMR to change the speaker's vocal tract characteristics in speech whose time and pitch are not modified. Compared with existing methods based on VQ and LMR, the proposed algorithm yields much better voice quality.
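Training a frame-to-frame mapping such as LMR requires paired source and target frames, which the abstract obtains through Dynamic Time Warping. A minimal DTW sketch, assuming Euclidean distance between spectral frames and the basic three-step pattern; the paper's actual distance measure and step constraints are not given in the abstract:

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]


def dtw_path(src: Sequence[Frame], tgt: Sequence[Frame]) -> Tuple[float, List[Tuple[int, int]]]:
    """Align two sequences of spectral frames with dynamic time warping.

    Uses Euclidean frame distance and the (i-1, j), (i, j-1), (i-1, j-1) steps.
    Returns (total cost, list of (src_index, tgt_index) pairs).
    """
    n, m = len(src), len(tgt)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(src[i - 1], tgt[j - 1])
            D[i][j] = d + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Backtrack from the end, preferring the diagonal step on ties.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        moves = {(i - 1, j - 1): D[i - 1][j - 1], (i - 1, j): D[i - 1][j], (i, j - 1): D[i][j - 1]}
        i, j = min(moves, key=moves.get)
    path.reverse()
    return D[n][m], path
```

The returned index pairs give aligned (source, target) frame pairs from which a regression mapping could be estimated.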

Speech Parameters for the Robust Emotional Speech Recognition

김원구
- Journal of Institute of Control, Robotics and Systems / Vol. 16, No. 12 / pp. 1137-1142 / 2010
This paper studied speech parameters that are less affected by human emotion, for the development of a robust speech recognition system. For this purpose, the effect of emotion on a speech recognition system and robust speech parameters were studied using a speech database containing various emotions. Mel-cepstral coefficients, delta-cepstral coefficients, RASTA mel-cepstral coefficients, and frequency-warped mel-cepstral coefficients were used as feature parameters, and CMS (cepstral mean subtraction) was used as a signal bias removal technique. Experimental results showed that an HMM-based speaker-independent word recognizer using vocal tract length normalized mel-cepstral coefficients, their derivatives, and CMS for signal bias removal performed best, with a 0.78% word error rate. This corresponds to about a 50% reduction in word errors compared with a baseline system using mel-cepstral coefficients, their derivatives, and CMS.
https://doi.org/10.5302/J.ICROS.2010.16.12.1137
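Cepstral mean subtraction, the bias removal step used in the recognizer above, subtracts the utterance-level mean of each cepstral coefficient. A minimal sketch over a list of per-frame cepstral vectors (the function name and the per-utterance scope are assumptions for illustration):

```python
from typing import List, Sequence


def cepstral_mean_subtraction(frames: Sequence[Sequence[float]]) -> List[List[float]]:
    """Subtract the per-coefficient mean over all frames of one utterance.

    A stationary channel (convolutional) bias becomes approximately a constant
    additive vector in the cepstral domain, so removing the mean cancels it.
    """
    if not frames:
        return []
    dim = len(frames[0])
    means = [sum(f[k] for f in frames) / len(frames) for k in range(dim)]
    return [[f[k] - means[k] for k in range(dim)] for f in frames]
```

Adding the same constant vector to every frame leaves the output unchanged, which is exactly the invariance that makes CMS a bias removal technique.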

Spectral Characteristics and Nasalance Scores of Hypernasality in Patient with Cleft Palate

Soh, Byung-Soo;Shin, Hyo-Keun;Kim, Hyun-Gi
- Speech Sciences / Vol. 12, No. 1 / pp. 27-35 / 2005
Various instruments have been used to measure speech problems objectively in the diagnosis of individuals with cleft palate. The cepstrum method was used to study the vocal tract transfer function; both the vocal tract transfer function and the source spectrum should be considered in evaluating nasal resonance. The aim of this study was to collect quantitative data from the acoustic instrumentation used for evaluating hypernasality. Normal subjects (9 male, 21 female; 37 male children, 20 female children) and individuals with VPI (13 male, 8 female; 16 male children, 9 female) participated in this study. The vowel /i/ was selected to gauge the severity of hypernasality, and spectral and cepstral analyses using CSL were used to identify its acoustic characteristics. Cepstrum analysis showed significant differences in quefrency and amplitude: the quefrency of the normal groups was shorter than that of the VPI groups, and the amplitude of the normal groups was lower than that of the VPI groups. This may be significant in the evaluation of nasal resonance.
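The cepstral analysis mentioned above can be illustrated with the real cepstrum, the inverse DFT of the log magnitude spectrum. A minimal sketch using a naive DFT so that it needs only the standard library; the spectral floor and function names are assumptions, and CSL's internal implementation is not described in the abstract:

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    """Naive O(N^2) discrete Fourier transform, adequate for short illustrative frames."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def real_cepstrum(frame: Sequence[float], floor: float = 1e-12) -> List[float]:
    """c[q] = IDFT(log|DFT(frame)|)[q], indexed by quefrency q (in samples).

    `floor` keeps log() finite at spectral zeros (an implementation choice).
    For a real frame, log|X| is real and even, so its inverse DFT is real and
    equals the forward DFT divided by N.
    """
    n = len(frame)
    log_mag = [math.log(max(abs(X), floor)) for X in dft(frame)]
    return [v.real / n for v in dft(log_mag)]
```

Low-quefrency coefficients mainly describe the spectral envelope shaped by the vocal tract, while a peak at a higher quefrency reflects the pitch period, which is why quefrency and cepstral amplitude can separate filter and source effects.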
