Search | Korea Science

Voice transformation for HTS using correlation between fundamental frequency and vocal tract length (기본주파수와 성도길이의 상관관계를 이용한 HTS 음성합성기에서의 목소리 변환)

Yoo, Hyogeun;Kim, Younggwan;Suh, Youngjoo;Kim, Hoirin
- Phonetics and Speech Sciences / v.9 no.1 / pp.41-47 / 2017
The main advantage of statistical parametric speech synthesis is its flexibility in changing voice characteristics. A personalized text-to-speech (TTS) system can be implemented by combining a speech synthesis system with a voice transformation system, and such systems are widely used in many application areas. It is known that the fundamental frequency and the spectral envelope of a speech signal can be modified independently to convert the voice characteristics. It is also important to maintain the naturalness of the transformed speech. In this paper, a speech synthesis system based on hidden Markov models (HMM-based speech synthesis, HTS) using the STRAIGHT vocoder is constructed, and voice transformation is conducted by modifying the fundamental frequency and the spectral envelope. The fundamental frequency is transformed by scaling, and the spectral envelope is transformed by frequency warping to control the speaker's vocal tract length. In particular, this study proposes a voice transformation method that uses the correlation between fundamental frequency and vocal tract length. Subjective evaluations were conducted to assess preference and mean opinion scores (MOS) for the naturalness of the synthetic speech. Experimental results showed that the proposed voice transformation method achieved higher preference than the baseline systems while maintaining the naturalness of the speech.
https://doi.org/10.13064/KSSS.2017.9.1.041
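
The two modifications named in the abstract, F0 scaling and frequency warping of the spectral envelope, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the linear warping function, and the treatment of `alpha` and `beta` as independent parameters are assumptions, since the abstract does not say how the paper ties the two together.

```python
import numpy as np

def scale_f0(f0, alpha):
    """Scale voiced F0 values by alpha; unvoiced frames (F0 == 0) stay at 0."""
    f0 = np.asarray(f0, dtype=float)
    return np.where(f0 > 0, f0 * alpha, 0.0)

def warp_envelope(envelope, beta):
    """Linearly warp the frequency axis of one spectral-envelope frame.

    Assumption: output bin k takes the input value at bin k / beta, so
    beta > 1 moves spectral peaks upward (as for a shorter vocal tract)
    and beta < 1 moves them downward.
    """
    envelope = np.asarray(envelope, dtype=float)
    bins = np.arange(envelope.size)
    return np.interp(bins / beta, bins, envelope)
```

With `beta = 1.2`, for example, a spectral peak at bin 100 ends up at bin 120.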

Estimation of Fundamental Frequency Using an Instantaneous Frequency Based on the Symmetric Higher Order Differential Energy Operator (대칭구조를 갖는 일반적인 고차의 미분 에너지함수를 기반한 순간주파수를 이용한 음성의 기본주파수 추정)

Iem, Byeong-Gwan
- The Transactions of The Korean Institute of Electrical Engineers / v.60 no.12 / pp.2374-2379 / 2011
The fundamental frequency of voiced speech is estimated using the instantaneous frequency based on the symmetric higher order differential energy operator. The instantaneous frequency based on the symmetric higher order energy operator gives better frequency estimates because it is aligned to the time instant of the signal. The speech is pre-processed by a lowpass filter to remove higher frequency components. It is then processed by the instantaneous frequency estimator to obtain the fundamental frequency estimates. The symmetric higher order energy operator is also used as an indicator to separate voiced from unvoiced speech. The fundamental frequency estimates are further processed by a moving average filter to obtain smoothly varying estimates. The resulting fundamental frequency estimates were compared with the spectrogram of the speech to confirm their accuracy.
https://doi.org/10.5370/KIEE.2011.60.12.2374

The Effect of Visual Feedback Intervention on Voice Pitch of Adult with Hearing Impairment (선천성 청각장애성인의 시각적피드백 이용 음도치료 효과)

Euh, Su-Ji;Yoon, Mi-Sun
- Speech Sciences / v.12 no.4 / pp.215-226 / 2005
This study investigates the effect of a pitch treatment program using visual feedback for adults with profound deafness. The Dr. Speech program was used as the training tool. The subjects were three profoundly deaf adults. The speech samples used for evaluation were vowel prolongations and connected speech. The analysis followed a single-subject research design. All subjects showed treatment effects, reflected in lower fundamental frequency and lower speaking fundamental frequency.

Fundamental Frequency Estimation of Voiced Speech Signals Based on the Inflection Point Detection (변곡점 검출에 기반한 음성의 기본 주파수 추정)

Byeonggwan Iem
- Journal of IKEEE / v.27 no.4 / pp.472-476 / 2023
Fundamental frequency and pitch period are major characteristics of speech signals. They are used in many speech applications such as speech coding, speech recognition, and speaker identification. In this paper, a subset of inflection points is used to estimate the pitch period, which is the inverse of the fundamental frequency. The inflection points are defined as points where local maxima, local minima, or slope changes occur. The speech signal is preprocessed with a low pass filter to remove unnecessary inflection points caused by high frequency components. Only the inflection points at local maxima are used to obtain the pitch period. While existing pitch estimation methods process speech signals block by block, the proposed method detects the inflection points sample by sample and produces pitch period and fundamental frequency estimates along time. Computer simulation shows the usefulness of the proposed method as a fundamental frequency estimator.
https://doi.org/10.7471/ikeee.2023.27.4.472
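
The pipeline described in the abstract (low-pass filtering, detection of local maxima, and pitch periods taken from the spacing between successive maxima) can be sketched roughly as below. The moving-average smoother, the function names, and the `smooth_len` parameter are assumptions for illustration; the abstract does not specify the paper's filter or its exact inflection-point rules.

```python
import numpy as np

def local_maxima(x):
    """Indices n where x[n-1] < x[n] >= x[n+1] (interior samples only)."""
    x = np.asarray(x, dtype=float)
    is_peak = (x[1:-1] > x[:-2]) & (x[1:-1] >= x[2:])
    return np.flatnonzero(is_peak) + 1

def f0_from_maxima(signal, fs, smooth_len=5):
    """Estimate F0 along time from the spacing of successive local maxima.

    A moving average stands in for the low-pass filter (an assumption);
    mode="valid" avoids spurious maxima from zero padding at the edges.
    Returns the peak positions (in smoothed-signal samples) and one F0
    estimate per pair of neighbouring peaks.
    """
    kernel = np.ones(smooth_len) / smooth_len
    smoothed = np.convolve(np.asarray(signal, dtype=float), kernel, mode="valid")
    peaks = local_maxima(smoothed)
    periods = np.diff(peaks) / fs          # pitch periods in seconds
    return peaks[1:], 1.0 / periods        # F0 is the inverse of the period
```

Real speech has several maxima per pitch period, so a practical version would need stronger filtering or a rule for choosing which maxima to keep.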

Filtering of a Dissonant Frequency for Speech Enhancement

Kang, Sang-Ki;Baek, Seong-Joon;Lee, Ki-Yong;Sun, Koeng-Mo
- The Journal of the Acoustical Society of Korea / v.22 no.3E / pp.110-112 / 2003
There have been numerous studies on the enhancement of noisy speech signals. In this paper, we propose a new speech enhancement scheme: filtering of a dissonant frequency (especially F# in each octave of the tempered scale) based on the fundamental frequency, developed in the frequency domain. To evaluate the performance of the proposed enhancement scheme, subjective tests (MOS tests) were conducted. The subjective test results indicate that the proposed method provides a significant audible improvement, especially for speech contaminated by colored noise and for speech produced in a husky voice. Therefore, when the filter is employed as a pre-filter for speech enhancement, the output speech quality and intelligibility are greatly enhanced.

Fundamental Frequency Estimation based on Time-Frequency Analysis (시주파수 분석법을 이용한 음성의 기본주파수 검출)

Iem Byeong-Gwan
- The Transactions of the Korean Institute of Electrical Engineers D / v.55 no.1 / pp.31-34 / 2006
A simple, robust fundamental frequency estimator in the time-frequency domain is proposed. Combined with an appropriately designed low-pass filter, the instantaneous frequency estimator based on the Teager-Kaiser energy function can detect the fundamental frequency of a speech signal. The Teager-Kaiser function can be obtained through real-valued computation and shows how the frequency changes over time. When a speech block of $N$ samples is processed with a low-pass filter of length $L$, the method requires $O(N \cdot (L+5))$ operations, compared to $O(N \cdot (2\log_2 N + L))$ operations for the recently introduced wavelet method and the conventional instantaneous frequency method. The computer simulation confirms the usefulness of the proposed fundamental frequency estimation method.
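
As a rough illustration of an instantaneous frequency estimator built on the Teager-Kaiser energy operator, the sketch below uses the standard discrete operator $\Psi[x](n) = x(n)^2 - x(n-1)\,x(n+1)$ together with a DESA-1a style frequency formula. Whether the paper uses this particular formula is an assumption; the function names are hypothetical, and the low-pass pre-filter mentioned in the abstract is omitted.

```python
import numpy as np

def teager_kaiser(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def instantaneous_frequency(x, fs):
    """DESA-1a style estimate (assumed variant), in Hz.

    For x[n] = A cos(W n + p): psi[x] = A^2 sin^2(W), and for the backward
    difference y[n] = x[n] - x[n-1]: psi[y] = 4 A^2 sin^2(W/2) sin^2(W),
    so cos(W) = 1 - psi[y] / (2 psi[x]).
    """
    x = np.asarray(x, dtype=float)
    psi_x = teager_kaiser(x)[1:]        # energy at samples 2 .. N-2
    psi_y = teager_kaiser(np.diff(x))   # energy of the difference, same samples
    with np.errstate(divide="ignore", invalid="ignore"):
        cos_omega = 1.0 - psi_y / (2.0 * psi_x)
    # Mark samples with non-positive energy as undefined instead of guessing.
    cos_omega = np.where(psi_x > 0, cos_omega, np.nan)
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))
    return omega * fs / (2.0 * np.pi)
```

Each output sample needs only a few multiplications and additions on real values, which matches the abstract's point that the function is obtained through real computation.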

Vowel Fundamental Frequency in Manner Differentiation of Korean Stops and Affricates

Jang, Tae-Yeoub
- Speech Sciences / v.7 no.1 / pp.217-232 / 2000
In this study, I investigate the role of post-consonantal fundamental frequency (F0) as a cue for automatic distinction of types of Korean stops and affricates. Rather than examining data obtained by restricting contexts to a minimum to prevent the interference of irrelevant factors, a relatively natural speaker independent speech corpus is analysed. Automatic and statistical approaches are adopted to annotate data, to minimise speaker variability, and to evaluate the results. In spite of possible loss of information during those automatic analyses, statistics obtained suggest that vowel F0 is a useful cue for distinguishing manners of articulation of Korean non-continuant obstruents having the same place of articulation, especially of lax and aspirated stops and affricates. On the basis of the statistics, automatic classification is attempted over the relevant consonants in a specific context where the micro-prosodic effects appear to be maximised. The results confirm the usefulness of this effect in application for Korean phone recognition.

A Study of F0's realization in Emotional speech (감정에 따른 음성의 기본주파수 실현 연구)

Park, Mi-Young;Park, Mi-Kyoung
- Proceedings of the KSPS conference / 2005.11a / pp.79-85 / 2005
In this paper, we compare normal speech with emotional speech (happy, sad, and angry states) through changes in fundamental frequency. The distribution charts of the normal and emotional speech show distinctive cues such as the range of the distribution, average, maximum, and minimum. On the whole, the range of the fundamental frequency is extended in happy and angry states, while sad states make the range relatively narrower. Nevertheless, the ranges of the F0 in sad states are wider than in normal speech. In addition, we can verify that ending boundary tones reflect the information of the whole utterance.

A comparison of the absolute error of estimated speaking fundamental frequency (AEF0) among etiological groups of voice disorders (음성장애의 병인 집단 간 추정 발화 기본주파수 절대 오차 비교)

Seung Jin Lee;Jae-Yol Lim;Jaeock Kim
- Phonetics and Speech Sciences / v.15 no.4 / pp.53-60 / 2023
This study compared the absolute error of estimated fundamental frequency (AEF0) using voice range profile (VRP) and speech range profile (SRP) tasks across various etiological groups with voice disorders. Additionally, we explored the association between AEF0 and related voice parameters within each specific etiological group. The participants included 120 individuals, comprising 30 each from the functional (FUNC), organic (ORGAN), and neurological (NEUR) voice disorder groups, and a normal control group (NC). Each participant performed VRP and SRP tasks, and the fundamental frequency of connected speech was measured using electroglottography (EGG). When comparing the AEF0 measures across the etiological groups, there were no differences in Grade and Severity among the patients. However, variations were observed in AEF0_VRP and AEF0_SUM. Specifically, AEF0_VRP was higher in the ORGAN group than in the FUNC and NC groups, whereas AEF0_SUM was higher in the ORGAN group than in the NC group. Furthermore, within FUNC and NEUR, AEF0 showed a positive correlation with Grade, while in ORGAN, it exhibited a positive correlation with the mean closed quotient (CQ). Attention should be paid to the application of AEF0 measures and related voice variables based on the etiological group. This study provides foundational information for the clinical application of AEF0 measures.
https://doi.org/10.13064/KSSS.2023.15.4.053

AM-FM Decomposition and Estimation of Instantaneous Frequency and Instantaneous Amplitude of Speech Signals for Natural Human-robot Interaction (자연스런 인간-로봇 상호작용을 위한 음성 신호의 AM-FM 성분 분해 및 순간 주파수와 순간 진폭의 추정에 관한 연구)

Lee, He-Young
- Speech Sciences / v.12 no.4 / pp.53-70 / 2005
Vowels in speech signals are multicomponent signals composed of AM-FM components whose instantaneous frequency and instantaneous amplitude are time-varying. Changes of emotional state cause variation in the instantaneous frequencies and instantaneous amplitudes of the AM-FM components. Therefore, it is important to estimate these instantaneous frequencies and amplitudes accurately in order to extract key information representing emotional states and changes in speech signals. In this paper, firstly, a method for decomposing speech signals into AM-FM components is addressed. Secondly, the fundamental frequency of a vowel sound is estimated by a simple method based on the spectrogram. The estimate of the fundamental frequency is used for decomposing speech signals into AM-FM components. Thirdly, an estimation method is suggested for separating the instantaneous frequencies and the instantaneous amplitudes of the decomposed AM-FM components, based on the Hilbert transform and the demodulation property of the extended Fourier transform. The estimates of the instantaneous frequencies and amplitudes can be used for modifying the spectral distribution and for smoothly connecting two words in corpus-based speech synthesis systems.
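
For a single, already separated AM-FM component, the Hilbert-transform part of the third step can be sketched as below. This is only a minimal illustration under stated assumptions: it builds the analytic signal with an FFT, takes its magnitude and phase derivative, and does not cover the decomposition step or the extended Fourier transform mentioned in the abstract. The function names are hypothetical.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT: keep DC, double positive frequencies,
    zero negative frequencies (the usual discrete Hilbert construction)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0          # Nyquist bin is kept once
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)

def instantaneous_amplitude_frequency(component, fs):
    """Instantaneous amplitude (per sample) and frequency in Hz
    (per sample interval) of one AM-FM component."""
    z = analytic_signal(component)
    amplitude = np.abs(z)
    phase = np.unwrap(np.angle(z))
    frequency = np.diff(phase) * fs / (2.0 * np.pi)
    return amplitude, frequency
```

The estimate is only meaningful per component, which is why the decomposition into AM-FM components has to come first.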
