Search | Korea Science

Pause Predictor for Korean Text-to-Speech conversion (한국어 음성합성기용 끊어읽기 추정기)

이정철;김상훈;성굉모
- The Journal of the Acoustical Society of Korea / v.17 no.5 / pp.51-56 / 1998
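The location and duration of pauses within a sentence are among the main prosodic parameters that determine the naturalness of synthesized speech. This study proposes a method for estimating the location and duration of pauses within a sentence in order to improve the naturalness of speech generated by a Korean speech synthesizer. We first examine the factors that cause pauses in actual utterances. We then annotate text with four levels of pauses according to these factors to obtain a large amount of data, and use that data to compare the results of NN training with the performance of an HMM estimator. Results so far show that the NN predicts well for the no-pause and long-pause cases, whereas the HMM performs better for short and medium pauses. Overall the HMM is superior, giving stable results with estimation errors of 10 to 25% depending on the pause type and demonstrating its applicability to TTS.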

Automatic Music Transcription Considering Time-Varying Tempo (가변 템포를 고려한 자동 음악 채보)

Ju, Youngho;Babukaji, Baniya;Lee, Joonwhan
- The Journal of the Korea Contents Association / v.12 no.11 / pp.9-19 / 2012
The time-varying tempo of a song is one source of error in identifying note durations in automatic music recognition. This paper proposes an improved music transcription scheme that identifies note durations while accounting for time-varying tempo. The scheme first locates the measures and then estimates the tempo, defined as the playing time of each measure. The tempo is used to rescale each inter-onset interval (IOI) so that note durations are identified accurately, which increases the degree of correspondence to the music piece. In the experiment, the proposed scheme found the correct measure positions for 14 of 16 monophonic children's songs recorded by men and women. It also achieved matching rates against the original music piece of about 89.4% for note duration and 84.8% for pitch.
https://doi.org/10.5392/JKCA.2012.12.11.009
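The tempo-based rescaling of IOIs described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 4/4 time signature, the table of note lengths, and the function names (`normalize_iois`, `quantize`, `transcribe_durations`) are all assumptions made for illustration, and measure detection is taken as already done.

```python
# Hypothetical sketch: rescale inter-onset intervals (IOIs) by the playing time
# of their measure, then snap each one to the nearest standard note length.

# Assumption: 4/4 time, with a quarter note equal to one beat.
NOTE_BEATS = {"sixteenth": 0.25, "eighth": 0.5, "quarter": 1.0, "half": 2.0, "whole": 4.0}
BEATS_PER_MEASURE = 4.0


def normalize_iois(iois_sec, measure_sec):
    """Convert IOIs from seconds to beats, using the measure's playing time as tempo."""
    sec_per_beat = measure_sec / BEATS_PER_MEASURE
    return [ioi / sec_per_beat for ioi in iois_sec]


def quantize(beats):
    """Return the name of the note length closest to `beats`."""
    return min(NOTE_BEATS, key=lambda name: abs(NOTE_BEATS[name] - beats))


def transcribe_durations(measures):
    """measures: list of (measure_sec, [ioi_sec, ...]) pairs, one per detected measure."""
    return [
        [quantize(b) for b in normalize_iois(iois_sec, measure_sec)]
        for measure_sec, iois_sec in measures
    ]


# Two measures with the same rhythm; the second is sung more slowly.
measures = [
    (2.0, [0.5, 0.5, 1.0]),    # 0.5 s per beat
    (3.0, [0.75, 0.75, 1.5]),  # 0.75 s per beat
]
print(transcribe_durations(measures))
```

Because each IOI is divided by its own measure's beat length, both measures map to the same note names even though their absolute timings differ.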

Prosodic Break Index Estimation using LDA and Tri-tone Model (LDA와 tri-tone 모델을 이용한 운율경계강도 예측)

강평수;엄기완;김진영
- The Journal of the Acoustical Society of Korea / v.18 no.7 / pp.17-22 / 1999
In this paper we propose a new method that combines LDA and a tri-tone model to predict Korean prosodic break indices (PBI) for a given utterance. PBI can serve as an important cue to syntactic discontinuity in continuous speech recognition (CSR). The model consists of three steps. In the first step, PBI is predicted from syllable and pause duration information using linear discriminant analysis (LDA). In the second step, syllable tone information is used to estimate PBI: the syllable tones are coded by vector quantization (VQ), and PBI is estimated by a tri-tone model. In the last step, the two PBI predictors are integrated using a weight factor. The proposed method was tested on 200 literal-style spoken sentences and achieved 72% accuracy.
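The final step of this abstract, integrating two PBI predictors with a weight factor, might look like the sketch below. The abstract does not say how the weight is applied, so linear interpolation of per-class scores is an assumption, as are the function name `fuse_pbi`, the four break-index classes, and the example scores.

```python
# Hypothetical sketch: combine per-class scores from two predictors with one weight.
# Assumption: both predictors output one score per break index, and the fusion
# is a linear interpolation controlled by `weight`.

def fuse_pbi(lda_scores, tritone_scores, weight=0.5):
    """Interpolate two per-class score lists; return (best_index, fused_scores)."""
    if len(lda_scores) != len(tritone_scores):
        raise ValueError("both predictors must score the same set of break indices")
    fused = [weight * a + (1.0 - weight) * b
             for a, b in zip(lda_scores, tritone_scores)]
    best = max(range(len(fused)), key=fused.__getitem__)
    return best, fused


# Made-up scores for four break-index classes (0..3).
lda = [0.10, 0.60, 0.20, 0.10]       # duration-based predictor favors class 1
tritone = [0.05, 0.25, 0.60, 0.10]   # tone-based predictor favors class 2

print(fuse_pbi(lda, tritone, weight=0.7)[0])  # weight leans toward the LDA scores
print(fuse_pbi(lda, tritone, weight=0.3)[0])  # weight leans toward the tri-tone scores
```

In practice the weight would be tuned on held-out data; the values 0.7 and 0.3 here only show how the decision shifts between the two predictors.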

Prosody Boundary Index Prediction Model for Continuous Speech Recognition and Speech Synthesis (연속음성 인식 및 합성을 위한 운율 경계강도 예측 모델)

강평수
- Proceedings of the Acoustical Society of Korea Conference / 1998.06c / pp.99-102 / 1998
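This study proposes a boundary strength prediction model for continuous speech recognition and synthesis. In speech synthesis, prosodic boundary strength contributes to the naturalness of synthesized speech by controlling the length of pauses between prosodic phrases; in continuous speech recognition, it serves as a feature for selecting among the candidate sentences produced during recognition and plays a large role in improving the recognition rate. Phonetically, an uttered sentence can be viewed, at the level of large boundary units, as a sequence of prosodic phrases, and phrase boundaries can be related to the grammatical features of the sentence. In this paper we set the number of prosodic boundary strength levels to 4. As grammatical features we use the depth of right-branch modification (rd), determined by a tree-structure method, and the number of syllables (syl) and link distance (torig), determined by a link grammar method, and combine them with a bigram model to predict prosodic boundary strength. As prediction models we propose a multiple regression model and a Markov model. In experiments on 200 read-style sentences, these models predicted boundary strength at a rate of 76%.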

Age classification of emergency callers based on behavioral speech utterance characteristics (발화행태 특징을 활용한 응급상황 신고자 연령분류)

Son, Guiyoung;Kwon, Soonil;Baik, Sungwook
- The Journal of Korean Institute of Next Generation Computing / v.13 no.6 / pp.96-105 / 2017
In this paper, we investigate age classification of speakers by analyzing voice calls to an emergency center. We classify callers as adult or elderly using behavioral speech utterances and a support vector machine (SVM), a machine learning classifier. Through analysis of the emergency center call data, we selected two behavioral speech utterances: silent pause and turn-taking latency. The criteria for age classification were first selected by analyzing these behavioral speech utterances in the emergency call data and were then found to be statistically significant (p < 0.05). We analyzed 200 samples (adult: 100, elderly: 100) with 5-fold cross-validation using the SVM classifier. We achieved 70% accuracy using the two behavioral speech utterances, which is higher than the accuracy obtained with a single one. These results suggest a new age classification method based on behavioral speech utterances. In future work, classification will combine acoustic information (MFCC) with new behavioral speech utterances from real voice data. This work should also contribute to the development of emergency situation judgment systems that rely on age classification.

Automatic Music Transcription System Using SIDE (SIDE를 이용한 자동 음악 채보 시스템)

Hyoung, A-Young;Lee, Joon-Whoan
- The KIPS Transactions:PartB / v.16B no.2 / pp.141-150 / 2009
This paper proposes a system that automatically transcribes singing voices into musical notes. First, the system uses the Stabilized Inverse Diffusion Equation (SIDE) to divide the song into a series of syllabic parts based on pitch detection. After this segmentation, the method recognizes the duration of each fragment through clustering based on a genetic algorithm. The study also introduces a concept called 'Relative Interval' to recognize intervals relative to the singer's pitch, and adopts a measure extraction algorithm that uses pause data to improve transcription precision. In experiments on 16 nursery songs, the measure recognition rate was 91.5% and the DMOS score reached 3.82. These findings demonstrate the effectiveness of the system.
https://doi.org/10.3745/KIPSTB.2009.16-B.2.141

Corpus-based Korean Text-to-speech Conversion System (콜퍼스에 기반한 한국어 문장/음성변환 시스템)

Kim, Sang-hun;Park, Jun;Lee, Young-jik
- The Journal of the Acoustical Society of Korea / v.20 no.3 / pp.24-33 / 2001
This paper describes a baseline implementation of a corpus-based Korean TTS system. Conventional TTS systems using small-sized speech still generate machine-like synthetic speech. To overcome this problem, we introduce a corpus-based TTS system that can generate natural synthetic speech without prosodic modification. The corpus should contain the natural prosody of the source speech and multiple instances of each synthesis unit. To build phone-level synthesis units, we train a speech recognizer on the target speech and then perform automatic phoneme segmentation. We also detect fine pitch periods using laryngograph signals, which are used for prosodic feature extraction. For break strength allocation, four levels of break indices are determined from pause length and attached to phones to reflect prosodic variation at phrase boundaries. To predict break strength from text, we use statistical information about POS (part-of-speech) sequences. The best triphone sequences are selected by Viterbi search, which minimizes the cumulative Euclidean distance of the concatenation distortion. To obtain synthetic speech of high enough quality for commercial use, we introduce a domain-specific database. Adding the domain-specific database to the general-domain database greatly improves the quality of synthetic speech in that domain. In subjective evaluation, the new corpus-based Korean TTS system shows better naturalness than the conventional demisyllable-based one.
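The Viterbi unit selection mentioned in this abstract can be sketched as a dynamic program over candidate units. This is an illustration under stated assumptions, not the system's actual code: each candidate is represented by hypothetical boundary feature vectors `(start_vec, end_vec)`, the only cost is the Euclidean distance at each join, and `select_units` is a name chosen for this example.

```python
import math

# Hypothetical sketch: choose one candidate unit per position so that the summed
# Euclidean distance across all joins (concatenation distortion) is minimal.


def euclid(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_units(candidates):
    """candidates[t] is a list of (start_vec, end_vec) pairs for target position t.

    Returns (total_cost, path), where path[t] is the chosen candidate index.
    """
    # cost[j]: cheapest cumulative cost of any path ending in candidate j.
    cost = [0.0] * len(candidates[0])
    back = []  # back[t - 1][j]: best predecessor of candidate j at position t
    for t in range(1, len(candidates)):
        new_cost, pointers = [], []
        for start_vec, _ in candidates[t]:
            joins = [cost[i] + euclid(prev_end, start_vec)
                     for i, (_, prev_end) in enumerate(candidates[t - 1])]
            best_i = min(range(len(joins)), key=joins.__getitem__)
            new_cost.append(joins[best_i])
            pointers.append(best_i)
        cost = new_cost
        back.append(pointers)
    # Trace back from the cheapest final candidate.
    j = min(range(len(cost)), key=cost.__getitem__)
    total, path = cost[j], [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return total, path


# Three positions, two candidates each, with 1-D boundary features for readability.
candidates = [
    [((0.0,), (1.0,)), ((0.0,), (5.0,))],
    [((4.0,), (4.0,)), ((1.0,), (2.0,))],
    [((2.0,), (0.0,)), ((9.0,), (0.0,))],
]
print(select_units(candidates))
```

A fuller unit-selection system would typically add a target cost per candidate; it is omitted here because the abstract mentions only the concatenation distortion.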

Hybrid Power-Saving Mode Considering VoIP Traffic in IEEE 802.16e Systems (IEEE 802.16e 시스템에서 VoIP 트래픽을 고려한 혼합 전원 절약 모드)

Lee, Jung-Ryun
- Journal of Korea Multimedia Society / v.10 no.4 / pp.450-461 / 2007
In this paper, we propose a method that simultaneously uses a power-saving mode (PSM) applicable to non-real-time traffic and a PSM applicable to real-time traffic, for VoIP traffic with silence suppression. The proposed method uses PSC II during the talk-spurt intervals of parties A and/or B, and uses PSC I or the probabilistic sleep interval decision (PSID) method during mutual silence intervals. To evaluate the performance of the hybrid PSM (HPSM) based on PSC II or the PSID method, we present the average buffering delay, the energy consumption of the mobile station, and the VoIP packet drop probability obtained from simulation runs. The results show that the proposed HPSM decreases the energy consumption of the mobile station by up to 25% while keeping the packet drop probability within the QoS requirement for an end-to-end VoIP connection.

Reproductive Cycle of the Spring-Spawning Bitterling, Rhodeus uyekii(Pisces : Cyprinidae) (각시붕어, Rhodeus uyekii의 생식주기)

An, Cheul-Min
- Korean Journal of Ichthyology / v.7 no.1 / pp.33-42 / 1995
The reproductive cycle of the bitterling, Rhodeus uyekii, was studied by observing annual variations in the gonadosomatic index (GSI), the size frequency distribution of eggs, ovipositor length, and histological changes in the gonads. GSI began to increase in February, when the water temperature started to rise, and reached its maximum in May; it began to decrease in July and reached its minimum in August, the season of highest water temperature. It began to increase again but remained low from September to November, and stayed stable thereafter. Monthly changes in GSI, ovipositor length, egg diameter frequency, and gonadal histology showed that the annual reproductive cycle can be classified into the following successive phases: primary growing phase from September to November, quiescent phase in December, secondary growing and mature phase from January to February, ripe and spawning phase from March to June, and recovery and resting phase from July to August.
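The abstract above does not state how GSI was computed. Under the conventional definition, assumed here, it is gonad weight expressed as a percentage of body weight; the symbols are introduced only for this illustration.

```latex
\mathrm{GSI} = \frac{W_{\text{gonad}}}{W_{\text{body}}} \times 100
% Worked example with made-up weights:
% W_gonad = 0.5 g, W_body = 5.0 g  =>  GSI = (0.5 / 5.0) * 100 = 10
```

Studies differ on whether body weight means total or eviscerated weight, so GSI values are comparable across papers only when that choice matches.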

A Scheme for Reuse of Residual Energy in a Multi-cell Battery System (다중전지 시스템에서 잔류 에너지의 재활용 방법)

Yun, Woong-Jin;Baek, Je-In
- Journal of the Institute of Electronics Engineers of Korea SC / v.46 no.6 / pp.21-27 / 2009
As portable electronic systems are used more often, lengthening the lifetime of the system's battery becomes a more important issue, for instance by developing batteries of higher efficiency. A simple and practical way to lengthen the lifetime is to connect multiple batteries in parallel. In this paper we present a different idea for using multiple batteries, in which the residual energy of each battery is recycled. The idea rests on a common phenomenon: a battery cell that has been used until its voltage drops below a reference level may still hold some residual energy, so its voltage can recover when the cell rests for a while. As a practical realization of this idea, we introduce a multi-cell configuration with a cell selection switch and examine its feasibility through experimental observation of battery discharge behavior. We found that the proposed method can lengthen the lifetime of an alkaline primary battery cell by approximately one to two hours.

Search Result 25, Processing Time 0.022 seconds
