Context-adaptive Phoneme Segmentation for a TTS Database

The Journal of the Acoustical Society of Korea (한국음향학회지)

Volume 22, Issue 2 / Pages 135-144 / 2003 / pISSN 1225-4428 / eISSN 2287-3775

The Acoustical Society of Korea (한국음향학회)

문자-음성 합성기의 데이터 베이스를 위한 문맥 적응 음소 분할

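Lee Ki-Seung (이기승), Department of Electronic Engineering, College of Information and Communication, Konkuk University;
Kim Jeong-Su (김정수), Human-Computer Interaction Lab, Samsung Advanced Institute of Technology, Samsung Electronics Co., Ltd.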

Published : 2003.02.01


Abstract

A method for the automatic segmentation of speech signals is described. The method is intended for building a large database for a text-to-speech (TTS) synthesis system. The work focuses on refining initial estimates of phone boundaries, which are provided by an alignment based on a hidden Markov model (HMM). A multi-layer perceptron (MLP) is used as the phone boundary detector. To improve segmentation performance, a technique is proposed that trains a separate MLP for each class of phonetic transition. The partitioning of the entire phonetic transition space is optimized to minimize the overall deviation from hand-labelled boundary positions. In experiments with single-speaker data, more than 95% of all phone boundaries deviated from the reference position by less than 20 ms, and boundary refinement reduced the root mean square error by about 25%.
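The refinement step in the abstract can be pictured with a minimal Python sketch. It is not the authors' implementation: the function names, the 10 ms frame shift, the 30 ms search window, and the toy `spectral_jump` scorer (a stand-in for a trained MLP boundary detector) are all assumptions made for illustration. The sketch only shows the general idea of choosing a detector by phonetic transition class and moving each initial HMM boundary to the best-scoring frame nearby.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Frame = Sequence[float]
# A scorer rates how boundary-like frame index i is. In the paper this role is
# played by a trained MLP; here it is any callable (hypothetical interface).
Scorer = Callable[[Sequence[Frame], int], float]


def spectral_jump(frames: Sequence[Frame], i: int) -> float:
    """Toy stand-in for an MLP: absolute feature change from frame i-1 to i."""
    if i == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(frames[i], frames[i - 1]))


def refine_boundaries(
    frames: Sequence[Frame],
    initial: Sequence[Tuple[float, str, str]],   # (time_ms, left class, right class)
    scorers: Dict[Tuple[str, str], Scorer],      # one detector per transition class
    default: Scorer,                             # fallback for unseen transitions
    frame_shift_ms: float = 10.0,                # assumed analysis frame shift
    search_ms: float = 30.0,                     # assumed search window half-width
) -> List[float]:
    """Move each initial boundary to the best-scoring frame within the window."""
    radius = int(round(search_ms / frame_shift_ms))
    refined = []
    for t_ms, left, right in initial:
        scorer = scorers.get((left, right), default)
        centre = int(round(t_ms / frame_shift_ms))
        lo = max(0, centre - radius)
        hi = min(len(frames) - 1, centre + radius)
        best = max(range(lo, hi + 1), key=lambda i: scorer(frames, i))
        refined.append(best * frame_shift_ms)
    return refined


# Usage: a feature track that changes abruptly at frame 8 (80 ms), with an
# initial boundary placed 20 ms too early.
frames = [[0.0]] * 8 + [[1.0]] * 8
refined = refine_boundaries(frames, [(60.0, "vowel", "stop")], {}, spectral_jump)
```

Keying the detectors on the (left, right) transition class mirrors the context-adaptive idea: a boundary between a vowel and a stop looks acoustically different from one between two vowels, so a single detector is unlikely to suit both.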

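This paper describes an automatic speech segmentation technique intended for building the large database used in a text-to-speech synthesizer. The work consists of phoneme segmentation based on a hidden Markov model and a method that takes the resulting boundaries as initial phoneme boundaries and corrects them automatically. A multi-layer perceptron is used as the phoneme boundary detector, and, to improve segmentation performance, a method is proposed that trains multi-layer perceptrons individually according to the phoneme transition pattern. Using hand-made labels as the reference phoneme boundaries, the phoneme transition patterns are partitioned so as to minimize the overall error between the reference and the estimated phoneme boundaries. In single-speaker experiments, 95% of the phoneme boundaries produced by the proposed technique had a boundary error within 20 msec of the reference, and the refinement improved the root mean square error by 25%.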
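The two figures of merit quoted in the abstract, the share of boundaries within 20 ms of the hand-labelled reference and the root mean square error, can be computed as in the following sketch. The function names and the sample numbers are hypothetical and are not data from the paper; the strict "smaller than 20 ms" comparison follows the wording of the English abstract.

```python
import math
from typing import Sequence, Tuple


def boundary_metrics(
    estimated_ms: Sequence[float],
    reference_ms: Sequence[float],
    tolerance_ms: float = 20.0,
) -> Tuple[float, float]:
    """Return (fraction of boundaries within tolerance, RMSE in ms)."""
    if len(estimated_ms) != len(reference_ms) or not reference_ms:
        raise ValueError("need two non-empty boundary lists of equal length")
    errors = [e - r for e, r in zip(estimated_ms, reference_ms)]
    within = sum(abs(d) < tolerance_ms for d in errors) / len(errors)
    rmse = math.sqrt(sum(d * d for d in errors) / len(errors))
    return within, rmse


def relative_rmse_reduction(rmse_before: float, rmse_after: float) -> float:
    """Fractional RMSE improvement of refined over initial boundaries."""
    return (rmse_before - rmse_after) / rmse_before


# Made-up example: three boundaries before and after refinement.
reference = [100.0, 200.0, 300.0]
initial = [100.0, 210.0, 330.0]
refined = [100.0, 205.0, 310.0]
within_before, rmse_before = boundary_metrics(initial, reference)
within_after, rmse_after = boundary_metrics(refined, reference)
gain = relative_rmse_reduction(rmse_before, rmse_after)
```

Reporting both numbers is useful because they answer different questions: the tolerance rate counts gross misplacements, while the RMSE also reflects how far the remaining boundaries drift.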

