• 제목/요약/키워드: text-to-speech system

검색결과 246건 처리시간 0.027초

A New Vocoder based on AMR 7.4Kbit/s Mode for Speaker Dependent System (화자 의존 환경의 AMR 7.4Kbit/s모드에 기반한 보코더)

  • Min, Byung-Jae;Park, Dong-Chul
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • 제33권9C호
    • /
    • pp.691-696
    • /
    • 2008
  • A new vocoder of Code Excited Linear Predictive (CELP) based on Adaptive Multi Rate (AMR) 7.4kbit/s mode is proposed in this paper. The proposed vocoder achieves a better compression rate in an environment of Speaker Dependent Coding System (SDSC) and is efficiently used for systems, such as OGM(Outgoing message) and TTS(Text To Speech), which needs only one person's speech. In order to enhance the compression rate of a coder, a new Line Spectral Pairs(LSP) code-book is employed by using Centroid Neural Network (CNN) algorithm. In comparison with original(traditional) AMR 7.4 Kbit/s coder, the new coder shows 27% higher compression rate while preserving synthesized speech quality in terms of Mean Opinion Score(MOS).

design and Implementation of English part of speech tagging system by transformation rule base. (변형 규칙 기반 영어 품사 태깅 시스템의 설계 및 구현)

  • 이태식;이상윤최병욱김한우
    • Proceedings of the IEEK Conference
    • /
    • 대한전자공학회 1998년도 추계종합학술대회 논문집
    • /
    • pp.527-530
    • /
    • 1998
  • In this paper, a transformation-based English part of speech tagging system is designed and implemented. The tagging system tags raw corpus at first and the transformation rule correct the errors. Apart from traditional rule based tagging system, this system makes rules automatically. Using 60,000 words of corpus as a training corpus, the transformation rules are generated automatically by iterative training. The idea how to calculate positive effect of transformation and select transformation rules is proposed to generate more effective and correct transformations. In this paper, part of the Brown corpus and English text is used for experimental data. And the performance of transformation based tagging system is demonstrated by the calculation of accuracy.

  • PDF

Speaker Identification in Small Training Data Environment using MLLR Adaptation Method (MLLR 화자적응 기법을 이용한 적은 학습자료 환경의 화자식별)

  • Kim, Se-hyun;Oh, Yung-Hwan
    • Proceedings of the KSPS conference
    • /
    • 대한음성학회 2005년도 추계 학술대회 발표논문집
    • /
    • pp.159-162
    • /
    • 2005
  • Identification is the process automatically identify who is speaking on the basis of information obtained from speech waves. In training phase, each speaker models are trained using each speaker's speech data. GMMs (Gaussian Mixture Models), which have been successfully applied to speaker modeling in text-independent speaker identification, are not efficient in insufficient training data environment. This paper proposes speaker modeling method using MLLR (Maximum Likelihood Linear Regression) method which is used for speaker adaptation in speech recognition. We make SD-like model using MLLR adaptation method instead of speaker dependent model (SD). Proposed system outperforms the GMMs in small training data environment.

  • PDF

A computational algorithm for F0 contour generation in Korean developed with prosodically labeled databases using K-ToBI system (K-ToBI 기호에 준한 F0 곡선 생성 알고리듬)

  • Lee YongJu;Lee Sook-hyang;Kim Jong-Jin;Go Hyeon-Ju;Kim Yeong-Il;Kim Sang-Hun;Lee Jeong-Cheol
    • MALSORI
    • /
    • 제35_36호
    • /
    • pp.131-143
    • /
    • 1998
  • This study describes an algorithm for the F0 contour generation system for Korean sentences and its evaluation results. 400 K-ToBI labeled utterances were used which were read by one male and one female announcers. F0 contour generation system uses two classification trees for prediction of K-ToBI labels for input text and 11 regression trees for prediction of F0 values for the labels. Evaluation results of the system showed 77.2% prediction accuracy for prediction of IP boundaries and 72.0% prediction accuracy for AP boundaries. Information of voicing and duration of the segments was not changed for F0 contour generation and its evaluation. Evaluation results showed 23.5Hz RMS error and 0.55 correlation coefficient in F0 generation experiment using labelling information from the original speech data.

  • PDF

On-Line Linear Combination of Classifiers Based on Incremental Information in Speaker Verification

  • Huenupan, Fernando;Yoma, Nestor Becerra;Garreton, Claudio;Molina, Carlos
    • ETRI Journal
    • /
    • 제32권3호
    • /
    • pp.395-405
    • /
    • 2010
  • A novel multiclassifier system (MCS) strategy is proposed and applied to a text-dependent speaker verification task. The presented scheme optimizes the linear combination of classifiers on an on-line basis. In contrast to ordinary MCS approaches, neither a priori distributions nor pre-tuned parameters are required. The idea is to improve the most accurate classifier by making use of the incremental information provided by the second classifier. The on-line multiclassifier optimization approach is applicable to any pattern recognition problem. The proposed method needs neither a priori distributions nor pre-estimated weights, and does not make use of any consideration about training/testing matching conditions. Results with Yoho database show that the presented approach can lead to reductions in equal error rate as high as 28%, when compared with the most accurate classifier, and 11% against a standard method for the optimization of linear combination of classifiers.

A Preprocessing for the Unlimited Korean Text to Speech System (무제한 음성합성 시스팀을 위한 전처리과정)

  • 강용범
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 한국음향학회 1994년도 제11회 음성통신 및 신호처리 워크샵 논문집 (SCAS 11권 1호)
    • /
    • pp.334-337
    • /
    • 1994
  • 본 논문에서는 형식형태소 사전 및 1300여 단어의 발음예외 사전을 이용하여 무제한 dam성합성을 위한 전처리과정을 구현하였으며, 문자열 정형화부, 형식셩태소 중심의 문장분석 및 자소단위의 음운변동과정에 관하여 논한다.

  • PDF

The Effect of the Telephone Channel to the Performance of the Speaker Verification System (전화선 채널이 화자확인 시스템의 성능에 미치는 영향)

  • 조태현;김유진;이재영;정재호
    • The Journal of the Acoustical Society of Korea
    • /
    • 제18권5호
    • /
    • pp.12-20
    • /
    • 1999
  • In this paper, we compared speaker verification performance of the speech data collected in clean environment and in channel environment. For the improvement of the performance of speaker verification gathered in channel, we have studied on the efficient feature parameters in channel environment and on the preprocessing. Speech DB for experiment is consisted of Korean doublet of numbers, considering the text-prompted system. Speech features including LPCC(Linear Predictive Cepstral Coefficient), MFCC(Mel Frequency Cepstral Coefficient), PLP(Perceptually Linear Prediction), LSP(Line Spectrum Pair) are analyzed. Also, the preprocessing of filtering to remove channel noise is studied. To remove or compensate for the channel effect from the extracted features, cepstral weighting, CMS(Cepstral Mean Subtraction), RASTA(RelAtive SpecTrAl) are applied. Also by presenting the speech recognition performance on each features and the processing, we compared speech recognition performance and speaker verification performance. For the evaluation of the applied speech features and processing methods, HTK(HMM Tool Kit) 2.0 is used. Giving different threshold according to male or female speaker, we compare EER(Equal Error Rate) on the clean speech data and channel data. Our simulation results show that, removing low band and high band channel noise by applying band pass filter(150~3800Hz) in preprocessing procedure, and extracting MFCC from the filtered speech, the best speaker verification performance was achieved from the view point of EER measurement.

  • PDF

The Study of Breath Competence Depending on Utterance Condition by Healthy Speakers: a Preliminary Study (발화조건에 따른 정상 성인의 호흡 능력 차이 비교: 예비연구)

  • Lee, In-Ae;Lee, Hye-Eun;Hwang, Young-Jin
    • Phonetics and Speech Sciences
    • /
    • 제4권2호
    • /
    • pp.115-120
    • /
    • 2012
  • This study sought to compare breath competence in three different utterance conditions when reading a passage aloud, making a spontaneous speech, and singing. We tested 15 normal females (ages averaging $24{\pm}4.4$) and measured breath competence through an objective, aero-mechanical instrument called PAS (Phonatory aerodynamic system, model 6600, KAY Electronics, Inc). Breathing sets of inspiration and expiration were measured by breath group number, breath group duration, and the ratio of inspiration to expiration. The results from this study led us to the following conclusion: The breath group number and the breath group duration showed no significant difference. However, the only variance that we could find was in the ratio of inspiration and expiration. In significantly different speech patterns, singing resulted in the most varied ratio of inspiration and expiration, followed by reading a text aloud, and spontaneous speech. The average frequency rates and maximum intensity levels varied with regards to varying utterance conditions. This thus shows that breath competence and phonation competence have a closely interrelated relationship.

A study on the Stochastic Model for Sentence Speech Understanding (문장음성 이해를 위한 확률모델에 관한 연구)

  • Roh, Yong-Wan;Hong, Kwang-Seok
    • The KIPS Transactions:PartB
    • /
    • 제10B권7호
    • /
    • pp.829-836
    • /
    • 2003
  • In this paper, we propose a stochastic model for sentence speech understanding using dictionary and thesaurus. The proposed model extracts words from an input speech or text into a sentence. A computer is sellected category of dictionary database compared the word extracting from the input sentence calculating a probability value to the compare results from stochastic model. At this time, computer read out upper dictionary information from the upper dictionary searching and extracting word compared input sentence caluclating value to the compare results from stochastic model. We compare adding the first and second probability value from the dictionary searching and the upper dictionary searching with threshold probability that we measure the sentence understanding rate. We evaluated the performance of the sentence speech understanding system by applying twenty questions game. As the experiment results, we got sentence speech understanding accuracy of 79.8%. In this case, probability ($\alpha$) of high level word is 0.9 and threshold probability ($\beta$) is 0.38.

Performance Improvement of a Text-Independent Speaker Identification System Using MCE Training (MCE 학습 알고리즘을 이용한 문장독립형 화자식별의 성능 개선)

  • Kim Tae-Jin;Choi Jae-Gil;Kwon Chul-Hong
    • MALSORI
    • /
    • 제57호
    • /
    • pp.165-174
    • /
    • 2006
  • In this paper we use a training algorithm, MCE (Minimum Classification Error), to improve the performance of a text-independent speaker identification system. The MCE training scheme takes account of possible competing speaker hypotheses and tries to reduce the probability of incorrect hypotheses. Experiments performed on a small set speaker identification task show that the discriminant training method using MCE can reduce identification errors by up to 54% over a baseline system trained using Bayesian adaptation to derive GMM (Gaussian Mixture Models) speaker models from a UBM (Universal Background Model).

  • PDF