• Title/Abstract/Keywords: Korean speech

5,286 results (search time 0.261 s)

Acoustic Modeling and Energy-Based Postprocessing for Automatic Speech Segmentation

  • 박혜영;김형순
    • 대한음성학회지:말소리 / No. 43 / pp. 137-150 / 2002
  • Speech segmentation at the phoneme level is important for corpus-based text-to-speech synthesis. In this paper, we examine acoustic modeling methods for improving an automatic speech segmentation system based on Hidden Markov Models (HMMs). We compare monophone and triphone models and evaluate several model training approaches. In addition, we apply an energy-based postprocessing scheme to correct the frequent boundary location errors between silence and speech sounds. Experimental results show that our system places 71.3% and 84.2% of boundaries correctly within tolerances of 10 ms and 20 ms, respectively.

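The tolerance-based figures in the abstract above (71.3% and 84.2% at 10 ms and 20 ms) are a standard way to score forced-alignment segmentation. The following minimal Python sketch shows how such a score is commonly computed. It assumes that the HMM aligner emits exactly one boundary per manually labeled reference boundary, in the same order and in milliseconds; the function name and example times are hypothetical and are not taken from the paper.

```python
def boundary_accuracy(predicted_ms, reference_ms, tolerance_ms):
    """Percentage of reference boundaries matched within +/- tolerance_ms.

    Assumption: forced alignment yields one predicted boundary per
    reference boundary, so the two lists are paired by position.
    """
    if len(predicted_ms) != len(reference_ms):
        raise ValueError("expected one predicted boundary per reference boundary")
    if not reference_ms:
        return 0.0
    hits = sum(1 for p, r in zip(predicted_ms, reference_ms)
               if abs(p - r) <= tolerance_ms)
    return 100.0 * hits / len(reference_ms)


# Hypothetical boundary times (ms) for one utterance.
reference = [120, 250, 400, 610]
predicted = [125, 268, 402, 640]
for tol in (10, 20):
    print(f"{tol} ms: {boundary_accuracy(predicted, reference, tol):.1f}%")
```

With these toy values the errors are 5, 18, 2, and 30 ms, so 2 of 4 boundaries fall within 10 ms and 3 of 4 within 20 ms.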

Quality Improvement of Bandwidth Extended Speech Using Mixed Excitation Model

  • 최무열;김형순
    • 대한음성학회지:말소리 / No. 52 / pp. 133-144 / 2004
  • The quality of narrowband speech can be enhanced by bandwidth extension. This paper proposes mixed-excitation and energy-compensation methods based on a Gaussian Mixture Model (GMM). First, we employ a mixed excitation model that has both periodic and aperiodic components in the frequency domain. A filter bank extracts periodicity features from the band-filtered signals, and these features are modeled with a GMM to estimate the mixed excitation. Second, to compensate more accurately for the energy difference between the narrowband speech and the reconstructed highband or lowband speech, we divide the acoustic space into voiced and unvoiced regions. Objective and subjective evaluations show that wideband speech reconstructed by the proposed method has higher quality than speech reconstructed by the conventional bandwidth extension method.

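The mixed excitation described in the abstract above combines a periodic (pulse-train) component with an aperiodic (noise) component. The Python sketch below shows that mixing for a single frame under a deliberate simplification: one voicing weight is applied to the whole frame, whereas the paper estimates the periodic/aperiodic balance per frequency band from filter-bank features with a GMM, a step not reproduced here. The function names and the parameters `pitch_period` and `voicing` are hypothetical.

```python
import math
import random


def _to_unit_power(x):
    """Scale a signal to unit mean-square power (left unchanged if silent)."""
    rms = math.sqrt(sum(s * s for s in x) / len(x))
    return [s / rms for s in x] if rms > 0 else list(x)


def mixed_excitation(n_samples, pitch_period, voicing, seed=0):
    """One frame of excitation mixing a pulse train with white Gaussian noise.

    voicing in [0, 1] is the share of power given to the periodic part.
    Simplification (assumption): a single voicing value for the whole band.
    """
    if not 0.0 <= voicing <= 1.0:
        raise ValueError("voicing must lie in [0, 1]")
    if n_samples <= 0 or pitch_period <= 0:
        raise ValueError("n_samples and pitch_period must be positive")
    rng = random.Random(seed)
    # Periodic part: one impulse every pitch_period samples.
    pulses = _to_unit_power([1.0 if i % pitch_period == 0 else 0.0
                             for i in range(n_samples)])
    # Aperiodic part: white Gaussian noise.
    noise = _to_unit_power([rng.gauss(0.0, 1.0) for _ in range(n_samples)])
    a, b = math.sqrt(voicing), math.sqrt(1.0 - voicing)
    return [a * p + b * q for p, q in zip(pulses, noise)]
```

Setting `voicing` to 1.0 yields a pure pulse train and 0.0 yields pure noise; because both components are scaled to unit mean-square power before mixing, `voicing` controls their relative power.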

Speech Database Design and Structuring for High Quality TTS

  • 강동규;이승훈;류원호
    • 대한음성학회: Conference Proceedings / 대한음성학회 November 2002 Conference Proceedings / pp. 33-36 / 2002
  • As telematics services, which integrate information technologies, approach commercialization, the need for and importance of speech technology are growing rapidly. Speech technology occupies an important position in telematics services because it announces the start of a service and delivers the retrieved results. Such services must provide highly accurate speech recognition and natural-sounding speech synthesis in a driving environment, especially when they are fee-based. High-quality TTS requires a synthesis technique that builds an optimal synthesis database and uses it efficiently. In this paper, we describe the design of the phonetically balanced sentences used for the speech database, the selection of a speaker suited to the service, methods for extracting accurate phoneme boundaries, and the factors considered at the prosody-extraction stage. Finally, we present a real case that has been implemented commercially.

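Phonetically balanced sentence sets, as mentioned in the abstract above, are often built by greedy selection from a large text pool: repeatedly choose the sentence that adds the most not-yet-covered phonetic units. The Python sketch below illustrates that generic greedy set-cover idea. The abstract does not say which selection procedure or units the authors used, so the choice of diphones as units, the toy phone sequences, and the function names are all assumptions.

```python
def diphones(phones):
    """Set of adjacent phone pairs (diphones) in one sentence."""
    return {(a, b) for a, b in zip(phones, phones[1:])}


def greedy_balanced_selection(sentences, units_of, max_sentences):
    """Pick up to max_sentences sentences, each time taking the one that adds
    the most units not yet covered; stop early when nothing new is added."""
    covered = set()
    chosen = []
    remaining = list(sentences)
    while remaining and len(chosen) < max_sentences:
        best = max(remaining, key=lambda s: len(units_of(s) - covered))
        gain = units_of(best) - covered
        if not gain:
            break  # every remaining sentence is redundant
        chosen.append(best)
        covered |= gain
        remaining.remove(best)
    return chosen, covered


# Hypothetical pool: each sentence already converted to a phone sequence.
pool = [("a", "n", "a"), ("k", "a", "n", "a"), ("s", "a", "k")]
chosen, covered = greedy_balanced_selection(pool, diphones, max_sentences=10)
print(chosen)
print(len(covered))
```

In this toy pool the second sentence is chosen first (three new diphones), then the third (two more); the first adds nothing new and is skipped. Real systems usually also weight units by frequency or prosodic context, which this sketch omits.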

A Study on the Bandwidth Extension Adopted for 4800 bps CELP Speech Coder

  • 박진수;김형순
    • 대한음성학회: Conference Proceedings / 대한음성학회 November 2002 Conference Proceedings / pp. 175-178 / 2002
  • Most existing telephone networks transmit narrowband speech, which is band-limited to below 4 kHz. Compared with wideband speech extending up to 8 kHz, narrowband speech has reduced intelligibility and a muffled quality. Bandwidth extension generates wideband speech by reconstructing the 4-8 kHz highband without any additional information. This paper presents experimental results of bandwidth extension applied to a 4800 bps CELP speech coder. We examine various methods for reconstructing the wideband spectrum and excitation signal, and we compare and analyze their performance using a subjective preference test and cepstral distortion measurements.

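The cepstral distortion used as an objective measure in the abstract above is commonly reported in decibels as CD = (10 / ln 10) * sqrt(2 * sum_k (c_k - c'_k)^2), where c_k and c'_k are the reference and reconstructed cepstral coefficients for k >= 1. The abstract does not specify the exact variant (cepstral order, or whether only the highband is scored), so the Python sketch below assumes the common form that excludes the energy term c0 and averages the per-frame value over time-aligned frames. The function names are hypothetical.

```python
import math

DB_SCALE = 10.0 / math.log(10.0)  # converts natural-log cepstral distance to dB


def frame_cepstral_distortion(c_ref, c_test):
    """Cepstral distortion (dB) between two cepstral vectors of one frame.

    Index 0 (c0, mainly frame energy) is excluded, as is common practice.
    """
    if len(c_ref) != len(c_test):
        raise ValueError("cepstral vectors must have the same order")
    sq = sum((a - b) ** 2 for a, b in zip(c_ref[1:], c_test[1:]))
    return DB_SCALE * math.sqrt(2.0 * sq)


def mean_cepstral_distortion(ref_frames, test_frames):
    """Average frame-level distortion over time-aligned frames."""
    if len(ref_frames) != len(test_frames) or not ref_frames:
        raise ValueError("need the same, non-zero number of frames")
    total = sum(frame_cepstral_distortion(r, t)
                for r, t in zip(ref_frames, test_frames))
    return total / len(ref_frames)
```

Identical frames give 0 dB; lower averages indicate a reconstructed spectrum closer to the original wideband reference.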

Harmonics-based Spectral Subtraction and Feature Vector Normalization for Robust Speech Recognition

  • Beh, Joung-Hoon;Lee, Heung-Kyu;Kwon, Oh-Il;Ko, Han-Seok
    • 음성과학 / Vol. 11, No. 1 / pp. 7-20 / 2004
  • In this paper, we propose a two-step noise compensation algorithm in feature extraction for robust speech recognition. The proposed method requires no a priori information about the noise environment and is simple to implement. First, Harmonics-based Spectral Subtraction (HSS) is applied in the frequency domain to reduce additive background noise and make the harmonic structure of the speech spectrum more pronounced. We then apply a weighted-variance Feature Vector Normalization (FVN) to compensate for both channel distortion and additive noise. The weighted-variance FVN compensates for the variance mismatch separately in speech and non-speech regions. Evaluation on the Aurora 2 database shows that the proposed method yields a relative accuracy improvement of 27.18% under the multi-noise training task and 57.94% under the clean training task.

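A common baseline for feature vector normalization, as used in the abstract above, is per-utterance mean and variance normalization (MVN) of each feature dimension. The Python sketch below shows only that plain normalization. The paper's weighted-variance variant, which treats speech and non-speech regions separately, is not specified in the abstract and is not reproduced; applying separate statistics to frames that a voice activity detector labels as speech or non-speech would be one possible extension, but that is an assumption. The function name is hypothetical.

```python
import math


def mean_variance_normalize(frames, eps=1e-8):
    """Normalize each feature dimension to zero mean and unit variance.

    frames: list of equal-length feature vectors (e.g. MFCCs) for one utterance.
    eps guards against division by zero for constant dimensions.
    Plain MVN only; the paper's speech/non-speech weighting is not modeled.
    """
    n = len(frames)
    if n == 0:
        return []
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = [math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
            for d in range(dim)]
    return [[(f[d] - means[d]) / (stds[d] + eps) for d in range(dim)]
            for f in frames]
```

Normalizing each dimension this way removes a constant channel offset (the mean) and rescales the dynamic range (the variance) that additive noise tends to compress.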

Low Frequency Perception of Rhythm and Intonation Speech Patterns by Normal Hearing Adults

  • Kim, Young-Sun;Asp, Carl-W.
    • 음성과학 / Vol. 9, No. 1 / pp. 7-16 / 2002
  • This study tested how well normal-hearing adults perceive rhythm and intonation patterns from low-frequency speech energy. The results showed that narrowband low-frequency zones at 125, 250, or 500 Hz provided the same important rhythm and intonation cues as the wideband condition. This suggests that an auditory training strategy using low-frequency filters would be effective for structuring or restructuring the perception of rhythm and intonation patterns. Because such filtering drastically reduces speech intelligibility, it forces the client to focus on these patterns. The strategy can be used with normal-hearing and hearing-impaired children and adults who have poor listening skills and, possibly, poor speech intelligibility.


A CART-based diagnostic model using speech technology for evaluating mental fatigue caused by monotonous work

  • 권철홍
    • 말소리와 음성과학 / Vol. 8, No. 4 / pp. 97-101 / 2016
  • This paper presents a CART (Classification and Regression Tree)-based model for diagnosing mental fatigue using speech technology. The model uses two kinds of input: speech parameters that are significantly and highly correlated with fatigue, and questionnaire responses collected before and after fatigue was induced. Experiments show that the proposed model achieves classification accuracies of 96.67% using the speech parameters and 98.33% using the questionnaire responses. This suggests that the proposed model can serve as a tool for diagnosing mental fatigue and that speech technology is useful for this purpose.
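CART, the model type named in the abstract above, grows a binary tree by choosing, at each node, the feature and threshold whose split most reduces class impurity, commonly measured with the Gini index. The Python sketch below shows one such split search on a single numeric feature. The feature values and the "rested"/"fatigued" labels are hypothetical, and the abstract does not give the paper's speech parameters, impurity criterion, stopping rule, or pruning, so none of those are reproduced.

```python
def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels):
    """Best binary split 'x <= threshold' on one numeric feature.

    Returns (threshold, weighted Gini impurity of the two children);
    threshold is None if no split improves on the unsplit node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical values of one speech parameter for six recordings.
values = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
labels = ["rested"] * 3 + ["fatigued"] * 3
print(best_split(values, labels))
```

Here the split at 5.0 separates the two classes perfectly, so the weighted impurity drops from 0.5 to 0.0; a full tree would repeat this search over all features at every node.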

Development of a Weather Forecast Service Based on AIN Using Speech Recognition

  • 박성준;김재인;구명완;전주식
    • 대한음성학회지:말소리 / No. 51 / pp. 137-149 / 2004
  • A weather forecast service with speech recognition is described. The service lets users obtain weather information for all cities by speaking the city names, all within a single phone call, which the previous weather forecast service did not support. Speech recognition is implemented in the intelligent peripheral (IP) of the advanced intelligent network (AIN). The AIN is a telephone network architecture that separates service logic from switching equipment, so new services can be added without redesigning the switches. In speech recognition experiments, accuracy was 90.06% on the general users' speech database. On the laboratory members' speech database, accuracy was 95.04% in simulation and 93.81% in tests on the developed system.


Current States and Future Plans at SiTEC for Speech Corpora for Common Use

  • 김봉완;최대림;김영일;이광현;이용주
    • 대한음성학회지:말소리 / No. 46 / pp. 175-185 / 2003
  • To support the speech information technology industry, it is vital to create and distribute standardized speech corpora for developing products and technologies. In this article, we introduce the speech corpora created by the Speech Information Technology & Industry Promotion Center (SiTEC) during its first and second fiscal years (May 1, 2001 to April 30, 2003), as well as plans for corpora that are currently being created or will be created in the near future. These include a corpus for in-car applications, which extends speech information technology into traditional industries; foreign-language corpora to support exports; a corpus for basic research aimed at industrial application; corpora for common use; and others.


Performance Evaluation of Frame Erasure Concealment Algorithms in VoIP Coders

  • 한승호;문광;한민수
    • 대한음성학회: Conference Proceedings / 대한음성학회 2004 Spring Conference Proceedings / pp. 235-238 / 2004
  • Frame erasures degrade speech quality in wireless communication networks and packet networks, and the degradation becomes worse when consecutive frames are erased. Speech coders include a frame erasure concealment (FEC) mechanism to compensate for erased frames. It is therefore worthwhile to evaluate how FEC mechanisms perform under the kinds of frame erasures that occur in communication networks. In this paper, we design various frame erasure conditions and evaluate and analyze the FEC algorithms of several speech coders using Perceptual Evaluation of Speech Quality (PESQ). Performance is found to vary with the frame erasure type, the frame erasure rate, and the utterance length.

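The abstract above distinguishes isolated from consecutive frame erasures. One common way to generate both kinds of test pattern is a two-state Gilbert model: a frame is erased whenever the channel is in the Bad state, the long-run erasure rate is p / (p + q), and the mean burst length is 1 / q, where p is the Good-to-Bad and q the Bad-to-Good transition probability. The Python sketch below is an assumption about how such patterns could be produced, not the paper's erasure design; the function names are hypothetical.

```python
import random


def gilbert_erasures(n_frames, p_good_to_bad, p_bad_to_good, seed=0):
    """0/1 erasure pattern (1 = erased frame) from a two-state Gilbert model."""
    rng = random.Random(seed)
    bad = False  # start in the Good state
    pattern = []
    for _ in range(n_frames):
        if bad:
            bad = rng.random() >= p_bad_to_good  # stay Bad with prob 1 - q
        else:
            bad = rng.random() < p_good_to_bad   # enter Bad with prob p
        pattern.append(1 if bad else 0)
    return pattern


def erasure_stats(pattern):
    """Return (erasure rate, mean burst length) measured on a pattern."""
    bursts, run = [], 0
    for lost in pattern:
        if lost:
            run += 1
        elif run:
            bursts.append(run)
            run = 0
    if run:
        bursts.append(run)
    rate = sum(pattern) / len(pattern) if pattern else 0.0
    mean_burst = sum(bursts) / len(bursts) if bursts else 0.0
    return rate, mean_burst


pattern = gilbert_erasures(10_000, p_good_to_bad=0.02, p_bad_to_good=0.5)
print(erasure_stats(pattern))
```

Choosing q = 1 - p makes the erasures independent with rate p, while a small q at the same average rate produces long bursts of consecutive erasures, the case the abstract identifies as most damaging.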