• Title/Summary/Keyword: Human speech


A New Pruning Method for Synthesis Database Reduction Using Weighted Vector Quantization

  • Kim, Sanghun;Lee, Youngjik;Keikichi Hirose
    • The Journal of the Acoustical Society of Korea / v.20 no.4E / pp.31-38 / 2001
  • A large-scale synthesis database for a unit selection based synthesis method usually retains redundant synthesis unit instances that contribute nothing to synthetic speech quality. In this paper, to eliminate those instances from the synthesis database, we propose a new pruning method called weighted vector quantization (WVQ). WVQ reflects the relative importance of each synthesis unit instance when clustering similar instances with the vector quantization (VQ) technique. The proposed method was compared with two conventional pruning methods through objective and subjective evaluations of synthetic speech quality: one that simply limits the maximum number of instances, and one based on normal VQ-based clustering. The proposed method showed the best performance at reduction rates below 50%. Above a 50% reduction rate, the synthetic speech quality is perceptibly, though not seriously, degraded. Using the proposed method, the synthesis database can thus be efficiently reduced without serious degradation of synthetic speech quality.
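
The weight-aware clustering described above can be sketched as a weighted k-means over unit feature vectors. Everything here (the feature vectors, the importance weights, and the keep-nearest-instance pruning rule) is an illustrative assumption, not the paper's actual WVQ formulation:

```python
import numpy as np

def weighted_vq_prune(instances, weights, n_clusters, n_iter=20, seed=0):
    """Cluster unit instances with a weight-aware VQ (weighted k-means)
    and keep one representative instance per cluster."""
    rng = np.random.default_rng(seed)
    X = np.asarray(instances, dtype=float)
    w = np.asarray(weights, dtype=float)
    # Initialize the codebook from randomly chosen instances.
    codebook = X[rng.choice(len(X), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        # Assign each instance to its nearest codeword.
        d = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Update codewords as weighted means, so important (e.g. frequently
        # selected) instances pull the centroids toward themselves.
        for k in range(n_clusters):
            mask = labels == k
            if mask.any():
                codebook[k] = np.average(X[mask], axis=0, weights=w[mask])
    # Prune: retain only the single instance nearest each codeword.
    d = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    kept = np.unique(d.argmin(axis=0))
    return kept

# Three synthetic clusters of 2-D "unit features", uniform weights.
X = np.vstack([np.random.default_rng(1).normal(c, 0.1, size=(10, 2))
               for c in (0.0, 1.0, 2.0)])
kept = weighted_vq_prune(X, np.ones(len(X)), n_clusters=3)
print(len(kept))  # at most n_clusters representatives survive
```

Because the codeword update is a weighted mean, the retained representatives are biased toward the instances the weighting marks as important.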

Some effects of audio-visual speech in perceiving Korean

  • Kim, Jee-Sun;Davis, Chris
    • Annual Conference on Human and Language Technology / 1999.10e / pp.335-342 / 1999
  • The experiments reported here investigated whether seeing a speaker's face (visible speech) affects the perception and memory of Korean speech sounds. In order to exclude the possibility of top-down, knowledge-based influences on perception and memory, the experiments tested people with no knowledge of Korean. The first experiment examined whether visible speech (Auditory and Visual - AV) assists English native speakers (with no knowledge of Korean) in detecting a syllable within a Korean speech phrase. It was found that a syllable was more likely to be detected within a phrase when the participants could see the speaker's face. The second experiment investigated whether English native speakers' judgments about the duration of a Korean phrase would be affected by visible speech. It was found that in the AV condition participants' estimates of phrase duration were highly correlated with the actual durations, whereas those in the auditory-only (AO) condition were not. The results are discussed with respect to the benefits of communication with multimodal information and future applications.

Adaptive Noise Suppression system based on Human Auditory Model (인간의 청각모델에 기초한 잡음환경에 적응된 잡음억압 시스템)

  • Choi, Jae-Seung
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2008.05a / pp.421-424 / 2008
  • This paper proposes an adaptive noise suppression system based on a human auditory model to enhance speech signals degraded by various background noises. The proposed system detects voiced and unvoiced sections in each frame, applies the adaptive auditory process, and then reduces noise in the speech signal using a neural network that handles both the amplitude and phase components. Based on signal-to-noise ratio measurements, experiments confirm that the proposed system is effective for speech signals degraded by various noises.

The Status and Research Themes of Speech based Multimodal Interface Technology (음성기반 멀티모달 인터페이스 기술 현황 및 과제)

  • Lee ChiGeun;Lee EunSuk;Lee HaeJung;Kim BongWan;Joung SukTae;Jung SungTae;Lee YongJoo;Han MoonSung
    • Proceedings of the KSPS conference / 2002.11a / pp.111-114 / 2002
  • Complementary use of several modalities in human-to-human communication ensures high accuracy, and few communication problems occur. The multimodal interface is therefore considered the next-generation interface between human and computer. This paper presents the current status and research themes of speech-based multimodal interface technology. It first introduces the concept of a multimodal interface, then surveys recognition technologies for the input modalities and synthesis technologies for the output modalities, followed by modality integration technology. Finally, it presents research themes for speech-based multimodal interface technology.

A study of speech enhancement through wavelet analysis using the auditory mechanism (인간의 청각 메커니즘을 적용한 웨이블렛 분석을 통한 음성 향상에 대한 연구)

  • 이준석;길세기;홍준표;홍승홍
    • Proceedings of the IEEK Conference / 2002.06d / pp.397-400 / 2002
  • This paper studies a speech enhancement method for noisy environments. To that end, we take the human auditory mechanism as a model and apply the wavelet transform. The multi-resolution property of the wavelet transform makes possible a multiband spectrum analysis similar to that of the human ear. The method was verified to be very effective for enhancing noisy speech.
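
The multiband analysis that the abstract credits to wavelet multi-resolution can be sketched with a plain Haar filter bank (an assumption for illustration; the paper does not state which wavelet it uses). Each level splits the signal into a low-frequency approximation and a high-frequency detail band, yielding octave-spaced bands reminiscent of auditory frequency analysis:

```python
import numpy as np

def haar_dwt_step(x):
    """One level of a Haar wavelet transform: split a signal into a
    low-frequency approximation and a high-frequency detail band."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:                       # pad to an even length
        x = np.append(x, x[-1])
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)
    return approx, detail

def multiband(x, levels=3):
    """Recursively split the approximation band, producing the kind of
    octave-spaced analysis bands the paper likens to the human ear."""
    bands = []
    a = x
    for _ in range(levels):
        a, d = haar_dwt_step(a)
        bands.append(d)                  # detail band at this scale
    bands.append(a)                      # final coarse approximation
    return bands

sig = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 256))
bands = multiband(sig, levels=3)
print([len(b) for b in bands])  # [128, 64, 32, 32]
```

A denoiser in this framework would typically threshold the detail bands before reconstruction; the split shown here is only the analysis half.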

The Relationship between 3- and 5-year-old children's private speech and their mothers' scaffolding (3세와 5세 유아의 혼잣말과 어머니의 비계설정과의 관계)

  • Park, Young-Soon;Yoo, An-Jin
    • Korean Journal of Human Ecology / v.14 no.1 / pp.59-68 / 2005
  • This study investigated the relationship between children's private speech during an individual session and maternal scaffolding during a mother-child session. Subjects were twenty 3-year-old and twenty 5-year-old children and their mothers, recruited from day-care centers in Seoul. Mother-child interaction was videotaped for 15 minutes, and maternal utterances were transcribed for analysis of maternal scaffolding. An individual session with each child 3-5 days later was videotaped for 15 minutes, and the child's utterances were transcribed. Subcategories of maternal scaffolding were significantly related to children's private speech during the individual session, and this relationship differed with age. Among the verbal scaffolding strategies used by mothers of 3-year-olds, the other-regulation and control and praise strategies were significantly related to children's private speech; among those used by mothers of 5-year-olds, the other-regulation and control and teaching strategies were. Among maternal physical control strategies, withdrawal of physical control over the maze task over time, and withdrawal of control over 5-year-olds' physical performance, were significantly related to children's private speech.

AM-FM Decomposition and Estimation of Instantaneous Frequency and Instantaneous Amplitude of Speech Signals for Natural Human-robot Interaction (자연스런 인간-로봇 상호작용을 위한 음성 신호의 AM-FM 성분 분해 및 순간 주파수와 순간 진폭의 추정에 관한 연구)

  • Lee, He-Young
    • Speech Sciences / v.12 no.4 / pp.53-70 / 2005
  • Vowels in speech signals are multicomponent signals composed of AM-FM components whose instantaneous frequency and instantaneous amplitude are time-varying. Changes in emotional state cause variation in the instantaneous frequencies and instantaneous amplitudes of the AM-FM components. Therefore, it is important to estimate these quantities accurately in order to extract key information representing emotional states and their changes in speech signals. In this paper, a method for decomposing speech signals into AM-FM components is first addressed. Second, the fundamental frequency of a vowel sound is estimated by a simple spectrogram-based method; this estimate is used for decomposing the speech signal into AM-FM components. Third, an estimation method is suggested for separating the instantaneous frequencies and instantaneous amplitudes of the decomposed AM-FM components, based on the Hilbert transform and the demodulation property of the extended Fourier transform. The estimated instantaneous frequencies and amplitudes can be used for modifying the spectral distribution and for smoothly connecting two words in corpus-based speech synthesis systems.
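
The Hilbert-transform step mentioned above can be illustrated with SciPy's analytic-signal routine. The 200 Hz test tone, window length, and edge trimming below are illustrative assumptions, not the paper's extended-Fourier-transform method:

```python
import numpy as np
from scipy.signal import hilbert

def inst_amp_freq(x, fs):
    """Estimate instantaneous amplitude and frequency of a (roughly)
    monocomponent signal from its analytic signal."""
    z = hilbert(x)                            # analytic signal x + j*H{x}
    amp = np.abs(z)                           # instantaneous amplitude
    phase = np.unwrap(np.angle(z))
    freq = np.diff(phase) * fs / (2 * np.pi)  # instantaneous frequency (Hz)
    return amp, freq

fs = 8000
t = np.arange(0, 0.1, 1 / fs)
x = 0.5 * np.sin(2 * np.pi * 200 * t)         # one AM-FM "component"
amp, freq = inst_amp_freq(x, fs)
mid = slice(100, 700)                         # ignore edge effects
print(round(float(np.median(freq[mid])), 1))  # 200.0
print(round(float(np.median(amp[mid])), 2))   # 0.5
```

For a real vowel, each AM-FM component would first have to be isolated (e.g. by the band decomposition the paper describes) before this per-component demodulation is meaningful.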

On the Implementation of Articulatory Speech Simulator Using MRI (MRI를 이용한 조음모델시뮬레이터 구현에 관하여)

  • Jo, Cheol-Woo
    • Speech Sciences / v.2 / pp.45-55 / 1997
  • This paper describes the implementation of an articulatory speech simulator that models the human articulatory organs and synthesizes speech from the model. The images required to construct the vocal tract model were obtained from MRI and used to construct 2D and 3D vocal tract shapes. The 3D vocal tract shapes were constructed by spatially concatenating and interpolating sectional MRI images. The 2D vocal tract shapes were constructed and automatically analyzed into a digital filter model, from which the corresponding speech sounds were synthesized. All procedures in this study were implemented in MATLAB.
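
The "vocal tract shape to digital filter" step can be sketched with the classic Kelly-Lochbaum tube model, which may or may not be the filter model the paper uses; the area values and sign convention below are illustrative assumptions:

```python
import numpy as np

def reflection_coeffs(areas):
    """Convert a vocal-tract area function (glottis to lips) into
    reflection coefficients at each tube junction (Kelly-Lochbaum model)."""
    A = np.asarray(areas, dtype=float)
    return (A[1:] - A[:-1]) / (A[1:] + A[:-1])

def k_to_lpc(k):
    """Step-up recursion: reflection coefficients -> all-pole LPC
    polynomial a(z), usable as the digital filter for synthesis."""
    a = np.array([1.0])
    for ki in k:
        a = np.append(a, 0.0)
        a = a + ki * a[::-1]   # a_m(i) = a_{m-1}(i) + k_m * a_{m-1}(m-i)
    return a

# Illustrative cross-sectional areas (cm^2), not from the paper.
areas = np.array([2.6, 8.0, 10.5, 10.5, 8.0, 4.8, 1.0, 0.6])
k = reflection_coeffs(areas)
a = k_to_lpc(k)
print(np.all(np.abs(k) < 1))  # True: positive areas give |k| < 1
```

Since every |k| < 1, the resulting all-pole filter 1/a(z) is stable, so exciting it with a glottal pulse train would yield a synthetic vowel-like sound.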

Voiced, Unvoiced, and Silence Classification of human speech signals by emphasis characteristics of the spectrum (Spectrum 강조특성을 이용한 음성신호에서 Voiced-Unvoiced-Silence 분류)

  • 배명수;안수길
    • The Journal of the Acoustical Society of Korea / v.4 no.1 / pp.9-15 / 1985
  • In this paper, we describe a new algorithm for deciding whether a given segment of a speech signal should be classified as voiced speech, unvoiced speech, or silence, based on parameters measured on the signal. The parameter for the voiced-unvoiced classification is the area of each zero-crossing interval, given by the product of the magnitude and the inverse zero-crossing rate of the speech signal. The parameter for the unvoiced-silence classification is the positive-area summation over each four-millisecond interval of the high-frequency-emphasized speech signal.
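
A toy version of this three-way decision can be built from short-time energy and zero-crossing rate. Note the simplifications: the thresholds are arbitrary, and plain ZCR stands in for the paper's zero-crossing-interval area and positive-area measures:

```python
import numpy as np

def classify_frame(frame, energy_sil=1e-4, zcr_uv=0.25):
    """Toy voiced/unvoiced/silence decision for one frame.
    Voiced speech: high energy, few zero crossings.
    Unvoiced speech: noise-like, many zero crossings.
    Silence: negligible energy."""
    energy = np.mean(frame ** 2)
    # Fraction of sample pairs whose sign differs (zero-crossing rate).
    zcr = np.mean(np.abs(np.diff(np.signbit(frame).astype(int))))
    if energy < energy_sil:
        return "silence"
    return "unvoiced" if zcr > zcr_uv else "voiced"

fs = 8000
t = np.arange(0, 0.02, 1 / fs)                # 20 ms frames
voiced = 0.5 * np.sin(2 * np.pi * 150 * t)    # low ZCR, high energy
unvoiced = 0.1 * np.random.default_rng(0).standard_normal(len(t))
silence = np.zeros(len(t))
labels = [classify_frame(f) for f in (voiced, unvoiced, silence)]
print(labels)  # ['voiced', 'unvoiced', 'silence']
```

The paper's area-based parameters weight each zero-crossing interval by signal magnitude, which makes the decision far more robust than raw ZCR to low-level noise.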
