• Title/Abstract/Keywords: Speech function

696 search results (processing time: 0.025 s)

Interference Suppression Using Principal Subspace Modification in Multichannel Wiener Filter and Its Application to Speech Recognition

  • Kim, Gi-Bak
    • ETRI Journal
    • /
    • Vol. 32, No. 6
    • /
    • pp.921-931
    • /
    • 2010
  • It has been shown that the principal subspace-based multichannel Wiener filter (MWF) provides better performance than the conventional MWF for suppressing interference in the case of a single target source. It can efficiently estimate the target speech component in the principal subspace, which estimates the acoustic transfer function up to a scaling factor. However, as the input signal-to-interference ratio (SIR) becomes lower, larger errors are incurred in the estimation of the acoustic transfer function by the principal subspace method, degrading the interference suppression performance. To alleviate this problem, a principal subspace modification method was proposed in previous work. The principal subspace modification reduces the estimation error of the acoustic transfer function vector at low SIRs. In this work, a frequency-band-dependent interpolation technique is further employed for the principal subspace modification. A speech recognition test is also conducted using the Sphinx-4 system and demonstrates the practical usefulness of the proposed method as front-end processing for a speech recognizer in a distant-talking environment with an interferer present.
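
The idea of estimating the acoustic transfer function "up to a scaling factor" from the principal subspace can be illustrated with a minimal sketch. It is not the paper's method: it assumes a single target source, spatially white noise with a known covariance, and a hypothetical helper name `principal_subspace_atf`. With coloured noise or a strong interferer, a prewhitening or generalized eigenvalue step would be needed instead.

```python
import numpy as np

def principal_subspace_atf(R_y, R_v):
    """Estimate the ATF direction (up to a complex scale factor).

    R_y: covariance of the noisy microphone signals (M x M, Hermitian)
    R_v: covariance of the noise alone (M x M, Hermitian), assumed known
    """
    # For one target source the speech covariance R_y - R_v is ideally rank one.
    R_x = R_y - R_v
    R_x = 0.5 * (R_x + R_x.conj().T)   # enforce Hermitian symmetry numerically
    eigvals, eigvecs = np.linalg.eigh(R_x)  # eigenvalues in ascending order
    return eigvecs[:, -1]              # eigenvector of the largest eigenvalue
```

The returned vector matches the true transfer function only in direction, so a comparison should use the normalized inner product rather than an element-wise difference.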

A Study of the Pitch Estimation Algorithms of Speech Signal by Using Average Magnitude Difference Function (AMDF)

  • 소신애;이강희;유광복;임하영;박지수
    • 예술인문사회 융합 멀티미디어 논문지
    • /
    • Vol. 7, No. 4
    • /
    • pp.235-242
    • /
    • 2017
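  • This paper proposes algorithms for finding the peaks (or nulls) in the Average Magnitude Difference Function (AMDF) of a speech signal. Like the Autocorrelation Function (ACF), the AMDF is widely used to estimate the pitch of speech signals. Estimating the fundamental frequency (F0) of a speech signal is a very important task, and many studies have shown that it is also considerably difficult. This paper presents two algorithms developed from the characteristics of the AMDF. The first applies a threshold to the local minima to find the nulls from which the pitch period can be measured; the second applies the relationship between the AMDF and the ACF. Speech signals were recorded on a widely used commercial device from a script composed of Korean emotional expressions, and the proposed algorithms were applied to them in simulation to measure the pitch period and evaluate their performance.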
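
The first algorithm (thresholding the local minima of the AMDF) can be sketched as follows. This is an illustration, not the authors' implementation: the function names, the F0 search range, and the threshold rule (a fixed fraction `ratio` of the AMDF maximum) are assumptions introduced here.

```python
import numpy as np

def amdf(frame, max_lag):
    """AMDF: D[k] = mean(|x[n] - x[n + k]|) for k = 1..max_lag (D[0] = 0)."""
    frame = np.asarray(frame, dtype=float)
    d = np.zeros(max_lag + 1)
    for k in range(1, max_lag + 1):
        d[k] = np.mean(np.abs(frame[:-k] - frame[k:]))
    return d

def pitch_period_from_amdf(frame, fs, f0_min=60.0, f0_max=400.0, ratio=0.3):
    """Return the pitch period in samples, or None if no null passes the threshold.

    The frame must be longer than fs / f0_min samples.
    """
    min_lag = int(fs / f0_max)
    max_lag = int(fs / f0_min)
    d = amdf(frame, max_lag)
    threshold = ratio * np.max(d[min_lag:])  # assumed rule, not from the paper
    for k in range(min_lag, max_lag):
        is_local_min = d[k] <= d[k - 1] and d[k] <= d[k + 1]
        if is_local_min and d[k] < threshold:
            return k  # first sufficiently deep null
    return None
```

Taking the first null below the threshold, rather than the global minimum, is one simple way to avoid choosing a multiple of the true period.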

Implementation of Text-to-Audio Visual Speech Synthesis Using Key Frames of Face Images

  • 김명곤;김진영;백성준
    • 대한음성학회지:말소리
    • /
    • No. 43
    • /
    • pp.73-88
    • /
    • 2002
  • In this paper, a lip-synch algorithm based on a key-frame method using RBFs (radial basis functions) is presented for natural facial synthesis. For lip synthesis, we derive viseme range parameters from the phonemes and their duration information output by the text-to-speech (TTS) system, and we extract from the AV DB the viseme information that corresponds to each phoneme. We apply a dominance function to reflect the coarticulation phenomenon and bilinear interpolation to reduce calculation time. Lip-synch is then performed by playing the synthesized images, obtained by interpolating between phonemes, together with the TTS speech sound.

Performance Enhancement of Speaker Identification in Noisy Environments by Optimization Membership Function Based on Particle Swarm

  • 민소희;송민규;나승유;김진영
    • 음성과학
    • /
    • Vol. 14, No. 2
    • /
    • pp.105-114
    • /
    • 2007
  • The performance of a speaker identifier is severely degraded in noisy environments. A previous study suggested the concept of observation membership for enhancing the performance of a speaker identifier with noisy speech [1]. The method scaled the observation probabilities of input speech by observation identification values determined by the SNR. In [1], the authors suggested heuristic parameter values for the membership function. In this paper, we apply particle swarm optimization (PSO) to obtain the optimal parameters for speaker identification in noisy environments. Speaker identification experiments using the ETRI database show that the optimization approach can yield better performance than using only the original membership function.
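
For readers unfamiliar with PSO, the following is a generic, minimal sketch of the optimizer. It is not the configuration used in the paper: the function name `pso_minimize`, the inertia and acceleration coefficients, and the toy objective in the usage note are assumptions. In the paper's setting the objective would instead score the membership-function parameters by speaker identification error.

```python
import numpy as np

def pso_minimize(objective, bounds, n_particles=20, n_iter=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize objective(x) over a box; bounds is a list of (low, high) pairs."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    pos = rng.uniform(lo, hi, size=(n_particles, lo.size))
    vel = np.zeros_like(pos)
    pbest = pos.copy()                                   # personal best positions
    pbest_val = np.array([objective(p) for p in pos])
    gbest = pbest[np.argmin(pbest_val)].copy()           # global best position
    for _ in range(n_iter):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        # Inertia + pull toward personal best + pull toward global best.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest, float(pbest_val.min())
```

As a usage example, `pso_minimize(lambda x: (x[0] - 1) ** 2 + (x[1] + 2) ** 2, [(-5, 5), (-5, 5)])` searches a two-parameter box for the minimum of a simple quadratic.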

An evaluation of Korean students' pronunciation of an English passage by a speech recognition application and two human raters

  • Yang, Byunggon
    • 말소리와 음성과학
    • /
    • Vol. 12, No. 4
    • /
    • pp.19-25
    • /
    • 2020
  • This study examined thirty-one Korean students' pronunciation of an English passage using a speech recognition application, Speechnotes, and two Canadian raters' evaluations of their speech according to the International English Language Testing System (IELTS) band criteria to assess the possibility of using the application as a teaching aid for pronunciation education. The results showed that the grand average percentage of correctly recognized words was 77.7%. From the moderate recognition rate, the pronunciation level of the participants was construed as intermediate and higher. The recognition rate varied depending on the composition of the content words and the function words in each given sentence. Frequency counts of unrecognized words by group level and word type revealed the typical pronunciation problems of the participants, including fricatives and nasals. The IELTS bands chosen by the two native raters for the rainbow passage had a moderately high correlation with each other. A moderate correlation was reported between the number of correctly recognized content words and the raters' bands, while an almost negligible correlation was found between the function words and the raters' bands. From these results, the author concludes that the speech recognition application could constitute a partial aid for diagnosing each individual's or the group's pronunciation problems, but further studies are still needed to match human raters.

A Study on the Eavesdropping of the Glass Window Vibration in a Conference Room

  • 김석현;김윤호;허욱
    • 산업기술연구
    • /
    • Vol. 27, No. A
    • /
    • pp.55-60
    • /
    • 2007
  • The possibility of eavesdropping is investigated for a coupled conference room and glass window system. Speech intelligibility analysis is performed on the sound recovered from the glass window vibration. Using an MLS (Maximum Length Sequence) signal as the sound source, the acceleration and velocity responses of the glass window are measured by an accelerometer and a laser Doppler vibrometer. The MTF (Modulation Transfer Function) is used to identify the speech transmission characteristics of the room and window system. The STI (Speech Transmission Index) is calculated from the MTF, and the speech intelligibility of the vibration sound is estimated. The speech intelligibilities obtained from the acceleration signal and the velocity signal are compared.
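
The step from MTF to STI can be sketched as below. The conversion of each modulation index to an apparent signal-to-noise ratio, the clipping to ±15 dB, and the mapping to a 0..1 transmission index follow the commonly described STI procedure. The final unweighted average is a simplifying assumption made here, since the standardized method applies octave-band weights; the function name `sti_from_mtf` is also hypothetical.

```python
import numpy as np

def sti_from_mtf(m):
    """Approximate STI from modulation indices m (e.g. 7 bands x 14 frequencies)."""
    m = np.clip(np.asarray(m, dtype=float), 1e-6, 1.0 - 1e-6)  # avoid log(0)
    snr_app = 10.0 * np.log10(m / (1.0 - m))   # apparent SNR in dB
    snr_app = np.clip(snr_app, -15.0, 15.0)    # limit to the +/-15 dB range
    ti = (snr_app + 15.0) / 30.0               # transmission index in [0, 1]
    return float(np.mean(ti))                  # unweighted mean (simplification)
```

A modulation index of 0.5 everywhere corresponds to an apparent SNR of 0 dB and therefore to an index of 0.5.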

Noise Reduction Using MMSE Estimator-based Adaptive Comb Filtering

  • 박정식;오영환
    • 대한음성학회지:말소리
    • /
    • No. 60
    • /
    • pp.181-190
    • /
    • 2006
  • This paper describes a speech enhancement scheme that leads to significant improvements in recognition performance when used in the ASR front-end. The proposed approach is based on adaptive comb filtering and an MMSE-related parameter estimator. While adaptive comb filtering reduces noise components remarkably, it is rarely effective in reducing non-stationary noises. Furthermore, due to the uniformly distributed frequency response of the comb-filter, it can cause serious distortion to clean speech signals. This paper proposes an improved comb-filter that adjusts its spectral magnitude to the original speech, based on the speech absence probability and the gain modification function. In addition, we introduce the modified comb filtering-based speech enhancement scheme for ASR in mobile environments. Evaluation experiments carried out using the Aurora 2 database demonstrate that the proposed method outperforms conventional adaptive comb filtering techniques in both clean and noisy environments.

Multi-Channel Speech Enhancement Algorithm Using DOA-based Learning Rate Control

  • 김수환;이영재;김영일;정상배
    • 말소리와 음성과학
    • /
    • Vol. 3, No. 3
    • /
    • pp.91-98
    • /
    • 2011
  • In this paper, a multi-channel speech enhancement method using the linearly constrained minimum variance (LCMV) algorithm and a variable learning rate control is proposed. To control the learning rate for adaptive filters of the LCMV algorithm, the direction of arrival (DOA) is measured for each short-time input signal and the likelihood function of the target speech presence is estimated to control the filter learning rate. Using the likelihood measure, the learning rate is increased during the pure noise interval and decreased during the target speech interval. To optimize the parameter of the mapping function between the likelihood value and the corresponding learning rate, an exhaustive search is performed using the Bark's scale distortion (BSD) as the performance index. Experimental results show that the proposed algorithm outperforms the conventional LCMV with fixed learning rate in the BSD by around 1.5 dB.
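
The learning rate control described above can be illustrated with a minimal sketch. The abstract does not give the form of the mapping function, so the power-law shape, the parameter `alpha`, the rate limits, and both function names are assumptions. The sketch only shows the stated behaviour: a high rate when target speech is unlikely, a low rate when it is likely, and a parameter chosen by exhaustive search over candidates.

```python
def learning_rate(likelihood, mu_min=0.001, mu_max=0.1, alpha=2.0):
    """Map a target-speech likelihood in [0, 1] to an adaptive-filter step size."""
    p = min(max(float(likelihood), 0.0), 1.0)
    # p near 0 (pure noise) gives mu_max; p near 1 (target speech) gives mu_min.
    return mu_min + (mu_max - mu_min) * (1.0 - p) ** alpha

def exhaustive_search(cost, candidates):
    """Return the candidate parameter with the lowest cost.

    `cost` is a user-supplied function, for example a distortion measure
    computed on a development set (hypothetical here).
    """
    return min(candidates, key=cost)
```

In practice `cost` would run the enhancement algorithm with each candidate `alpha` and return the resulting distortion.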

Development of an Autonomous Mobile Robot with the Function of Teaching a Moving Path by Speech and Avoiding a Collision

  • 박민규;이민철;이석
    • 한국정밀공학회지
    • /
    • Vol. 17, No. 8
    • /
    • pp.189-197
    • /
    • 2000
  • This paper describes the development of an autonomous mobile robot that can be taught a moving path by speech and can avoid collisions. Using human speech as the teaching method provides a more convenient user interface for a mobile robot. For the speech recognition system, a recognition algorithm using a neural network is proposed to recognize Korean syllables. For safe navigation, the autonomous mobile robot needs to recognize its surrounding environment and avoid collisions with obstacles. Ultrasonic sensors are used to obtain the distances from the mobile robot to the various obstacles in the surrounding environment. Using the navigation algorithm, the robot forecasts the possibility of collision with obstacles and modifies its moving path if it detects a dangerous obstacle.

Progress, challenges, and future perspectives in genetic researches of stuttering

  • Kang, Changsoo
    • Journal of Genetic Medicine
    • /
    • Vol. 18, No. 2
    • /
    • pp.75-82
    • /
    • 2021
  • Speech and language functions are highly cognitive and human-specific features. The underlying causes of normal speech and language function are believed to reside in the human brain. Developmental persistent stuttering, a speech and language disorder, has been regarded as the most challenging disorder in determining genetic causes because of the high percentage of spontaneous recovery in stutterers. This mysterious characteristic hinders speech pathologists from discriminating recovered stutterers from completely normal individuals. Over the last several decades, several genetic approaches have been used to identify the genetic causes of stuttering, and remarkable progress has been made in genome-wide linkage analysis followed by gene sequencing. So far, four genes, namely GNPTAB, GNPTG, NAGPA, and AP4E1, are known to cause stuttering. Furthermore, the generation of mouse models of stuttering and morphometry analysis have created new ways for researchers to identify brain regions that participate in human speech function and to understand the neuropathology of stuttering. In this review, we aimed to investigate previous progress, challenges, and future perspectives in understanding the genetics and neuropathology underlying persistent developmental stuttering.