• Title/Abstract/Keywords: Speech function

696 search results (processing time: 0.024 s)

Noise Suppression Using Normalized Time-Frequency Bin Average and Modified Gain Function for Speech Enhancement in Nonstationary Noisy Environments

  • Lee, Soo-Jeong;Kim, Soon-Hyob
    • The Journal of the Acoustical Society of Korea / Vol. 27, No. 1E / pp. 1-10 / 2008
  • A noise suppression algorithm is proposed for nonstationary noisy environments. The proposed algorithm is different from the conventional approaches such as the spectral subtraction algorithm and the minimum statistics noise estimation algorithm in that it classifies speech and noise signals in time-frequency bins. It calculates the ratio of the variance of the noisy power spectrum in time-frequency bins to its normalized time-frequency average. If the ratio is greater than an adaptive threshold, speech is considered to be present. Our adaptive algorithm tracks the threshold and controls the trade-off between residual noise and distortion. The estimated clean speech power spectrum is obtained by a modified gain function and the updated noisy power spectrum of the time-frequency bin. This new algorithm has the advantages of simplicity and light computational load for estimating the noise. This algorithm reduces the residual noise significantly, and is superior to the conventional methods.

복소 라플라시안 확률 밀도 함수에 기반한 음성 향상 기법 (Noisy Speech Enhancement Based on Complex Laplacian Probability Density Function)

  • 박윤식;조규행;장준혁
    • 대한전자공학회논문지SP / Vol. 44, No. 6 / pp. 111-117 / 2007
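  • In this paper, we propose a new speech enhancement technique based on the complex Laplacian probability density function (PDF). A goodness-of-fit (GOF) test confirmed that the complex Laplacian PDF represents the distribution of noisy speech more accurately than the conventional Gaussian PDF, and a likelihood ratio (LR) was applied to obtain the speech absence probability of the speech enhancement algorithm. The performance of the proposed algorithm was evaluated with objective tests and showed improved speech enhancement results over the conventional Gaussian PDF.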

음성인식을 위한 성도 길이 정규화 (Vocal Tract Length Normalization for Speech Recognition)

  • 지상문
    • 한국정보통신학회논문지 / Vol. 7, No. 7 / pp. 1380-1386 / 2003
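  • Variation in vocal tract length across speakers degrades the performance of speech recognizers. In this study, we minimize the effect of inter-speaker vocal tract length differences on the recognizer by expanding or compressing the frequency axis of the short-time spectrum extracted from the input speech. As frequency warping functions for normalizing vocal tract length, we considered a linear warping function and a piecewise linear warping function. We also propose a variable-interval piecewise linear function that can more effectively model the frequency-axis scale changes caused by large variations in vocal tract length. Applying the proposed method to the TIDIGITS connected-digit speech corpus greatly reduced the word error rate from 2.15% to 0.53%, showing that vocal tract length normalization is essential for improving the performance of speaker-independent speech recognizers.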
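
The frequency-axis scaling described in this abstract can be illustrated with a generic two-segment piecewise linear warp. This is a minimal sketch, not the variable-interval function the authors propose: the function name, the `breakpoint_ratio` parameter, and the choice to pin the upper band edge are all assumptions made for illustration.

```python
import numpy as np

def piecewise_linear_warp(freqs, alpha, f_max, breakpoint_ratio=0.8):
    """Warp frequencies by factor alpha below a breakpoint, then bend the
    upper segment so that f_max still maps to f_max.

    All parameter names here are hypothetical; the abstract does not
    specify the breakpoint or the exact form of the warp.
    """
    freqs = np.asarray(freqs, dtype=float)
    f0 = breakpoint_ratio * f_max          # breakpoint between the two segments
    if alpha <= 0 or alpha * f0 >= f_max:
        raise ValueError("alpha is too large (or non-positive) for this breakpoint")
    # Slope of the upper segment, chosen so the warp ends at (f_max, f_max).
    upper_slope = (f_max - alpha * f0) / (f_max - f0)
    lower = alpha * freqs
    upper = alpha * f0 + upper_slope * (freqs - f0)
    return np.where(freqs <= f0, lower, upper)

# Example: stretch the frequency axis by 10% for a 4 kHz bandwidth.
warped = piecewise_linear_warp([0.0, 1000.0, 3600.0, 4000.0], alpha=1.1, f_max=4000.0)
```

In a recognizer front end, a warp like this would typically be applied to the filterbank center frequencies, with `alpha` chosen per speaker (for example by a grid search over likelihood); that selection step is outside this sketch.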

Relationship between executive function and cue weighting in Korean stop perception across different dialects and ages

  • Kong, Eun Jong;Lee, Hyunjung
    • 말소리와 음성과학 / Vol. 13, No. 3 / pp. 21-29 / 2021
  • The present study investigated how one's cognitive resources are related to speech perception by examining Korean speakers' executive function (EF) capacity and its association with voice onset time (VOT) and f0 sensitivity in identifying Korean stop laryngeal categories (/t'/ vs. /t/ vs. /th/). Previously, Kong et al. (under revision) reported that Korean listeners (N = 154) in Seoul and Changwon (Gyeongsang) showed differential group patterns in dialect-specific cue weightings across educational institutions (college, high school, and elementary school). We follow up this study by further relating their EF control (working memory, mental flexibility, and inhibition) to their speech perception patterns to examine whether better cognitive ability would control attention to multiple acoustic dimensions. Partial correlation analyses revealed that better EFs in Korean listeners were associated with greater sensitivity to available acoustic details and with greater suppression of irrelevant acoustic information across subgroups, although only a small set of EF components turned out to be relevant. Unlike Seoul participants, Gyeongsang listeners' f0 use was not correlated with any EF task scores, reflecting dialect-specific cue primacy using f0 as a secondary cue. The findings confirm the link between speech perception and general cognitive ability, providing experimental evidence from Korean listeners.

MTF-STI를 이용한 유리창 도청음의 명료도 분석 (Intelligibility Analysis on the Eavesdropping Sound of Glass Windows Using MTF-STI)

  • 김희동;김윤호;김석현
    • 한국음향학회지 / Vol. 26, No. 1 / pp. 8-15 / 2007
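  • We examine the speech intelligibility of eavesdropped sound for a coupled acoustic cavity and glass window system. Using an MLS signal as the sound source, the acceleration and velocity responses of the glass window are measured with an accelerometer and a laser Doppler vibrometer. The modulation transfer function (MTF) is used to characterize the speech transmission properties of the cavity-window vibration system. Based on the MTF, the speech transmission index (STI) is computed and the speech intelligibility of the window vibration sound is evaluated. The intelligibility of the acceleration-derived and velocity-derived sounds is compared, and the intelligibility of conversational speech is finally confirmed by subjective evaluation.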
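
The step from MTF to STI mentioned in this abstract can be sketched with the commonly used conversion: each modulation index m is mapped to an apparent signal-to-noise ratio, clipped to ±15 dB, and rescaled to a 0..1 transmission index. This is a simplified sketch under stated assumptions: the function name is hypothetical, and the plain unweighted mean replaces the octave-band weighting of the full STI procedure, which the abstract does not detail.

```python
import numpy as np

def sti_from_mtf(modulation_indices):
    """Approximate STI from modulation transfer function values in [0, 1].

    Simplification (assumption): all modulation indices are averaged with
    equal weight instead of using per-octave-band weighting factors.
    """
    m = np.asarray(modulation_indices, dtype=float)
    m = np.clip(m, 1e-6, 1.0 - 1e-6)             # avoid log of 0 and division by 0
    snr_apparent = 10.0 * np.log10(m / (1.0 - m))  # apparent SNR in dB
    snr_apparent = np.clip(snr_apparent, -15.0, 15.0)
    transmission_index = (snr_apparent + 15.0) / 30.0
    return float(np.mean(transmission_index))

# A modulation index of 0.5 corresponds to 0 dB apparent SNR, i.e. STI = 0.5.
sti_mid = sti_from_mtf([0.5, 0.5, 0.5])
```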

Noise Estimation based on Standard Deviation and Sigmoid Function Using a Posteriori Signal to Noise Ratio in Nonstationary Noisy Environments

  • Lee, Soo-Jeong;Kim, Soon-Hyob
    • International Journal of Control, Automation, and Systems / Vol. 6, No. 6 / pp. 818-827 / 2008
  • In this paper, we propose a new noise estimation and reduction algorithm for stationary and nonstationary noisy environments. This approach uses an algorithm that classifies the speech and noise signal contributions in time-frequency bins. It relies on the ratio of the normalized standard deviation of the noisy power spectrum in time-frequency bins to its average. If the ratio is greater than an adaptive estimator, speech is considered to be present. The proposed method uses an auto control parameter so that the adaptive estimator works well in highly nonstationary noisy environments. The auto control parameter is governed by a linear function of the a posteriori signal-to-noise ratio (SNR), following increases or decreases in the noise level. The estimated clean speech power spectrum is obtained by a modified gain function and the updated noisy power spectrum of the time-frequency bin. This new algorithm has the advantages of simplicity and light computational load for estimating stationary and nonstationary noise. The proposed algorithm is superior to conventional methods. To evaluate the algorithm's performance, we test it using the NOIZEUS database, and use the segmental signal-to-noise ratio (SNR) and ITU-T P.835 as evaluation criteria.
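
The segmental SNR used as an evaluation criterion here can be sketched as a frame-wise SNR averaged over the signal. This is a minimal illustration, not the authors' evaluation code: the function name, the frame length, and the clipping range of -10 to 35 dB (a common convention) are assumptions.

```python
import numpy as np

def segmental_snr(clean, enhanced, frame_len=256, min_db=-10.0, max_db=35.0):
    """Mean of per-frame SNRs (dB) between a clean reference and an
    enhanced signal, using non-overlapping frames.

    frame_len and the clipping limits are assumed defaults, not values
    taken from the paper.
    """
    clean = np.asarray(clean, dtype=float)
    enhanced = np.asarray(enhanced, dtype=float)
    n_frames = min(len(clean), len(enhanced)) // frame_len
    if n_frames == 0:
        raise ValueError("signals are shorter than one frame")
    eps = 1e-10  # guards against division by zero and log of zero
    frame_snrs = []
    for i in range(n_frames):
        start, stop = i * frame_len, (i + 1) * frame_len
        s = clean[start:stop]
        err = s - enhanced[start:stop]
        snr_db = 10.0 * np.log10((np.sum(s ** 2) + eps) / (np.sum(err ** 2) + eps))
        # Clipping keeps silent or perfectly matched frames from dominating the mean.
        frame_snrs.append(np.clip(snr_db, min_db, max_db))
    return float(np.mean(frame_snrs))

# Example: an output attenuated by 10% leaves an error of 0.1 * clean, about 20 dB.
t = np.arange(2048) / 8000.0
reference = np.sin(2 * np.pi * 200.0 * t)
score = segmental_snr(reference, 0.9 * reference)
```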

CASA 기반 음성분리 성능 향상을 위한 형태 분석 기술의 응용 (Application of Shape Analysis Techniques for Improved CASA-Based Speech Separation)

  • 이윤경;권오욱
    • 대한음성학회지:말소리 / No. 65 / pp. 153-168 / 2008
  • We propose a new method to apply shape analysis techniques to a computational auditory scene analysis (CASA)-based speech separation system. The conventional CASA-based speech separation system extracts speech signals from a mixture of speech and noise signals. In the proposed method, we complement the missing speech signals by applying the shape analysis techniques such as labelling and distance function. In the speech separation experiment, the proposed method improves signal-to-noise ratio by 6.6 dB. When the proposed method is used as a front-end of speech recognizers, it improves recognition accuracy by 22% for the speech-shaped stationary noise condition and 7.2% for the two-talker noise condition at the target-to-masker ratio than or equal to -3 dB.


청각 장애자를 위한 시각 음성 처리 시스템에 관한 연구 (A study on the Visible Speech Processing System for the Hearing Impaired)

  • 김원기;김남현
    • 대한의용생체공학회:의공학회지 / Vol. 11, No. 1 / pp. 75-82 / 1990
  • The purpose of this study is to support speech training for the hearing impaired with a visible speech processing system. In brief, the system converts features of speech signals into graphics on a monitor and helps adjust the speech features of hearing-impaired speakers toward normal ones. The features used in this system are formant and pitch, which are extracted with digital signal processing methods such as linear prediction or the AMDF (Average Magnitude Difference Function). To train the abnormal speech of the hearing impaired effectively, easily visible features have been studied.
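
The AMDF-based pitch extraction named in this abstract can be sketched as follows: the mean absolute difference between a frame and its delayed copy dips at lags equal to the pitch period. This is a minimal sketch under assumed settings: the function names and the 60 to 400 Hz search range are illustrative and do not come from the paper.

```python
import numpy as np

def amdf(frame, max_lag):
    """Average Magnitude Difference Function for lags 1..max_lag.

    Element k of the result corresponds to lag k + 1.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    if max_lag >= n:
        raise ValueError("frame must be longer than max_lag")
    return np.array([
        np.mean(np.abs(frame[lag:] - frame[:n - lag]))
        for lag in range(1, max_lag + 1)
    ])

def estimate_pitch_amdf(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate pitch (Hz) as the lag with the smallest AMDF value
    inside the assumed pitch range [f_min, f_max]."""
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)
    values = amdf(frame, max_lag)
    search = values[min_lag - 1:]               # restrict to lags >= min_lag
    best_lag = min_lag + int(np.argmin(search))
    return sample_rate / best_lag

# Example: a synthetic 100 Hz tone sampled at 8 kHz (period of 80 samples).
fs = 8000
n = np.arange(800)
tone = np.sin(2 * np.pi * 100.0 * n / fs)
pitch_hz = estimate_pitch_amdf(tone, fs)
```

On real speech the AMDF minimum is less sharp than on a pure tone, so practical systems usually add voicing detection and smoothing across frames; those steps are left out here.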


Maximum Likelihood Training and Adaptation of Embedded Speech Recognizers for Mobile Environments

  • Cho, Young-Kyu;Yook, Dong-Suk
    • ETRI Journal / Vol. 32, No. 1 / pp. 160-162 / 2010
  • For the acoustic models of embedded speech recognition systems, hidden Markov models (HMMs) are usually quantized and the original full space distributions are represented by combinations of a few quantized distribution prototypes. We propose a maximum likelihood objective function to train the quantized distribution prototypes. The experimental results show that the new training algorithm and the link structure adaptation scheme for the quantized HMMs reduce the word recognition error rate by 20.0%.

PROSODY CONTROL BASED ON SYNTACTIC INFORMATION IN KOREAN TEXT-TO-SPEECH CONVERSION SYSTEM

  • Kim, Yeon-Jun;Oh, Yung-Hwan
    • 한국음향학회:학술대회논문집 / The Acoustical Society of Korea, 1994, Fifth Western Pacific Regional Acoustics Conference, Seoul, Korea / pp. 937-942 / 1994
  • A text-to-speech (TTS) conversion system can convert any words or sentences into speech. To synthesize speech the way human beings produce it, careful prosody control including intonation, duration, accent, and pause is required. It helps listeners understand the speech clearly and makes the speech sound more natural. In this paper, a prosody control scheme that makes use of function word information is proposed. Among the many factors of prosody, intonation, duration, and pause are closely related to syntactic structure, and their relations have been formalized and embodied in the TTS system. To evaluate the speech synthesized with the proposed prosody control, one of the subjective evaluation methods, the MOS (Mean Opinion Score) method, was used. The synthesized speech was tested with 10 listeners, and each listener scored the speech between 1 and 5. The evaluation experiments show that the proposed prosody control helps the TTS system synthesize more natural speech.
