• Title/Summary/Keyword: Simulation speech

Search Result 302, Processing Time 0.808 seconds

강인한 음성인식을 위한 이중모드 센서의 결합방식에 관한 연구 (A Study on Combining Bimodal Sensors for Robust Speech Recognition)

  • 이철우;계영철;고인선
    • 한국음향학회지
    • /
    • 제20권6호
    • /
    • pp.51-56
    • /
    • 2001
  • 최근 잡음이 심한 환경에서 음성인식을 신뢰성있게 하기 위하여 입모양의 움직임과 음성을 같이 사용하는 방법이 활발히 연구되고 있다 본 논문에서도 이러한 목적으로 영상언어인식기와 음성인식기의 결과에 각각 가중치를 주어 결합하는 방법을 제안한다. 특히 가중치를 입력음성의 잡음의 정도에 따라 자동적으로 결정하는 방법을 제안한다. 가중치의 결정을 위하여 입력샘플간의 상관도와 LPC분석의 잔여 오차를 이용한다. 모의실험 결과, 이런 방식으로 결합된 인식기는 잡음이 심한 환경에서도 약 83%의 인식성능을 보이고 있다.

  • PDF

잡음 환경에 강인한 이중모드 음성인식 시스템에 관한 연구 (A Study on the Robust Bimodal Speech-recognition System in Noisy Environments)

  • 이철우;고인선;계영철
    • 한국음향학회지
    • /
    • 제22권1호
    • /
    • pp.28-34
    • /
    • 2003
  • 최근 잡음이 심한 환경에서 음성인식을 신뢰성 있게 하기 위하여 입 모양의 움직임 (영상언어)과 음성을 같이 사용하는 방법이 활발히 연구되고 있다 본 논문에서는 영상언어 인식기의 결과와 음성인식기의 결과에 각각 가중치를 주어 결합하는 방법을 연구하였다. 각각의 인식 결과에 적절한 가중치를 결정하는 방법을 제안하였으며, 특히 음성정보에 들어있는 잡음의 정도와 영상정보의 화질에 따라 자동적으로 가중치를 결정하도록 하였다. 모의 실험 결과 제안된 방법에 의한 결합 인식률이 잡음이 심한 환경에서도 84% 이상의 인식률을 나타내었으며, 영상에 번짐효과가 있는 경우 영상의 번짐 정도를 고려한 결합 방법이 그렇지 않은 경우보다 우수한 인식 성능을 나타내었다.

자동 음성 인식기를 위한 단채널 음질 향상 알고리즘의 성능 분석 (Performance Analysis of a Class of Single Channel Speech Enhancement Algorithms for Automatic Speech Recognition)

  • 송명석;이창헌;이석필;강홍구
    • The Journal of the Acoustical Society of Korea
    • /
    • 제29권2E호
    • /
    • pp.86-99
    • /
    • 2010
  • This paper analyzes the performance of various single channel speech enhancement algorithms when they are applied to automatic speech recognition (ASR) systems as a preprocessor. The functional modules of speech enhancement systems are first divided into four major modules such as a gain estimator, a noise power spectrum estimator, a priori signal to noise ratio (SNR) estimator, and a speech absence probability (SAP) estimator. We investigate the relationship between speech recognition accuracy and the roles of each module. Simulation results show that the Wiener filter outperforms other gain functions such as minimum mean square error-short time spectral amplitude (MMSE-STSA) and minimum mean square error-log spectral amplitude (MMSE-LSA) estimators when a perfect noise estimator is applied. When the performance of the noise estimator degrades, however, MMSE methods including the decision directed module to estimate a priori SNR and the SAP estimation module helps to improve the performance of the enhancement algorithm for speech recognition systems.

Adaptive Multi-Rate(AMR) 음성부호화 알고리즘 (Adaptive Multi-Rate(AMR) Speech Coding Algorithm)

  • 서정욱;배건성
    • 대한전자공학회:학술대회논문집
    • /
    • 대한전자공학회 2000년도 하계종합학술대회 논문집(4)
    • /
    • pp.92-97
    • /
    • 2000
  • An AMR(Adaptive Multi-Rate) speech coding algorithm has been adopted as a standard speech codec for IMT-2000. It is based on the algebraic CELP, and consists of eight speech coding modes having the bit rate from 4.75 kbit/s to 12.2 kbit/s. It also contains the VAD(Voice Activity Detector), SCR (Source Controlled Rate) operation, and error concealment scheme for robustness in a radio channel. The bit rate of AMR is changed on a frame basis depending on the channel condition. In this paper, we introduced AMR speech coding algorithm and performed the real-time implementation using TMS320C6201, i.e., a Texas Instrument's fixed-point DSP. With the ANSI C source code released from ETSI and 3GPP, we convert and optimize the program to make it run in real time using the C compiler and assembly language. It is verified that the decoded result of the implemented speech codec on the DSP is identical with the PC simulation result using ANSI C code for test sequences. Also, actual sound input/output test using microphone and speaker demonstrates its proper real-time operation without distortions or delays.

  • PDF

Performance Improvement of Adaptive Noise Cancellation Using a Speech Detector

  • Park, Jang-Sik
    • The Journal of the Acoustical Society of Korea
    • /
    • 제15권2E호
    • /
    • pp.39-44
    • /
    • 1996
  • The performance of two-channel adaptive noise canceller is ofter degraded by the weights perturbation due to the speech signal. In this paper, an adaptive noise canceller employing a speech detector and two adaptation algorithms which are switched according to the speech detector is proposed. When highly correlated speech signal is detected, the tap weights of the adaptive filter are adapted by the sign algorithm. On the other hand, the weights are adapted by the NLMS algorithm when silence is detected or when the characteristics of the noise propagation channel is changed. The employed speech detector utilizes the power ratio of the input and the output of an adaptive linear prediction-error filter. According to the computer simulation, the proposed method yields better performance than conventional ones.

  • PDF

모듈화한 신경 회로망을 이용한 광대역 음성 복원 (Wideband Speech Reconstruction Using Modular Neural Networks)

  • 우동헌;고참한;강현민;정진희;김유신;김형순
    • 대한음성학회지:말소리
    • /
    • 제48호
    • /
    • pp.93-105
    • /
    • 2003
  • Since telephone channel has bandlimited frequency characteristics, speech signal over the telephone channel shows degraded speech quality. In this paper, we propose an algorithm using neural network to reconstruct wideband speech from its narrowband version. Although single neural network is a good tool for direct mapping, it has difficulty in training for vast and complicated data. To alleviate this problem, we modularize the neural networks based on appropriate clustering of the acoustic space. We also introduce fuzzy computing to compensate for probable misclassification at the cluster boundaries. According to our simulation, the proposed algorithm showed improved performance over the single neural network and conventional codebook mapping method in both objective and subjective evaluations.

  • PDF

The role of prosody in dialect authentication Simulating Masan dialect with Seoul speech segments

  • Yoon, Kyu-Chul
    • 대한음성학회:학술대회논문집
    • /
    • 대한음성학회 2007년도 한국음성과학회 공동학술대회 발표논문집
    • /
    • pp.234-239
    • /
    • 2007
  • The purpose of this paper is to examine the viability of simulating one dialect with the speech segments of another dialect through prosody cloning. The hypothesis is that, among Korean regional dialects, it is not the segmental differences but the prosodic differences that play a major role in authentic dialect perception. This work intends to support the hypothesis by simulating Masan dialect with the speech segments from Seoul dialect. The dialect simulation was performed by transplanting the prosodic features of Masan utterances unto the same utterances produced by a Seoul speaker. Thus, the simulated Masan utterances were composed of Seoul speech segments but their prosody came from the original Masan utterances. The prosodic features involved were the fundamental frequency contour, the segmental durations, and the intensity contour. The simulated Masan utterances were evaluated by four native Masan speakers and the role of prosody in dialect authentication and speech synthesis was discussed.

  • PDF

음성통신 중 웨이브렛 계수 양자화를 이용한 비밀정보 통신 방법 (Secret Data Communication Method using Quantization of Wavelet Coefficients during Speech Communication)

  • 이종관
    • 한국정보과학회:학술대회논문집
    • /
    • 한국정보과학회 2006년도 가을 학술발표논문집 Vol.33 No.2 (D)
    • /
    • pp.302-305
    • /
    • 2006
  • In this paper, we have proposed a novel method using quantization of wavelet coefficients for secret data communication. First, speech signal is partitioned into small time frames and the frames are transformed into frequency domain using a WT(Wavelet Transform). We quantize the wavelet coefficients and embedded secret data into the quantized wavelet coefficients. The destination regard quantization errors of received speech as seceret dat. As most speech watermark techniques have a trade off between noise robustness and speech quality, our method also have. However we solve the problem with a partial quantization and a noise level dependent threshold. In additional, we improve the speech quality with de-noising method using wavelet transform. Since the signal is processed in the wavelet domain, we can easily adapt the de-noising method based on wavelet transform. Simulation results in the various noisy environments show that the proposed method is reliable for secret communication.

  • PDF

초기 수직반사음의 역할을 고려한 새로운 명료도 지표 (A new acoustical parameter for speech intelligibility with regard to early vertical reflections)

  • 박종영;한명호;정대업;오양기
    • KIEAE Journal
    • /
    • 제7권3호
    • /
    • pp.63-70
    • /
    • 2007
  • It is known that early reflections, their energy and delay times after the arrival of direct sound are important factors for speech intelligibility. In this basis, acoustical parameters like D50 and C80 had been proposed and are widely used for assessing the listening condition of rooms. These parameters are focused on the fraction of the early energy to the total, regardless of the spatial characteristics of the early reflections. This means that all the early reflections, arrived in certain time boundary. from front, behind, down and upside have the same impact on speech intelligibility. From the questionable simplicity, the influence of the direction of early reflections on speech intelligibility is examined in this study. A computer simulation speech intelligibility test, conducted for 22 university students, found that the reflection of vertical direction with method of the Paired comparison also the preference of 0.746 degree was visible an increase.

16Kbps와 40Kbps의 Dual Rate G.723 ADPCM 음성 codec 구현 (Implementation of Dual Rate G.723 ADPCM Speech codec)

  • 김재오;한경호
    • 대한전기학회:학술대회논문집
    • /
    • 대한전기학회 1998년도 하계학술대회 논문집 G
    • /
    • pp.2480-2482
    • /
    • 1998
  • In this paper, the implementation of dual rate ADPCM using G.723 16Kbps and 40Kbps speech codec algorithm is handled. For small signals, the low rate 16Kbps coding algorithm shows the same SNR as the high rate 40Kbps coding algorithm, while the low rate 16Kbps coding algorithm shows the lower SNR than the high rate 40Kbps coding algorithm for large signal. To obtain the good trade-off between the data rate and synthesized speech quality, we applied low rate 16Kbps for the small signal and high rate 40Kbps for the large signal. Various threshold values determining the rate are tested for good trade off data rate and speech quality. Also the low pass filter effect of speech input and output devices is simulated at several cut-off frequencies. To simulation result shows the good speech quality at a low rate comparing with 16Kbps & 40Kbps.

  • PDF