• Title/Summary/Keyword: Speech pause detection

Search Result 5, Processing Time 0.029 seconds

Automatic Detection of Intonational and Accentual Phrases in Korean Standard Continuous Speech (한국 표준어 연속음성에서의 억양구와 강세구 자동 검출)

  • Lee, Ki-Young;Song, Min-Suck
    • Speech Sciences
    • /
    • v.7 no.2
    • /
    • pp.209-224
    • /
    • 2000
  • This paper proposes an automatic detection method of intonational and accentual phrases in Korean standard continuous speech. We use the pause over 150 msec for detecting intonational phrases, and extract accentual phrases from the intonational phrases by analyzing syllables and pitch contours. The speech data for the experiment are composed of seven male voices and two female voices which read the texts of the fable 'the ant and the grasshopper' and a newspaper article 'manmulsang' in normal speed and in Korean standard variation. The results of the experiment shows that the detection rate of intonational phrases is 95% on the average and that of accentual phrases is 73%. This detection rate implies that we can segment the continuous speech into smaller units(i.e. prosodic phrases) by using the prosodic information and so the objects of speech recognition can narrow down to words or phrases in continuous speech.

  • PDF

Voice Activity Detection Method Using Psycho-Acoustic Model Based on Speech Energy Maximization in Noisy Environments (잡음 환경에서 심리음향모델 기반 음성 에너지 최대화를 이용한 음성 검출 방법)

  • Choi, Gab-Keun;Kim, Soon-Hyob
    • The Journal of the Acoustical Society of Korea
    • /
    • v.28 no.5
    • /
    • pp.447-453
    • /
    • 2009
  • This paper introduces the method for detect voices and exact end point at low SNR by maximizing voice energy. Conventional VAD (Voice Activity Detection) algorithm estimates noise level so it tends to detect the end point inaccurately. Moreover, because it uses relatively long analysis range for reflecting temporal change of noise, computing load too high for application. In this paper, the SEM-VAD (Speech Energy Maximization-Voice Activity Detection) method which uses psycho-acoustical bark scale filter banks to maximize voice energy within frames is introduced. Stable threshold values are obtained at various noise environments (SNR 15 dB, 10 dB, 5 dB, 0 dB). At the test for voice detection in car noisy environment, PHR (Pause Hit Rate) was 100%accurate at every noise environment, and FAR (False Alarm Rate) shows 0% at SNR15 dB and 10 dB, 5.6% at SNR5 dB and 9.5% at SNR0 dB.

A Parametric Voice Activity Detection Based on the SPD-TE for Nonstationary Noises (비정체성 잡음을 위한 SPD-TE 기반 계수형 음성 활동 탐지)

  • Koo, Boneung
    • The Journal of the Acoustical Society of Korea
    • /
    • v.34 no.4
    • /
    • pp.310-315
    • /
    • 2015
  • A single channel VAD (Voice Activity Detection) algorithm for nonstationary noise environment is proposed in this paper. Threshold values of the feature parameter for VAD decision are updated adaptively based on estimates of means and standard deviations of past non-speech frames. The feature parameter, SPD-TE (Spectral Power Difference-Teager Energy), is obtained by applying the Teager energy to the WPD (Wavelet Packet Decomposition) coefficients. It was reported previously that the SPD-TE is robust to noise as a feature for VAD. Experimental results by using TIMIT speech and NOISEX-92 noise databases show that decision accuracy of the proposed algorithm is comparable to several typical VAD algorithms including standards for SNR values ranging from 10 to -10 dB.

A Single Channel Voice Activity Detection for Noisy Environments Using Wavelet Packet Decomposition and Teager Energy (웨이블렛 패킷 변환과 Teager 에너지를 이용한 잡음 환경에서의 단일 채널 음성 판별)

  • Koo, Boneung
    • The Journal of the Acoustical Society of Korea
    • /
    • v.33 no.2
    • /
    • pp.139-145
    • /
    • 2014
  • In this paper, a feature parameter is obtained by applying the Teager energy to the WPD(Wavelet Packet Decomposition) coefficients. The threshold value is obtained based on means and standard deviations of nonspeech frames. Experimental results by using TIMIT speech and NOISEX-92 noise databases show that the proposed algorithm is superior to the typical VAD algorithm. The ROC(Receiver Operating Characteristics) curves are used to compare performance of VAD's for SNR values of ranging from 10 to -10 dB.

Voice Recognition Performance Improvement using a convergence of Voice Energy Distribution Process and Parameter (음성 에너지 분포 처리와 에너지 파라미터를 융합한 음성 인식 성능 향상)

  • Oh, Sang-Yeob
    • Journal of Digital Convergence
    • /
    • v.13 no.10
    • /
    • pp.313-318
    • /
    • 2015
  • A traditional speech enhancement methods distort the sound spectrum generated according to estimation of the remaining noise, or invalid noise is a problem of lowering the speech recognition performance. In this paper, we propose a speech detection method that convergence the sound energy distribution process and sound energy parameters. The proposed method was used to receive properties reduce the influence of noise to maximize voice energy. In addition, the smaller value from the feature parameters of the speech signal The log energy features of the interval having a more of the log energy value relative to the region having a large energy similar to the log energy feature of the size of the voice signal containing the noise which reducing the mismatch of the training and the recognition environment recognition experiments Results confirmed that the improved recognition performance are checked compared to the conventional method. Car noise environment of Pause Hit Rate is in the 0dB and 5dB lower SNR region showed an accuracy of 97.1% and 97.3% in the high SNR region 10dB and 15dB 98.3%, showed an accuracy of 98.6%.