• Title/Summary/Keyword: PESQ

Speech Recognition Accuracy Prediction Using Speech Quality Measure (음성 특성 지표를 이용한 음성 인식 성능 예측)

  • Ji, Seung-eun;Kim, Wooil
    • Journal of the Korea Institute of Information and Communication Engineering, v.20 no.3, pp.471-476, 2016
  • This paper presents our study on predicting speech recognition performance. Our initial study showed that a combination of speech quality measures correlates better with Word Error Rate (WER) than any single measure alone. In this paper we demonstrate that a new combination of various types of speech quality measures improves the correlation with WER further, compared to the combination used in our initial study. SNR, PESQ, acoustic model score, and MFCC distance are used as the speech quality measures. This paper also presents our speech database verification system for speech recognition, which employs these measures. We develop a WER prediction system that uses a Gaussian mixture model with the speech quality measures as the feature vector. The experimental results show that the proposed system is highly effective at predicting WER in low-SNR conditions with speech babble and car noise.
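
The abstract evaluates each quality measure by its correlation with WER. As a rough illustration of that evaluation step only (not the authors' GMM-based predictor), the sketch below computes a Pearson correlation coefficient. The function name `pearson` and the SNR/WER values are hypothetical toy numbers invented for this example; they are not results from the paper.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Toy, made-up values (assumption): one quality measure and the WER
# observed on the same five hypothetical test sets.
snr_db = [0.0, 5.0, 10.0, 15.0, 20.0]
wer_percent = [60.0, 45.0, 30.0, 20.0, 12.0]

r = pearson(snr_db, wer_percent)
print(f"correlation between SNR and WER: {r:.3f}")
```

A strongly negative value here would mean that WER falls as SNR rises. The same helper could be applied to PESQ, acoustic model score, or MFCC distance to compare how well each measure tracks WER.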

VoIP Quality Metric and Quality-based Accounting Scheme (VoIP 품질 측량 도구 및 품질 기반의 요금 부과 방안 연구)

  • Jung, Youn-Chan;Ann, Ibanez Al
    • The Journal of Korean Institute of Communications and Information Sciences, v.35 no.1B, pp.27-34, 2010
  • As VoIP systems move to wireless environments, which have much higher average packet loss rates than wired networks, it becomes harder for the network to assure a reasonable QoS. Real-time quality monitoring for mobile VoIP applications is therefore an important issue. In this paper, we explore how perceptual quality depends on two parameters: the burst loss rate and the average burst length. We also propose a simple 'moving average' approach with a parameter $\alpha$ to measure those two parameters in real time. To find how accurately the two measured parameters estimate the real perceptual quality, we compare actual measured PESQ scores with values estimated by matching the measured quality metric to a trained MOS table. Finally, we propose a quality-based accounting system that establishes a clear, continuous relationship between quality and billing.
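
The 'moving average' measurement with $\alpha$ can be sketched as an exponential moving average updated once per packet. This is a minimal sketch under stated assumptions: the abstract does not give the exact update rule or the precise definition of burst loss rate, so the class name `BurstLossMonitor`, the use of $\alpha$ as a smoothing weight, and the choice to track the smoothed fraction of lost packets are all assumptions of this example.

```python
class BurstLossMonitor:
    """Tracks smoothed loss rate and smoothed burst length per packet.

    Assumption: alpha is the weight given to the newest observation in
    an exponential moving average; the paper may define it differently.
    """

    def __init__(self, alpha=0.1):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.loss_rate = 0.0          # smoothed fraction of lost packets
        self.avg_burst_length = 0.0   # smoothed length of loss bursts
        self._current_burst = 0       # length of the burst in progress
        self._seen_burst = False

    def update(self, lost):
        """Feed one packet: lost=True if it never arrived."""
        sample = 1.0 if lost else 0.0
        self.loss_rate = (1 - self.alpha) * self.loss_rate + self.alpha * sample
        if lost:
            self._current_burst += 1
        elif self._current_burst > 0:
            # A received packet closes the burst; fold its length in.
            if self._seen_burst:
                self.avg_burst_length = (
                    (1 - self.alpha) * self.avg_burst_length
                    + self.alpha * self._current_burst
                )
            else:
                # The first completed burst initialises the average.
                self.avg_burst_length = float(self._current_burst)
                self._seen_burst = True
            self._current_burst = 0


monitor = BurstLossMonitor(alpha=0.5)
for lost in [False, True, True, False, True, False]:
    monitor.update(lost)
print(monitor.loss_rate, monitor.avg_burst_length)
```

Both values need only constant memory per call, which suits real-time monitoring. Mapping the pair to a MOS estimate would require the trained table described in the abstract, which is not reproduced here.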

A NMF-Based Speech Enhancement Method Using a Prior Time Varying Information and Gain Function (시간 변화에 따른 사전 정보와 이득 함수를 적용한 NMF 기반 음성 향상 기법)

  • Kwon, Kisoo;Jin, Yu Gwang;Bae, Soo Hyun;Kim, Nam Soo
    • The Journal of Korean Institute of Communications and Information Sciences, v.38C no.6, pp.503-511, 2013
  • This paper presents a speech enhancement method based on non-negative matrix factorization (NMF). In the training phase, a basis matrix for speech and one for a specific noise are obtained from the respective databases. In the enhancement phase, the noisy signal is separated into speech and noise estimates using these basis matrices. To improve performance, we model the change of the encoding matrix from the training phase to the enhancement phase with independent Gaussian distributions, and add a constraint to the objective function that closely follows these Gaussian models. We also smooth the encoding matrix by taking its previous values into account. Finally, we apply a Log-Spectral Amplitude type algorithm as the gain function.
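
The enhancement phase described above builds on a standard supervised NMF baseline, which can be sketched as follows. This is not the paper's method: it omits the Gaussian prior on the encoding matrix, the temporal smoothing, and the Log-Spectral Amplitude gain, and substitutes a plain ratio mask. It uses the well-known multiplicative update for the Euclidean cost with the bases held fixed. The function names and the plain-list matrix representation are choices made for this dependency-free example; a practical implementation would more likely use an array library.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf_enhance(V, W_speech, W_noise, n_iter=200, eps=1e-12, seed=0):
    """Estimate the speech part of a non-negative spectrogram V.

    V: bins x frames; W_speech, W_noise: pre-trained bins x basis matrices.
    """
    # Stack the speech and noise bases side by side; they stay fixed.
    W = [rs + rn for rs, rn in zip(W_speech, W_noise)]
    k_speech = len(W_speech[0])
    n_basis, n_frames = len(W[0]), len(V[0])
    rng = random.Random(seed)
    H = [[rng.random() + eps for _ in range(n_frames)] for _ in range(n_basis)]
    Wt = transpose(W)
    WtV = matmul(Wt, V)
    for _ in range(n_iter):
        # Multiplicative update: H <- H * (W^T V) / (W^T W H)
        WtWH = matmul(Wt, matmul(W, H))
        H = [[h * num / (den + eps) for h, num, den in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, WtV, WtWH)]
    S = matmul(W_speech, H[:k_speech])   # speech estimate
    N = matmul(W_noise, H[k_speech:])    # noise estimate
    # Ratio mask (an assumption here, standing in for the paper's gain).
    return [[v * s / (s + n + eps) for v, s, n in zip(vr, sr, nr)]
            for vr, sr, nr in zip(V, S, N)]
```

Because the mask lies between 0 and 1, each output value stays between 0 and the corresponding input value. The paper's contributions would change how `H` is updated and how the final gain is computed.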

Efficient TTS Database Compression Based on AMR-WB Speech Coder (AMR-WB 음성 부호화기를 이용한 TTS 데이터베이스의 효율적인 압축 기법)

  • Lim, Jong-Wook;Kim, Ki-Chul;Kim, Kyeong-Sun;Lee, Hang-Seop;Park, Hae-Young;Kim, Moo-Young
    • The Journal of the Acoustical Society of Korea, v.28 no.3, pp.290-297, 2009
  • This paper presents an improved adaptive multi-rate wideband (AMR-WB) algorithm for efficient Text-To-Speech (TTS) database compression. The proposed algorithm removes the unnecessary common bit-stream (CBS) and applies parameter delta coding combined with speaker-dependent Huffman coding to reduce the required bit-rate without any quality degradation. We also propose lossy coding schemes that maximize bit-rate reduction with negligible quality degradation. Compared with the 12.65 kbps AMR-WB mode, the proposed lossless algorithm, including CBS removal, reduces the bit-rate by 12.40% without quality degradation. The proposed lossy algorithm reduces the bit-rate by 20.00% with a PESQ degradation of 0.12.
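
The lossless part of the scheme combines delta coding with Huffman coding. The sketch below illustrates that general combination on a generic integer sequence. It is an assumption-laden illustration, not the paper's codec: it does not parse AMR-WB frames, does not remove the CBS, and builds its Huffman table from the input itself rather than from speaker-dependent statistics. All function names and the sample values are hypothetical.

```python
import heapq
from collections import Counter

def delta_encode(values):
    """Replace each value by its difference from the previous one."""
    prev, out = 0, []
    for v in values:
        out.append(v - prev)
        prev = v
    return out

def delta_decode(deltas):
    total, out = 0, []
    for d in deltas:
        total += d
        out.append(total)
    return out

def huffman_table(symbols):
    """Build a prefix-code table {symbol: bit string} from frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (count, tie-breaker, partial code table).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        c1, _, t1 = heapq.heappop(heap)
        c2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in t1.items()}
        merged.update({s: "1" + code for s, code in t2.items()})
        heapq.heappush(heap, (c1 + c2, next_id, merged))
        next_id += 1
    return heap[0][2]

def huffman_encode(symbols, table):
    return "".join(table[s] for s in symbols)

def huffman_decode(bits, table):
    inverse = {code: sym for sym, code in table.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out

# Hypothetical slowly varying codec parameter (made-up values).
params = [10, 11, 11, 12, 12, 12, 13, 13, 12, 12]
deltas = delta_encode(params)
table = huffman_table(deltas)
bits = huffman_encode(deltas, table)
restored = delta_decode(huffman_decode(bits, table))
print(len(bits), restored == params)
```

Delta coding concentrates slowly varying parameters around a few small values, which gives the Huffman coder a skewed distribution to exploit. The round trip is exact, which is what makes this stage lossless.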