Search | Korea Science

Ji, Seung-eun;Kim, Wooil
- Journal of the Korea Institute of Information and Communication Engineering, v.20 no.3, pp.471-476, 2016
This paper presents our study on predicting speech recognition performance. Our initial study showed that a combination of speech quality measures correlates with Word Error Rate (WER) better than any single measure alone. In this paper we demonstrate that a new combination of several types of speech quality measures improves the correlation with WER further than the combination used in our initial study. The speech quality measures used are SNR, PESQ, acoustic model score, and MFCC distance. This paper also presents our speech database verification system for speech recognition, which employs these measures. We develop a WER prediction system that uses a Gaussian mixture model with the speech quality measures as the feature vector. The experimental results show that the proposed system is highly effective at predicting WER in low-SNR conditions with speech babble and car noise.
https://doi.org/10.6109/jkiice.2016.20.3.471

Jung, Youn-Chan;Ann, Ibanez Al
- The Journal of Korean Institute of Communications and Information Sciences, v.35 no.1B, pp.27-34, 2010
As VoIP systems move to wireless environments, where average packet loss rates are much higher than in wired networks, it becomes harder for the network to assure a reasonable QoS. Real-time quality monitoring for mobile VoIP applications is therefore an important issue to explore. In this paper, we explore how perceptual quality depends on two parameters: the burst loss rate and the average burst length. We also propose a simple 'moving average' approach with parameter $\alpha$ to measure these two parameters in real time. To determine how accurately the two measured parameters estimate the real perceptual quality, we compare actual measured PESQ scores with values estimated by matching the measured quality metrics to a trained MOS table. Finally, we propose a quality-based accounting system that establishes a clear link between quality and billing.
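The 'moving average' measurement described in this abstract can be illustrated with a minimal sketch. It assumes that $\alpha$ acts as the smoothing factor of an exponentially weighted moving average, that the loss rate is smoothed over per-packet loss indicators, and that the burst length is smoothed over completed loss bursts. The abstract does not confirm any of these details, and the class name `BurstLossMonitor` is hypothetical.

```python
class BurstLossMonitor:
    """Tracks smoothed loss rate and burst length (illustrative assumption,
    not the paper's exact definition of either parameter)."""

    def __init__(self, alpha: float = 0.1) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.loss_rate = 0.0          # smoothed fraction of lost packets
        self.avg_burst_length = 0.0   # smoothed length of completed bursts
        self._current_burst = 0       # consecutive losses in the open burst

    def observe(self, lost: bool) -> None:
        """Update both estimates with the fate of one packet."""
        a = self.alpha
        self.loss_rate = (1 - a) * self.loss_rate + a * (1.0 if lost else 0.0)
        if lost:
            self._current_burst += 1
        elif self._current_burst > 0:
            # A received packet closes the current loss burst.
            if self.avg_burst_length == 0.0:
                # The first completed burst initialises the average.
                self.avg_burst_length = float(self._current_burst)
            else:
                self.avg_burst_length = (
                    (1 - a) * self.avg_burst_length + a * self._current_burst
                )
            self._current_burst = 0


monitor = BurstLossMonitor(alpha=0.5)
for lost in [False, True, True, False, True, False]:
    monitor.observe(lost)
```

A larger $\alpha$ makes both estimates react faster to recent packets, while a smaller one smooths out short fluctuations.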

Kwon, Kisoo;Jin, Yu Gwang;Bae, Soo Hyun;Kim, Nam Soo
- The Journal of Korean Institute of Communications and Information Sciences, v.38C no.6, pp.503-511, 2013
This paper presents a speech enhancement method using non-negative matrix factorization. In the training phase, we obtain a basis matrix for each of the speech and the specific noise databases. In the enhancement phase, the noisy signal is separated into speech and noise estimates using these basis matrices. To improve performance, we model the change of the encoding matrix from the training phase to the enhancement phase with independent Gaussian distributions, and add to the objective function a constraint that closely follows these Gaussian models. We also smooth the encoding matrix by taking its previous values into account. Finally, we apply a Log-Spectral Amplitude type algorithm as the gain function.
https://doi.org/10.7840/kics.2013.38C.6.503
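The enhancement phase outlined in this abstract can be sketched roughly as follows, assuming magnitude spectrograms stored as NumPy arrays, basis matrices that are already trained, and the standard Euclidean multiplicative update for the encoding matrix. The sketch leaves out the Gaussian constraint and the smoothing step, and it substitutes a simple Wiener-style ratio for the Log-Spectral Amplitude gain, so it is not the authors' method. The function names are hypothetical.

```python
import numpy as np


def nmf_activations(V, W, n_iter=200, eps=1e-12, seed=0):
    """Estimate a non-negative H with V ~= W @ H while keeping W fixed."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        # Multiplicative update for the Euclidean NMF objective.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H


def enhance(V_noisy, W_speech, W_noise, n_iter=200, eps=1e-12):
    """Split a noisy spectrogram into speech and noise parts, then mask it."""
    W = np.hstack([W_speech, W_noise])   # concatenated trained bases
    H = nmf_activations(V_noisy, W, n_iter=n_iter, eps=eps)
    k = W_speech.shape[1]
    S_hat = W_speech @ H[:k]             # speech estimate
    N_hat = W_noise @ H[k:]              # noise estimate
    # Wiener-style gain: a stand-in for the LSA gain named in the abstract.
    gain = S_hat / (S_hat + N_hat + eps)
    return gain * V_noisy


# Toy example: one frame, three frequency bins, one basis per source.
W_speech = np.array([[1.0], [1.0], [0.0]])
W_noise = np.array([[0.0], [0.0], [1.0]])
V_noisy = np.array([[2.0], [2.0], [3.0]])
enhanced = enhance(V_noisy, W_speech, W_noise)
```

In the toy example the speech and noise bases occupy disjoint frequency bins, so the mask keeps the first two bins and suppresses the third.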

Lim, Jong-Wook;Kim, Ki-Chul;Kim, Kyeong-Sun;Lee, Hang-Seop;Park, Hae-Young;Kim, Moo-Young
- The Journal of the Acoustical Society of Korea, v.28 no.3, pp.290-297, 2009
This paper presents an improved adaptive multi-rate wideband (AMR-WB) algorithm for efficient Text-To-Speech (TTS) database compression. The proposed algorithm removes the unnecessary common bit-stream (CBS) and combines parameter delta coding with speaker-dependent Huffman coding to reduce the required bit-rate without any quality degradation. We also propose lossy coding schemes that achieve the maximum bit-rate reduction with negligible quality degradation. Compared with the 12.65 kbps AMR-WB mode, the proposed lossless algorithm, including CBS removal, reduces the bit-rate by 12.40% without quality degradation. The proposed lossy algorithm reduces the bit-rate by 20.00% with a PESQ degradation of 0.12.
https://doi.org/10.7776/ASK.2009.28.3.290
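The lossless combination of delta coding and Huffman coding mentioned in this abstract can be illustrated with a generic sketch. It assumes integer parameter values and builds the Huffman table from the sequence being encoded. The paper's table is speaker-dependent and is applied to AMR-WB parameters, neither of which is modelled here. All function names are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count


def delta_encode(values):
    """Keep the first value, then store differences between neighbours."""
    prev, out = 0, []
    for v in values:
        out.append(v - prev)
        prev = v
    return out


def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    total, out = 0, []
    for d in deltas:
        total += d
        out.append(total)
    return out


def huffman_table(symbols):
    """Build a prefix-free code table {symbol: bit string}."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    tiebreak = count()  # unique counter so the dicts are never compared
    heap = [(n, next(tiebreak), {s: ""}) for s, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (n1 + n2, next(tiebreak), merged))
    return heap[0][2]


def encode(values):
    """Delta-code the values, then Huffman-code the deltas."""
    deltas = delta_encode(values)
    table = huffman_table(deltas)
    return "".join(table[d] for d in deltas), table


def decode(bits, table):
    """Recover the original values from the bit string and code table."""
    inverse = {code: sym for sym, code in table.items()}
    deltas, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:
            deltas.append(inverse[buf])
            buf = ""
    return delta_decode(deltas)


params = [100, 101, 102, 103, 103, 104, 110]
bits, table = encode(params)
restored = decode(bits, table)
```

Delta coding concentrates slowly varying parameters around a few small values, which is what lets the Huffman stage assign them short codes.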