Speech Recognition Accuracy Prediction Using Speech Quality Measure

Ji, Seung-eun;Kim, Wooil;

doi:10.6109/jkiice.2016.20.3.471

Journal of the Korea Institute of Information and Communication Engineering (한국정보통신학회논문지)

Volume 20 Issue 3 / Pages 471-476 / 2016 / 2234-4772 (pISSN) / 2288-4165 (eISSN)

The Korea Institute of Information and Communication Engineering (한국정보통신학회)


음성 특성 지표를 이용한 음성 인식 성능 예측

Ji, Seung-eun (Department of Computer Science & Engineering, Incheon National University) ;
Kim, Wooil (Department of Computer Science & Engineering, Incheon National University)

지승은 ;
김우일

Received : 2016.01.25
Accepted : 2016.03.02
Published : 2016.03.31


Abstract

This paper presents our study on speech recognition performance prediction. Our initial study showed that a combination of speech quality measures correlates more strongly with Word Error Rate (WER) than any single measure alone. In this paper we demonstrate that a new combination of various types of speech quality measures improves the correlation with WER further, compared to the combination used in our initial study. SNR, PESQ, acoustic model score, and MFCC distance are used as the speech quality measures. This paper also presents our speech database verification system for speech recognition, which employs these measures. We develop a WER prediction system that uses a Gaussian mixture model with the speech quality measures as a feature vector. The experimental results show that the proposed system is highly effective at predicting WER in low-SNR conditions with speech babble and car noise.
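The correlation comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the measures are combined, so the least-squares linear combination below is an assumption, as are the function names and the synthetic data, none of which come from the paper.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))

def combined_correlation(measures, wer):
    """Fit WER ~ measures @ w + b by least squares (assumed combination rule).

    Returns the absolute correlation between the fitted values and WER,
    together with the fitted weights (last entry is the intercept).
    """
    X = np.column_stack([measures, np.ones(len(wer))])
    w, *_ = np.linalg.lstsq(X, wer, rcond=None)
    return abs(pearson(X @ w, wer)), w

# Synthetic stand-in data, NOT taken from the paper.
rng = np.random.default_rng(0)
n = 200
snr = rng.uniform(0, 20, n)                           # dB
pesq = 1.0 + 0.12 * snr + rng.normal(0, 0.3, n)       # PESQ-like score
am_score = -80 + 1.5 * snr + rng.normal(0, 5, n)      # acoustic model score
mfcc_dist = 30 - 0.8 * snr + rng.normal(0, 3, n)      # MFCC distance
wer = np.clip(90 - 2.0 * snr - 5.0 * pesq + 0.5 * mfcc_dist
              + rng.normal(0, 5, n), 0, 100)

names = ["SNR", "PESQ", "AM score", "MFCC distance"]
measures = np.column_stack([snr, pesq, am_score, mfcc_dist])

# Absolute correlation of each measure alone, then of the combination.
single = {name: abs(pearson(measures[:, i], wer)) for i, name in enumerate(names)}
combined, weights = combined_correlation(measures, wer)

for name, r in single.items():
    print(f"{name:14s} |r| = {r:.3f}")
print(f"{'combination':14s} |r| = {combined:.3f}")
```

When the fit is evaluated on the same data it was trained on, a least-squares combination with an intercept can never correlate less strongly with WER than any of its individual inputs. A fair comparison would therefore measure the correlation on held-out utterances.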

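This paper describes experiments on predicting speech recognition performance using speech quality measures. In our preceding experiments, we proposed a new performance measure that combines several quality measures, each highly correlated with word error rate, a representative measure of speech recognition performance. The proposed measure showed a higher correlation with word error rate than any single speech quality measure used alone, which demonstrates that it is effective for predicting speech recognition performance. Based on this result, in the present experiments we adopt the speech quality measures used in that combination to form a 4-dimensional feature vector and build a GMM-based speech recognition performance predictor. Experiments with an increasing number of Gaussian components confirm that, for both babble noise and car noise, the proposed system predicts the word error rate with higher probability as the SNR decreases.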
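A GMM-based predictor over a 4-dimensional feature vector could be sketched as follows. The abstract does not specify how WER is predicted from the GMM, so this sketch assumes that WER is quantized into three classes (low, mid, high), that one diagonal-covariance GMM is trained per class, and that the class with the highest likelihood is chosen. The class means, the spreads, and all function names are hypothetical, and the data are synthetic.

```python
import numpy as np

def _log_components(X, weights, means, var):
    """log(weight_k * N(x | mean_k, diag(var_k))) for every sample, shape (n, k)."""
    d = X.shape[1]
    maha = ((X[:, None, :] - means[None, :, :]) ** 2 / var[None, :, :]).sum(axis=2)
    log_det = np.log(var).sum(axis=1)
    return np.log(weights) - 0.5 * (d * np.log(2 * np.pi) + log_det + maha)

def _logsumexp(a):
    """Numerically stable log(sum(exp(a))) along axis 1."""
    m = a.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True))).ravel()

def fit_diag_gmm(X, k, iters=50, seed=0):
    """Fit a k-component diagonal-covariance GMM to X (n, d) with plain EM."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    means = X[rng.choice(n, size=k, replace=False)].copy()
    var = np.tile(X.var(axis=0) + 1e-6, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each sample.
        log_p = _log_components(X, weights, means, var)
        resp = np.exp(log_p - _logsumexp(log_p)[:, None])
        # M-step: re-estimate weights, means, and (floored) variances.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        var = np.maximum((resp.T @ X ** 2) / nk[:, None] - means ** 2, 1e-6)
    return weights, means, var

def log_likelihood(X, gmm):
    """Per-sample log-likelihood under a fitted GMM."""
    return _logsumexp(_log_components(X, *gmm))

def predict_wer_class(X, class_gmms):
    """Pick the WER class whose GMM gives the highest log-likelihood."""
    scores = np.column_stack([log_likelihood(X, g) for g in class_gmms])
    return scores.argmax(axis=1)

# Synthetic features [SNR, PESQ, acoustic model score, MFCC distance]
# for three assumed WER classes (low, mid, high). NOT data from the paper.
rng = np.random.default_rng(1)
CLASS_MEANS = np.array([[20.0, 3.5, -55.0, 10.0],
                        [10.0, 2.5, -65.0, 18.0],
                        [0.0, 1.5, -75.0, 26.0]])
STD = np.array([2.0, 0.3, 3.0, 2.0])

def sample(n_per_class):
    X = np.vstack([m + STD * rng.standard_normal((n_per_class, 4))
                   for m in CLASS_MEANS])
    y = np.repeat(np.arange(3), n_per_class)
    return X, y

X_train, y_train = sample(200)
X_test, y_test = sample(100)

# Repeat the experiment with an increasing number of Gaussian components.
accuracy = {}
for k in (1, 2, 4):
    gmms = [fit_diag_gmm(X_train[y_train == c], k) for c in range(3)]
    accuracy[k] = float((predict_wer_class(X_test, gmms) == y_test).mean())
    print(f"{k} component(s) per class: accuracy = {accuracy[k]:.3f}")
```

Diagonal covariances keep the number of parameters small for a 4-dimensional feature. A full-covariance model or a library implementation could be substituted without changing the overall structure.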

References

S.-Y. Yoon, L. Chen and K. Zechner, "Predicting Word Accuracy for the Automatic Speech Recognition of Non-native Speech," Interspeech-2010, pp. 773-776, 2010.
W. Kim and J. H. L. Hansen, "Phonetic Distance Based Confidence Measure," IEEE Signal Processing Letters, vol. 17, no. 2, pp. 121-124, Feb. 2010. https://doi.org/10.1109/LSP.2009.2034551
S. Ji and W. Kim, "A Study on Speech Measure Analysis for Speech Recognition Accuracy Estimation in Noisy Environments," A Conference of Acoustical Society of Korea, vol. 34, no. 1, pp. 46, May 2015.
S. Ji, J. Cho and W. Kim, "Development of Database Verification System for Automatic Speech Recognition," KCC2015, vol. 34, pp. 719-720, June 2015.
S. Ji and W. Kim, "A Study on Effective Speech Recognition Performance Measure using MFCC Similarity," KSCSP-2015, vol. 32, no. 1, pp. 220-222, Aug. 2015.
S. Ji, M. Song, J. Yoon and W. Kim, "Speech Recognition Performance Prediction employing Speech Quality Measure," A Conference of Acoustical Society of Korea, vol. 34, no. 2, pp. 46, Nov. 2015.
STNR technique provided by National Institute of Standards and Technology (NIST) [Internet]. Available: http://www.nist.gov/speech
Y. Hu and P. C. Loizou, "Evaluation of Objective Quality Measures for Speech Enhancement," IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, no. 1, pp. 229-238, Jan. 2008. https://doi.org/10.1109/TASL.2007.911054
Hidden Markov Model Toolkit (HTK) developed by Cambridge University. HTK software and tutorial download page [Internet]. Available: http://htk.eng.cam.ac.uk
TIMIT speech database provided by Linguistic Data Consortium (LDC) of University of Pennsylvania [Internet]. Available: https://catalog.ldc.upenn.edu/LDC93S1

