Voice Recognition Performance Improvement using a convergence of Voice Energy Distribution Process and Parameter

  • Oh, Sang-Yeob (Dept. of Computer Engineering, College of IT, Gachon University)
  • Received: 2015.08.02
  • Reviewed: 2015.10.20
  • Published: 2015.10.28

Abstract

Traditional speech enhancement methods leave residual noise when the noise is estimated inaccurately. This residual noise distorts the speech spectrum or causes speech frames to be missed, which degrades speech recognition performance. This paper proposes a speech detection method that combines speech energy distribution processing with a speech energy parameter. The proposed method exploits the property that maximizing speech energy makes the signal less sensitive to noise. In addition, among the feature parameters of the speech signal, log-energy features in low-valued segments are raised by relatively more than those in high-energy segments, so that they become similar in magnitude to the log-energy features of noisy speech. This reduces the mismatch between the training and recognition environments. Recognition experiments confirmed improved performance over conventional methods. In the car-noise environment, the speech-segment hit rate was 97.1% and 97.3% at the low SNRs of 0 dB and 5 dB, and 98.3% and 98.6% at the high SNRs of 10 dB and 15 dB.
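The log-energy adjustment described in the abstract can be illustrated with a minimal sketch. The abstract gives no formula, so everything here is an assumption rather than the author's method: the frame length, hop size, the function names `frame_log_energy` and `raise_low_log_energy`, and the choice of an additive energy floor as the mapping. The floor is used only because it has the stated property: it raises low log-energy frames by a large amount and high log-energy frames by a small amount, roughly as additive noise would.

```python
import numpy as np

def frame_log_energy(signal, frame_len=400, hop=160, eps=1e-10):
    """Natural-log energy of each frame (frame_len and hop are assumed values)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] for i in range(n_frames)]
    )
    return np.log(np.sum(frames ** 2, axis=1) + eps)

def raise_low_log_energy(log_e, floor_offset=-3.0):
    """Add an energy floor in the linear domain (hypothetical mapping).

    The floor sits floor_offset (in log units) relative to the largest
    frame log-energy. The increase log(1 + exp(floor - log_e)) shrinks as
    log_e grows, so quiet frames are raised more than loud frames.
    """
    floor = np.max(log_e) + floor_offset
    return np.logaddexp(log_e, floor)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sr = 16000
    quiet = 0.001 * rng.standard_normal(sr // 2)           # low-energy segment
    t = np.arange(sr // 2) / sr
    loud = 0.5 * np.sin(2 * np.pi * 440 * t)               # high-energy segment
    log_e = frame_log_energy(np.concatenate([quiet, loud]))
    boosted = raise_low_log_energy(log_e)
    print("increase in quietest frame:", (boosted - log_e)[np.argmin(log_e)])
    print("increase in loudest frame: ", (boosted - log_e)[np.argmax(log_e)])
```

Because the mapping is monotonic, the ordering of frames by energy is preserved, so a threshold-based speech detector applied afterwards still separates the same frames. Only the dynamic range of the log-energy feature is compressed.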
