Vector Quantization based Speech Recognition Performance Improvement using Maximum Log Likelihood in Gaussian Distribution

  • Chung, Kyungyong (Division of Computer Science and Engineering, Kyonggi University) ;
  • Oh, SangYeob (Division of Computer Engineering, Gachon University)
  • Received : 2018.10.10
  • Reviewed : 2018.11.20
  • Published : 2018.11.28

Abstract

Commercial speech recognition systems that achieve high recognition rates use models trained on speaker-dependent isolated data. However, in noisy environments their recognition performance degrades depending on the amount of data. This paper proposes a vector-quantization-based method for improving speech recognition performance using the maximum log likelihood in a Gaussian distribution. The proposed method builds an optimal learning model that raises recognition accuracy for similar-sounding speech by combining vector quantization with maximum-log-likelihood speech feature extraction. Speech features are extracted on the basis of the hidden Markov model (HMM). Because the method improves the accuracy of inaccurate speech models generated and used by existing systems, it yields a model that is robust for speech recognition. The proposed method shows improved recognition accuracy in a speech recognition system.
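The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea of combining vector quantization with a maximum-log-likelihood decision under Gaussian densities. It is not the authors' implementation. All names (`train_codebook`, `train_model`, `max_log_likelihood`, `recognize`), the k-means codebook training, the shared diagonal variance, the codebook size, and the synthetic 12-dimensional "frames" are assumptions made for illustration. The HMM state structure used in the paper is omitted entirely.

```python
import numpy as np


def nearest_codeword(frames, codebook):
    """Index of the closest codeword (squared Euclidean distance) for each frame."""
    dist = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)


def train_codebook(frames, k, iters=20, seed=0):
    """Vector quantization via plain k-means (an assumed choice of VQ trainer)."""
    rng = np.random.default_rng(seed)
    codebook = frames[rng.choice(len(frames), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = nearest_codeword(frames, codebook)
        for j in range(k):
            members = frames[labels == j]
            if len(members):  # keep the old codeword if a cell is empty
                codebook[j] = members.mean(axis=0)
    return codebook


def train_model(frames, k=4, var_floor=1e-3):
    """Per-word model: a VQ codebook plus one shared diagonal variance."""
    codebook = train_codebook(frames, k)
    residual = frames - codebook[nearest_codeword(frames, codebook)]
    var = np.maximum((residual ** 2).mean(axis=0), var_floor)
    return codebook, var


def max_log_likelihood(frames, model):
    """Sum over frames of the best Gaussian log-likelihood over all codewords."""
    codebook, var = model
    diff = frames[:, None, :] - codebook[None, :, :]           # (n, k, d)
    log_p = -0.5 * (np.log(2 * np.pi * var) + diff ** 2 / var).sum(axis=2)
    return float(log_p.max(axis=1).sum())


def recognize(frames, models):
    """Pick the word whose model gives the maximum log likelihood."""
    scores = {word: max_log_likelihood(frames, m) for word, m in models.items()}
    return max(scores, key=scores.get)


# Synthetic stand-ins for feature frames of two words (hypothetical data).
rng = np.random.default_rng(1)
train = {
    "yes": rng.normal(0.0, 1.0, (200, 12)),
    "no": rng.normal(3.0, 1.0, (200, 12)),
}
models = {word: train_model(x) for word, x in train.items()}
utterance = rng.normal(3.0, 1.0, (50, 12))
print(recognize(utterance, models))
```

Scoring in the log domain avoids numerical underflow when many frame likelihoods are multiplied, which is the usual reason for preferring log likelihood over raw likelihood in this kind of decision rule.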

Fig. 1. Three-state HMM

Table 1. Recognition rate in a noise-free environment

Table 2. Recognition rate in a noisy environment
