
Vector Quantization based Speech Recognition Performance Improvement using Maximum Log Likelihood in Gaussian Distribution

  • Chung, Kyungyong (Division of Computer Science and Engineering, Kyonggi University)
  • Oh, SangYeob (Division of Computer Engineering, Gachon University)
  • Received : 2018.10.10
  • Accepted : 2018.11.20
  • Published : 2018.11.28

Abstract

Commercial speech recognition systems that achieve high recognition rates use learning models trained on speaker-dependent isolated-word data. However, their recognition performance degrades in noisy environments, depending on the quantity of training data. In this paper, we propose a vector quantization based method for improving speech recognition performance using the maximum log likelihood in a Gaussian distribution. The proposed method is a learning-model configuration method that increases recognition accuracy for similar-sounding speech by combining vector quantization and maximum log likelihood with a speech feature extraction method. Speech features are extracted with a method based on the hidden Markov model (HMM). The proposed system can improve the accuracy of inaccurate speech models produced by existing systems and can thereby yield a model that is robust for speech recognition. The proposed method shows improved recognition accuracy in a speech recognition system.
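The abstract combines vector quantization with a maximum-log-likelihood score under Gaussian distributions but does not spell out the computation. The sketch below is one plausible reading under stated assumptions, not the authors' implementation: each word gets a k-means (Lloyd) codebook, a diagonal Gaussian is fitted to every codebook cell, each frame is scored by the maximum log likelihood over the cells, and the utterance score is the sum over frames, i.e. score(X | w) = sum over t of max over j of log N(x_t; mu_j, diag(var_j)). The feature vectors are random stand-ins for real speech features (e.g. MFCCs), and every function name (`kmeans_codebook`, `quantize`, `fit_word_model`, `recognize`) is hypothetical.

```python
import numpy as np

def kmeans_codebook(features, k, iters=20, seed=0):
    """Lloyd's k-means: returns a (k, d) array of codewords (centroids)."""
    rng = np.random.default_rng(seed)
    centroids = features[rng.choice(len(features), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = quantize(features, centroids)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:  # keep the old centroid if a cell becomes empty
                centroids[j] = members.mean(axis=0)
    return centroids

def quantize(features, codebook):
    """Vector quantization: index of the nearest codeword for every frame."""
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def fit_word_model(features, k=8, var_floor=1e-3):
    """Hypothetical word model: a diagonal Gaussian (mean, variance) per VQ cell."""
    codebook = kmeans_codebook(features, k)
    labels = quantize(features, codebook)
    means = codebook.copy()
    variances = np.ones_like(codebook)
    for j in range(k):
        members = features[labels == j]
        if len(members) > 1:
            means[j] = members.mean(axis=0)
            # Variance floor stops sparse cells from becoming degenerate.
            variances[j] = np.maximum(members.var(axis=0), var_floor)
    return means, variances

def gaussian_loglik(frames, means, variances):
    """log N(x_t; mu_j, diag(var_j)) for every frame t and cell j -> (T, k)."""
    diff2 = (frames[:, None, :] - means[None, :, :]) ** 2
    return -0.5 * (np.log(2 * np.pi * variances)[None] + diff2 / variances[None]).sum(axis=2)

def utterance_score(frames, model):
    """Sum over frames of the maximum log likelihood across the model's cells."""
    return gaussian_loglik(frames, *model).max(axis=1).sum()

def recognize(frames, models):
    """Pick the word whose model gives the highest utterance score."""
    scores = {word: utterance_score(frames, m) for word, m in models.items()}
    return max(scores, key=scores.get), scores

# Synthetic 13-dimensional "feature" frames for two hypothetical words.
rng = np.random.default_rng(1)
train = {"yes": rng.normal(0.0, 1.0, size=(300, 13)),
         "no": rng.normal(3.0, 1.0, size=(300, 13))}
models = {word: fit_word_model(frames) for word, frames in train.items()}

word, scores = recognize(rng.normal(3.0, 1.0, size=(50, 13)), models)
print(word)  # no
```

Taking the maximum over cells, rather than summing a weighted mixture as a Gaussian mixture model would, is what makes this a maximum-log-likelihood score; the variance floor is a common safeguard, assumed here rather than taken from the paper.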

Keywords

Speech Recognition; HMM; Feature Extraction; Speech Model; Gaussian Distribution

Fig. 1. 3-State of HMM
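As a hedged illustration of how a 3-state HMM such as the one in Fig. 1 scores an utterance, the sketch below computes log P(O | λ) with the forward algorithm in the log domain. It assumes a left-to-right topology (each state either loops on itself or advances to the next), one diagonal Gaussian emission per state, entry in the first state and exit from the last. The transition probabilities, means and variances are made-up values, not parameters from the paper, and the function names are hypothetical.

```python
import numpy as np

def diag_gauss_loglik(obs, means, variances):
    """log N(o_t; mu_j, diag(var_j)) for every frame t and state j -> (T, N)."""
    diff2 = (obs[:, None, :] - means[None, :, :]) ** 2
    return -0.5 * (np.log(2 * np.pi * variances)[None] + diff2 / variances[None]).sum(axis=2)

def forward_loglik(obs, log_A, means, variances):
    """Log-domain forward algorithm returning log P(O | lambda).

    Assumes the model starts in state 0 and must end in the last state.
    """
    n_states = log_A.shape[0]
    log_b = diag_gauss_loglik(obs, means, variances)  # (T, N) emission scores
    log_alpha = np.full(n_states, -np.inf)
    log_alpha[0] = log_b[0, 0]                        # entry state only
    for t in range(1, len(obs)):
        # alpha_t(j) = [sum_i alpha_{t-1}(i) * a_ij] * b_j(o_t), in log space
        log_alpha = np.logaddexp.reduce(log_alpha[:, None] + log_A, axis=0) + log_b[t]
    return log_alpha[-1]

# Hypothetical 3-state left-to-right topology: self-loop or advance one state.
A = np.array([[0.6, 0.4, 0.0],
              [0.0, 0.7, 0.3],
              [0.0, 0.0, 1.0]])
with np.errstate(divide="ignore"):
    log_A = np.log(A)  # log(0) -> -inf marks a forbidden transition

means = np.array([[0.0], [5.0], [10.0]])  # 1-D features for readability
variances = np.ones_like(means)

obs_fwd = np.array([[0.1], [-0.2], [5.1], [4.8], [9.9], [10.2]])
obs_rev = obs_fwd[::-1]
print(forward_loglik(obs_fwd, log_A, means, variances))
print(forward_loglik(obs_rev, log_A, means, variances))
```

Working in log space with `np.logaddexp` avoids underflow on long utterances. Because the model must exit from the last state, sequences shorter than three frames get a log likelihood of -inf, and the ordered sequence scores far higher than its reversal: that ordering constraint is what a left-to-right topology contributes.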

Table 1. Non-noise environment recognition rate

Table 2. Noise environment recognition rate

Acknowledgement

Supported by: IITP (Institute for Information & Communications Technology Promotion)
