
Speaker-Independent Recognition Algorithm Based on Parameter Extraction by MFCC with the Wiener Filter Method

  • Choi, Jae-Seung (Division of Smart Electrical and Electronic Engineering, Silla University)
  • Received : 2017.02.01
  • Accepted : 2017.02.20
  • Published : 2017.06.30

Abstract

To obtain good recognition performance from a speech recognition system under background noise, it is very important to select appropriate speech feature parameters. The feature parameter used in this paper is the Mel-frequency cepstral coefficient (MFCC), which reflects human auditory characteristics, combined with the Wiener filter method. That is, the feature parameter proposed in this paper is obtained by a new extraction method that first removes the background noise and then computes the parameters of the clean speech signal. The proposed method implements speaker recognition by feeding the modified MFCC feature parameters into a multilayer perceptron network for training. In the experiments, speaker-independent recognition tests were performed using 14th-order MFCC feature parameters. For noisy speech with added white noise, the average speaker-independent recognition rate was 94.48%, which is an effective result. Compared with existing methods, the proposed speaker recognition performance is improved by using the modified MFCC feature parameters.
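The pipeline described above (Wiener filtering for noise suppression, followed by 14th-order MFCC extraction and a multilayer perceptron classifier) can be illustrated with a minimal sketch. The paper does not publish its implementation, so everything below is an assumption-laden illustration: the frame length, hop size, number of mel bands, the noise-only leading frames used for the noise estimate, and the spectral floor are all illustrative choices, not the paper's settings.

```python
# Sketch of the front end: spectral Wiener filtering followed by MFCC
# extraction. All numeric settings below are illustrative assumptions.
import numpy as np

def stft(x, frame_len=400, hop=160):
    """Short-time FFT with a Hamming window (25 ms frames, 10 ms hop at 16 kHz)."""
    win = np.hamming(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * win
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)                  # (n_frames, n_bins)

def wiener_gain(power, noise_psd, floor=0.05):
    """Per-bin Wiener gain H = SNR / (SNR + 1), with a spectral floor."""
    snr = np.maximum(power / noise_psd - 1.0, 0.0)      # instantaneous SNR estimate
    return np.maximum(snr / (snr + 1.0), floor)

def mel_filterbank(n_mels=26, n_fft=400, sr=16000):
    """Triangular mel filters mapping linear FFT bins to mel bands."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = imel(np.linspace(mel(0.0), mel(sr / 2.0), n_mels + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fb[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return fb

def wiener_mfcc(noisy, sr=16000, n_mfcc=14, noise_frames=10):
    """Wiener-filtered MFCCs: enhance the power spectrum, then cepstral analysis."""
    power = np.abs(stft(noisy)) ** 2
    noise_psd = power[:noise_frames].mean(axis=0)       # leading frames assumed noise-only
    enhanced = power * wiener_gain(power, noise_psd)
    mel_energy = np.log(enhanced @ mel_filterbank(sr=sr).T + 1e-10)
    # DCT-II over the log mel energies yields cepstral coefficients 1..n_mfcc.
    n_mels = mel_energy.shape[1]
    k = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(1, n_mfcc + 1), k + 0.5) / n_mels)
    return mel_energy @ dct.T                           # (n_frames, 14) features
```

Continuing the sketch, a hypothetical back end could consume these features, with scikit-learn's MLPClassifier standing in for the paper's multilayer perceptron; the signal and speaker labels here are synthetic placeholders, not the paper's data:

```python
# Hypothetical end-to-end usage with synthetic placeholder data.
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)                  # 1 s stand-in for speech
noisy = clean + 0.3 * rng.standard_normal(16000)    # additive white noise
feats = wiener_mfcc(noisy)                          # (n_frames, 14)
labels = np.arange(len(feats)) % 2                  # two dummy speaker classes
mlp = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300,
                    random_state=0).fit(feats, labels)
```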

