Speech/Mixed Content Signal Classification Based on GMM Using MFCC

  • Kim, Ji-Eun (Department of Radio Engineering, Chungbuk National University) ;
  • Lee, In-Sung (Department of Radio Engineering, Chungbuk National University)
  • Received : 2012.08.24
  • Published : 2013.02.25

Abstract

In this paper, we propose a method to improve the classification of speech and mixed-content signals in the MPEG USAC (Unified Speech and Audio Coding) standard, using MFCC features with a GMM-based probability model. For effective pattern recognition, a Gaussian mixture model (GMM) is used, and the optimal GMM parameters are estimated with the expectation-maximization (EM) algorithm. The proposed classification algorithm consists of two main parts: the first extracts the optimal GMM parameters, and the second distinguishes between speech and mixed-content signals using MFCC feature parameters. The proposed classification algorithm achieves better results than the classification scheme of the conventional USAC implementation.

In this paper, a GMM-based speech/mixed-signal classification algorithm using MFCC is applied to USAC, the MPEG standard codec. A GMM is used for effective pattern recognition, and the optimal GMM parameters are extracted with the EM algorithm. The proposed classification algorithm is divided into two main parts: first, the optimal parameters are extracted through the GMM; second, speech/mixed signals are classified by pattern recognition using MFCC values. A performance evaluation of the proposed algorithm shows that the GMM-based method using MFCC provides better speech/mixed-signal classification than the conventional USAC method.
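
The classification pipeline summarized above (MFCC feature extraction, one EM-trained GMM per class, and a likelihood-based decision) can be illustrated with a short sketch. This is only a minimal example under stated assumptions, not the paper's implementation: the libraries (librosa, scikit-learn), the 13-coefficient MFCC order, the 16 mixture components, and the file names are illustrative choices.

```python
# Minimal sketch: GMM-based speech vs. mixed-content classification on MFCC features.
# scikit-learn's GaussianMixture fits the model with the EM algorithm internally.
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture


def extract_mfcc(path, sr=16000, n_mfcc=13):
    """Load an audio file and return its MFCC matrix, one feature vector per frame."""
    y, _ = librosa.load(path, sr=sr, mono=True)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T


def train_gmm(feature_matrices, n_components=16):
    """Fit a diagonal-covariance GMM to the stacked MFCC frames of one class."""
    gmm = GaussianMixture(n_components=n_components, covariance_type="diag", max_iter=200)
    gmm.fit(np.vstack(feature_matrices))
    return gmm


def classify(path, speech_gmm, mixed_gmm):
    """Label a signal by comparing the average per-frame log-likelihood under each GMM."""
    mfcc = extract_mfcc(path)
    return "speech" if speech_gmm.score(mfcc) > mixed_gmm.score(mfcc) else "mixed"


# Hypothetical usage with illustrative file lists:
# speech_gmm = train_gmm([extract_mfcc(p) for p in speech_training_files])
# mixed_gmm = train_gmm([extract_mfcc(p) for p in mixed_training_files])
# print(classify("test_signal.wav", speech_gmm, mixed_gmm))
```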
