Fine-tuning SVM for Enhancing Speech/Music Classification

Improving Speech/Music Classification Performance by Fine-Tuning SVM

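  • 임정수 (School of Electronic Engineering, Inha University) ;
  • 송지현 (School of Electronic Engineering, Inha University) ;
  • 장준혁 (School of Convergence Electronic Engineering, Hanyang University)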
  • Received : 2010.08.02
  • Accepted : 2010.12.02
  • Published : 2011.03.25

Abstract

Support vector machines (SVMs) have been studied extensively and used in pattern recognition for years. One interesting application of the technique is speech/music classification for a standardized codec such as the 3GPP2 selectable mode vocoder (SMV). In this paper, we propose a novel approach that improves the speech/music classification performance of SVMs. Whereas conventional SVM optimization techniques are applied during the training phase, the proposed technique is applied during the classification phase. The proposed approach can therefore be developed and employed in parallel with conventional optimizations, and the two can combine to give a further boost in classification performance. We first analyze the impact of the kernel width parameter on the classifications made by an SVM. From this analysis, we observe that the outputs of an SVM can be fine-tuned through the kernel width parameter. To make the most of this capability, we identify a strong correlation among neighboring input frames and use this correlation information as a guide for adjusting the kernel width parameter. According to the experimental results, the proposed algorithm has the potential to improve the performance of SVMs.

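Support vector machines (SVMs) are widely used in pattern recognition. For example, an SVM can be applied to a standardized codec such as the 3GPP2 selectable mode vocoder (SMV) to improve the codec's speech/music classification performance. This paper proposes a new method that improves speech/music classification by refining the SVM. Unlike conventional techniques, which are applied when the SVM is trained, the proposed technique is used when the SVM performs classification. It can therefore be developed and used independently of the conventional techniques, and can further improve classification performance. To this end, we first analyze how the kernel width parameter of the radial basis function affects SVM classification. The analysis shows that the kernel width parameter can be used to fine-tune the classification tendency of the SVM. We also identify the correlation among the frames of the speech signal and use it as a guide for adjusting the kernel width parameter. Experiments demonstrate that the proposed technique can improve the performance of the SVM.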
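
The idea that the kernel width can shift an SVM's output at classification time, with neighboring frames guiding the choice of width, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the one-dimensional toy features, the support vectors and coefficients, the label convention (+1 and -1 for the two classes), and in particular the `sigma_by_prev` rule, which simply picks the width for the next frame from the previous frame's label. It is a hypothetical stand-in for inter-frame correlation, not the adjustment rule used in the paper.

```python
import math

def rbf_kernel(x, z, sigma):
    """Gaussian (RBF) kernel with width parameter sigma."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def svm_output(x, support_vectors, dual_coefs, bias, sigma):
    """Decision value f(x) = sum_i coef_i * K(x, sv_i) + bias."""
    return sum(c * rbf_kernel(x, sv, sigma)
               for c, sv in zip(dual_coefs, support_vectors)) + bias

def classify_frames(frames, support_vectors, dual_coefs, bias,
                    base_sigma, sigma_by_prev):
    """Classify frames in order; the width used for each frame is chosen
    from the previous frame's label (hypothetical rule, see lead-in)."""
    labels = []
    sigma = base_sigma
    for x in frames:
        f = svm_output(x, support_vectors, dual_coefs, bias, sigma)
        label = 1 if f >= 0 else -1
        labels.append(label)
        sigma = sigma_by_prev[label]  # width for the next frame
    return labels

# Toy model (all values are made up for illustration).
SUPPORT_VECTORS = [[0.0], [3.0]]
DUAL_COEFS = [1.0, -2.0]   # signed coefficients alpha_i * y_i
BIAS = 0.0
SIGMA_BY_PREV = {1: 1.0, -1: 3.0}

# The borderline frame [1.0] changes sign when only the width changes.
narrow = svm_output([1.0], SUPPORT_VECTORS, DUAL_COEFS, BIAS, sigma=1.0)
wide = svm_output([1.0], SUPPORT_VECTORS, DUAL_COEFS, BIAS, sigma=3.0)
print(narrow > 0, wide < 0)  # True True

# The same frame [1.0] is labeled differently depending on its neighbor.
print(classify_frames([[0.0], [1.0]], SUPPORT_VECTORS, DUAL_COEFS, BIAS,
                      1.0, SIGMA_BY_PREV))  # [1, 1]
print(classify_frames([[3.0], [1.0]], SUPPORT_VECTORS, DUAL_COEFS, BIAS,
                      1.0, SIGMA_BY_PREV))  # [-1, -1]
```

The trained support vectors and coefficients are never modified in this sketch; only the width used when evaluating the decision function changes. That is why a classification-phase adjustment of this kind can coexist with whatever optimization was applied during training.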
