Dimension-Reduced Audio Spectrum Projection Features for Classifying Video Sound Clips

  • Published: 2006.09.15

Abstract

For audio indexing and targeted search of specific audio or corresponding visual content, the MPEG-7 standard has adopted a sound classification framework in which dimension-reduced Audio Spectrum Projection (ASP) features are used to train continuous hidden Markov models (HMMs) for classifying various sounds. MPEG-7 employs Principal Component Analysis (PCA) or Independent Component Analysis (ICA) for the dimensionality reduction. Other well-established techniques include Non-negative Matrix Factorization (NMF), Linear Discriminant Analysis (LDA) and the Discrete Cosine Transform (DCT). In this paper we compare the performance of these dimensionality reduction methods, combined with Gaussian mixture models (GMMs) and HMMs, in classifying video sound clips.
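The feature side of this pipeline can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the MPEG-7 reference software or the authors' code: it assumes the Audio Spectrum Envelope (ASE) has already been computed as a frames x bands matrix of linear power values, applies dB conversion and per-frame L2 normalization, and then contrasts a trained PCA basis (obtained with an SVD) with a fixed DCT basis. Centering the data before the SVD is standard PCA practice and is an assumption here. The function names (`normalize_ase`, `pca_basis`, `dct_basis`, `project`) and the synthetic input are hypothetical; ICA, NMF and LDA bases and the GMM/HMM classifiers are not shown.

```python
import numpy as np


def normalize_ase(ase):
    """Convert an ASE matrix (frames x bands, linear power) to dB and scale
    each frame to unit L2 norm. Returns the normalized matrix and the norms."""
    ase_db = 10.0 * np.log10(np.maximum(ase, 1e-10))  # floor avoids log(0)
    norms = np.linalg.norm(ase_db, axis=1, keepdims=True)
    norms = np.where(norms == 0.0, 1.0, norms)  # leave all-zero frames unscaled
    return ase_db / norms, norms.ravel()


def pca_basis(train_nase, n_components):
    """Return (mean, basis): the training mean and the leading principal
    directions as columns (bands x n_components), computed with an SVD."""
    mean = train_nase.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(train_nase - mean, full_matrices=False)
    return mean, vt[:n_components].T


def dct_basis(n_bands, n_components):
    """Orthonormal DCT-II basis vectors as columns (bands x n_components).
    Unlike PCA, this basis is fixed and needs no training data."""
    n = np.arange(n_bands)
    k = np.arange(n_components)
    basis = np.sqrt(2.0 / n_bands) * np.cos(np.pi * np.outer(n + 0.5, k) / n_bands)
    basis[:, 0] /= np.sqrt(2.0)  # scale the DC vector to unit norm
    return basis


def project(nase, basis, mean=None):
    """Project normalized spectra onto a reduced basis (frames x n_components)."""
    centered = nase if mean is None else nase - mean
    return centered @ basis


# Hypothetical synthetic stand-in for a real ASE: 200 frames, 32 bands.
rng = np.random.default_rng(0)
ase = rng.gamma(shape=2.0, scale=1.0, size=(200, 32))

nase, norms = normalize_ase(ase)
mean, v_pca = pca_basis(nase, n_components=8)
asp_pca = project(nase, v_pca, mean)            # data-driven basis
asp_dct = project(nase, dct_basis(32, 8))       # fixed basis
print(asp_pca.shape, asp_dct.shape)  # (200, 8) (200, 8)
```

The projected frames (`asp_pca` or `asp_dct`) would then serve as observation sequences for training one GMM or HMM per sound class. The design difference is that the PCA basis adapts to the training spectra, whereas the DCT basis is fixed in advance, as in the transform used to compute cepstral coefficients.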
