Centroid-model-based music similarity with alpha divergence

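  • 서진수 (Department of Electronic Engineering, Gangneung-Wonju National University);
  • 김정현 (Content Research Division, Electronics and Telecommunications Research Institute);
  • 박지현 (Content Research Division, Electronics and Telecommunications Research Institute)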
  • Received : 2015.08.20
  • Accepted : 2015.12.18
  • Published : 2016.03.31

Abstract

Music-similarity computation is crucial in developing music information retrieval systems for browsing and classification. This paper reviews the recently proposed centroid-model-based music retrieval method and applies distributional similarity measures to the model to evaluate retrieval performance. Probabilistic distance measures (also called divergences) quantify how far apart two probability distributions are according to a particular criterion. In this paper, we use the alpha divergence to compute the distance between two centroid models for music retrieval. Depending on the value of alpha, the alpha divergence includes the widely used Kullback-Leibler divergence and the Bhattacharyya distance as special cases. Experiments were conducted on both a genre dataset and a singer dataset, comparing the retrieval performance of the distributional similarity measures with that of vector distances. The experimental results show that the alpha divergence improves the performance of centroid-model-based music retrieval.
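The abstract does not give the divergence formula. The sketch below is a minimal Python illustration. It assumes that the "alpha divergence" is Rényi's divergence of order α and that each centroid is modeled as a Gaussian with diagonal covariance. Under those assumptions the closed form is D_α(N1‖N2) = (α/2) Δᵀ Σ_α⁻¹ Δ − ln(|Σ_α| / (|Σ1|^(1−α) |Σ2|^α)) / (2(α − 1)), where Δ = μ1 − μ2 and Σ_α = αΣ2 + (1 − α)Σ1. The function names, the index-wise pairing of centroids between two models, and the weighted averaging in `centroid_model_divergence` are hypothetical choices, not the authors' implementation.

```python
import numpy as np


def renyi_gauss_diag(mu1, var1, mu2, var2, alpha):
    """Renyi divergence of order alpha, D_alpha(N1 || N2), between two
    diagonal-covariance Gaussians N1 = N(mu1, var1) and N2 = N(mu2, var2)."""
    mu1, var1, mu2, var2 = (np.asarray(a, dtype=float) for a in (mu1, var1, mu2, var2))
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    diff = mu1 - mu2
    if abs(alpha - 1.0) < 1e-12:
        # Limit alpha -> 1: the Kullback-Leibler divergence KL(N1 || N2).
        return float(0.5 * np.sum(var1 / var2 + diff**2 / var2 - 1.0 + np.log(var2 / var1)))
    # Interpolated variance alpha * var2 + (1 - alpha) * var1.
    var_a = alpha * var2 + (1.0 - alpha) * var1
    if np.any(var_a <= 0):
        # Only possible for alpha > 1; the divergence is then infinite.
        return float("inf")
    quad = np.sum(diff**2 / var_a)
    log_ratio = np.sum(np.log(var_a) - (1.0 - alpha) * np.log(var1) - alpha * np.log(var2))
    return float(0.5 * alpha * quad - log_ratio / (2.0 * (alpha - 1.0)))


def bhattacharyya_gauss_diag(mu1, var1, mu2, var2):
    """Bhattacharyya distance between two diagonal-covariance Gaussians."""
    mu1, var1, mu2, var2 = (np.asarray(a, dtype=float) for a in (mu1, var1, mu2, var2))
    var_m = 0.5 * (var1 + var2)
    diff = mu1 - mu2
    return float(0.125 * np.sum(diff**2 / var_m)
                 + 0.5 * np.sum(np.log(var_m) - 0.5 * np.log(var1) - 0.5 * np.log(var2)))


def centroid_model_divergence(means_a, vars_a, means_b, vars_b, alpha, weights=None):
    """Hypothetical model-level score: centroid k of model A is paired with
    centroid k of model B, each centroid is treated as a diagonal Gaussian,
    and the per-pair divergences are combined with weights (uniform by default)."""
    means_a, vars_a, means_b, vars_b = (
        np.atleast_2d(np.asarray(a, dtype=float)) for a in (means_a, vars_a, means_b, vars_b)
    )
    k = means_a.shape[0]
    w = np.full(k, 1.0 / k) if weights is None else np.asarray(weights, dtype=float)
    return float(sum(
        w[i] * renyi_gauss_diag(means_a[i], vars_a[i], means_b[i], vars_b[i], alpha)
        for i in range(k)
    ))
```

Setting α = 0.5 yields exactly twice the Bhattacharyya distance, and α → 1 recovers the Kullback-Leibler divergence. For N([0, 1], diag(1, 2)) and N([0.5, 0], diag(2, 1)), the α = 1 value is 0.8125. If both Gaussians share the same variances, the log term vanishes and D_α reduces to α/2 times a squared Mahalanobis distance. In that case α only rescales the score, so α can change a retrieval ranking only when the compared centroids have different variances.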

음악 유사도 계산은 음악 검색 및 분류 등의 정보 처리 시스템 구현에 있어서 가장 중요한 부분이다. 본 논문은 최근 제안된 무게중심 모델을 이용한 음악 검색 방법에 대해서 살펴보고, 무게중심 모델의 확률 분포 유사도를 이용하여 음악 검색을 수행하고 성능을 평가하였다. 확률 분포간의 거리는 주어진 두 개의 확률 분포가 특정 기준에서 얼마나 가까운 지를 계산하는 것으로 다이버전스라고 불리기도 한다. 본 논문에서는 무게중심 모델에서 확률 분포 간의 거리 비교 시에 알파 다이버전스를 활용하였다. 알파 다이버전스는 알파 값에 따라 다양한 형태를 가지며, 널리 사용되고 있는 KLD(Kullback-Leibler)와 BD(Bhattacharyya Distance)를 포함한다. 음악 장르와 가수 데이터셋에서 검색 실험을 수행했고, 확률 분포 거리 기반 유사도와 벡터 거리 기반 유사도의 음악 검색 성능을 비교하였다. 알파 다이버전스를 통해서 무게중심 모델 기반 음악 검색 성능을 개선시킬 수 있음을 보였다.
