Design and Implementation of Matching Engine for QbSH System Based on Polyphonic Music

다성음원 기반 QbSH 시스템을 위한 매칭엔진의 설계 및 구현

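  • 박성주 (Digital Media Research Center, Korea Electronics Technology Institute)
  • 정광수 (Department of Electronics and Communications Engineering, Kwangwoon University)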
  • Received : 2011.07.14
  • Accepted : 2011.11.22
  • Published : 2012.01.31

Abstract

This paper proposes a matching engine for a query-by-singing/humming (QbSH) system, which retrieves the most similar music by comparing the input query with feature information extracted from polyphonic music such as MP3 files. Feature sequences transcribed from polyphonic music may contain many errors. To reduce the influence of these errors and improve performance, the matching engine adopts chroma-scale representation, compensation, and asymmetric DTW (Dynamic Time Warping). The performance of various distance metrics is also investigated. In our experiment, the proposed QbSH system achieves an MRR (Mean Reciprocal Rank) of 0.718 for 1,000 singing/humming queries searched against a database of 450 polyphonic music recordings.
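As a rough illustration of two of the techniques named in the abstract, the sketch below folds pitch sequences onto a single octave (a chroma-scale representation) and aligns a query to a reference with an asymmetric DTW. The abstract does not specify the step pattern, boundary conditions, or distance function, so everything here is an assumption: the function names (`to_chroma`, `circular_distance`, `asymmetric_dtw`), the MIDI-style pitch input, the step pattern in which each query frame advances the reference by 0, 1, or 2 frames, the anchored start with a free end, and the normalization by query length. The compensation step is not covered.

```python
import math


def to_chroma(midi_pitches):
    """Fold MIDI-style pitch values onto one octave, giving values in [0, 12)."""
    return [p % 12.0 for p in midi_pitches]


def circular_distance(a, b):
    """Distance between two chroma values, wrapping around the octave."""
    d = abs(a - b) % 12.0
    return min(d, 12.0 - d)


def asymmetric_dtw(query, reference, dist=circular_distance):
    """Align every query frame to exactly one reference frame.

    Assumed step pattern (not taken from the paper): moving from query
    frame i-1 to i, the reference index may stay, advance by 1, or
    advance by 2. The path starts at (0, 0) and may end at any reference
    frame. The accumulated cost is divided by the query length.
    """
    n, m = len(query), len(reference)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")

    # prev[j] holds the best accumulated cost ending at reference frame j.
    prev = [math.inf] * m
    prev[0] = dist(query[0], reference[0])

    for i in range(1, n):
        curr = [math.inf] * m
        for j in range(m):
            best = prev[j]
            if j >= 1:
                best = min(best, prev[j - 1])
            if j >= 2:
                best = min(best, prev[j - 2])
            if best < math.inf:
                curr[j] = best + dist(query[i], reference[j])
        prev = curr

    return min(prev) / n
```

For example, a query of `[60, 62, 64]` and a reference of `[72, 72, 74, 74, 76, 76]` (the same contour, one octave higher and at half the tempo) both fold to the same chroma values, so `asymmetric_dtw(to_chroma(query), to_chroma(reference))` returns a cost of zero under these assumptions. Because only the query axis drives the recursion, the cost is normalized by a fixed length regardless of how long the reference is, which is one common motivation for an asymmetric formulation.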

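This paper proposes a matching engine for a QbSH (Query-by-Singing/Humming) system based on feature information extracted from polyphonic music. A polyphonic-music-based QbSH system compares feature information extracted from a person's singing or humming with feature information extracted from polyphonic music such as MP3 files, and retrieves the most similar music. To reduce the errors that arise when features are extracted from polyphonic music and to improve matching performance, the proposed matching engine applies chroma-scale representation, compensation, and an asymmetric DTW (Dynamic Time Warping) algorithm. Various distance metrics were also applied, and the resulting improvement in matching performance was confirmed. The performance of the proposed QbSH system was evaluated using 1,000 humming queries and a database of 450 polyphonic songs. The evaluation confirmed that the proposed system achieves an MRR (Mean Reciprocal Rank) of 0.718.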
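The MRR figure quoted in the abstract is the average, over all queries, of the reciprocal of the rank at which the correct song appears. A minimal sketch of that computation follows. The function name `mean_reciprocal_rank` and the convention that a query whose target is not retrieved contributes zero are assumptions for illustration, not details taken from the paper.

```python
def mean_reciprocal_rank(ranks):
    """Compute MRR from 1-based ranks of the correct item per query.

    A rank of None means the correct item was not retrieved for that
    query; it contributes 0 to the sum (an assumed convention).
    """
    if not ranks:
        raise ValueError("at least one query is required")
    total = sum(1.0 / r for r in ranks if r is not None)
    return total / len(ranks)
```

With this definition, an MRR of 1.0 would mean the correct song is ranked first for every query, and a query whose correct song is ranked second contributes 0.5.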
