A Design of Matching Engine for a Practical Query-by-Singing/Humming System with Polyphonic Recordings

  • Lee, Seok-Pil (Department of Digital Media Technology, Sangmyung University) ;
  • Yoo, Hoon (Department of Digital Media Technology, Sangmyung University) ;
  • Jang, Dalwon (Korea Electronic Technology Institute)
  • Received : 2014.12.20
  • Accepted : 2014.01.21
  • Published : 2014.02.27

Abstract

This paper proposes a matching engine for a query-by-singing/humming (QbSH) system that works with polyphonic music files such as MP3 recordings. Pitch sequences extracted from polyphonic recordings may be distorted, so we use a chroma-scale representation, pre-processing, compensation, and asymmetric dynamic time warping to reduce the influence of these distortions. In an experiment on a 28-hour music database, our QbSH system built on the polyphonic database achieves a Mean Reciprocal Rank (MRR) of 0.725, which is very promising compared with the published QbSH system built on a monophonic database. Our matching engine can also be used for QbSH systems based on a MIDI database, and its performance there was verified in MIREX 2011.
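
As a rough illustration of the matching stage sketched in the abstract, the Python snippet below folds pitch sequences onto a 12-bin chroma scale, compares a query against a reference with an asymmetric dynamic time warping whose step pattern consumes every query frame exactly once while letting the reference stretch or skip, and computes MRR over a set of retrieval ranks. The function names, the specific step pattern, and the toy data are illustrative assumptions only; the paper's actual pre-processing and compensation stages are not reproduced here.

```python
import numpy as np

def to_chroma(pitch_semitones):
    """Fold a MIDI-style pitch sequence onto a 12-bin chroma scale (octave removed)."""
    return np.mod(np.asarray(pitch_semitones, dtype=float), 12.0)

def chroma_distance(a, b):
    """Distance on the chroma circle: the shorter of the two arcs (0..6 semitones)."""
    d = abs(a - b) % 12.0
    return min(d, 12.0 - d)

def asymmetric_dtw(query, reference):
    """
    Asymmetric DTW: every query frame is consumed exactly once, while the
    reference may repeat or skip frames. Returns the average per-frame cost,
    allowing the query to start and end anywhere inside the reference.
    """
    n, m = len(query), len(reference)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, :] = 0.0                                  # free starting point in the reference
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = chroma_distance(query[i - 1], reference[j - 1])
            skip = D[i - 1, j - 2] if j >= 2 else np.inf
            D[i, j] = cost + min(D[i - 1, j],      # reference stays (query is faster)
                                 D[i - 1, j - 1],  # both advance
                                 skip)             # reference skips one frame
    return D[n, 1:].min() / n                      # free ending point in the reference

def mean_reciprocal_rank(ranks):
    """MRR over the 1-based ranks at which the correct song was retrieved."""
    return float(np.mean([1.0 / r for r in ranks]))

# Toy usage: a short hummed query matched against two candidate melodies.
query    = to_chroma([60, 62, 64, 65, 67])         # C D E F G
ref_good = to_chroma([48, 50, 52, 53, 55, 57])     # same contour, one octave lower
ref_bad  = to_chroma([61, 61, 66, 70, 59, 58])
print(asymmetric_dtw(query, ref_good))             # small distance -> likely match
print(asymmetric_dtw(query, ref_bad))              # larger distance
print(mean_reciprocal_rank([1, 2, 1]))             # about 0.83
```

The asymmetry is the point of the step pattern: a sung or hummed query rarely follows the tempo of the melody extracted from a recording, so the reference side is allowed to stretch or skip while every query frame still contributes to the cost.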

References

  1. Nicola Orio, "Music Retrieval: A Tutorial and Review," Foundations and Trends in Information Retrieval, vol. 1, no. 1, pp. 1-90, 2006. https://doi.org/10.1561/1500000002
  2. J. Stephen Downie, "The Music Information Retrieval Evaluation eXchange (MIREX) Next Generation Project," project prospectus, 2011.
  3. R. Typke, F. Wiering and R. C. Veltkamp, "A survey of music information retrieval systems," in Proc. of ISMIR, pp.153-160, 2005.
  4. G. Tzanetakis, G. Essl and P. Cook, "Automatic musical genre classification of audio signals," in Proc. of Int. Conf. Music Information Retrieval, Bloomington, IN, pp. 205-210, 2001.
  5. D. Jang, M. Jin and C. D. Yoo, "Music genre classification using novel features and a weighted voting method," in Proc. of ICME, 2008.
  6. R. Typke, P. Giannopoulos, R. C. Veltkamp, F. Wiering and R. V. Oostrum, "Using transportation distances for measuring melodic similarity," in Proc. of Int. Conf. Music Information Retrieval, pp. 107-114, 2003.
  7. G. Poliner, D. Ellis, A. Ehmann, E. Gomez, S. Streich and B. Ong, "Melody transcription from music audio: Approaches and evaluation," IEEE Trans. on Audio, Speech, Language Processing, vol. 15, no. 4, pp. 1247-1256, 2007. https://doi.org/10.1109/TASL.2006.889797
  8. S. Jo and C. D. Yoo, "Melody extraction from polyphonic audio based on particle filter," in Proc. of ISMIR, 2010.
  9. D. P. W. Ellis and G. E. Poliner, "Identifying cover songs with chroma features and dynamic programming beat tracking," in Proc. of Int. Conf. Acoustics, Speech and Signal Processing, Honolulu, HI, 2007.
  10. J.-S. R. Jang and H.-R. Lee, "A general framework of progressive filtering and its application to query by singing/humming," IEEE Trans. on Audio, Speech, and Language Processing, vol. 16, no. 2, pp. 350-358, 2008. https://doi.org/10.1109/TASL.2007.913035
  11. J. S. Seo, M. Jin, S. Lee, D. Jang, S. Lee and C. D. Yoo, "Audio fingerprinting based on normalized spectral subband moments," IEEE Signal Processing Letters, vol. 13, no. 4, pp. 209-212, 2006. https://doi.org/10.1109/LSP.2005.863678
  12. D. Jang, C. D. Yoo, S. Lee, S. Kim and T. Kalker, "Pairwise Boosted Audio Fingerprint," IEEE Trans. on Information Forensics and Security, vol. 4, no. 4, pp. 995-1004, 2009. https://doi.org/10.1109/TIFS.2009.2034452
  13. Y. Liu, K. Cho, H. S. Yun, J. W. Shin and N. S. Kim, "DCT based multiple hashing technique for robust audio fingerprinting," in Proc. of ICASSP, 2009.
  14. P. Cano, E. Batlle, T. Kalker and J. Haitsma, "A review of audio fingerprinting," Journal of VLSI Signal Processing, vol. 41, no. 3, pp. 271-284, 2005. https://doi.org/10.1007/s11265-005-4151-3
  15. W. Son, H.-T. Cho, K. Yoon and S.-P. Lee, "Sub-fingerprint masking for a robust audio fingerprinting system in a real-noise environment for portable consumer devices," IEEE Trans. on Consumer Electronics, vol. 56, no. 1, pp. 156-160, 2010. https://doi.org/10.1109/TCE.2010.5439139
  16. A. Ghias, J. Logan and D. Chamberlin, "Query by humming: musical information retrieval in an audio database," in Proc. of ACM Multimedia, pp. 231-236, 1995.
  17. L. Wang, S. Huang, S. Hu, J. Liang and B. Xu, "An effective and efficient method for query by humming system based on multi-similarity measurement fusion," in Proc. of ICALIP, 2008.
  18. H. M. Yu, W. H. Tsai and H. M. Wang, "A query-by-singing system for retrieving karaoke music," IEEE Trans. on multimedia, vol. 10, no. 8, pp. 1626-1637, 2008. https://doi.org/10.1109/TMM.2008.2007345
  19. M. Ryynanen and A. Klapuri, "Query by humming of MIDI and audio using locality sensitive hashing," in Proc. of ICASSP, 2008.
  20. X. Wu and M. Li, "A top down approach to melody match in pitch contour for query by humming," in Proc. of International Symposium of Chinese Spoken Language Processing, 2006.
  21. K. Kim, K. R. Park, S. J. Park, S. P. Lee and M. Y. Kim, "Robust Query-by-Singing/Humming System against Background Noise Environments," IEEE Trans. on Consumer Electronics, vol. 57, no. 2, pp. 720-725, May 2011. https://doi.org/10.1109/TCE.2011.5955213
  22. J. Song, S. Y. Bae and K. Yoon, "Mid-level music melody representation of polyphonic audio for query by humming system," in Proc. of Int. Conf. Music Information Retrieval, 2002.
  23. C. C. Wang, J.-S. R. Jang and W. Wang, "An improved query by singing/humming system using melody and lyrics information," in Proc. of Int. Society for Music Information Retrieval Conf., pp. 45-50, 2010.
  24. A. P. Klapuri, "Multiple fundamental frequency estimation based on harmonicity and spectral smoothness," IEEE Trans. on Speech Audio Process., vol. 11, no. 6, pp. 804-816, 2003. https://doi.org/10.1109/TSA.2003.815516
  25. C. M. Bishop, Pattern recognition and machine learning, Springer, 2006.
  26. R. E. Schapire and Y. Singer, "Improved boosting algorithms using confidence-rated predictions," Machine Learning, vol. 37, no. 3, pp. 297-336, 1999. https://doi.org/10.1023/A:1007614523901
  27. D. Jang, C. D. Yoo and T. Kalker, "Distance metric learning for content identification," IEEE Trans. on Information Forensics and Security, vol. 5, no. 4, pp. 932-944, 2010. https://doi.org/10.1109/TIFS.2010.2064769
  28. I. Cohen, "Noise spectrum estimation in adverse environments: improved minima controlled recursive averaging," IEEE Trans. on Speech and Audio Processing, vol. 11, pp. 466-475, 2003. https://doi.org/10.1109/TSA.2003.811544
  29. Y. D. Cho, M. Y. Kim and S. R. Kim, "A spectrally mixed excitation (SMX) vocoder with robust parameter determination," in Proc. of ICASSP, pp. 601-604, 1998.
  30. Z. Duan, Y. Zhang, C. Zhang and Z. Shi, "Unsupervised single-channel music source separation by average harmonic structure modeling," IEEE Trans. on Audio Speech Language Processing, vol. 16, no. 4, pp. 766-778, 2008. https://doi.org/10.1109/TASL.2008.919073
  31. MIREX website. http://www.music-ir.org/mirex/wiki/MIREX_HOME
  32. D. Jang and S.-P. Lee, "Query by singing/humming system based on the combination of DTW distances for MIREX 2011," http://www.music-ir.org/mirex/abstracts/2011/JSSLP1.pdf, 2011.
  33. Essen associative code and folk database, http://www.esac-data.org.