
A Novel Query-by-Singing/Humming Method by Estimating Matching Positions Based on Multi-layered Perceptron

  • Pham, Tuyen Danh (Division of Electronics and Electrical Engineering, Dongguk University) ;
  • Nam, Gi Pyo (Division of Electronics and Electrical Engineering, Dongguk University) ;
  • Shin, Kwang Yong (Division of Electronics and Electrical Engineering, Dongguk University) ;
  • Park, Kang Ryoung (Division of Electronics and Electrical Engineering, Dongguk University)
  • Received : 2013.05.01
  • Accepted : 2013.07.09
  • Published : 2013.07.31

Abstract

The growing number of music files on smartphones and MP3 players makes it difficult for users to find the songs they want. Query-by-Singing/Humming (QbSH) systems address this by retrieving music from a user's humming or singing, without requiring the user to know the title or singer of the song. Most previous QbSH research has used Musical Instrument Digital Interface (MIDI) files as reference songs. However, producing MIDI files is time-consuming, and ever more songs are being released as the music market grows. Using the far more common MPEG-1 Audio Layer 3 (MP3) files as reference songs is therefore an attractive alternative. Little previous research has addressed QbSH with MP3 references, however, because the background music and multiple (polyphonic) melodies in an MP3 file give it a waveform quite different from that of a humming/singing query. To overcome these problems, we propose a new QbSH method that uses MP3 files on mobile devices. This research is novel in four ways. First, it is the first QbSH study to use MP3 files as reference songs. Second, a multi-layered perceptron (MLP) estimates the start and end positions in the MP3 file to be matched before matching against the humming/singing query is performed. Third, for more accurate results, four MLPs are used, which estimate the start and end positions for the dynamic time warping (DTW) matching algorithm and for the chroma-based DTW algorithm, respectively. Fourth, the matching scores from the DTW and chroma-based DTW algorithms are combined using the PRODUCT rule, which yields higher matching accuracy. Experimental results on the AFA MP3 database show that the proposed method achieves a Top-1 accuracy of 98% and a mean reciprocal rank (MRR) of 0.989, much higher than other methods. We also demonstrate the effectiveness of the proposed system on a consumer mobile device.
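The abstract names dynamic time warping (DTW) and chroma-based DTW as the two matchers. The sketch below is a minimal illustration, not the authors' implementation: it assumes the query and the reference segment have already been converted to frame-level pitch sequences in MIDI note numbers, and it approximates the chroma-based variant by folding pitch differences onto the 12 pitch classes instead of computing full chroma vectors from the MP3 audio. The names `dtw_distance` and `chroma_cost` are hypothetical.

```python
import math
from typing import Callable, Sequence


def dtw_distance(query: Sequence[float], ref: Sequence[float],
                 cost: Callable[[float, float], float] = lambda a, b: abs(a - b)) -> float:
    """Classic DTW: minimum cumulative cost of aligning `query` to `ref`."""
    n, m = len(query), len(ref)
    # d[i][j] = best cumulative cost of aligning query[:i] with ref[:j]
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = cost(query[i - 1], ref[j - 1])
            # Allowed steps: insertion, deletion, or diagonal match
            d[i][j] = c + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def chroma_cost(a: float, b: float) -> float:
    """Octave-invariant distance: pitch difference folded onto 12 pitch classes.

    Assumption: a simplified stand-in for comparing chroma vectors.
    """
    diff = abs(a - b) % 12
    return min(diff, 12 - diff)
```

In the proposed method the reference would first be cropped to the MLP-estimated positions, for example `dtw_distance(query, ref[start:end])`, where `start` and `end` are hypothetical placeholders for the MLP outputs; passing `cost=chroma_cost` gives the octave-invariant variant.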
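The PRODUCT rule mentioned in the abstract fuses the DTW and chroma-based DTW scores by multiplication. A minimal sketch, assuming each matcher returns a positive distance per candidate song (smaller is better); the function name `product_rule_rank` and the choice to multiply raw distances rather than normalized scores are illustrative assumptions, since the abstract does not state how the scores are normalized.

```python
from typing import Dict, List, Tuple


def product_rule_rank(dtw_dist: Dict[str, float],
                      chroma_dist: Dict[str, float]) -> List[Tuple[str, float]]:
    """Fuse two per-song distances with the PRODUCT rule.

    Returns (song, fused_score) pairs sorted so the best match (smallest
    product) comes first. Assumes both dicts cover the same songs.
    """
    fused = {song: dtw_dist[song] * chroma_dist[song] for song in dtw_dist}
    return sorted(fused.items(), key=lambda kv: kv[1])


# Usage with made-up distances for three candidate songs
ranking = product_rule_rank({"A": 4.0, "B": 2.0, "C": 5.0},
                            {"A": 1.0, "B": 3.0, "C": 0.5})
print([song for song, _ in ranking])  # ['C', 'A', 'B']
```

Multiplying raw distances leaves the ranking unchanged if either matcher's distances are rescaled by a positive constant, so no per-matcher scaling is needed just for ranking; a near-zero distance from one matcher will, however, dominate the product.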
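The reported Top-1 accuracy and mean reciprocal rank (MRR) are standard retrieval metrics: Top-1 accuracy is the fraction of queries whose correct song is ranked first, and MRR averages 1/rank of the correct song over all queries. A short sketch of both, assuming one ranked candidate list per query; counting a query whose correct song is missing from its list as reciprocal rank 0 is an assumption, as the abstract does not specify this case.

```python
from typing import Sequence, Tuple


def top1_and_mrr(ranked_lists: Sequence[Sequence[str]],
                 truths: Sequence[str]) -> Tuple[float, float]:
    """Return (Top-1 accuracy, MRR) over a set of queries."""
    hits, rr_sum = 0, 0.0
    for ranked, truth in zip(ranked_lists, truths):
        if truth in ranked:
            rank = ranked.index(truth) + 1  # ranks are 1-based
            rr_sum += 1.0 / rank
            hits += rank == 1
        # Assumption: a missing correct song contributes 0 to the MRR sum
    n = len(truths)
    return hits / n, rr_sum / n
```

For example, four queries whose correct songs appear at ranks 1, 2, 2 and not at all give a Top-1 accuracy of 0.25 and an MRR of 0.5.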

Cited by

  1. Exploring Environmental Factors Affecting Strawberry Yield Using Pattern Recognition Techniques, vol. 20, no. 1, 2019. https://doi.org/10.7472/jksii.2019.20.1.39