A Threshold Adaptation based Voice Query Transcription Scheme for Music Retrieval

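  • Byeong-jun Han (Department of Electronics and Electrical Engineering, Korea University) ;
  • Seungmin Rho (School of Electrical Engineering, Korea University) ;
  • Eenjun Hwang (School of Electrical Engineering, Korea University)
  • Published: 2010.02.01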

Abstract

This paper presents a threshold adaptation based voice query transcription scheme for music information retrieval. The proposed scheme analyzes a monophonic voice signal and generates its transcription for use in diverse music retrieval applications. For accurate transcription, we propose several advanced features: (i) the Energetic Feature eXtractor (EFX) for onset, peak, and transient area detection; (ii) Modified Windowed Average Energy (MWAE), which defines multiple small but coherent windows with local threshold values and serves as an offset detector; and (iii) the Circular Average Magnitude Difference Function (CAMDF) for accurate acquisition of the fundamental frequency (F0) of each frame. To evaluate the performance of the proposed scheme, we implemented a prototype music transcription system called AMT2 (Automatic Music Transcriber version 2) and carried out various experiments. The experiments used the QBSH corpus [1], which was adopted for the MIREX 2006 contest data set. The experimental results show that the proposed scheme can improve transcription performance.
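To make the F0 step more concrete, the following is a minimal sketch of pitch estimation with a circular AMDF in the commonly cited form D(τ) = Σ |x[(n + τ) mod N] − x[n]|, as in the circular AMDF literature [7]. It is not the AMT2 implementation: the function names, the pitch range (`f_min`, `f_max`), the `tolerance` rule for choosing the first deep valley, and the synthetic test frame are all assumptions made for illustration.

```python
import math


def camdf(frame):
    """Circular AMDF: D(tau) = sum over n of |x[(n + tau) mod N] - x[n]|."""
    n = len(frame)
    return [
        sum(abs(frame[(i + tau) % n] - frame[i]) for i in range(n))
        for tau in range(n)
    ]


def estimate_f0(frame, sample_rate, f_min=80.0, f_max=800.0, tolerance=0.1):
    """Estimate F0 from the first deep CAMDF valley in the allowed lag range.

    f_min, f_max and tolerance are illustrative assumptions, not values
    taken from the paper.
    """
    d = camdf(frame)
    lag_min = max(1, int(sample_rate / f_max))
    lag_max = min(len(frame) // 2, int(sample_rate / f_min))
    candidates = range(lag_min, lag_max + 1)

    d_min = min(d[t] for t in candidates)
    d_max = max(d[t] for t in candidates)
    threshold = d_min + tolerance * (d_max - d_min)

    # Take the first lag that is nearly as deep as the global minimum, so
    # that a multiple of the true period is not preferred (octave error).
    lag = next(t for t in candidates if d[t] <= threshold)
    # Slide down to the bottom of that valley.
    while lag < lag_max and d[lag + 1] < d[lag]:
        lag += 1
    return sample_rate / lag


# Synthetic frame: a 200 Hz sine sampled at 8 kHz (period of 40 samples).
SAMPLE_RATE = 8000
frame = [math.sin(2 * math.pi * 200 * i / SAMPLE_RATE) for i in range(400)]
f0 = estimate_f0(frame, SAMPLE_RATE)
print(round(f0, 1))  # 200.0 for this synthetic frame
```

The wrap-around indexing keeps the number of summed terms equal to N for every lag, which avoids the downward trend that the plain AMDF shows at large lags. On real voice frames the valley selection would need to be more robust than the simple threshold used here.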

References

  1. J.-S. Roger Jang, "QBSH: A corpus for designing QBSH (query by singing/humming) systems," available via the "QBSH corpus for query by singing/humming" link at http://www.cs.nthu.edu.tw/~jang.
  2. A. Ghias, et al., "Query by humming: musical information retrieval in an audio database," Proc. of ACM Multimedia 1995, pp. 231-236, 1995.
  3. M. Ross, et al., "Average magnitude difference function pitch extractor," IEEE Trans. on ASSP, vol. 22, pp. 353-362, 1974. https://doi.org/10.1109/TASSP.1974.1162598
  4. B. Han, S. Rho and E. Hwang, "An efficient voice transcription scheme for music retrieval," Proc. of IEEE MUE 2007, pp. 366-371, Apr. 2007.
  5. S. Rho, B. Han, E. Hwang, and M. Kim, "An adaptation framework for QBH-based music retrieval," KES 2007, vol. 4692, Sept. 2007.
  6. S. Park, S. Kim, K. Byeon, E. Hwang, "Automatic voice query transformation for query-by-humming systems," IMSA, Aug. 2005.
  7. W. Zhang, et al., "Pitch extraction based on circular AMDF," Proc. of IEEE ICASSP, 2002.
  8. J. P. Bello, et al., "A tutorial on onset detection in music signals," IEEE Trans. on SAP, vol. 13, no. 5, pp. 1035-1047, Sept. 2005.
  9. J. P. Bello, et al., "On the use of phase and energy for musical onset detection in the complex domain," IEEE SPL, vol. 11, no. 6, Jun. 2004.
  10. C. Duxbury, et al., "Complex domain onset detection for musical signals," DAFx, Sept. 2003.
  11. E. D. Scheirer, "Tempo and beat analysis of acoustic musical signals," Journal of the Acoustical Society of America (JASA), vol. 103, Jan. 1998.
  12. A. Klapuri, "Sound onset detection by applying psychoacoustic knowledge," IEEE ICASSP, 1999.
  13. P. Leveau, et al., "Methodology and tools for the evaluation of automatic onset detection algorithms in music," ISMIR 2004, 2004.
  14. MUSIC-IR community, Annual Music Information Retrieval Evaluation eXchange (MIREX), http://www.music-ir.org/mirex.
  15. S. Marchand, "An efficient pitch-tracking algorithm using a combination of Fourier transforms," Proc. of DAFx, Dec. 2001.
  16. E. Larson and R. Maddox, "Real-time time-domain pitch tracking using wavelets," REU Report, Univ. of Illinois at Urbana-Champaign, Dept. of Physics, 2005.
  17. LPC-10, arl.wustl.edu/~jaf/lpc/lpc10-1.5.tar.gz
  18. L. Gu and R. Liu, "High performance mandarin pitch estimation," Jnl. of Electronics, vol. 27, 1999.
  19. A. Sterian and G. Wakefield, "Robust automated music transcription systems," Proc. of ICMC, 1996.
  20. M. Ryynänen, et al., "Transcription of the singing melody in polyphonic music," ISMIR, 2006.
  21. A. Savitzky and M. Golay, "Smoothing and differentiation of data by simplified least squares procedures," Analytical Chemistry, vol. 36, 1964.
  22. J. MacQueen, "Some methods for classification and analysis of multivariate observations," Proc. of ACM SIGKDD, pp. 16-22, 1967.