A Study on Vocal Removal Scheme of SAOC Using Harmonic Information

  • Jihoon Park (Smart IT Convergence System Research Center) ;
  • Daegeun Jang (Department of Electrical Engineering, KAIST) ;
  • Minsoo Hahn (Department of Electrical Engineering, KAIST)
  • Received : 2013.08.30
  • Accepted : 2013.09.22
  • Published : 2013.10.30

Abstract

An interactive audio service provides audio generation and editing functionality according to the user's preference. The spatial audio object coding (SAOC) scheme is an audio coding technology that can support such an interactive audio service at a relatively low bit-rate. However, when the SAOC scheme removes a specific object, such as the vocal object signal in Karaoke mode, the resulting quality is poor because residual components of the removed vocal object remain in the SAOC-decoded background music. Thus, we propose a new SAOC vocal harmonic extraction and elimination technique to improve the background-music quality in the Karaoke service. Namely, utilizing the harmonic information of the vocal object, we remove the harmonics of the vocal object remaining in the background music. As harmonic parameters, we utilize the pitch, the maximum voiced frequency (MVF), and the harmonic amplitudes. To evaluate the performance of the proposed scheme, we performed objective and subjective evaluations. Our experimental results confirm that the proposed scheme improves the background-music quality compared with the standard SAOC scheme.

An interactive audio service (IAS) generally allows users to create and edit music according to their own preferences. SAOC is a multi-object audio coding technology that enables an IAS at a low bit-rate. However, when the SAOC scheme removes a specific object, particularly the vocal object, harmonics of the vocal object remain in the background music. Therefore, this paper proposes a vocal-object removal technique based on harmonic extraction and elimination. In the proposed scheme, harmonic information extracted at the encoder is used at the decoder to remove the vocal object signal from the downmix signal. As the harmonic information, the fundamental frequency, the MVF, and the harmonic amplitudes are used. Objective and subjective experiments were performed, and all results confirm that the proposed scheme outperforms the SAOC scheme.
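As an illustrative sketch of the removal step described in the abstract, the snippet below attenuates the vocal harmonics of a single downmix frame in the frequency domain, given the transmitted pitch, MVF, and per-harmonic amplitudes. The function name, the per-bin processing, and the gain representation are assumptions made for illustration only; they are not the paper's actual SAOC encoder/decoder implementation.

```python
import numpy as np

def remove_vocal_harmonics(frame, f0, mvf, harmonic_gains, sr=44100):
    """Attenuate vocal harmonics in one downmix frame (illustrative sketch).

    frame          -- time-domain samples of one analysis frame
    f0             -- estimated vocal pitch (fundamental frequency), in Hz
    mvf            -- maximum voiced frequency, in Hz; harmonics above it
                      are assumed unvoiced and are left untouched
    harmonic_gains -- per-harmonic attenuation factors in [0, 1], where
                      1.0 means the harmonic is fully removed
    """
    n = len(frame)
    spectrum = np.fft.rfft(frame)
    bin_width = sr / n  # frequency resolution of the DFT

    for k, gain in enumerate(harmonic_gains, start=1):
        fk = k * f0  # frequency of the k-th harmonic
        if fk > mvf:
            break  # only the voiced band below the MVF is processed
        # Attenuate the nearest bin and its immediate neighbors to cover
        # spectral leakage around the harmonic peak.
        center = int(round(fk / bin_width))
        for b in (center - 1, center, center + 1):
            if 0 <= b < len(spectrum):
                spectrum[b] *= (1.0 - gain)

    return np.fft.irfft(spectrum, n=n)
```

In a full system this would run frame by frame on overlapping windows, with the pitch, MVF, and harmonic amplitudes decoded from the side information transmitted by the encoder.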

