Vocal and nonvocal separation using combination of kernel model and long-short term memory networks

Cho, Hye-Seung; Kim, Hyoung-Gook

doi:10.7776/ASK.2017.36.4.261

The Journal of the Acoustical Society of Korea (한국음향학회지)

Volume 36 Issue 4 / Pages 261-266 / 2017 / pISSN 1225-4428 / eISSN 2287-3775

The Acoustical Society of Korea (한국음향학회)



Cho, Hye-Seung (Department of Radio Sciences and Engineering, Kwangwoon University);
Kim, Hyoung-Gook (Department of Radio Sciences and Engineering, Kwangwoon University)

Received : 2017.02.20
Accepted : 2017.07.31
Published : 2017.07.31


Abstract

In this paper, we propose a vocal and nonvocal separation method that combines a kernel model with LSTM (Long Short-Term Memory) networks. Conventional separation methods estimate a vocal component even in sections that contain only nonvocal sources, which produces unnecessary output and source estimation errors. To overcome this limitation, we combine an existing kernel model based vocal separation method with LSTM-based classification of vocal and nonvocal sections, with the aim of reducing vocal misestimation and improving separation performance. We propose two combination structures: a parallel combined separation algorithm and a series combined separation algorithm. Experimental results confirm that the proposed methods achieve better separation performance than the conventional approaches.
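The abstract does not spell out how the two combination structures are wired, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that "parallel" means the separator and the classifier both process the mixture and the frame-level vocal decision then gates the separator output, and that "series" means the classifier runs first and the separator is applied only to frames labeled vocal. The function names, the list-of-lists magnitude spectrogram, the magnitude-subtraction residual, and the toy `half_split` separator are all hypothetical stand-ins for the kernel model and the LSTM classifier.

```python
from typing import Callable, List, Tuple

Frame = List[float]        # one magnitude-spectrum frame (hypothetical layout)
Spectrogram = List[Frame]  # frames ordered in time


def gate_by_vocal_activity(
    mixture: Spectrogram,
    vocal_estimate: Spectrogram,
    is_vocal: List[bool],
) -> Tuple[Spectrogram, Spectrogram]:
    """Assumed parallel-style combination: gate a separator's vocal
    estimate with frame-level vocal/nonvocal decisions."""
    vocal_out: Spectrogram = []
    nonvocal_out: Spectrogram = []
    for mix, voc, active in zip(mixture, vocal_estimate, is_vocal):
        if active:
            # Keep the vocal estimate; the residual goes to the nonvocal
            # output (simplified magnitude subtraction, clipped at zero).
            vocal_out.append(list(voc))
            nonvocal_out.append([max(m - v, 0.0) for m, v in zip(mix, voc)])
        else:
            # Frame classified as nonvocal: emit no vocal energy at all.
            vocal_out.append([0.0] * len(mix))
            nonvocal_out.append(list(mix))
    return vocal_out, nonvocal_out


def separate_vocal_frames_only(
    mixture: Spectrogram,
    is_vocal: List[bool],
    separator: Callable[[Spectrogram], Spectrogram],
) -> Tuple[Spectrogram, Spectrogram]:
    """Assumed series-style combination: classify first, then run the
    separator only on the frames labeled vocal."""
    vocal_idx = [t for t, active in enumerate(is_vocal) if active]
    vocal_part = separator([mixture[t] for t in vocal_idx]) if vocal_idx else []
    estimate: Spectrogram = [[0.0] * len(frame) for frame in mixture]
    for t, voc in zip(vocal_idx, vocal_part):
        estimate[t] = list(voc)
    return gate_by_vocal_activity(mixture, estimate, is_vocal)


def half_split(frames: Spectrogram) -> Spectrogram:
    """Toy stand-in for a separator: assigns half of each bin to the vocal."""
    return [[0.5 * x for x in frame] for frame in frames]


mixture = [[2.0, 4.0], [1.0, 3.0]]
is_vocal = [True, False]  # would come from a frame-level classifier

parallel = gate_by_vocal_activity(mixture, half_split(mixture), is_vocal)
series = separate_vocal_frames_only(mixture, is_vocal, half_split)
```

In both variants the second frame, labeled nonvocal, contributes nothing to the vocal output and passes unchanged to the nonvocal output, which is the misestimation the abstract aims to avoid. With this frame-independent toy separator the two variants give identical results; a real separator that uses context across frames would generally behave differently in the series arrangement.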

