On-Line Audio Genre Classification using Spectrogram and Deep Neural Network

Yun, Ho-Won; Shin, Seong-Hyeon; Jang, Woo-Jin; Park, Hochong

doi:10.5909/JBE.2016.21.6.977

Journal of Broadcast Engineering (방송공학회논문지)

Volume 21, Issue 6, Pages 977-985, 2016
pISSN 1226-7953 / eISSN 2287-9137

The Korean Institute of Broadcast and Media Engineers (한국방송∙미디어공학회)

스펙트로그램과 심층 신경망을 이용한 온라인 오디오 장르 분류

Yun, Ho-Won (윤호원), Dept. of Electronics Engineering, Kwangwoon University
Shin, Seong-Hyeon (신성현), Dept. of Electronics Engineering, Kwangwoon University
Jang, Woo-Jin (장우진), Dept. of Electronics Engineering, Kwangwoon University
Park, Hochong (박호종), Dept. of Electronics Engineering, Kwangwoon University

Received : 2016.09.06
Accepted : 2016.10.07
Published : 2016.11.30

https://doi.org/10.5909/JBE.2016.21.6.977


Abstract

In this paper, we propose a new method for on-line audio genre classification using a spectrogram and a deep neural network. For on-line operation, the proposed method takes the audio signal in units of 1 second and classifies each unit into one of three genres: speech, music, or effect. For generality of operation, it uses a spectrogram-based feature vector instead of the MFCC, which has been widely used for audio analysis. We measure the genre classification performance using real TV broadcast audio signals and confirm that the proposed method outperforms the conventional method for every genre. In particular, it reduces the misclassification between music and effect that often occurs in the conventional method.
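The processing flow described in the abstract (1-second input units, a spectrogram-based feature vector, and a neural network choosing among three genres) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the sample rate, frame length, hop size, Hann window, log compression, layer sizes, and ReLU activation are all assumptions, the function names are hypothetical, and the network is randomly initialised rather than trained, so its labels are meaningless and serve only to show the data flow.

```python
import numpy as np

GENRES = ("speech", "music", "effect")  # the three classes named in the abstract


def one_second_segments(signal, sample_rate):
    """Yield consecutive non-overlapping 1-second blocks, dropping any remainder."""
    for start in range(0, len(signal) - sample_rate + 1, sample_rate):
        yield signal[start:start + sample_rate]


def spectrogram_feature(segment, frame_len=1024, hop=512):
    """Flattened log-magnitude spectrogram of one segment (assumed STFT settings)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(segment) - frame_len) // hop
    frames = np.stack([segment[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(magnitude + 1e-8).ravel()


def init_network(layer_sizes, rng):
    """Randomly initialised (untrained) weights, used here only to show data flow."""
    return [(rng.normal(0.0, 0.01, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]


def classify(feature, network):
    """Forward pass: ReLU hidden layers (assumed), softmax output over the genres."""
    h = feature
    for weights, bias in network[:-1]:
        h = np.maximum(h @ weights + bias, 0.0)
    weights, bias = network[-1]
    logits = h @ weights + bias
    exp = np.exp(logits - logits.max())  # subtract the max for numerical stability
    return exp / exp.sum()


rng = np.random.default_rng(0)
sample_rate = 16000  # assumed; the abstract does not state the sample rate
signal = rng.standard_normal(3 * sample_rate + 500)  # synthetic stand-in for TV audio

segments = list(one_second_segments(signal, sample_rate))
features = [spectrogram_feature(s) for s in segments]
network = init_network([features[0].size, 64, 64, len(GENRES)], rng)
probabilities = [classify(f, network) for f in features]
labels = [GENRES[int(np.argmax(p))] for p in probabilities]
```

Because each 1-second unit is classified independently, a label can be emitted as soon as that second of audio has arrived, which is what makes the scheme suitable for on-line use.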
