• Title/Summary/Keyword: Non-transcribed labels


Performance Improvement of Mean-Teacher Models in Audio Event Detection Using Derivative Features

  • Kwak, Jin-Yeol; Chung, Yong-Joo
    • The Journal of the Korea Institute of Electronic Communication Sciences, v.16 no.3, pp.401-406, 2021
  • Recently, mean-teacher models based on convolutional recurrent neural networks (CRNNs) have been widely used in audio event detection. The mean-teacher model consists of two parallel CRNNs that can be trained effectively on weakly labeled and unlabeled audio data by applying a consistency criterion to the outputs of the two networks. In this study, we sought to improve the performance of the mean-teacher model by adding derivative features of the log-mel spectrum. In audio event detection experiments using the training and test data from Task 4 of the DCASE 2018/2019 Challenges, the mean-teacher model with the proposed derivative features achieved up to an 8.1% relative reduction in error rate (ER).
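The mean-teacher training described in the abstract can be summarized in a few lines. Below is a minimal PyTorch sketch (the framework and hyper-parameters such as `ema_decay` are assumptions, not the paper's settings): a student CRNN is trained with a supervised loss on weakly labeled clips plus a consistency (MSE) loss against a teacher whose weights are an exponential moving average of the student's.

```python
import torch
import torch.nn.functional as F

def mean_teacher_step(student, teacher, x_weak, y_weak, x_unlabeled,
                      optimizer, ema_decay=0.999, consistency_weight=1.0):
    # Both models are assumed to output clip-level event probabilities in [0, 1].
    sup_loss = F.binary_cross_entropy(student(x_weak), y_weak)
    # Consistency metric: the student should agree with the EMA teacher on unlabeled audio.
    with torch.no_grad():
        teacher_pred = teacher(x_unlabeled)
    cons_loss = F.mse_loss(student(x_unlabeled), teacher_pred)
    loss = sup_loss + consistency_weight * cons_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Teacher weights track the student via an exponential moving average.
    with torch.no_grad():
        for tp, sp in zip(teacher.parameters(), student.parameters()):
            tp.mul_(ema_decay).add_(sp, alpha=1.0 - ema_decay)
    return loss.item()
```

The derivative features themselves are simple to construct: first- and second-order deltas of the log-mel spectrum are stacked with the static features. A sketch using librosa follows; the sampling rate, FFT size, hop length, and number of mel bands are illustrative values, not the paper's experimental settings.

```python
import numpy as np
import librosa

def logmel_with_deltas(wav_path, sr=44100, n_fft=2048, hop_length=511, n_mels=64):
    y, _ = librosa.load(wav_path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft,
                                         hop_length=hop_length, n_mels=n_mels)
    logmel = librosa.power_to_db(mel)                  # static log-mel features
    delta1 = librosa.feature.delta(logmel, order=1)    # first derivative
    delta2 = librosa.feature.delta(logmel, order=2)    # second derivative
    # Stack static and derivative features along the feature axis:
    # shape (3 * n_mels, n_frames), fed to the CRNN as input.
    return np.concatenate([logmel, delta1, delta2], axis=0)
```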

Audio Event Detection Based on Attention CRNN

  • Kwak, Jin-Yeol; Chung, Yong-Joo
    • The Journal of the Korea Institute of Electronic Communication Sciences, v.15 no.3, pp.465-472, 2020
  • Recently, various methods based on deep neural networks have been proposed for audio event detection. In this study, we improved audio event detection performance by adding an attention mechanism to a baseline CRNN: context gating applied at the input of the baseline CRNN and an attention layer added at its output. We further improved the attention-based CRNN by training it on strongly labeled audio data at the frame level as well as weakly labeled data at the clip level. In audio event detection experiments using audio data from Task 4 of the DCASE 2018/2019 Challenge, the proposed attention-based CRNN achieved up to a 66% relative increase in F-score over the baseline CRNN.
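For concreteness, here is a minimal PyTorch sketch (framework assumed; layer sizes and the exact gating formulation are common choices rather than the authors' configuration) of the two components named in the abstract: context gating applied to the input features, and an attention layer that pools frame-level CRNN outputs into a clip-level prediction so the model can learn from both strong frame labels and weak clip labels.

```python
import torch
import torch.nn as nn

class ContextGate(nn.Module):
    """Context gating: y = x * sigmoid(W x + b), applied feature-wise."""
    def __init__(self, dim):
        super().__init__()
        self.fc = nn.Linear(dim, dim)

    def forward(self, x):              # x: (batch, frames, dim)
        return x * torch.sigmoid(self.fc(x))

class AttentionPool(nn.Module):
    """Pools frame-level class probabilities into a clip-level prediction
    using learned per-frame, per-class attention weights."""
    def __init__(self, dim, n_classes):
        super().__init__()
        self.cla = nn.Linear(dim, n_classes)   # frame-wise class scores
        self.att = nn.Linear(dim, n_classes)   # frame-wise attention scores

    def forward(self, h):              # h: (batch, frames, dim), CRNN output
        frame_prob = torch.sigmoid(self.cla(h))        # strong (frame-level) output
        weights = torch.softmax(self.att(h), dim=1)    # attention over frames
        clip_prob = (frame_prob * weights).sum(dim=1)  # weak (clip-level) output
        return frame_prob, clip_prob
```

The frame-level output can be trained against strong labels and the attention-pooled clip output against weak labels, matching the two supervision levels mentioned in the abstract.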