References
- Butko, T., & Nadeu, C. (2011). Audio segmentation of broadcast news in the Albayzin-2010 evaluation: overview, results, and discussion. EURASIP Journal on Audio, Speech, and Music Processing, 2011(1), 1-10. https://doi.org/10.1186/1687-4722-2011-1
- Castan, D., Tavarez, D., Lopez-Otero, P., Franco-Pedroso, J., Delgado, H., Navas, E., Docio-Fernandez, L., ... Lleida, E. (2015). Albayzin-2014 evaluation: audio segmentation and classification in broadcast news domains. EURASIP Journal on Audio, Speech, and Music Processing, 2015(33), 1-9. https://doi.org/10.1186/s13636-014-0045-2
- Dehak, N., Kenny, P. J., Dehak, R., Dumouchel, P., & Ouellet, P. (2010). Front-end factor analysis for speaker verification. IEEE Transactions on Audio, Speech, and Language Processing, 19(4), 788-798. https://doi.org/10.1109/TASL.2010.2064307
- Doukhan, D., Lechapt, E., Evrard, M., & Carrive, J. (2018). Ina's MIREX 2018 music and speech detection system. Music Information Retrieval Evaluation eXchange (MIREX).
- He, K., Zhang, X., Ren, S., & Sun, J. (2016, June). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 770-778).
- LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324. https://doi.org/10.1109/5.726791
- Lu, R., & Duan, Z. (2017). Bidirectional GRU for sound event detection. Detection and Classification of Acoustic Scenes and Events.
- Mesaros, A., Heittola, T., & Virtanen, T. (2016). Metrics for polyphonic sound event detection. Applied Sciences, 6(6), 162. https://doi.org/10.3390/app6060162
- MIREX (2015). Music/speech classification and detection. Retrieved from http://www.music-ir.org/mirex/wiki/2015:Music/Speech_Classification_and_Detection
- MIREX (2018). Music and/or speech detection. Retrieved from http://www.music-ir.org/mirex/wiki/2018:Music_and/or_Speech_Detection
- Sak, H., Senior, A., & Beaufays, F. (2014). Long short-term memory recurrent neural network architectures for large scale acoustic modeling. In 15th Annual Conference of the International Speech Communication Association (Interspeech-2014) (pp. 338-342). Singapore.
- Tsipas, N., Vrysis, L., Dimoulas, C., & Papanikolaou, G. (2017). Efficient audio-driven multimedia indexing through similarity-based speech/music discrimination. Multimedia Tools and Applications, 76(24), 25603-25621. https://doi.org/10.1007/s11042-016-4315-0
- Yu, F., & Koltun, V. (2015). Multi-scale context aggregation by dilated convolutions. Retrieved from https://arxiv.org/abs/1511.07122.
- Zhang, Q., Cui, Z., Niu, X., Geng, S., & Qiao, Y. (2017). Image segmentation with pyramid dilated convolution based on ResNet and U-Net. In International Conference on Neural Information Processing (pp. 364-372).
- Zuo, Z., Shuai, B., Wang, G., Liu, X., Wang, X., Wang, B., & Chen, Y. (2015, June). Convolutional recurrent neural networks: Learning spatial dependencies for image representation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (pp. 18-26).