References
- Aguilo, M., Butko, T., Temko, A., & Nadeu, C. (2009). A hierarchical architecture for audio segmentation in a broadcast news task. Proceedings of I Iberian SLTech (pp. 17-20).
- Boersma, P., & Weenink, D. (2001). Praat, a system for doing phonetics by computer. Glot International, 5(9-10), 341-345.
- Castan, D., Tavarez, D., Lopez-Otero, P., Franco-Pedroso, J., Delgado, H., Navas, E., Docio-Fernandez, L., Ramos, D., Serrano, J., Ortega, A., & Lleida, E. (2015). Albayzín-2014 evaluation: Audio segmentation and classification in broadcast news domains. EURASIP Journal on Audio, Speech, and Music Processing, 2015(1), 33. https://doi.org/10.1186/s13636-015-0076-3
- Galibert, O. (2013). Methodologies for the evaluation of speaker diarization and automatic speech recognition in the presence of overlapping speech. Proceedings of INTERSPEECH-2013 (pp. 1131-1134).
- Gallardo-Antolin, A., & Hernandez, R. S. S. (2010). UPM-UC3M system for music and speech segmentation. Proceedings of VI Jornadas en Tecnología del Habla II Iberian SLTech Workshop (FALA) (pp. 421-424).
- Gallardo-Antolin, A., & Montero, J. M. (2010). Histogram equalizationbased features for speech, music and song discrimination. IEEE Signal Processing Letters, 17(7), 659-662. https://doi.org/10.1109/LSP.2010.2049877
- Gupta, V., Kenny, P., Ouellet, P., & Stafylakis, T. (2014). I-vector based speaker adaptation of deep neural networks for French broadcast audio transcription. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (pp. 6334-6338).
- Heittola, T., Mesaros, A., Virtanen, T., & Eronen, A. (2011). Sound event detection in multisource environments using source separation. Proceedings of Workshop Machine Listening in Multisource Environments (pp. 36-40).
- Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A. R., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T. N., & Kingsbury, B. (2012). Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Processing Magazine, 29(6), 82-97. https://doi.org/10.1109/MSP.2012.2205597
- Hyvarinen, A. (1999). Fast and robust fixed-point algorithms for independent component analysis. Neural Networks, 10(3), 626-634. https://doi.org/10.1109/72.761722
- Hyvarinen, A., & Oja, E. (2000). Independent component analysis: Algorithms and applications. Neural Networks, 13(4-5), 411-430. https://doi.org/10.1016/S0893-6080(00)00026-5
- Justusson, B. I. (1981). Median filtering: Statistical properties. In T. S. Huang (Ed.), Two-Dimensional Digital Signal Processing II (pp. 161-196). Berlin: Springer.
- Lee, G. H. (2015). A study on the appropriateness of music broadcasting fee of terrestrial broadcasters. Music Content and Law, 203-250.
- Metallinou, A., Lee, S., & Narayanan, S. (2008). Audio-visual emotion recognition using Gaussian mixture models for face and voice. Proceedings of International Symposium on Multimedia (ISM) (pp. 250-257).
- Mirsa, H., Ikbal, S., Bourlard, H., & Hermansky, H. (2004). Spectral entropy based feature for robust ASR. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (pp. 193-196).
- Muller, M., & Ewert, S. (2011). Chroma toolbox: MATLAB implementations for extracting variants of chroma-based audio features. Proceedings of International Society for Music Information Retrieval Conference (ISMIR) (pp. 215-220).
- Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., Hannemann, M., Motlicek, P., Qian, Y., & Schwarz, P. (2011). The Kaldi speech recognition toolkit. Proceedings of IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
- Saon, G., Soltau, H., Nahamoo, D., & Picheny, M., (2013). Speaker adaptation of neural network acoustic models using i-vectors. Proceedings of IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) (pp. 55-59).
- SBS (2015). SBS drama special: Mask. Retrieved from http://programs.sbs.co.kr/drama/2015mask on September 27, 2018.
- Snyder, D., Chen, G., & Povey, D. (2015). Musan: A music, speech, and noise corpus. Retrieved from https://arxiv.org/abs/1510.08484v1 on September 27, 2018.
- Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1), 1929-1958.
- van der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. The Journal of Machine Learning Research, 9(1), 2579-2605.
- Wang, S. S., Lin, P., Lyu, D. C., Tsao, Y., Hwang, H. T., & Su, B. (2014). Acoustic feature conversion using a polynomial based feature transferring algorithm. Proceedings of International Symposium on Chinese Spoken Language Processing (ISCSLP) (pp. 454-458).