This research was supported and funded by the Korean National Police Agency. [Pol-Bot Development for Conversational Police Knowledge Services / PR09-01-000-20]
- S. Byun and S. Lee, "Emotion recognition using tone and tempo based on voice for IoT," Trans. of the Korean Institute of Electrical Engineers, vol. 65, no. 1, pp. 116-121, 2016. DOI: 10.5370/kiee.2016.65.1.116.
- I. Hong, Y. Ko, Y. Kim, and H. Shin, "A study on the emotional feature composed of the mel-frequency cepstral coefficient and the speech speed," Journal of Computing Science and Engineering, vol. 13, no. 4, pp. 131-140, 2019. DOI: 10.5626/JCSE.2019.13.4.131.
- M. S. Likitha, S. R. R. Gupta, K. Hasitha, and A. U. Raju, "Speech based human emotion recognition using MFCC," in 2017 Int. Conf. on Wireless Communications, Signal Processing and Networking (WiSPNET), pp. 2257-2260, Mar. 2017. DOI: 10.1109/WiSPNET.2017.8300161.
- S. Park, D. Kim, S. Kwon, and N. Park, "Speech emotion recognition based on CNN using spectrogram," in Information and Control Symposium, pp. 240-241, Oct. 2018.
- J. Lee, H. Ryu, D. Chang, and M. Koo, "End-to-end Korean speech emotion recognition using deep neural networks," in Korea Computer Congress, pp. 1000-1002, Jun. 2018.
- G. Tangriberganov, T. A. Adesuyi, and B. Kim, "A hybrid approach for speech emotion recognition using 1D-CNN LSTM," in Korea Computer Congress, pp. 833-835, Jul. 2020.
- G. Trigeorgis, F. Ringeval, R. Brueckner, E. Marchi, M. A. Nicolaou, B. Schuller, and S. Zafeiriou, "Adieu features? end-to-end speech emotion recognition using a deep convolutional recurrent network," in 2016 IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), pp. 5200-5204, Mar. 2016. DOI: 10.1109/ICASSP.2016.7472669.
- K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv:1409.1556, 2014.
- J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei, "ImageNet: a large-scale hierarchical image database," in 2009 IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 248-255, Jun. 2009. DOI: 10.1109/CVPR.2009.5206848.
- J. Lee, U. Yoon, and G. Jo, "CNN-based speech emotion recognition model applying transfer learning and attention mechanism," Journal of KIISE, vol. 47, no. 7, pp. 665-673, 2020. DOI: 10.5626/JOK.2020.47.7.665.
- S. R. Livingstone and F. A. Russo, "The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): a dynamic, multimodal set of facial and vocal expressions in North American English," PLoS ONE, vol. 13, no. 5, e0196391, May 2018. DOI: 10.1371/journal.pone.0196391.
- W. Tang, G. Long, L. Liu, T. Zhou, J. Jiang, and M. Blumenstein, "Rethinking 1D-CNN for time series classification: a stronger baseline," arXiv:2002.10061, 2020.
- L. Huang, J. Dong, D. Zhou, and Q. Zhang, "Speech emotion recognition based on three-channel feature fusion of CNN and BiLSTM," in 2020 4th Int. Conf. on Innovation in Artificial Intelligence (ICIAI), pp. 52-58, May 2020. DOI: 10.1145/3390557.3394317.
- P. Mishra and R. Sharma, "Gender differentiated convolutional neural networks for speech emotion recognition," in 2020 12th Int. Congress on Ultra Modern Telecommunications and Control Systems and Workshops (ICUMT), pp. 142-148, Oct. 2020. DOI: 10.1109/ICUMT51630.2020.9222412.
- librosa documentation [Internet]. Available: https://librosa.org/doc/latest/index.html.