Acknowledgement
This research was conducted as part of the Integrated Bachelor's-Master's ICT Core Talent Development Program, supported by the Ministry of Science and ICT and the Institute of Information & Communications Technology Planning & Evaluation (IITP-2024-RS-2023-00260175).
References
- M. Shin and Y. Shin, "Data sampling strategy for Korean speech emotion classification using wav2vec2.0," in Proceedings of the Annual Conference of the Korea Information Processing Society (KIPS), Vol.30, No.2, pp.493-494, 2023. [Online]. Available: https://kiss.kstudy.com/Detail/Ar?key=4059338
- A. Baevski, Y. Zhou, A. Mohamed, and M. Auli, "wav2vec 2.0: A framework for self-supervised learning of speech representations," Advances in Neural Information Processing Systems, Vol.33, pp.12449-12460, 2020.
- S. Schneider, A. Baevski, R. Collobert, and M. Auli, "wav2vec: Unsupervised pre-training for speech recognition," arXiv preprint arXiv:1904.05862, 2019.
- A. Vaswani et al., "Attention is all you need," Advances in Neural Information Processing Systems, Vol.30, 2017.
- J. Devlin, M. W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," arXiv preprint arXiv:1810.04805, 2018.
- Y. Liu et al., "RoBERTa: A robustly optimized BERT pretraining approach," arXiv preprint arXiv:1907.11692, 2019.
- K. Clark, M.-T. Luong, Q. V. Le, and C. D. Manning, "ELECTRA: Pre-training text encoders as discriminators rather than generators," arXiv preprint arXiv:2003.10555, 2020.
- S. Yoon, S. Byun, and K. Jung, "Multimodal speech emotion recognition using audio and text," in 2018 IEEE Spoken Language Technology Workshop (SLT), pp.112-118, 2018.
- X. Zhang, M. Wang, and X. Guo, "Multi-modal emotion recognition based on deep learning in speech, video and text," in 2020 IEEE 5th International Conference on Signal and Image Processing (ICSIP), pp.328-333, 2020.
- J. Agarkhed, "Machine learning based integrated audio and text modalities for enhanced emotional analysis," in 2023 5th International Conference on Inventive Research in Computing Applications (ICIRCA), pp.989-993, 2023.
- S. S. Hosseini, M. R. Yamaghani, and S. Poorzaker Arabani, "Multimodal modelling of human emotion using sound, image and text fusion," Signal, Image and Video Processing, Vol.18, No.1, pp.71-79, 2024. https://doi.org/10.1007/s11760-023-02707-8
- H. Koh, S. Joo, and K. Jung, "Reflecting dialogue and pretrained information into multi modal emotion recognition: focusing on text and audio," in The Korean Institute of Information Scientists and Engineers, pp.2136-2138, 2023, [Online]. Available: https://www.dbpia.co.kr/journal/articleDetail?nodeId=NODE11488656
- Y.-J. Kim, K. Roh, and D. Chae, "Feature-based Emotion Recognition Model Using Multimodal Data," in The Korean Institute of Information Scientists and Engineers, pp.2169-2171, 2023, [Online]. Available: https://www.dbpia.co.kr/journal/articleDetail?nodeId=NODE11488667
- T. Yoon, S. Lee, H. Lee, H. Jin, and M. Song, "CoKoME: Context modeling for Korean multimodal emotion recognition in conversation," in The Korean Institute of Information Scientists and Engineers, pp.2100-2102, 2023, [Online]. Available: https://www.dbpia.co.kr/journal/articleDetail?nodeId=NODE11488644
- J. Lee, J. Bae, and S. Cho, "Multi-modal emotion recognition in Korean conversation via Contextualized GNN," in The Korean Institute of Information Scientists and Engineers, pp.2094-2096, 2023, [Online]. Available: https://www.dbpia.co.kr/journal/articleDetail?nodeId=NODE11488642
- S. Park et al., "KLUE: Korean language understanding evaluation," arXiv preprint arXiv:2105.09680, 2021.