Acknowledgement
This work was supported by an Electronics and Telecommunications Research Institute (ETRI) grant funded by the Korean Government (23ZS1100, Core Technology Research for Self-Improving Integrated Artificial Intelligence Systems).