Compressing intent classification model for multi-agent in low-resource devices

  • Received : 2022.06.22
  • Accepted : 2022.07.24
  • Published : 2022.09.30

Abstract

Recently, large-scale pretrained language models (LPLMs) have shown state-of-the-art performance on various natural language processing tasks, including intent classification. However, fine-tuning an LPLM requires substantial computational cost for training and inference, which is not appropriate for dialog systems. In this paper, we propose a compressed intent classification model for multi-agent operation on low-resource devices such as CPU-only machines. Our method consists of two stages. First, we train a sentence encoder from an LPLM and then compress it through knowledge distillation. Second, we train an agent-specific adapter for intent classification. Results on three intent classification datasets show that our method achieves 98% of the accuracy of the LPLM at only 21% of its size.

With the recent advances in large-scale pretrained language models (LPLMs) in natural language processing, the performance of intent classification models obtained by fine-tuning them has also improved. However, fine-tuning a large-scale model incurs high operating costs in dialog systems that require real-time responses. To address this, this study proposes a method for compressing an intent classification model so that multiple agents can be operated even on low-resource hardware. The proposed method consists of a task-agnostic stage, which trains a compressed sentence encoder, and a task-specific stage, which attaches an adapter to the compressed sentence encoder and trains the intent classification model. Experiments on intent classification datasets from various domains demonstrate the effectiveness of the proposed method.
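As a concrete illustration of the task-agnostic stage described above, the sketch below distills a large sentence encoder into a smaller student with PyTorch and Hugging Face Transformers. The KLUE RoBERTa checkpoints, the mean pooling, and the embedding-level MSE loss are assumptions chosen for illustration, not the authors' exact distillation objective.

```python
# Task-agnostic stage (sketch): distill a large sentence encoder into a
# smaller student by matching mean-pooled sentence embeddings.
# Model names and the MSE objective are illustrative assumptions.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

TEACHER = "klue/roberta-large"   # assumed teacher LPLM
STUDENT = "klue/roberta-small"   # assumed compact student

tokenizer = AutoTokenizer.from_pretrained(TEACHER)  # assumes shared vocabulary
teacher = AutoModel.from_pretrained(TEACHER).eval()
student = AutoModel.from_pretrained(STUDENT)

# Project student embeddings to the teacher's hidden size before comparing.
proj = torch.nn.Linear(student.config.hidden_size, teacher.config.hidden_size)
optimizer = torch.optim.AdamW(
    list(student.parameters()) + list(proj.parameters()), lr=5e-5
)

def mean_pool(hidden_states, attention_mask):
    """Average token vectors, ignoring padding positions."""
    mask = attention_mask.unsqueeze(-1).float()
    return (hidden_states * mask).sum(1) / mask.sum(1).clamp(min=1e-9)

def distill_step(sentences):
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():  # the teacher is frozen
        t_emb = mean_pool(teacher(**batch).last_hidden_state, batch["attention_mask"])
    s_emb = proj(mean_pool(student(**batch).last_hidden_state, batch["attention_mask"]))
    loss = F.mse_loss(s_emb, t_emb)  # pull student embeddings toward the teacher's
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

An attention-relation objective in the style of MiniLMv2 could replace the simple MSE loss without changing the overall teacher-student loop shown here.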
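For the task-specific stage, a minimal sketch of attaching a bottleneck adapter and an intent head to the frozen, distilled encoder might look as follows; the bottleneck size, the pooling, and the single post-encoder placement are illustrative assumptions rather than the authors' configuration.

```python
# Task-specific stage (sketch): a small bottleneck adapter and an intent head
# trained on top of the frozen distilled encoder, so each agent only trains
# and stores its own adapter. Sizes and placement are illustrative.
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Down-project, non-linearity, up-project, with a residual connection."""
    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))

class AgentIntentClassifier(nn.Module):
    """Shared frozen sentence encoder + per-agent adapter and intent head."""
    def __init__(self, encoder, hidden_size: int, num_intents: int):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():  # the shared encoder stays frozen
            p.requires_grad = False
        self.adapter = BottleneckAdapter(hidden_size)
        self.head = nn.Linear(hidden_size, num_intents)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (hidden * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
        return self.head(self.adapter(pooled))  # intent logits for this agent
```

Under this sketch, only the adapter and classification head are updated per agent, so adding another agent to the same low-resource host costs a small fraction of the encoder's size while the compressed encoder itself is shared.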
