Performance Comparison Analysis on Named Entity Recognition system with Bi-LSTM based Multi-task Learning

  • Kim, GyeongMin (Department of Computer Science and Engineering, Korea University) ;
  • Han, Seunggnyu (Department of Computer Science and Engineering, Korea University) ;
  • Oh, Dongsuk (Department of Computer Science and Engineering, Korea University) ;
  • Lim, HeuiSeok (Department of Computer Science and Engineering, Korea University)
  • Received : 2019.10.28
  • Accepted : 2019.12.20
  • Published : 2019.12.28

Abstract

Multi-Task Learning (MTL) is a training method in which a single neural network learns several tasks at the same time, so that the tasks influence one another during training. In this paper, we build a Korean traditional culture corpus, use it as training data, and compare the performance of an MTL-based named entity recognition (NER) model with that of other NER models. During training, the input is first passed through a Bi-LSTM layer whose output is propagated to separate Bi-LSTM layers for part-of-speech tagging (POS-tagging) and NER. The losses of the two tasks are then combined into a joint loss. As a result, the MTL-based Bi-LSTM model shows a 1.1% to 4.6% performance improvement over single-task Bi-LSTM models.
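The training scheme described in the abstract (one Bi-LSTM layer feeding separate Bi-LSTM layers for POS tagging and NER, with the two task losses combined into a joint loss) can be sketched as a NumPy forward pass. This is a minimal illustration under stated assumptions, not the authors' implementation: the class names (`LSTM`, `BiLSTM`, `MTLTagger`), the layer sizes, the random weight initialisation, the toy input and the `ner_weight` factor are hypothetical; the joint loss is assumed to be a weighted sum of two token-level cross-entropy losses; and embedding lookup, back-propagation and any CRF layer are omitted.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class LSTM:
    """Single-direction LSTM with random weights (forward pass only)."""

    def __init__(self, in_dim, hid_dim, rng):
        self.hid = hid_dim
        # One stacked matrix holding the input, forget, candidate and output gates.
        self.W = rng.normal(0, 0.1, (4 * hid_dim, in_dim + hid_dim))
        self.b = np.zeros(4 * hid_dim)

    def __call__(self, xs):
        h = np.zeros(self.hid)
        c = np.zeros(self.hid)
        outs = []
        for x in xs:  # iterate over tokens (rows of xs)
            z = self.W @ np.concatenate([x, h]) + self.b
            i, f, g, o = np.split(z, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
            outs.append(h)
        return np.stack(outs)  # shape: (tokens, hid_dim)


class BiLSTM:
    """Runs one LSTM left-to-right and one right-to-left, then concatenates."""

    def __init__(self, in_dim, hid_dim, rng):
        self.fwd = LSTM(in_dim, hid_dim, rng)
        self.bwd = LSTM(in_dim, hid_dim, rng)

    def __call__(self, xs):
        forward = self.fwd(xs)
        backward = self.bwd(xs[::-1])[::-1]  # re-align backward states to token order
        return np.concatenate([forward, backward], axis=1)  # (tokens, 2 * hid_dim)


def cross_entropy(logits, labels):
    # Numerically stable log-softmax, averaged over tokens.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


class MTLTagger:
    """Shared Bi-LSTM -> task-specific Bi-LSTMs -> POS and NER output layers."""

    def __init__(self, emb_dim, hid_dim, n_pos, n_ner, seed=0):
        rng = np.random.default_rng(seed)
        self.shared = BiLSTM(emb_dim, hid_dim, rng)
        self.pos_lstm = BiLSTM(2 * hid_dim, hid_dim, rng)
        self.ner_lstm = BiLSTM(2 * hid_dim, hid_dim, rng)
        self.pos_out = rng.normal(0, 0.1, (2 * hid_dim, n_pos))
        self.ner_out = rng.normal(0, 0.1, (2 * hid_dim, n_ner))

    def joint_loss(self, embeddings, pos_labels, ner_labels, ner_weight=1.0):
        shared = self.shared(embeddings)
        pos_loss = cross_entropy(self.pos_lstm(shared) @ self.pos_out, pos_labels)
        ner_loss = cross_entropy(self.ner_lstm(shared) @ self.ner_out, ner_labels)
        # Hypothetical weighting: the abstract only says the two losses are joined.
        return pos_loss + ner_weight * ner_loss, pos_loss, ner_loss


# Toy usage: 5 tokens with made-up 8-dimensional embeddings and labels.
sentence = np.random.default_rng(1).normal(size=(5, 8))
model = MTLTagger(emb_dim=8, hid_dim=16, n_pos=10, n_ner=4)
total, pos_loss, ner_loss = model.joint_loss(
    sentence, np.array([0, 1, 2, 3, 4]), np.array([0, 0, 1, 2, 0])
)
print(total, pos_loss, ner_loss)
```

`joint_loss` returns a single scalar plus the two task losses. In an actual training loop, gradients of that scalar would update the shared Bi-LSTM from both tasks at once, which is the mechanism by which POS supervision can shape the representation used for NER.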
