Named Entity Recognition Using Distant Supervision and Active Bagging

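  • 이성희 (Program of Computer and Communications Engineering, Kangwon National University) ;
  • 송영길 (Program of Computer and Communications Engineering, Kangwon National University) ;
  • 김학수 (Program of Computer and Communications Engineering, Kangwon National University)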
  • Received : 2015.09.24
  • Accepted : 2015.11.12
  • Published : 2016.02.15

Abstract

Named entity recognition is the task of extracting named entities from sentences and determining their categories. Previous studies on named entity recognition have relied primarily on supervised learning. Supervised learning requires a large training corpus manually annotated with named entity categories, and constructing such a corpus by hand is time-consuming and labor-intensive. We propose a semi-supervised learning method that minimizes the cost of training corpus construction while rapidly improving named entity recognition performance. The proposed method uses distant supervision to construct the initial training corpus. It then effectively removes noisy sentences from the initial training corpus using active bagging, an ensemble method that combines bagging and active learning. In the experiments, the proposed method improved the F1-score of named entity recognition from 67.36% to 76.42% after 15 rounds of active bagging.
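The two steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the entity dictionary, the tagger, or the noise criterion, so `ENTITY_DICT`, the toy unigram tagger, and the disagreement score below are all assumptions made for illustration. The first function labels tokens by dictionary lookup (one common form of distant supervision); the second trains several taggers on bootstrap resamples (bagging) and scores each sentence by how often the committee disagrees with its automatic labels, so that high-scoring sentences could be sent for review or removed.

```python
import random
from collections import Counter

# Hypothetical entity dictionary (an assumption, not taken from the paper).
ENTITY_DICT = {
    ("Seoul",): "LOC",
    ("Kangwon", "National", "University"): "ORG",
}

def distant_label(tokens, entity_dict):
    """Assign BIO tags by longest-match lookup in an entity dictionary."""
    tags = ["O"] * len(tokens)
    max_len = max((len(key) for key in entity_dict), default=0)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in entity_dict:
                category = entity_dict[key]
                tags[i] = "B-" + category
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + category
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags

def train_unigram_tagger(corpus):
    """Toy stand-in for a real sequence labeler: most frequent tag per token."""
    counts = {}
    for tokens, tags in corpus:
        for token, tag in zip(tokens, tags):
            counts.setdefault(token, Counter())[tag] += 1
    return {token: c.most_common(1)[0][0] for token, c in counts.items()}

def predict(model, tokens):
    """Tag each token with its learned tag, or "O" if the token is unseen."""
    return [model.get(token, "O") for token in tokens]

def bagging_disagreement(corpus, n_models=5, seed=0):
    """Score each sentence by the fraction of bagged models whose
    prediction differs from the sentence's automatic labels."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap resample: draw len(corpus) sentences with replacement.
        sample = [rng.choice(corpus) for _ in corpus]
        models.append(train_unigram_tagger(sample))
    scores = []
    for tokens, tags in corpus:
        mismatches = sum(predict(model, tokens) != tags for model in models)
        scores.append(mismatches / n_models)
    return scores
```

A possible usage: build the initial corpus with `distant_label`, call `bagging_disagreement` on it, and hand the highest-scoring sentences to an annotator, repeating for several rounds. In practice the unigram tagger would be replaced by a proper sequence model.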

Acknowledgement

Supported by: NCSOFT, Kangwon National University
