Relation Extraction based on Extended Composite Kernel using Flat Lexical Features


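  • 최성필 (Information Technology Research Lab, Korea Institute of Science and Technology Information)
  • 정창후 (Information Technology Research Lab, Korea Institute of Science and Technology Information)
  • 최윤수 (Information Technology Research Lab, Korea Institute of Science and Technology Information)
  • 맹성현 (Department of Computer Science, Korea Advanced Institute of Science and Technology)
  • Published: 2009.08.15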


To improve on existing relation extraction approaches, we propose a method that combines two pivotal sources of information for classifying semantic relationships between entities in text: the lexical context around entities, as exploited by feature-based methods, and the syntactic structure of relation instances, as exploited by kernel-based methods. Starting from a composite kernel-based relation extraction system that incorporates both entity features and syntactic structure information of relation instances, we define nine classes of lexical features and apply them to the system in combination. Evaluation on the ACE RDC corpus shows that our approach improves the effectiveness of existing composite kernels for relation extraction. It also confirms that integrating three important kinds of features (entity features, syntactic structures, and contextual lexical features) improves relation extraction performance.

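In this paper, to improve on existing relation extraction performance, we propose an extended composite kernel that jointly uses two kinds of information: the diverse contextual information around entities that earlier feature-based methods sought to extract and apply, and the syntactic structural features of relation instances, which are the strength of kernel-based methods. In experiments on the ACE RDC corpus, we defined nine sets of flat lexical features on top of an existing composite kernel built on a convolution parse tree kernel. Applying them allowed us to identify which types of lexical features contribute to performance gains, and we obtained results comparable to the current state of the art even with a small training set. In conclusion, integrating the three key kinds of information for relation extraction (entity features, syntactic structural features, and surrounding contextual lexical features) improves relation extraction performance.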
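
The idea of combining the three kinds of information in one kernel can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the combination formula, so the weighted sum, the weights, the decay value, the dictionary keys, and the single bag-of-words lexical feature (standing in for the nine lexical feature classes) are all assumptions made for illustration. The tree kernel follows the general subtree-counting recursion of convolution tree kernels.

```python
from collections import Counter
from math import sqrt


def normalized(kernel):
    """Wrap a kernel so that K(x, x) = 1 whenever K(x, x) > 0."""
    def wrapped(a, b):
        denom = sqrt(kernel(a, a) * kernel(b, b))
        return kernel(a, b) / denom if denom > 0 else 0.0
    return wrapped


def entity_kernel(a, b):
    """Count entity attributes (hypothetical keys) with identical values."""
    return float(sum(1 for key, value in a.items() if key in b and b[key] == value))


def lexical_kernel(a, b):
    """Dot product of bag-of-words counts (a stand-in for flat lexical features)."""
    counts_a, counts_b = Counter(a), Counter(b)
    return float(sum(n * counts_b[w] for w, n in counts_a.items()))


def _internal_nodes(tree):
    """Yield every non-leaf node of a tree given as (label, child, ...)."""
    if isinstance(tree, str):
        return
    yield tree
    for child in tree[1:]:
        yield from _internal_nodes(child)


def _production(node):
    """Grammar production at a node, keeping words and labels distinct."""
    return (node[0],) + tuple(
        ("word", c) if isinstance(c, str) else ("label", c[0]) for c in node[1:]
    )


def _common_subtrees(n1, n2, decay):
    """Decay-weighted count of common subtrees rooted at n1 and n2."""
    if _production(n1) != _production(n2):
        return 0.0
    score = decay
    for c1, c2 in zip(n1[1:], n2[1:]):
        if not isinstance(c1, str):  # matching productions imply c2 is a node too
            score *= 1.0 + _common_subtrees(c1, c2, decay)
    return score


def tree_kernel(t1, t2, decay=0.4):
    """Convolution tree kernel: sum over all pairs of internal nodes."""
    return sum(
        _common_subtrees(n1, n2, decay)
        for n1 in _internal_nodes(t1)
        for n2 in _internal_nodes(t2)
    )


K_ENTITY = normalized(entity_kernel)
K_TREE = normalized(tree_kernel)
K_LEXICAL = normalized(lexical_kernel)


def extended_composite_kernel(x, y, weights=(0.3, 0.4, 0.3)):
    """Assumed combination: weighted sum of the three normalized kernels."""
    w_entity, w_tree, w_lexical = weights
    return (
        w_entity * K_ENTITY(x["entities"], y["entities"])
        + w_tree * K_TREE(x["tree"], y["tree"])
        + w_lexical * K_LEXICAL(x["words"], y["words"])
    )


# Two toy relation instances (invented data, for illustration only).
a = {
    "entities": {"e1_type": "PER", "e2_type": "ORG"},
    "tree": ("S", ("NP", ("NNP", "Smith")),
             ("VP", ("VBZ", "leads"), ("NP", ("NNP", "Acme")))),
    "words": ["leads"],
}
b = {
    "entities": {"e1_type": "PER", "e2_type": "ORG"},
    "tree": ("S", ("NP", ("NNP", "Jones")),
             ("VP", ("VBZ", "runs"), ("NP", ("NNP", "Acme")))),
    "words": ["runs"],
}
print(extended_composite_kernel(a, a), extended_composite_kernel(a, b))
```

Because each component is normalized and the assumed weights sum to 1, an instance compared with itself scores about 1, which makes the relative contribution of each information source easy to read off. A kernel value computed this way could be supplied to any SVM implementation that accepts a precomputed or user-defined kernel.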

