ExoTime: Temporal Information Extraction from Korean Texts Using Knowledge Base

  • Received : 2017.10.21
  • Reviewed : 2017.12.04
  • Published : 2017.12.29

Abstract

Extracting temporal information from documents is becoming more important because it can be used in various applications such as Question-Answering (QA) systems, recommendation systems, and Information Retrieval (IR) systems. Most previous studies focus only on English documents, and they are not directly applicable to other languages because of language-specific characteristics. In this paper, we propose a new system, named ExoTime, designed to extract temporal information from Korean documents. ExoTime adopts an external Knowledge Base (KB) to achieve better prediction performance, and it also applies a bagging method to temporal relation prediction. We show the effectiveness of the proposed approaches through empirical results on the Korean TimeBank. The ExoTime system works as a part of ExoBrain, an artificial intelligence QA system.
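The abstract does not say how bagging is configured in ExoTime, so the following is only a minimal sketch of the general technique (bootstrap resampling plus majority voting) applied to a toy temporal relation classifier. The base learner, the feature strings, the `BEFORE`/`AFTER` labels, and all function names are illustrative assumptions, not the system's actual components.

```python
import random
from collections import Counter


def train_base(samples):
    """Toy base learner (assumption): most frequent label per feature value."""
    counts = {}
    for feature, label in samples:
        counts.setdefault(feature, Counter())[label] += 1
    # Fall back to the overall majority label for unseen features.
    fallback = Counter(label for _, label in samples).most_common(1)[0][0]
    table = {f: c.most_common(1)[0][0] for f, c in counts.items()}
    return lambda feature: table.get(feature, fallback)


def train_bagging(samples, n_models=15, seed=0):
    """Train n_models base learners, each on a bootstrap resample."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Sample with replacement, same size as the original training set.
        bootstrap = [rng.choice(samples) for _ in samples]
        models.append(train_base(bootstrap))
    return models


def predict(models, feature):
    """Aggregate the individual predictions by majority vote."""
    votes = Counter(model(feature) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical training pairs: (feature string, temporal relation label).
samples = [("before_marker", "BEFORE")] * 10 + [("after_marker", "AFTER")] * 10
models = train_bagging(samples)
```

Calling `predict(models, "before_marker")` then returns the label that most of the resampled models agree on. In a real setting the toy learner would be replaced by a trained statistical classifier, and the vote would smooth out the variance between models trained on different resamples.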
