Applying Lexical Semantics to Automatic Extraction of Temporal Expressions in Uyghur

  • Murat, Alim (School of Computer Science and Technology, Xinjiang Normal University) ;
  • Yusup, Azharjan (School of Computer Science and Technology, Xinjiang Normal University) ;
  • Iskandar, Zulkar (School of Computer Science and Technology, Xinjiang Normal University) ;
  • Yusup, Azragul (School of Computer Science and Technology, Xinjiang Normal University) ;
  • Abaydulla, Yusup (School of Computer Science and Technology, Xinjiang Normal University)
  • Received: 2017.02.22
  • Accepted: 2017.07.04
  • Published: 2018.08.31

Abstract

The automatic extraction of temporal information from written text is a key component of question answering and summarization systems, and the effectiveness of those systems depends heavily on whether temporal expressions (TEs) are extracted successfully. In this paper, three different approaches to TE extraction in Uyghur are developed and analyzed. We also present a novel approach that uses lexical semantics as additional information, extending classical approaches that rely mainly on morphology and syntax. Using a manually annotated news dataset labeled with TIMEX3 tags, we built three models with different feature combinations. The best run achieved a precision of 0.87, a recall of 0.89, and an F1-measure of 0.88 for Uyghur TE extraction. From an analysis of the results, we conclude that applying semantic knowledge resolves ambiguity problems that arise at shallower levels of language analysis and significantly aids the development of a more efficient Uyghur TE extraction system.
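The reported F1-measure is the harmonic mean of precision and recall, F1 = 2PR / (P + R). With P = 0.87 and R = 0.89, this gives 1.5486 / 1.76 ≈ 0.88, which is consistent with the abstract. The sketch below shows one common way to compute such scores for TE extraction. It makes several assumptions that the abstract does not confirm. TIMEX3 spans are encoded as BIO tags per token (`B-TIMEX3`, `I-TIMEX3`, `O`), and a predicted span counts as correct only if it matches a gold span exactly. The abstract does not state whether exact or relaxed matching was used. The function names `bio_to_spans`, `f1`, and `span_prf` are hypothetical and are not the authors' code.

```python
from typing import List, Optional, Set, Tuple

# A span is (start, end) in token indices; end is exclusive.
Span = Tuple[int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect TE spans from a BIO tag sequence (assumed B-TIMEX3 / I-TIMEX3 / O scheme)."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            # A new B- tag closes any open span and opens a new one.
            if start is not None:
                spans.add((start, i))
            start = i
        elif tag.startswith("I-"):
            # Tolerate an I- tag with no preceding B- by starting a span here.
            if start is None:
                start = i
        else:
            # "O" closes any open span.
            if start is not None:
                spans.add((start, i))
            start = None
    if start is not None:
        spans.add((start, len(tags)))
    return spans


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def span_prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Exact-match span precision, recall and F1 summed over all sentences."""
    tp = n_pred = n_gold = 0
    for gold_tags, pred_tags in zip(gold, pred):
        g, p = bio_to_spans(gold_tags), bio_to_spans(pred_tags)
        tp += len(g & p)  # spans whose boundaries match exactly
        n_pred += len(p)
        n_gold += len(g)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return precision, recall, f1(precision, recall)


if __name__ == "__main__":
    # Toy sentence: two gold TEs, of which the system finds only the first.
    gold = [["O", "B-TIMEX3", "I-TIMEX3", "O", "B-TIMEX3"]]
    pred = [["O", "B-TIMEX3", "I-TIMEX3", "O", "O"]]
    print(span_prf(gold, pred))
    print(round(f1(0.87, 0.89), 2))
```

In the toy example, precision is 1.0 because the one predicted span is correct. Recall is 0.5 because one of the two gold spans is missed. Exact matching is the strictest choice. A relaxed variant that credits any overlap between predicted and gold spans would score partial boundary errors more generously.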
