Constructing Tagged Corpus and Cue Word Patterns for Detecting Korean Hedge Sentences

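  • 정주석 (Department of Computer and Information Engineering, Graduate School, Daegu University);
  • 김준혁 (School of Computer and IT Engineering, College of Information and Communication, Daegu University);
  • 김해일 (School of Computer and IT Engineering, College of Information and Communication, Daegu University);
  • 오성호 (School of Computer and IT Engineering, College of Information and Communication, Daegu University);
  • 강신재 (School of Computer and IT Engineering, College of Information and Communication, Daegu University)
  • Received: 2011.11.19
  • Accepted: 2011.12.12
  • Published: 2011.12.25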

Abstract

A hedge is a linguistic device for expressing uncertainty. Writers use hedges when they are uncertain about, or doubt, the content of a sentence, so sentences containing hedges are treated as non-factual. Many applications need to decide whether a sentence is factual. Hedge detection can serve as a preprocessing step for information retrieval, information extraction, and question-answering (QA) systems, which can then restrict their input to non-hedge sentences to obtain more accurate results. In this paper, we construct a Korean hedge corpus, extract hedge cue words from it, generalize them into cue-word patterns, and use these patterns to detect Korean hedge sentences. In our experiments, the method achieved an F1-measure of 78.6%.
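The detection step described in the abstract can be sketched as sentence-level pattern matching, scored with precision, recall, and F1 (their harmonic mean) using the hedge class as the positive label. This is a minimal Python sketch under stated assumptions, not the authors' implementation: the entries in `HEDGE_PATTERNS` are hypothetical examples of common Korean hedging expressions written as regular expressions over raw sentence text, whereas the paper derives its generalized patterns from the tagged corpus. The names `is_hedge` and `precision_recall_f1` and the toy sentences are likewise illustrative.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical cue-word patterns for illustration only; they are NOT the
# pattern set built in the paper. Each regex is matched against the raw
# sentence string (the paper's generalized patterns come from a tagged corpus).
HEDGE_PATTERNS: List[re.Pattern] = [
    re.compile(r"것\s*같"),                           # "~ㄹ 것 같다": it seems that ...
    re.compile(r"(?:추정|예상|추측)(?:된|되|한|하)"),  # "is presumed / expected / guessed"
    re.compile(r"가능성이\s*(?:있|높)"),               # "there is a (high) possibility"
    re.compile(r"\b아마도?\b"),                        # "probably" as a separate word
    re.compile(r"듯(?:하|싶)"),                        # "~듯하다": it appears that ...
]


def is_hedge(sentence: str) -> bool:
    """Label a sentence as a hedge if any cue-word pattern occurs in it."""
    return any(p.search(sentence) for p in HEDGE_PATTERNS)


def precision_recall_f1(
    gold: Iterable[bool], pred: Iterable[bool]
) -> Tuple[float, float, float]:
    """Precision, recall and F1, treating 'hedge' (True) as the positive class."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if p and g:
            tp += 1
        elif p and not g:
            fp += 1
        elif g and not p:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    # Toy labeled sentences invented for illustration: (sentence, is_hedge)
    data = [
        ("이 약물은 암 치료에 효과가 있을 것 같다.", True),
        ("피해 규모는 수천억 원으로 추정된다.", True),
        ("서울은 한국의 수도이다.", False),
        ("그는 아마 내일 도착할 것이다.", True),
        ("그 결과는 다소 불확실하다.", True),  # hedge with no matching cue
        ("이 방법은 정확도를 높였다.", False),
    ]
    gold = [label for _, label in data]
    pred = [is_hedge(s) for s, _ in data]
    p, r, f = precision_recall_f1(gold, pred)
    print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")
```

Run as a script, this prints `P=1.000 R=0.750 F1=0.857`: the hedge expressed by 불확실하다 ("uncertain") is not covered by any pattern and becomes a false negative, which is the kind of coverage gap that generalizing cue words into patterns is meant to reduce. These toy numbers only illustrate the metric and are unrelated to the 78.6% reported in the abstract.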
