Experimental Analysis of Correct Answer Characteristics in Question Answering Systems

  • Received: 2018.04.15
  • Reviewed: 2018.05.25
  • Published: 2018.05.31


In question answering systems, which find and return answers to natural language questions, one of the largest sources of error is the retrieval step, in which the question is used to search for documents or passages likely to contain the correct answer. Improving retrieval performance requires a clear understanding of the characteristics of documents and passages that contain correct answers. Using a corpus consisting of questions, documents that contain the correct answer, and documents that do not, this paper experimentally analyzes how many question words appear in answer documents, how the positions of those occurrences are distributed, and how similar the topics of a question and its answer documents are. Based on this analysis, the paper explains the causes behind previously reported retrieval results for question answering systems and discusses the elements needed for an effective retrieval step.
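Two of the measurements named in the abstract, how many question words occur in a document and where those occurrences fall, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the tokenizer, and the small stopword list are assumptions made for illustration, and the topic similarity analysis is left out.

```python
import re

# Hypothetical stopword list for illustration only; the preprocessing
# actually used in the paper is not described in the abstract.
STOPWORDS = {"the", "a", "an", "of", "in", "is", "was", "to", "did",
             "who", "what", "when", "where", "which"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def question_terms(question):
    """Return the distinct non-stopword tokens of the question."""
    return {t for t in tokenize(question) if t not in STOPWORDS}


def term_coverage(question, document):
    """Fraction of distinct question terms that occur in the document."""
    terms = question_terms(question)
    if not terms:
        return 0.0
    return len(terms & set(tokenize(document))) / len(terms)


def relative_positions(question, document):
    """Relative position of every question-term occurrence in the document.

    0.0 is the first token of the document and 1.0 is the last one.
    """
    terms = question_terms(question)
    tokens = tokenize(document)
    if len(tokens) < 2:
        return [0.0 for t in tokens if t in terms]
    last = len(tokens) - 1
    return [i / last for i, t in enumerate(tokens) if t in terms]
```

Computing `term_coverage` and `relative_positions` separately for documents that contain the answer and for documents that do not, then comparing the two distributions, gives the kind of contrast the abstract describes.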


