Text Corpus-based Question Answering System

Kim, Han-Joon; Kim, Min-Kyoung; Chang, Jae-Young

Journal of Digital Contents Society (디지털콘텐츠학회 논문지)

Volume 11 Issue 3 / Pages 375-383 / 2010 / 1598-2009 (pISSN) / 2287-738X (eISSN)

Digital Contents Society (한국디지털콘텐츠학회)

문서 말뭉치 기반 질의응답 시스템

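Kim, Han-Joon (School of Electrical and Computer Engineering, University of Seoul);
Kim, Min-Kyoung (Internet Department, Asia Today);
Chang, Jae-Young (Department of Computer Engineering, Hansung University)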

Received : 2010.08.16
Accepted : 2010.10.01
Published : 2010.09.30

Abstract

In developing question-answering (QA) systems, it is hard to analyze natural language questions syntactically and semantically and to find exact answers to them. To avoid these difficulties, we propose a new style of question-answering system that automatically generates natural language queries and lets users search for the queries that match given keywords. The key idea behind query generation is to apply named entity recognition to the significant sentences in text documents and then generate a natural query (an interrogative sentence) for each named entity (such as a person, location, or time). Natural queries fall into two types: the simple type and the sentence structure type. With a large database of question-answer pairs prepared in this way, the system can easily retrieve natural queries and their corresponding answers for given keywords. The most important issue is how to generate meaningful queries that lead to unambiguous answers. To this end, we propose two principles for deciding which declarative sentences can serve as the sources of natural queries, and a pattern-based method for generating meaningful queries from the selected sentences.

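When building a question-answering system, it is not easy to analyze a natural language sentence entered as a user query with full grammatical or semantic accuracy, or to find the exact answer to that query. To overcome these difficulties, this paper proposes a new kind of system that automatically generates and stores queries from a document corpus and retrieves them by keyword. The basic idea behind query generation is to apply named entity recognition to the key sentences of the collected documents, identify named entities such as persons, objects, places, and times, and then generate a natural language query for each named entity. Queries are divided into two types: the simple type and the sentence-structure-preserving type. With the query database prepared in this way, the system can easily obtain the queries and answers relevant to the entered search keywords. The key issue in this work is generating meaningful queries that lead to unambiguous answers. To this end, we present principles for selecting the declarative sentences that serve as the sources of queries, and a methodology for generating meaningful queries from the selected declarative sentences.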
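
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the actual patterns, so the `WH_WORDS` mapping, the `QAPair` type, and the functions `generate_pairs` and `search` are hypothetical names introduced here for illustration. Named entities are supplied by hand rather than by a real recognizer, and only the sentence structure type is shown, because the abstract does not define the form of the simple type. Replacing the entity in place reads naturally in Korean, where interrogative words stay in position, but produces awkward English for non-subject entities.

```python
from dataclasses import dataclass

# Assumed mapping from entity type to interrogative word (not from the paper).
WH_WORDS = {"PERSON": "who", "LOCATION": "where", "TIME": "when"}


@dataclass(frozen=True)
class QAPair:
    question: str
    answer: str


def generate_pairs(sentence: str, entities: list[tuple[str, str]]) -> list[QAPair]:
    """Build one question per named entity by swapping it for a wh-word."""
    body = sentence.rstrip(".")
    pairs = []
    for text, entity_type in entities:
        wh = WH_WORDS.get(entity_type)
        if wh is None or text not in body:
            continue  # skip unsupported types and entities absent from the sentence
        question = body.replace(text, wh, 1) + "?"
        question = question[0].upper() + question[1:]
        pairs.append(QAPair(question=question, answer=text))
    return pairs


def search(pairs: list[QAPair], keywords: list[str]) -> list[QAPair]:
    """Return the stored pairs whose question contains every keyword."""
    wanted = [k.lower() for k in keywords]
    return [p for p in pairs if all(k in p.question.lower() for k in wanted)]


# Hand-annotated example; a real system would obtain entities from an NER tool.
SENTENCE = "Marie Curie was born in Warsaw in 1867."
ENTITIES = [("Marie Curie", "PERSON"), ("Warsaw", "LOCATION"), ("1867", "TIME")]
PAIRS = generate_pairs(SENTENCE, ENTITIES)

for pair in search(PAIRS, ["born", "Warsaw"]):
    print(pair.question, "->", pair.answer)
```

Searching for the keywords "born" and "Warsaw" returns the two stored questions that still contain both words, each with the entity it replaced as its answer. A fuller version would also need the sentence-selection principles the abstract mentions, so that only sentences yielding unambiguous answers are turned into queries.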
