A Topic Classification System Based on Clue Expressions for Person-Related Questions and Passages

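  • 이경호 (Department of Electronics, Radio and Information Communications Engineering, Chungnam National University);
  • 이공주 (Department of Radio and Information Communications Engineering, Chungnam National University)
  • Received: 2015.07.28
  • Reviewed: 2015.10.16
  • Published: 2015.12.31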

Abstract

In general, a Q&A system searches for documents or passages related to an input question by matching the terms of the question, in order to find an answer. However, term-based retrieval alone often fails to locate the passage that contains the correct answer: too many passages are retrieved, and term matching is not enough to rank them by their relevance to the question. To alleviate this problem, we assign a topic to each sentence and use it for ranking in a Q&A system. We define a set of person-related topic classes, together with clue expressions that indicate the topic of a sentence. The topic classification system proposed in this paper determines the topic of an input sentence by using clue expressions, which are manually collected from a corpus. We describe the architecture of the topic classification system and evaluate the performance of its components.
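The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the topic labels, the clue expressions, and the function names `classify_topic` and `rank_passages` are all hypothetical, and simple substring matching stands in for whatever matching procedure the paper actually uses.

```python
from collections import Counter

# Hypothetical clue lexicon (assumption): topic label -> clue expressions.
# The paper collects its clue expressions manually from a corpus; these
# English examples are made up for illustration only.
CLUES = {
    "birth": ["born", "birthplace"],
    "death": ["died", "passed away"],
    "education": ["graduated", "studied at"],
    "career": ["worked as", "served as"],
}


def classify_topic(sentence, clues=CLUES):
    """Return the topic whose clue expressions occur most often, or None."""
    text = sentence.lower()
    scores = Counter()
    for topic, expressions in clues.items():
        for expr in expressions:
            # Naive substring match; a real system would match on tokens
            # or morphemes to avoid hits such as "stubborn" for "born".
            if expr in text:
                scores[topic] += 1
    if not scores:
        return None
    # Ties are broken alphabetically so the result is deterministic.
    return max(sorted(scores), key=lambda t: scores[t])


def rank_passages(question, passages, clues=CLUES):
    """Move passages that share the question's topic to the front."""
    q_topic = classify_topic(question, clues)

    def same_topic(passage):
        return int(q_topic is not None and classify_topic(passage, clues) == q_topic)

    # sorted() is stable, so passages keep their order within each group.
    return sorted(passages, key=same_topic, reverse=True)


if __name__ == "__main__":
    question = "Where was Einstein born?"
    passages = [
        "He served as a professor in Berlin.",
        "Einstein was born in Ulm.",
    ]
    print(classify_topic(question))
    print(rank_passages(question, passages))
```

In this sketch the topic acts only as a re-ranking signal on top of an existing term-based retrieval step, which matches the role the abstract describes for it.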
