WordNet-Based Category Utility Approach for Author Name Disambiguation

Concept Network-Based Category Utility for Author Name Disambiguation

  • Published: 2009.06.30

Abstract

Author name disambiguation is essential for improving the performance of document indexing, retrieval, and web search. It resolves the conflict that arises when multiple authors share the same name label. This paper introduces a novel approach that exploits ontologies and a WordNet-based category utility for author name disambiguation. Our method uses author knowledge in the form of a populated ontology with several types of properties: the titles, abstracts, and co-authors of papers, and the authors' affiliations. The author ontology was constructed semi-automatically for the artificial intelligence and semantic web areas using the OWL API and heuristics. Author name disambiguation determines the correct author among the candidate authors in the populated author ontology. Candidate authors are evaluated using the proposed WordNet-based category utility. Category utility is a tradeoff between the intra-class similarity and the inter-class dissimilarity of author instances, where author instances are described in terms of attribute-value pairs. The WordNet-based category utility additionally exploits concept information in WordNet for semantic analysis. In experiments, the WordNet-based category utility increased the number of disambiguations by about 10% compared with plain category utility and achieved an overall accuracy of about 98%.
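To make the category utility idea concrete, the following is a minimal sketch of the standard category utility measure from incremental conceptual clustering, applied to a toy disambiguation case. It is not the paper's implementation and does not include the WordNet-based extension. The author labels, attribute names, and values are hypothetical, each attribute is assumed to hold a single nominal value, and the rule of tentatively adding the new paper to each candidate and keeping the highest-scoring assignment is an assumption about how the measure could be used.

```python
from collections import Counter


def sum_sq_probs(instances):
    """Sum over all attribute-value pairs of P(attr = value)^2 in `instances`."""
    if not instances:
        return 0.0
    counts = Counter()
    for inst in instances:
        for attr, value in inst.items():
            counts[(attr, value)] += 1
    n = len(instances)
    return sum((c / n) ** 2 for c in counts.values())


def category_utility(partition):
    """Category utility of a partition: dict label -> list of instances.

    CU = (1/K) * sum_k P(C_k) * [sum_sq_probs(C_k) - sum_sq_probs(all)]
    where K is the number of non-empty classes.
    """
    classes = [insts for insts in partition.values() if insts]
    all_instances = [inst for insts in classes for inst in insts]
    total = len(all_instances)
    if total == 0:
        return 0.0
    baseline = sum_sq_probs(all_instances)  # predictability without classes
    score = 0.0
    for insts in classes:
        p_class = len(insts) / total
        score += p_class * (sum_sq_probs(insts) - baseline)
    return score / len(classes)


def best_candidate(partition, new_instance):
    """Tentatively add `new_instance` to each class; return the best label."""
    scores = {}
    for label in partition:
        trial = {k: list(v) for k, v in partition.items()}  # copy the lists
        trial[label].append(new_instance)
        scores[label] = category_utility(trial)
    return max(scores, key=scores.get), scores


# Hypothetical candidate authors sharing one name, each described by the
# attribute-value pairs of papers already attributed to them.
candidates = {
    "J. Kim (Univ. A)": [
        {"topic": "ontology", "coauthor": "Lee", "affiliation": "Univ. A"},
        {"topic": "ontology", "coauthor": "Park", "affiliation": "Univ. A"},
    ],
    "J. Kim (Univ. B)": [
        {"topic": "robotics", "coauthor": "Choi", "affiliation": "Univ. B"},
        {"topic": "vision", "coauthor": "Choi", "affiliation": "Univ. B"},
    ],
}
new_paper = {"topic": "ontology", "coauthor": "Lee", "affiliation": "Univ. A"}

winner, scores = best_candidate(candidates, new_paper)
print(winner)
```

Assigning the new paper to the first candidate keeps that class internally consistent (high intra-class similarity) and keeps the two classes distinct (high inter-class dissimilarity), so that assignment scores higher. A concept-based variant could, for example, treat values that share a WordNet ancestor as partially matching instead of requiring exact equality, but that step is not sketched here.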

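Distinguishing authors who share the same name improves the performance of document indexing and retrieval on the web. It resolves the various problems that arise when several people with the same name appear on web sites. This paper proposes a category utility based on a concept network (WordNet) for distinguishing such authors, and describes the proposed method using academic conference web sites as the target. The method uses the concept network together with an author ontology that reflects various author properties (title, abstract, co-authors, affiliation). The author ontology was built semi-automatically using the OWL API and heuristics. Author name disambiguation uses the concept network-based category utility to determine, among the same-name candidate authors in the author ontology, the correct author of a given paper. Category utility is an evaluation function built on the intra-class similarity and inter-class dissimilarity of the authors. In contrast, the concept network-based category utility additionally uses the concept information in the concept network to resolve ambiguity. Analysis of the experimental results shows that the concept network-based category utility performed about 10% better than ordinary category utility in author name disambiguation and reached an overall accuracy of 98%.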
