• Title/Summary/Keyword: Author Name Disambiguation

A Method for Same Author Name Disambiguation in Domestic Academic Papers (국내 학술논문의 동명이인 저자명 식별을 위한 방법)

  • Shin, Daye;Yang, Kiduk
    • Journal of the Korean BIBLIA Society for library and Information Science / v.28 no.4 / pp.301-319 / 2017
  • Author name disambiguation is the task of identifying a single author who appears under different names, or different authors who share the same name. It is important for correctly assessing authors' research achievements and finding experts in given areas, as well as for the effective operation of scholarly information services such as citation indexes. In this study, we performed error correction and normalization of the data and applied rule-based author name disambiguation, comparing it with a baseline machine learning approach to see whether human intervention could improve machine learning performance. The corrected and normalized email-based disambiguation improved F-measure by over 0.1 relative to machine learning. This demonstrates the potential of human pattern identification and inference, which enabled the data correction and normalization process as well as the formation of the rule-based disambiguation, to complement the weaknesses of machine learning and improve author name disambiguation results.
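The abstract reports F-measure but does not say which variant was used. As a rough illustration only, the sketch below computes pairwise precision, recall, and F1, one common way to score a disambiguation result against a gold standard. The function name and label format are assumptions made for this example, not taken from the paper.

```python
from itertools import combinations

def pairwise_f1(predicted, truth):
    """Score a clustering by comparing every pair of records.

    predicted, truth: lists of cluster labels, one label per record.
    Returns (precision, recall, f1).
    """
    tp = fp = fn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_pred = predicted[i] == predicted[j]
        same_true = truth[i] == truth[j]
        if same_pred and same_true:
            tp += 1          # pair correctly placed together
        elif same_pred:
            fp += 1          # pair wrongly merged
        elif same_true:
            fn += 1          # pair wrongly split
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

With this measure, a gain of 0.1 in F1 would mean noticeably fewer wrongly merged or wrongly split record pairs.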

WordNet-Based Category Utility Approach for Author Name Disambiguation (저자명 모호성 해결을 위한 개념망 기반 카테고리 유틸리티)

  • Kim, Je-Min;Park, Young-Tack
    • The KIPS Transactions:PartB / v.16B no.3 / pp.225-232 / 2009
  • Author name disambiguation is essential for improving the performance of document indexing, retrieval, and web search. It resolves the conflict that arises when multiple authors share the same name label. This paper introduces a novel approach that exploits ontologies and WordNet-based category utility for author name disambiguation. Our method utilizes author knowledge in the form of a populated ontology that uses various types of properties: titles, abstracts, and co-authors of papers, and authors' affiliations. The author ontology was constructed semi-automatically for the artificial intelligence and semantic web areas using the OWL API and heuristics. Author name disambiguation determines the correct author from among the candidate authors in the populated author ontology. Candidate authors are evaluated using the proposed WordNet-based category utility. Category utility is a tradeoff between intra-class similarity and inter-class dissimilarity of author instances, where author instances are described in terms of attribute-value pairs. The WordNet-based category utility exploits concept information in WordNet for semantic analysis. In experiments, the WordNet-based category utility increased the number of disambiguations by about 10% compared with plain category utility, and increased overall accuracy by around 98%.
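For orientation, the standard (non-WordNet) category utility for a partition into classes C_1, ..., C_K over attribute-value pairs A_i = V_ij is usually written as below. This is the textbook definition; the abstract does not give the form of the WordNet-based variant, so none is shown.

```latex
CU(C_1,\dots,C_K) \;=\; \frac{1}{K}\sum_{k=1}^{K} P(C_k)
  \left[\,\sum_{i}\sum_{j} P(A_i = V_{ij} \mid C_k)^2
        \;-\; \sum_{i}\sum_{j} P(A_i = V_{ij})^2 \right]
```

The first inner sum rewards classes whose members share attribute values, and the second subtracts what would be expected without any class information.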

Disambiguation of Korean Names in References

  • Kim, Sungwon
    • Journal of Information Science Theory and Practice / v.6 no.2 / pp.62-70 / 2018
  • One of the characteristics of academic writing is the inclusion of citations and references. As the development of reference styles used for international scholarly communication has mostly been led by Western academic societies, the reference styles developed in Western nations do not reflect the characteristics of Korean names. As a result, it is hard to distinguish Korean authors through citations based on Western reference styles, which in turn decreases the retrieval efficiency of relevant authors and ultimately the efficiency of scholarly communication. This paper intends to analyze author name disambiguation of Korean authors indicated according to Western reference styles. It aims to suggest the necessity for enhancing name disambiguation of Korean authors and revision of reference styles. Its ultimate goal is to increase the efficiency of scholarly communication through the improvement of name disambiguation of Korean authors. For this purpose, this study collected and analyzed name data of Korean researchers and compared name disambiguation of authors by reference style. Based on research results, this study confirmed a necessity for revising reference styles to improve name disambiguation of authors and suggested a necessity for research into the improvement of plans for revision.
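One way to picture the problem is a reference style that reduces given names to initials. The sketch below assumes such a style and uses hypothetical names; the abstract does not specify which styles or name data the study examined.

```python
from collections import defaultdict

def abbreviate(surname, given):
    """Render ('Kim', 'Sungwon') as 'Kim, S.' (assumed initials-only style)."""
    parts = given.replace("-", " ").split()
    initials = " ".join(p[0].upper() + "." for p in parts)
    return f"{surname}, {initials}"

def collisions(names, render=abbreviate):
    """Return rendered forms that are shared by more than one full name."""
    groups = defaultdict(list)
    for surname, given in names:
        groups[render(surname, given)].append((surname, given))
    return {form: people for form, people in groups.items() if len(people) > 1}

# Hypothetical names, for illustration only.
sample = [("Kim", "Sungwon"), ("Kim", "Seohyun"), ("Kim", "Minjun")]
print(collisions(sample))
```

Because a small number of surnames is shared by many people, collapsing given names to initials removes most of the information that tells such authors apart.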

Review of Author Name Disambiguation Techniques for Citation Analysis (인용분석에서의 모호한 저자명 식별을 위한 방법들에 관한 고찰)

  • Kim, Hyun-Jung
    • Journal of the Korean BIBLIA Society for library and Information Science / v.23 no.3 / pp.5-17 / 2012
  • In citation analysis, author names are often used as the unit of analysis, yet different authors are sometimes indexed under the same name in the bibliographic databases from which citation counts are obtained. There are many techniques for author name disambiguation, using supervised, unsupervised, or semi-supervised learning algorithms. Unsupervised approaches use machine learning algorithms to extract the necessary bibliographic information from large-scale databases and digital libraries, while supervised approaches use manually built training datasets for clustering author groups and combine them with learning algorithms for author name disambiguation. The study examines various techniques for author name disambiguation in the hope of finding an aid to improve the precision of citation counts in citation analysis, as well as to obtain better results in information retrieval.

A Large-scale Test Set for Author Disambiguation (저자 식별을 위한 대용량 평가셋 구축)

  • Kang, In-Su;Kim, Pyung;Lee, Seung-Woo;Jung, Han-Min;You, Beom-Jong
    • The Journal of the Korea Contents Association / v.9 no.11 / pp.455-464 / 2009
  • To move beyond article-oriented search functions and provide author-oriented ones, the namesake problem for author names must be solved. Author disambiguation, proposed as its solution, assigns identifiers of real individuals to author name entities. Although recent state-of-the-art approaches to author disambiguation have reported performance above 90%, few academic information services have adopted author-resolving functions. This paper describes a large-scale test set for author disambiguation, created by KISTI to foster author resolution research. The results of such research can be applied to academic information systems to improve their services. The test set was constructed from DBLP data through web searches and manual inspection. Currently it consists of 881 author names, 41,673 author name entities, and 6,921 person identifiers.

Disambiguation of Author Names Using Co-citation (동시인용정보를 이용한 동명이인 저자의 중의성 해소)

  • Kang, In-Su
    • Journal of Information Management / v.42 no.3 / pp.167-186 / 2011
  • Co-citation means that two or more studies are cited together by a later study. This paper deals with the relationship between co-citation and author disambiguation. Author disambiguation clusters same-name author instances into real-world individuals. Co-citation may inform author disambiguation in that two or more related works by the same person may be co-cited by later studies. This article describes automated steps for gathering co-citation information from Google Scholar, and proposes a new clustering algorithm that effectively integrates co-citation information with other author disambiguation features. Experiments showed that co-citation helps to improve the performance of author disambiguation.

Application of Machine Learning Techniques for Resolving Korean Author Names (한글 저자명 중의성 해소를 위한 기계학습기법의 적용)

  • Kang, In-Su
    • Journal of the Korean Society for information Management / v.25 no.3 / pp.27-39 / 2008
  • In bibliographic data, the use of personal names to indicate authors makes it difficult to specify a particular author, since numerous authors share the same personal name. Resolving same-name author instances into different individuals is called author resolution, which consists of two steps: calculating author similarities and then clustering same-name author instances into different person groups. Author similarities are computed from similarities of author-related bibliographic features such as coauthors, titles of papers, and publication information, using supervised or unsupervised methods. Supervised approaches employ machine learning techniques to automatically learn the author similarity function from author-resolved training samples. So far, however, only a few machine learning methods have been investigated for author resolution. This paper provides a comparative evaluation of a variety of recent high-performing machine learning techniques for author disambiguation, and compares several methods of processing author disambiguation features such as coauthors and titles of papers.

Email Extraction and Utilization for Author Disambiguation (저자 식별을 위한 전자메일의 추출 및 활용)

  • Kang, In-Su
    • The Journal of the Korea Contents Association / v.8 no.6 / pp.261-268 / 2008
  • An author of a paper is represented by his/her personal name in a bibliographic record. However, using names to indicate authors may reduce the recall and precision of paper and/or author search, since the same name can be shared by many different individuals and a person can write his/her name in different forms. To solve this problem, same-name author names must be disambiguated into different persons. As features for author resolution, previous studies have exploited bibliographic attributes such as co-authors, titles, and publication information. This study attempts to apply authors' email addresses to disambiguate author names. To this end, we first handle the extraction of email addresses from full-text papers, and then evaluate and analyze the effect of email addresses on author resolution using a large-scale test set.
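A minimal sketch of the two steps named in the abstract, extracting email addresses from text and using them as a matching feature, might look as follows. The regular expression, the function names, and the record layout are assumptions for illustration and do not reproduce the paper's extraction method.

```python
import re

# Deliberately simple pattern. Addresses in real papers are messier,
# e.g. "{kim, lee}@example.org" or "name at example dot org".
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

def extract_emails(text):
    """Return lower-cased email addresses in order of first appearance."""
    seen = []
    for match in EMAIL_RE.findall(text):
        email = match.lower()
        if email not in seen:
            seen.append(email)
    return seen

def same_author_by_email(record_a, record_b):
    """Treat two same-name records as one person if they share an email.

    Each record is a dict with an 'emails' list (hypothetical layout).
    """
    return bool(set(record_a["emails"]) & set(record_b["emails"]))
```

A shared address is strong evidence that two records belong to one person, but a missing match proves nothing, since authors change addresses and many records carry none.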

Exploration of Hierarchical Techniques for Clustering Korean Author Names (한글 저자명 군집화를 위한 계층적 기법 비교)

  • Kang, In-Su
    • Journal of Information Management / v.40 no.2 / pp.95-115 / 2009
  • Author resolution disambiguates same-name author occurrences into real individuals. To do this, pairwise author similarities are computed for author name entities, and then clustering is performed. So far, many studies have employed hierarchical clustering techniques for author disambiguation. However, the various hierarchical clustering methods have not been sufficiently investigated. This study presents an empirical evaluation and analysis of hierarchical clustering applied to Korean author resolution, using multiple distance functions such as the Dice coefficient, cosine similarity, Euclidean distance, the Jaccard coefficient, and the Pearson correlation coefficient.
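Three of the measures listed above can be sketched on set-valued features such as co-author names. The set-based formulation is an assumption for this example; the abstract does not say how features were represented.

```python
import math

def dice(a, b):
    """Dice coefficient: 2|A n B| / (|A| + |B|)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))

def jaccard(a, b):
    """Jaccard coefficient: |A n B| / |A u B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cosine(a, b):
    """Cosine similarity of binary vectors: |A n B| / sqrt(|A| * |B|)."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))
```

A hierarchical clusterer would typically turn such a similarity into a distance (for example, 1 minus the similarity) before merging the closest pairs of records.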

Author Graph Generation based on Author Disambiguation (저자 식별에 기반한 저자 그래프 생성)

  • Kang, In-Su
    • Journal of Information Management / v.42 no.1 / pp.47-62 / 2011
  • While an ideal author graph should have nodes that represent authors, automatically generated author graphs mostly use author names as their nodes because of the difficulty of resolving author names into individuals. However, using author names as nodes merges namesakes, who should be separate nodes, into a single node, which may distort the characteristics of the author graph. This study proposes an algorithm that resolves author ambiguities based on co-authorship and then yields an author graph consisting of author nodes rather than author name nodes. The scientific collaboration relationships on which this algorithm depends tend to produce clustering results that minimize the over-clustering error at the expense of the under-clustering error. In experiments, the algorithm is applied to real citation records in which Korean namesakes occur, and the results are discussed.
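The abstract does not describe the algorithm itself. As a rough illustration of co-authorship-based resolution in general, the sketch below merges same-name records that share at least one co-author, using a union-find structure. The function name and input format are assumptions, and this is not the paper's method.

```python
def cluster_by_coauthors(records):
    """Group same-name records that share at least one co-author.

    records: list of sets of co-author names (the ambiguous name excluded).
    Returns a list of index lists, one per inferred author.
    """
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # co-author name -> index of first record containing it
    for idx, coauthors in enumerate(records):
        for name in coauthors:
            if name in first_seen:
                root_a, root_b = find(idx), find(first_seen[name])
                if root_a != root_b:
                    parent[root_a] = root_b
            else:
                first_seen[name] = idx

    groups = {}
    for idx in range(len(records)):
        groups.setdefault(find(idx), []).append(idx)
    return sorted(groups.values())
```

Each resulting group can then serve as one author node in the graph. A rule like this merges only on direct evidence, so records by the same person with no shared co-author stay in separate groups.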