• Title/Summary/Keyword: author disambiguation

Features for Author Disambiguation (저자 식별을 위한 자질 비교)

  • Kang, In-Su;Lee, Seungwoo;Jung, Hanmin;Kim, Pyung;Goo, HeeKwan;Lee, MiKyung;Sung, Won-Kyung;Park, DongIn
    • Proceedings of the Korea Contents Association Conference / 2007.11a / pp.107-111 / 2007
  • There is a many-to-many mapping between persons and their names: a person may have multiple names, and different persons may share the same name. Synonymous names degrade the recall of person search, and homonymous names degrade its precision. This study examines the characteristics of features for resolving homonymous author names in citation data. As disambiguation features, previous work has used citation-internal features such as co-authorship, article titles, and publication titles, as well as citation-external features such as email addresses, affiliations, and Web evidence. To the best of our knowledge, however, no prior study has examined how individual features influence author disambiguation. This study analyzes the effect of individual features on author resolution using a large-scale Korean test set.
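Citation-internal features such as co-authorship and article titles are typically turned into pairwise similarity scores between two records that carry the same author name. A minimal sketch, assuming Jaccard overlap as the similarity measure (a common choice, not necessarily the feature definition used in this study); the record layout and the function names `coauthor_feature` and `title_feature` are hypothetical.

```python
import re


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def coauthor_feature(rec_a: dict, rec_b: dict, name: str) -> float:
    # Drop the ambiguous name itself: both records share it by construction.
    a = {n.lower() for n in rec_a["authors"]} - {name.lower()}
    b = {n.lower() for n in rec_b["authors"]} - {name.lower()}
    return jaccard(a, b)


def _title_tokens(title: str) -> set:
    # Lower-cased word tokens of an article title.
    return set(re.findall(r"\w+", title.lower()))


def title_feature(rec_a: dict, rec_b: dict) -> float:
    return jaccard(_title_tokens(rec_a["title"]), _title_tokens(rec_b["title"]))


r1 = {"authors": ["Kim, J.", "Lee, S.", "Park, H."],
      "title": "Author name disambiguation with coauthors"}
r2 = {"authors": ["Kim, J.", "Lee, S.", "Choi, Y."],
      "title": "Coauthor networks for name disambiguation"}
```

Here one shared coauthor out of three distinct coauthors gives a co-authorship score of 1/3, and two shared title words out of eight give a title score of 0.25; comparing such per-feature scores against ground truth is one way to measure each feature's individual contribution.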

The Impact of Name Ambiguity on Properties of Coauthorship Networks

  • Kim, Jinseok;Kim, Heejun;Diesner, Jana
    • Journal of Information Science Theory and Practice / v.2 no.2 / pp.6-15 / 2014
  • Initial-based disambiguation of author names is a common data pre-processing step in bibliometrics. It is widely accepted that this procedure can introduce errors into network data and into any subsequent analytical results, but its precise impact on the data and findings is not well understood. We answer this question empirically by comparing two commonly used initial-based disambiguation methods against a reasonable proxy for ground-truth data. As our source we use DBLP, a database covering major journals and conferences in computer science and information science. We find that initial-based disambiguation strongly distorts network metrics at both the graph and node levels: authors become embedded in ties that have no empirical support, which inflates their sphere of influence and their diversity of involvement. Consequently, networks generated with initial-based disambiguation are more coherent and interconnected than the actual underlying networks, and individual authors appear more productive and more strongly embedded than they actually are.
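The abstract does not name the two initial-based methods; the sketch below assumes they are first-initial matching (surname plus first given-name initial) and all-initials matching (surname plus every given-name initial). It shows how each turns name strings into merge keys. The helper names and the sample names are hypothetical.

```python
from collections import defaultdict


def _parts(name: str):
    """Split 'Surname, Given Names' into (surname, given-name initials)."""
    surname, _, given = name.partition(",")
    # Assumption: hyphenated given names ("Min-Su") count as separate parts.
    given_parts = given.replace("-", " ").split()
    return surname.strip().lower(), [p[0].lower() for p in given_parts]


def first_initial_key(name: str) -> str:
    surname, initials = _parts(name)
    return " ".join([surname] + initials[:1])


def all_initials_key(name: str) -> str:
    surname, initials = _parts(name)
    return " ".join([surname] + initials)


def group_by(names, key_fn):
    """Names that share a key are treated as one author (one network node)."""
    groups = defaultdict(set)
    for n in names:
        groups[key_fn(n)].add(n)
    return dict(groups)


names = ["Kim, Minsu", "Kim, Mijin", "Kim, Min-Su", "Kim, Yuna"]
```

With first-initial keys, "Kim, Minsu", "Kim, Mijin" and "Kim, Min-Su" collapse into a single node `kim m`, so their coauthorship ties are pooled: the kind of merge that makes networks look more interconnected than they are. All-initials keys separate "Min-Su" from "Minsu" but still merge "Minsu" and "Mijin".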

Email Extraction and Utilization for Author Disambiguation (저자 식별을 위한 전자메일의 추출 및 활용)

  • Kang, In-Su
    • The Journal of the Korea Contents Association / v.8 no.6 / pp.261-268 / 2008
  • In a bibliographic record, the author of a paper is represented by his or her personal name. Using names to identify authors, however, can degrade the recall and precision of paper and author search, since the same name can be shared by many different individuals and one person can write his or her name in different forms. Solving this problem requires assigning occurrences of the same name to the different persons who bear it. As features for author resolution, previous studies have used bibliographic attributes such as co-authors, titles, and publication information. This study applies authors' email addresses to author name disambiguation. We first address the extraction of email addresses from full-text papers, and then evaluate and analyze the effect of email addresses on author resolution using a large-scale test set.
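The extraction step can be approximated with regular expressions over the text of a paper. A minimal sketch, not the method of this paper: the patterns are simplified assumptions (far from RFC 5322-complete), and the second pattern expands the brace-grouped form such as `{kim, lee}@example.ac.kr` that some paper headers use. The function name `extract_emails` is hypothetical.

```python
import re

# Simplified address pattern (assumption): local part, '@', dotted domain.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")
# Grouped form seen in some paper headers: {kim, lee}@example.ac.kr
GROUPED_RE = re.compile(r"\{([^}]+)\}\s*@\s*([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)")


def extract_emails(text: str) -> list[str]:
    found = []
    # Expand grouped addresses into one address per local part.
    for local_parts, domain in GROUPED_RE.findall(text):
        found.extend(f"{p.strip()}@{domain}" for p in local_parts.split(",") if p.strip())
    # Remove the grouped spans so they are not matched a second time.
    text = GROUPED_RE.sub(" ", text)
    found.extend(EMAIL_RE.findall(text))
    # Lower-case (a common normalization, assumed here) and de-duplicate in order.
    return list(dict.fromkeys(e.lower() for e in found))


sample = "Kim and Lee ({kim, lee}@example.ac.kr); Park (park@example.com)."
```

Once extracted, an address attached to an author name can serve as a disambiguation feature in the same way as co-authors or titles: two same-name records that share an address are strong evidence of one person.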

A Study on Utilization of ORCID based Author Identifier at National Level (국가 차원의 ORCID 기반 저자 식별자 활용에 관한 연구)

  • Kim, Eun-Jeong;Noh, Kyung-Ran
    • Journal of the Korean BIBLIA Society for Library and Information Science / v.28 no.3 / pp.151-174 / 2017
  • The spread of the Internet, advances in ICT, and digitization have streamlined and accelerated scholarly communication and research, and the paradigm of scholarly information dissemination is changing. This study introduces ORCID, a unique author identifier, and examines the ORCID organization's activities, the benefits it offers researchers and research institutions, and its membership status. It also examines the adoption and use of ORCID in major countries, including the USA, the UK, Italy, and China. On this basis, it suggests what must be considered, in terms of governance, system components, and policy and institutional aspects, to use ORCID for identifying authors at the national level.

A Comparison of Author Name Disambiguation Performance through Topic Modeling (토픽모델링을 통한 저자명 식별 성능 비교)

  • Kim, Ha Jin;Jung, Hyo-jung;Song, Min
    • Proceedings of the Korean Society for Information Management Conference / 2014.08a / pp.149-152 / 2014
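  • To resolve author name ambiguity, this study identified authors using topic modeling techniques. Whereas existing topic models consider only term features, this study additionally used a third metadata feature and evaluated and compared author name disambiguation performance for the ACT (Author-Conference Topic) model and DMR (Dirichlet-multinomial Regression) topic modeling. Using a dataset in which authors had been identified manually, the experiments also varied the number of papers per author and the number of topics. The results showed that DMR topic modeling outperformed the ACT model for author name disambiguation.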

Name Disambiguation using Cycle Detection Algorithm Based on Social Networks (사회망 기반 순환 탐지 기법을 이용한 저자명 명확화 기법)

  • Shin, Dong-Wook;Kim, Tae-Hwan;Jeong, Ha-Na;Choi, Joong-Min
    • Journal of KIISE: Software and Applications / v.36 no.4 / pp.306-319 / 2009
  • A name is a key feature for distinguishing people, yet people are often not told apart correctly because one author may have multiple names or multiple authors may share the same name. Such name ambiguity degrades the performance of document retrieval, web search, and database integration. Bibliographic data in particular may contain many errors, since different authors can share a name and an author's name may be misspelled or abbreviated. Solving these problems requires disambiguating the names entered into the database. In this paper, we propose a method that resolves name ambiguity using social networks constructed from the relations between authors. We evaluated the proposed system on DBLP data, which provides computer science bibliographic information.
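The abstract does not detail the cycle-detection procedure. One simplified reading, offered as an assumption rather than the authors' algorithm: build the coauthor graph without the ambiguous name, then treat two occurrences of that name as the same person if their coauthors are already connected, because linking the two occurrences would close a cycle. The sketch uses breadth-first search; the names `build_graph`, `closes_cycle` and the hop limit `max_hops` are hypothetical, and a shared coauthor counts as a zero-hop connection.

```python
from collections import defaultdict, deque


def build_graph(records, exclude):
    """Undirected coauthor graph over every name except the ambiguous one."""
    graph = defaultdict(set)
    for authors in records:
        others = [a for a in authors if a != exclude]
        for i, a in enumerate(others):
            for b in others[i + 1:]:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def closes_cycle(graph, sources, targets, max_hops=3):
    """True if any source coauthor reaches any target coauthor within max_hops edges."""
    targets = set(targets)
    seen = set(sources)
    frontier = deque((s, 0) for s in sources)
    while frontier:
        node, dist = frontier.popleft()
        if node in targets:
            return True
        if dist == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return False


records = [
    ["J. Kim", "S. Lee"],    # occurrence A of the ambiguous name
    ["J. Kim", "H. Park"],   # occurrence B
    ["S. Lee", "Y. Choi"],
    ["Y. Choi", "H. Park"],
    ["J. Kim", "M. Jung"],   # occurrence C
]
g = build_graph(records, exclude="J. Kim")
```

Occurrences A and B are linked through S. Lee, Y. Choi and H. Park, so merging them closes a cycle; occurrence C's coauthor is isolated, so it stays separate. The hop limit keeps distant, weak connections from triggering a merge.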

Construction of Citation Network of Authors Using URI (URI 기반 저자 인용 네트워크 구축 및 활용)

  • Koo, Hee-Kwan;Jung, Han-Min;Kang, In-Su;Lee, Seung-Woo;Sung, Won-Kyung
    • Journal of KIISE: Software and Applications / v.34 no.2 / pp.150-159 / 2007
  • Building accurate scientific citation information first requires resolving author disambiguation. This study proposes a method that uses URIs (Uniform Resource Identifiers) to create precise author citation networks. Representing authors and papers with URIs lets us maintain the integrity of constantly changing citation information and guarantees access to the correct literature. In experiments, we extracted 2,872 author-centric citation relation pairs from recent major IT-related proceedings written in Korean, and from these pairs discovered 135 citation network groups. The findings are expected to apply to a variety of researcher network services and scientific information portal services.
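A minimal sketch of the idea, assuming each disambiguated author and paper already has a URI and that a "citation network group" is a connected component of the author-to-author citation graph (the abstract does not define the grouping). The function names are hypothetical, and the URIs use the reserved `example.org` domain.

```python
from collections import defaultdict


def author_citation_pairs(papers, citations):
    """papers: paper URI -> author URIs; citations: (citing, cited) paper URI pairs."""
    pairs = set()
    for citing, cited in citations:
        for a in papers[citing]:
            for b in papers[cited]:
                if a != b:  # skip author self-citations
                    pairs.add((a, b))
    return pairs


def citation_groups(pairs):
    """Connected components of the author citation graph, ignoring direction."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        seen.add(start)
        while stack:  # iterative depth-first traversal
            node = stack.pop()
            component.add(node)
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(component)
    return groups


P, A = "http://example.org/paper/", "http://example.org/author/"
papers = {P + "1": [A + "kim", A + "lee"], P + "2": [A + "park"],
          P + "3": [A + "choi"], P + "4": [A + "jung"]}
citations = [(P + "1", P + "2"), (P + "3", P + "4")]
```

Keying everything on URIs rather than name strings is what keeps two same-name authors from being merged into one node when the pairs are aggregated.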

A Semantic-Based Feature Expansion Approach for Improving the Effectiveness of Text Categorization by Using WordNet (문서범주화 성능 향상을 위한 의미기반 자질확장에 관한 연구)

  • Chung, Eun-Kyung
    • Journal of the Korean Society for Information Management / v.26 no.3 / pp.261-278 / 2009
  • Identifying optimal feature sets is crucial for improving the effectiveness of text categorization (TC). In this study, feature expansion experiments were conducted using author-provided keyword sets and article titles from typical scientific journal articles. Feature sets were expanded with WordNet, a lexical database of English words. With this data set and lexical tool, the study showed that feature expansion through synonymy significantly improved TC results. The experiments indicated that expanding feature sets with synonyms based on classifier names considerably improved TC effectiveness regardless of word sense disambiguation.
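Synonym-based feature expansion adds, for each term in a document's feature set, the synonyms found in a lexical resource. The study used WordNet; to keep this sketch self-contained, a small hand-written table stands in for it (an assumption; with NLTK one would instead collect `lemma_names()` from `wordnet.synsets(term)`). The table `TOY_SYNONYMS` and the function `expand_features` are hypothetical.

```python
# Hypothetical stand-in for WordNet: term -> set of synonyms.
TOY_SYNONYMS = {
    "car": {"automobile", "auto"},
    "automobile": {"car", "auto"},
    "film": {"movie"},
}


def expand_features(terms, synonyms=None):
    """Return the original terms plus every synonym listed for them."""
    synonyms = TOY_SYNONYMS if synonyms is None else synonyms
    expanded = set(terms)
    for term in terms:
        expanded |= synonyms.get(term, set())
    return expanded


doc_a = ["car", "engine"]
doc_b = ["automobile", "repair"]
```

Before expansion the two keyword sets share no features; after expansion both contain "car", "automobile" and "auto", so a classifier can relate documents that describe the same concept with different words.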