• Title/Summary/Keyword: name ambiguity

Name Disambiguation using Cycle Detection Algorithm Based on Social Networks (사회망 기반 순환 탐지 기법을 이용한 저자명 명확화 기법)

  • Shin, Dong-Wook; Kim, Tae-Hwan; Jeong, Ha-Na; Choi, Joong-Min
    • Journal of KIISE: Software and Applications / v.36 no.4 / pp.306-319 / 2009
  • A name is a key feature for distinguishing people, but names alone often fail to discriminate between them because one author may have multiple names and multiple authors may share the same name. Such name ambiguity degrades the performance of document retrieval, web search, and database integration. Bibliographic data are especially error-prone, since different authors can share a name and an author name may be misspelled or abbreviated. To solve these problems, the names entered into the database must be disambiguated. In this paper, we propose a method that resolves name ambiguity using social networks constructed from the relations between authors. We evaluated the effectiveness of the proposed system on DBLP data, which provide computer science bibliographic information.
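
The abstract above does not spell out the algorithm, so the following is only a minimal sketch of the general idea, under stated assumptions: each occurrence of an ambiguous name is treated as a separate node, and two occurrences are grouped when their coauthors are connected by a short path in the coauthor network (merging the two occurrences would then close a cycle). The function names, the `max_hops` parameter, and the sample records are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict, deque

def coauthor_graph(papers, ambiguous):
    """Link every pair of coauthors on a paper, leaving out the ambiguous name."""
    graph = defaultdict(set)
    for authors in papers.values():
        others = [a for a in authors if a != ambiguous]
        for a in others:
            graph[a]  # make sure isolated coauthors still appear as nodes
            for b in others:
                if a != b:
                    graph[a].add(b)
    return graph

def within_hops(graph, sources, targets, max_hops):
    """Breadth-first search: can any source reach any target in <= max_hops edges?"""
    targets = set(targets)
    seen = set(sources)
    queue = deque((s, 0) for s in sources)
    while queue:
        node, dist = queue.popleft()
        if node in targets:
            return True
        if dist < max_hops:
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
    return False

def cluster_occurrences(papers, ambiguous, max_hops=1):
    """Group papers whose ambiguous-name occurrences likely refer to one person."""
    graph = coauthor_graph(papers, ambiguous)
    ids = [pid for pid, authors in papers.items() if ambiguous in authors]
    parent = {pid: pid for pid in ids}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, p in enumerate(ids):
        for q in ids[i + 1:]:
            coauthors_p = [a for a in papers[p] if a != ambiguous]
            coauthors_q = [a for a in papers[q] if a != ambiguous]
            if within_hops(graph, coauthors_p, coauthors_q, max_hops):
                parent[find(p)] = find(q)

    clusters = defaultdict(list)
    for pid in ids:
        clusters[find(pid)].append(pid)
    return sorted(sorted(c) for c in clusters.values())

# Hypothetical records: paper id -> author list.
papers = {
    "p1": ["J. Kim", "A. Lee"],
    "p2": ["J. Kim", "A. Lee", "B. Park"],
    "p3": ["J. Kim", "C. Choi"],
    "p4": ["B. Park", "D. Han"],
}
result = cluster_occurrences(papers, "J. Kim")
print(result)  # [['p1', 'p2'], ['p3']]
```

Here p1 and p2 share the coauthor A. Lee and are grouped, while p3 has no coauthor link to the others and stays separate. A larger `max_hops` would accept longer cycles at the cost of more false merges.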

The Impact of Name Ambiguity on Properties of Coauthorship Networks

  • Kim, Jinseok; Kim, Heejun; Diesner, Jana
    • Journal of Information Science Theory and Practice / v.2 no.2 / pp.6-15 / 2014
  • Initial-based disambiguation of author names is a common data pre-processing step in bibliometrics. It is widely accepted that this procedure can introduce errors into network data and any subsequent analytical results. What is not sufficiently understood is the precise impact of this step on the data and findings. We present an empirical answer to this question by comparing the impact of two commonly used initial-based disambiguation methods against a reasonable proxy for ground truth data. We use DBLP, a database covering major journals and conferences in computer science and information science, as a source. We find that initial-based disambiguation induces strong distortions in network metrics at the graph and node levels: authors become embedded in ties for which there is no empirical support, thus increasing their sphere of influence and diversity of involvement. Consequently, networks generated with initial-based disambiguation are more coherent and interconnected than the actual underlying networks, and individual authors appear to be more productive and more strongly embedded than they actually are.
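
To make the distortion concrete, here is a small sketch of two initial-based keys: surname plus first initial, and surname plus all initials. The abstract does not name the two methods it compares, so these two variants are an assumption, and the helper names and sample authors are hypothetical.

```python
from collections import defaultdict

def first_initial_key(name):
    """'Kim, Jin Ho' -> 'kim, j' (surname plus first given-name initial)."""
    surname, _, given = name.partition(",")
    return f"{surname.strip().lower()}, {given.strip()[:1].lower()}"

def all_initials_key(name):
    """'Kim, Jin Ho' -> 'kim, jh' (surname plus every given-name initial)."""
    surname, _, given = name.partition(",")
    parts = given.replace("-", " ").split()
    initials = "".join(part[0].lower() for part in parts)
    return f"{surname.strip().lower()}, {initials}"

def merge_by_key(names, key_fn):
    """Treat all names that map to the same key as a single author."""
    groups = defaultdict(set)
    for name in names:
        groups[key_fn(name)].add(name)
    return dict(groups)

# Three hypothetical, distinct authors.
names = ["Kim, Jin Ho", "Kim, Jae Hyun", "Kim, Jun"]

by_first = merge_by_key(names, first_initial_key)
by_all = merge_by_key(names, all_initials_key)
print(len(by_first))  # 1: all three people collapse into 'kim, j'
print(len(by_all))    # 2: 'kim, jh' still merges two different people
```

Every merged group inherits the papers and coauthors of all its members, which is how spurious ties and inflated productivity enter the network.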

A Method to Solve the Entity Linking Ambiguity and NIL Entity Recognition for efficient Entity Linking based on Wikipedia (위키피디아 기반의 효과적인 개체 링킹을 위한 NIL 개체 인식과 개체 연결 중의성 해소 방법)

  • Lee, Hokyung; An, Jaehyun; Yoon, Jeongmin; Bae, Kyoungman; Ko, Youngjoong
    • Journal of KIISE / v.44 no.8 / pp.813-821 / 2017
  • Entity linking finds the meaning of an entity mention in a user's query, where a mention may refer to the entity using different expressions, by linking the mention to the corresponding entity in a knowledge base. The task poses four challenges: the difficulty of constructing the knowledge base, multiple representations of an entity mention, ambiguity in entity linking, and NIL entity recognition. In this paper, we first construct an entity name dictionary based on Wikipedia to build the knowledge base and address the multiple representation problem. We then propose several methods for NIL entity recognition, and resolve entity linking ambiguity by training a support vector machine on several features: context similarity, semantic relevance, clue word score, named entity type similarity of the mention, entity name matching score, and object popularity score. We apply the two proposed methods sequentially on the constructed knowledge base to obtain good entity linking performance. In the experiments, our system achieved F1 scores of 83.66% and 90.81% for NIL entity recognition and for resolving entity linking ambiguity.

Implementation of persistent identification of topological entities based on macro-parametrics approach

  • Farjana, Shahjadi Hisan; Han, Soonhung; Mun, Duhwan
    • Journal of Computational Design and Engineering / v.3 no.2 / pp.161-177 / 2016
  • In history-based parametric CAD modeling systems, persistent identification of topological entities after design modification is mandatory for preserving design intent, which is done by recording the model creation history and modification history. Persistent identification of geometric and topological entities is necessary in the product design phase as well as in the re-evaluation stage. For identification, entities must first be named according to a methodology that applies to all entities unconditionally. After successive feature operations on a part body, a topology-based persistent identification mechanism runs into an ambiguity problem that usually stems from topology splitting and topology merging. Solving the ambiguity problem requires a complex method that combines topology and geometry: topology is used to assign the basic name to each entity, and geometry is used to resolve ambiguity between entities. In the macro-parametrics approach of the iCAD lab at KAIST, a topology-based persistent identification mechanism is applied that solves the ambiguity problem arising from both topology splitting and topology merging. Here, a method is proposed in which no geometry comparison is necessary for topology merging. The present research focuses on enhancing the persistent identification schema to handle the ambiguity problem, especially for topology splitting and topology merging. It also addresses the basic naming of pattern features.

Prosody and comprehension of ambiguous dative NPs in Korean

  • Kang, Soyoung
    • Phonetics and Speech Sciences / v.6 no.2 / pp.153-161 / 2014
  • The current study reports the results of a cross-modal naming experiment investigating the effects of prosodic boundary location on the comprehension of ambiguous dative NPs in Korean (Yeongmi-ka Ceonghi-eykey norae-rul pwulecwu-n pwuin-ul ...). The dative NP, Ceonghi-eykey, can temporarily be attached either to the embedded rel-marked verb, pwulecwu-n ('sing-rel'), or to the matrix verb that appears later. Participants heard sentence fragments manipulated for the location of the Intonation Phrase boundary (the largest prosodic boundary in the model of Seoul Korean) and, immediately afterwards, had to name visually presented naming targets that resolve the ambiguity of the dative NPs. The prosodic manipulation did not result in a difference in naming time, suggesting that the location of a prosodic boundary failed to influence the way Korean listeners interpreted ambiguous dative NPs. Possible reasons for the null effect are discussed.

Reconsideration about Nomenclature of Herbs Listed in the Korean Pharmacopoeia (대한민국약전에 수재된 식물성 한약재의 학명에 대한 재고)

  • Doh, Eui-Jeong; Lee, Guem-San
    • The Korea Journal of Herbology / v.28 no.3 / pp.61-68 / 2013
  • Objectives: A precise and simple system of nomenclature is required to avoid error, ambiguity, or confusion. Although medicinal plants must be produced and distributed according to the origin described in a pharmacopoeia, including the scientific name, the Korean Pharmacopoeia tenth edition (KP 10) contains many names that do not conform to the nomenclature. This study therefore aimed to find the correct scientific names for 241 plants in KP 10. Methods: Authoritative databases (The Plant List, International Plant Name Index, YList, Tropicos, eFloras, World Checklist of Selected Plant Families, The Global Compositae Checklist, The International Legume Database and Information Service, and others) were cross-checked against previous studies and floras. Results: The list was arranged into four cases: errors including illegitimate names, nomenclatural synonyms, recommended names, and decision-reserved names. Review of the scientific names produced nine correct names for ten misspelled or illegitimate names, and thirty-six correct names for forty-one nomenclatural synonyms. These results should be reflected in the edition that follows KP 10. Separately, ten recommended names were suggested for taxonomic synonyms that had been used indiscriminately because of diverse taxonomic opinions. In addition, decision-reserved names were suggested for thirteen species whose status remains uncertain. Authorship also needs further study, because KP 10 did not follow the recommendations for author citations. Conclusions: Correcting the scientific names of medicinal plants that violate the International Code of Nomenclature would improve the accuracy of the Pharmacopoeia as a reference standard.

Korean Names

  • Kim, Chin-W.
    • Lingua Humanitatis / v.7 / pp.11-30 / 2005
  • Historical origins of both personal names and place names in Korea are reviewed. It is shown that names of native origin have been largely replaced by Sino-Korean names. Some statistics are given on the basis of the 2000 census data in South Korea. A unique method of personal naming, which uses a generation marker called hangnyol, is reviewed. This marker enables a person to figure out his or her own position, and that of others, in the family tree across as many as ten generations without consulting the book of genealogy. While this practice had a role to play in a vertically structured society where seniority is important, it is less practiced as society becomes more egalitarian, so that native names, not writable in Chinese characters, are on the rise. In this global age, a person is not just a member of his or her family or clan but also a member of the international community. The author proposes several things that should be considered in naming to fit the modern global age: euphony of names, ambiguity, possible bad connotations when Romanized, unintended homophones with comic meanings, etc.

Automated Classification of PubMed Texts for Disambiguated Annotation Using Text and Data Mining

  • Choi, Yun-Jeong; Park, Seung-Soo
    • Proceedings of the Korean Society for Bioinformatics Conference / 2005.09a / pp.101-106 / 2005
  • As the body of genetic knowledge grows ever faster, automated analysis and systemization into high-throughput databases has become a hot issue. One essential task is to recognize and identify genomic entities and discover their relations. However, the ambiguity of named entities is a serious problem because of their multiplicity of meanings and types. Many effective techniques have been proposed for analyzing documents, yet their accuracy is high only when the data fit the model well. The purpose of this paper is to design and implement a document classification system that identifies entity problems using a combination of text mining and data mining, supplemented by rich data mining algorithms to enhance its performance. We propose the RTPost system, which differs in style from traditional methods by taking a fault-tolerant system approach and a data mining strategy. This feedback cycle can enhance the accuracy of the text mining. To verify feasibility, we tested our system by classifying RB-related documents in PubMed abstracts.

A Study on the Identification Algorithm for Organization's Name of Author of Korean Science & Technology Contents (국내 과학기술콘텐츠 저자의 소속기관명 식별을 위한 소속기관명 자동 식별 알고리즘에 관한 연구)

  • Kim, Jinyoung; Lee, Seok-Hyong; Suh, Dongjun; Kim, Kwang-Young; Yoon, Jungsun
    • Journal of Digital Contents Society / v.18 no.2 / pp.373-382 / 2017
  • As the volume of scientific and technical contents increases, services that support efficient search of such contents are required. When an author's affiliation is used as a keyword, not only can the contents produced by that affiliation be searched, but the identification rate of search results that use the author and the term as keywords can also be improved. Because the data used as search keywords are ambiguous and vague, the search results may include false negatives or false positives. However, previous research on control through identification of search keywords has focused mainly on author data and terminology data. In this paper, we propose an algorithm to identify affiliations and evaluate it on the scientific and technological contents held by the Korea Institute of Science and Technology Information.

Privacy-Enhanced Subject Identification method Embedded in X.509 Certificate (X.509 인증서에 포함된 프라이버시 보호기능을 가진 개인 식별 방법)

  • Lee, Jae-Il; Park, Jong-Wook; Kim, Seung-Joo; Song, Joo-Seok
    • Journal of the Korea Institute of Information Security & Cryptology / v.16 no.5 / pp.59-66 / 2006
  • A Certification Authority issues X.509 public key certificates to bind a public key to a subject. The subject is specified through one or more subject names in the 'subject' or 'subjectAltName' fields of a certificate. In reality, however, some individuals have the same or similar names. This ambiguity can be resolved by including a 'permanent identifier', unique across multiple CAs, in all certificates issued to the same subject. However, a person's unique identifier is regarded as sensitive personal data. Such an identifier cannot simply be included as part of the subject field, since its disclosure may lead to misuse. We present a new method for secure and accurate user authentication through the PEPSI included in the standard certificate extension of an X.509 certificate. The PEPSI can serve not only user authentication but also user anonymity, without divulging personal information.