• Title/Abstract/Keywords: name ambiguity

Search results: 17 items (processing time 0.019 s)

사회망 기반 순환 탐지 기법을 이용한 저자명 명확화 기법 (Name Disambiguation using Cycle Detection Algorithm Based on Social Networks)

  • 신동욱;김태환;정하나;최중민
    • 한국정보과학회논문지:소프트웨어및응용 / Vol. 36, No. 4 / pp.306-319 / 2009
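  • A name is a feature for distinguishing people, but names alone cannot identify a person unambiguously, because several people may share one name and one person may use several names. This problem affects document retrieval, web search, and database integration in the field of information retrieval. Bibliographic data in particular contains many errors, because some authors share the same name and an author may appear under an abbreviated or misspelled name. To solve this problem, the name information in the data entered into a database must be disambiguated. This paper proposes a method that resolves name ambiguity using a social network built from the relationships between authors, and evaluates the effectiveness of the proposed system through experiments on data from DBLP (Digital Bibliography & Library Project), which provides computer science bibliographic information.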
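
The abstract above does not spell out the cycle detection algorithm itself, so the following is only a rough sketch of the general idea of using a coauthor network for disambiguation, not the authors' method. It assumes that each occurrence of an ambiguous name is a separate node, that coauthor names are unambiguous, and that two occurrences belong to the same person if a short path through shared coauthors connects them. The function names, the `max_hops` threshold, and the sample papers are all hypothetical.

```python
from collections import defaultdict, deque

def build_graph(papers, ambiguous):
    """Build a coauthor graph from {paper_id: [author names]}.

    Every occurrence of the ambiguous name becomes its own node
    (name, paper_id); all other authors are shared nodes.
    """
    graph = defaultdict(set)
    for pid, authors in papers.items():
        nodes = [(a, pid) if a == ambiguous else a for a in authors]
        for u in nodes:
            for v in nodes:
                if u != v:
                    graph[u].add(v)
    return graph

def connected(graph, src, dst, max_hops=4):
    """Breadth-first search: is dst reachable from src within max_hops edges?"""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return True
        if dist < max_hops:
            for nxt in graph.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
    return False

# Hypothetical sample data: three papers signed "J. Kim".
papers = {
    "p1": ["J. Kim", "A. Lee"],
    "p2": ["J. Kim", "A. Lee", "B. Park"],
    "p3": ["J. Kim", "C. Choi"],
}
g = build_graph(papers, "J. Kim")
same_p1_p2 = connected(g, ("J. Kim", "p1"), ("J. Kim", "p2"))
same_p1_p3 = connected(g, ("J. Kim", "p1"), ("J. Kim", "p3"))
```

In this toy data the occurrences in p1 and p2 are linked through the shared coauthor "A. Lee", while the occurrence in p3 has no coauthor path to the others and would be treated as a possibly different person.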

The Impact of Name Ambiguity on Properties of Coauthorship Networks

  • Kim, Jinseok;Kim, Heejun;Diesner, Jana
    • Journal of Information Science Theory and Practice / Vol. 2, No. 2 / pp.6-15 / 2014
  • Initial-based disambiguation of author names is a common data pre-processing step in bibliometrics. It is widely accepted that this procedure can introduce errors into network data and any subsequent analytical results. What is not sufficiently understood is the precise impact of this step on the data and findings. We present an empirical answer to this question by comparing the impact of two commonly used initial-based disambiguation methods against a reasonable proxy for ground truth data. We use DBLP, a database covering major journals and conferences in computer science and information science, as a source. We find that initial-based disambiguation induces strong distortions in network metrics on the graph and node level: Authors become embedded in ties for which there is no empirical support, thus increasing their sphere of influence and diversity of involvement. Consequently, networks generated with initial-based disambiguation are more coherent and interconnected than the actual underlying networks, and individual authors appear to be more productive and more strongly embedded than they actually are.
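
To make the distortion described above concrete, here is a minimal sketch of initial-based disambiguation. The abstract does not name its two methods, so this assumes the two common variants, surname plus first initial and surname plus all initials. The helper names and the sample author names are hypothetical and are not taken from the paper or from DBLP.

```python
from collections import defaultdict

def first_initial_key(name):
    """'Kim, Jin Seok' -> 'kim, j' (surname plus first initial)."""
    surname, _, given = name.partition(",")
    return f"{surname.strip().lower()}, {given.strip()[:1].lower()}"

def all_initials_key(name):
    """'Kim, Jin Seok' -> 'kim, js' (surname plus all given-name initials)."""
    surname, _, given = name.partition(",")
    parts = given.replace("-", " ").split()
    initials = "".join(p[0].lower() for p in parts)
    return f"{surname.strip().lower()}, {initials}"

def merge_by_key(names, key_fn):
    """Group name strings that collapse to the same disambiguation key."""
    groups = defaultdict(set)
    for n in names:
        groups[key_fn(n)].add(n)
    return dict(groups)

# Hypothetical authors who are three different people.
names = ["Kim, Jin Seok", "Kim, Ji Young", "Kim, Jae Sung"]
by_first = merge_by_key(names, first_initial_key)
by_all = merge_by_key(names, all_initials_key)
```

With these names, the first-initial key collapses all three people into one node, and the all-initials key still merges two of them. Merged nodes inherit each other's papers and coauthors, which is how the inflated productivity and connectivity reported in the abstract can arise.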

위키피디아 기반의 효과적인 개체 링킹을 위한 NIL 개체 인식과 개체 연결 중의성 해소 방법 (A Method to Solve the Entity Linking Ambiguity and NIL Entity Recognition for efficient Entity Linking based on Wikipedia)

  • 이호경;안재현;윤정민;배경만;고영중
    • 정보과학회 논문지 / Vol. 44, No. 8 / pp.813-821 / 2017
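  • Entity linking is the task of determining meaning by linking entity mentions, the expressions that denote entities in an input query, to entities in a knowledge base. Research on entity linking faces four problems: knowledge base construction, multiple representations, entity linking ambiguity, and NIL entity recognition. In this study, we build an entity name dictionary based on Wikipedia to address the knowledge base construction and multiple representation problems. We also propose methods for recognizing NIL entities and resolving entity linking ambiguity by training an SVM (support vector machine) on the following features: context similarity, semantic relatedness, clue word score, named-entity type similarity of the entity mention, entity name matching score, and entity popularity score. Applying the two proposed methods sequentially on the constructed knowledge base yielded good entity linking performance: an F1 score of 83.66% for NIL entity recognition and 90.81% for disambiguation.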

Implementation of persistent identification of topological entities based on macro-parametrics approach

  • Farjana, Shahjadi Hisan;Han, Soonhung;Mun, Duhwan
    • Journal of Computational Design and Engineering / Vol. 3, No. 2 / pp.161-177 / 2016
  • In history-based parametric CAD modeling systems, persistent identification of topological entities after design modification is mandatory to preserve design intent, which is done by recording the model creation and modification history. Persistent identification of geometric and topological entities is necessary in the product design phase as well as in the re-evaluation stage. For identification, entities must first be named according to a methodology that applies to all entities unconditionally. After successive feature operations on a part body, a topology-based persistent identification mechanism gives rise to an ambiguity problem that usually stems from topology splitting and topology merging. Solving the ambiguity problem requires a complex method that combines topology and geometry: topology is used to assign the basic name to the entities, and geometry is used to resolve ambiguity between entities. The macro-parametrics approach of the iCAD lab at KAIST applies a topology-based persistent identification mechanism that solves the ambiguity problem arising from both topology splitting and topology merging. Here, a method is proposed in which no geometry comparison is necessary for topology merging. The present research focuses on enhancing the persistent identification schema to handle the ambiguity problem, especially the topology splitting and topology merging problems. It also addresses the basic naming of pattern features.

Prosody and comprehension of ambiguous dative NPs in Korean

  • Kang, Soyoung
    • 말소리와 음성과학 / Vol. 6, No. 2 / pp.153-161 / 2014
  • The current study reports the results from a cross-modal naming experiment investigating the effects of prosodic boundary location on the comprehension of ambiguous dative NPs in Korean (Yeongmi-ka Ceonghi-eykey norae-rul pwulecwu-n pwuin-ul ...). The dative NP, Ceonghi-eykey, can temporarily be attached to the embedded rel-marked verb, pwulecwu-n ('sing-rel'), or to the matrix verb that appears later. Participants heard sentence fragments manipulated for the location of an Intonation Phrase boundary (the biggest prosodic boundary in the model of Seoul Korean) and, right after that, had to name visually presented naming targets, which resolve the ambiguity of the dative NPs. The prosodic manipulation did not result in a difference in naming time, suggesting that the location of a prosodic boundary failed to influence the way Korean listeners interpreted ambiguous dative NPs. Possible reasons for the null effect are discussed.

대한민국약전에 수재된 식물성 한약재의 학명에 대한 재고 (Reconsideration about Nomenclature of Herbs Listed in the Korean Pharmacopoeia)

  • 도의정;이금산
    • 대한본초학회지 / Vol. 28, No. 3 / pp.61-68 / 2013
  • Objectives: A precise and simple system of nomenclature is required to avoid error, ambiguity, or confusion. Although medicinal plants must be produced and distributed according to the origin described in a pharmacopoeia, including the scientific name, the Korean Pharmacopoeia tenth edition (KP 10) contained many names that violate the nomenclature rules. This study therefore aimed to find the correct scientific names for 241 plants in KP 10. Methods: Authoritative databases (The Plant List, International Plant Name Index, YList, Tropicos, eFloras, World Checklist of Selected Plant Families, The Global Compositae Checklist, The International Legume Database and Information Service, and others), previous studies, and floras were cross-checked. Results: The list was arranged into four cases: errors including illegitimate names, nomenclatural synonyms, recommended names, and decision-reserved names. Review of the scientific names produced nine correct names for ten misspelled or illegitimate names, and thirty-six correct names for forty-one nomenclatural synonyms. These results should be reflected in the next edition after KP 10. Separately, ten recommended names were suggested for taxonomic synonyms that had been used indiscriminately owing to diverse taxonomic opinions. In addition, decision-reserved names were suggested for thirteen species whose status remained uncertain. Authorship also needs further study, because KP 10 did not follow the recommendations for author citations. Conclusions: Correcting the scientific names of medicinal plants that violate the International Code of Nomenclature would help improve the accuracy of the Pharmacopoeia as a reference standard.

Korean Names

  • Kim, Chin-W.
    • 인문언어 / Vol. 7 / pp.11-30 / 2005
  • Historical origins of both personal names and place names in Korea are reviewed. It is shown that names of native origin have been largely replaced by Sino-Korean names. Some statistics are given on the basis of the 2000 census data in South Korea. A unique method of personal naming that uses a generation marker called hangnyol is reviewed. This enables a person to figure out his or her own position and that of others in the family tree, up to as many as ten generations, without consulting the book of genealogy. While this practice had a role to play in a vertically structured society where seniority is important, it is less practiced as society becomes more egalitarian, so that native names, not writable in Chinese characters, are on the rise. In this global age, a person is not just a member of his or her family or clan but also a member of the international community. The author proposes several things that should be considered in naming to fit the modern global age: euphony of names, ambiguity, possible bad connotations when Romanized, unintended homophones with comic meanings, etc.

Automated Classification of PubMed Texts for Disambiguated Annotation Using Text and Data Mining

  • Choi, Yun-Jeong;Park, Seung-Soo
    • 한국생물정보학회:학술대회논문집 / 한국생물정보시스템생물학회 2005년도 BIOINFO 2005 / pp.101-106 / 2005
  • Recently, as the volume of genetic knowledge has grown rapidly, automated analysis and systemization into high-throughput databases have become a hot issue. One essential task is to recognize and identify genomic entities and discover their relations. However, ambiguity of named entities is a serious problem because of their multiplicity of meanings and types. So far, many effective techniques have been proposed to analyze documents, yet their accuracy is high only when the data fits the model well. The purpose of this paper is to design and implement a document classification system for identifying entity problems using a combination of text and data mining, supplemented by rich data mining algorithms to enhance its performance. We propose the RTPost system, which differs in style from traditional methods and takes a fault-tolerant system approach and a data mining strategy. This feedback cycle can enhance the accuracy of the text mining. We tested our system by classifying RB-related documents in PubMed abstracts to verify its feasibility.

국내 과학기술콘텐츠 저자의 소속기관명 식별을 위한 소속기관명 자동 식별 알고리즘에 관한 연구 (A Study on the Identification Algorithm for Organization's Name of Author of Korean Science & Technology Contents)

  • 김진영;이석형;서동준;김광영;윤정선
    • 디지털콘텐츠학회 논문지 / Vol. 18, No. 2 / pp.373-382 / 2017
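  • As science and technology content grows, services that support efficient search of this content are in demand. Using an author's affiliation name as a keyword not only makes it possible to find the content produced by one institution, but can also improve the identification rate of search results that use authors or terms as keywords. Because the polysemy and ambiguity of the data used as search keywords can introduce false negatives and false positives into search results, control through data identification is important. Control through identification of authors' affiliation names is also very important, since it enables searching by an institution's alternative names and abbreviations, yet existing research on control through data identification has focused mainly on authors and terms. This study proposes an affiliation name identification algorithm and presents experimental results using data on domestic science and technology content held by the Korea Institute of Science and Technology Information.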

X.509 인증서에 포함된 프라이버시 보호기능을 가진 개인 식별 방법 (Privacy-Enhanced Subject Identification method Embedded in X.509 Certificate)

  • 이재일;박종욱;김승주;송주석
    • 정보보호학회논문지 / Vol. 16, No. 5 / pp.59-66 / 2006
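  • A CA issues X.509 public key certificates to bind a subject to a public key. The subject is identified by one or more subject names in the certificate's 'subject' field or 'subjectAltName' extension field. In the real world, however, there are individuals with identical or similar names. This ambiguity can be resolved by including, in every certificate issued to the same subject, a 'permanent identifier' that is guaranteed to be unique across multiple CAs. In many cases, however, an individual's unique identifier is sensitive personal information and must not simply be placed in the subject field, because it could be misused if leaked. This paper presents a more secure and accurate user authentication scheme that keeps the user identifier confidential by means of PEPSI, which is carried in a standard extension field of the X.509 certificate.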