• Title/Summary/Keyword: Entity

Entity embedding based on RELIC for Entity linking of Korean (RELIC기반 엔터티 임베딩을 이용한 한국어 엔터티 링킹)

  • Choi, Heyon-Jun;Na, Seung-Hoon;Kim, Hyun-Ho;Kim, Seon-Hoon;Kang, Inho
    • Annual Conference on Human and Language Technology
    • /
    • 2020.10a
    • /
    • pp.128-131
    • /
    • 2020
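  • Entity linking is the task of determining which entity should be linked to each span of a given document that mentions an entity. Obtaining good entity representations therefore has a large influence on entity linking performance. In this paper, we obtain entity embeddings through RELIC and apply them to entity linking, achieving a performance improvement of 0.57%p.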

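
As a rough illustration of how entity embeddings can be used for linking, the sketch below ranks candidate entities by cosine similarity between a mention-context vector and each entity's embedding. The vectors, the entity names, and the helper functions `cosine` and `link_mention` are hypothetical; the abstract does not describe the paper's actual scoring function or model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def link_mention(context_vec, entity_table):
    """Return the candidate entity whose embedding is closest to the context."""
    return max(entity_table, key=lambda name: cosine(context_vec, entity_table[name]))

# Toy, made-up embeddings (assumption: real ones would come from a trained model).
ENTITY_TABLE = {
    "Seoul_(city)": [0.9, 0.1, 0.0],
    "Seoul_(song)": [0.1, 0.2, 0.9],
}
best = link_mention([0.8, 0.3, 0.1], ENTITY_TABLE)
print(best)
```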

Change Acceptable In-Depth Searching in LOD Cloud for Efficient Knowledge Expansion (효과적인 지식확장을 위한 LOD 클라우드에서의 변화수용적 심층검색)

  • Kim, Kwangmin;Sohn, Yonglak
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.2
    • /
    • pp.171-193
    • /
    • 2018
  • The LOD (Linked Open Data) cloud is a practical implementation of the semantic web. We propose a new method that makes identity links easy to provide in the LOD cloud and lets changes in an LOD be reflected in search results without omission. An LOD publishes detailed descriptions of entities as RDF triples, each composed of a subject, a predicate, and an object. Links in the LOD cloud, called identity links, are realized by asserting that entities of different RDF triples are identical. Currently, an identity link is provided by explicitly creating a link triple whose subject and object are the source and target entities, and link triples are appended to the LOD. With identity links, knowledge obtained from one LOD can be expanded with knowledge from other LODs; providing this opportunity for knowledge expansion is the goal of the LOD cloud. Appending link triples, however, makes it seriously difficult to discover identity links between entities one by one, given the enormous scale of LOD. Newly added entities cannot be reflected in search results until identity links pointing to them are serialized and published to the LOD cloud. Instead of creating an enormous number of identity links, we propose that each LOD prepare its own link policy. The link policy specifies a set of target LODs to link to and the constraints needed to discover identity links to entities in those target LODs. At search time, referencing the link policies makes it possible to reach newly added entities and reflect them in search results without omission. A link policy specifies a set of predicate pairs for discovering identity between associated entities in the source and target LODs. For specifying link policies, we propose a vocabulary that conforms to RDFS and OWL.
Identity between entities is evaluated by the similarity of the source and target entities' objects that are associated with the predicate pair in the link policy. We implemented a system called the Change Acceptable In-Depth Searching System (CAIDS). With CAIDS, a user's search request starts from the depth_0 LOD, i.e., surface searching. Referencing the link policies of the LODs, CAIDS proceeds to in-depth searching of the LODs at successive depths. To supplement the identity links derived from link policies, CAIDS also uses explicit link triples, and its in-depth searching progresses by following the identity links. The content of an entity obtained from the depth_0 LOD expands with the contents of entities in other LODs that have been found identical to the depth_0 LOD entity. Expanding the content of a depth_0 LOD entity without the user having to be aware of those other LODs is what implements knowledge expansion, which is the goal of the LOD cloud. The more identity links the LOD cloud contains, the wider the content expansion. We have thus proposed a new way to create identity links abundantly and supply them to the LOD cloud. Experiments with CAIDS were performed on the DBpedia LODs of Korea, France, Italy, Spain, and Portugal. They show that CAIDS provides an appropriate expansion ratio and inclusion ratio as long as the degree of similarity between source and target objects is 0.8 to 0.9. The expansion ratio, for each depth, is the ratio of the entities discovered at that depth to the entities of the depth_0 LOD. The inclusion ratio, for each depth, is the ratio of the entities discovered only with explicit links to the entities discovered only with link policies. With similarity degrees below 0.8, expansion becomes excessive and the contents become distorted. A similarity degree of 0.8 to 0.9 also yields an appropriate number of searched RDF triples. The experiments also evaluated the confidence degree of contents expanded through in-depth searching.
The confidence degree of content is directly coupled with the identity ratio of an entity, which is the degree to which it is identical to the entity of the depth_0 LOD. The identity ratio of an entity is obtained by multiplying the source LOD's confidence by the source entity's identity ratio. By tracing the identity links in advance, an LOD's confidence is evaluated according to the number of identity links incoming to the entities in that LOD. In evaluating the identity ratio, we also considered identity agreement, which means that multiple identity links point to a common entity. With identity agreement, the experimental results show that the identity ratio decreases as depth increases but rebounds at greater depths. For each entity, as the number of identity links increases, the identity ratio rebounds earlier and eventually reaches 1. We found that more than 8 identity links per entity would lead users to have confidence in the expanded contents. The proposed link-policy-based in-depth searching method is expected to contribute to the abundant provision of identity links to the LOD cloud.
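
The two mechanisms described in this abstract, discovering identity links from predicate pairs under a similarity threshold and propagating the identity ratio by multiplication, can be sketched as follows. This is a minimal illustration and not the CAIDS implementation: the string similarity measure (`difflib.SequenceMatcher`), the dictionary-based data layout, the entity identifiers, and the confidence values are all assumptions, and identity agreement is not modeled.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """String similarity in [0, 1]; the measure itself is an assumption."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def discover_identity_links(source, target, predicate_pairs, threshold=0.8):
    """Compare objects reached through each (source predicate, target predicate)
    pair and report entity pairs whose similarity meets the threshold.
    source/target map entity id -> {predicate: object string}."""
    links = []
    for s_id, s_props in source.items():
        for t_id, t_props in target.items():
            for s_pred, t_pred in predicate_pairs:
                if s_pred in s_props and t_pred in t_props:
                    score = similarity(s_props[s_pred], t_props[t_pred])
                    if score >= threshold:
                        links.append((s_id, t_id, score))
                        break  # one matching predicate pair is enough here
    return links

def identity_ratio(source_lod_confidence, source_identity_ratio):
    """Identity ratio = source LOD's confidence x source entity's identity ratio."""
    return source_lod_confidence * source_identity_ratio

# Hypothetical toy data.
source = {"ko:Seoul": {"rdfs:label": "Seoul"}}
target = {"fr:Seoul": {"foaf:name": "Seoul"}, "fr:Paris": {"foaf:name": "Paris"}}
links = discover_identity_links(source, target, [("rdfs:label", "foaf:name")])

# Assume the depth_0 entity has identity ratio 1.0 and made-up LOD confidences.
depth1 = identity_ratio(0.9, 1.0)
depth2 = identity_ratio(0.8, depth1)
print(links, depth1, depth2)
```

Because each step multiplies by a confidence of at most 1, the ratio in this simplified form can only shrink with depth; the rebound reported in the abstract comes from identity agreement, which the sketch leaves out.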

Expansion of Word Representation for Named Entity Recognition Based on Bidirectional LSTM CRFs (Bidirectional LSTM CRF 기반의 개체명 인식을 위한 단어 표상의 확장)

  • Yu, Hongyeon;Ko, Youngjoong
    • Journal of KIISE
    • /
    • v.44 no.3
    • /
    • pp.306-313
    • /
    • 2017
  • Named entity recognition (NER) seeks to locate named entities in text and classify them into pre-defined categories such as names of persons, organizations, and locations, and expressions of time. Recently, many state-of-the-art NER systems have been implemented with bidirectional LSTM CRFs. Deep learning models based on long short-term memory (LSTM) generally depend on word representations as input. In this paper, we propose an approach that expands the word representation by using pre-trained word embeddings, part-of-speech (POS) tag embeddings, syllable embeddings, and named entity dictionary feature vectors. Our experiments show that the proposed approach creates useful word representations as input to bidirectional LSTM CRFs. The final model performs 8.05%p higher than a baseline NER that uses only the pre-trained word embedding vector.
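
One plausible reading of "expanding" the word representation is to concatenate the four feature vectors into a single input vector per token. The sketch below assumes simple concatenation and uses made-up dimensions and values; the abstract does not state how the paper actually combines the features.

```python
def expand_word_representation(word_vec, pos_vec, syllable_vec, dict_vec):
    """Concatenate per-token feature vectors into one input vector.
    Concatenation is an assumption, not the paper's confirmed method."""
    return list(word_vec) + list(pos_vec) + list(syllable_vec) + list(dict_vec)

# Hypothetical toy vectors for a single token.
word_vec = [0.12, -0.40, 0.33, 0.05]   # pre-trained word embedding
pos_vec = [0.0, 1.0]                    # POS tag embedding
syllable_vec = [0.7, 0.1, -0.2]         # syllable embedding
dict_vec = [1.0, 0.0, 0.0]              # named entity dictionary feature

token_input = expand_word_representation(word_vec, pos_vec, syllable_vec, dict_vec)
print(len(token_input))
```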

A Study on the Relation Between Information Model and Usability of Website (웹사이트의 정보 모델과 사용성의 관계)

  • 이지수
    • Archives of design research
    • /
    • v.13 no.4
    • /
    • pp.67-76
    • /
    • 2000
  • Websites support various user activities across a wide range of content domains, so they require a different approach to identifying their principal design problems. Taking a media perspective on websites, this paper identifies the relationship between designer, user, and website and discusses the design factors of usability, aiming at a basic framework for interface design. From the media perspective, a website is an information entity that mediates between user and designer. The information entity is composed of various design factors relating to the user, the designer, the website, and others. The intent is that the user and the information entity accommodate each other and share a common conceptual model. This requires achieving usability objectives such as effectiveness, efficiency, and satisfaction, based on an understanding of user goals and of users' cognitive and affective characteristics. From the usability standpoint, we examine design factors and features appropriate to users' cognitive and affective functions according to an information entity model that comprises content, organization, and representation levels.

A Study on Collecting and Structuring Language Resource for Named Entity Recognition and Relation Extraction from Biomedical Abstracts (생의학 분야 학술 논문에서의 개체명 인식 및 관계 추출을 위한 언어 자원 수집 및 통합적 구조화 방안 연구)

  • Kang, Seul-Ki;Choi, Yun-Soo;Choi, Sung-Pil
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.51 no.4
    • /
    • pp.227-248
    • /
    • 2017
  • This paper introduces an integrated model for systematically constructing a linguistic resource database that can be used by machine learning-based biomedical information extraction systems. The proposed method suggests an orderly process of collecting and constructing dictionaries and training sets for both named-entity recognition and relation extraction. Multiple heterogeneous structures for the resources which are collected from diverse sources are analyzed to derive essential items and fields for constructing the integrated database. All the collected resources are converted and refined to build an integrated linguistic resource storage. In this paper, we constructed entity dictionaries of gene, protein, disease and drug, which are considered core linguistic elements or core named entities in the biomedical domains and conducted verification tests to measure their acceptability.

Feature Generation of Dictionary for Named-Entity Recognition based on Machine Learning (기계학습 기반 개체명 인식을 위한 사전 자질 생성)

  • Kim, Jae-Hoon;Kim, Hyung-Chul;Choi, Yun-Soo
    • Journal of Information Management
    • /
    • v.41 no.2
    • /
    • pp.31-46
    • /
    • 2010
  • Named-entity recognition (NER), as a part of information extraction, is now used in information retrieval as well as in question-answering systems. Unlike ordinary words, named entities (NEs) are continually created and changed in documents on the Web, in newspapers, and so on. This NE generation causes an unknown-word problem and makes many NER-based applications difficult to build. To alleviate this problem, this paper proposes a new feature generation method for machine learning-based NER. In general, features in machine learning-based NER are tied to words, whereas entries in named-entity dictionaries are phrases, so the entries cannot be used directly as features of NER systems. This paper proposes an encoding scheme, as a feature generation method, that converts phrase entities into word-unit features. Furthermore, with this scheme, entities carrying semantic information in WordNet can also be converted into features of NER systems. Our experiments show that the F1 score increases by about 6% and errors are reduced by about 38%.
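
To make the idea of converting phrase-level dictionary entries into word-unit features concrete, the sketch below tags each token with a position-plus-type label using greedy longest match. The B-/I-/O label scheme, the `dictionary_features` helper, and the toy dictionary are assumptions for illustration; the abstract does not specify the paper's actual encoding.

```python
def dictionary_features(tokens, dictionary, max_len=5):
    """Map phrase entries to per-token features.
    dictionary maps a tuple of tokens to an entity type.
    Tokens inside a matched phrase get B-<type> or I-<type>; others get O."""
    feats = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate phrase first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            etype = dictionary.get(tuple(tokens[i:i + n]))
            if etype:
                feats[i] = "B-" + etype
                for j in range(i + 1, i + n):
                    feats[j] = "I-" + etype
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return feats

# Hypothetical dictionary and sentence.
NE_DICT = {("Seoul", "National", "University"): "ORG", ("Seoul",): "LOC"}
tokens = ["He", "joined", "Seoul", "National", "University", "yesterday"]
features = dictionary_features(tokens, NE_DICT)
print(features)
```

The resulting per-token labels can be fed to a word-level learner alongside its other features, which is the role the abstract assigns to the encoding scheme.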

Implementation of Non-SQL Data Server Framework Applying Web Tier Object Modeling (웹티어 오브젝트 모델링을 통한 non-SQL 데이터 서버 프레임웍 구현)

  • Kwon Ki-Hyeon;Cheon Sang-Ho;Choi Hyung-Jin
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.31 no.4B
    • /
    • pp.285-290
    • /
    • 2006
  • Many aspects must be taken into account when developing a distributed architecture based on a multi-tier model or an enterprise architecture. Among them, separating the roles of page designer and page developer and defining the entities used for database connection and transaction processing are especially important. In this paper, we present the DONSL (Data Server of Non SQL query) architecture, which addresses these problems by applying web tier object modeling. The architecture simplifies the coupling between tiers and removes the DAO (Data Access Object) and entity from the programming logic. We concentrate on three parts: how to develop the DAO without being affected by entity modification, an automatic transaction processing technique that includes SQL generation, and how to use AET/MET (Automated/Manual Executed Transaction) effectively.

Automatic Construction of Class Hierarchies and Named Entity Dictionaries using Korean Wikipedia (한국어 위키피디아를 이용한 분류체계 생성과 개체명 사전 자동 구축)

  • Bae, Sang-Joon;Ko, Young-Joong
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.16 no.4
    • /
    • pp.492-496
    • /
    • 2010
  • Wikipedia, an open encyclopedia, contains an immense amount of human knowledge written by thousands of volunteer editors, and its reliability is high. In this paper, we propose automatically constructing a Korean named entity dictionary using several features of Wikipedia. First, we generate class hierarchies from the class information of each Wikipedia article. Second, the title of each article is mapped to our class hierarchies, and we calculate the entropy value of the root node of each class hierarchy. Finally, we construct a high-performance named entity dictionary by removing the class hierarchies whose entropy value is higher than a threshold. Our experiments achieved an overall F1-measure of 81.12% (precision: 83.94%, recall: 78.48%).
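
The entropy-based filtering step can be sketched as below. The sketch assumes the entropy at a hierarchy's root is the Shannon entropy of the named-entity types of the titles mapped under it; the abstract does not say which distribution is used, and the hierarchy names, labels, and threshold are made up. The last lines check that the reported F1 is the harmonic mean of the reported precision and recall.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def filter_hierarchies(hierarchies, threshold):
    """Keep hierarchies whose root entropy does not exceed the threshold."""
    return {root: labels for root, labels in hierarchies.items()
            if entropy(labels) <= threshold}

# Hypothetical hierarchies: root name -> entity types of the titles mapped to it.
hierarchies = {
    "Cities": ["LOC"] * 9 + ["ORG"],
    "Misc": ["LOC", "ORG", "PER", "LOC", "ORG", "PER"],
}
kept = filter_hierarchies(hierarchies, threshold=1.0)

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

reported_f1 = round(f1(83.94, 78.48), 2)
print(list(kept), reported_f1)
```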

A Quantitative Trust Model with consideration of Subjective Preference (주관적 선호도를 고려한 정량적 신뢰모델)

  • Kim, Hak-Joon;Lee, Sun-A;Lee, Kyung-Mi;Lee, Keon-Myung
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.16 no.1
    • /
    • pp.61-65
    • /
    • 2006
  • This paper is concerned with a quantitative computational trust model that takes into account multiple evaluation criteria and uses recommendations from others to obtain the trust value for entities. In the proposed trust model, the trust in an entity is defined as the expectation that the entity will yield satisfactory outcomes in the given situation. Once an interaction has been made with an entity, outcomes are assumed to be observed with respect to the evaluation criteria. When trust information is needed, the satisfaction degree, which is the probability of generating satisfactory outcomes for each evaluation criterion, is computed from the outcome probability distributions and the entity's preference degrees on the outcomes. The satisfaction degrees for the evaluation criteria are then aggregated into a trust value, and the reputation information is also incorporated into the trust value at that point. This paper presents in detail how the trust model works.
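
A small numeric sketch may help make the computation concrete. It assumes that the satisfaction degree for a criterion is the preference-weighted sum of outcome probabilities, that criteria are aggregated by a weighted average, and that reputation is blended in linearly with a factor `alpha`. All three choices, along with the criteria, probabilities, and weights, are illustrative assumptions; the abstract does not give the paper's formulas.

```python
def satisfaction_degree(outcome_probs, preferences):
    """Expected preference over outcomes for one evaluation criterion.
    outcome_probs: outcome -> probability; preferences: outcome -> degree in [0, 1]."""
    return sum(p * preferences[o] for o, p in outcome_probs.items())

def trust_value(satisfactions, weights, reputation=None, alpha=0.7):
    """Weighted average of per-criterion satisfaction, optionally blended
    with a reputation score (alpha is a hypothetical mixing factor)."""
    total = sum(weights.values())
    direct = sum(weights[c] * s for c, s in satisfactions.items()) / total
    if reputation is None:
        return direct
    return alpha * direct + (1 - alpha) * reputation

# Made-up observations for two criteria.
sat = {
    "speed": satisfaction_degree({"fast": 0.6, "slow": 0.4}, {"fast": 1.0, "slow": 0.2}),
    "quality": satisfaction_degree({"good": 0.5, "bad": 0.5}, {"good": 1.0, "bad": 0.0}),
}
weights = {"speed": 1.0, "quality": 1.0}
direct_trust = trust_value(sat, weights)
blended_trust = trust_value(sat, weights, reputation=0.8)
print(sat, direct_trust, blended_trust)
```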

A Generation from Entity-Relationship Model to XML Schema Model (개체-관계 모델에선 XML Schema의 생성)

  • Kim, Chang-Suk;Kim, Dae-Su;Son, Dong-Cheul
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.14 no.6
    • /
    • pp.667-673
    • /
    • 2004
  • XML is emerging as the standard language for data exchange on the Web, so demand is increasing for XML Schema (the W3C XML Schema specification), which validates XML documents. However, despite its varied data types and rich expressiveness, XML Schema is hard to design with because of its complexity. This paper shows a simple way to design XML Schema using a fundamental tool of database design, the Entity-Relationship model. The Entity-Relationship model cannot be converted to XML Schema directly because of mismatches between the two models, so we present algorithms that generate XML Schema from the Entity-Relationship model. The algorithms produce XML Schema code using a hierarchical view representation. An important objective of this automatic generation is to preserve XML Schema's characteristics, such as reusability, global and local ability, ability of expansion, and various type changes.
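
As a simplified illustration of generating XML Schema from an Entity-Relationship description, the sketch below turns each entity into a named (global) complex type plus a global element of that type. It is not the paper's algorithm: relationships and the hierarchical view representation are omitted, and the input layout, the `entity_to_xsd` helper, and the `Student` entity are hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)

def entity_to_xsd(entities):
    """entities maps entity name -> {attribute name: XSD built-in type name}.
    Emits one reusable named complexType and one global element per entity."""
    schema = ET.Element(f"{{{XS}}}schema")
    for name, attrs in entities.items():
        ctype = ET.SubElement(schema, f"{{{XS}}}complexType", name=name + "Type")
        seq = ET.SubElement(ctype, f"{{{XS}}}sequence")
        for attr, xs_type in attrs.items():
            ET.SubElement(seq, f"{{{XS}}}element", name=attr, type="xs:" + xs_type)
        ET.SubElement(schema, f"{{{XS}}}element", name=name, type=name + "Type")
    return ET.tostring(schema, encoding="unicode")

# Hypothetical ER entity with two attributes.
xsd = entity_to_xsd({"Student": {"id": "integer", "name": "string"}})
print(xsd)
```

Declaring the complex type globally, rather than inline inside the element, is one way to keep the generated type reusable by other declarations.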