• Title/Summary/Keyword: Entity-based

Expansion of Word Representation for Named Entity Recognition Based on Bidirectional LSTM CRFs (Bidirectional LSTM CRF 기반의 개체명 인식을 위한 단어 표상의 확장)

  • Yu, Hongyeon;Ko, Youngjoong
    • Journal of KIISE / v.44 no.3 / pp.306-313 / 2017
  • Named entity recognition (NER) seeks to locate named entities in text and classify them into predefined categories such as names of persons, organizations, and locations, and expressions of time. Recently, many state-of-the-art NER systems have been implemented with bidirectional LSTM CRFs. Deep learning models based on long short-term memory (LSTM) generally take word representations as input. In this paper, we propose an approach that expands the word representation using pre-trained word embeddings, part-of-speech (POS) tag embeddings, syllable embeddings, and named entity dictionary feature vectors. Our experiments show that the proposed approach produces useful word representations as input to bidirectional LSTM CRFs. Our final representation performs 8.05 percentage points better than a baseline NER system that uses only the pre-trained word embedding vector.
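
The expanded input described above is built from four per-word vectors. A minimal sketch follows, assuming the four parts are simply concatenated and that syllable embeddings are pooled by averaging (the abstract does not say how syllables are combined). The function name, lookup tables, and toy values are hypothetical.

```python
from typing import Dict, List

Vector = List[float]

def average(vectors: List[Vector], dim: int) -> Vector:
    """Element-wise mean of a list of vectors; a zero vector if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def expand_word_representation(
    word: str,
    pos_tag: str,
    word_emb: Dict[str, Vector],
    pos_emb: Dict[str, Vector],
    syllable_emb: Dict[str, Vector],
    ne_dictionary: Dict[str, List[str]],
    ne_types: List[str],
) -> Vector:
    word_dim = len(next(iter(word_emb.values())))
    syl_dim = len(next(iter(syllable_emb.values())))
    # 1. Pre-trained word embedding (zero vector for out-of-vocabulary words).
    w = word_emb.get(word, [0.0] * word_dim)
    # 2. POS tag embedding.
    p = pos_emb[pos_tag]
    # 3. Syllable embedding: each Hangul character is one syllable block;
    #    averaging is an assumed pooling choice.
    s = average([syllable_emb[c] for c in word if c in syllable_emb], syl_dim)
    # 4. Named entity dictionary feature: one binary flag per entity type.
    types = ne_dictionary.get(word, [])
    d = [1.0 if t in types else 0.0 for t in ne_types]
    # The expanded representation is the concatenation of all four parts.
    return w + p + s + d

# Toy usage with hypothetical tables.
word_emb = {"서울": [0.1, 0.2, 0.3]}
pos_emb = {"NNP": [1.0, 0.0]}
syllable_emb = {"서": [0.25, 0.5], "울": [0.75, 0.0]}
ne_dict = {"서울": ["LOC"]}
vec = expand_word_representation(
    "서울", "NNP", word_emb, pos_emb, syllable_emb, ne_dict, ["PER", "LOC", "ORG"]
)
```

Each word's vector would then be fed, in sequence, to the bidirectional LSTM CRF.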

Performance Comparison Analysis on Named Entity Recognition system with Bi-LSTM based Multi-task Learning (다중작업학습 기법을 적용한 Bi-LSTM 개체명 인식 시스템 성능 비교 분석)

  • Kim, GyeongMin;Han, Seunggnyu;Oh, Dongsuk;Lim, HeuiSeok
    • Journal of Digital Convergence / v.17 no.12 / pp.243-248 / 2019
  • Multi-task learning (MTL) is a training method in which a single neural network is trained on multiple tasks that influence each other. In this paper, we compare the performance of an MTL named entity recognition (NER) model trained on a Korean traditional culture corpus with that of other NER models. During training, the task-specific Bi-LSTM layers for part-of-speech (POS) tagging and NER both receive their input from a shared Bi-LSTM layer, and the two task losses are combined into a joint loss. As a result, the MTL-based Bi-LSTM model achieves a 1.1% to 4.6% performance improvement over single-task Bi-LSTM models.
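
The joint loss mentioned above can be illustrated without the network itself. The sketch below assumes that each task head outputs per-token logits, that each task uses mean token-level cross-entropy, and that the joint loss is a weighted sum; the weights and function names are assumptions, not taken from the paper. Minimizing this sum sends gradients from both heads into the shared layer.

```python
import math
from typing import List, Sequence

def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sequence_cross_entropy(
    logits_per_token: Sequence[Sequence[float]], gold_labels: Sequence[int]
) -> float:
    """Mean negative log-likelihood of the gold label at each token."""
    losses = [
        -math.log(softmax(logits)[gold])
        for logits, gold in zip(logits_per_token, gold_labels)
    ]
    return sum(losses) / len(losses)

def joint_loss(
    pos_logits, pos_gold, ner_logits, ner_gold,
    pos_weight: float = 1.0, ner_weight: float = 1.0,
) -> float:
    """Weighted sum of the POS-tagging and NER losses (assumed combination)."""
    return (pos_weight * sequence_cross_entropy(pos_logits, pos_gold)
            + ner_weight * sequence_cross_entropy(ner_logits, ner_gold))
```

With uniform logits over 2 POS tags and 3 NER tags, the joint loss for one token is ln 2 + ln 3.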

An Active Co-Training Algorithm for Biomedical Named-Entity Recognition

  • Munkhdalai, Tsendsuren;Li, Meijing;Yun, Unil;Namsrai, Oyun-Erdene;Ryu, Keun Ho
    • Journal of Information Processing Systems / v.8 no.4 / pp.575-588 / 2012
  • Exploiting unlabeled text data together with a relatively small labeled corpus has become an active and challenging research topic in text mining, owing to the recent growth in the volume of biomedical literature. Biomedical named-entity recognition is an essential prerequisite for effective text mining of biomedical literature. This paper proposes an Active Co-Training (ACT) algorithm for biomedical named-entity recognition. ACT is a semi-supervised learning method in which two classifiers, based on two different feature sets, iteratively learn from informative examples queried from the unlabeled data. We design a new classification problem to measure the informativeness of an example in the unlabeled data. In this classification problem, examples are classified as informative or non-informative to both classifiers based on a joint view of the feature sets. To form the training data for this classification problem, we adopt a query-by-committee method: in ACT, the two classifiers together are treated as one committee, which is applied to the labeled data to assign an informativeness label to each example. ACT outperforms the traditional co-training algorithm both in F-measure and in the number of training iterations needed to build a good classification model. The proposed method tends to exploit a large amount of unlabeled data efficiently by selecting a small number of examples that carry not only useful information but also a comprehensive pattern.
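
The committee step described above can be sketched as follows. The labeling rule is an assumption: an example counts as informative unless both committee members predict its gold label. `select_queries` stands in for querying the unlabeled pool with a trained informativeness classifier, represented here by a hypothetical scoring function.

```python
from typing import Callable, Hashable, List, Sequence

Classifier = Callable[[object], Hashable]

def committee_informativeness_labels(
    examples: Sequence[object],
    gold: Sequence[Hashable],
    clf_a: Classifier,
    clf_b: Classifier,
) -> List[int]:
    """Label each labeled example 1 (informative) or 0 (non-informative).

    Assumed rule: the two co-training classifiers form one committee, and an
    example is non-informative only if both members classify it correctly.
    """
    labels = []
    for x, y in zip(examples, gold):
        both_correct = clf_a(x) == y and clf_b(x) == y
        labels.append(0 if both_correct else 1)
    return labels

def select_queries(
    unlabeled: Sequence[object],
    informativeness_score: Callable[[object], float],
    k: int,
) -> List[object]:
    """Pick the k unlabeled examples judged most informative."""
    return sorted(unlabeled, key=informativeness_score, reverse=True)[:k]

# Toy usage with stand-in classifiers.
clf_a = lambda x: "A" if x % 2 else "B"
clf_b = lambda x: "A" if x < 3 else "B"
labels = committee_informativeness_labels([1, 2, 3, 4], ["A", "B", "A", "B"], clf_a, clf_b)
```

The resulting labels would train the informativeness classifier, whose scores then drive `select_queries` in each co-training round.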

Combat Entity Based Modeling Methodology to Enable Joint Analysis of Performance/Engagement Effectiveness - Part 2 : Detailed Model Design & Model Implementation (성능/교전 효과도의 상호 분석이 가능한 전투 개체 기반의 모델링 방법론 - 제2부 : 상세 모델 설계 및 모델 구현)

  • Seo, Kyung-Min;Choi, Changbeom;Kim, Tag Gon
    • Journal of the Korea Institute of Military Science and Technology / v.17 no.2 / pp.235-247 / 2014
  • Building on the two-dimensional model partition method proposed in Part 1, Part 2 provides a detailed model specification and implementation. To describe mathematically a model's behaviors and the interactions among models, we extend the DEVS (Discrete Event Systems Specification) formalism and propose CE-DEVS (Combat Entity-DEVS) for the upper-abstraction sub-model of a combat entity model. CE-DEVS additionally defines two sets and one function that explicitly capture the essential semantics of the model's behaviors. These definitions make the model's behaviors easier to understand and represent, since they remove the gap in meaning between real-world expressions and model specifications. For implementation, the upper-abstraction sub-models are implemented with DEVSim++, while the lower sub-models are implemented in C++. Using the overall modeling techniques proposed in Parts 1 and 2, we can conduct constructive simulation and assess combat logic factors as well as battlefield functions of a next-generation combat entity while minimizing additional modeling effort. An anti-torpedo warfare experiment yields interesting results on engagement situations involving weapons under development and their tactics. Finally, we expect this work to find immediate application in various engagement warfare scenarios.
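
CE-DEVS extends the classic DEVS atomic model ⟨X, Y, S, δext, δint, λ, ta⟩. The abstract does not give the two added sets and function, so the sketch below shows only plain DEVS: a hypothetical detector that, after receiving a contact report, waits a processing time and emits an alert. It is written in Python rather than DEVSim++, and the simulator handles a single atomic model, resolving a tie between an internal event and an input by running the internal transition first (an assumed convention).

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

INFINITY = math.inf

@dataclass
class DetectorState:
    phase: str                     # "idle" or "busy"
    sigma: float                   # time remaining until the next internal event
    pending: Optional[str] = None  # contact currently being processed

class Detector:
    """Hypothetical DEVS atomic model: input = contact report, output = alert."""

    def __init__(self, processing_time: float):
        self.processing_time = processing_time
        self.state = DetectorState("idle", INFINITY)

    def time_advance(self) -> float:  # ta(s)
        return self.state.sigma

    def delta_ext(self, elapsed: float, x: str) -> None:  # external transition
        if self.state.phase == "idle":
            self.state = DetectorState("busy", self.processing_time, x)
        else:
            # Busy: ignore the new report but account for the elapsed time.
            self.state.sigma -= elapsed

    def output(self) -> str:  # lambda(s), evaluated just before delta_int
        return f"alert:{self.state.pending}"

    def delta_int(self) -> None:  # internal transition
        self.state = DetectorState("idle", INFINITY)

def simulate(model: Detector, inputs: List[Tuple[float, str]], end_time: float):
    """Drive one atomic model with time-stamped external inputs."""
    outputs = []
    t_last = 0.0
    pending = sorted(inputs)
    while True:
        t_internal = t_last + model.time_advance()
        t_input = pending[0][0] if pending else INFINITY
        t = min(t_internal, t_input)
        if t == INFINITY or t > end_time:
            break
        if t_internal <= t_input:
            outputs.append((t, model.output()))
            model.delta_int()
        else:
            _, x = pending.pop(0)
            model.delta_ext(t - t_last, x)
        t_last = t
    return outputs

result = simulate(Detector(2.0), [(1.0, "sub"), (2.0, "fish"), (5.0, "mine")], 10.0)
```

Here the report at t = 2.0 arrives while the detector is busy and is ignored, so only two alerts are produced.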

An Integrative Way of Process Analysis for Better Total Quality Management: Focusing on Drug Entity (종합적 질 관리 (TQM)를 위한 프로세스 분석 방법 -의약품 실체를 중심으로-)

  • Kim, Myeng-Ki
    • Quality Improvement in Health Care / v.1 no.1 / pp.56-65 / 1994
  • Total quality management (TQM) has been a focus of attention in recent years, following dissatisfaction with the results of quality assurance programs implemented in the U.S. Many managerial methodologies and innovation guidelines from academic disciplines have been applied to promote TQM programs in the health field. This paper has two aims: first, to examine TQM's managerial philosophy by comparing it with the managerial concepts newly introduced in Business Reengineering (BR); and second, to introduce an integrative method of process analysis, Entity Life-Cycle Diagram (ELCD) modeling. The analysis method was compared with the Process Map, a well-known method for BR applications. To show the effectiveness of ELCD modeling, a case application was presented using 'drug' as the target entity. With TQM issues in mind, the result was reflected in the design of Entity Relation Diagrams. ELCD modeling proves helpful in designing databases related to quality monitoring, in that many monitoring checkpoints can be identified systematically and queries that cut across organizational boundaries can be generated with a consistent view that treats drug use as a single process. A full evaluation of the analysis method must await completion of the information system under construction. However, as long as TQM is based on a process-oriented view and needs support from information systems, ELCD can be an appropriate choice as a tool for process analysis.

Author Entity Identification using Representative Properties in Linked Data (대표 속성을 이용한 저자 개체 식별)

  • Kim, Tae-Hong;Jung, Han-Min;Sung, Won-Kyung;Kim, Pyung
    • The Journal of the Korea Contents Association / v.12 no.1 / pp.17-29 / 2012
  • In recent years, Linked Data published under open licenses has grown rapidly and attracted attention for its interoperability and openness, especially among the governments of developed countries. However, out-links are relatively few compared with the total number of links, and most links refer to a few hub datasets. This is due to the absence of technology for identifying entities in Linked Data. In this paper, we present an improved author entity resolution method that uses representative properties. To solve the problems of previous methods, which rely on relations with other entities (owl:sameAs, owl:differentFrom, and so on) or depend on curation, we design and evaluate an automated real-time resolution process based on multiple ontologies that respects each entity's type and logical characteristics in order to verify entity consistency. The evaluation of author entity resolution on 29 confirmed author records shows positive results (an average K-measure of 0.8533).
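
The K-measure reported above is commonly defined in author disambiguation as the geometric mean of average cluster purity (ACP) and average author purity (AAP). A minimal sketch follows, assuming the paper uses this standard definition; the record and cluster identifiers are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable

def k_measure(predicted: Dict[Hashable, Hashable], gold: Dict[Hashable, Hashable]) -> float:
    """K = sqrt(ACP * AAP).

    predicted maps each record id to a predicted cluster id;
    gold maps each record id to its true author id.
    """
    records = list(gold)
    n = len(records)
    # n_ij: number of records in predicted cluster i that belong to author j.
    joint = Counter((predicted[r], gold[r]) for r in records)
    cluster_sizes = Counter(predicted[r] for r in records)
    author_sizes = Counter(gold[r] for r in records)
    acp = sum(count * count / cluster_sizes[cluster]
              for (cluster, _), count in joint.items()) / n
    aap = sum(count * count / author_sizes[author]
              for (_, author), count in joint.items()) / n
    return math.sqrt(acp * aap)

gold = {"r1": "A", "r2": "A", "r3": "B"}
perfect = k_measure({"r1": 1, "r2": 1, "r3": 2}, gold)
merged = k_measure({"r1": 1, "r2": 1, "r3": 1}, gold)
```

Merging two authors into one cluster leaves AAP at 1 but lowers ACP to 5/9, so K drops to about 0.745.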

ER2XML: An Implementation of XML Schema Generator based on the Entity-Relationship Model (ER2XML :개체-관계 모델을 기반으로한 XML Schema 생성기의 구현)

  • Kim Chang Suk;Son Dong-Cheul
    • The KIPS Transactions:PartD / v.12D no.1 s.97 / pp.1-12 / 2005
  • XML is emerging as the standard language for data exchange on the Web. Accordingly, demand is increasing for XML Schema (W3C XML Schema Spec.), which validates XML documents. However, despite its support for various data types and its rich expressiveness, XML Schema is hard to design with because of its complexity. This paper presents a simple way to design XML Schema using a fundamental database design tool, the Entity-Relationship model. The Entity-Relationship model cannot be converted to XML Schema directly because of the mismatch between the two models, so we present algorithms that generate XML Schema from the Entity-Relationship model. The algorithms produce XML Schema code using a hierarchical view representation. An important objective of this automatic generation is to preserve XML Schema's characteristics, such as reusability, global and local declarations, extensibility, and various type changes.
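
The ER-to-XML Schema conversion described above can be illustrated with a much-simplified generator. This is not the paper's algorithm. It assumes that each entity becomes a named global complexType, that attributes become local element declarations, and that a 1:N relationship is expressed hierarchically by nesting the "many" side inside the "one" side. The `{Entity}Type` naming convention is hypothetical, and N:M relationships and keys are not handled.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)  # serialize with the conventional "xs" prefix

def xs(tag: str) -> str:
    """ElementTree tag name qualified with the XML Schema namespace."""
    return f"{{{XS}}}{tag}"

def er_to_xsd(
    entities: Dict[str, List[Tuple[str, str]]],
    one_to_many: List[Tuple[str, str]],
    root_entity: str,
) -> str:
    schema = ET.Element(xs("schema"))
    sequences = {}
    # Each entity becomes a named, reusable complexType; its attributes become
    # local element declarations inside an xs:sequence.
    for entity, attributes in entities.items():
        complex_type = ET.SubElement(schema, xs("complexType"), name=f"{entity}Type")
        seq = ET.SubElement(complex_type, xs("sequence"))
        for attr_name, attr_type in attributes:
            ET.SubElement(seq, xs("element"), name=attr_name, type=attr_type)
        sequences[entity] = seq
    # Hierarchical view: nest the "many" side of each 1:N relationship.
    for parent, child in one_to_many:
        ET.SubElement(sequences[parent], xs("element"), name=child,
                      type=f"{child}Type", minOccurs="0", maxOccurs="unbounded")
    # One global element acts as the document root.
    ET.SubElement(schema, xs("element"), name=root_entity, type=f"{root_entity}Type")
    return ET.tostring(schema, encoding="unicode")

xsd = er_to_xsd(
    {"Customer": [("name", "xs:string")], "Order": [("orderDate", "xs:date")]},
    [("Customer", "Order")],
    "Customer",
)
```

Because the entity types are global and named, they remain reusable from other declarations, which is one of the XML Schema characteristics the abstract aims to preserve.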

Probabilistic based Web Contents Mining (확률 기반 웹 콘텐츠 마이닝)

  • Yun, Bo-Hyun;Cho, Kwang-Moon
    • Proceedings of the Korea Contents Association Conference / 2006.11a / pp.16-20 / 2006
  • In Web contents mining, it is important to recognize unlabeled entities and to integrate sub-linked information with the extracted results. This paper presents a probabilistic method that recognizes unlabeled entities using a Bayesian model. We also propose a method that uses the information in sub-linked web pages and integrates the extracted results. The experimental results show that probabilistic entity recognition combined with information integration achieves the highest precision.
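
The abstract does not specify its Bayesian model. As a stand-in, the sketch below uses a multinomial naive Bayes classifier with Laplace smoothing to label an unlabeled candidate entity from surrounding context features. The feature names, labels, and training data are hypothetical; unseen features are simply skipped.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

class NaiveBayesEntityClassifier:
    """Multinomial naive Bayes over context features (illustrative stand-in)."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, samples: List[Tuple[List[str], str]]) -> "NaiveBayesEntityClassifier":
        self.class_counts = Counter(label for _, label in samples)
        self.feature_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab = set()
        for features, label in samples:
            self.feature_counts[label].update(features)
            self.vocab.update(features)
        self.total_features = {label: sum(c.values()) for label, c in self.feature_counts.items()}
        return self

    def log_posterior(self, features: List[str], label: str) -> float:
        """log P(label) + sum of log P(feature | label), up to a constant."""
        n = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n)
        v = len(self.vocab)
        for f in features:
            if f not in self.vocab:
                continue  # ignore features never seen in training
            numerator = self.feature_counts[label][f] + self.alpha
            score += math.log(numerator / (self.total_features[label] + self.alpha * v))
        return score

    def predict(self, features: List[str]) -> str:
        return max(self.class_counts, key=lambda label: self.log_posterior(features, label))

train = [
    (["prof", "lab"], "PERSON"),
    (["dr", "prof"], "PERSON"),
    (["inc", "corp"], "ORG"),
    (["corp", "shares"], "ORG"),
]
clf = NaiveBayesEntityClassifier().fit(train)
```

With equal priors, the smoothed likelihood of "prof" is 3/10 under PERSON and 1/10 under ORG, so a candidate preceded by "prof" is labeled PERSON.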

An Approach to Persistent Naming and Naming Mapping Based on OSI and IGM for Parametric CAD Model Exchanges (파라메트릭 CAD모델 교환을 위한 OSI와 IGM기반의 고유 명칭 방법과 명칭 매핑 방법)

  • Mun D.H.;Han S.H.
    • Korean Journal of Computational Design and Engineering / v.9 no.3 / pp.226-237 / 2004
  • If the topology changes during the regeneration step of history-based and feature-based CAD systems, it is difficult to identify an entity in the old model and find the same entity in the new model. This is known as the 'persistent naming problem'. To exchange parametric CAD models among different CAD systems, which use different naming schemes, both the persistent naming problem and the naming mapping problem must be solved. Persistent naming for CAD model exchange has characteristics of its own compared with persistent naming for CAD system development. This paper analyzes previous research and proposes a solution to the persistent naming problem for CAD model exchange and to the naming mapping problem among different naming schemes.

PubMiner: Machine Learning-based Text Mining for Biomedical Information Analysis

  • Eom, Jae-Hong;Zhang, Byoung-Tak
    • Genomics & Informatics / v.2 no.2 / pp.99-106 / 2004
  • In this paper, we introduce PubMiner, an intelligent machine learning-based text mining system for mining biological information from the literature. PubMiner employs natural language processing techniques and machine learning-based data mining techniques to mine useful biological information, such as protein-protein interactions, from the massive literature. The system recognizes biological terms such as genes, proteins, and enzymes, and uses natural language processing to extract the interactions among them described in the documents. The extracted interactions are further analyzed with a set of features for each entity, collected from related public databases, to infer additional interactions from the original ones. Interactions inferred by this analysis, together with the original interactions, are provided to the user with links to the literature sources. The performance of entity and interaction extraction was tested on selected MEDLINE abstracts. The inference was evaluated using the protein interaction data of S. cerevisiae (baker's yeast) from MIPS and SGD.
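
PubMiner's extraction pipeline is not detailed in the abstract. The sketch below is a common simple baseline, not PubMiner's method: dictionary lookup to recognize protein names, then sentence-level co-occurrence with an interaction verb to propose candidate interactions. The verb list, protein dictionary, and example sentence are hypothetical.

```python
import re
from itertools import combinations
from typing import List, Set, Tuple

# Hypothetical list of interaction verbs.
INTERACTION_VERBS = {"binds", "interacts", "phosphorylates", "activates", "inhibits"}

def extract_interactions(text: str, protein_dict: Set[str]) -> List[Tuple[str, str, str]]:
    """Return (protein, verb, protein) triples from sentences that mention
    at least two dictionary proteins and at least one interaction verb."""
    results = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        tokens = re.findall(r"[A-Za-z0-9-]+", sentence)
        # Dictionary-based term recognition, de-duplicated in order of appearance.
        proteins = list(dict.fromkeys(t for t in tokens if t in protein_dict))
        verbs = [t.lower() for t in tokens if t.lower() in INTERACTION_VERBS]
        if verbs and len(proteins) >= 2:
            for a, b in combinations(proteins, 2):
                results.append((a, verbs[0], b))
    return results

pairs = extract_interactions(
    "ProtA binds ProtB. ProtC is abundant.", {"ProtA", "ProtB", "ProtC"}
)
```

The second sentence mentions only one protein and no interaction verb, so it yields no candidate; a real system would add syntactic analysis and a learned filter on top of such candidates.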