• Title/Summary/Keyword: Entity


Named Entity and Event Annotation Tool for Cultural Heritage Information Corpus Construction (문화유산정보 말뭉치 구축을 위한 개체명 및 이벤트 부착 도구)

  • Choi, Ji-Ye;Kim, Myung-Keun;Park, So-Young
    • Journal of the Korea Society of Computer and Information / v.17 no.9 / pp.29-38 / 2012
  • In this paper, we propose a named entity and event annotation tool for cultural heritage information corpus construction. Focusing on the time, location, person, and event types suited to cultural heritage information management, the annotator marks named entities and events with the proposed tool. To make annotation easier, the tool automatically records location information, such as the line number or word number, and shows the corresponding string in bold italic in the raw text. To reduce the cost of manual annotation, the tool uses patterns to recognize named entities automatically. Because the training corpus is very small, the tool extracts simple rule patterns. To avoid error propagation, the patterns are extracted from the raw text without any additional processing. Experimental results show that the proposed tool reduces manual annotation costs by more than half.

A Business System Analysis Model with Extended Entity Concept (확장된 개체 개념의 비즈니스 시스템 분석 모델)

  • Lee, Seo-Jeong;Ko, Byung-Sun;Choi, Mi-Sook;Park, Jai-Nyun
    • Journal of KIISE:Software and Applications / v.28 no.12 / pp.885-895 / 2001
  • Existing system analysis models offer various ways to present entity relations and event flows so that the analysis and design paradigms stay consistent. However, they focus on deriving and arranging related entities along the system flow rather than on identifying the entities themselves. Identifying entities systematically is a basic and important task in software development, and the identified entities can be major assets of a business system. In business systems, business rules and computed or derived information, such as the attendance lists of a lecture system, can be the most important system assets. Management information and metadata are likewise important. In this paper, we suggest a business system analysis model for deriving and presenting entities. Through this model, a system is identified as entities, interfaces, and events or behaviors; entities are then extended into independent entities, dependent entities (which depend on independent entities), and constraints, which represent physical and administrative notices. Identifying entities in these varied forms can reduce the incompleteness of entity analysis.


Performance Comparison Analysis on Named Entity Recognition system with Bi-LSTM based Multi-task Learning (다중작업학습 기법을 적용한 Bi-LSTM 개체명 인식 시스템 성능 비교 분석)

  • Kim, GyeongMin;Han, Seunggnyu;Oh, Dongsuk;Lim, HeuiSeok
    • Journal of Digital Convergence / v.17 no.12 / pp.243-248 / 2019
  • Multi-task learning (MTL) is a training method that trains a single neural network on multiple tasks that influence each other. In this paper, we compare the performance of an MTL named entity recognition (NER) model trained on a Korean traditional culture corpus with that of other NER models. During training, the Bi-LSTM layers for part-of-speech (POS) tagging and NER each receive input propagated from a Bi-LSTM layer, and their outputs are used to obtain the joint loss. As a result, the MTL-based Bi-LSTM model shows a 1.1% to 4.6% performance improvement over single Bi-LSTM models.
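
The abstract does not say how the POS-tagging and NER losses are combined into the joint loss. As a minimal sketch, assuming a weighted sum of per-task cross-entropy losses, the idea could look like the following. The function names and the `pos_weight` parameter are hypothetical, and the per-token probabilities stand in for the outputs of the two task-specific layers.

```python
import math

def cross_entropy(probs, gold):
    """Mean negative log-likelihood of the gold tags.

    probs: one dict {tag: probability} per token.
    gold:  one gold tag per token.
    """
    return -sum(math.log(p[g]) for p, g in zip(probs, gold)) / len(gold)

def joint_loss(pos_probs, pos_gold, ner_probs, ner_gold, pos_weight=1.0):
    """Assumed joint loss: NER loss plus a weighted POS-tagging loss.

    The weighting scheme is an illustration only; the paper may combine
    the two task losses differently.
    """
    ner = cross_entropy(ner_probs, ner_gold)
    pos = cross_entropy(pos_probs, pos_gold)
    return ner + pos_weight * pos

# Toy two-token sentence with made-up model outputs.
pos_probs = [{"NOUN": 0.5, "VERB": 0.5}, {"NOUN": 0.5, "VERB": 0.5}]
ner_probs = [{"PER": 0.5, "O": 0.5}, {"PER": 0.5, "O": 0.5}]
loss = joint_loss(pos_probs, ["NOUN", "VERB"], ner_probs, ["PER", "O"])
```

Because both tasks contribute to one scalar, a gradient step on this loss updates any layer that feeds both task heads, which is how one task can influence the other.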

An Active Co-Training Algorithm for Biomedical Named-Entity Recognition

  • Munkhdalai, Tsendsuren;Li, Meijing;Yun, Unil;Namsrai, Oyun-Erdene;Ryu, Keun Ho
    • Journal of Information Processing Systems / v.8 no.4 / pp.575-588 / 2012
  • Exploiting unlabeled text data with a relatively small labeled corpus has been an active and challenging research topic in text mining, owing to the recent growth in the volume of biomedical literature. Biomedical named-entity recognition is an essential prerequisite task before effective text mining of biomedical literature can begin. This paper proposes an Active Co-Training (ACT) algorithm for biomedical named-entity recognition. ACT is a semi-supervised learning method in which two classifiers based on two different feature sets iteratively learn from informative examples that have been queried from the unlabeled data. We design a new classification problem to measure the informativeness of an example in unlabeled data. In this classification problem, the examples are classified, based on a joint view of a feature set, as informative or non-informative to both classifiers. To form the training data for the classification problem, we adopt a query-by-committee method. Therefore, in ACT, both classifiers are considered to be one committee, which is used on the labeled data to give the informativeness label to each example. The ACT method outperforms the traditional co-training algorithm in terms of F-measure as well as the number of training iterations performed to build a good classification model. The proposed method tends to efficiently exploit a large amount of unlabeled data by selecting a small number of examples having not only useful information but also a comprehensive pattern.
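
The committee step described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the abstract does not give the exact labeling rule, so the sketch assumes an example counts as informative whenever the two classifiers do not both reproduce the gold label. The function name and the two toy classifiers are hypothetical.

```python
def informativeness_labels(examples, gold, clf_a, clf_b):
    """Label each labeled example as informative (True) or not (False).

    Assumed rule: an example is informative when the two-member committee
    disagrees or when at least one member mislabels it.
    """
    labels = []
    for x, y in zip(examples, gold):
        pred_a, pred_b = clf_a(x), clf_b(x)
        labels.append(not (pred_a == pred_b == y))
    return labels

# Two toy classifiers that look at different feature views of a token.
def by_capital(token):
    return "GENE" if token[0].isupper() else "O"

def by_digit(token):
    return "GENE" if any(ch.isdigit() for ch in token) else "O"

tokens = ["BRCA1", "the", "p53", "Protein"]
gold = ["GENE", "O", "GENE", "O"]
labels = informativeness_labels(tokens, gold, by_capital, by_digit)
```

The resulting labels would then serve as training data for the separate informativeness classifier that the abstract describes.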

A Study on the Integration of Recognition Technology for Scientific Core Entities (과학기술 핵심개체 인식기술 통합에 관한 연구)

  • Choi, Yun-Soo;Jeong, Chang-Hoo;Cho, Hyun-Yang
    • Journal of the Korean Society for Information Management / v.28 no.1 / pp.89-104 / 2011
  • Large-scale information extraction plays an important role in advanced information retrieval as well as in question answering and summarization. Information extraction can be defined as the process of converting unstructured documents into formalized, tabular information; it consists of named-entity recognition, terminology extraction, coreference resolution, and relation extraction. Because these elementary technologies have so far been studied independently, integrating all the necessary processes of information extraction is not trivial, given the diversity of their input/output formats, approaches, and operating environments. As a result, it is difficult to process scientific documents so that both named entities and technical terms are extracted at once. To extract these entities from scientific documents automatically in a single pass, we developed a framework for scientific core entity extraction that brings together all the pivotal language processors, the named-entity recognizer, and the terminology extractor.

Combat Entity Based Modeling Methodology to Enable Joint Analysis of Performance/Engagement Effectiveness - Part 2 : Detailed Model Design & Model Implementation (성능/교전 효과도의 상호 분석이 가능한 전투 개체 기반의 모델링 방법론 - 제2부 : 상세 모델 설계 및 모델 구현)

  • Seo, Kyung-Min;Choi, Changbeom;Kim, Tag Gon
    • Journal of the Korea Institute of Military Science and Technology / v.17 no.2 / pp.235-247 / 2014
  • Based on the two-dimensional model partition method proposed in Part 1, Part 2 provides the detailed model specification and implementation. To mathematically delineate a model's behaviors and the interactions among them, we extend the DEVS (Discrete Event System Specification) formalism and propose CE-DEVS (Combat Entity-DEVS) for the upper abstraction sub-model of a combat entity model. The proposed CE-DEVS additionally defines two sets and one function to explicitly reflect the essential semantics of the model's behaviors. These definitions make the model's behaviors easier to understand and represent, since they eliminate differences in meaning between real-world expressions and model specifications. For model implementation, the upper abstraction sub-models are implemented with DEVSim++, while the lower sub-models are realized in C++. Using the overall modeling techniques proposed in Parts 1 and 2, we can conduct constructive simulation and assess factors concerning the combat logic as well as the battlefield functions of a next-generation combat entity, while minimizing additional modeling effort. The anti-torpedo warfare experiment yields interesting results on engagement situations that employ weapons under development and their tactics. Finally, we expect this work to be immediately applicable to various kinds of engagement warfare.

Things unknown before being recorded (기록되기 전엔 알 수 없는 것들)

  • Lee, Kyoung Hee;Kim, Ik Han
    • The Korean Journal of Archival Studies / no.68 / pp.107-150 / 2021
  • Representation of an entity starts with recognition of its existence, and recording stands in a circular relationship with recognition in that it serves as a means of enabling that recognition. No record is left of an unrecognized entity; any record that does exist is distorted, and the distorted reproduction comes to represent the entity, reinforcing its invisibility. Spivak describes those who cannot speak for themselves and cannot be represented as subaltern. This paper examines public records, media records, and research records of female restaurant workers, identifies the subaltern characteristics and limitations of those records, and suggests the points to be considered and the specific roles required for recording subalterns. If the possibility of representation can be increased by fully recording a person as an entity that embodies the times and society, the accountability of records will extend beyond institutions to the times and society, and individuals and communities will be established as political subjects.

Pathological Entity of Jueyin Disease and the Relationship between the Concept of Three-Yin-Three-Yang in 《Shanghanlun》 (《상한론(傷寒論)》 궐음병의 병리본질과 삼음삼양(三陰三陽) 개념과의 관계)

  • Chi, Gyoo Yong;Park, Shin Hyung
    • Journal of Physiology & Pathology in Korean Medicine / v.33 no.2 / pp.75-81 / 2019
  • To investigate the pathological entity of Jueyin disease in 《Shanghanlun》, the concepts of three-yin-three-yang shared by 《Neijing》 and 《Shanghanlun》 were examined first, and then the meanings of jueyin and jueyin disease were analyzed. In cold damage disease, the time-space factor is important because pathological change is rapid and the symptoms along the disease path are similar; three-yin-three-yang, which carries a combined meaning of time and space, can therefore serve as an appropriate pathological concept. That is, it can be interpreted in various modes, such as variations of yin-yang and qi-blood, changes of pulse condition, the theories of opening, closing, and pivot, or the exuberance and debilitation of form and qi manifested in the six districts of the human body as the disease progresses. Jueyin lies between taiyin in front and shaoyin in the rear, and its attributes are qi stagnation and yin exuberance in relation to the location of the flank and liver. Taken together, the pathological entity of jueyin disease is a set of symptoms in which cold and stagnant heat mingle and compete with each other, arising when a subject who has qi stagnation in the flank and cold, particularly in the extremities and lower abdomen, is seized by cold influenza.

A Named Entity Recognition Model in Criminal Investigation Domain using Pretrained Language Model (사전학습 언어모델을 활용한 범죄수사 도메인 개체명 인식)

  • Kim, Hee-Dou;Lim, Heuiseok
    • Journal of the Korea Convergence Society / v.13 no.2 / pp.13-20 / 2022
  • This study develops a named entity recognition model specialized for the criminal investigation domain using deep learning techniques. We propose a system that automatically extracts and categorizes crime-related information from text data such as criminal judgments and investigation documents, so that it can contribute to future crime analysis for prevention and investigation using data analysis techniques. For this study, criminal investigation domain text was collected, and the required entity names were newly defined from the perspective of criminal analysis. The proposed model applies KoELECTRA, a pre-trained language model that has recently shown high performance in natural language processing. On the crime domain NER experiment data, it achieves a micro-average (micro avg) F1-score of 98% and a macro-average (macro avg) F1-score of 95% over 9 main categories, and a micro avg F1-score of 98% and a macro avg F1-score of 62% over 56 subcategories. The proposed model is analyzed from the perspective of future improvement and utilization.
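
The gap between micro and macro averages reported above (98% versus 62% over the subcategories) is typical when a few frequent categories are recognized well and many rare ones are not. The sketch below shows how the two averages are computed from per-category counts. The category names and counts are invented for illustration and are not taken from the paper.

```python
def f1(tp, fp, fn):
    """F1 from true positives, false positives, and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def micro_macro_f1(counts):
    """counts: {label: (tp, fp, fn)}. Returns (micro F1, macro F1)."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = f1(tp, fp, fn)                      # pools counts over all labels
    macro = sum(f1(*c) for c in counts.values()) / len(counts)  # mean of per-label F1
    return micro, macro

# Hypothetical counts: one frequent, well-recognized category and one rare, poorly recognized one.
counts = {"PERSON": (980, 10, 10), "WEAPON": (1, 2, 3)}
micro, macro = micro_macro_f1(counts)
```

With these made-up counts the micro average stays near 0.99 because the frequent category dominates the pooled counts, while the macro average falls to about 0.64 because each category is weighted equally.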

Chinese-clinical-record Named Entity Recognition using IDCNN-BiLSTM-Highway Network

  • Tinglong Tang;Yunqiao Guo;Qixin Li;Mate Zhou;Wei Huang;Yirong Wu
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.7 / pp.1759-1772 / 2023
  • Chinese named entity recognition (NER) is a challenging task that seeks to find, recognize, and classify various types of information elements in unstructured text. Because Chinese text has no natural boundaries like the spaces in English text, Chinese named entity identification is much more difficult. At present, most deep learning-based NER models are built on a bidirectional long short-term memory network (BiLSTM), yet their performance still has room for improvement. To further improve performance on Chinese NER tasks, we propose a new NER model, IDCNN-BiLSTM-Highway, which combines the BiLSTM, the iterated dilated convolutional neural network (IDCNN), and the highway network. In our model, the IDCNN is used to achieve multiscale context aggregation over a long sequence of words. The highway network is used to connect different network layers effectively, allowing information to pass through them smoothly without attenuation. Finally, the globally optimal tag sequence is obtained by introducing a conditional random field (CRF). The experimental results show that, compared with other popular deep learning-based NER models, our model achieves superior performance on two Chinese NER data sets, Resume and Yidu-S4k, with F1-scores of 94.98 and 77.59, respectively.
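
The abstract does not describe how the CRF layer is decoded. A common choice for linear-chain CRFs is Viterbi decoding, sketched below under that assumption. The tag set, emission scores, and transition scores are hypothetical stand-ins for what the network layers would produce.

```python
def viterbi(emissions, transitions, tags):
    """Return the highest-scoring tag sequence for a linear-chain model.

    emissions:   one dict {tag: score} per token.
    transitions: dict {(prev_tag, cur_tag): score}.
    """
    best = [{t: emissions[0][t] for t in tags}]  # best path score ending in each tag
    back = []                                    # back-pointers per position
    for em in emissions[1:]:
        scores, ptrs = {}, {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions[(p, cur)])
            scores[cur] = best[-1][prev] + transitions[(prev, cur)] + em[cur]
            ptrs[cur] = prev
        best.append(scores)
        back.append(ptrs)
    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for ptrs in reversed(back):
        path.append(ptrs[path[-1]])
    return path[::-1]

tags = ["O", "B", "I"]
transitions = {(p, c): 0.0 for p in tags for c in tags}
transitions[("O", "I")] = -10.0  # an inside tag should not follow an outside tag
emissions = [
    {"O": 1.0, "B": 0.9, "I": 0.0},
    {"O": 0.0, "B": 0.0, "I": 1.0},
    {"O": 1.0, "B": 0.0, "I": 0.0},
]
path = viterbi(emissions, transitions, tags)
```

Picking the best tag per token independently would give O, I, O here, which violates the transition constraint. Scoring whole sequences instead yields B, I, O, which is the sense in which the result is globally optimal.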