• Title, Summary, Keyword: Information Extraction

Search Result 4,801, Processing Time 0.063 seconds

TAKES: Two-step Approach for Knowledge Extraction in Biomedical Digital Libraries

  • Song, Min
    • Journal of Information Science Theory and Practice
    • /
    • v.2 no.1
    • /
    • pp.6-21
    • /
    • 2014
  • This paper proposes a novel knowledge extraction system, TAKES (Two-step Approach for Knowledge Extraction System), which integrates advanced techniques from Information Retrieval (IR), Information Extraction (IE), and Natural Language Processing (NLP). In particular, TAKES adopts a novel keyphrase extraction-based query expansion technique to collect promising documents. It also uses a Conditional Random Field-based machine learning technique to extract important biological entities and relations. TAKES is applied to biological knowledge extraction, particularly retrieving promising documents that contain Protein-Protein Interaction (PPI) and extracting PPI pairs. TAKES consists of two major components: DocSpotter, which is used to query and retrieve promising documents for extraction, and a Conditional Random Field (CRF)-based entity extraction component known as FCRF. The present paper investigated research problems addressing the issues with a knowledge extraction system and conducted a series of experiments to test our hypotheses. The findings from the experiments are as follows: First, the author verified, using three different test collections to measure the performance of our query expansion technique, that DocSpotter is robust and highly accurate when compared to Okapi BM25 and SLIPPER. Second, the author verified that our relation extraction algorithm, FCRF, is highly accurate in terms of F-Measure compared to four other competitive extraction algorithms: Support Vector Machine, Maximum Entropy, Single POS HMM, and Rapier.

Keyword Extraction Using Unsupervised Learning Method (비감독 학습 기법에 의한 키워드 추출)

  • Shin, Seong-Yoon;Baek, Jeong-Uk;Rhee, Yang-Won
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • /
    • pp.165-166
    • /
    • 2010
  • Noun extraction is to find all nouns presented in the document, Korean information retrieval uses noun as index terms or keywords of representing the document. In this paper, we proposes the method of keyword extraction using pre-built dictionary. This method reduces the execution time by reducing unnecessary operations. And noun, even large documents without affecting significantly the accuracy, can be extracted. This paper proposed noun extraction method using the appearance characteristics of the noun and keyword extraction method using unsupervised learning techniques.

  • PDF

An Efficient Damage Information Extraction from Government Disaster Reports

  • Shin, Sungho;Hong, Seungkyun;Song, Sa-Kwang
    • Journal of Internet Computing and Services
    • /
    • v.18 no.6
    • /
    • pp.55-63
    • /
    • 2017
  • One of the purposes of Information Technology (IT) is to support human response to natural and social problems such as natural disasters and spread of disease, and to improve the quality of human life. Recent climate change has happened worldwide, natural disasters threaten the quality of life, and human safety is no longer guaranteed. IT must be able to support tasks related to disaster response, and more importantly, it should be used to predict and minimize future damage. In South Korea, the data related to the damage is checked out by each local government and then federal government aggregates it. This data is included in disaster reports that the federal government discloses by disaster case, but it is difficult to obtain raw data of the damage even for research purposes. In order to obtain data, information extraction may be applied to disaster reports. In the field of information extraction, most of the extraction targets are web documents, commercial reports, SNS text, and so on. There is little research on information extraction for government disaster reports. They are mostly text, but the structure of each sentence is very different from that of news articles and commercial reports. The features of the government disaster report should be carefully considered. In this paper, information extraction method for South Korea government reports in the word format is presented. This method is based on patterns and dictionaries and provides some additional ideas for tokenizing the damage representation of the text. The experiment result is F1 score of 80.2 on the test set. This is close to cutting-edge information extraction performance before applying the recent deep learning algorithms.

Survey of Temporal Information Extraction

  • Lim, Chae-Gyun;Jeong, Young-Seob;Choi, Ho-Jin
    • Journal of Information Processing Systems
    • /
    • v.15 no.4
    • /
    • pp.931-956
    • /
    • 2019
  • Documents contain information that can be used for various applications, such as question answering (QA) system, information retrieval (IR) system, and recommendation system. To use the information, it is necessary to develop a method of extracting such information from the documents written in a form of natural language. There are several kinds of the information (e.g., temporal information, spatial information, semantic role information), where different kinds of information will be extracted with different methods. In this paper, the existing studies about the methods of extracting the temporal information are reported and several related issues are discussed. The issues are about the task boundary of the temporal information extraction, the history of the annotation languages and shared tasks, the research issues, the applications using the temporal information, and evaluation metrics. Although the history of the tasks of temporal information extraction is not long, there have been many studies that tried various methods. This paper gives which approach is known to be the better way of extracting a particular part of the temporal information, and also provides a future research direction.

A Study on Extracting News Contents from News Web Pages (뉴스 웹 페이지에서 기사 본문 추출에 관한 연구)

  • Lee, Yong-Gu
    • Journal of the Korean Society for information Management
    • /
    • v.26 no.1
    • /
    • pp.305-320
    • /
    • 2009
  • The news pages provided through the web contain unnecessary information. This causes low performance and inefficiency of the news processing system. In this study, news content extraction methods, which are based on sentence identification and block-level tags news web pages, was suggested. To obtain optimal performance, combinations of these methods were applied. The results showed good performance when using an extraction method which applied the sentence identification and eliminated hyperlink text from web pages. Moreover, this method showed better results when combined with the extraction method which used block-level. Extraction methods, which used sentence identification, were effective for raising the extraction recall ratio.

A Study on the Integration of Recognition Technology for Scientific Core Entities (과학기술 핵심개체 인식기술 통합에 관한 연구)

  • Choi, Yun-Soo;Jeong, Chang-Hoo;Cho, Hyun-Yang
    • Journal of the Korean Society for information Management
    • /
    • v.28 no.1
    • /
    • pp.89-104
    • /
    • 2011
  • Large-scaled information extraction plays an important role in advanced information retrieval as well as question answering and summarization. Information extraction can be defined as a process of converting unstructured documents into formalized, tabular information, which consists of named-entity recognition, terminology extraction, coreference resolution and relation extraction. Since all the elementary technologies have been studied independently so far, it is not trivial to integrate all the necessary processes of information extraction due to the diversity of their input/output formation approaches and operating environments. As a result, it is difficult to handle scientific documents to extract both named-entities and technical terms at once. In order to extract these entities automatically from scientific documents at once, we developed a framework for scientific core entity extraction which embraces all the pivotal language processors, named-entity recognizer and terminology extractor.

Automatic Building Extraction from Airborne Laser Scanning Data using TIN

  • Jeong Jae-Wook;Chang Hwi-Jeong;Cho Woosug;Kim Kyoung-ok
    • Proceedings of the KSRS Conference
    • /
    • /
    • pp.132-135
    • /
    • 2004
  • Building information plays a key role in diverse applications such as urban planning, telecommunication and environment monitoring. Automatic building extraction has been a prime interest in the field of GIS and photogrammetry. In this paper, we presented an automatic approach for building extraction from lidar data. The proposed approach is divided into four processes: pre-processing, filtering, segmentation and building extraction. Experimental results showed that the proposed method detected most of buildings with less commission and omission errors.

  • PDF

A Distance Approach for Open Information Extraction Based on Word Vector

  • Liu, Peiqian;Wang, Xiaojie
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.12 no.6
    • /
    • pp.2470-2491
    • /
    • 2018
  • Web-scale open information extraction (Open IE) plays an important role in NLP tasks like acquiring common-sense knowledge, learning selectional preferences and automatic text understanding. A large number of Open IE approaches have been proposed in the last decade, and the majority of these approaches are based on supervised learning or dependency parsing. In this paper, we present a novel method for web scale open information extraction, which employs cosine distance based on Google word vector as the confidence score of the extraction. The proposed method is a purely unsupervised learning algorithm without requiring any hand-labeled training data or dependency parse features. We also present the mathematically rigorous proof for the new method with Bayes Inference and Artificial Neural Network theory. It turns out that the proposed algorithm is equivalent to Maximum Likelihood Estimation of the joint probability distribution over the elements of the candidate extraction. The proof itself also theoretically suggests a typical usage of word vector for other NLP tasks. Experiments show that the distance-based method leads to further improvements over the newly presented Open IE systems on three benchmark datasets, in terms of effectiveness and efficiency.

Research and Development of a Geological Remote Sensing Information Extraction System

  • Zhengmin, He
    • Proceedings of the KSRS Conference
    • /
    • /
    • pp.1275-1277
    • /
    • 2003
  • This paper presents a geological remote sensing information extraction system, the aim of which is to provide practical models and powerful tools to extract geological information from remote sensing images for geological exploration applications. After reviewing and analyzing the existing methods for geological information extraction, we developed more than ten models to enhance and extract geological information, such as alteration information, linear features and special lithological characters. The system is developed based on Erdas Imagine using its programming language. It has been successfully used in the 'reat Investigation of Land and Natural Resources of China' program.

  • PDF

Research and Development of a Geological Remote Sensing Information Extraction System

  • Zhengmin, He
    • Proceedings of the KSRS Conference
    • /
    • /
    • pp.1442-1444
    • /
    • 2003
  • This paper presents a geological remote sensing information extraction system, the aim of which is to provide practical models and powerful tools to extract geological information from remote sensing images for geological exploration applications. After reviewing and analyzing the existing methods for geological information extraction, we developed more than ten models to enhance and extract geological information, such as alteration information, linear features and special lithological characters. The system is developed based on Erdas Imagine using its programming language. It has been successfully used in the ‘Great Investigation of Land and Natural Resources of China’ program.

  • PDF