• Title/Summary/Keyword: Information Extraction

Feature Extraction of the 3-Dimensional Objects with Circular Cross Sections (단면이 원인 3차원 물체의 특징 추출)

  • Cho, Dong-Uk
    • The Transactions of the Korea Information Processing Society / v.3 no.4 / pp.866-876 / 1996
  • A feature extraction method is proposed for objects that have a circular cross section. To implement a robust recognition system that can effectively handle various types of 2-dimensional and 3-dimensional images, 2-dimensional and 3-dimensional information should both be extracted and combined optimally. To this end, this paper presents a feature extraction method for 3-dimensional objects, particularly objects with a circular cross section, which most real-world objects are known to have. First, the Z gradient is proposed to extract shape information from such objects. Using this information, normal vectors are derived from the surface patches, and the intersection points between these vectors are used for geometric feature extraction. For more accurate recognition, a method for extracting features of the regions between surfaces is also proposed. Finally, a method for extracting functional information is investigated for the final recognition step. The usefulness of the proposed method is demonstrated experimentally.
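The normal-vector step can be illustrated with a small sketch. It is not the author's Z-gradient method: it assumes a 2D cross-section contour has already been sampled as ordered points, estimates each normal by rotating a central-difference tangent by 90 degrees, and intersects pairs of normals, which for a circle all pass through the center. The names `normal_at`, `intersect_lines`, and `estimate_circle` are hypothetical.

```python
import math

def normal_at(points, i):
    """Unit normal at points[i]: the central-difference tangent rotated by 90 degrees."""
    (x0, y0), (x1, y1) = points[i - 1], points[(i + 1) % len(points)]
    tx, ty = x1 - x0, y1 - y0
    length = math.hypot(tx, ty)
    return (-ty / length, tx / length)

def intersect_lines(p, d, q, e):
    """Intersection of the lines p + s*d and q + t*e, solved with Cramer's rule."""
    det = -d[0] * e[1] + d[1] * e[0]
    if abs(det) < 1e-12:
        raise ValueError("normals are parallel")
    rx, ry = q[0] - p[0], q[1] - p[1]
    s = (-rx * e[1] + ry * e[0]) / det
    return (p[0] + s * d[0], p[1] + s * d[1])

def estimate_circle(points):
    """Average the intersections of normals taken about a quarter turn apart,
    then use the mean distance to that center as the radius."""
    n = len(points)
    centers = []
    for i in range(n):
        j = (i + n // 4) % n  # roughly 90 degrees away, so the normals are far from parallel
        centers.append(intersect_lines(points[i], normal_at(points, i),
                                       points[j], normal_at(points, j)))
    cx = sum(c[0] for c in centers) / n
    cy = sum(c[1] for c in centers) / n
    r = sum(math.hypot(x - cx, y - cy) for x, y in points) / n
    return cx, cy, r

# 16 samples of a circular cross section centered at (2, -1) with radius 3
pts = [(2 + 3 * math.cos(2 * math.pi * k / 16), -1 + 3 * math.sin(2 * math.pi * k / 16))
       for k in range(16)]
print(estimate_circle(pts))  # approximately (2.0, -1.0, 3.0)
```

With noisy samples the pairwise intersections scatter; averaging them, as here, is the simplest way to pool them, and a least-squares fit would be a sturdier choice.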

Efficient Management of Moving Object Trajectories in the Stream Environment (스트림 환경에서 이동객체 궤적의 효율적 관리)

  • Lee, Won-Cheol;Moon, Yang-Sae;Rhee, Sang-Min
    • Journal of KIISE:Databases / v.34 no.4 / pp.343-356 / 2007
  • Owing to advances in position-monitoring technologies such as global positioning systems and sensor networks, position information of moving objects now takes the form of streaming data that are updated continuously and rapidly. In this paper, we propose an efficient trajectory maintenance method that stores the streaming position data of moving objects in limited storage space and estimates past positions from the stored data. To this end, we first propose a new concept, incremental extraction of position information: whenever a new position is added to the system, the new version of the past position data maintained in the system is recomputed incrementally from the current version of that data and the newly added position. Next, based on incremental extraction, we present an overall framework that stores position information and estimates past positions in the stream environment. We then propose two polynomial-based methods for estimating past positions within this framework, a line-based method and a curve-based method. We also propose three incremental extraction methods: equi-width, slope-based, and recent-emphasis extraction. Experimental results show that the proposed incremental extraction achieves relatively high accuracy (an error rate below 3%) even though only a small fraction (0.1%) of the past position information is maintained. In particular, curve-based incremental extraction achieves an error rate of only 1.5% while storing just 0.1% of the total position data. These results indicate that our incremental extraction methods provide an efficient framework for storing the position information of moving objects and estimating their past positions in the stream environment.
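A minimal sketch of a bounded store with line-based estimation follows, under stated assumptions: positions arrive in time order, every `step`-th arrival is kept, and when the buffer overflows `step` doubles and every other kept sample is discarded, so the retained samples stay equally spaced in arrival order. This is a hypothetical reading of "equi-width" extraction, not the paper's algorithm; past positions are estimated by linear interpolation between the stored neighbours. `EquiWidthStore` is an invented name.

```python
from bisect import bisect_left

class EquiWidthStore:
    """Keeps at most `capacity` evenly spaced samples of a position stream (hypothetical)."""

    def __init__(self, capacity):
        if capacity < 2:
            raise ValueError("capacity must be at least 2")
        self.capacity = capacity
        self.step = 1
        self.arrivals = 0
        self.samples = []  # (arrival index, t, x, y), sorted by time

    def add(self, t, x, y):
        # keep only every step-th arrival; on overflow, double the step and thin the buffer
        if self.arrivals % self.step == 0:
            self.samples.append((self.arrivals, t, x, y))
            if len(self.samples) > self.capacity:
                self.step *= 2
                self.samples = [s for s in self.samples if s[0] % self.step == 0]
        self.arrivals += 1

    def estimate(self, t):
        """Line-based estimate: linear interpolation between the stored neighbours of t."""
        times = [s[1] for s in self.samples]
        i = bisect_left(times, t)
        if i < len(times) and times[i] == t:
            return self.samples[i][2:]
        if i == 0 or i == len(times):
            raise ValueError("t is outside the stored time range")
        (_, t0, x0, y0), (_, t1, x1, y1) = self.samples[i - 1], self.samples[i]
        w = (t - t0) / (t1 - t0)
        return (x0 + w * (x1 - x0), y0 + w * (y1 - y0))

store = EquiWidthStore(capacity=10)
for t in range(100):  # an object moving along x = 2t, y = t + 1
    store.add(t, 2 * t, t + 1)
print([s[1] for s in store.samples])  # [0, 16, 32, 48, 64, 80, 96]
print(store.estimate(37.5))           # (75.0, 38.5)
```

One trade-off of this scheme is that the newest arrivals may not be stored; a recency-aware variant would need to retain them separately.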

Development and Evaluation of Information Extraction Module for Postal Address Information (우편주소정보 추출모듈 개발 및 평가)

  • Shin, Hyunkyung;Kim, Hyunseok
    • Journal of Creative Information Culture / v.5 no.2 / pp.145-156 / 2019
  • In this study, we developed and evaluated an information extraction module based on named entity recognition. The module was designed to extract postal address information from arbitrary documents without any prior knowledge of the document layout. In practical terms, our approach is a probabilistic n-gram (bigram or trigram) method, which generalizes unigram-based keyword matching. The main difference between our approach and the conventional methods used in natural language processing is that sentence detection, tokenization, and POS tagging are applied recursively rather than sequentially. Test results on approximately two thousand documents are presented.
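To make the n-gram versus unigram contrast concrete, here is a hypothetical bigram scorer; it is not the authors' NER module. It assumes a handful of example addresses as training data, maps every number to a single `<NUM>` symbol, and scores a token sequence by its average add-one-smoothed log-probability, so word order matters in a way that unigram keyword matching cannot capture. `BigramScorer` and the sample addresses are invented for illustration.

```python
import math
from collections import Counter

def normalize(token):
    """Map every number to one symbol so that '123' and '45' share statistics."""
    return "<NUM>" if token.isdigit() else token.lower()

class BigramScorer:
    """Hypothetical bigram model trained on example address strings."""

    def __init__(self, examples):
        self.uni, self.bi = Counter(), Counter()
        vocab = set()
        for text in examples:
            padded = ["<s>"] + [normalize(t) for t in text.split()] + ["</s>"]
            self.uni.update(padded[:-1])             # counts of each history token
            self.bi.update(zip(padded, padded[1:]))  # counts of each adjacent pair
            vocab.update(padded[1:])
        self.vocab_size = len(vocab) + 1             # +1 reserves mass for unseen tokens

    def score(self, text):
        """Average add-one-smoothed log P(token | previous token) over the sequence."""
        padded = ["<s>"] + [normalize(t) for t in text.split()] + ["</s>"]
        logp = sum(math.log((self.bi[(a, b)] + 1) / (self.uni[a] + self.vocab_size))
                   for a, b in zip(padded, padded[1:]))
        return logp / (len(padded) - 1)

scorer = BigramScorer(["123 Main Street Springfield",
                       "45 Oak Street Riverside",
                       "9 Pine Avenue Springfield"])
print(scorer.score("77 Elm Street Riverside") > scorer.score("the meeting starts at 10"))  # True
```

A line whose score exceeds a threshold tuned on held-out documents could then be flagged as an address candidate; that threshold is an assumption, not something the abstract specifies.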

A Knowledge-based Wrapper Learning Agent for Semi-Structured Information Sources (준구조화된 정보소스에 대한 지식기반의 Wrapper 학습 에이전트)

  • Seo, Hee-Kyoung;Yang, Jae-Young;Choi, Joong-Min
    • Journal of KIISE:Software and Applications / v.29 no.1_2 / pp.42-52 / 2002
  • Information extraction (IE) is the process of recognizing and fetching particular information fragments from a document. In most previous IE systems, the extraction rules, called wrappers, are generated manually; although manual wrapper generation may yield more accurate extraction, it suffers from problems of flexibility, extensibility, and efficiency. Other studies that generate wrappers automatically also have difficulty acquiring and representing useful domain knowledge and coping with the structural heterogeneity among information sources; as a result, real-world information sources with complex document structures cannot be analyzed correctly. To resolve these problems, this paper presents XTROS, an agent-based information extraction system that exploits domain knowledge to learn from documents in a semi-structured information source. The system automatically generates a wrapper for each information source and performs information extraction and integration by applying that wrapper to the corresponding source. In XTROS, both the domain knowledge and the wrappers are represented as XML documents. The wrapper generation algorithm first uses the domain knowledge to recognize the meaning of each logical line of a sample document, and then finds the most frequent pattern in the sequence of semantic representations of the logical lines. The location and structure of this pattern, represented as an XML document, become the wrapper. Based on tests of XTROS on several real-estate information sites, we claim that it creates correct wrappers for most Web sources and consequently facilitates effective information extraction and integration for heterogeneous and complex information sources.
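The two-step wrapper generation described in this abstract (label each logical line using domain knowledge, then find the most frequent label pattern) can be sketched as follows. This is a simplified illustration, not XTROS itself: the `DOMAIN_KNOWLEDGE` cue words are hypothetical, lines are labeled by substring matching, the result is a plain tuple rather than an XML wrapper, and ties between equally frequent patterns go to the longer one.

```python
from collections import Counter

# Hypothetical domain knowledge for real-estate listings: semantic label -> cue substrings
DOMAIN_KNOWLEDGE = {
    "PRICE": ("$", "price"),
    "SIZE": ("sqft", "m2"),
    "ROOMS": ("bed", "room"),
}

def label_line(line):
    """Step 1: give a logical line the first semantic label whose cue it contains."""
    lowered = line.lower()
    for label, cues in DOMAIN_KNOWLEDGE.items():
        if any(cue in lowered for cue in cues):
            return label
    return "OTHER"

def most_frequent_pattern(labels, min_len=2, max_len=4):
    """Step 2: most frequent contiguous run of labels containing no OTHER line;
    ties go to the longer pattern."""
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(labels) - n + 1):
            window = tuple(labels[i:i + n])
            if "OTHER" not in window:
                counts[window] += 1
    return max(counts, key=lambda p: (counts[p], len(p)))

page = ["Listing 1", "Price: $300,000", "1,200 sqft", "3 bed",
        "Listing 2", "Price: $250,000", "900 sqft", "2 bed", "Contact us"]
print(most_frequent_pattern([label_line(line) for line in page]))  # ('PRICE', 'SIZE', 'ROOMS')
```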

Extraction of similar XML data based on XML structure and processing unit

  • Park, Jong-Hyun
    • Journal of the Korea Society of Computer and Information / v.22 no.4 / pp.59-65 / 2017
  • XML has established itself as a standard format for data exchange on the Internet, and the volume of XML instances is large. Extracting similar information from XML instances is therefore an important research topic, but existing work on it is insufficient. In this paper, we extract similar information, according to a common goal, from various kinds of XML instances. We use only the structural information of the XML instances for extraction, because some XML instances are written without a schema. To extract similar information efficiently, we propose a minimum processing unit and two approaches for finding it: a structure-based method, which uses only the structural information of the XML instance, and a measure-based method, which finds the unit using a numerical formula. Both approaches can be applied to any application that needs to extract similar information from XML data, and they can also be used for HTML documents.
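One way to read the structure-based method is sketched below, as an assumption rather than the paper's definition: treat each non-leaf element's tag together with its sorted child tags as a structural signature, and take the most frequently repeated signature as the processing unit. It uses only the instance structure, with no schema, via Python's standard `xml.etree.ElementTree`; `signature` and `find_unit` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def signature(elem):
    """Structural signature: the element's tag plus the sorted tags of its children."""
    return (elem.tag, tuple(sorted(child.tag for child in elem)))

def find_unit(xml_text):
    """Tag of the most frequently repeated non-leaf structure, taken as the processing unit."""
    root = ET.fromstring(xml_text)
    counts = Counter(signature(e) for e in root.iter() if len(e) > 0)
    (tag, _children), _count = counts.most_common(1)[0]
    return tag

doc = ("<library>"
       "<book><title>A</title><author>X</author></book>"
       "<book><title>B</title><author>Y</author></book>"
       "<info><name>City library</name></info>"
       "</library>")
print(find_unit(doc))  # book
```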

A Keyphrase Extraction Model for Each Conference or Journal (학술대회 및 저널별 기술 핵심구 추출 모델)

  • Jeong, Hyun Ji;Jang, Gwangseon;Kim, Tae Hyun;Sin, Donggu
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.10a / pp.81-83 / 2022
  • Understanding research trends is necessary for selecting research topics and exploring related work. To understand research trends, most researchers search for representative keywords of the domains or technologies they are interested in. However, some conferences in artificial intelligence or data mining now publish hundreds to thousands of papers each year, which makes it difficult for researchers to follow the research trends of their domains of interest. In this paper, we propose an automatic technology keyphrase extraction method that helps researchers understand the research trends of each conference or journal. Keyphrase extraction, which extracts important terms or phrases from a text, is a fundamental technology for natural language processing tasks such as summarization and search. Previous keyphrase extraction methods based on pretrained language models are designed to extract keyphrases from long texts, so their performance degrades on short texts such as paper titles. We therefore propose a technology keyphrase extraction model that is robust on short texts and takes word importance into account.

A Collaborative Framework for Discovering the Organizational Structure of Social Networks Using NER Based on NLP (NLP기반 NER을 이용해 소셜 네트워크의 조직 구조 탐색을 위한 협력 프레임 워크)

  • Elijorde, Frank I.;Yang, Hyun-Ho;Lee, Jae-Wan
    • Journal of Internet Computing and Services / v.13 no.2 / pp.99-108 / 2012
  • Many methods have been developed to improve the accuracy of extracting information from vast amounts of data. This paper combines several natural language processing methods, such as named entity recognition (NER), sentence extraction, and part-of-speech tagging, to carry out text analysis. The data source consists of texts obtained from the web by a domain-specific data extraction agent. Using these natural language processing methods, we developed a framework for extracting information from unstructured data. We simulated the performance of our work in extracting and analyzing texts to detect organizational structures. The simulation shows that our approach outperformed other NER classifiers, such as the MUC and CoNLL classifiers, in information extraction.

Text Extraction and Summarization from Web News (웹 뉴스의 기사 추출과 요약)

  • Han, Kwang-Rok;Sun, Bok-Keun;Yoo, Hyoung-Sun
    • Journal of the Korea Society of Computer and Information / v.12 no.5 / pp.1-10 / 2007
  • Much of the information provided on the web, including news content, contains unnecessary clutter, which makes it difficult to build automated information processing systems for tasks such as document summarization, extraction, and retrieval. We propose a system that extracts and summarizes news content from the web. The extraction system takes news content in HTML as input, builds an element tree similar to a DOM tree, and extracts text from that tree while removing clutter identified by the hyperlink attribute in HTML tags. The extracted text is passed to the summarization system, which extracts key sentences from it; we implement the summarization system using a co-occurrence relation graph. The resulting summaries are expected to be deliverable to PDAs or cellular phones via message services such as SMS.
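The two stages can be approximated with the Python standard library. This is a hedged sketch, not the authors' system: instead of building an element tree it streams tags through `html.parser.HTMLParser`, treats text inside `<a href=...>` (and `<script>`/`<style>`) as clutter, and then scores each sentence by the summed degree of its words in a word co-occurrence graph, a simple scoring rule assumed here for illustration. `extract_text` and `summarize` are hypothetical names.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser
from itertools import combinations

class ClutterFreeText(HTMLParser):
    """Collects page text, discarding text inside <a href=...>, <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.skipping = []  # stack of open tags whose text is treated as clutter
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style") or (tag == "a" and dict(attrs).get("href")):
            self.skipping.append(tag)

    def handle_endtag(self, tag):
        if self.skipping and self.skipping[-1] == tag:
            self.skipping.pop()

    def handle_data(self, data):
        if not self.skipping and data.strip():
            self.chunks.append(data.strip())

def extract_text(html):
    parser = ClutterFreeText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)

def summarize(text, k=1):
    """Rank sentences by the summed degree of their words in a co-occurrence graph."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
    words_of = [set(re.findall(r"[a-z]+", s.lower())) for s in sentences]
    neighbours = defaultdict(set)  # edge = two words appear in the same sentence
    for words in words_of:
        for a, b in combinations(sorted(words), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    scores = [sum(len(neighbours[w]) for w in words) for words in words_of]
    top = sorted(range(len(sentences)), key=lambda i: -scores[i])[:k]
    return [sentences[i] for i in sorted(top)]  # keep document order

page = ('<div><a href="/home">Home</a>'
        '<p>Flood hits the city. The city mayor opens flood shelters. Traffic is slow.</p>'
        '<a href="/ad">Buy now</a></div>')
text = extract_text(page)
print(summarize(text, k=1))  # ['The city mayor opens flood shelters.']
```

Summed degree favours longer, well-connected sentences; normalizing by sentence length or dropping stop words would be natural refinements.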

An Ontology-based Knowledge Management System - Integrated System of Web Information Extraction and Structuring Knowledge -

  • Mima, Hideki;Matsushima, Katsumori
    • Proceedings of the CALSEC Conference / 2005.03a / pp.55-61 / 2005
  • We introduce a new web-based knowledge management system, currently in progress, that combines XML-based web information extraction with our knowledge structuring technologies through ontology-based natural language processing. Our aim is to provide efficient access to heterogeneous information on the web, enabling users to draw effortlessly on a wide range of textual and non-textual resources, such as newspapers and databases, and thereby accelerate knowledge acquisition from them. To achieve efficient knowledge management, we first propose an XML-based web information extraction method that includes a sophisticated control language for extracting data from web pages. Because the system uses standard XML technologies, our approach makes information extraction easy by a) separating rules from processing, b) restricting the processing target, and c) supporting interactive operations for developing extraction rules. We then propose a knowledge structuring system that includes 1) automatic term recognition, 2) domain-oriented automatic term clustering, 3) similarity-based document retrieval, 4) real-time document clustering, and 5) visualization. The system supports the integration of different types of databases (textual and non-textual) and the simultaneous retrieval of different types of information. By further explaining the specification and implementation techniques of the system, we demonstrate how it can accelerate knowledge acquisition on the web, even for users who are new to the field.
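Point a), separating extraction rules from processing, can be illustrated with a small hypothetical example: the rules live in a plain dictionary of ElementTree path expressions (the limited XPath subset that Python's standard `xml.etree.ElementTree` supports), and one generic function applies any rule set to a well-formed XML or XHTML page. This stands in for, and is much simpler than, the control language the authors describe; `RULES` and `apply_rules` are invented names.

```python
import xml.etree.ElementTree as ET

# Hypothetical extraction rules, kept as data separate from the processing code:
# output field -> ElementTree path expression (a limited XPath subset)
RULES = {
    "title": "./head/title",
    "headlines": ".//div[@class='news']/h2",
}

def apply_rules(xml_text, rules):
    """Generic processor: evaluate every rule against one well-formed XML/XHTML page."""
    root = ET.fromstring(xml_text)
    return {field: [(e.text or "").strip() for e in root.findall(path)]
            for field, path in rules.items()}

page = ("<html><head><title>Daily News</title></head><body>"
        "<div class='news'><h2>Port reopens</h2><h2>Rates hold</h2></div>"
        "<div class='ad'><h2>Big sale</h2></div>"
        "</body></html>")
print(apply_rules(page, RULES))
# {'title': ['Daily News'], 'headlines': ['Port reopens', 'Rates hold']}
```

Adapting to a new page layout then means editing `RULES` only, while `apply_rules` stays unchanged, which is the point of detaching rules from processing.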

Object Extraction technique Using Belief Propagation Stereo Algorithm of Bidirectional Search based on Brightness (밝기기반 양방향 탐색기법의 신뢰전파 스테레오 알고리즘을 이용한 물체 추출 기법)

  • Choi, Young-Seok;Choi, Kyung-Seok;Kang, Hyun-Soo
    • Proceedings of the IEEK Conference / 2007.07a / pp.313-314 / 2007
  • In this paper, we propose a robust object extraction algorithm that takes advantage of an efficient belief propagation method. In an initial depth map computed from forward-direction disparity information, disparity cannot be obtained for some parts of the object area, such as uniform regions and occluded regions. Therefore, we additionally use backward disparity information and brightness information in parallel to extract objects reliably.
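The role of the backward disparity map can be illustrated with a standard left-right consistency check on a single scanline. This is a generic technique swapped in for illustration, not the authors' belief propagation algorithm: disparities are assumed to be integers, a left-view pixel is kept when the right-view disparity at its match agrees within `tol`, and each rejected pixel borrows the disparity of the nearest kept neighbour whose brightness is closer to its own. The function names and toy data are hypothetical.

```python
def lr_check(disp_left, disp_right, tol=1):
    """Keep a left-view pixel only if the backward (right-view) disparity at its match agrees."""
    valid = []
    for x, d in enumerate(disp_left):
        xr = x - d  # matching pixel in the right view
        valid.append(0 <= xr < len(disp_right) and abs(disp_right[xr] - d) <= tol)
    return valid

def fill_by_brightness(disp_left, valid, brightness):
    """Give each rejected pixel the disparity of the nearest kept pixel to its left or
    right, choosing the side whose brightness is closer to its own."""
    n = len(disp_left)
    out = list(disp_left)
    for x in range(n):
        if valid[x]:
            continue
        left = next((i for i in range(x - 1, -1, -1) if valid[i]), None)
        right = next((i for i in range(x + 1, n) if valid[i]), None)
        candidates = [i for i in (left, right) if i is not None]
        if candidates:
            best = min(candidates, key=lambda i: abs(brightness[i] - brightness[x]))
            out[x] = disp_left[best]
    return out

def object_mask(disp, min_disparity):
    """Pixels nearer than the background (larger disparity) are labelled as the object."""
    return [d >= min_disparity for d in disp]

# Toy scanline: dark background (disparity 0), bright object (disparity 2);
# pixels 2 and 3 are occluded in the right view and got garbage forward disparities.
disp_l = [0, 0, 5, 7, 2, 2, 2, 2]
disp_r = [0, 0, 2, 2, 2, 2, 0, 0]
brightness = [50, 50, 50, 50, 200, 200, 200, 200]
valid = lr_check(disp_l, disp_r)
print(valid)  # [True, True, False, False, True, True, True, True]
print(object_mask(fill_by_brightness(disp_l, valid, brightness), 1))
# [False, False, False, False, True, True, True, True]
```

Filling pixel 3 from its nearest valid neighbour alone would copy the object's disparity and let the object leak into the background; the brightness comparison is what keeps it on the background side.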