• Title/Summary/Keyword: schema extraction

Efficient Structural Information Extraction for XML Data (XML데이터를 위한 효율적인 구조 정보 추출 기법)

  • Min, Jun-Ki
    • The KIPS Transactions: Part D / v.14D no.3 s.113 / pp.285-292 / 2007
  • Interest in XML has been growing since it emerged as the standard for data representation and exchange on the Web. Structural information about XML documents serves several important purposes, yet a schema is not mandatory for XML documents. Consequently, much research has addressed extracting structural information from XML documents. In this paper, we present a technique for efficiently extracting a concise and accurate DTD from XML documents. We achieve efficiency and conciseness by restricting the DTD content model, drawing on the mixed content model of DTD and XML Schema, and by applying heuristic rules proposed in this paper. Experiments with real-life DTDs show that our approach is superior to existing approaches.
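
As a rough illustration of what extracting a DTD from sample documents involves, the sketch below infers one element declaration per tag. It is a naive baseline and not the paper's algorithm: the function name `infer_dtd` and the occurrence heuristic (children listed in first-seen order, with `?`, `+`, or `*` chosen from the minimum and maximum counts observed) are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def infer_dtd(xml_texts):
    """Naively infer one <!ELEMENT> declaration per tag from sample documents."""
    counts = defaultdict(list)    # tag -> one {child_tag: count} dict per instance
    order = defaultdict(list)     # tag -> child tags in first-seen order
    has_text = defaultdict(bool)  # tag -> True if any instance holds text
    for text in xml_texts:
        for elem in ET.fromstring(text).iter():
            per_instance = defaultdict(int)
            for child in elem:
                per_instance[child.tag] += 1
                if child.tag not in order[elem.tag]:
                    order[elem.tag].append(child.tag)
            counts[elem.tag].append(dict(per_instance))
            if elem.text and elem.text.strip():
                has_text[elem.tag] = True

    suffixes = {(False, False): "", (True, False): "?",
                (False, True): "+", (True, True): "*"}
    decls = []
    for tag, instances in counts.items():
        children = order[tag]
        if not children:
            model = "(#PCDATA)" if has_text[tag] else "EMPTY"
        else:
            # Simplification: text mixed with child elements is ignored here.
            parts = []
            for child in children:
                occ = [inst.get(child, 0) for inst in instances]
                optional, repeated = min(occ) == 0, max(occ) > 1
                parts.append(child + suffixes[(optional, repeated)])
            model = "(" + ", ".join(parts) + ")"
        decls.append(f"<!ELEMENT {tag} {model}>")
    return decls


docs = [
    "<book><title>A</title><author>X</author><author>Y</author></book>",
    "<book><title>B</title></book>",
]
dtd = infer_dtd(docs)
```

A sequence model in first-seen order can over-constrain documents whose children appear in varying order; heuristics for handling such cases are where real extraction techniques differ.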

Web Information Extraction using HTML Tag Pattern (HTML 태그페턴을 이용한 웹정보추출시스템)

  • Park, Byung-Kwon
    • Proceedings of the Korea Association of Information Systems Conference / 2005.05a / pp.79-92 / 2005
  • To query the vast number of web pages available on the Internet, it is necessary to extract the information encoded in the pages and convert it into structured data (e.g., relational data for SQL) or semistructured data (e.g., XML data for XQuery). In this paper, we propose a new web information extraction system, PIES, which converts web information into XML documents. PIES is based on a user-specified target schema and HTML tag pattern descriptions. Web information is extracted by the pattern descriptions and validated against the target schema. We designed a new language for describing extraction rules and a new regular expression for describing HTML tag patterns. We implemented PIES and applied it to the US patent web site to evaluate its correctness. It successfully extracted thousands of US patent records and converted them into XML documents.
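
The extract-then-validate flow described above can be sketched with Python's standard `re` module standing in for the PIES rule language, which the abstract does not show. The row pattern, the field names, and the "all required fields non-empty" check are hypothetical stand-ins for a tag pattern description and a target schema.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical tag pattern: a table row holding a patent number and a title.
ROW_PATTERN = re.compile(
    r"<tr>\s*<td>(?P<number>[^<]+)</td>\s*<td>(?P<title>[^<]+)</td>\s*</tr>",
    re.IGNORECASE,
)
# Hypothetical target schema: child elements every record must contain.
TARGET_FIELDS = ("number", "title")


def extract_records(html):
    """Extract rows matching ROW_PATTERN and return them as an XML tree."""
    root = ET.Element("patents")
    for match in ROW_PATTERN.finditer(html):
        fields = {k: v.strip() for k, v in match.groupdict().items()}
        # Validation step: drop records with an empty required field.
        if not all(fields.get(name) for name in TARGET_FIELDS):
            continue
        record = ET.SubElement(root, "patent")
        for name in TARGET_FIELDS:
            ET.SubElement(record, name).text = fields[name]
    return root


html = (
    "<table><tr><td>1234567</td><td>Widget</td></tr>"
    "<tr><td> </td><td>Blank</td></tr></table>"
)
xml_out = ET.tostring(extract_records(html), encoding="unicode")
```

Regular expressions over raw HTML are brittle on real pages; the sketch only shows how a pattern description and a schema check fit together.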

The Levelized Schema Extraction in XML Documents (XML 문서에서의 단계화된 스키마 추출)

  • 김성림;윤용익
    • Journal of Korea Multimedia Society / v.5 no.1 / pp.105-113 / 2002
  • XML documents, which are becoming a new standard for expressing and exchanging data on the Internet, do not have a defined schema. Existing SQL or OQL cannot be applied directly to XML documents. Research on extracting schemas for XML documents and on query languages is actively under way. For a user's query, the results may be too many or too few, so it is important to return an adequate set of results. This paper proposes a way to extract multiple leveled schemas according to the frequency of element occurrence in XML documents. The schema can be reduced or extended to match users' queries more flexibly.
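
One way to picture frequency-based schema levels is to record how often each element path occurs across a document collection and keep only the paths above a threshold. This is a minimal sketch under that assumption; the function names and the use of document frequency as the measure are illustrative and not taken from the paper.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def path_frequencies(xml_texts):
    """Return the fraction of documents in which each element path occurs."""
    doc_counts = Counter()
    for text in xml_texts:
        seen = set()

        def walk(elem, prefix):
            path = f"{prefix}/{elem.tag}"
            seen.add(path)
            for child in elem:
                walk(child, path)

        walk(ET.fromstring(text), "")
        doc_counts.update(seen)  # each path counted once per document
    return {path: n / len(xml_texts) for path, n in doc_counts.items()}


def leveled_schema(freqs, threshold):
    """Keep only the paths whose document frequency reaches the threshold."""
    return sorted(p for p, f in freqs.items() if f >= threshold)


docs = ["<a><b/><c/></a>", "<a><b/></a>"]
freqs = path_frequencies(docs)
strict = leveled_schema(freqs, 1.0)   # paths present in every document
relaxed = leveled_schema(freqs, 0.5)  # also admits less frequent paths
```

Lowering the threshold extends the schema and raising it reduces the schema, which mirrors the reduce-or-extend behaviour the abstract mentions.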

The Schema Extraction Method for GA Preserving Diversity of the Distributions in Population (개체 분포의 다양성을 유지시키는 GA를 위한 스키마 추출 기법)

  • Jo, Yong-Gun;Jang, Sung-Hwan;Kang, Hoon
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2000.05a / pp.232-235 / 2000
  • In this paper, we introduce a new genetic reordering operator, based on the concept of schema, to solve the Traveling Salesman Problem (TSP). Because TSP is a well-known combinatorial optimization problem in the NP-complete class, a huge solution space must be searched. For robustness to local minima, the operator separates selected strings into two parts to reduce the probability of destroying good building blocks. It applies inversion to the schema part to prevent premature convergence while searching new regions of the solution space. In addition, inversion is also applied to the non-schema part for robustness to local minima. In this way, we preserve the diversity of the population distribution and make the GA adaptive to dynamic environments.
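
The inversion step mentioned above can be illustrated on a TSP tour encoded as a permutation of city indices. The sketch shows plain segment inversion only; how the paper selects the schema and non-schema parts is not described in the abstract, so the segment bounds here are arbitrary and the helper names are assumptions.

```python
import random


def tour_length(tour, dist):
    """Total length of a closed tour under the distance matrix dist."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))


def invert_segment(tour, start, end):
    """Return a copy of tour with the slice tour[start:end] reversed."""
    return tour[:start] + tour[start:end][::-1] + tour[end:]


def random_inversion(tour, rng):
    """Invert a randomly chosen segment; the result is still a permutation."""
    start, end = sorted(rng.sample(range(len(tour) + 1), 2))
    return invert_segment(tour, start, end)


# Four cities on a unit square, with distances measured along the grid.
dist = [
    [0, 1, 2, 1],
    [1, 0, 1, 2],
    [2, 1, 0, 1],
    [1, 2, 1, 0],
]
crossed = [0, 2, 1, 3]
fixed = invert_segment(crossed, 1, 3)
mutated = random_inversion(crossed, random.Random(0))
```

Inversion keeps every city in the tour exactly once, so no repair step is needed after the operator is applied.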

Extraction of Relational Schema from XML Schema (XML 스키마로부터 관계형 스키마 추출 기법)

  • 김은욱;민미경
    • Proceedings of the Korea Multimedia Society Conference / 2002.11b / pp.351-354 / 2002
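  • As XML grows in importance as a data format, methods for storing XML documents are being actively studied. One such approach uses a schema to store XML documents in a relational database. Research so far has centered on DTDs, but the advent of XML Schema compensates for the shortcomings of DTDs and allows a representation closer to that of existing relational databases. This paper presents a technique for extracting a relational schema from an XML Schema. Building on techniques for extracting a relational schema from a DTD, the proposed technique can express various characteristics of XML Schema attributes and elements, for example by additionally supporting user-defined data types that DTDs cannot express.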
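
A minimal sketch of mapping an XML Schema to relational tables, under strong simplifying assumptions: each top-level element with a complex type becomes one table, its child elements and attributes become columns, and built-in types map to SQL types through a small hypothetical lookup table. The paper's actual mapping rules, including its handling of user-defined types, are not reproduced here.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"
# Hypothetical type mapping; assumes the schema uses the "xs" prefix.
SQL_TYPES = {"xs:string": "VARCHAR(255)", "xs:integer": "INTEGER", "xs:date": "DATE"}


def xsd_to_tables(xsd_text):
    """Emit one CREATE TABLE statement per top-level complex element."""
    root = ET.fromstring(xsd_text)
    statements = []
    for element in root.findall(f"{XS}element"):
        columns = []
        children = element.findall(f"{XS}complexType/{XS}sequence/{XS}element")
        attributes = element.findall(f"{XS}complexType/{XS}attribute")
        for node in children + attributes:
            sql_type = SQL_TYPES.get(node.get("type"), "TEXT")
            columns.append(f"{node.get('name')} {sql_type}")
        if columns:
            table = element.get("name")
            statements.append(f"CREATE TABLE {table} ({', '.join(columns)});")
    return statements


xsd = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="pages" type="xs:integer"/>
      </xs:sequence>
      <xs:attribute name="isbn" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""
tables = xsd_to_tables(xsd)
```

Nested complex types, repeated elements, and keys would need extra tables and foreign keys, which this sketch leaves out.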

Design of Formalized message exchanging method using XMDR (XMDR을 이용한 정형화된 메시지 교환 기법 설계)

  • Hwang, Chi-Gon;Jung, Kye-Dong;Choi, Young-Keun
    • Journal of the Korea Institute of Information and Communication Engineering / v.12 no.6 / pp.1087-1094 / 2008
  • Recently, XML has been widely used as a standard for data exchange, and XML documents have tended to grow larger. Data transfer can cause problems because of increased traffic, especially when massive data such as a data warehouse is collected and analyzed. An XMDR wrapper can address this problem: it analyzes the tree structure of an XML Schema, regenerates the XML Schema from the analyzed tree structure, and sends it to each station together with an XMDR Query. The XML documents returned as results encode their XML tags according to the XML Schema and are sent as standardized messages. Because the formalized XML documents reduce network traffic and carry XML class information, they are efficient for extracting, converting, and aligning data. Their standardized form also makes conversion through XSLT efficient. In this paper, we propose a method in which the XML Schema and XMDR_Query sent to each station are generated through XMDR (extended Meta-Data Registry), and the generation of products and XML conversion occur in each station's wrapper.

Knowledge Extraction Methodology and Framework from Wikipedia Articles for Construction of Knowledge-Base (지식베이스 구축을 위한 한국어 위키피디아의 학습 기반 지식추출 방법론 및 플랫폼 연구)

  • Kim, JaeHun;Lee, Myungjin
    • Journal of Intelligence and Information Systems / v.25 no.1 / pp.43-61 / 2019
  • Artificial intelligence technologies have developed rapidly with the Fourth Industrial Revolution, and AI research has been actively conducted in fields such as autonomous vehicles, natural language processing, and robotics. Since the 1950s, this research has focused on solving cognitive problems related to human intelligence, such as learning and problem solving. Owing to recent interest in the technology and research on various algorithms, the field has achieved more technological progress than ever. The knowledge-based system is a sub-domain of artificial intelligence that aims to enable AI agents to make decisions using machine-readable, processable knowledge constructed from complex and informal human knowledge and rules in various fields. A knowledge base is used to optimize information collection, organization, and retrieval, and it has recently been combined with statistical artificial intelligence such as machine learning. More recently, knowledge bases have also been used to express, publish, and share knowledge on the web by describing and connecting web resources such as pages and data. They support intelligent processing in various fields of artificial intelligence, such as the question answering systems of smart speakers. However, building a useful knowledge base is time-consuming and still requires considerable expert effort. In recent years, much research and technology in knowledge-based artificial intelligence has used DBpedia, one of the largest knowledge bases, which aims to extract structured content from the various information in Wikipedia. DBpedia contains information extracted from Wikipedia such as titles, categories, and links, but the most useful knowledge comes from Wikipedia infoboxes, which are user-created summaries of some unifying aspect of an article. 
This knowledge is created by mapping rules between infobox structures and the DBpedia ontology schema, defined in the DBpedia Extraction Framework. Because it generates knowledge from semi-structured, user-created infobox data, DBpedia can expect high reliability in terms of accuracy. However, since only about 50% of all wiki pages in Korean Wikipedia contain an infobox, DBpedia is limited in terms of knowledge scalability. This paper proposes a method to extract knowledge from text documents according to the ontology schema using machine learning. To demonstrate the appropriateness of this method, we describe a knowledge extraction model that follows the DBpedia ontology schema and learns from Wikipedia infoboxes. Our knowledge extraction model consists of three steps: classifying documents into ontology classes, classifying the sentences suitable for extracting triples, and selecting values and transforming them into RDF triples. The structure of a Wikipedia infobox is defined by infobox templates that provide standardized information across related articles, and the DBpedia ontology schema can be mapped to these templates. Based on these mappings, we classify the input document according to infobox categories, which correspond to ontology classes. After determining the class of the input document, we classify the appropriate sentences according to the attributes belonging to that class. Finally, we extract knowledge from the sentences classified as appropriate and convert it into triples. To train the models, we generated a training data set from a Wikipedia dump by adding BIO tags to sentences, and trained about 200 classes and about 2,500 relations for knowledge extraction. Furthermore, we conducted comparative experiments with CRF and Bi-LSTM-CRF for the knowledge extraction process. 
Through this proposed process, it is possible to utilize structured knowledge by extracting knowledge according to the ontology schema from text documents. In addition, this methodology can significantly reduce the effort of the experts to construct instances according to the ontology schema.
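
The final step of the pipeline, turning BIO-tagged sentences into triples, can be sketched as follows. The tagger itself (CRF or Bi-LSTM-CRF in the paper) is out of scope here: the tags are supplied by hand, and the function names, the `country` relation label, and the example sentence are assumptions made for illustration.

```python
def decode_bio(tokens, tags):
    """Group tokens tagged B-X / I-X into (label, text) spans."""
    spans, label, buffer = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if label:
                spans.append((label, " ".join(buffer)))
            label, buffer = tag[2:], [token]
        elif tag.startswith("I-") and label == tag[2:]:
            buffer.append(token)
        else:
            # "O" or an inconsistent I- tag closes any open span.
            if label:
                spans.append((label, " ".join(buffer)))
            label, buffer = None, []
    if label:
        spans.append((label, " ".join(buffer)))
    return spans


def to_triples(subject, spans):
    """Attach each (relation, value) span to the document's subject."""
    return [(subject, relation, value) for relation, value in spans]


tokens = ["Seoul", "is", "the", "capital", "of", "South", "Korea"]
tags = ["O", "O", "O", "O", "O", "B-country", "I-country"]
triples = to_triples("Seoul", decode_bio(tokens, tags))
```

In a full system the subject would come from the article being processed and the relation labels from the ontology properties of its class.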

Designing Schemes to Associate Basic Semantics Register with RDF/OWL (기본의미등록기의 RDF/OWL 연계방안에 관한 연구)

  • Oh, Sam-Gyun
    • Journal of the Korean Society for Information Management / v.20 no.3 / pp.241-259 / 2003
  • The Basic Semantic Register (BSR) is an official ISO register designed for interoperability among eBusiness and EDI systems. The entities registered in the current BSR are not defined in a machine-understandable way, which makes automatic extraction of structural and relationship information from the register impossible. The purpose of this study is to offer a framework for designing an ontology that provides semantic interoperability among BSR-based systems by defining data structures and relationships with RDF and OWL: similar meaning is expressed by the 'equivalentClass' construct in OWL; hierarchical relationships among classes by the 'subClassOf' construct in RDF Schema; definitions of BSR entities by the 'label' construct; usage guidelines by the 'comment' construct; assignment of classes to BSUs by the 'domain' construct; and data types of BSUs by the 'range' construct. Hierarchical relationships among properties in BSR can be expressed using 'subPropertyOf' in RDF Schema. Progress in semantic interoperability can be expected among BSR-based systems through the applications of semantic web technology suggested in this study.
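
To make the listed constructs concrete, the sketch below writes one triple per construct and serializes them as Turtle. The `rdfs:` and `owl:` terms are the standard vocabulary; the `bsr:` namespace and every entity name under it are hypothetical examples, not entries from the actual register.

```python
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "bsr": "http://example.org/bsr#",  # hypothetical namespace
}

# One illustrative triple per construct named in the abstract.
TRIPLES = [
    ("bsr:Address", "owl:equivalentClass", "bsr:Location"),
    ("bsr:PostalAddress", "rdfs:subClassOf", "bsr:Address"),
    ("bsr:Address", "rdfs:label", '"Address"'),
    ("bsr:Address", "rdfs:comment", '"Use for any physical delivery point."'),
    ("bsr:streetName", "rdfs:domain", "bsr:Address"),
    ("bsr:streetName", "rdfs:range", "xsd:string"),
    ("bsr:streetName", "rdfs:subPropertyOf", "bsr:addressLine"),
]


def to_turtle(prefixes, triples):
    """Serialize prefix declarations and triples as Turtle text."""
    lines = [f"@prefix {name}: <{iri}> ." for name, iri in prefixes.items()]
    lines.append("")
    lines += [f"{s} {p} {o} ." for s, p, o in triples]
    return "\n".join(lines)


turtle = to_turtle(PREFIXES, TRIPLES)
```

A real ontology would be built and validated with an RDF library; plain string output is used here only to keep the example dependency-free.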

Product Information Extraction System Based on STEP in CPC Environment (협업적 제품 거래 환경에서 STEP 기반의 제품정보 추출 시스템)

  • Keem, Joon-Hyoung;Park, Sang-Ho;Kim, Hyun
    • Proceedings of the KSME Conference / 2003.11a / pp.1840-1845 / 2003
  • Collaborative product commerce (CPC) supports collaboration in which global enterprises and customers involved in a product's life cycle share product information and collaboration processes, and it integrates applications. In this paper, we use a common data schema to solve the interoperability problem of product information shared between enterprises, and we map each differing data format to the common data schema. We therefore implement a CPC Adaptor to integrate distributed product information.

Product Information Extraction System Based on STEP in CPC Environment (협업적 제품 거래 환경에서 STEP 기반의 제품정보 추출 시스템)

  • Park, Sang-Ho;Keem, Joon-Hyoung;Kim, Hyun
    • Transactions of the Korean Society of Mechanical Engineers A / v.28 no.5 / pp.648-653 / 2004
  • Collaborative product commerce (CPC) supports collaboration in which global enterprises and customers involved in a product's life cycle share product information and collaboration processes, and it integrates applications. In this paper, we use a common data schema to solve the interoperability problem of product information shared between enterprises, and we map each differing data format to the common data schema. We therefore implement a CPC Adaptor to integrate distributed product information.