• Title/Summary/Keyword: RDF Triples


Conversion of Large RDF Data using Hash-based ID Mapping Tables with MapReduce Jobs (맵리듀스 잡을 사용한 해시 ID 매핑 테이블 기반 대량 RDF 데이터 변환 방법)

  • Kim, InA;Lee, Kyu-Chul
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.10a / pp.236-239 / 2021
  • With the growth of AI technology, the scale of Knowledge Graphs continues to expand. Knowledge Graphs are mainly expressed in RDF, which consists of connected triples. Many RDF stores compress RDF triples by transforming them into condensed IDs. However, converting a large volume of RDF triples incurs high processing time and memory overhead because a large ID mapping table must be searched. In this paper, we propose a method for converting RDF triples using hash-based ID mapping tables with MapReduce, a software framework for parallel, distributed processing. The proposed method not only transforms RDF triples into integer-based IDs but also improves conversion speed and memory overhead. In our experiment on LUBM, the proposed method reduced the dataset size by about 3.8 times and completed the conversion in about 106 seconds.

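A minimal sketch of the hash-based ID assignment idea from the abstract above, written as plain Python rather than an actual MapReduce job. The helper names (term_to_id, convert_triple), the SHA-1 hash, and the 64-bit ID width are illustrative assumptions, not details taken from the paper; the point is only that a hash lets each worker derive a triple's integer IDs without consulting one large shared mapping table.

```python
import hashlib


def term_to_id(term: str, bits: int = 64) -> int:
    """Derive a deterministic integer ID from an RDF term (hypothetical helper)."""
    digest = hashlib.sha1(term.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


def convert_triple(line: str):
    """Convert one simple N-Triples line into an (s_id, p_id, o_id) record."""
    s, p, o = line.rstrip(" .\n").split(None, 2)   # naive split; ignores corner cases
    return term_to_id(s), term_to_id(p), term_to_id(o)


if __name__ == "__main__":
    triple = "<http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/bob> ."
    print(convert_triple(triple))   # same three IDs on every worker, no lookup table needed
```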

RDF Triple Processing Methodology for Web Service in Semantic Web Environment (시맨틱 웹 환경에서 웹 서비스를 위한 RDF Triple 처리기법)

  • Jeong Kwan-Ho;Kim Pan-Koo;Kim Kweon-Cheon
    • Journal of Internet Computing and Services / v.7 no.2 / pp.9-21 / 2006
  • Research on enhancing web service search using ontology concepts has been ongoing. One such approach proposes a search method for UDDI using DAML and DAML+OIL. However, this approach has the inconvenience of requiring operations suited to each circumstance and of defining a separate ontology for each of them. To address these problems, we introduce an effective method for handling N-Triples that filters core triples, merges triples, builds semantic connections between triples, and searches triples to retrieve information and recommend results in a semantic web environment. Furthermore, we implement the proposed method in a system to test it. Finally, we test the system in a virtual semantic web environment to analyze our research.

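A rough, self-contained sketch (not the paper's system) of three of the N-Triple handling steps listed in the abstract: filtering triples by predicate, merging triple sets, and searching triples by subject. The function names are invented for the example.

```python
def parse_ntriples(text: str):
    """Very simplified N-Triples reader: one triple per line, comments skipped."""
    triples = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        s, p, o = line.rstrip(" .").split(None, 2)
        triples.add((s, p, o))
    return triples


def filter_by_predicate(triples, predicate):
    return {t for t in triples if t[1] == predicate}


def merge(*triple_sets):
    merged = set()
    for ts in triple_sets:
        merged |= ts                 # set union drops duplicate triples
    return merged


def search_by_subject(triples, subject):
    return [t for t in triples if t[0] == subject]
```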

The Basic Concepts Classification as a Bottom-Up Strategy for the Semantic Web

  • Szostak, Rick
    • International Journal of Knowledge Content Development & Technology / v.4 no.1 / pp.39-51 / 2014
  • The paper proposes that the Basic Concepts Classification (BCC) could serve as the controlled vocabulary for the Semantic Web. The BCC uses a synthetic approach among classes of things, relators, and properties. These are precisely the sort of concepts required by RDF triples. The BCC also addresses some of the syntactic needs of the Semantic Web. Others could be added to the BCC in a bottom-up process that carefully evaluates the costs, benefits, and best format for each rule considered.

Structural Change Detection Technique for RDF Data in MapReduce (맵리듀스에서의 구조적 RDF 데이터 변경 탐지 기법)

  • Lee, Taewhi;Im, Dong-Hyuk
    • KIPS Transactions on Software and Data Engineering / v.3 no.8 / pp.293-298 / 2014
  • Detecting and understanding the changes between RDF datasets is crucial for evolution management, synchronization, and versioning on the web of data. However, existing change detection approaches remain unsatisfactory in that they neither consider large-scale RDF data nor produce accurate RDF deltas. In this paper, we propose a scalable and effective change detection method using the MapReduce framework, which has been used in many fields to process and analyze large volumes of data. In particular, we focus on structure-based change detection that adopts a strategy for comparing blank nodes in RDF data. To achieve this, we employ a method composed of two MapReduce jobs. The first job partitions the triples containing blank nodes by grouping triples with the same blank node ID and then computes the incoming path to each blank node. The second job partitions the triples with the same path and matches blank nodes using the Hungarian method. Our experiments show that the proposed approach is more accurate and effective than the previous approach.
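A condensed sketch of the matching step only, outside MapReduce: blank nodes from the old and new graphs that share the same incoming path are paired by solving an assignment problem over a triple-overlap cost matrix. scipy's linear_sum_assignment stands in for the Hungarian method, and the data layout (a dict from blank node ID to its set of triples, with the node's own label already masked out) is an assumption made for the example.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_blank_nodes(old_nodes: dict, new_nodes: dict):
    """Pair blank nodes so that the total triple-set difference is minimal."""
    old_ids, new_ids = list(old_nodes), list(new_nodes)
    cost = np.zeros((len(old_ids), len(new_ids)))
    for i, o in enumerate(old_ids):
        for j, n in enumerate(new_ids):
            # cost = size of the symmetric difference between the two triple sets
            cost[i, j] = len(old_nodes[o] ^ new_nodes[n])
    rows, cols = linear_sum_assignment(cost)
    return [(old_ids[i], new_ids[j]) for i, j in zip(rows, cols)]
```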

An Efficient Indexing Scheme Considering the Characteristics of Large Scale RDF Data (대규모 RDF 데이터의 특성을 고려한 효율적인 색인 기법)

  • Kim, Kiyeon;Yoon, Jonghyeon;Kim, Cheonjung;Lim, Jongtae;Bok, Kyoungsoo;Yoo, Jaesoo
    • The Journal of the Korea Contents Association / v.15 no.1 / pp.9-23 / 2015
  • In this paper, we propose a new RDF indexing scheme that considers the characteristics of large-scale RDF data to improve query processing performance. Because subjects and objects are used redundantly across RDF triples, the proposed scheme creates a single S-O index for subjects and objects. To reduce the total index size, it builds a separate P index for the relatively small number of predicates. If a query specifies a predicate, the P index is searched first, since it is much smaller than the S-O index; otherwise, the S-O index is searched first. Performance evaluation shows that the proposed scheme outperforms the existing scheme in terms of query processing time.
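A toy in-memory version of the two-index idea (not the authors' storage structures): an S-O index over terms appearing as subject or object, a smaller P index over predicates, and a lookup routine that probes the P index first whenever the query pattern binds a predicate.

```python
from collections import defaultdict


def build_indexes(triples):
    so_index = defaultdict(set)   # term used as subject or object -> triples
    p_index = defaultdict(set)    # predicate -> triples
    for s, p, o in triples:
        so_index[s].add((s, p, o))
        so_index[o].add((s, p, o))
        p_index[p].add((s, p, o))
    return so_index, p_index


def lookup(so_index, p_index, s=None, p=None, o=None):
    """Probe the smaller P index first when the pattern contains a predicate."""
    if p is not None:
        candidates = p_index.get(p, set())
    else:
        candidates = so_index.get(s if s is not None else o, set())
    return [t for t in candidates
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]
```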

A Trustworthiness Improving Link Evaluation Technique for LOD considering the Syntactic Properties of RDFS, OWL, and OWL2 (RDFS, OWL, OWL2의 문법특성을 고려한 신뢰향상적 LOD 연결성 평가 기법)

  • Park, Jaeyeong;Sohn, Yonglak
    • Journal of KIISE:Databases / v.41 no.4 / pp.226-241 / 2014
  • LOD (Linked Open Data) consists of RDF triples that are based on ontologies. They are identified, linked, and accessed under the principles of linked data. Publishing LOD datasets extends the LOD cloud and ultimately advances the web of data. However, if ontologically identical things in different LOD datasets are identified by different URIs, it is difficult to recognize their sameness and to provide trustworthy links between them. To solve this problem, we suggest a Trustworthiness Improving Link Evaluation (TILE) technique. TILE evaluates links in four steps. Step 1 considers the inference properties of syntactic elements in the LOD dataset and generates RDF triples that previously existed only implicitly. In Step 2, TILE appoints predicates, compares their objects across triples, and evaluates the links between the subjects of those triples. In Step 3, TILE evaluates each predicate's syntactic properties from the standpoints of subject description and vocabulary definition and adjusts the evaluation results of Step 2 accordingly. The syntactic elements TILE considers include RDFS, OWL, and OWL2, which are recommended by the W3C. Finally, TILE has the publisher of the LOD dataset review the evaluation results and decide whether to re-evaluate or finalize the links, so that the publisher's responsibility is reflected in the trustworthiness of the links among the published data.
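A small illustration, under simplifying assumptions, of the kind of implicit triples Step 1 makes explicit before links are evaluated: only the RDFS rule "if (x rdf:type C1) and (C1 rdfs:subClassOf C2) then (x rdf:type C2)" is applied here, one rule out of the RDFS/OWL/OWL2 vocabulary the paper considers.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def materialize_types(triples):
    """Return the input triples plus rdf:type triples implied by rdfs:subClassOf."""
    triples = set(triples)
    changed = True
    while changed:                                    # iterate to a fixed point
        changed = False
        subclass = {(s, o) for s, p, o in triples if p == SUBCLASS}
        for s, p, o in list(triples):
            if p != RDF_TYPE:
                continue
            for c1, c2 in subclass:
                if o == c1 and (s, RDF_TYPE, c2) not in triples:
                    triples.add((s, RDF_TYPE, c2))
                    changed = True
    return triples
```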

Efficient Change Detection between RDF Models Using Backward Chaining Strategy (후방향 전진 추론을 이용한 RDF 모델의 효율적인 변경 탐지)

  • Im, Dong-Hyuk;Kim, Hyoung-Joo
    • Journal of KIISE:Computing Practices and Letters / v.15 no.2 / pp.125-133 / 2009
  • RDF is widely used as the ontology language for representing metadata on the semantic web. Since an ontology models the real world, it changes over time, so detecting and analyzing changes is very important in a knowledge base system. Earlier studies on detecting changes between RDF models focused on structural differences. Some techniques reduce the size of the delta by considering the RDFS entailment rules, but inferencing over RDF models increases data size and upload time. In this paper, we propose a new change detection method based on RDF reasoning that computes only the small part of the implied triples that is needed, using a backward chaining strategy. Experiments with real-life RDF datasets show that our approach detects changes efficiently.
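A minimal backward chaining sketch, not the paper's algorithm: instead of materializing every implied triple up front, entailment is checked on demand while computing the delta. Only one RDFS rule (type propagation along rdfs:subClassOf) is encoded, which is enough to show why the delta can skip triples the other model already entails.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def entails(triples, goal, seen=None):
    """Does the triple set entail goal = (s, p, o) under the single rule above?"""
    if goal in triples:
        return True
    seen = seen if seen is not None else set()
    s, p, o = goal
    if p != RDF_TYPE or goal in seen:
        return False
    seen.add(goal)
    # backward step: (s rdf:type o) holds if s has some type C with C subClassOf o
    for c1, pp, c2 in triples:
        if pp == SUBCLASS and c2 == o and entails(triples, (s, RDF_TYPE, c1), seen):
            return True
    return False


def delta_inserts(old_triples, new_triples):
    """Triples of the new model that the old model neither stores nor entails."""
    return {t for t in new_triples if not entails(old_triples, t)}
```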

TripleDiff: an Incremental Update Algorithm on RDF Documents in Triple Stores (TripleDiff: 트리플 저장소에서 RDF 문서에 대한 점진적 갱신 알고리즘)

  • Lee, Tae-Whi;Kim, Ki-Sung;Yoo, Sang-Won;Kim, Hyoung-Joo
    • Journal of KIISE:Databases / v.33 no.5 / pp.476-485 / 2006
  • The Resource Description Framework (RDF), which emerged with the semantic web, is becoming established as a standard for representing information about resources on the World Wide Web. Hence, much research on storing and querying RDF documents has been done, and several RDF storage systems, such as Sesame and Jena, have been developed. Research on updating RDF documents, however, is still insufficient. When an RDF document changes, the data in the RDF triple store also needs to be updated, but current triple stores do not support incremental update, so an update can be performed only by deleting the old version and then storing the new document. This method is very inefficient because RDF documents are updated continually, and it becomes even worse when several RDF documents are stored in the same database. In this paper, we propose an incremental update algorithm for RDF documents in triple stores. We apply a text matching technique to two versions of an RDF document and compensate for the text matching result to find the correct target triples to be updated. Experiments with real-life RDF datasets show that our approach updates RDF documents efficiently.
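A naive sketch of the incremental-update direction described above, not the TripleDiff algorithm itself: line-level text matching between two serializations of the same RDF document yields the triples to delete and insert, so the store can be patched instead of reloaded. The compensation step the abstract mentions (repairing imperfect text matches) is omitted.

```python
import difflib


def triple_delta(old_doc: str, new_doc: str):
    """Return (to_delete, to_insert) triple lines from a line-based diff."""
    old_lines = [l for l in old_doc.splitlines() if l.strip()]
    new_lines = [l for l in new_doc.splitlines() if l.strip()]
    to_delete, to_insert = [], []
    for line in difflib.ndiff(old_lines, new_lines):
        if line.startswith("- "):
            to_delete.append(line[2:])
        elif line.startswith("+ "):
            to_insert.append(line[2:])
    return to_delete, to_insert
```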

R2RML Based ShEx Schema

  • Choi, Ji-Woong
    • Journal of the Korea Society of Computer and Information / v.23 no.10 / pp.45-55 / 2018
  • R2RML is a W3C standard language that defines how to expose relational data as RDF triples. The output of an R2RML mapping is only an RDF dataset, which by definition has no schema. The lack of a schema makes such datasets in linked data portals impractical for integrating and analyzing data. To address this issue, we propose an approach for automatically generating schemas for RDF graphs populated by R2RML mappings. More precisely, we represent the schema in ShEx, a language for validating and describing RDF. Our approach generates ShEx schemas as well as RDF datasets from R2RML mappings. The resulting ShEx schema benefits both data providers and ordinary users: providers can verify and guarantee the structural integrity of the dataset against the schema, and users can write SPARQL queries efficiently by referring to it. This paper describes the data structures and algorithms of the system that derives ShEx documents from R2RML documents and presents a brief demonstration of its use.
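A hedged sketch of the derivation direction only: walk an R2RML mapping graph with rdflib and emit one ShEx shape per triples map, with a triple constraint per rr:predicateObjectMap. The real system covers far more of R2RML (templates, datatypes, joins, term types); the shape-naming convention and the "any value" constraints here are assumptions made for the example, not the paper's rules.

```python
from rdflib import Graph, Namespace

RR = Namespace("http://www.w3.org/ns/r2rml#")


def r2rml_to_shex(mapping_ttl: str) -> str:
    g = Graph()
    g.parse(data=mapping_ttl, format="turtle")
    shapes = []
    for tmap in g.subjects(RR.subjectMap, None):         # each triples map
        smap = g.value(tmap, RR.subjectMap)
        lines = []
        cls = g.value(smap, RR["class"])                  # rr:class, if declared
        if cls is not None:
            lines.append(f"  a [<{cls}>]")
        for pomap in g.objects(tmap, RR.predicateObjectMap):
            pred = g.value(pomap, RR.predicate)
            if pred is not None:
                lines.append(f"  <{pred}> .")             # "." = any value allowed
        shapes.append(f"<{tmap}Shape> {{\n" + " ;\n".join(lines) + "\n}")
    return "\n\n".join(shapes)
```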

An elastic distributed parallel Hadoop system for bigdata platform and distributed inference engines (동적 분산병렬 하둡시스템 및 분산추론기에 응용한 서버가상화 빅데이터 플랫폼)

  • Song, Dong Ho;Shin, Ji Ae;In, Yean Jin;Lee, Wan Gon;Lee, Kang Se
    • Journal of the Korean Data and Information Science Society / v.26 no.5 / pp.1129-1139 / 2015
  • The inference process generates additional triples from knowledge represented as RDF triples in semantic web technology. Tens of millions of triples, as initial big data, together with the additionally inferred triples become a knowledge base for applications such as QA (question and answer) systems. The inference engine requires more computing resources to process the triples generated during inferencing, and additional computing resources supplied from an underlying resource pool in cloud computing can shorten the execution time. This paper presents an algorithm that allocates the number of computing nodes elastically at runtime on Hadoop, depending on the size of the knowledge data fed in. The proposed model has a layered architecture: a top layer for applications, a middle layer for the distributed parallel inference engine that processes the triples, and a lower layer for elastic Hadoop and server virtualization. The system algorithms and test data are analyzed and discussed. The model has the benefit that rich legacy Hadoop applications can run faster on this system without any modification.
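A toy version of the elastic allocation decision described above, with made-up parameter names: the number of worker nodes requested from the virtualized pool grows with the size of the triple data fed to the inference job, clamped to the pool's capacity. The thresholds are illustrative, not figures from the paper.

```python
import math


def nodes_to_allocate(num_triples: int,
                      triples_per_node: int = 10_000_000,
                      min_nodes: int = 2,
                      max_nodes: int = 32) -> int:
    """Pick a cluster size roughly proportional to the knowledge base size."""
    wanted = math.ceil(num_triples / triples_per_node)
    return max(min_nodes, min(wanted, max_nodes))


# e.g. 85 million triples with the defaults -> ceil(8.5) = 9 nodes, within [2, 32]
```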