Scalable RDFS Reasoning Using the Graph Structure of In-Memory based Parallel Computing

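  • 전명중 (Department of Computer Engineering, Soongsil University)
  • 소치승 (Department of Computer Engineering, Soongsil University)
  • 바트셀렘 (Department of Computer Engineering, Soongsil University)
  • 김강필 (Department of Computer Engineering, Soongsil University)
  • 김진 (Department of Computer Engineering, Soongsil University)
  • 홍진영 (Department of Computer Engineering, Soongsil University)
  • 박영택 (Department of Computer Engineering, Soongsil University)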
  • Received : 2015.01.27
  • Accepted : 2015.06.05
  • Published : 2015.08.15

Abstract

Interest in RDFS inference as a means of building rich knowledge bases has grown in recent years. A single machine, however, cannot deliver adequate inference performance on large data sets, so RDFS inference engines for distributed computing environments are being actively studied. Existing distributed engines cannot process data in real time, are difficult to implement, and handle iterative tasks poorly. To overcome these problems, we propose a method for building an in-memory distributed inference engine that uses a parallel graph structure. An ontology expressed as triples is inherently a graph, so designing the inference engine around a graph structure is intuitive. In addition, RDFS inference rules can be implemented with operators that work on the graph structure, so the engine can be designed from the perspective of the graph rather than that of data tables. To evaluate the proposed engine, we measured inference speed on LUBM1000 (133 million triples, 17.9 GB) and LUBM3000 (413 million triples, 54.3 GB). In these experiments, the proposed in-memory distributed inference engine was about 10 times faster than a non-in-memory (in-storage) distributed inference engine.
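
The idea that RDFS rules can be expressed as operations on a graph can be illustrated with a small single-machine sketch. It is not the engine described in this paper, which is distributed and in-memory; it only shows how two standard RDFS entailment rules, rdfs11 (transitivity of `rdfs:subClassOf`) and rdfs9 (type inheritance along `rdfs:subClassOf`), reduce to following edges in a labelled graph. The function names `build_graph` and `rdfs_closure` and the `ex:` class and instance names are hypothetical, chosen for illustration.

```python
from collections import defaultdict

SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def build_graph(triples):
    """Index triples as labelled out-edges: graph[predicate][subject] -> set of objects."""
    graph = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        graph[p][s].add(o)
    return graph


def rdfs_closure(triples):
    """Apply rdfs11 and rdfs9 repeatedly until no new edge appears (a fixpoint)."""
    graph = build_graph(triples)
    sub = graph[SUBCLASS]   # class -> set of superclasses
    types = graph[TYPE]     # instance -> set of classes
    changed = True
    while changed:
        changed = False
        # rdfs11: C subClassOf D and D subClassOf E  =>  C subClassOf E
        for c in list(sub):
            for d in list(sub[c]):
                new = sub.get(d, set()) - sub[c]
                if new:
                    sub[c] |= new
                    changed = True
        # rdfs9: x type C and C subClassOf D  =>  x type D
        for x in list(types):
            for c in list(types[x]):
                new = sub.get(c, set()) - types[x]
                if new:
                    types[x] |= new
                    changed = True
    # Flatten the edge index back into a set of (subject, predicate, object) triples.
    return {(s, p, o)
            for p, by_subject in graph.items()
            for s, objects in by_subject.items()
            for o in objects}


# Hypothetical input: a two-level class hierarchy and one typed instance.
example = [
    ("ex:GraduateStudent", SUBCLASS, "ex:Student"),
    ("ex:Student", SUBCLASS, "ex:Person"),
    ("ex:alice", TYPE, "ex:GraduateStudent"),
]
closure = rdfs_closure(example)
inferred = closure - set(example)
for triple in sorted(inferred):
    print(triple)
```

Each rule here is a two-hop edge traversal, which is why a graph view of the triples is a natural fit. A distributed engine would need to partition the edges and exchange newly derived ones between workers, which this sketch does not attempt.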

Acknowledgement

Supported by: Institute for Information & communications Technology Promotion (IITP)
