Join Query Performance Optimization Based on Convergence Indexing Method

  • Received : 2020.11.11
  • Accepted : 2021.02.17
  • Published : 2021.02.28

Abstract

Since RDF (Resource Description Framework) triples are modeled as a graph, existing solutions from relational databases and XML technology cannot be adopted directly. To store, index, and query Linked Data more efficiently, we propose a convergence indexing method that combines the R*-tree and K-dimensional trees. The method uses a hybrid storage system based on HDD (Hard Disk Drive) and SSD (Solid State Drive) devices, together with separate filter and refinement index structures: the filter index discards unnecessary data, and the refinement index further refines the intermediate results. We compare performance using three standard join retrieval algorithms. The experimental results demonstrate that our method achieves remarkable performance compared with existing methods such as Quad and Darq.

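The filter-and-refinement idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. A flat list of bounding boxes stands in for the R*-tree and K-dimensional tree indexes, HDD/SSD placement is not modeled, and the hash join is simply one well-known join algorithm chosen for illustration. All names (`build_blocks`, `match`, `hash_join`) and the integer-encoded triples are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: (subject, predicate, object) triples encoded as integers.
TRIPLES = [
    (1, 10, 2), (2, 10, 3), (3, 10, 4),        # predicate 10, e.g. "knows"
    (1, 11, 100), (2, 11, 101), (4, 11, 102),  # predicate 11, e.g. "age"
]


def build_blocks(triples, block_size=2):
    """Sort triples, cut them into blocks, and record each block's bounding box."""
    ordered = sorted(triples)
    blocks = []
    for i in range(0, len(ordered), block_size):
        chunk = ordered[i:i + block_size]
        lo = tuple(min(t[d] for t in chunk) for d in range(3))
        hi = tuple(max(t[d] for t in chunk) for d in range(3))
        blocks.append((lo, hi, chunk))
    return blocks


def match(pattern, blocks):
    """Return triples matching a pattern; None marks an unbound position."""
    result = []
    for lo, hi, chunk in blocks:
        # Filter step: skip a block whose bounding box excludes a bound value.
        if any(p is not None and not (lo[d] <= p <= hi[d])
               for d, p in enumerate(pattern)):
            continue
        # Refinement step: exact test on every triple in a surviving block.
        result.extend(t for t in chunk
                      if all(p is None or p == t[d]
                             for d, p in enumerate(pattern)))
    return result


def hash_join(left, right, left_pos, right_pos):
    """Join two triple lists where left[left_pos] equals right[right_pos]."""
    table = defaultdict(list)
    for t in left:
        table[t[left_pos]].append(t)
    return [(l, r) for r in right for l in table.get(r[right_pos], [])]


blocks = build_blocks(TRIPLES)
# Query sketch: ?x 10 ?y . ?y 11 ?z  (join left object with right subject)
knows = match((None, 10, None), blocks)
ages = match((None, 11, None), blocks)
joined = hash_join(knows, ages, left_pos=2, right_pos=0)
```

In this sketch the filter step only compares a pattern against per-block minimum and maximum values, so it can discard whole blocks cheaply, while the refinement step pays the cost of exact comparison only for blocks that survive. A pattern with a bound subject, such as `match((4, None, None), blocks)`, shows the pruning most clearly because most blocks fall outside the subject range.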
