• Title/Summary/Keyword: Join Processing


Performance Comparison of Join Operations Parallelization by using GPGPU (GPGPU 기반 조인 연산 병렬화 성능 비교)

  • Lee, Jong-Sub;Lee, Sang-Back;Lee, Kyu-Chul
    • Database Research / v.34 no.3 / pp.28-44 / 2018
  • In a database system, the join is the most expensive of the relational operations. CPU-based join operations generally run on between 1 and at most 16 cores, which does not significantly improve performance. In contrast, GPGPU (General-Purpose computing on Graphics Processing Units) enables parallel processing across thousands of processing units, greatly reducing the time required to perform join operations. The GPGPU parallelization uses NVIDIA's CUDA SDK. In this paper, we parallelize join operations using GPGPU and compare their performance. The join operations evaluated are Nested Loop Join (NLJ), Sort Merge Join (SMJ), and Hash Join (HJ), and the GPUs used are the TITAN Xp, GTX 1080 Ti, and GTX 1080. We measure and compare the performance of CPU-based and GPGPU-based join operations, and also compare our results with those of a previous study on GPGPU-based joins. The experimental results show that the GPGPU-based joins are 6 to 328 times faster than the CPU-based ones.
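
For readers unfamiliar with the three join algorithms named in this abstract, the following is a minimal single-threaded Python sketch of each. It is not the paper's CUDA implementation; the tuple layout `(key, value)` and the function names are assumptions made for illustration only.

```python
from collections import defaultdict


def nested_loop_join(r, s):
    """Compare every pair of tuples: O(|R| * |S|) comparisons."""
    return [(rk, rv, sv) for rk, rv in r for sk, sv in s if rk == sk]


def sort_merge_join(r, s):
    """Sort both inputs on the key, then merge runs of equal keys."""
    r = sorted(r, key=lambda t: t[0])
    s = sorted(s, key=lambda t: t[0])
    out, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i][0] < s[j][0]:
            i += 1
        elif r[i][0] > s[j][0]:
            j += 1
        else:
            key = r[i][0]
            # Locate the end of the run of equal keys in S.
            j_end = j
            while j_end < len(s) and s[j_end][0] == key:
                j_end += 1
            # Pair every R tuple with this key against the whole S run.
            while i < len(r) and r[i][0] == key:
                out.extend((key, r[i][1], s[jj][1]) for jj in range(j, j_end))
                i += 1
            j = j_end
    return out


def hash_join(r, s):
    """Build a hash table on R, then probe it with each tuple of S."""
    table = defaultdict(list)
    for rk, rv in r:
        table[rk].append(rv)
    return [(sk, rv, sv) for sk, sv in s for rv in table.get(sk, [])]
```

All three functions return the same set of `(key, r_value, s_value)` tuples, possibly in a different order; they differ only in how much work they do to find the matches.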

An Energy-Efficient In-Network Join Query Processing using Synopsis and Encoding in Sensor Network (센서 네트워크에서 시놉시스와 인코딩을 이용한 에너지 효율적인 인-네트워크 조인 질의 처리)

  • Yeo, Myung-Ho;Jang, Yong-Jin;Kim, Hyun-Ju;Yoo, Jae-Soo
    • The Journal of the Korea Contents Association / v.11 no.2 / pp.126-134 / 2011
  • Recently, many researchers have become interested in using join queries to correlate sensor readings stored in different regions. In the conventional algorithm, a preliminary join coordinator collects synopses from the sensor nodes and determines the set of sensor readings required to process the join query. The base station then collects only that subset of the sensor readings rather than all of them and performs the final join. However, the preliminary join itself incurs communication overhead. In this paper, we propose a novel energy-efficient in-network join scheme that solves this problem. The proposed scheme places the preliminary join coordinator at a location that minimizes the communication cost of the preliminary join. The coordinator prunes data that do not contribute to the join result and compresses the sensor readings at an early stage of join processing. The base station therefore collects only a subset of the compressed sensor readings, together with the decompression table, and determines the join result from them. As a result, the proposed scheme reduces the communication cost of preliminary join processing and prolongs the network lifetime.
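
The flow this abstract describes (synopsis, preliminary join, pruning, encoding, final join) can be sketched roughly as below. This is an illustration under stated assumptions, not the paper's scheme: the synopsis is taken to be the set of distinct join values per region, the encoding is a plain dictionary code, and coordinator placement is not modelled. All function names are hypothetical.

```python
def synopsis(readings):
    """Distinct join-attribute values held by one region (assumed synopsis)."""
    return {value for _node, value in readings}


def prune(readings, matching):
    """Keep only readings whose value can contribute to the join result."""
    return [(node, value) for node, value in readings if value in matching]


def encode(readings, table):
    """Replace each value with a small integer code (assumed encoding)."""
    return [(node, table[value]) for node, value in readings]


def in_network_join(region_a, region_b):
    # Preliminary join at the coordinator: intersect the two synopses.
    matching = synopsis(region_a) & synopsis(region_b)
    table = {value: code for code, value in enumerate(sorted(matching))}
    # Only pruned, encoded readings are sent on to the base station.
    sent_a = encode(prune(region_a, matching), table)
    sent_b = encode(prune(region_b, matching), table)
    # Final join at the base station, decoded with the inverted table.
    decode = {code: value for value, code in table.items()}
    result = [
        (node_a, node_b, decode[code_a])
        for node_a, code_a in sent_a
        for node_b, code_b in sent_b
        if code_a == code_b
    ]
    return result, len(sent_a) + len(sent_b)
```

The second return value counts the readings that actually travel to the base station, which is the quantity the pruning step is meant to reduce.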

Structural Semi-Join Operators for Efficient Path Processing in XML Databases (XML 데이터베이스에서 효율적인 경로처리를 위한 구조적 세미조인 기법)

  • Son, Seok-Hyun;Shin, Hyo-Seop
    • Journal of KIISE:Computing Practices and Letters / v.16 no.2 / pp.252-256 / 2010
  • The structural join is one of the core operators for efficient processing of XML queries. It is mainly used for path-based XML queries, because it efficiently retrieves the node pairs that form a hierarchical relationship (i.e., an ancestor-descendant or parent-child relationship) among large numbers of XML nodes. However, structural join algorithms still incur potential overhead in the intermediate steps of processing XML path queries. To address this problem, the structural semi-join is proposed as a novel operator that retrieves only the ancestor nodes or only the descendant nodes as the join result. In this paper, we describe algorithms for the structural semi-join and present methods for XML path processing based on them. The experimental results show that the structural semi-join algorithms are very efficient for processing XML path queries.
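
To make the difference between a structural join and a structural semi-join concrete, here is a small Python sketch. It assumes the common region encoding, in which every node is a `(start, end)` interval and intervals of different nodes are either nested or disjoint; the abstract does not state which encoding the paper uses, and the function names are hypothetical.

```python
from bisect import bisect_right


def structural_join(ancestors, descendants):
    """Full structural join: every (ancestor, descendant) pair."""
    return [
        (a, d)
        for a in ancestors
        for d in descendants
        if a[0] < d[0] and d[1] < a[1]
    ]


def ancestor_semi_join(ancestors, descendants):
    """Return only the ancestors that contain at least one descendant."""
    starts = sorted(d[0] for d in descendants)
    out = []
    for a_start, a_end in ancestors:
        # First descendant that starts after this ancestor starts.
        i = bisect_right(starts, a_start)
        # With properly nested regions, starting inside the ancestor's
        # interval implies ending inside it as well.
        if i < len(starts) and starts[i] < a_end:
            out.append((a_start, a_end))
    return out


def descendant_semi_join(ancestors, descendants):
    """Return only the descendants that lie inside some ancestor."""
    return [
        d
        for d in descendants
        if any(a[0] < d[0] and d[1] < a[1] for a in ancestors)
    ]
```

The semi-join variants return nodes from one side only, so a multi-step path query does not have to carry node pairs through its intermediate steps.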

Segment Join Technique for Fast XML Query Processing (빠른 XML질의 처리를 위한 세그먼트 조인 기법)

  • ;Moon Bongki;Lee Sukho
    • Journal of KIISE:Databases / v.32 no.3 / pp.334-343 / 2005
  • Complex queries such as path and twig patterns have been the focus of much research on processing XML data. Structural join algorithms use encoded structural information about the elements in an XML document to facilitate join processing. Recently, structural join algorithms such as TwigStack and TSGeneric+ have been developed to process such complex queries, and their processing costs have been shown to be linearly proportional to the total size of the input data. However, these algorithms have the shortcoming that their processing costs increase with the length of the query. To overcome this shortcoming, we propose the segment join technique, which augments the structural join with structural indexes such as the 1-Index. The SegmentTwig algorithm, based on the segment join technique, performs joins between pairs of segments, each of which is a series of query nodes, rather than between pairs of query nodes. Consequently, a query can be processed by reading only one query node per segment. Our experimental study shows that the segment join algorithms consistently and considerably outperform the structural join methods on various data sets.

A MapReduce-based kNN Join Query Processing Algorithm for Analyzing Large-scale Data (대용량 데이터 분석을 위한 맵리듀스 기반 kNN join 질의처리 알고리즘)

  • Lee, HyunJo;Kim, TaeHoon;Chang, JaeWoo
    • Journal of KIISE / v.42 no.4 / pp.504-511 / 2015
  • Recently, the amount of data has been increasing rapidly with the popularity of SNS and the development of mobile technology. Accordingly, effective schemes for analyzing large amounts of data have been actively studied. One typical scheme is the Voronoi diagram-based kNN join algorithm (VkNN-join) using MapReduce. For two datasets R and S, VkNN-join can reduce the processing time of join queries over big data because it selects the corresponding subset Sj for each subset Ri and processes the query on them. However, VkNN-join incurs a high computational cost for constructing the Voronoi diagram. Moreover, its computational overhead is high because the number of candidate cells increases as the value of k increases. To solve these problems, we propose a MapReduce-based kNN join query processing algorithm for analyzing large amounts of data. Using seed-based dynamic partitioning, our algorithm reduces the overhead of constructing the index structure. It also reduces the computational overhead of finding candidate partitions by selecting the corresponding partitions based on the average distance between two seeds. We show that our algorithm outperforms the existing scheme in terms of query processing time.
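
As a reference point for what a kNN join computes, the sketch below gives the brute-force definition in Python, plus a helper that groups points by their nearest seed. It is neither VkNN-join nor the proposed algorithm: the seed choice, the function names, and the single-machine setting are assumptions, and the paper's candidate-partition selection rule is not reproduced.

```python
import heapq
import math


def knn_join(r_points, s_points, k):
    """For every point in R, return its k nearest points in S (brute force)."""
    return {
        r: heapq.nsmallest(k, s_points, key=lambda s: math.dist(r, s))
        for r in r_points
    }


def partition_by_seed(points, seeds):
    """Assign each point to its nearest seed (a hypothetical partitioning)."""
    partitions = {seed: [] for seed in seeds}
    for p in points:
        nearest = min(seeds, key=lambda seed: math.dist(p, seed))
        partitions[nearest].append(p)
    return partitions
```

The brute-force join compares every R point with every S point; partitioning schemes such as the one in the abstract aim to restrict each R partition to a few candidate S partitions instead.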

Design and Performance Analysis of MapReduce-based kNN join Query Processing Algorithm (맵리듀스 기반 kNN join 질의처리 알고리즘의 설계 및 성능평가)

  • Kim, TaeHoon;Lee, HyunJo;Chang, JaeWoo
    • Proceedings of the Korea Information Processing Society Conference / 2014.11a / pp.733-736 / 2014
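  • Recently, efficient analysis techniques for large-scale data have been actively studied. A representative technique is the k-nearest-neighbor join algorithm that uses a Voronoi diagram in a MapReduce environment (VkNN-join). VkNN-join reduces query processing time because it selects only the subset Sj associated with a subset Ri as the candidate search region when processing a query. However, VkNN-join has a high index construction cost and a large kNN computation overhead. To solve these problems, we propose a MapReduce-based kNN join query processing algorithm for large-scale data analysis. The proposed algorithm reduces the cost of constructing the index structure through seed-based dynamic partitioning. It also reduces the computational overhead by selecting candidate regions based on the average distance between seeds. A performance evaluation shows that the proposed scheme outperforms the existing scheme in terms of query processing time.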

A Study on Efficient Range Join For In-Memory Relational Database Management System (인메모리 관계형 데이터베이스에서 효율적인 범위 조인 방법에 대한 연구)

  • Han, Hyeok;Kang, Jo-Hyeon;Jin, Sung-Il
    • Proceedings of the Korea Information Processing Society Conference / 2017.04a / pp.845-847 / 2017
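  • The range join is a special form of the join operations provided by relational database systems. It has been studied comparatively little, and it is one of the joins that takes more time than the equi join, which uses the equality operator ("="). In particular, most studies propose creating a separate new index to guarantee range join performance. In this paper, we propose a performance-efficient range join method that uses range join predicates composed of columns of the basic data types provided by relational database systems, together with the T-Tree, the default index of in-memory relational DBMSs.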
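
The idea of answering a range join from an existing ordered index can be sketched as follows. A sorted Python list searched with `bisect` stands in for the T-Tree here; that substitution, the predicate form `lo <= value <= hi`, and the function name are assumptions for illustration, not the paper's method.

```python
from bisect import bisect_left, bisect_right


def range_join(ranges, values):
    """Join each (lo, hi) range with every value v where lo <= v <= hi."""
    index = sorted(values)  # stand-in for an ordered index such as a T-Tree
    out = []
    for lo, hi in ranges:
        start = bisect_left(index, lo)   # first position with value >= lo
        stop = bisect_right(index, hi)   # first position with value > hi
        out.extend((lo, hi, v) for v in index[start:stop])
    return out
```

Each range costs two binary searches plus the size of its output, instead of a scan over every value as a nested-loop evaluation of the predicate would require.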

Efficient Top-k Join Processing over Encrypted Data in a Cloud Environment

  • Kim, Jong Wook
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.10 / pp.5153-5170 / 2016
  • The scalability and flexibility inherent in cloud computing motivate clients to move data and computation to public cloud servers. Because the data are placed on public clouds, which are very likely to reside outside the clients' trusted domain, this strategy raises concerns about the security of sensitive client data. Thus, to provide sufficient security for data stored in the cloud, it is essential to encrypt sensitive data before uploading them to cloud servers. Although data encryption is considered the most effective solution for protecting sensitive data from unauthorized users, it imposes significant overhead during query processing because of the limitations of executing operations directly against encrypted data. Recently, substantial research has addressed the execution of SQL queries against encrypted data. However, there has been little research on top-k join query processing over encrypted data in cloud computing environments. In this paper, we develop an efficient algorithm that processes a top-k join query against encrypted cloud data. At an early phase, the proposed top-k join processing algorithm is able to prune unpromising data sets that are guaranteed not to produce the top-k highest scores. The experimental results show that the proposed approach provides significant performance gains over the naive solution.

Transformation of Continuous Aggregation Join Queries over Data Streams

  • Tran, Tri Minh;Lee, Byung-Suk
    • Journal of Computing Science and Engineering / v.3 no.1 / pp.27-58 / 2009
  • Aggregation join queries are an important class of queries over data streams. These queries involve both join and aggregation operations, with window-based joins followed by an aggregation on the join output. All existing research addresses join query optimization and aggregation query optimization as separate problems. We observe that, by putting them within the same scope of query optimization, more efficient query execution plans are possible through more versatile query transformations. The enabling idea is to perform aggregation before the join so that the join execution time may be reduced. Some research has been done on such query transformations in relational databases, but none in data streams. Doing it in data streams brings new challenges due to the incremental and continuous arrival of tuples. These challenges are addressed in this paper. Specifically, we first present a query processing model geared to facilitate query transformations and propose a query transformation rule specialized to work with streams. The rule is simple and yet covers all possible cases of transformation. Then we present a generic query processing algorithm that works with all alternative query execution plans possible with the transformation, and develop the cost formulas of the query execution plans. Based on the processing algorithm, we validate the rule theoretically by proving the equivalence of query execution plans. Finally, through extensive experiments, we validate the cost formulas and study the performance of alternative query execution plans.
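
The enabling idea, aggregating before the join, can be illustrated on two static inputs. The sketch below ignores windows and incremental tuple arrival, which are the paper's actual subject, and handles only a SUM grouped by the join key; the data layout and function names are assumptions.

```python
from collections import defaultdict


def join_then_aggregate(r, s):
    """SUM of R values per key, computed over the join of R and S on key."""
    totals = defaultdict(int)
    for rk, rv in r:
        for sk in s:
            if rk == sk:
                totals[rk] += rv
    return dict(totals)


def aggregate_then_join(r, s):
    """Same result, but each input is aggregated per key before joining."""
    partial_sum = defaultdict(int)
    for rk, rv in r:
        partial_sum[rk] += rv
    match_count = defaultdict(int)
    for sk in s:
        match_count[sk] += 1
    # Each R tuple joins with match_count[key] tuples of S, so the joined
    # SUM is the per-key partial sum scaled by that count.
    return {
        key: partial_sum[key] * match_count[key]
        for key in partial_sum
        if key in match_count
    }
```

Both plans return the same dictionary, but the second joins one pre-aggregated row per key instead of every pair of matching tuples.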

DISSECTION TECHNIQUE FOR EFFICIENT JOIN OPERATION ON SEMI-STRUCTURED DOCUMENT STREAM

  • Seo, Dong-Hyeok;Lee, Dong-Gyu;Ryu, Keun-Ho
    • Proceedings of the KSRS Conference / 2007.10a / pp.11-13 / 2007
  • There has been much interest in stream query processing. Various index techniques and advanced join techniques have been proposed to process data stream queries efficiently. Previous proposals support rapid responses to data stream queries. However, the volume of data streams is increasing, and data stream query processing needs to be faster than before. In this paper, we propose novel query processing techniques for large numbers of incoming document streams. We propose the Dissection Technique for efficient query processing in the data stream environment, focusing on its use in join query processing. Our technique performs more efficiently than other proposals for data streams. The proposed technique is applied to a sensor network system and an XML database.
