• Title/Summary/Keyword: Join Algorithm

Using a Greedy Algorithm for the Improvement of a MapReduce, Theta join, M-Bucket-I Heuristic (그리디 알고리즘을 이용한 맵리듀스 세타조인 M-Bucket-I 휴리스틱의 개선)

  • Kim, Wooyeol;Shim, Kyuseok
    • Journal of KIISE / v.43 no.2 / pp.229-236 / 2016
  • Theta join is one of the essential types of queries in database systems. As the amount of data that needs to be processed increases, processing theta joins on a single machine becomes impractical. Therefore, theta join algorithms using distributed computing frameworks have been studied widely. Although one of the state-of-the-art theta join algorithms uses the M-Bucket-I heuristic, it is hard to use because the running time of the M-Bucket-I heuristic, which computes a mapping from a record to a reducer (i.e., a reducer mapping), is O(n), where n is the size of the input data. In this paper, we propose the MBI-I algorithm, which reduces the running time of the M-Bucket-I heuristic to $O(r_{max} \log n)$ and gives the same result as the M-Bucket-I heuristic. We also conducted several experiments and confirmed that our algorithm can improve the performance of a theta join by 10%.
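As a point of reference for what a theta join computes, the following is a minimal single-machine sketch. It is the naive nested-loop join over an arbitrary predicate, not the M-Bucket-I heuristic or the MBI-I algorithm from the abstract; the function name `theta_join` and the sample data are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

R = TypeVar("R")
S = TypeVar("S")


def theta_join(
    left: Iterable[R],
    right: Iterable[S],
    theta: Callable[[R, S], bool],
) -> List[Tuple[R, S]]:
    """Return every pair (r, s) for which the join predicate theta(r, s) holds.

    Naive nested-loop evaluation: every left record is compared with every
    right record, so the cost is proportional to |left| * |right|.
    """
    right_rows = list(right)  # materialize once so it can be scanned repeatedly
    return [(r, s) for r in left for s in right_rows if theta(r, s)]


# Hypothetical example: join on the inequality predicate r < s.
pairs = theta_join([1, 5, 9], [4, 6], lambda r, s: r < s)
print(pairs)  # [(1, 4), (1, 6), (5, 6)]
```

A distributed theta join has to split this comparison space across reducers, which is the role of the reducer mapping mentioned in the abstract.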

Efficient Top-k Join Processing over Encrypted Data in a Cloud Environment

  • Kim, Jong Wook
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.10 / pp.5153-5170 / 2016
  • The benefit of the scalability and flexibility inherent in cloud computing motivates clients to upload data and computation to public cloud servers. Because data are placed on public clouds, which are very likely to reside outside of the trusted domain of clients, this strategy introduces concerns regarding the security of sensitive client data. Thus, to provide sufficient security for the data stored in the cloud, it is essential to encrypt sensitive data before the data are uploaded onto cloud servers. Although data encryption is considered the most effective solution for protecting sensitive data from unauthorized users, it imposes a significant amount of overhead during the query processing phase, due to the limitations of directly executing operations against encrypted data. Recently, substantial research work that addresses the execution of SQL queries against encrypted data has been conducted. However, there has been little research on top-k join query processing over encrypted data within cloud computing environments. In this paper, we develop an efficient algorithm that processes a top-k join query against encrypted cloud data. The proposed top-k join processing algorithm is able, at an early phase, to prune unpromising data sets that are guaranteed not to produce the top-k highest scores. The experimental results show that the proposed approach provides significant performance gains over the naive solution.

A Sampling-based Algorithm for Top-${\kappa}$ Similarity Joins (Top-${\kappa}$ 유사도 조인을 위한 샘플링 기반 알고리즘)

  • Park, Jong Soo
    • Journal of KIISE:Databases / v.41 no.4 / pp.256-261 / 2014
  • The top-${\kappa}$ set similarity join problem is to find the top-${\kappa}$ pairs of records, ranked by similarity, between two sets of input records. We propose an efficient algorithm that returns the top-${\kappa}$ similarity join pairs using a sampling technique. From a sample of the input records, we construct a histogram of set similarity joins, and then compute an estimated similarity threshold in the histogram for the top-${\kappa}$ join pairs within an error bound at the 95% confidence level based on statistical inference. Finally, the estimated threshold is applied to the traditional similarity join algorithm, which uses a min-heap structure to obtain the top-${\kappa}$ similarity joins. The experimental results show the good performance of the proposed algorithm on large real datasets.
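The min-heap step mentioned in the abstract can be sketched as follows. This is an illustrative brute-force version that assumes Jaccard similarity over Python sets; it omits the sampling and histogram-based threshold estimation that the paper proposes, and the names `jaccard` and `top_k_similarity_join` are hypothetical.

```python
import heapq
from itertools import product
from typing import List, Sequence, Set, Tuple


def jaccard(a: Set, b: Set) -> float:
    """Jaccard similarity |a & b| / |a | b| (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def top_k_similarity_join(
    left: Sequence[Set], right: Sequence[Set], k: int
) -> List[Tuple[float, int, int]]:
    """Return the k most similar (similarity, left_index, right_index) triples."""
    if k <= 0:
        return []
    heap: List[Tuple[float, int, int]] = []  # min-heap: heap[0] is the weakest kept pair
    for (i, a), (j, b) in product(enumerate(left), enumerate(right)):
        entry = (jaccard(a, b), i, j)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            # The new pair beats the current k-th best, so it replaces it.
            heapq.heapreplace(heap, entry)
    return sorted(heap, reverse=True)


left = [{1, 2, 3}, {4, 5}]
right = [{1, 2, 3, 4}, {4, 5}, {9}]
print(top_k_similarity_join(left, right, 2))  # [(1.0, 1, 1), (0.75, 0, 0)]
```

In this sketch the similarity at `heap[0]` acts as a running threshold that only rises as better pairs are found; an estimated threshold known up front, as the abstract describes, would let weak pairs be discarded before they are ever compared.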

Join Operation of Parallel Database System with Large Main Memory (대용량 메모리를 가진 병렬 데이터베이스 시스템의 조인 연산)

  • Park, Young-Kyu
    • Journal of the Korea Society of Computer and Information / v.12 no.3 / pp.51-58 / 2007
  • The shared-nothing multiprocessor architecture has advantages in scalability and has therefore been adopted in many multiprocessor database systems. However, if the data are not uniformly distributed across the processors, the load becomes unbalanced and overall system performance deteriorates. This is the data skew problem, which usually occurs when processing parallel hash joins. Balancing the load before performing the join resolves this problem efficiently and improves overall system performance. In this paper, we present an algorithm that exploits very large memory to reduce disk access overhead during load balancing and to solve the data skew problem efficiently. We also present an analytical model of our new algorithm and the results of a performance study comparing it with other algorithms for handling data skew.
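The data skew problem described above can be illustrated with a small sketch that hash-partitions join keys across nodes and measures how unbalanced the partitions are. It only demonstrates the problem, not the paper's load-balancing algorithm; `partition_sizes`, `skew_ratio`, and the integer keys are assumptions made for this example.

```python
from collections import Counter
from typing import Hashable, Iterable, List


def partition_sizes(keys: Iterable[Hashable], num_nodes: int) -> List[int]:
    """Count how many records each node receives under hash partitioning.

    Python's hash() is deterministic for integers, which this example relies on.
    """
    sizes = Counter(hash(key) % num_nodes for key in keys)
    return [sizes.get(node, 0) for node in range(num_nodes)]


def skew_ratio(sizes: List[int]) -> float:
    """Largest partition divided by the mean partition size (1.0 = balanced)."""
    mean = sum(sizes) / len(sizes)
    return max(sizes) / mean if mean else 0.0


uniform_keys = list(range(1000))                # every key appears once
skewed_keys = [7] * 700 + list(range(300))      # one hot key dominates

print(partition_sizes(uniform_keys, 4), skew_ratio(partition_sizes(uniform_keys, 4)))
# [250, 250, 250, 250] 1.0
print(partition_sizes(skewed_keys, 4), skew_ratio(partition_sizes(skewed_keys, 4)))
# [75, 75, 75, 775] 3.1
```

With the skewed input, the node that owns the hot key does roughly three times the average work, so the whole parallel join waits on that node.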

An Effective Algorithm for Constructing the Dominator Tree from Irreducible Directed Graphs (감축 불가능한 유향그래프로부터 지배자 트리를 구성하기 위한 효과적인 알고리즘)

  • Lee, Dae-Sik;Sim, Son-Kweon;Ahn, Heui-Hak
    • The Transactions of the Korea Information Processing Society / v.7 no.8 / pp.2536-2542 / 2000
  • The dominator tree presents the dominance frontier of a directed graph as a tree. We present an effective algorithm for constructing the dominator tree from an arbitrary directed graph. A reducible flow graph is reduced to a dominator tree after dominator calculation, and an irreducible flow graph is converted to a dominator-join graph using the join-edge information in the information table. To reduce the dominator-join graph to a dominator tree, we present an effective sequency reducible algorithm and a delay reducible algorithm.

Implementation of Effective Dominator Trees Using Eager Reduction Algorithm and Delay Reduction Algorithm (순차감축 알고리즘과 지연감축 알고리즘을 이용한 효과적인 지배자 트리의 구현)

  • Lee, Dae-Sik
    • Journal of Internet Computing and Services / v.6 no.6 / pp.117-125 / 2005
  • The dominator tree presents the dominance frontier of a directed graph as a tree. We present an effective algorithm for constructing the dominator tree from an arbitrary directed graph. A reducible flow graph is reduced to a dominator tree after dominator calculation, and an irreducible flow graph is converted to a dominator-join graph using the join-edge information in the information table. To reduce the dominator-join graph to a dominator tree, we implement an effective sequency reducible algorithm and a delay reducible algorithm. The implementation shows that the delay reducible algorithm takes less execution time than the sequency reducible algorithm. Therefore, we can reduce the flow graph to a dominator tree effectively.
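For readers unfamiliar with dominator trees, the sketch below computes dominators with the textbook iterative data-flow method, which also works on irreducible flow graphs. It is not the sequency or delay reducible algorithm from the abstract, and it assumes every node is reachable from the entry node; the function names and the sample graph are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Set

Graph = Dict[Hashable, Iterable[Hashable]]


def dominators(graph: Graph, entry: Hashable) -> Dict[Hashable, Set[Hashable]]:
    """Map each node to the set of nodes that dominate it (including itself)."""
    nodes = set(graph) | {s for succs in graph.values() for s in succs}
    preds: Dict[Hashable, Set[Hashable]] = {n: set() for n in nodes}
    for n, succs in graph.items():
        for s in succs:
            preds[s].add(n)

    # Start from "everything dominates everything" and shrink to a fixed point.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def immediate_dominators(dom: Dict[Hashable, Set[Hashable]], entry: Hashable) -> Dict[Hashable, Hashable]:
    """Parent of each node in the dominator tree.

    Strict dominators of a node form a chain, so the closest one is the
    strict dominator with the largest dominator set of its own.
    """
    return {
        n: max(ds - {n}, key=lambda d: len(dom[d]))
        for n, ds in dom.items()
        if n != entry
    }


# Irreducible flow graph: the a <-> b cycle can be entered at either a or b.
flow = {"entry": ["a", "b"], "a": ["b"], "b": ["a", "c"], "c": []}
print(immediate_dominators(dominators(flow, "entry"), "entry"))
```

For this graph the tree places `a` and `b` directly under `entry` and `c` under `b`, because neither `a` nor `b` lies on every path to the other.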

A Join Operations Benchmark in Users' Perspective (사용자 관점에서의 조인 연산 평가 방법론)

  • Jeong Hoe Jin;Lee Sang Ho
    • The KIPS Transactions:PartD / v.12D no.1 s.97 / pp.13-20 / 2005
  • The join operation is an important, fundamental operation in database systems, and it is costly to execute. In the literature, there have been a number of technical attempts at developing and evaluating efficient join operations, all of which have been carried out from the developers' perspective. This paper proposes a join operations benchmark dedicated to evaluating the join operations of database systems from the users' perspective. This benchmark helps users select a database system that performs join operations well in their work environment. The benchmark consists of 42 join queries, which are derived from six performance factors picked out in two join categories. We have implemented this benchmark with two commercial database systems and report the experimental results.

Transformation-based Spatial Partition Join (변환기반 공간 파티션 조인)

  • 이민재;한욱신;이재길;황규영
    • Journal of KIISE:Databases / v.31 no.4 / pp.352-361 / 2004
  • Spatial joins find all pairs of spatial objects that satisfy a given spatial relationship. In this paper, we propose the transformation-based spatial partition join algorithm (TSPJ), a new spatial join algorithm that performs the join in the transform space without using indexes. Since the existing algorithms deal with the extents of spatial objects in the original space, they either need to replicate the spatial objects or have a relatively complex partition structure, resulting in degraded performance. In contrast, TSPJ transforms objects in the original space into points in the transform space and deals only with points having no extents. The transformation does not incur any additional overhead. Thus, our algorithm has advantages over existing ones in that it obviates the need for replicating spatial objects, and its partition structure is simple. As a result, it always has better performance compared with existing algorithms. Extensive experiments show that TSPJ improves performance by 20.5% to 38.0% over the existing algorithms compared.

A Similarity Join Algorithm Using a Median as a Filter (중앙값을 필터로 이용한 유사도 조인 알고리즘)

  • Park, Jong Soo
    • KIPS Transactions on Software and Data Engineering / v.4 no.2 / pp.71-76 / 2015
  • In similarity join processing, a general technique employs a generation-verification framework, which includes two phases: the first phase generates a set of candidate pairs from a collection of records, and the second phase verifies each candidate pair by computing its real similarity. In order to reduce the number of candidate pairs in the verification phase, this paper uses the median of one record of each candidate pair as a filter to test whether the other record can have the proper number of overlapping tokens. We propose a similarity join algorithm with the median filter, and show through extensive experiments on real-world datasets that the proposed algorithm has better execution time than recent algorithms without the filter.
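The two-phase generation-verification framework can be sketched as below. This version generates candidates with a plain inverted index (any two records sharing a token) and verifies them with Jaccard similarity; both choices are assumptions for illustration, and the median filter proposed in the paper is deliberately not implemented. The name `similarity_join` is hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Hashable, List, Sequence, Set, Tuple


def similarity_join(
    records: Sequence[Set[Hashable]], threshold: float
) -> List[Tuple[int, int, float]]:
    """Self-join: return (i, j, similarity) for record pairs at or above threshold."""
    # Phase 1 (generation): records that share at least one token are candidates.
    index: Dict[Hashable, List[int]] = defaultdict(list)
    for rid, tokens in enumerate(records):
        for token in tokens:
            index[token].append(rid)
    candidates: Set[Tuple[int, int]] = set()
    for rids in index.values():
        # rids is in increasing order, so every pair (i, j) satisfies i < j.
        candidates.update(combinations(rids, 2))

    # Phase 2 (verification): compute the real similarity of each candidate.
    result = []
    for i, j in sorted(candidates):
        a, b = records[i], records[j]
        similarity = len(a & b) / len(a | b)  # union is non-empty: a token is shared
        if similarity >= threshold:
            result.append((i, j, similarity))
    return result


records = [{"a", "b", "c"}, {"a", "b", "d"}, {"x", "y"}, {"c"}]
print(similarity_join(records, 0.5))  # [(0, 1, 0.5)]
```

Here the pair (0, 3) is generated as a candidate but rejected during verification; a filter applied before verification aims to discard such pairs more cheaply.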

Implementation and Evaluation of Time Interval Partitioning Algorithm in Temporal Databases (시간 데이타베이스에서 시간 간격 분할 알고리즘의 구현 및 평가)

  • Lee, Kwang-Kyu;Shin, Ye-Ho;Ryu, Keun-Ho;Kim, Hong-Gi
    • Journal of KIISE:Computing Practices and Letters / v.8 no.1 / pp.9-16 / 2002
  • Join operations have a great effect on system performance in temporal databases, as in relational databases. For the temporal join in particular, the optimization of the interval partition determines query processing performance. In this paper, to improve the efficiency of parallel join queries in temporal databases, we propose the Minimum Interval Partition (MIP) scheme for time interval partitioning. The validity of the MIP algorithm, which determines the minimum breakpoint of the partition, is demonstrated with an example scenario, and we confirm its improved efficiency compared with the existing partition algorithm.