• Title/Summary/Keyword: Join Algorithm

Search Result 138, Processing Time 0.032 seconds

Design and Performance Analysis of MapReduce-based kNN join Query Processing Algorithm (맵리듀스 기반 kNN join 질의처리 알고리즘의 설계 및 성능평가)

  • Kim, TaeHoon;Lee, HyunJo;Chang, JaeWoo
    • Annual Conference of KIPS
    • /
    • 2014.11a
    • /
    • pp.733-736
    • /
    • 2014
  • 최근 대용량 데이터에 대한 효율적인 데이터 분석 기법이 활발히 연구되고 있다. 대표적인 기법으로는 맵리듀스 환경에서 보로노이 다이어그램을 이용한 k 최근접점 조인(VkNN-join) 알고리즘이 존재한다. VkNN-join 알고리즘은 부분집합 Ri에 연관된 부분집합 Sj만을 후보탐색 영역으로 선정하여 질의를 처리하기 때문에 질의처리 시간을 감소시킨다. 그러나 VkNN-join은 색인 구축 비용이 높으며, kNN 연산 오버헤드가 큰 문제점이 존재한다. 이를 해결하기 위해, 본 논문에서는 대용량 데이터 분석을 위한 맵리듀스 기반 kNN join 질의처리 알고리즘을 제안한다. 제안하는 알고리즘은 시드 기반의 동적 분할을 통해 색인구조 구축비용을 감소시킨다. 또한 시드 간 평균 거리를 기반으로 후보 영역을 선정함으로써, 연산 오버헤드를 감소시킨다. 아울러, 성능 평가를 통해 제안하는 기법이 질의처리 시간 측면에서 기존 기법에 비해 우수함을 나타낸다.

Effective Parallel Hash Join Algorithm Based on Histoftam Equalization in the Presence of Data Skew (데이터 편재 하에서 히스토그램 변환기법에 기초한 효율적인 병렬 해쉬 결합 알고리즘)

  • Park, Ung-Gyu;Choe, Hwang-Gyu;Kim, Tak-Gon
    • The Transactions of the Korea Information Processing Society
    • /
    • v.4 no.2
    • /
    • pp.338-348
    • /
    • 1997
  • In this pater, we first propose a data distribution framework to resolve load imbalance and bucket oerflow in parallel hash join.Using the histogram equalization technique, the framework transforms a histogram of skewed data to the desired uniform distribution that corresponds to the relative computing power of node processors in the system.Next we propose an effcient parallel hash join algorithm for handing skwed data based on the proposed data distribution methodology.For performance comparison of our algorithm with other hash join algorithms.we perform similation experiments and actual exeution on COREDB database computer with 8-node hyperube architecture. In these experiments, skwed data distebution of the join atteibute is modeled using a Zipf-like distribution.The perfomance studies undicate that our algorithm outperforms other algorithms in the skewed cases.

  • PDF

An Energy-Efficient In-Network Join Query Processing using Synopsis and Encoding in Sensor Network (센서 네트워크에서 시놉시스와 인코딩을 이용한 에너지 효율적인 인-네트워크 조인 질의 처리)

  • Yeo, Myung-Ho;Jang, Yong-Jin;Kim, Hyun-Ju;Yoo, Jae-Soo
    • The Journal of the Korea Contents Association
    • /
    • v.11 no.2
    • /
    • pp.126-134
    • /
    • 2011
  • Recently, many researchers are interested in using join queries to correlate sensor readings stored in different regions. In the conventional algorithm, the preliminary join coordinator collects the synopsis from sensor nodes and determines a set of sensor readings that are required for processing the join query. Then, the base station collects only a part of sensor readings instead of whole readings and performs the final join process. However, it has a problem that incurs communication overhead for processing the preliminary join. In this paper, we propose a novel energy-efficient in-network join scheme that solves such a problem. The proposed scheme determines a preliminary join coordinator located to minimize the communication cost for the preliminary join. The coordinator prunes data that do not contribute to the join result and performs the compression of sensor readings in the early stage of the join processing. Therefore, the base station just collects a part of compressed sensor readings with the decompression table and determines the join result from them. In the result, the proposed scheme reduces communication costs for the preliminary join processing and prolongs the network lifetime.

Continuous Query Processing in Data Streams Using Duality of Data and Queries (데이타와 질의의 이원성을 이용한 데이타스트림에서의 연속질의 처리)

  • Lim Hyo-Sang;Lee Jae-Gil;Lee Min-Jae;Whang Kyu-Young
    • Journal of KIISE:Databases
    • /
    • v.33 no.3
    • /
    • pp.310-326
    • /
    • 2006
  • In this paper, we deal with a method of efficiently processing continuous queries in a data stream environment. We classify previous query processing methods into two dual categories - data-initiative and query-initiative - depending on whether query processing is initiated by selecting a data element or a query. This classification stems from the fact that data and queries have been treated asymmetrically. For processing continuous queries, only data-initiative methods have traditionally been employed, and thus, the performance gain that could be obtained by query-initiative methods has been overlooked. To solve this problem, we focus on an observation that data and queries can be treated symmetrically. In this paper, we propose the duality model of data and queries and, based on this model, present a new viewpoint of transforming the continuous query processing problem to a multi-dimensional spatial join problem. We also present a continuous query processing algorithm based on spatial join, named Spatial Join CQ. Spatial Join CQ processes continuous queries by finding the pairs of overlapping regions from a set of data elements and a set of queries defined as regions in the multi-dimensional space. The algorithm achieves the effects of both of the two dual methods by using the spatial join, which is a symmetric operation. Experimental results show that the proposed algorithm outperforms earlier methods by up to 36 times for simple selection continuous queries and by up to 7 times for sliding window join continuous queries.

An Efficient Path Expression Join Algorithm Using XML Structure Context (XML 구조 문맥을 사용한 효율적인 경로 표현식 조인 알고리즘)

  • Kim, Hak-Soo;Shin, Young-Jae;Hwang, Jin-Ho;Lee, Seung-Mi;Son, Jin-Hyun
    • The KIPS Transactions:PartD
    • /
    • v.14D no.6
    • /
    • pp.605-614
    • /
    • 2007
  • As a standard query language to search XML data, XQuery and XPath were proposed by W3C. By widely using XQuery and XPath languages, recent researches focus on the development of query processing algorithm and data structure for efficiently processing XML query with the enormous XML database system. Recently, when processing XML path expressions, the concept of the structural join which may determine the structural relationship between XML elements, e.g., ancestor-descendant or parent-child, has been one of the dominant XPath processing mechanisms. However, structural joins which frequently occur in XPath query processing require high cost. In this paper, we propose a new structural join algorithm, called SISJ, based on our structured index, called SI, in order to process XPath queries efficiently. Experimental results show that our algorithm performs marginally better than previous ones. However, in the case of high recursive documents, it performed more than 30% by the pruning feature of the proposed method.

Conjunctive Boolean Query Optimization based on Join Sequence Separability in Information Retrieval Systems (정보검색시스템에서 조인 시퀀스 분리성 기반 논리곱 불리언 질의 최적화)

  • 박병권;한욱신;황규영
    • Journal of KIISE:Databases
    • /
    • v.31 no.4
    • /
    • pp.395-408
    • /
    • 2004
  • A conjunctive Boolean text query refers to a query that searches for tort documents containing all of the specified keywords, and is the most frequently used query form in information retrieval systems. Typically, the query specifies a long list of keywords for better precision, and in this case, the order of keyword processing has a significant impact on the query speed. Currently known approaches to this ordering are based on heuristics and, therefore, cannot guarantee an optimal ordering. We can use a systematic approach by leveraging a database query processing algorithm like the dynamic programming, but it is not suitable for a text query with a typically long list of keywords because of the algorithm's exponential run-time (Ο(n2$^{n-1}$)) for n keywords. Considering these problems, we propose a new approach based on a property called the join sequence separability. This property states that the optimal join sequence is separable into two subsequences of different join methods under a certain condition on the joined relations, and this property enables us to find a globally optimal join sequence in Ο(n2$^{n-1}$). In this paper we describe the property formally, present an optimization algorithm based on the property, prove that the algorithm finds an optimal join sequence, and validate our approach through simulation using an analytic cost model. Comparison with the heuristic text query optimization approaches shows a maximum of 100 times faster query processing, and comparison with the dynamic programming approach shows exponentially faster query optimization (e.g., 600 times for a 10-keyword query).

An Efficient Method for Finding K Nearest Pairs in Spatial Databases (공간 데이타베이스에서 최근접 K쌍을 찾는 효율적 기법)

  • Shin, Hyo-Seop;Lee, Suk-Ho
    • Journal of KIISE:Databases
    • /
    • v.27 no.2
    • /
    • pp.238-246
    • /
    • 2000
  • The distance join has been introduced previously, which finds nearest pairs in the order of distance incrementally among two spatial data sets built with multidimensional indexes like R-trees. We propose efficient K-distance joins when the number(K) of pairs to find is preset. Especially, we develop a distance join algorithm with bi-directional expansion and optimized plane sweeping using selection method of sweep axis and direction. The experiments on real spatial data sets show that the proposed algorithm is much better than the former algorithms.

  • PDF

Closest Pairs and e-distance Join Query Processing Algorithms using a POI-based Materialization Technique in Spatial Network Databases (공간 네트워크 데이터베이스에서 POI 기반 실체화 기법을 이용한 Closest Pairs 및 e-distance 조인 질의처리 알고리즘)

  • Kim, Yong-Ki;Chang, Jae-Woo
    • Journal of Korea Spatial Information System Society
    • /
    • v.9 no.3
    • /
    • pp.67-80
    • /
    • 2007
  • Recently, many studies on query processing algorithms has been done for spatial networks, such as roads and railways, instead of Euclidean spaces, in order to efficiently support LBS(location-based service) and Telematics applications. However, both a closest pairs query and an e-distance join query require a very high cost in query processing because they can be answered by processing a set of POIs, instead of a single POI. Nevertheless, the query processing cost for closest pairs and e-distance join queries is rapidly increased as the number of k (or the length of radius) is increased. Therefore, we propose both a closest pairs query processing algorithm and an e-distance join query processing algorithm using a POI-based materialization technique so that we can process closest pairs and e-distance join queries in an efficient way. In addition, we show the retrieval efficiency of the proposed algorithms by making a performance comparison of the conventional algorithms.

  • PDF

Efficient Multicast Routing Algorithm in the Wire/wireless Integrated Environment (유무선 통합 환경에서 효과적인 멀티캐스트를 위한 라우팅 알고리즘)

  • 이재영;권영애;지홍일;최승권;조용환
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.29 no.8A
    • /
    • pp.937-941
    • /
    • 2004
  • In this paper, we propose a efficient multicast routing algorithm in the wire/wireless integrated environment. Through simulations and comparing to another multicast algorithm, we reach a conclusion is that the routing algorithm can simply and dynamically adjusts the construction of multicast tree with little delay and the most reducible bandwidth resources.

A Data Mining Approach for Selecting Bitmap Join Indices

  • Bellatreche, Ladjel;Missaoui, Rokia;Necir, Hamid;Drias, Habiba
    • Journal of Computing Science and Engineering
    • /
    • v.1 no.2
    • /
    • pp.177-194
    • /
    • 2007
  • Index selection is one of the most important decisions to take in the physical design of relational data warehouses. Indices reduce significantly the cost of processing complex OLAP queries, but require storage cost and induce maintenance overhead. Two main types of indices are available: mono-attribute indices (e.g., B-tree, bitmap, hash, etc.) and multi-attribute indices (join indices, bitmap join indices). To optimize star join queries characterized by joins between a large fact table and multiple dimension tables and selections on dimension tables, bitmap join indices are well adapted. They require less storage cost due to their binary representation. However, selecting these indices is a difficult task due to the exponential number of candidate attributes to be indexed. Most of approaches for index selection follow two main steps: (1) pruning the search space (i.e., reducing the number of candidate attributes) and (2) selecting indices using the pruned search space. In this paper, we first propose a data mining driven approach to prune the search space of bitmap join index selection problem. As opposed to an existing our technique that only uses frequency of attributes in queries as a pruning metric, our technique uses not only frequencies, but also other parameters such as the size of dimension tables involved in the indexing process, size of each dimension tuple, and page size on disk. We then define a greedy algorithm to select bitmap join indices that minimize processing cost and verify storage constraint. Finally, in order to evaluate the efficiency of our approach, we compare it with some existing techniques.