• Title/Summary/Keyword: Join Processing

Search Result 229, Processing Time 0.026 seconds

A Join Query with Aggregation functions Using Mapreduce (집계 함수를 포함하는 조인 질의의 맵리듀스를 사용한 효율적인 처리 기법)

  • Oh, So Hyeon;Lee, Ki Yong
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2015.04a
    • /
    • pp.132-135
    • /
    • 2015
  • 맵리듀스(MapReduce)는 분산 환경에서의 빅데이터(Big Data), 즉 대용량 데이터를 처리하는 프로그래밍 모델이다. 대용량의 데이터를 분석하기 위해서 집계 함수(Aggregation function)로 데이터를 처리할 수 있다. 본 논문에서는 맵리듀스 환경을 기반으로 SQL 쿼리에서 집계 함수를 더 적은 비용으로 수행하며 효율적으로 처리할 수 있는 두 가지 전략을 제안한다. 두 가지 전략 중 더 높은 성능을 보이는 전략을 더 효율적인 처리 방법으로 판단한다. 첫 번째 전략은 두 테이블을 Join하여 집계 함수를 처리하는 방법이다. 두 번째 전략은 집계 함수를 처리하여 Join에 참여할 튜플의 수를 최소로 줄인 후 Join을 수행하고 다시 집계 함수를 처리하는 방법이다. 두 제안 방법을 비교하기 위하여 실험을 한 결과 두 번째 전략이 더 적은 비용이 드므로 더 효율적인 처리 방법인 것으로 보인다.

An Efficient Block Index Scheme with Segmentation for Spatio-Textual Similarity Join

  • Xiang, Yiming;Zhuang, Yi;Jiang, Nan
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.11 no.7
    • /
    • pp.3578-3593
    • /
    • 2017
  • Given two collections of objects that carry both spatial and textual information in the form of tags, a $\text\underline{S}patio$-$\text\underline{T}extual$-based object $\text\underline{S}imilarity$ $\text\underline{JOIN}$ (ST-SJOIN) retrieves the pairs of objects that are textually similar and spatially close. In this paper, we have proposed a block index-based approach called BIST-JOIN to facilitate the efficient ST-SJOIN processing. In this approach, a dual-feature distance plane (DFDP) is first partitioned into some blocks based on four segmentation schemes, and the ST-SJOIN is then transformed into searching the object pairs falling in some affected blocks in the DFDP. Extensive experiments on real and synthetic datasets demonstrate that our proposed join method outperforms the state-of-the-art solutions.

Processing Sliding Window Multi-Joins using a Graph-Based Method over Data Streams (데이터 스트림에서 그래프 기반 기법을 이용한 슬라이딩 윈도우 다중 조인 처리)

  • Zhang, Liang;Ge, Jun-Wei;Kim, Gyoung-Bae;Lee, Soon-Jo;Bae, Hae-Young;You, Byeong-Seob
    • Journal of Korea Spatial Information System Society
    • /
    • v.9 no.2
    • /
    • pp.25-34
    • /
    • 2007
  • Existing approaches that select an order for the join of three or more data streams have always used the simple heuristics. For their disadvantage - only one factor is considered and that is join selectivity or arrival rate, these methods lead to poor performance and inefficiency In some applications. The graph-based sliding window multi -join algorithm with optimal join sequence is proposed in this paper. In this method, sliding window join graph is set up primarily, in which a vertex represents a join operator and an edge indicates the join relationship among sliding windows, also the vertex weight and the edge weight represent the cost of join and the reciprocity of join operators respectively. Then the optimal join order can be found in the graph by using improved MVP algorithm. The final result can be produced by executing the join plan with the nested loop join procedure, The advantages of our algorithm are proved by the performance comparison with existing join algorithms.

  • PDF

A Hybrid In-network Join Strategy using Bloom Filter in Sensor Network (센서 네트워크에서 블룸 필터를 이용한 하이브리드 인-네트워크 조인 기법)

  • Song, Im-Young;Kim, Kyung-Chang
    • Journal of KIISE:Databases
    • /
    • v.37 no.3
    • /
    • pp.165-170
    • /
    • 2010
  • This paper proposes an in-network join strategy SBJ(Semi & Bloom Join), an efficient join strategy for sensor networks, that minimizes communication cost. SBJ is a hybrid join strategy that can reduce energy consumption by using a bloom filter to reduce the size of data that needs to be sent or received in sensor network. The key to reducing the communication cost in SBJ is to eliminate data not involved in the join result in the early stages of join processing. Through simulation, the paper shows that compared to other join strategies in sensor network, SBJ join strategy is more efficient in reducing the communication cost resulting in a significant reduction in battery consumption.

Performance Evaluation of Hash Join Algorithms Supporting Dynamic Load Balancing for a Database Sharing System (데이타베이스 공유 시스템에서 동적 부하분산을 지원하는 해쉬 조인 알고리즘들의 성능 평가)

  • Moon, Ae-Kyung;Cho, Haeng-Rae
    • The Transactions of the Korea Information Processing Society
    • /
    • v.6 no.12
    • /
    • pp.3456-3468
    • /
    • 1999
  • Most of previous parallel join algorithms assume a database partition system(DPS), where each database partition is owned by a single processing node. While the DPS is novel in the sense that it can interconnect a large number of nodes and support a geographically distributed environment, it may suffer from poor facility for load balancing and system availability compared to the database sharing system(DSS). In this paper, we propose a dynamic load balancing strategy by exploiting the characteristics of the DSS, and then extend the conventional hash join algorithms to the DSS by using the dynamic load balancing strategy. With simulation studies under a wide variety of system configurations and database workloads, we analyze the effects of the dynamic load balancing strategy and differences in the performances of hash join algorithms in the DSS.

  • PDF

A Study on Selecting Bitmap Join Index to Speed up Complex Queries in Relational Data Warehouses (관계형 데이터 웨어하우스의 복잡한 질의의 처리 효율 향상을 위한 비트맵 조인 인덱스 선택에 관한 연구)

  • An, Hyoung-Geun;Koh, Jae-Jin
    • The KIPS Transactions:PartD
    • /
    • v.19D no.1
    • /
    • pp.1-14
    • /
    • 2012
  • As the size of the data warehouse is large, the selection of indices on the data warehouse affects the efficiency of the query processing of the data warehouse. Indices induce the lower query processing cost, but they occupy the large storage areas and induce the index maintenance cost which are accompanied by database updates. The bitmap join indices are well applied when we optimize the star join queries which join a fact table and many dimension tables and the selection on dimension tables in data warehouses. Though the bitmap join indices with the binary representations induce the lower storage cost, the task to select the indexing attributes among the huge candidate attributes which are generated is difficult. The processes of index selection are to reduce the number of candidate attributes to be indexed and then select the indexing attributes. In this paper on bitmap join index selection problem we reduce the number of candidate attributes by the data mining techniques. Compared to the existing techniques which reduce the number of candidate attributes by the frequencies of attributes we consider the frequencies of attributes and the size of dimension tables and the size of the tuples of the dimension tables and the page size of disk. We use the mining of the frequent itemsets as mining techniques and reduce the great number of candidate attributes. We make the bitmap join indices which have the least costs and the least storage area adapted to storage constraints by using the cost functions applied to the bitmap join indices of the candidate attributes. We compare the existing techniques and ours and analyze them in order to evaluate the efficiencies of ours.

An Efficient Path Expression Join Algorithm Using XML Structure Context (XML 구조 문맥을 사용한 효율적인 경로 표현식 조인 알고리즘)

  • Kim, Hak-Soo;Shin, Young-Jae;Hwang, Jin-Ho;Lee, Seung-Mi;Son, Jin-Hyun
    • The KIPS Transactions:PartD
    • /
    • v.14D no.6
    • /
    • pp.605-614
    • /
    • 2007
  • As a standard query language to search XML data, XQuery and XPath were proposed by W3C. By widely using XQuery and XPath languages, recent researches focus on the development of query processing algorithm and data structure for efficiently processing XML query with the enormous XML database system. Recently, when processing XML path expressions, the concept of the structural join which may determine the structural relationship between XML elements, e.g., ancestor-descendant or parent-child, has been one of the dominant XPath processing mechanisms. However, structural joins which frequently occur in XPath query processing require high cost. In this paper, we propose a new structural join algorithm, called SISJ, based on our structured index, called SI, in order to process XPath queries efficiently. Experimental results show that our algorithm performs marginally better than previous ones. However, in the case of high recursive documents, it performed more than 30% by the pruning feature of the proposed method.

Improving Join Performance for SPARQL Query Processing in the Clouds (클라우드에서 SPARQL 질의 처리를 위한 조인 성능 향상)

  • Choi, Gyu-Jin;Son, Yun-Hee;Lee, Kyu-Chul
    • Journal of KIISE
    • /
    • v.43 no.6
    • /
    • pp.700-709
    • /
    • 2016
  • Recently, with the rapid growth of LOD (Linked Open Data) existing methods based on a single machine have limitation in performance. Existing solutions use distributed framework such as Mapreduce in order to improve the performance. However, the MapReduce framework for processing SPARQL queries involves multiple MapReduce jobs and additional costs incurred. In addition, the problem of unnecessary data processing arises. In this study, we proposed a method to reduce the number of MapReduce jobs during SPARQL query processing and join indexes based on Bitmap for minimizing the costs of processing unnecessary data.

Using Indirect Predicates in Multi-way Spatial Joins (다중 공간 조인에서 간접 술어의 활용)

  • 박호현;정진완
    • Journal of KIISE:Databases
    • /
    • v.30 no.6
    • /
    • pp.593-605
    • /
    • 2003
  • Since spatial join processing consumes much time, several algorithms have been proposed to improve spatial join performance. The M-way R-tree join (MRJ) is a join algorithm which synchronously traverses M R-trees in the M-way spatial join. In this paper, we introduce indirect predicates which do not directly come from the multi-way join conditions but are indirectly derived from them. By applying the concept of indirect predicates to MRJ, we improve the performance of MRJ. We call such a multi-way R-tree join algorithm using indirect predicates indirect predicate filtering (IPF). Through experiments using synthetic data and real data, we show that IPF significantly

An Advanced Parallel Join Algorithm for Managing Data Skew on Hypercube Systems (하이퍼큐브 시스템에서 데이타 비대칭성을 고려한 향상된 병렬 결합 알고리즘)

  • 원영선;홍만표
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.30 no.3_4
    • /
    • pp.117-129
    • /
    • 2003
  • In this paper, we propose advanced parallel join algorithm to efficiently process join operation on hypercube systems. This algorithm uses a broadcasting method in processing relation R which is compatible with hypercube structure. Hence, we can present optimized parallel join algorithm for that hypercube structure. The proposed algorithm has a complete solution of two essential problems - load balancing problem and data skew problem - in parallelization of join operation. In order to solve these problems, we made good use of the characteristics of clustering effect in the algorithm. As a result of this, performance is improved on the whole system than existing algorithms. Moreover. new algorithm has an advantage that can implement non-equijoin operation easily which is difficult to be implemented in hash based algorithm. Finally, according to the cost model analysis. this algorithm showed better performance than existing parallel join algorithms.