• Title/Summary/Keyword: join cost

Efficient Record Filtering In-network Join Strategy using Bit-Vector in Sensor Networks (센서 네트워크에서 비트 벡터를 이용한 효율적인 레코드 필터링 인-네트워크 조인 전략)

  • Song, Im-Young;Kim, Kyung-Chang
    • Journal of the Korea Society of Computer and Information / v.15 no.4 / pp.27-36 / 2010
  • The paper proposes the RFB (Record Filtering using Bit-vector) join algorithm, an in-network strategy that uses a bit vector to drastically reduce the size of the data transmitted and hence the communication cost. In addition, because data not involved in the join result are eliminated before the actual join, not all data need to be moved to the join nodes, which further minimizes communication cost. Simulation results show that the proposed RFB algorithm significantly reduces the number of bytes moved to the join nodes compared with the popular synopsis join (SNJ) algorithm.
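
As a rough illustration of bit-vector filtering in general (the abstract does not give RFB's exact encoding), the following Python sketch assumes that one node hashes its join-key values into a fixed-size bit vector and ships only that vector; the other node then forwards only the records whose key maps to a set bit. The function names `build_bit_vector` and `filter_records` and the `size` parameter are hypothetical.

```python
from typing import Callable, Hashable, Iterable, List, TypeVar

T = TypeVar("T")


def build_bit_vector(keys: Iterable[Hashable], size: int) -> int:
    """Hash each join-key value into one of `size` slots; a Python int serves as the bit vector."""
    bits = 0
    for key in keys:
        bits |= 1 << (hash(key) % size)
    return bits


def filter_records(records: Iterable[T], key_of: Callable[[T], Hashable],
                   bits: int, size: int) -> List[T]:
    """Keep only records whose key maps to a set bit.

    Hash collisions can let non-joining records through (false positives),
    but a record that does join is never dropped.
    """
    return [r for r in records if (bits >> (hash(key_of(r)) % size)) & 1]


# Node A ships a 64-bit vector instead of its key values; node B filters locally.
vector = build_bit_vector([1, 2, 3], size=64)
print(filter_records([(2, "x"), (5, "y"), (3, "z")], lambda r: r[0], vector, size=64))
# prints [(2, 'x'), (3, 'z')]
```

Only the surviving records need to travel to the join node, which is where the byte savings come from; the exact join is still performed there.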

A Hybrid In-network Join Strategy using Bloom Filter in Sensor Network (센서 네트워크에서 블룸 필터를 이용한 하이브리드 인-네트워크 조인 기법)

  • Song, Im-Young;Kim, Kyung-Chang
    • Journal of KIISE:Databases / v.37 no.3 / pp.165-170 / 2010
  • This paper proposes SBJ (Semi & Bloom Join), an efficient in-network join strategy for sensor networks that minimizes communication cost. SBJ is a hybrid strategy that reduces energy consumption by using a Bloom filter to shrink the data that must be sent or received in the sensor network. The key to reducing communication cost in SBJ is to eliminate data not involved in the join result in the early stages of join processing. Simulation results show that, compared with other join strategies for sensor networks, SBJ reduces communication cost more effectively, resulting in significantly lower battery consumption.
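
The Bloom-filter half of the idea can be sketched as a generic Bloom-filter semi-join in Python. This is not the SBJ hybrid itself, whose details the abstract does not give: the filter size `m=256`, the `k=3` hash positions, and the salted SHA-256 hashing are arbitrary choices, and `BloomFilter` and `bloom_semi_join` are hypothetical names.

```python
import hashlib
from collections import defaultdict
from typing import Hashable, List, Tuple


class BloomFilter:
    """Minimal Bloom filter: `m` bits, `k` hash positions per key (both values are arbitrary here)."""

    def __init__(self, m: int = 256, k: int = 3) -> None:
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, key: Hashable):
        # Derive k bit positions from salted SHA-256 digests of the key's repr.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key!r}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key: Hashable) -> None:
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key: Hashable) -> bool:
        return all((self.bits >> p) & 1 for p in self._positions(key))


def bloom_semi_join(left: List[tuple], right: List[tuple]) -> Tuple[List[tuple], int]:
    """Join on element 0; returns (join result, number of right tuples shipped)."""
    bf = BloomFilter()
    for t in left:
        bf.add(t[0])
    # Step 1: the right side sends back only tuples that may match (false positives possible).
    shipped = [t for t in right if bf.might_contain(t[0])]
    # Step 2: the left side performs the exact join, which discards any false positives.
    by_key = defaultdict(list)
    for t in left:
        by_key[t[0]].append(t)
    result = [(l, r) for r in shipped for l in by_key.get(r[0], [])]
    return result, len(shipped)
```

The filter is a few bytes regardless of how many tuples the left side holds, so only the filter and the (mostly) matching right-side tuples cross the network.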

A Data Mining Approach for Selecting Bitmap Join Indices

  • Bellatreche, Ladjel;Missaoui, Rokia;Necir, Hamid;Drias, Habiba
    • Journal of Computing Science and Engineering / v.1 no.2 / pp.177-194 / 2007
  • Index selection is one of the most important decisions in the physical design of relational data warehouses. Indices significantly reduce the cost of processing complex OLAP queries, but they incur storage cost and maintenance overhead. Two main types of indices are available: mono-attribute indices (e.g., B-tree, bitmap, hash) and multi-attribute indices (join indices, bitmap join indices). Bitmap join indices are well suited to optimizing star join queries, which are characterized by joins between a large fact table and multiple dimension tables together with selections on the dimension tables, and their binary representation keeps their storage cost low. However, selecting these indices is difficult because of the exponential number of candidate attributes to be indexed. Most index selection approaches follow two main steps: (1) pruning the search space (i.e., reducing the number of candidate attributes) and (2) selecting indices from the pruned search space. In this paper, we first propose a data-mining-driven approach to prune the search space of the bitmap join index selection problem. Unlike an existing technique that uses only the frequency of attributes in queries as a pruning metric, our technique uses not only frequencies but also other parameters such as the size of the dimension tables involved in the indexing process, the size of each dimension tuple, and the disk page size. We then define a greedy algorithm that selects bitmap join indices minimizing processing cost while satisfying a storage constraint. Finally, to evaluate the efficiency of our approach, we compare it with existing techniques.
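
The selection step can be sketched as a standard benefit-per-byte greedy heuristic under a storage budget. This is not the authors' cost model: the per-candidate `benefit` values are hypothetical inputs, the `Candidate` type and function names are illustrative, and the size formula assumes uncompressed bitmaps (one bit per fact row for each distinct attribute value).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Candidate:
    attribute: str      # dimension attribute to index, e.g. "time.month" (hypothetical)
    cardinality: int    # number of distinct values of the attribute (assumed > 0)
    benefit: float      # estimated query-cost saving (hypothetical input)


def bitmap_size_bytes(c: Candidate, fact_rows: int) -> int:
    # Uncompressed bitmap join index: one bitmap per distinct value, one bit per fact row.
    return (c.cardinality * fact_rows + 7) // 8


def greedy_select(cands: List[Candidate], fact_rows: int, budget: int) -> Tuple[List[str], int]:
    """Pick candidates by benefit per byte while the total size stays within `budget` bytes."""
    chosen: List[str] = []
    used = 0
    ranked = sorted(cands, key=lambda c: c.benefit / bitmap_size_bytes(c, fact_rows), reverse=True)
    for c in ranked:
        size = bitmap_size_bytes(c, fact_rows)
        if used + size <= budget:   # storage constraint
            chosen.append(c.attribute)
            used += size
    return chosen, used
```

Low-cardinality attributes produce small bitmaps, so they tend to rank high even with a modest benefit, which matches the intuition that bitmap join indices suit such attributes.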

Transformation of Continuous Aggregation Join Queries over Data Streams

  • Tran, Tri Minh;Lee, Byung-Suk
    • Journal of Computing Science and Engineering / v.3 no.1 / pp.27-58 / 2009
  • Aggregation join queries are an important class of queries over data streams. These queries involve both join and aggregation operations, with window-based joins followed by an aggregation on the join output. All existing research addresses join query optimization and aggregation query optimization as separate problems. We observe that, by placing them within the same scope of query optimization, more efficient query execution plans become possible through more versatile query transformations. The enabling idea is to perform aggregation before the join so that the join execution time may be reduced. Such query transformations have been studied in relational databases, but not in data streams, where the incremental and continuous arrival of tuples brings new challenges. These challenges are addressed in this paper. Specifically, we first present a query processing model geared to facilitate query transformations and propose a query transformation rule specialized to work with streams. The rule is simple yet covers all possible cases of transformation. We then present a generic query processing algorithm that works with all alternative query execution plans made possible by the transformation, and we develop cost formulas for these plans. Based on the processing algorithm, we validate the rule theoretically by proving the equivalence of the query execution plans. Finally, through extensive experiments, we validate the cost formulas and study the performance of the alternative query execution plans.
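
The enabling idea (aggregate before joining) can be checked on static lists in Python. This sketch ignores windows and incremental arrival, which are the paper's actual subject; the relations `R(join_key, value)` and `S(join_key, group)` and the query "SUM of R.value per S.group" are hypothetical. It works because SUM is decomposable: summing per join key first and then joining gives the same totals as joining first.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

# R holds (join_key, value); S holds (join_key, group). Query: SUM(R.value) per S.group.


def join_then_aggregate(R: List[Tuple[Hashable, int]], S: List[Tuple[Hashable, str]]) -> Dict[str, int]:
    out: Dict[str, int] = defaultdict(int)
    for k_r, v in R:
        for k_s, g in S:          # nested-loop equi-join on the key
            if k_r == k_s:
                out[g] += v
    return dict(out)


def aggregate_then_join(R: List[Tuple[Hashable, int]], S: List[Tuple[Hashable, str]]) -> Dict[str, int]:
    partial: Dict[Hashable, int] = defaultdict(int)
    for k, v in R:                # pre-aggregate: one row per distinct join key
        partial[k] += v
    out: Dict[str, int] = defaultdict(int)
    for k, g in S:                # the join now sees len(partial) rows instead of len(R)
        if k in partial:
            out[g] += partial[k]
    return dict(out)
```

When R has many rows per join key, the second plan joins far fewer tuples; whether that pays off depends on the relative costs, which is what the paper's cost formulas are meant to decide.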

A Study on Selecting Bitmap Join Index to Speed up Complex Queries in Relational Data Warehouses (관계형 데이터 웨어하우스의 복잡한 질의의 처리 효율 향상을 위한 비트맵 조인 인덱스 선택에 관한 연구)

  • An, Hyoung-Geun;Koh, Jae-Jin
    • The KIPS Transactions:PartD / v.19D no.1 / pp.1-14 / 2012
  • Because data warehouses are large, the choice of indices strongly affects the efficiency of query processing. Indices lower query processing cost, but they occupy large storage areas and incur index maintenance costs whenever the database is updated. Bitmap join indices are well suited to optimizing star join queries, which join a fact table with many dimension tables and apply selections on the dimension tables. Although their binary representation keeps storage cost low, selecting the attributes to index from the huge number of generated candidate attributes is difficult. Index selection proceeds by first reducing the number of candidate attributes and then selecting the attributes to index. In this paper, we reduce the number of candidate attributes for the bitmap join index selection problem using data mining techniques. Whereas existing techniques reduce the candidates using attribute frequencies alone, we consider attribute frequencies, the size of the dimension tables, the size of the dimension-table tuples, and the disk page size. We use frequent itemset mining to eliminate a large number of candidate attributes. Using cost functions for the bitmap join indices on the candidate attributes, we build the bitmap join indices with the lowest cost and smallest storage footprint that satisfy the storage constraint. Finally, we compare our technique with existing ones to evaluate its efficiency.
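
The frequent-itemset pruning step can be illustrated with a brute-force count in Python, treating each query as the set of dimension attributes it selects on (a hypothetical workload). This is plain enumeration, adequate for the handful of attributes a single query touches, not the specific mining algorithm the paper uses, and it omits the table-size, tuple-size, and page-size factors the paper adds.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def frequent_attribute_sets(workload: Iterable[Set[str]], min_support: int,
                            max_size: int = 2) -> Dict[Tuple[str, ...], int]:
    """Count every attribute set of size 1..max_size that co-occurs in a query; keep the frequent ones."""
    counts: Counter = Counter()
    for attrs in workload:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(attrs), size):
                counts[combo] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


workload = [
    {"time.month", "store.city"},
    {"time.month"},
    {"time.month", "product.brand"},
    {"store.city", "time.month"},
]
print(frequent_attribute_sets(workload, min_support=2))
```

Here `product.brand` appears in only one query and is pruned, so only `time.month` and `store.city` remain as candidates for the subsequent cost-based selection.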

Uniform Load Distribution Using Sampling-Based Cost Estimation in Parallel Join (병렬 조인에서 샘플링 기반 비용 예측 기법을 이용한 균등 부하 분산)

  • Park, Ung-Gyu
    • The Transactions of the Korea Information Processing Society / v.6 no.6 / pp.1468-1480 / 1999
  • In database systems, join operations are among the most complex and time-consuming operations, and they limit system performance. Many parallel join algorithms have been proposed for such systems, but they do not consider data skew, such as attribute value skew (AVS) and join product skew (JPS), and their performance suffers in skewed environments. This paper proposes a framework for uniform load distribution and an efficient parallel join algorithm that uses the framework to handle AVS and JPS. Our algorithm estimates the data distributions of the input and output relations of join operations by sampling and evaluates the join cost for the estimated distributions. Then, using histogram equalization, it distributes data among nodes to achieve good load balancing in the local join phase. For performance comparison, we present simulation models of our algorithm and other join algorithms and report the results of simulation experiments. The results indicate that our algorithm outperforms the other algorithms when the data are skewed.
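
A simplified Python sketch of sampling-based load distribution, assuming equi-depth (quantile) range partitioning as a stand-in for the paper's histogram-equalization step: a random sample of join-key values is sorted, split points are chosen so each node receives about the same share of the sample, and every tuple is routed by binary search. The function names and the fixed `seed` are hypothetical.

```python
import bisect
import random
from collections import Counter
from typing import List, Sequence


def equi_depth_boundaries(sample: Sequence[int], nodes: int) -> List[int]:
    """Pick nodes-1 split keys so each range holds about the same share of the sample."""
    s = sorted(sample)
    return [s[(i * len(s)) // nodes] for i in range(1, nodes)]


def assign_node(key: int, boundaries: List[int]) -> int:
    """Range-partition: node i receives keys in [boundaries[i-1], boundaries[i])."""
    return bisect.bisect_right(boundaries, key)


def node_loads(keys: Sequence[int], nodes: int, sample_size: int, seed: int = 0) -> Counter:
    """Estimate boundaries from a sample, then count how many tuples each node receives."""
    sample = random.Random(seed).sample(list(keys), sample_size)
    bounds = equi_depth_boundaries(sample, nodes)
    return Counter(assign_node(k, bounds) for k in keys)
```

This balances attribute value skew across distinct keys, but a single very frequent key still lands on one node; handling that case, and join product skew, requires splitting or replicating hot keys, which this sketch does not attempt.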

Matrix-based Filtering and Load-balancing Algorithm for Efficient Similarity Join Query Processing in Distributed Computing Environment (분산 컴퓨팅 환경에서 효율적인 유사 조인 질의 처리를 위한 행렬 기반 필터링 및 부하 분산 알고리즘)

  • Yang, Hyeon-Sik;Jang, Miyoung;Chang, Jae-Woo
    • The Journal of the Korea Contents Association / v.16 no.7 / pp.667-680 / 2016
  • With the development of distributed computing platforms such as Hadoop MapReduce, conventional query processing techniques that were designed for a single machine need to run efficiently in distributed environments. In particular, similarity join query processing in distributed environments has been studied, where a similarity join retrieves all pairs of highly similar data items from two given data sets. However, existing similarity join schemes for distributed environments suffer from skewed computing load across clusters because they consider only data transmission cost. In this paper, we propose a Matrix-based Load-balancing Algorithm for efficient similarity join query processing in distributed computing environments. To balance load uniformly across clusters, the proposed algorithm estimates the expected computing cost using a matrix and generates partitions based on the estimated cost. In addition, it reduces computing load by filtering out data that are not used in query processing in each cluster. Finally, our performance evaluation shows that the proposed algorithm achieves better query processing performance than the existing one.
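
The abstract does not describe how the matrix is built, so the following Python sketch only illustrates the general idea of cost-based partitioning. It assumes, hypothetically, that cell (i, j) of the matrix estimates the work of comparing partition i of one data set with partition j of the other as the product of their sizes, and it assigns cells to workers with the generic longest-processing-time (LPT) greedy rule.

```python
import heapq
from typing import Dict, List, Sequence, Tuple


def cost_matrix(r_sizes: Sequence[int], s_sizes: Sequence[int]) -> List[List[int]]:
    # Cell (i, j) = candidate pairs if partition i of R is compared with partition j of S.
    return [[r * s for s in s_sizes] for r in r_sizes]


def assign_cells(matrix: List[List[int]],
                 workers: int) -> Tuple[Dict[int, List[Tuple[int, int]]], Dict[int, int]]:
    """LPT greedy: take cells from largest to smallest, always giving one to the least-loaded worker."""
    cells = sorted(((c, i, j) for i, row in enumerate(matrix) for j, c in enumerate(row)),
                   reverse=True)
    heap = [(0, w) for w in range(workers)]  # (current load, worker id)
    heapq.heapify(heap)
    plan: Dict[int, List[Tuple[int, int]]] = {w: [] for w in range(workers)}
    for cost, i, j in cells:
        load, w = heapq.heappop(heap)
        plan[w].append((i, j))
        heapq.heappush(heap, (load + cost, w))
    return plan, {w: load for load, w in heap}
```

Balancing on estimated comparison work rather than on bytes transferred is the point the abstract makes about existing schemes, which consider only transmission cost.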

Optimizing Multi-way Join Query Over Data Streams (데이타 스트림에서의 다중 조인 질의 최적화 방법)

  • Park, Hong-Kyu;Lee, Won-Suk
    • Journal of KIISE:Databases / v.35 no.6 / pp.459-468 / 2008
  • A data stream is a massive, unbounded sequence of data elements generated continuously at a rapid rate. Many emerging applications, such as web click monitoring, sensor data processing, network traffic analysis, telephone records, and multimedia data, need to deal with data streams. Data stream processing is therefore performed not on stored data but on newly arriving data, using pre-registered queries that return results immediately or periodically. Recently, many studies have focused on data streams rather than stored data sets; in particular, much research has aimed to optimize continuous queries so that they run efficiently. This paper proposes a query optimization algorithm for continuous queries with multiple join operators (multi-way joins) over data streams. Called Extended Greedy query optimization, it is based on a greedy algorithm. It defines join cost in terms of the operations required to compute a join and to process its result, and it stores the join costs, together with all the information needed to compute them, in a statistics catalog. To overcome the poor performance of a plain greedy algorithm, the algorithm selects a set of low-cost operators instead of only the single operator with the smallest cost. This set influences the accuracy and execution time of the algorithm and can be controlled adaptively by two user-defined values. Experimental results illustrate the performance of the EGA algorithm in various stream environments.
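
To illustrate how keeping several cheap candidates extends a plain greedy optimizer, here is a beam-search join-ordering sketch in Python. It is loosely analogous to the idea in the abstract, not EGA itself: the cost model (sum of estimated intermediate result sizes computed from window cardinalities `n` and pairwise selectivities `sel`) is an assumption, and the single `width` parameter is a hypothetical stand-in for the paper's two user-defined values.

```python
from typing import Dict, FrozenSet, List, Tuple

Plan = Tuple[Tuple[str, ...], float, float]  # (join order, estimated result size, accumulated cost)


def extend(plan: Plan, stream: str, n: Dict[str, int], sel: Dict[FrozenSet[str], float]) -> Plan:
    order, size, cost = plan
    factor = 1.0
    for other in order:  # streams without a predicate between them act as a cross product
        factor *= sel.get(frozenset((stream, other)), 1.0)
    new_size = size * n[stream] * factor
    return order + (stream,), new_size, cost + new_size


def beam_join_order(n: Dict[str, int], sel: Dict[FrozenSet[str], float], width: int = 1) -> Plan:
    """width=1 is plain greedy; width>1 keeps several cheap partial plans at each step."""
    beam: List[Plan] = [((s,), float(n[s]), 0.0) for s in n]
    for _ in range(len(n) - 1):
        expanded = [extend(p, s, n, sel) for p in beam for s in n if s not in p[0]]
        expanded.sort(key=lambda p: p[2])
        beam = expanded[:width]
    return min(beam, key=lambda p: p[2])


n = {"A": 100, "B": 10, "C": 1000}
sel = {frozenset({"A", "B"}): 0.01, frozenset({"B", "C"}): 0.001, frozenset({"A", "C"}): 0.5}
print(beam_join_order(n, sel, width=2))
```

A larger `width` explores more partial plans, trading optimization time for a better chance of avoiding a greedy dead end, which mirrors the accuracy versus execution-time trade-off described above.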

Estimating Join Selectivity of Global XQuery Queries in Distributed Environments (분산 환경에서 전역 XQuery 질의의 조인 선택치 추정 방법)

  • Park, Jong-Hyun;Kang, Ji-Hoon
    • Journal of KIISE:Databases / v.34 no.6 / pp.564-571 / 2007
  • One way to integrate XML data in distributed environments is to use XML views. Users can query distributed local XML views with global queries written in XQuery, the standard query language for searching XML data. Because they integrate and search distributed heterogeneous data, global XQuery queries naturally contain join operations. Since joins are generally expensive, the technique used to process them is critical to the efficient processing of global XQuery queries. Several studies have therefore addressed the efficient processing of join operations; one of them selects the minimum-cost join by estimating join selectivity. For SQL, there is already research on estimating the join selectivity and join cost of global SQL queries. However, these methods for estimating the selectivity of joins in SQL queries cannot be applied to XQuery queries because of the structural differences between relational data and XML data. This paper therefore proposes a method for estimating the selectivity of join operations in XQuery queries using information from XML views. Our contribution is threefold. First, we identify the differences between SQL and XQuery in estimating join selectivity. Second, we estimate join selectivity in XQuery queries by referring to XML views. Third, we evaluate our estimation method.
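
For reference, the classic relational estimate that the abstract says cannot be carried over directly to XQuery is sketched below in Python: for an equi-join on attributes with V distinct values on each side, assuming uniform value distributions and contained value sets, the selectivity is 1 / max(V_left, V_right). This is the SQL-side baseline, not the paper's XQuery method; the statistics in the usage line are hypothetical numbers for two XML views.

```python
def equijoin_selectivity(distinct_left: int, distinct_right: int) -> float:
    """Textbook estimate for R.a = S.b, assuming uniform values and contained value sets."""
    return 1.0 / max(distinct_left, distinct_right)


def estimated_join_size(card_left: int, card_right: int,
                        distinct_left: int, distinct_right: int) -> float:
    """|R join S| is approximately |R| * |S| * selectivity."""
    return card_left * card_right * equijoin_selectivity(distinct_left, distinct_right)


# Hypothetical statistics: 1,000 <order> elements with 100 distinct customer IDs,
# joined with 500 <payment> elements with 50 distinct customer IDs.
print(estimated_join_size(1000, 500, 100, 50))  # 5000.0
```

In the XML setting, the cardinalities and distinct counts would have to be derived from path structure and view definitions rather than from flat columns, which is where the relational formula stops being directly applicable.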

Processing Sliding Window Multi-Joins using a Graph-Based Method over Data Streams (데이터 스트림에서 그래프 기반 기법을 이용한 슬라이딩 윈도우 다중 조인 처리)

  • Zhang, Liang;Ge, Jun-Wei;Kim, Gyoung-Bae;Lee, Soon-Jo;Bae, Hae-Young;You, Byeong-Seob
    • Journal of Korea Spatial Information System Society / v.9 no.2 / pp.25-34 / 2007
  • Existing approaches that select a join order for three or more data streams have always used simple heuristics. Because they consider only one factor, either join selectivity or arrival rate, these methods lead to poor performance and inefficiency in some applications. This paper proposes a graph-based sliding window multi-join algorithm with an optimal join sequence. In this method, a sliding window join graph is first constructed, in which a vertex represents a join operator and an edge indicates the join relationship among sliding windows; the vertex weight and the edge weight represent the cost of the join and the reciprocity of the join operators, respectively. The optimal join order is then found in the graph using an improved MVP algorithm, and the final result is produced by executing the join plan with the nested loop join procedure. Performance comparisons with existing join algorithms demonstrate the advantages of our algorithm.
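
The building block that such a plan executes, a nested-loop join over time-based sliding windows, can be sketched for two streams in Python. This sketch does not attempt the paper's contribution (ordering three or more such joins via the weighted join graph and the improved MVP algorithm); the event format and the names `sliding_window_join` and `Event` are hypothetical.

```python
from collections import deque
from typing import Callable, Iterable, List, Tuple

Event = Tuple[float, str, object]  # (timestamp, stream id "A" or "B", value)


def sliding_window_join(events: Iterable[Event], window: float,
                        predicate: Callable[[object, object], bool]) -> List[Tuple[object, object]]:
    """Symmetric nested-loop join of streams A and B over time-based sliding windows.

    Events must arrive in timestamp order; a tuple stays in its window while
    its timestamp is greater than (now - window).
    """
    windows = {"A": deque(), "B": deque()}
    out: List[Tuple[object, object]] = []
    for ts, sid, val in events:
        for w in windows.values():           # expire tuples that fell out of either window
            while w and w[0][0] <= ts - window:
                w.popleft()
        other = "B" if sid == "A" else "A"
        for _, other_val in windows[other]:  # nested-loop probe of the opposite window
            a, b = (val, other_val) if sid == "A" else (other_val, val)
            if predicate(a, b):
                out.append((a, b))
        windows[sid].append((ts, val))       # then insert the new tuple into its own window
    return out


events = [(1, "A", 1), (2, "B", 1), (3, "B", 2), (7, "A", 2), (8, "B", 2)]
print(sliding_window_join(events, window=5, predicate=lambda a, b: a == b))
```

Each probe scans the whole opposite window, so the cost of a join grows with window size and arrival rate; those are the kinds of per-operator costs a join-ordering method must weigh.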
