DISSECTION TECHNIQUE FOR EFFICIENT JOIN OPERATION ON SEMI-STRUCTURED DOCUMENT STREAM

  • Seo, Dong-Hyeok;Lee, Dong-Gyu;Ryu, Keun-Ho
    • Proceedings of the KSRS Conference / 2007.10a / pp.11-13 / 2007
  • There has been much interest in stream query processing. Various index techniques and advanced join techniques have been proposed to process data stream queries efficiently, and these proposals support rapid responses to such queries. However, the volume of data streams keeps increasing, and data stream query processing needs to be faster than before. In this paper, we propose a novel query processing technique for large numbers of incoming document streams. We propose the Dissection Technique for efficient query processing in the data stream environment, focusing on its use in join query processing. Our technique performs efficiently compared with other proposals for data streams. The proposed technique is applied to a sensor network system and an XML database.

Causality join query processing for data stream by spatio-temporal sliding window (시공간 슬라이딩윈도우기법을 이용한 데이터스트림의 인과관계 결합질의처리방법)

  • Kwon, O-Je;Li, Ki-Joune
    • Spatial Information Research / v.16 no.2 / pp.219-236 / 2008
  • Data streams collected from sensors contain a large amount of useful information, including causality relationships. A causality join query over data streams retrieves a set of (cause, effect) pairs from the streams. Some causality pairs may, however, be lost from the query result because of the delay between the sensors and the data stream management system and because of the limited size of the sliding windows. In this paper, we first investigate the spatial, temporal, and spatio-temporal aspects of the causality join query for data streams. Second, we propose several strategies for sliding window management based on these observations. The accuracy of the proposed strategies is studied through intensive experiments, and the results show that they improve the accuracy of causality join queries over data streams compared with a simple FIFO strategy.

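
To illustrate how a limited sliding window can lose causality pairs under a simple FIFO strategy, here is a minimal Python sketch. It is not the authors' method: the function name `causality_join`, the event layout `(kind, t, x, y)`, and the join predicate (a time lag of at most `max_lag` and a Euclidean distance of at most `max_dist`) are all assumptions made for illustration.

```python
from collections import deque
from math import hypot


def causality_join(events, window_size, max_lag, max_dist):
    """Join 'effect' events against a FIFO window of recent 'cause' events.

    events: iterable of (kind, t, x, y) tuples in arrival order, where kind
    is "cause" or "effect". The predicate below is a hypothetical one.
    """
    # deque(maxlen=...) silently drops the oldest cause when full (FIFO).
    window = deque(maxlen=window_size)
    pairs = []
    for kind, t, x, y in events:
        if kind == "cause":
            window.append((t, x, y))
            continue
        # An effect arrived: probe every cause still held in the window.
        for ct, cx, cy in window:
            close_in_time = 0 <= t - ct <= max_lag
            close_in_space = hypot(x - cx, y - cy) <= max_dist
            if close_in_time and close_in_space:
                pairs.append(((ct, cx, cy), (t, x, y)))
    return pairs


events = [
    ("cause", 0, 0, 0),
    ("cause", 1, 5, 5),
    ("cause", 2, 0, 1),
    ("effect", 3, 0, 0),
]
small = causality_join(events, window_size=2, max_lag=5, max_dist=2)
large = causality_join(events, window_size=3, max_lag=5, max_dist=2)
```

With `window_size=2` the cause at time 0 has already been evicted when the effect arrives, so `small` holds one pair while `large` holds two. A window strategy that takes space and time into account could instead evict the distant cause at (5, 5), which never matches.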

An Efficient M-way Stream Join Algorithm Exploiting a Bit-vector Hash Table (비트-벡터 해시 테이블을 이용한 효율적인 다중 스트림 조인 알고리즘)

  • Kwon, Tae-Hyung;Kim, Hyeon-Gyu;Lee, Yu-Won;Kim, Myoung-Ho
    • Journal of KIISE:Databases / v.35 no.4 / pp.297-306 / 2008
  • MJoin has been proposed as an algorithm for efficiently joining multiple data streams whose characteristics change unpredictably. It extends the symmetric hash join to handle multiple data streams. Whenever a tuple arrives from a remote stream source, MJoin checks whether all of the hash tables have matching tuples. However, when a join involves many data streams with low join selectivity, the performance of this check is significantly influenced by the order in which the hash tables are checked. In this paper, we propose the BiHT-Join algorithm, which extends MJoin to perform this check in constant time regardless of the join order. BiHT-Join maintains a bit-vector that represents the existence of tuples in the streams and decides whether a join will succeed by comparing bit-vectors. Based on this decision, BiHT-Join performs a hash join only for tuples that will join successfully. Our experimental results show that the proposed BiHT-Join performs better than MJoin when processing multiple streams.
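
The bit-vector idea can be sketched in Python as follows. This is only an illustration of the general technique described in the abstract, not the BiHT-Join data structure itself: the class name `BitVectorJoin`, the per-key integer bit-vector, and the omission of window expiry are all assumptions.

```python
from collections import defaultdict
from itertools import product


class BitVectorJoin:
    """M-way equi-join over streams with a per-key bit-vector pre-check.

    Hypothetical sketch: bit i of bits[key] is set once stream i has stored
    at least one tuple with that key. Tuples never expire in this sketch.
    """

    def __init__(self, num_streams):
        self.m = num_streams
        self.full = (1 << num_streams) - 1  # all m bits set
        self.tables = [defaultdict(list) for _ in range(num_streams)]
        self.bits = defaultdict(int)

    def insert(self, stream_id, key, payload):
        """Store the tuple and return the join results it completes."""
        self.tables[stream_id][key].append(payload)
        self.bits[key] |= 1 << stream_id
        # Single integer comparison: independent of any probing order.
        if self.bits[key] != self.full:
            return []  # some stream has no match, so skip all probing
        # Every stream has a match: combine the new tuple with the others.
        parts = [
            [payload] if s == stream_id else self.tables[s][key]
            for s in range(self.m)
        ]
        return [tuple(combo) for combo in product(*parts)]


join = BitVectorJoin(3)
r1 = join.insert(0, "k", "a")   # only stream 0 has key "k"
r2 = join.insert(1, "k", "b")   # stream 2 still missing
r3 = join.insert(2, "k", "c")   # all three bits set, join fires
r4 = join.insert(0, "k", "a2")  # new tuple joins with stored b and c
```

The point of the design is that a failing join is rejected by one comparison against `full`, so the hash tables are probed only when a result is certain to exist.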

Strategies and Cost Model for Spatial Data Stream Join (공간 데이터스트림을 위한 조인 전략 및 비용 모델)

  • Yoo, Ki-Hyun;Nam, Kwang-Woo
    • Journal of Korea Spatial Information System Society / v.10 no.4 / pp.59-66 / 2008
  • A GeoSensor network is a sensor network infrastructure, together with related software, of a specific form that monitors a variety of conditions over geographic space. Such a GeoSensor network is implemented by combining data streams with spatial attributes and spatial relations. Until recently, however, sensor network systems have concentrated on methods for storing and searching sensor data streams, without considering spatial information. In this paper, we propose a definition of spatial data streams, which combine data streams with spatial data, and a join strategy model for them in GeoSensor networks. The spatial data streams defined in this paper are dynamic spatial data streams of a moving-object type and static spatial data streams of a fixed type. A dynamic spatial data stream is a data stream transmitted by a moving sensor such as GPS, while a static spatial data stream is generated by joining a data stream from a general sensor with a relation holding the location values of these sensors. This paper proposes joins of dynamic and static spatial data streams and cost models for estimating join cost. Finally, we verify the proposed cost models and show the performance of each join strategy.

X+ Join : The improved X join scheme for the duplicate check overhead reduction (엑스플러스 조인 : 조인 중복체크의 오버헤드를 줄이기 위한 개선된 방법)

  • Baek, Joo-Hyun;Park, Sung-Wook;Jung, Sung-Won
    • Proceedings of the Korean Information Science Society Conference / 2006.10c / pp.28-32 / 2006
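  • In environments such as ubiquitous computing, where external data arrives in real time as a stream and the end of the input is unknown, conventional join methods cannot solve the problem. In such environments, moreover, the size and characteristics of the data all differ, and the input is strongly affected by network conditions. Many algorithms, such as double pipelined hash join, Xjoin, and Pjoin, have represented prior research on join operations in stream environments. Among them, Xjoin uses features of the symmetric hash join and the hybrid hash join to join streaming data by reactively adjusting the join process according to the flow of incoming data. However, its performance suffers from the overhead of checking for duplicate results produced across its multiple stages. To improve on this, this paper presents a method that modifies the execution process of Xjoin. We propose a simple way to avoid producing duplicates by adding only a marker to each partition, and we add a partition selection method that reduces unnecessary computation and I/O. This considerably simplifies the check for duplicate operations, which yields better performance, and it also reduces the overhead of storing timestamps, saving the storage space needed for the overall operation.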

Continuous Spatio-Temporal Self-Join Queries over Stream Data of Moving Objects for Symbolic Space (기호공간에서 이동객체 스트림 데이터의 연속 시공간 셀프조인 질의)

  • Hwang, Byung-Ju;Li, Ki-Joune
    • Spatial Information Research / v.18 no.1 / pp.77-87 / 2010
  • Spatio-temporal join operators are essential to the management of spatio-temporal data such as moving objects. For example, join operators are part of the processing needed to analyze the movement of objects and to search for similar patterns among moving objects. Various studies have addressed spatio-temporal join queries in outdoor space. Recently, with the advance of indoor positioning techniques, location-based services are required in indoor space as well as outdoor space. Nevertheless, no study has addressed the processing of spatio-temporal join queries in indoor space. In this paper, we introduce continuous spatio-temporal self-join queries in indoor space and propose a method for processing these join queries over stream data of moving objects. A continuous spatio-temporal self-join query continuously updates the joined result set that satisfies the spatio-temporal predicates. We assume that the positions of moving objects are represented by symbols such as a room or a corridor. This paper proposes a data structure, called the Candidate Pairs Buffer, to filter and maintain massive stream data efficiently, and we also investigate the performance of the proposed method in an experimental study.

Optimizing Multi-way Join Query Over Data Streams (데이타 스트림에서의 다중 조인 질의 최적화 방법)

  • Park, Hong-Kyu;Lee, Won-Suk
    • Journal of KIISE:Databases / v.35 no.6 / pp.459-468 / 2008
  • A data stream is a massive unbounded sequence of data elements continuously generated at a rapid rate. Many recent research activities for emerging applications need to deal with data streams. Such applications include web click monitoring, sensor data processing, network traffic analysis, telephone records, and multimedia data. In these applications, processing is performed not on stored data but on newly arriving data against pre-registered queries, and a result is returned immediately or periodically. Recently, many studies have focused on data streams rather than stored data sets. In particular, much research aims to optimize continuous queries so that they can be executed efficiently. This paper proposes a query optimization algorithm for continuous queries that have multiple join operators (multi-way joins) over data streams. It is called Extended Greedy query optimization and is based on a greedy algorithm. It defines join cost in terms of the operations required to compute a join and the operations required to process its result, and it stores the join cost, together with all the information needed to compute it, in a statistics catalog. To overcome the weak point of the greedy algorithm, which has poor performance, the algorithm selects the set of operators with a small lay, instead of the operator with the smallest cost. The set influences the accuracy and execution time of the algorithm and can be controlled adaptively by two user-defined values. Experimental results illustrate the performance of the EGA algorithm in various stream environments.

Preprocessing Method for Handling Multi-Way Join Continuous Queries over Data Streams (데이터 스트림에서 다중 조인 연속질의의 효과적인 처리를 위한 전처리 기법)

  • Seo, Ki-Yeon;Lee, Joo-Il;Lee, Won-Suk
    • Journal of Internet Computing and Services / v.13 no.3 / pp.93-105 / 2012
  • A data stream is a series of tuples generated in a real-time, incessant, immense, and volatile manner. As new information technologies emerge, stream processing methods are needed to handle data streams efficiently. In particular, an efficient evaluation method for multi-way joins would contribute greatly to the performance of a data stream management system, because the join is one of the most resource-consuming operators in query evaluation. In this paper, in order to evaluate a multi-way join continuous query efficiently, we propose a novel method that decreases the cost of a query by eliminating unsuccessful intermediate results. To this end, we propose a matrix-based structure for monitoring data streams, estimate the number of final result tuples of the query, and identify unsuccessful tuples by matrix multiplication operations. Using this information, we process a multi-way join continuous query efficiently by filtering out the unsuccessful tuples before the actual evaluation of the query.
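
As a rough illustration of how matrix multiplication can count final results and expose unsuccessful tuples, consider a three-stream chain join A ⋈ B ⋈ C, where A joins B on an attribute x and B joins C on an attribute y. The abstract does not describe the paper's actual matrix structure, so the layout below (a count vector for A, a count matrix for B, a count vector for C) and the function names are assumptions.

```python
def mat_vec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def estimate_chain_join(a_counts, b_matrix, c_counts):
    """Count results of A JOIN B JOIN C and find x values that cannot join.

    a_counts[x]    : number of A tuples with join value x
    b_matrix[x][y] : number of B tuples with join values (x, y)
    c_counts[y]    : number of C tuples with join value y
    (Hypothetical layout chosen for this sketch.)
    """
    # reach[x] = number of (B, C) combinations an A tuple with value x joins.
    reach = mat_vec(b_matrix, c_counts)
    total = sum(a * r for a, r in zip(a_counts, reach))
    # A tuples with these x values produce no final result and can be
    # filtered out before the join is evaluated.
    dead_x = [x for x, r in enumerate(reach) if r == 0]
    return total, dead_x


a_counts = [2, 1, 3]
b_matrix = [[1, 0],
            [0, 0],
            [0, 2]]
c_counts = [4, 0]
total, dead_x = estimate_chain_join(a_counts, b_matrix, c_counts)
```

Here only x = 0 reaches a C tuple, so `total` is 2 × 4 = 8 and the A tuples with x = 1 or x = 2 can be discarded in advance.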

MMJoin: An Optimization Technique for Multiple Continuous MJoins over Data Streams (데이타 스트림 상에서 다중 연속 복수 조인 질의 처리 최적화 기법)

  • Byun, Chang-Woo;Lee, Hun-Zu;Park, Seog
    • Journal of KIISE:Databases / v.35 no.1 / pp.1-16 / 2008
  • Join queries, which are costly, are necessary in a Data Stream Management System for sensor networks, where many short pieces of information are generated. It is reasonable for each join operator to have a sliding-window constraint to prevent disk I/O, because a data stream represents data of infinite size. In addition, the join operator should be able to take multiple inputs to produce overall results. The MJoin operator with sliding windows can do so. In this paper, we consider a data stream environment in which multiple MJoin operators are registered, and we propose MMJoin, which deals with the issues of building and processing a globally shared query while considering the characteristics of the MJoin operator with sliding windows. First, we propose a solution for building the globally shared query execution plan. Second, we solve the problems of updating the window size and routing join results. Our study can serve as fundamental research toward an optimization technique for multiple continuous joins in the data stream environment.

Flow Characteristics of Turbulent Flow in the Exit Region of Join Stream Curved Duct (합류 곡관덕트 출구영역에서 난류유동의 유동특성)

  • Sohn, Hyun-Chull;Park, Sang-Kyoo
    • Transactions of the Korean Society of Mechanical Engineers B / v.27 no.5 / pp.569-578 / 2003
  • In the present study, the flow characteristics of turbulent steady flows were experimentally investigated in the exit region of a join stream. The experiment was carried out to measure the velocity profiles of air in a square duct. A hot-wire anemometer was used to measure the velocity profiles. The experimental results show that the velocity profiles do not change beyond the fully developed flow region, which is defined at the dimensionless axial position x/Dh=50. In addition, the gradient of the shear stress distribution became stable as the flow progressed downstream.