Parallel Query Evaluation


Development of a CUBRID-Based Distributed Parallel Query Processing System

  • Kim, Hyeong-Il; Yang, HyeonSik; Yoon, Min; Chang, Jae-Woo
    • Journal of Information Processing Systems, v.13 no.3, pp.518-532, 2017
  • Due to the rapid growth in the amount of data, research on big data processing has drawn much attention. For big data processing, CUBRID Shard supports parallel query processing by dividing a database across a number of CUBRID servers. However, CUBRID Shard can answer a user's query only when the query accesses a single CUBRID server rather than multiple ones. To solve this problem, we propose a CUBRID-based distributed parallel query processing system that can answer a user's query in a parallel and distributed manner. Through a performance evaluation, we show that the proposed system achieves 2-3 times better query processing time than the existing CUBRID Shard.
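
A minimal sketch of the scatter/gather pattern the abstract describes: the same query is fanned out to all shards in parallel and the partial results are merged. The shard list and the execute_on() helper are hypothetical stand-ins, not CUBRID APIs.

```python
from concurrent.futures import ThreadPoolExecutor

SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]  # hypothetical endpoints

def execute_on(shard, sql):
    """Stand-in for dispatching `sql` to one CUBRID server and fetching rows."""
    return []  # a real system would call a CUBRID driver here

def parallel_query(sql):
    # Scatter: run the query on every shard concurrently.
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partials = list(pool.map(lambda shard: execute_on(shard, sql), SHARDS))
    # Gather: concatenate partial results (aggregates would be merged here).
    merged = []
    for rows in partials:
        merged.extend(rows)
    return merged

print(parallel_query("SELECT * FROM accounts"))
```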

A Parallel Match Method for Path-oriented Query Processing in XML Databases (XML 데이타베이스에서 경로-지향 질의처리를 위한 병렬 매치 방법)

  • Park Hee-Sook; Cho Woo-Hyun
    • Journal of KIISE:Databases, v.32 no.5, pp.558-566, 2005
  • XML is the new standard for data representation and exchange on the Internet. In this paper, we describe a new approach for evaluating path-oriented queries against XML documents. We propose the Parallel Match Indexing Fabric to speed up the evaluation of path-oriented queries using path signatures, and we design a parallel match algorithm that matches the path signature of an input query against the path signatures of elements stored in the database. To construct the parallel match index, we first build a binary trie over all path signatures of an XML document and then transform this trie into the Parallel Match Indexing Fabric. The Parallel Match Indexing Fabric and the parallel match algorithm together execute the search operation of a path-oriented query. In our approach, the time complexity of the algorithm is proportional to the logarithm of the number of path signatures in the XML document.
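
The signature test at the heart of this approach can be sketched briefly, assuming one hash-derived bit per element tag; the paper's actual signatures and its trie-shaped Parallel Match Indexing Fabric are more elaborate.

```python
# Each element tag sets one bit; a path's signature is the OR of its tags'
# bits. A stored path can match a query only if it contains every query
# bit, so the bitwise test filters candidates before exact matching.
def tag_bit(tag, width=64):
    return 1 << (hash(tag) % width)

def path_signature(path):
    sig = 0
    for tag in path:
        sig |= tag_bit(tag)
    return sig

def may_match(query_sig, stored_sig):
    # No false negatives; occasional false positives from bit collisions.
    return query_sig & stored_sig == query_sig

stored = path_signature(["book", "author", "name"])
print(may_match(path_signature(["author", "name"]), stored))  # True
print(may_match(path_signature(["price"]), stored))           # likely False
```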

Implementation and Evaluation of Time Interval Partitioning Algorithm in Temporal Databases (시간 데이타베이스에서 시간 간격 분할 알고리즘의 구현 및 평가)

  • Lee, Kwang-Kyu; Shin, Ye-Ho; Ryu, Keun-Ho; Kim, Hong-Gi
    • Journal of KIISE:Computing Practices and Letters, v.8 no.1, pp.9-16, 2002
  • As in relational databases, the join operation has a great effect on system performance in temporal databases. For the temporal join in particular, the optimization of interval partitioning decides query processing performance. In this paper, to improve the efficiency of parallel join queries in temporal databases, we propose a Minimum Interval Partitioning (MIP) scheme for time interval partitioning. The validity of the MIP algorithm, which decides the minimum breakpoints of a partition, is demonstrated by an example scenario, and we confirm improved efficiency compared with existing partitioning algorithms.
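
A minimal sketch of interval-to-partition assignment against fixed breakpoints; the MIP algorithm's cost-based choice of minimum breakpoints is the paper's contribution and is not reproduced here.

```python
# Split the time line at breakpoints so a temporal join can run
# independently per partition. An interval is replicated into every
# partition it overlaps, so a partition-local join still sees all
# matching pairs.
def assign_partitions(intervals, breakpoints):
    """intervals: list of (start, end); breakpoints: sorted cut points."""
    bounds = [float("-inf")] + list(breakpoints) + [float("inf")]
    parts = [[] for _ in range(len(bounds) - 1)]
    for s, e in intervals:
        for i in range(len(bounds) - 1):
            if s < bounds[i + 1] and e > bounds[i]:  # overlap test
                parts[i].append((s, e))
    return parts

print(assign_partitions([(1, 5), (4, 9), (10, 12)], [6]))
# -> [[(1, 5), (4, 9)], [(4, 9), (10, 12)]]
```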

An R-tree Index Scheduling Method for kNN Query Processing in Multiple Wireless Broadcast Channels (다중 무선 방송채널에서 kNN 질의 처리를 위한 R-tree 인덱스 스케줄링 기법)

  • Jung, Eui-Jun; Jung, Sung-Won
    • Journal of KIISE:Databases, v.37 no.2, pp.121-126, 2010
  • This paper proposes an efficient index scheduling technique for kNN query processing in multiple wireless broadcast channel environments. In previous work, clients have to wait for the next broadcast cycle if the required child nodes of the same parent node are allocated to the same time slot on multiple channels. Our proposed method computes the access frequency of each R-tree node at the server before generating the R-tree index broadcast schedule. Nodes with high access frequencies are allocated serially on a single channel; nodes with low frequencies are allocated in parallel on multiple channels. As a result, we reduce index node access conflicts and shorten the broadcast cycle. The performance evaluation shows that our scheme performs better than existing schemes.
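
The allocation rule can be sketched as follows, with hypothetical node names, access frequencies, and threshold: hot nodes go serially onto one channel, cold nodes are spread across the remaining channels (at least two channels assumed).

```python
# Frequency-based broadcast scheduling: frequently accessed R-tree nodes
# are placed serially on channel 0 (avoiding same-slot conflicts among
# siblings), while rarely accessed nodes rotate across the remaining
# channels in parallel (shortening the broadcast cycle).
def schedule(nodes, freqs, num_channels, hot_threshold):
    channels = [[] for _ in range(num_channels)]
    cold_ch = 1  # channels 1..num_channels-1 carry low-frequency nodes
    for node in nodes:
        if freqs[node] >= hot_threshold:
            channels[0].append(node)
        else:
            channels[cold_ch].append(node)
            cold_ch = cold_ch % (num_channels - 1) + 1  # rotate cold channels
    return channels

nodes = ["root", "n1", "n2", "n3", "n4"]
freqs = {"root": 100, "n1": 80, "n2": 5, "n3": 4, "n4": 3}
print(schedule(nodes, freqs, num_channels=3, hot_threshold=50))
# -> [['root', 'n1'], ['n2', 'n4'], ['n3']]
```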

Declustering Method for Moving Object Database (이동체 데이터베이스를 위한 디클러스터링 정책)

  • Seo YoungDuk; Hong EnSuk; Hong BongHee
    • The KIPS Transactions: Part D, v.11D no.7 s.96, pp.1399-1408, 2004
  • Because moving object databases contain large volumes of spatio-temporal data, a single-disk system cannot achieve fast response time and total throughput, so a parallel processing system is needed for efficient query processing. Existing parallel processing systems, however, do not consider the characteristics of moving object data, which are reported continuously to the database, so a new declustering scheme is necessary for high performance. In this paper, we propose new declustering policies for moving object data to support efficient query processing. First, we consider the spatial part of the MBB (Minimum Bounding Box) to obtain an SD (Semi-allocation Disk) value. Second, we combine the SD value with the creation time of the node into an SDT-Proximity measure. For a more accurate declustering effect, we also consider load balancing. Our evaluation shows an average performance improvement of 15% compared with the Round-Robin method for query areas of 5% and 10%, and an average improvement of 6% compared with the Spatial Proximity method.
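
A rough sketch of proximity-plus-load-balancing declustering, using a plain center-distance measure as a stand-in for the paper's SDT-Proximity; the 0.5 load weight is arbitrary.

```python
import math

def center(mbb):
    (x1, y1), (x2, y2) = mbb
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def decluster(mbbs, num_disks):
    # Assign each new MBB to the disk whose contents are farthest away,
    # so spatially close objects land on different disks and range
    # queries can read from many disks in parallel; a load penalty keeps
    # disk sizes balanced.
    disks = [[] for _ in range(num_disks)]
    for mbb in mbbs:
        cx, cy = center(mbb)
        def score(d):
            if not disks[d]:
                return float("inf")  # empty disks are always acceptable
            dist = min(math.dist((cx, cy), center(m)) for m in disks[d])
            return dist - 0.5 * len(disks[d])
        disks[max(range(num_disks), key=score)].append(mbb)
    return disks

boxes = [((0, 0), (1, 1)), ((0.5, 0.5), (1.5, 1.5)), ((10, 10), (11, 11))]
print(decluster(boxes, num_disks=2))  # nearby boxes end up on different disks
```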

A Load Balancing Method using Partition Tuning for Pipelined Multi-way Hash Join (다중 해시 조인의 파이프라인 처리에서 분할 조율을 통한 부하 균형 유지 방법)

  • Mun, Jin-Gyu; Jin, Seong-Il; Jo, Seong-Hyeon
    • Journal of KIISE:Databases, v.29 no.3, pp.180-192, 2002
  • We investigate the effect of join-attribute data skew on the performance of a pipelined multi-way hash join method and propose two new hash join methods for the shared-nothing multiprocessor environment. The first proposed method allocates buckets statically in a round-robin fashion; the second allocates buckets dynamically via a frequency distribution. Using hash-based joins, multiple joins can be pipelined so that the early results of a join, before the whole join is completed, are sent to the next join without being staged on disk. The shared-nothing multiprocessor architecture is known to scale well for very large databases; however, it is very sensitive to data skew. Unless the pipelined execution of multiple hash joins includes some dynamic load balancing mechanism, the skew effect can severely degrade system performance. In this paper, we derive an execution model of the pipeline segment and a cost model, and develop a simulator for the study. As shown by our simulation over a wide range of parameters, join selectivities and relation sizes degrade system performance as the degree of data skew grows, but the proposed method, using a large number of buckets and a tuning technique, offers substantial robustness against a wide range of skew conditions.
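
The dynamic, frequency-driven variant can be approximated by a greedy least-loaded assignment over many small buckets, sketched below with illustrative bucket sizes; the paper's actual tuning mechanism may differ.

```python
import heapq

def allocate_buckets(bucket_sizes, num_nodes):
    # Skew-aware allocation: with many more buckets than processors,
    # place the largest buckets first, each on the currently least-loaded
    # node, so one hot hash bucket cannot overload a single processor.
    heap = [(0, n) for n in range(num_nodes)]  # (current load, node id)
    assignment = {}
    for bucket in sorted(bucket_sizes, key=bucket_sizes.get, reverse=True):
        load, node = heapq.heappop(heap)
        assignment[bucket] = node
        heapq.heappush(heap, (load + bucket_sizes[bucket], node))
    return assignment

sizes = {0: 900, 1: 100, 2: 90, 3: 80, 4: 70}  # skewed bucket sizes
print(allocate_buckets(sizes, num_nodes=2))
# -> {0: 0, 1: 1, 2: 1, 3: 1, 4: 1}: the hot bucket gets a node to itself
```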

Efficient Multiple Joins using the Synchronization of Page Execution Time in Limited Processors Environments (한정된 프로세서 환경에서 페이지 실행시간 동기화를 이용한 효율적인 다중 결합)

  • Lee, Kyu-Ock; Weon, Young-Sun; Hong, Man-Pyo
    • Journal of KIISE:Databases, v.28 no.4, pp.732-741, 2001
  • In relational database systems, the join is one of the most time-consuming query operations, and many parallel join algorithms have been developed to reduce its execution time. The multiple hash join algorithm using an allocation tree is one of the most efficient. However, processing each node of the allocation tree may incur a delay in the tuple-probing phase, caused by the difference between the time to read one page of the outer relation and the time to process the page already read. This delay problem was solved by the concept of synchronizing page execution time, which we proposed previously. In this paper, the performance improvements at each node of the allocation tree are extended to the whole tree, and the corresponding performance evaluation is carried out. In addition, we propose an efficient algorithm for multiple hash joins with a limited number of processors, based on the relationship between the number of input relations in the allocation tree and the number of processors allocated to the tree. Finally, we analyze the performance by building an analytical cost model and verify its validity through various comparisons with the previous method.
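
A toy cost model makes the targeted delay concrete, assuming a two-stage pipeline in which reading the next outer page overlaps with probing the current one; the numbers are illustrative, not the paper's.

```python
def pipeline_time(num_pages, read_t, probe_t):
    # Two-stage pipeline: read the first page, overlap the middle stages,
    # then drain the final probe. Stalls appear whenever probe_t > read_t;
    # synchronizing page execution time aims for read_t == probe_t.
    return read_t + (num_pages - 1) * max(read_t, probe_t) + probe_t

print(pipeline_time(100, read_t=1.0, probe_t=1.5))  # 151.0: probing stalls reads
print(pipeline_time(100, read_t=1.0, probe_t=1.0))  # 101.0: synchronized stages
```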


Design and Implementation of MongoDB-based Unstructured Log Processing System over Cloud Computing Environment (클라우드 환경에서 MongoDB 기반의 비정형 로그 처리 시스템 설계 및 구현)

  • Kim, Myoungjin; Han, Seungho; Cui, Yun; Lee, Hanku
    • Journal of Internet Computing and Services, v.14 no.6, pp.71-84, 2013
  • Log data, which record the multitude of information created when operating computer systems, are utilized in many processes, from carrying out computer system inspection and process optimization to providing customized optimization for users. In this paper, we propose a MongoDB-based unstructured log processing system in a cloud environment for processing the massive amount of log data of banks. Most of the log data generated during banking operations come from handling a client's business. Therefore, in order to gather, store, categorize, and analyze the log data generated while processing the client's business, a separate log data processing system needs to be established. However, realizing flexible storage expansion for a massive amount of unstructured log data, together with the considerable number of functions needed to categorize and analyze the stored data, is difficult in existing computing environments. Thus, in this study, we use cloud computing technology to realize a cloud-based log data processing system for unstructured log data that are difficult to handle with the existing computing infrastructure's analysis tools and management systems. The proposed system uses an IaaS (Infrastructure as a Service) cloud environment to provide flexible expansion of computing resources, including storage space and memory, under conditions such as extended storage or a rapid increase in log data. Moreover, to overcome the processing limits of existing analysis tools when a real-time analysis of the aggregated unstructured log data is required, the proposed system includes a Hadoop-based analysis module for quick and reliable parallel-distributed processing of the massive amount of log data. Furthermore, because HDFS (Hadoop Distributed File System) stores data by generating copies of the block units of the aggregated log data, the proposed system offers automatic restore functions so that it can continue operating after recovering from a malfunction. Finally, by establishing a distributed database using the NoSQL-based MongoDB, the proposed system provides methods for effectively processing unstructured log data. Relational databases such as MySQL have strict schemas that are inappropriate for unstructured log data and make it hard to expand nodes when rapidly growing data must be distributed across various nodes. NoSQL does not provide the complex computations that relational databases offer, but it can easily expand through node dispersion when the amount of data increases rapidly; it is a non-relational database with a structure appropriate for unstructured data. NoSQL data models are usually classified as key-value, column-oriented, and document-oriented types. Of these, the representative document-oriented store MongoDB, which has a free schema structure, is used in the proposed system: it makes it easy to process unstructured log data through its flexible schema, facilitates node expansion when data grow rapidly, and provides an Auto-Sharding function that automatically expands storage.
The proposed system is composed of a log collector module, a log graph generator module, a MongoDB module, a Hadoop-based analysis module, and a MySQL module. When the log data generated over the entire client business process of each bank are sent to the cloud server, the log collector module collects and classifies them according to the type of log data and distributes them to the MongoDB module and the MySQL module. The log graph generator module generates the results of the log analysis performed by the MongoDB module, the Hadoop-based analysis module, and the MySQL module per analysis time and type of aggregated log data, and provides them to the user through a web interface. Log data that require real-time analysis are stored in the MySQL module and provided in real time by the log graph generator module. The aggregated log data per unit time are stored in the MongoDB module and plotted in a graph according to the user's various analysis conditions. The aggregated log data in the MongoDB module are processed in a parallel-distributed manner by the Hadoop-based analysis module. A comparative evaluation against a log data processing system that uses only MySQL, covering log insertion and query performance, demonstrates the proposed system's superiority. Moreover, an optimal chunk size is confirmed through MongoDB log data insert performance evaluations over various chunk sizes.
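
As an illustration of the flexible-schema storage path, a minimal pymongo sketch against a local MongoDB; the database, collection, and field names are invented for the example and are not the paper's.

```python
from datetime import datetime, timezone
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
logs = client["bank"]["logs"]  # hypothetical database and collection names

# Schema-free documents: different log types can carry different fields
# without any migration step, which is the property the paper relies on
# for heterogeneous, unstructured bank logs.
logs.insert_many([
    {"type": "transfer", "amount": 120.5, "ts": datetime.now(timezone.utc)},
    {"type": "login", "client_id": "c-42", "ts": datetime.now(timezone.utc)},
])
print(logs.count_documents({"type": "login"}))
```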