• Title/Summary/Keyword: Column-Oriented Databases

Search Result 7, Processing Time 0.02 seconds

Performance Comparison of Column-Oriented and Row-Oriented Database Systems for Star Schema Join Processing (스타 스키마 조인 처리에 대한 세로-지향 데이터베이스 시스템과 가로-지향 데이터베이스 시스템의 성능 비교)

  • Oh, Byung-Jung;Ahn, Soo-Min;Kim, Kyung-Chang
    • Journal of the Korea Society of Computer and Information
    • /
    • v.16 no.8
    • /
    • pp.29-38
    • /
    • 2011
  • Unlike in traditional row-oriented database systems, a column-oriented database system stores data in column-oriented and not row-oriented order. Recently, research results revealed the effectiveness of column-oriented databases for applications such as data warehouse and decision support systems that access large volumes of data in a read only manner. In this paper, we investigate the join strategies for column-oriented databases and prove the effectiveness of column-oriented databases in data warehouse systems. For unbiased comparison, the two database systems are analyzed using the star schema benchmark and the performance analysis of a star schema join query is carried out. We experimented with well-known join algorithms and considered early materialization and late materialization join strategies for column-oriented databases. The performance results confirm that star schema join queries perform better in terms of disk I/O cost in column-oriented databases than in row-oriented databases. In addition, the late materialization strategy showed more performance gain than the early materialization strategy in column-oriented databases.

A Column-Aware Index Management Using Flash Memory for Read-Intensive Databases

  • Byun, Si-Woo;Jang, Seok-Woo
    • Journal of Information Processing Systems
    • /
    • v.11 no.3
    • /
    • pp.389-405
    • /
    • 2015
  • Most traditional database systems exploit a record-oriented model where the attributes of a record are placed contiguously in a hard disk to achieve high performance writes. However, for read-mostly data warehouse systems, the column-oriented database has become a proper model because of its superior read performance. Today, flash memory is largely recognized as the preferred storage media for high-speed database systems. In this paper, we introduce a column-oriented database model based on flash memory and then propose a new column-aware flash indexing scheme for the high-speed column-oriented data warehouse systems. Our index management scheme, which uses an enhanced $B^+$-Tree, achieves superior search performance by indexing an embedded segment and packing an unused space in internal and leaf nodes. Based on the performance results of two test databases, we concluded that the column-aware flash index management outperforms the traditional scheme in the respect of the mixed operation throughput and its response time.

Comparision of Join Query Processing Cost in Row-Oriented and Column-Oriented Databases (Row-지향과 Column-지향 데이터베이스의 조인 질의 처리 비용 비교)

  • Oh, Byung-Jung;Ahn, Soo-Min;Kim, Kyung-Chang
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2011.04a
    • /
    • pp.1214-1217
    • /
    • 2011
  • 데이터 레코드를 가로(row-wise)로 저장하는 기존의 데이터베이스를 Row-지향 데이터베이스, 세로(column-wise)로 저장하는 데이터베이스를 Column-지향 데이터베이스라 정의하자. 본 논문에서는 Row-지향 데이터베이스와 Column-지향 데이터베이스에서 분석 workload 형태의 조인 질의를 처리하여 비교 우위 성능을 보이는 데이터베이스 시스템을 고찰하고자 한다. 객관적인 성능 실험을 위해 분석적 모델인 스타 스키마 벤치마크를 이용하였다. Nested Loop 조인과 Sort Merge 조인 기법을 사용한 실험에서 Column-지향 데이터베이스의 성능이 우수하게 나타났음을 확인할 수 있다.

Search Performance Improvement of Column-oriented Flash Storages using Segmented Compression Index (분할된 압축 인덱스를 이용한 컬럼-지향 플래시 스토리지의 검색 성능 개선)

  • Byun, Siwoo
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.14 no.1
    • /
    • pp.393-401
    • /
    • 2013
  • Most traditional databases exploit record-oriented storage model where the attributes of a record are placed contiguously in hard disk to achieve high performance writes. However, for search-mostly datawarehouse systems, column-oriented storage has become a proper model because of its superior read performance. Today, flash memory is largely recognized as the preferred storage media for high-speed database systems. In this paper, we introduce fast column-oriented database model and then propose a new column-aware index management scheme for the high-speed column-oriented datawarehouse system. Our index management scheme which is based on enhanced $B^+$-Tree achieves high search performance by embedded flash index and unused space compression in internal and leaf nodes. Based on the results of the performance evaluation, we conclude that our index management scheme outperforms the traditional scheme in the respect of the search throughput and response time.

Shadow Recovery for Column-based Databases (컬럼-기반 데이터베이스를 위한 그림자 복구)

  • Byun, Si-Woo
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.16 no.4
    • /
    • pp.2784-2790
    • /
    • 2015
  • The column-oriented database storage is a very advanced model for large-volume data transactions because of its superior I/O performance. Traditional data storages exploit row-oriented storage where the attributes of a record are placed contiguously in hard disk for fast write operations. However, for search-mostly data warehouse systems, column-oriented storage has become a more proper model because of its superior read performance. Recently, solid state drive using flash memory is largely recognized as the preferred storage media for high-speed data analysis systems. In this research, we propose a new transaction recovery scheme for a column-oriented database environment which is based on a flash media file system. We improved traditional shadow paging schemes by reusing old data pages which are supposed to be invalidated in the course of writing a new data page in the flash file system environment. In order to reuse these data pages, we exploit reused shadow list structure in our column-oriented shadow recovery(CoSR) scheme. CoSR scheme minimizes the additional storage overhead for keeping shadow pages and minimizes the I/O performance degradation caused by column data compression of traditional recovery schemes. Based on the results of the performance evaluation, we conclude that CoSR outperforms the traditional schemes by 17%.

Column-aware Transaction Management Scheme for Column-Oriented Databases (컬럼-지향 데이터베이스를 위한 컬럼-인지 트랜잭션 관리 기법)

  • Byun, Si-Woo
    • Journal of Internet Computing and Services
    • /
    • v.15 no.4
    • /
    • pp.125-133
    • /
    • 2014
  • The column-oriented database storage is a very advanced model for large-volume data analysis systems because of its superior I/O performance. Traditional data storages exploit row-oriented storage where the attributes of a record are placed contiguously in hard disk for fast write operations. However, for search-mostly datawarehouse systems, column-oriented storage has become a more proper model because of its superior read performance. Recently, solid state drive using MLC flash memory is largely recognized as the preferred storage media for high-speed data analysis systems. The features of non-volatility, low power consumption, and fast access time for read operations are sufficient grounds to support flash memory as major storage components of modern database servers. However, we need to improve traditional transaction management scheme due to the relatively slow characteristics of column compression and flash operation as compared to RAM memory. In this research, we propose a new scheme called Column-aware Multi-Version Locking (CaMVL) scheme for efficient transaction processing. CaMVL improves transaction performance by using compression lock and multi version reads for efficiently handling slow flash write/erase operation in lock management process. We also propose a simulation model to show the performance of CaMVL. Based on the results of the performance evaluation, we conclude that CaMVL scheme outperforms the traditional scheme.

Design and Implementation of MongoDB-based Unstructured Log Processing System over Cloud Computing Environment (클라우드 환경에서 MongoDB 기반의 비정형 로그 처리 시스템 설계 및 구현)

  • Kim, Myoungjin;Han, Seungho;Cui, Yun;Lee, Hanku
    • Journal of Internet Computing and Services
    • /
    • v.14 no.6
    • /
    • pp.71-84
    • /
    • 2013
  • Log data, which record the multitude of information created when operating computer systems, are utilized in many processes, from carrying out computer system inspection and process optimization to providing customized user optimization. In this paper, we propose a MongoDB-based unstructured log processing system in a cloud environment for processing the massive amount of log data of banks. Most of the log data generated during banking operations come from handling a client's business. Therefore, in order to gather, store, categorize, and analyze the log data generated while processing the client's business, a separate log data processing system needs to be established. However, the realization of flexible storage expansion functions for processing a massive amount of unstructured log data and executing a considerable number of functions to categorize and analyze the stored unstructured log data is difficult in existing computer environments. Thus, in this study, we use cloud computing technology to realize a cloud-based log data processing system for processing unstructured log data that are difficult to process using the existing computing infrastructure's analysis tools and management system. The proposed system uses the IaaS (Infrastructure as a Service) cloud environment to provide a flexible expansion of computing resources and includes the ability to flexibly expand resources such as storage space and memory under conditions such as extended storage or rapid increase in log data. Moreover, to overcome the processing limits of the existing analysis tool when a real-time analysis of the aggregated unstructured log data is required, the proposed system includes a Hadoop-based analysis module for quick and reliable parallel-distributed processing of the massive amount of log data. Furthermore, because the HDFS (Hadoop Distributed File System) stores data by generating copies of the block units of the aggregated log data, the proposed system offers automatic restore functions for the system to continually operate after it recovers from a malfunction. Finally, by establishing a distributed database using the NoSQL-based Mongo DB, the proposed system provides methods of effectively processing unstructured log data. Relational databases such as the MySQL databases have complex schemas that are inappropriate for processing unstructured log data. Further, strict schemas like those of relational databases cannot expand nodes in the case wherein the stored data are distributed to various nodes when the amount of data rapidly increases. NoSQL does not provide the complex computations that relational databases may provide but can easily expand the database through node dispersion when the amount of data increases rapidly; it is a non-relational database with an appropriate structure for processing unstructured data. The data models of the NoSQL are usually classified as Key-Value, column-oriented, and document-oriented types. Of these, the representative document-oriented data model, MongoDB, which has a free schema structure, is used in the proposed system. MongoDB is introduced to the proposed system because it makes it easy to process unstructured log data through a flexible schema structure, facilitates flexible node expansion when the amount of data is rapidly increasing, and provides an Auto-Sharding function that automatically expands storage. The proposed system is composed of a log collector module, a log graph generator module, a MongoDB module, a Hadoop-based analysis module, and a MySQL module. When the log data generated over the entire client business process of each bank are sent to the cloud server, the log collector module collects and classifies data according to the type of log data and distributes it to the MongoDB module and the MySQL module. The log graph generator module generates the results of the log analysis of the MongoDB module, Hadoop-based analysis module, and the MySQL module per analysis time and type of the aggregated log data, and provides them to the user through a web interface. Log data that require a real-time log data analysis are stored in the MySQL module and provided real-time by the log graph generator module. The aggregated log data per unit time are stored in the MongoDB module and plotted in a graph according to the user's various analysis conditions. The aggregated log data in the MongoDB module are parallel-distributed and processed by the Hadoop-based analysis module. A comparative evaluation is carried out against a log data processing system that uses only MySQL for inserting log data and estimating query performance; this evaluation proves the proposed system's superiority. Moreover, an optimal chunk size is confirmed through the log data insert performance evaluation of MongoDB for various chunk sizes.