• Title/Summary/Keyword: 다차원데이타 (multidimensional data)

A Z-Index based MOLAP Cube Storage Scheme (Z-인덱스 기반 MOLAP 큐브 저장 구조)

  • Kim, Myung;Lim, Yoon-Sun
    • Journal of KIISE: Databases / v.29 no.4 / pp.262-273 / 2002
  • MOLAP is a technology that accelerates multidimensional data analysis by storing data in a multidimensional array and accessing them by their position information. Depending on how a multidimensional array is mapped onto disk, the performance of MOLAP operations such as slice and dice varies significantly. [1] proposed a MOLAP cube storage scheme that divides a cube into small chunks with equal side lengths, compresses sparse chunks, and stores the chunks in row-major order of their chunk indexes. This type of cube storage scheme treats all dimensions of the input data fairly. We develop a variant of this cube storage scheme that places the chunks in a different order. Our scheme accelerates slice and dice operations by aligning chunks to physical disk block boundaries and clustering neighboring chunks, using Z-indexing for chunk clustering. We evaluate the efficiency of the proposed scheme through experiments and show that it is efficient for 3- to 5-dimensional cubes, which are frequently used to analyze business data.
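Z-indexing (Morton order) clusters chunks by interleaving the bits of their chunk coordinates, so chunks that are neighbors in the cube tend to be neighbors in the storage order. The sketch below is a hedged illustration, not the authors' implementation: it assumes non-negative integer chunk indexes and a fixed bit width per dimension, and the helper names `z_index`, `row_major`, and `z_order` are hypothetical. It contrasts Z-order with the row-major chunk placement mentioned above.

```python
from itertools import product
from typing import List, Sequence, Tuple


def z_index(coords: Sequence[int], bits: int) -> int:
    """Interleave the bits of the chunk coordinates into one Morton (Z-order) key.

    Bit b of dimension d lands at key position b * ndim + d, so the key
    alternates between dimensions from the least significant bit upward.
    """
    ndim = len(coords)
    key = 0
    for b in range(bits):
        for d, c in enumerate(coords):
            key |= ((c >> b) & 1) << (b * ndim + d)
    return key


def row_major(coords: Sequence[int], shape: Sequence[int]) -> int:
    """Row-major position of a chunk (the placement order of the original scheme)."""
    pos = 0
    for c, n in zip(coords, shape):
        pos = pos * n + c
    return pos


def z_order(shape: Sequence[int], bits: int) -> List[Tuple[int, ...]]:
    """All chunk coordinates of a cube with `shape` chunks per dimension, in Z-order."""
    chunks = product(*(range(n) for n in shape))
    return sorted(chunks, key=lambda c: z_index(c, bits))


if __name__ == "__main__":
    # On a 4 x 4 grid of chunks, Z-order finishes one 2 x 2 block before the next.
    first_block = z_order((4, 4), bits=2)[:4]
    print(first_block)                                  # [(0, 0), (1, 0), (0, 1), (1, 1)]
    print([row_major(c, (4, 4)) for c in first_block])  # [0, 4, 1, 5]
```

In this sketch the four chunks of a 2 x 2 neighborhood are contiguous in Z-order but scattered across row-major positions 0, 1, 4, and 5; choosing a bit width large enough for the largest chunk index is left to the caller.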

An Improved Algorithm for Building Multi-dimensional Histograms with Overlapped Buckets (중첩된 버킷을 사용하는 다차원 히스토그램에 대한 개선된 알고리즘)

  • 문진영;심규석
    • Journal of KIISE: Databases / v.30 no.3 / pp.336-349 / 2003
  • Histograms have recently attracted considerable attention. They are commonly used in commercial database systems to capture attribute value distributions for query optimization, and with the advent of research on approximate query answering and stream data, interest in histograms has spread widely. The simplest approach assumes, under the AVI (Attribute Value Independence) assumption, that the attributes in relational tables are independent. However, this assumption generally does not hold for real-life datasets. To alleviate the problem of approximating multi-dimensional data with multiple one-dimensional histograms, several techniques such as wavelets, random sampling, and multi-dimensional histograms have been proposed. Among them, GENHIST is a multi-dimensional histogram designed to approximate the distribution of data with real-valued attributes. It uses overlapping buckets, which allow a more efficient approximation of the data distribution. In this paper, we propose OPT, a scheme that determines the optimal frequencies of overlapping buckets that minimize the SSE (Sum of Squared Errors). A histogram with overlapping buckets is first generated by GENHIST, and OPT then improves the histogram by calculating the optimal frequency for each bucket. Our experimental results confirm that our technique significantly improves the accuracy of histograms generated by GENHIST.
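Once the bucket boundaries are fixed, and assuming each bucket spreads its frequency uniformly over the cells it covers, the estimated count in every cell is a linear function of the bucket frequencies, so choosing the frequencies that minimize the SSE is an ordinary linear least-squares problem. The sketch below illustrates that idea on a 1-D grid of cells using the normal equations. The cell grid, the uniform-spread assumption, and the names `design_matrix`, `solve`, `optimal_frequencies`, and `sse` are hypothetical; this is not presented as the paper's OPT algorithm.

```python
from typing import List, Sequence, Tuple

Bucket = Tuple[int, int]   # half-open range of cell indexes [start, end)


def design_matrix(num_cells: int, buckets: Sequence[Bucket]) -> List[List[float]]:
    """A[c][b] = share of bucket b's frequency that falls into cell c (uniform spread)."""
    A = [[0.0] * len(buckets) for _ in range(num_cells)]
    for b, (start, end) in enumerate(buckets):
        for c in range(start, end):
            A[c][b] = 1.0 / (end - start)
    return A


def solve(M: List[List[float]], v: List[float]) -> List[float]:
    """Solve the square system M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def optimal_frequencies(counts: Sequence[float], buckets: Sequence[Bucket]) -> List[float]:
    """Frequencies minimizing SSE = sum over cells of (estimate - true count)^2.

    Solves (A^T A) f = A^T y; assumes A^T A is non-singular, e.g. no two
    buckets cover exactly the same cells.
    """
    A = design_matrix(len(counts), buckets)
    cells, nb = range(len(counts)), len(buckets)
    AtA = [[sum(A[c][i] * A[c][j] for c in cells) for j in range(nb)] for i in range(nb)]
    Aty = [sum(A[c][i] * counts[c] for c in cells) for i in range(nb)]
    return solve(AtA, Aty)


def sse(counts: Sequence[float], buckets: Sequence[Bucket], freqs: Sequence[float]) -> float:
    """Sum of squared errors of the histogram estimate against the true counts."""
    A = design_matrix(len(counts), buckets)
    estimate = [sum(a * f for a, f in zip(row, freqs)) for row in A]
    return sum((e - y) ** 2 for e, y in zip(estimate, counts))


if __name__ == "__main__":
    counts = [1, 3, 3, 1]          # true per-cell counts
    buckets = [(0, 4), (1, 3)]     # an outer bucket overlapping an inner one
    f = optimal_frequencies(counts, buckets)
    print(f, sse(counts, buckets, f))
```

With the outer and inner buckets above, the solver assigns a frequency of 4 to each bucket, which reproduces the true counts exactly; real data generally leaves a non-zero residual.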

DEhBT:A Multidimensional Data Partitioning Scheme using hB-tree (DEhBT: hB-tree를 이용한 다차원 데이타 분할 기법)

  • Kim, Dong-Yeon;O, Yeong-Bae;Choe, Dong-Hun;Han, Sang-Yeong;Lee, Sang-Gu
    • Journal of KIISE: Software and Applications / v.26 no.1 / pp.16-24 / 1999
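  • In this paper, we propose a new multidimensional data partitioning scheme to improve the performance of data warehouses that use a parallel DBMS. A data warehouse is a very large database that stores a large volume of data, and most of its workload consists of multidimensional range queries for obtaining analytical information. Single-dimension partitioning schemes have difficulty processing multidimensional queries efficiently, and existing multidimensional partitioning schemes have difficulty guaranteeing balanced partitions for data with arbitrary, unknown distributions. We propose a multidimensional partitioning scheme that guarantees balanced partitioning by using the hB-tree structure, and we present simulation results measuring its performance. In the simulation, the hB-tree partitioning scheme produces balanced partitions not only for uniformly distributed data sets but also for non-uniformly distributed ones.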
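The goal stated above, balanced partitions regardless of the data distribution, can be illustrated with a simpler kd-tree-style recursive median split. This is a swapped-in technique for illustration only, not the hB-tree scheme proposed in the paper. The sketch assumes that the number of partitions is a power of two and that points are equal-length tuples; the name `median_partition` is hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def median_partition(points: Sequence[Point], num_parts: int, depth: int = 0) -> List[List[Point]]:
    """Split `points` into `num_parts` groups whose sizes differ by at most one.

    Cycles through the dimensions and splits at the median of the current one,
    so sizes stay balanced even for skewed data. num_parts must be a power of two.
    """
    if num_parts == 1:
        return [list(points)]
    dim = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[dim])
    mid = len(ordered) // 2
    half = num_parts // 2
    return (median_partition(ordered[:mid], half, depth + 1)
            + median_partition(ordered[mid:], half, depth + 1))


if __name__ == "__main__":
    random.seed(0)
    # Heavily skewed 2-D data: most points crowd near the origin.
    skewed = [(random.expovariate(5.0), random.expovariate(5.0)) for _ in range(1000)]
    parts = median_partition(skewed, 8)
    print([len(p) for p in parts])   # [125, 125, 125, 125, 125, 125, 125, 125]
```

Median splits balance partition sizes by construction; unlike an index-based scheme such as the hB-tree, however, this sketch must re-sort the data and offers no incremental insertion.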

Continuous Query Processing in Data Streams Using Duality of Data and Queries (데이타와 질의의 이원성을 이용한 데이타스트림에서의 연속질의 처리)

  • Lim Hyo-Sang;Lee Jae-Gil;Lee Min-Jae;Whang Kyu-Young
    • Journal of KIISE: Databases / v.33 no.3 / pp.310-326 / 2006
  • In this paper, we present a method for efficiently processing continuous queries in a data stream environment. We classify previous query processing methods into two dual categories, data-initiative and query-initiative, depending on whether query processing is initiated by selecting a data element or a query. This classification stems from the fact that data and queries have been treated asymmetrically. Only data-initiative methods have traditionally been employed for processing continuous queries, so the performance gain obtainable from query-initiative methods has been overlooked. To solve this problem, we build on the observation that data and queries can be treated symmetrically. We propose a duality model of data and queries and, based on this model, present a new perspective that transforms the continuous query processing problem into a multi-dimensional spatial join problem. We also present a continuous query processing algorithm based on spatial join, named Spatial Join CQ. Spatial Join CQ processes continuous queries by finding the pairs of overlapping regions from a set of data elements and a set of queries, both defined as regions in the multi-dimensional space. By using the spatial join, which is a symmetric operation, the algorithm achieves the effects of both dual methods. Experimental results show that the proposed algorithm outperforms earlier methods by up to 36 times for simple selection continuous queries and by up to 7 times for sliding window join continuous queries.
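The core step of Spatial Join CQ, as described above, is finding overlapping pairs between data elements and queries that are both modeled as regions in a multi-dimensional space. The sketch below finds those pairs with a generic plane-sweep rectangle join along the x-axis; it is a textbook spatial-join technique used for illustration, not the paper's algorithm. The rectangle representation and the names `overlaps` and `sweep_join` are hypothetical; a data element is modeled as a degenerate rectangle (a point) and a selection query as a range rectangle.

```python
from typing import List, Tuple

# A rectangle is (xlo, ylo, xhi, yhi); a point has xlo == xhi and ylo == yhi.
Rect = Tuple[float, float, float, float]


def overlaps(a: Rect, b: Rect) -> bool:
    """True if two closed rectangles intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def sweep_join(data: List[Rect], queries: List[Rect]) -> List[Tuple[int, int]]:
    """Return sorted (data_index, query_index) pairs whose rectangles overlap.

    Rectangles of both sets are visited in order of their lower x bound. Each one
    is compared only with still-active rectangles of the other set, i.e. those
    whose x-extent reaches the sweep line, so every pair is reported once.
    """
    events = sorted([(r[0], 0, i) for i, r in enumerate(data)]
                    + [(r[0], 1, j) for j, r in enumerate(queries)])
    rects = (data, queries)
    active: Tuple[List[int], List[int]] = ([], [])   # active data / query indexes
    result = []
    for x, side, idx in events:
        other = 1 - side
        # Drop rectangles of the other set that end before the sweep line.
        active[other][:] = [k for k in active[other] if rects[other][k][2] >= x]
        for k in active[other]:
            if overlaps(rects[side][idx], rects[other][k]):
                result.append((idx, k) if side == 0 else (k, idx))
        active[side].append(idx)
    return sorted(result)


if __name__ == "__main__":
    data = [(1, 1, 1, 1), (5, 5, 5, 5), (9, 2, 9, 2)]   # data elements as points
    queries = [(0, 0, 6, 6), (8, 0, 10, 3)]             # range queries as rectangles
    print(sweep_join(data, queries))                    # [(0, 0), (1, 0), (2, 1)]
```

Because the join treats both inputs the same way, swapping the roles of `data` and `queries` yields the same pairs with their indexes exchanged, which is the symmetry the abstract relies on.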

Optimal Configurations of Multidimensional Path Indexes for the Efficient Execution of Object-Oriented Queries (객체지향 질의의 효율적 처리를 위한 다차원 경로 색인구조의 최적 구성방법)

  • Lee, Jong-Hak
    • Journal of Korea Multimedia Society / v.7 no.7 / pp.859-876 / 2004
  • This paper presents optimal configurations of multidimensional path indexes (MPIs) for the efficient execution of object-oriented queries in object databases. An MPI uses a multidimensional index structure to efficiently support nested predicates that involve both nested attributes and class hierarchies, which the nested attribute index, built on a one-dimensional index structure such as the $B^+$-tree, does not support. We analyze MPIs in the framework of complex queries containing conjunctions of nested predicates, each involving a path expression with substitution of target classes and domain classes. First, we consider the MPI operations caused by updates to object databases and the use of an MPI for a query containing a single nested predicate. We then consider the use of MPIs in the framework of more general queries containing nested predicates over both overlapping and non-overlapping paths. Overlapping paths share common subpaths, whereas non-overlapping paths have none.

An Efficient Query Transformation for Multidimensional Data Views on Relational Databases (관계형 데이타베이스에서 다차원 데이타의 뷰를 위한 효율적인 질의 변환)

  • Shin, Sung-Hyun;Kim, Jin-Ho;Moon, Yang-Sae
    • Journal of KIISE: Databases / v.34 no.1 / pp.18-34 / 2007
  • To support various business analysis methods, OLAP (On-Line Analytical Processing) systems represent their data with multidimensional structures. These multidimensional data are often delivered to users as horizontal tables whose columns correspond to values of dimension attributes. Because horizontal tables may have a large number of columns, they cannot be stored directly in relational database systems. Furthermore, such tables are likely to contain many null values (i.e., they are sparse tables). To manage horizontal tables efficiently, we can store them as vertical tables, which use the dimension attribute names as their columns and thus turn the columns of horizontal tables into rows. In this case, every query on a horizontal table must be transformed into a query on the corresponding vertical table. This paper proposes a technique for transforming horizontal table queries into vertical table queries using not only traditional relational algebraic operators but also the PIVOT operator provided by recent DBMS versions. To this end, we design a relational algebraic expression equivalent to the PIVOT operator and formally prove their equivalence. We then develop a transformation technique for horizontal table queries using the PIVOT operator. We also conduct experiments to analyze the performance of the proposed method, and the results show that it outperforms existing methods.
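The relationship between the two formats can be made concrete with a small example: a vertical table keeps one (row id, dimension value, measure) row per non-null cell of the horizontal table, and PIVOT turns those rows back into horizontal rows. The plain-Python sketch below uses hypothetical names (`to_vertical`, `pivot`). It expresses PIVOT with conventional operators, one selection per pivoted column combined by a left outer join on the row id, but it is not the paper's formal expression or any DBMS's PIVOT implementation.

```python
from typing import Dict, List, Optional, Tuple

Horizontal = Dict[str, Dict[str, Optional[str]]]   # row id -> {column -> cell or None}
VerticalRow = Tuple[str, str, str]                 # (row id, dimension value, measure)


def to_vertical(rows: Horizontal) -> List[VerticalRow]:
    """UNPIVOT: keep one row per non-null cell, so sparse rows cost little space."""
    return [(rid, key, val)
            for rid, cols in rows.items()
            for key, val in cols.items()
            if val is not None]


def pivot(vertical: List[VerticalRow], columns: List[str]) -> Horizontal:
    """PIVOT expressed with conventional operators.

    For each output column, select the vertical rows whose dimension value
    equals it, then left-outer-join the selections on the row id; cells with
    no matching vertical row become None (null).
    """
    row_ids = sorted({rid for rid, _, _ in vertical})
    result: Horizontal = {rid: {} for rid in row_ids}
    for col in columns:
        selected = {rid: val for rid, key, val in vertical if key == col}  # selection
        for rid in row_ids:                                                # outer join on rid
            result[rid][col] = selected.get(rid)
    return result


if __name__ == "__main__":
    horizontal = {
        "r1": {"2006": "10", "2007": None},
        "r2": {"2006": None, "2007": "7"},
    }
    vertical = to_vertical(horizontal)
    print(vertical)                                        # [('r1', '2006', '10'), ('r2', '2007', '7')]
    print(pivot(vertical, ["2006", "2007"]) == horizontal)  # True
```

One caveat of this simplified sketch: a horizontal row that is entirely null leaves no vertical rows, so it does not reappear after pivoting unless the row ids are kept separately.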

An Efficient Method for Finding K Nearest Pairs in Spatial Databases (공간 데이타베이스에서 최근접 K쌍을 찾는 효율적 기법)

  • Shin, Hyo-Seop;Lee, Suk-Ho
    • Journal of KIISE: Databases / v.27 no.2 / pp.238-246 / 2000
  • The distance join, introduced in earlier work, incrementally finds nearest pairs in order of distance between two spatial data sets indexed by multidimensional indexes such as R-trees. We propose efficient K-distance joins for the case in which the number K of pairs to find is preset. In particular, we develop a distance join algorithm with bi-directional expansion and optimized plane sweeping that selects the sweep axis and direction. Experiments on real spatial data sets show that the proposed algorithm performs much better than previous algorithms.
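When K is known in advance, a K-distance join can keep only the K best pairs found so far and use the current K-th distance to prune candidates. The sketch below shows this with a simple plane sweep along the x-axis over two in-memory point sets. It omits the R-tree traversal, bi-directional expansion, and sweep-axis/direction selection described in the abstract, and the function name `k_nearest_pairs` is hypothetical.

```python
import heapq
import math
from typing import List, Tuple

Point = Tuple[float, float]


def k_nearest_pairs(a: List[Point], b: List[Point], k: int) -> List[Tuple[float, int, int]]:
    """Return the k closest (distance, index_in_a, index_in_b) pairs, nearest first.

    Points of both sets are merged and swept along the x-axis. A max-heap of
    negated distances holds the k best pairs found so far; once the x-gap to an
    earlier point exceeds the current k-th distance, every point further back
    is even farther away in x and can be skipped.
    """
    if k <= 0:
        return []
    events = sorted([(p[0], 0, i) for i, p in enumerate(a)] +
                    [(q[0], 1, j) for j, q in enumerate(b)])
    heap: List[Tuple[float, int, int]] = []          # entries are (-distance, i, j)
    for pos in range(len(events)):
        x, side, idx = events[pos]
        for back in range(pos - 1, -1, -1):          # scan already-swept points
            x2, side2, idx2 = events[back]
            if len(heap) == k and x - x2 > -heap[0][0]:
                break                                # prune by x-distance alone
            if side2 == side:
                continue                             # pair a-points only with b-points
            i, j = (idx, idx2) if side == 0 else (idx2, idx)
            d = math.dist(a[i], b[j])
            if len(heap) < k:
                heapq.heappush(heap, (-d, i, j))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, i, j))  # evict the current k-th pair
    return sorted((-neg_d, i, j) for neg_d, i, j in heap)


if __name__ == "__main__":
    a = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
    b = [(1.0, 0.0), (6.0, 5.0), (20.0, 20.0)]
    print(k_nearest_pairs(a, b, 2))   # [(1.0, 0, 0), (1.0, 1, 1)]
```

Presetting K is what makes the pruning possible: the heap's largest distance is a shrinking bound, whereas an incremental distance join without a known K cannot discard candidates this way.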

A Type Hierarchy Index for XML Databases with XML Schema (XML Schema에 의한 XML 데이타베이스의 타입 상속 색인구조)

  • Lim Yun-Ju;Lee Jong-Hak
    • Proceedings of the Korea Information Processing Society Conference / 2004.11a / pp.85-88 / 2004
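  • With the growth of the Web, XML databases have recently contributed greatly to broad resource sharing over the Internet, and such resource sharing uses XML Schema, which has a type inheritance structure, as the structural definition of XML databases. Efficient indexing techniques for XML databases that conform to XML Schema are therefore needed. In this paper, we propose 2D-THI, which uses existing multidimensional index structures together with information on user query patterns analyzed in advance to construct an optimal two-dimensional type index structure that minimizes the average number of index pages accessed by the given queries. To evaluate the performance of 2D-THI, we apply to XML databases the CH-index and the CG-tree, which are widely used as class inheritance index structures in object-oriented databases, and compare them with 2D-THI through a cost model. The results show that 2D-THI can construct an optimal index structure for a variety of query patterns.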
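A common way to make a type hierarchy searchable with a multidimensional index is to number types in preorder, so that every type and all of its subtypes occupy one contiguous interval of IDs; a query on a type and its subtypes with a value predicate then becomes a two-dimensional range query over (type ID, value). The sketch below illustrates that encoding with a brute-force range filter standing in for a real multidimensional index. The example hierarchy, the names `preorder_ranges` and `range_query`, and the encoding itself are illustrative assumptions, not the 2D-THI construction described above.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, int, str]   # (type name, indexed value, payload)


def preorder_ranges(children: Dict[str, List[str]], root: str) -> Dict[str, Tuple[int, int]]:
    """Assign each type the interval [first, last] of preorder IDs covering its subtree."""
    ranges: Dict[str, Tuple[int, int]] = {}
    counter = 0

    def visit(t: str) -> None:
        nonlocal counter
        first = counter
        counter += 1
        for c in children.get(t, []):
            visit(c)
        ranges[t] = (first, counter - 1)

    visit(root)
    return ranges


def range_query(records: List[Record], ranges: Dict[str, Tuple[int, int]],
                type_name: str, lo: int, hi: int) -> List[Record]:
    """Records of `type_name` or any of its subtypes whose value lies in [lo, hi].

    The test is a 2-D range predicate on (type ID, value); a multidimensional
    index would answer it without scanning every record.
    """
    t_lo, t_hi = ranges[type_name]
    return [r for r in records
            if t_lo <= ranges[r[0]][0] <= t_hi and lo <= r[1] <= hi]


if __name__ == "__main__":
    hierarchy = {"Publication": ["Book", "Article"],
                 "Article": ["JournalArticle", "ConferencePaper"]}
    ranges = preorder_ranges(hierarchy, "Publication")
    records = [("Book", 2001, "b1"), ("JournalArticle", 2004, "j1"),
               ("ConferencePaper", 2004, "c1"), ("Article", 1999, "a1")]
    print(range_query(records, ranges, "Article", 2000, 2005))
```

Here the type `Article` covers preorder IDs 2 to 4, so the query returns `j1` and `c1` while `a1` fails the value range; which linear order of types minimizes page accesses for a given query workload is exactly the kind of choice an optimizing index design would make.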

A Sliding Window-based Multivariate Stream Data Classification (슬라이딩 윈도우 기반 다변량 스트림 데이타 분류 기법)

  • Seo, Sung-Bo;Kang, Jae-Woo;Nam, Kwang-Woo;Ryu, Keun-Ho
    • Journal of KIISE: Databases / v.33 no.2 / pp.163-174 / 2006
  • In distributed wireless sensor networks, transmitting and analyzing the entire data stream is difficult because of limited network capacity, power, and processing resources. It is therefore more appropriate to classify the continuous stream data and then process the classified data in place of the raw stream. We propose a classification framework for continuous multivariate stream data. The proposed approach works in two steps. In the preprocessing step, it takes a sliding window of multivariate stream data as input and discretizes the data in the window into a string of symbols that characterize the signal changes. In the classification step, it uses a standard text classification algorithm to classify the discretized data in the window. We evaluated both supervised and unsupervised classification algorithms: for supervised classification, we tested a Bayesian classifier and SVM, and for unsupervised classification, we tested Jaccard, TFIDF, Jaro, and Jaro-Winkler. In our experiments, SVM and TFIDF outperformed the other classification methods. In particular, we observed that classification accuracy improves when the correlation between attributes is considered along with the n-gram tokens of symbols.
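The preprocessing step described above can be sketched for a single attribute: slide a window over the stream, map each change between consecutive readings to a symbol, and emit n-gram tokens that a text classifier or a set-based measure such as Jaccard similarity can consume. The three-symbol alphabet (`U` for a rise, `D` for a fall, `S` for steady), the tolerance `eps`, and the function names are illustrative assumptions, not the paper's discretization; the multivariate case and the attribute-correlation features are not covered.

```python
from typing import Iterator, Sequence, Set


def sliding_windows(stream: Sequence[float], size: int, step: int = 1) -> Iterator[Sequence[float]]:
    """Yield consecutive windows of `size` readings, advancing by `step`."""
    for start in range(0, len(stream) - size + 1, step):
        yield stream[start:start + size]


def discretize(window: Sequence[float], eps: float = 0.1) -> str:
    """Encode each change between neighboring readings as U (rise), D (fall), or S (steady)."""
    symbols = []
    for prev, cur in zip(window, window[1:]):
        delta = cur - prev
        symbols.append("U" if delta > eps else "D" if delta < -eps else "S")
    return "".join(symbols)


def ngrams(symbols: str, n: int = 2) -> Set[str]:
    """Set of n-gram tokens over the symbol string."""
    return {symbols[i:i + n] for i in range(len(symbols) - n + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two token sets (1.0 when both are empty)."""
    return len(a & b) / len(a | b) if a | b else 1.0


if __name__ == "__main__":
    stream = [1.0, 2.0, 2.05, 1.0, 1.0, 2.0]
    tokens = [ngrams(discretize(w)) for w in sliding_windows(stream, size=4, step=2)]
    print(tokens, jaccard(tokens[0], tokens[1]))
```

Turning each window into a bag of symbol n-grams is what lets standard text classifiers and string-similarity measures be reused unchanged in the classification step.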