• Title/Summary/Keyword: multidimensional online analytical processing


Home Shopping Product Trend Analysis Method Based on On-Line Analytical Processing (OLAP) cube (온라인 분석 처리(OLAP) 기반 홈쇼핑 상품 트렌드 분석 방법)

  • Park, Hansaem;Kwon, Kyunglag;Kang, Daehyun;Lee, Jeungmin;Chung, In-Jeong
    • Proceedings of the Korea Information Processing Society Conference / 2014.11a / pp.1046-1049 / 2014
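  • The explosive growth of Web 2.0, the spread of smart devices, and the rise of mobile services have recently generated a vast and varied amount of information. In industry, data warehouses and On-Line Analytical Processing (OLAP) are increasingly used to process and analyze such large volumes of data. Decision makers in particular aim to find, among this mass of information, the information that helps their decisions, yet they still have considerable difficulty finding what they want. Accordingly, various studies on using large volumes of information effectively are under way, and systems that support sound decision making are growing rapidly in importance. In this paper, to support such decision making, we use OLAP to analyze the large volume of information generated by TV home shopping multidimensionally according to the classification purpose and, based on the analyzed information, analyze trends in the products sold on TV home shopping.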

Sort-Based Distributed Parallel Data Cube Computation Algorithm using MapReduce (맵리듀스를 이용한 정렬 기반의 데이터 큐브 분산 병렬 계산 알고리즘)

  • Lee, Suan;Kim, Jinho
    • Journal of the Institute of Electronics and Information Engineers / v.49 no.9 / pp.196-204 / 2012
  • Many applications now perform OLAP (On-Line Analytical Processing) over very large volumes of data, and the multidimensional data cube is regarded as a core tool in OLAP analysis. This paper focuses on how to compute data cubes efficiently in parallel using MapReduce, a popular parallel processing tool. We investigate efficient ways to implement PipeSort, a well-known data cube computation algorithm, on the MapReduce framework. PipeSort computes several (descendant) cuboids that share the same sort order at the same time, as a pipeline, by scanning one (ancestor) cuboid once. This paper proposes four ways of implementing the PipeSort pipeline on a MapReduce framework running across 20 servers. Our experiments show that the PipeMap-NoReduce algorithm outperforms the other algorithms for high-dimensional data, whereas Post-Pipe performs best for low-dimensional data.
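The pipeline idea described in this abstract can be illustrated with a minimal single-machine sketch. It is not the paper's MapReduce implementation: the function name `pipeline_cuboids`, the tuple layout, and the use of SUM as the aggregate are assumptions made for illustration. Given rows already sorted on (A, B, ...), one scan produces every cuboid whose grouping attributes are a prefix of that sort order.

```python
def pipeline_cuboids(sorted_rows, num_dims):
    """Aggregate all prefix cuboids in a single scan (hypothetical helper).

    sorted_rows: tuples (d1, ..., dn, measure), sorted by (d1, ..., dn).
    Returns a list where results[k] maps each length-k prefix to its SUM.
    results[0] holds the grand total under the empty key ().
    """
    results = [dict() for _ in range(num_dims + 1)]
    current = [None] * (num_dims + 1)  # open group key per prefix length
    totals = [0] * (num_dims + 1)      # running sum per prefix length
    started = False

    for row in sorted_rows:
        dims, measure = tuple(row[:num_dims]), row[num_dims]
        for k in range(num_dims + 1):
            key = dims[:k]
            # When the prefix changes, the previous group is complete:
            # emit it and start a new running sum.
            if started and key != current[k]:
                results[k][current[k]] = totals[k]
                totals[k] = 0
            current[k] = key
            totals[k] += measure
        started = True

    if started:
        # Flush the groups that were still open at the end of the scan.
        for k in range(num_dims + 1):
            results[k][current[k]] = totals[k]
    return results


# Illustrative data: (A, B, measure), sorted by (A, B).
rows = [("a", "x", 1), ("a", "x", 2), ("a", "y", 3), ("b", "x", 4)]
cuboids = pipeline_cuboids(rows, 2)
```

Here `cuboids[2]` is the AB cuboid, `cuboids[1]` the A cuboid, and `cuboids[0]` the grand total. Cuboids whose attributes are not a prefix of the sort order (B alone, for instance) need a different sort order and therefore a separate pipeline.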

Determining the number of Clusters in On-Line Document Clustering Algorithm (온라인 문서 군집화에서 군집 수 결정 방법)

  • Jee, Tae-Chang;Lee, Hyun-Jin;Lee, Yill-Byung
    • The KIPS Transactions:PartB / v.14B no.7 / pp.513-522 / 2007
  • Clustering divides given data into groups and automatically uncovers hidden meaning in the data. It analyzes data that are difficult for people to check in detail and forms clusters of data with similar characteristics. An on-line document clustering system, which groups similar documents using search engine results, aims to make information retrieval more convenient. Document clustering is performed automatically without human intervention, so the number of clusters, which affects the clustering result, should also be decided automatically. In addition, an on-line system must guarantee a fast response time. This paper proposes a method that determines the number of clusters automatically from geometrical information. The proposed method consists of two stages: in the first stage, cluster centers are projected onto a low-dimensional plane; in the second stage, clusters are combined according to the distances between their centers in that plane. Experiments with real data show that the method improves clustering performance and that its response time is suitable for on-line use.
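The second stage described above, combining clusters whose projected centers lie close together, might be sketched as follows. The abstract does not give the projection method or the merging criterion, so this sketch assumes the centers have already been projected to 2-D points and uses a fixed distance threshold with union-find. Both choices, and the name `merge_close_centers`, are illustrative assumptions rather than the authors' method.

```python
import math


def merge_close_centers(centers_2d, threshold):
    """Group cluster indices whose 2-D centers are within `threshold`.

    Assumption: centers_2d are already projected onto a plane.
    Merging is transitive (single-link), implemented with union-find.
    Returns a sorted list of index groups; its length is the cluster count.
    """
    n = len(centers_2d)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(centers_2d[i], centers_2d[j]) <= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Illustrative projected centers: two tight pairs and one isolated center.
centers = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0), (10.2, 10.1), (5.0, 5.0)]
merged = merge_close_centers(centers, threshold=1.0)
num_clusters = len(merged)
```

With these sample points the five initial clusters collapse to three. The pairwise loop is quadratic in the number of centers, which stays cheap because only cluster centers, not documents, are compared.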

An Efficient Query Transformation for Multidimensional Data Views on Relational Databases (관계형 데이타베이스에서 다차원 데이타의 뷰를 위한 효율적인 질의 변환)

  • Shin, Sung-Hyun;Kim, Jin-Ho;Moon, Yang-Sae
    • Journal of KIISE:Databases / v.34 no.1 / pp.18-34 / 2007
  • To support various business analysis methods, OLAP (On-Line Analytical Processing) systems represent their data with multidimensional structures. These multidimensional data are often delivered to users as horizontal tables, whose columns correspond to values of dimension attributes. Since horizontal tables may have a very large number of columns, they cannot be stored directly in relational database systems. Furthermore, the tables are likely to contain many null values (i.e., they are sparse). To manage horizontal tables efficiently, we can store them as vertical tables, which have the dimension attribute names as their columns and thus turn the columns of a horizontal table into rows. Every query on a horizontal table then has to be transformed into a query on the corresponding vertical table. This paper proposes a technique for transforming horizontal table queries into vertical table queries using not only traditional relational algebra operators but also the PIVOT operator provided by recent DBMS versions. To this end, we design a relational algebra expression equivalent to the PIVOT operator and formally prove the equivalence. We then develop a transformation technique for horizontal table queries using the PIVOT operator. We also perform experiments to analyze the performance of the proposed method, and the results show that it performs better than existing methods.
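As a rough illustration of the horizontal and vertical formats, the sketch below rebuilds a horizontal view from a vertical table using only grouping and conditional aggregation, a common way to emulate a pivot where no PIVOT operator exists (SQLite has none). This is not claimed to be the relational algebra expression derived in the paper. The table name `vertical`, its columns `id`, `attr`, `val`, and the helper `pivot_vertical` are assumptions made for the example.

```python
import sqlite3


def pivot_vertical(conn, attributes):
    """Return horizontal rows (id, attr1, attr2, ...) from table `vertical`.

    Assumes each (id, attr) pair occurs at most once, so MAX simply picks
    the single matching value; missing attributes come back as NULL (None).
    """
    for name in attributes:
        # Attribute names become column aliases, so accept identifiers only.
        if not name.isidentifier():
            raise ValueError(f"unsafe attribute name: {name!r}")
    cols = ", ".join(
        f"MAX(CASE WHEN attr = '{a}' THEN val END) AS {a}" for a in attributes
    )
    sql = f"SELECT id, {cols} FROM vertical GROUP BY id ORDER BY id"
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vertical (id INTEGER, attr TEXT, val INTEGER)")
conn.executemany(
    "INSERT INTO vertical VALUES (?, ?, ?)",
    [(1, "q1", 10), (1, "q2", 20), (2, "q1", 30)],
)
horizontal = pivot_vertical(conn, ["q1", "q2"])
```

Row 2 has no `q2` entry in the vertical table, so its `q2` column is null in the horizontal view, which shows why sparse horizontal tables are cheaper to store vertically.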

A PIVOT based Query Optimization Technique for Horizontal View Tables in Relational Databases (관계 데이터베이스에서 수평 뷰 테이블에 대한 PIVOT 기반의 질의 최적화 방법)

  • Shin, Sung-Hyun;Moon, Yang-Sae;Kim, Jin-Ho;Kang, Gong-Mi
    • The KIPS Transactions:PartD / v.14D no.2 / pp.157-168 / 2007
  • For effective analysis in various business applications, OLAP (On-Line Analytical Processing) systems represent multidimensional data as horizontal tables, whose columns correspond to values of dimension attributes. Because traditional RDBMSs limit the number of columns in a table (MS SQL Server and Oracle permit each table to have up to 1,024 columns), horizontal tables cannot be stored directly in relational database systems. In this paper, we propose efficient optimization strategies for transforming horizontal queries into equivalent vertical queries. To achieve this goal, we first store a horizontal table as an equivalent vertical table, and then develop query transformation rules for horizontal table queries using the PIVOT operator. In particular, we propose alternative transformation rules for the basic relational operators: selection, projection, and join. The transformed queries can be executed in several ways, and their execution times differ. We therefore propose optimization strategies for transforming horizontal queries into equivalent vertical queries when using the PIVOT operator. Finally, we evaluate these methods through extensive experiments and identify the optimal transformation strategy when using the PIVOT operator.
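The observation that one horizontal query admits several equivalent vertical executions can be sketched for a single selection. The example below compares two strategies for the horizontal query "ids whose column `attr` exceeds a threshold": filtering the vertical table directly, and pivoting first and filtering afterwards. The schema (`vertical` with `id`, `attr`, `val`), the function names, and the use of conditional aggregation in place of a PIVOT operator are assumptions for illustration, not the paper's transformation rules.

```python
import sqlite3


def make_vertical_table(rows):
    """Create an in-memory vertical table from (id, attr, val) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE vertical (id INTEGER, attr TEXT, val INTEGER)")
    conn.executemany("INSERT INTO vertical VALUES (?, ?, ?)", rows)
    return conn


def select_on_vertical(conn, attr, threshold):
    """Strategy 1: push the selection down to the vertical table."""
    sql = "SELECT id FROM vertical WHERE attr = ? AND val > ? ORDER BY id"
    return [row[0] for row in conn.execute(sql, (attr, threshold))]


def pivot_then_select(conn, attr, threshold):
    """Strategy 2: rebuild the horizontal column first, then filter it.

    Assumes each (id, attr) pair occurs at most once.
    """
    sql = (
        "SELECT id FROM ("
        "  SELECT id, MAX(CASE WHEN attr = ? THEN val END) AS v"
        "  FROM vertical GROUP BY id"
        ") WHERE v > ? ORDER BY id"
    )
    return [row[0] for row in conn.execute(sql, (attr, threshold))]


conn = make_vertical_table(
    [(1, "q1", 10), (1, "q2", 20), (2, "q1", 30), (3, "q2", 40)]
)
pushed_down = select_on_vertical(conn, "q1", 15)
pivot_first = pivot_then_select(conn, "q1", 15)
```

Both strategies return the same ids, but the first touches only the rows for one attribute while the second aggregates the whole table before filtering, which is the kind of cost difference an optimizer would weigh.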

A Web Services-based Client OLAP API and Its Application to Cube Browsing (웹 서비스 기반의 클라이언트 OLAP API와 큐브 브라우징에의 응용 사례)

  • Bae, Eun-Ju;Kim, Myung
    • The KIPS Transactions:PartD / v.10D no.1 / pp.143-152 / 2003
  • XML and Web Services are drawing a lot of attention as standard technologies for data exchange and integration among heterogeneous platforms. XML/A, which supports such technologies, is a SOAP-based XML API that facilitates data exchange between a client application and a data analysis engine over the Internet. Because the XML format is used for data exchange, XML/A is platform-independent. However, client application developers have to go through the tedious job of handling the same type of XML documents to download data from the server. An XML query language is also needed to extract data from the XML documents sent by the server. In this paper, we present a high-level client OLAP API, called XMLMD, that lets client application developers in the Windows environment easily use the OLAP services of XML/A. XMLMD consists of the properties and methods needed for OLAP application development. XMLMD is to XML/A what ADOMD is to OLEDB for OLAP. We also present a web OLAP cube browser developed using XMLMD. The browser displays data in various formats such as XML, HTML, Excel, and graphs.

A MOLAP Cube Storage Scheme for Fast Query Processing (고속 질의처리를 위한 MOLAP 큐브 저장구조)

  • Lim, Yoon-Sun;Yang, Hye-Yeong;Kim, Myung
    • Proceedings of the Korean Information Science Society Conference / 2001.04b / pp.127-129 / 2001
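  • OLAP refers to analyzing data-warehouse data multidimensionally and providing the results to users online; a system that stores the data in an array called a cube and accesses it by positional information is called a MOLAP system. Storing the cube in compressed chunks has already been proposed to reduce the amount of data read from disk during OLAP operations, but no study has yet been presented on reducing disk accesses by taking into account the data distribution of the cube and the relationship between chunk size and disk block size. In this work, we propose a scheme that clusters chunks by density and places adjacent chunks of the cube in the same disk block wherever possible, thereby speeding up major OLAP operations such as slice and dice. The efficiency of the proposed storage structure was demonstrated through experiments.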
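The notions of chunk and chunk density used above can be made concrete with a small sketch. It only shows how a cell's coordinates map to a chunk, how density per chunk can be measured, and how chunks might be split into dense and sparse groups. The abstract does not specify the clustering or block-packing policy, so the fixed threshold and the function names here are illustrative assumptions.

```python
from collections import Counter
from math import prod


def chunk_of(cell, chunk_shape):
    """Map a cell's coordinates to the coordinates of its chunk."""
    return tuple(c // s for c, s in zip(cell, chunk_shape))


def chunk_densities(cells, chunk_shape):
    """Fraction of occupied cells in each non-empty chunk."""
    counts = Counter(chunk_of(cell, chunk_shape) for cell in cells)
    capacity = prod(chunk_shape)  # number of cells a chunk can hold
    return {chunk: n / capacity for chunk, n in counts.items()}


def group_by_density(densities, threshold):
    """Split chunks into dense and sparse groups (assumed fixed threshold).

    Each group is sorted by chunk coordinates so that neighbouring chunks
    stay next to each other when they are written out.
    """
    dense = sorted(c for c, d in densities.items() if d >= threshold)
    sparse = sorted(c for c, d in densities.items() if d < threshold)
    return dense, sparse


# Illustrative 2-D cube: occupied cells given as (row, column) coordinates.
cells = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 3), (5, 5)]
densities = chunk_densities(cells, chunk_shape=(2, 2))
dense, sparse = group_by_density(densities, threshold=0.5)
```

In this sample, chunk (0, 0) is fully occupied while chunks (1, 1) and (2, 2) each hold one of four possible cells, so a storage layer could pack the two sparse chunks into a single disk block.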


H*-tree/H*-cubing: Improved Data Cube Structure and Cubing Method for OLAP on Data Stream (H*-tree/H*-cubing: 데이터 스트림의 OLAP를 위한 향상된 데이터 큐브 구조 및 큐빙 기법)

  • Chen, Xiangrui;Li, Yan;Lee, Dong-Wook;Kim, Gyoung-Bae;Bae, Hae-Young
    • The KIPS Transactions:PartD / v.16D no.4 / pp.475-486 / 2009
  • The data cube plays an important role in multi-dimensional, multi-level data analysis. To meet the on-line analysis requirements of data streams, several cube structures have been proposed for OLAP on data streams, such as stream cube, flowcube, and S-cube. Since it is costly to construct a data cube and execute ad-hoc OLAP queries, more research is needed on efficient data structures, query methods, and algorithms. Stream cube uses H-cubing to compute selected cuboids and stores the computed cells in an H-tree, which forms the cuboids along the popular path. However, the H-tree layout is disorderly, and the H-cubing method relies too heavily on the popular path. In this paper, we first propose the $H^*$-tree, an improved data structure that makes retrieval in the tree structure more efficient. Second, we propose an improved cubing method, $H^*$-cubing, for computing the cuboids that cannot be retrieved along the popular path when an ad-hoc OLAP query is executed. The $H^*$-tree construction and $H^*$-cubing algorithms are given. A performance study shows that, during the construction step, the $H^*$-tree outperforms the H-tree with a more desirable trade-off between time and memory usage, and that $H^*$-cubing is better adapted to ad-hoc OLAP queries with respect to factors such as time and memory space.

Development and Application of a Big Data Platform for Education Longitudinal Study Analysis (교육종단연구 분석을 위한 빅데이터 플랫폼 개발 및 적용)

  • Park, Jung;Cho, Wan-Sup
    • The Journal of Bigdata / v.5 no.1 / pp.11-27 / 2020
  • In this paper, we developed a big data platform to store, process, and analyze education longitudinal study data effectively, and applied it to the Seoul Education Longitudinal Study (SELS) to confirm its usefulness. The platform consists of a data preprocessing unit and a data analysis unit. The data preprocessing unit performs 1) masking, 2) conversion of each item into a factor, 3) normalization and creation of dummy variables, 4) data derivation, and 5) data warehousing. The data analysis unit consists of OLAP and data mining (DM). In the multidimensional analysis, OLAP is performed after selecting a measure and designing a schema. The DM process involves variable selection, research model selection, data modification, parameter tuning, model training, model evaluation, and interpretation of the results. The data warehouse created by the preprocessing on this platform can be shared by various researchers, and the continuous accumulation of data sets makes further analysis easier for subsequent researchers. In addition, policy-makers can access the SELS data warehouse directly and analyze it online through multi-dimensional analysis, enabling scientific decision making. To demonstrate the usefulness of the platform, SELS data were loaded onto it, OLAP and DM were performed with mathematics academic achievement selected as the measure, and various factors affecting the measure were analyzed using DM techniques. This enabled us to derive implications for data-based education policies quickly and effectively.