• Title/Summary/Keyword: Large data


Dense Sub-Cube Extraction Algorithm for a Multidimensional Large Sparse Data Cube (다차원 대용량 저밀도 데이타 큐브에 대한 고밀도 서브 큐브 추출 알고리즘)

  • Lee Seok-Lyong;Chun Seok-Ju;Chung Chin-Wan
    • Journal of KIISE:Databases / v.33 no.4 / pp.353-362 / 2006
  • A data warehouse is a data repository that enables users to store large volumes of data and analyze them effectively. In this research, we investigate an algorithm for building a multidimensional data cube, a powerful analysis tool for the contents of data warehouses and databases. Because such a cube is sparse, retrieval from it incurs an unavoidable overhead. We propose a dense sub-cube extraction algorithm that identifies dense regions in a large sparse data cube and constructs sub-cubes from the dense regions found. Retrieving these small dense sub-cubes instead of scanning the large sparse cube reduces the retrieval overhead remarkably. The algorithm uses bitmap- and histogram-based techniques to extract dense sub-cubes from the data cube, and its effectiveness is demonstrated experimentally.
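The abstract gives no implementation details, so the following is only a minimal sketch of what a histogram-based dense-region search could look like; it is not the authors' algorithm, and the bitmap side is not reproduced. The function names (`dense_ranges`, `extract_dense_subcubes`) and the thresholds `min_count` and `min_density` are assumptions made for illustration. The idea is to build one occupancy histogram per dimension, take maximal runs of well-populated indices, and keep those combinations of runs whose cell density is high enough.

```python
from itertools import product


def dense_ranges(hist, min_count):
    """Return maximal runs [start, end) of consecutive bins with count >= min_count."""
    runs, start = [], None
    for i, count in enumerate(hist):
        if count >= min_count and start is None:
            start = i                      # a dense run begins here
        elif count < min_count and start is not None:
            runs.append((start, i))        # the run ended just before i
            start = None
    if start is not None:
        runs.append((start, len(hist)))
    return runs


def extract_dense_subcubes(cells, shape, min_count, min_density):
    """Find candidate dense sub-cubes in a sparse cube.

    cells: set of occupied coordinate tuples (hypothetical input format).
    shape: size of the cube along each dimension.
    Returns a list of (box, density), where box holds one [lo, hi) range per dimension.
    """
    # One histogram per dimension: number of occupied cells at each index.
    hists = [[0] * size for size in shape]
    for cell in cells:
        for d, idx in enumerate(cell):
            hists[d][idx] += 1

    runs_per_dim = [dense_ranges(h, min_count) for h in hists]

    subcubes = []
    for box in product(*runs_per_dim):     # every combination of dense runs
        volume = 1
        for lo, hi in box:
            volume *= hi - lo
        filled = sum(
            all(lo <= c[d] < hi for d, (lo, hi) in enumerate(box)) for c in cells
        )
        density = filled / volume
        if density >= min_density:
            subcubes.append((box, density))
    return subcubes
```

A query that falls inside one of the returned boxes could then be answered from that small sub-cube alone, which is the retrieval saving the abstract describes.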

The Alignment of Measuring Data using the Pattern Matching Method (패턴매칭을 이용한 형상측정 데이터의 결합)

  • 조택동;이호영
    • Proceedings of the Korean Society of Precision Engineering Conference / 2000.11a / pp.307-310 / 2000
  • This paper discusses a method for measuring large objects using pattern matching. Obtaining complete 3D data is hard and expensive when the object is large or exceeds the range of the measuring device. The large object is divided into several smaller areas, each of which is scanned separately to collect data for all the pieces. These data are then aligned into complete 3D data using pattern matching; a point pattern matching method and a transform matrix algorithm are used for the alignment. A laser slit beam and a CCD camera are used for the experimental measurement. The algorithm is implemented in Visual C++ on Windows 98.


An Empirical Study on the Construction Strategy of Web-caching Network (효과적인 웹-캐싱 네트웍 구축전략에 관한 실증 연구)

  • 이주헌;조병룡
    • The Journal of Information Technology and Database / v.8 no.2 / pp.41-60 / 2001
  • Despite the growth in Internet users, the demand for large multimedia data files, and the resulting explosive growth in data traffic, investment in the Middle-Mile, the interconnection of various networks, has been lacking, and the resulting bottleneck is worsening. One strategy for overcoming this network bottleneck is the Content Delivery Network (CDN). A CDN achieves efficient delivery of large files not by physically improving or increasing network capacity, but by delivering the large file contents that cause the bottlenecks from distributed servers. Since it is impracticable to physically improve network capacity to accommodate the growth in Internet traffic, a CDN, by storing CPs' contents at cache servers deployed in major ISPs' networks, is able to deliver requested contents to the requesting Web clients without data loss or long latency.


The Merging Method of Point Data with Point Pattern Matching in 3D Measurement (3차원 형상측정에서 점 패턴매칭을 이용한 점 데이터의 결합방법)

  • 조택동;이호영;양상민
    • Journal of Institute of Control, Robotics and Systems / v.9 no.9 / pp.714-719 / 2003
  • We propose a method for measuring large objects using pattern matching. Obtaining complete 3D data is hard and expensive when the object is large and exceeds the range of the measuring device. The large object is divided into several smaller areas, each of which is scanned separately to collect data for all the pieces. These data are then aligned into complete 3D data using pattern matching methods such as point pattern matching and a transform matrix algorithm. A laser slit beam and a CCD camera are used for the experimental measurement. The algorithm is implemented in Visual C++ on Windows 98.
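The alignment step can be illustrated with a small sketch. The abstract does not describe the authors' matching or transform procedure, so this assumes the point correspondences between two overlapping scans are already known and, to stay short, works in 2D rather than 3D. It estimates the least-squares rigid transform (rotation plus translation) as a 3x3 homogeneous matrix; the function names are hypothetical.

```python
import math


def estimate_rigid_transform_2d(src, dst):
    """Least-squares rotation + translation mapping matched points src[i] -> dst[i].

    src, dst: equal-length lists of (x, y) pairs (correspondences assumed known).
    Returns a 3x3 homogeneous transform matrix as nested lists.
    """
    n = len(src)
    cx_s = sum(p[0] for p in src) / n
    cy_s = sum(p[1] for p in src) / n
    cx_d = sum(p[0] for p in dst) / n
    cy_d = sum(p[1] for p in dst) / n

    # Accumulate the dot and cross terms of the centered point pairs.
    s_cos = s_sin = 0.0
    for (xs, ys), (xd, yd) in zip(src, dst):
        ax, ay = xs - cx_s, ys - cy_s
        bx, by = xd - cx_d, yd - cy_d
        s_cos += ax * bx + ay * by
        s_sin += ax * by - ay * bx

    theta = math.atan2(s_sin, s_cos)       # optimal rotation angle
    c, s = math.cos(theta), math.sin(theta)

    # Translation that carries the rotated source centroid onto the target centroid.
    tx = cx_d - (c * cx_s - s * cy_s)
    ty = cy_d - (s * cx_s + c * cy_s)
    return [[c, -s, tx], [s, c, ty], [0.0, 0.0, 1.0]]


def apply_transform(T, points):
    """Apply a 3x3 homogeneous transform to a list of (x, y) points."""
    return [
        (T[0][0] * x + T[0][1] * y + T[0][2], T[1][0] * x + T[1][1] * y + T[1][2])
        for x, y in points
    ]
```

Once the transform has been estimated from the matched points in the overlap, `apply_transform` would map the whole second scan into the first scan's coordinate frame so that the pieces can be merged.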

Multiclass LS-SVM ensemble for large data

  • Hwang, Hyungtae
    • Journal of the Korean Data and Information Science Society / v.26 no.6 / pp.1557-1563 / 2015
  • Multiclass classification is typically performed with a voting scheme that combines binary classifiers. In this paper we propose a multiclass classification method for large data, which can be regarded as a revised one-vs-all method. Classification is performed using the hat matrix of a least squares support vector machine (LS-SVM) ensemble, obtained by aggregating individual LS-SVMs, each trained on a subset of the whole large data set. A cross validation function is defined to select the optimal values of the hyperparameters that affect the performance of the proposed multiclass LS-SVM. We derive a generalized cross validation function to reduce the computational burden of the cross validation function. Experimental results are presented that indicate the performance of the proposed method.

The Architectural Pattern of a Highly Extensible System for the Asynchronous Processing of a Large Amount of Data

  • Hwang, Ro Man;Kim, Soo Kyun;An, Syungog;Park, Dong-Won
    • Journal of Information Processing Systems / v.9 no.4 / pp.567-574 / 2013
  • In this paper, we propose an architectural solution for a system for the visualization and modification of large amounts of data. The pattern is based on the asynchronous execution of programmable commands and a reflective approach to composing the object structure. The described pattern provides great flexibility, which makes it easy to adapt to custom application needs. We have implemented a system based on the described pattern. The implemented system presents an innovative approach to dynamic data object initialization and a flexible system for asynchronous interaction with data sources. We believe that this system can help software developers increase the quality and production speed of their software products.

Overview of Reliability Rank Measures for Small Sample (소표본인 경우 신뢰성 순위 척도의 고찰)

  • Choi, Sung-Woon
    • Journal of the Korea Safety Management & Science / v.9 no.2 / pp.161-169 / 2007
  • This paper presents three methods for expressing reliability measures for large and small data. The first method is parametric estimation of cardinal reliability measure data for large samples, which requires numerous samples. The second is nonparametric distribution classification of ordinal reliability measure data for small samples; however, this method is difficult for field users to understand. The last method is parametric estimation of ordinal reliability measure data for small data. Because this method requires only a small sample and is comprehensive, we recommend it among the proposed methods. Various reliability rank measures are presented.

Invited Paper: Multivariate Analysis for the Case When the Dimension Is Large Compared to the Sample Size

  • Fujikoshi, Yasunori
    • Journal of the Korean Statistical Society / v.33 no.1 / pp.1-24 / 2004
  • This paper is concerned with statistical methods for multivariate data when the number p of variables is large compared to the sample size n. Such data typically appear in the analysis of DNA microarrays, curve data, financial data, etc. However, there is little statistical theory for high dimensional data. On the other hand, there are some asymptotic results under the assumption that both n and p tend to $\infty$ in some ratio $p/n \rightarrow c$. These results suggest that the new asymptotic approximations are more useful and insightful than the classical large sample asymptotics. The main purpose of this paper is to review some asymptotic results for high dimensional statistics as well as classical statistics under a high dimensional asymptotic framework.

Large-scale 3D fast Fourier transform computation on a GPU

  • Jaehong Lee;Duksu Kim
    • ETRI Journal / v.45 no.6 / pp.1035-1045 / 2023
  • We propose a novel graphics processing unit (GPU) algorithm that can handle a large-scale 3D fast Fourier transform (i.e., 3D-FFT) problem whose data size is larger than the GPU's memory. A 1D FFT-based 3D-FFT computational approach is used to work around the limited device memory. Moreover, to reduce the communication overhead between the CPU and GPU, we propose a 3D data-transposition method that converts the target 1D vector into a contiguous memory layout and improves data transfer efficiency. The transposed data are transferred between the host and device memories efficiently through a pinned buffer and multiple streams. We apply our method to various large-scale benchmarks and compare its performance with the state-of-the-art multicore CPU FFT library (i.e., fastest Fourier transform in the West [FFTW]) and a prior GPU-based 3D-FFT algorithm. Our method achieves higher performance (up to 2.89 times) than FFTW, and the performance gap widens as the data size increases. The performance of the prior GPU algorithm decreases considerably on massive-scale problems, whereas our method's performance is stable.
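The 1D FFT-based approach rests on the separability of the multidimensional discrete Fourier transform: a 3D transform equals 1D transforms applied along each axis in turn, so only one batch of 1D vectors needs to be resident at a time. The sketch below illustrates just that property on the CPU, using a naive O(n²) DFT in place of a real FFT to stay dependency-free. It does not reproduce the paper's GPU streaming, pinned buffers, or transposition scheme, and the function names are hypothetical.

```python
import cmath


def dft_1d(x):
    """Naive 1D discrete Fourier transform (stand-in for a real FFT routine)."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]


def dft_3d_by_axes(a):
    """3D DFT of a nested list a[x][y][z], computed as 1D DFTs along z, then y, then x."""
    nx, ny, nz = len(a), len(a[0]), len(a[0][0])
    out = [[[complex(v) for v in row] for row in plane] for plane in a]

    # Pass 1: transform every contiguous z-vector.
    for i in range(nx):
        for j in range(ny):
            out[i][j] = dft_1d(out[i][j])

    # Pass 2: gather each strided y-vector, transform it, scatter it back.
    for i in range(nx):
        for k in range(nz):
            col = dft_1d([out[i][j][k] for j in range(ny)])
            for j in range(ny):
                out[i][j][k] = col[j]

    # Pass 3: same for each x-vector.
    for j in range(ny):
        for k in range(nz):
            col = dft_1d([out[i][j][k] for i in range(nx)])
            for i in range(nx):
                out[i][j][k] = col[i]
    return out


def dft_3d_direct(a):
    """Reference 3D DFT from the defining triple sum, for checking the separable version."""
    nx, ny, nz = len(a), len(a[0]), len(a[0][0])

    def term(u, v, w):
        return sum(
            a[x][y][z]
            * cmath.exp(-2j * cmath.pi * (u * x / nx + v * y / ny + w * z / nz))
            for x in range(nx) for y in range(ny) for z in range(nz)
        )

    return [[[term(u, v, w) for w in range(nz)] for v in range(ny)] for u in range(nx)]
```

The gather and scatter in passes 2 and 3 touch strided, non-contiguous elements. That is the kind of access a transposition into a contiguous layout, as mentioned in the abstract, would be meant to avoid.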

Efficient Data Management for Finite Element Analysis with Pre-Post Processing of Large Structures (전-후 처리 과정을 포함한 거대 구조물의 유한요소 해석을 위한 효율적 데이터 구조)

  • 박시형;박진우;윤태호;김승조
    • Proceedings of the Computational Structural Engineering Institute Conference / 2004.04a / pp.389-395 / 2004
  • We consider the interface between a parallel distributed-memory multifrontal solver and the finite element method. We describe in detail the requirements and the data structure of the parallel FEM interface, which includes the element data and the node array. The full procedure for solving a large-scale structural problem is assumed to include pre- and post-processors, whose algorithms are not considered in this paper. The main advantage of the parallel FEM interface appears when a distributed memory system with a large number of processors is used to solve a very large-scale problem. Memory efficiency and performance are examined by analyzing several examples on the Pegasus cluster system.
