• Title/Summary/Keyword: Cardinality Estimation


Count-Min HyperLogLog : Cardinality Estimation Algorithm for Big Network Data (Count-Min HyperLogLog : 네트워크 빅데이터를 위한 카디널리티 추정 알고리즘)

  • Sinjung Kang;DaeHun Nyang
    • Journal of the Korea Institute of Information Security & Cryptology / v.33 no.3 / pp.427-435 / 2023
  • Cardinality estimation is used in a wide range of applications and is a fundamental problem in processing large volumes of data. As the internet moves into the era of big data, functions that perform cardinality estimation must use only on-chip cache memory. Various methods have been proposed to use that memory efficiently. However, these algorithms lose accuracy because of noise between estimators, where an estimator is the per-flow data structure. In this paper, we focus on minimizing that noise. We propose using multiple data structures, so that each estimator has as many estimated values as there are structures, and choosing the minimum value, which is the one with the least noise. Experiments show that the proposed algorithm outperforms the best existing work under the same tight memory budget, such as 1 bit per flow.
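
The "several structures, take the minimum" idea in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract gives no construction details, so the grid layout, the `h64` helper, the class name `MinOfSharedEstimators`, and all sizes are assumptions made for illustration. For brevity, each shared estimator is a small bitmap read with Linear Counting, standing in for HyperLogLog registers. Noise from other flows can only set extra bits, so it can only inflate a row's estimate, which is why the minimum across rows is taken as the least noisy one.

```python
import hashlib
import math


def h64(text: str, seed: int) -> int:
    """Seeded 64-bit hash (hypothetical helper, not from the paper)."""
    digest = hashlib.sha256(f"{seed}:{text}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


class MinOfSharedEstimators:
    """rows x cols grid of small bitmaps shared by all flows (illustrative)."""

    def __init__(self, rows: int = 3, cols: int = 64, bits: int = 1024):
        self.rows, self.cols, self.bits = rows, cols, bits
        # Each bitmap is stored as a Python int used as a bitset.
        self.grid = [[0] * cols for _ in range(rows)]

    def add(self, flow: str, element: str) -> None:
        """Record one element of a flow in one bitmap per row."""
        for r in range(self.rows):
            c = h64(flow, r) % self.cols
            bit = h64(f"{flow}|{element}", self.rows + r) % self.bits
            self.grid[r][c] |= 1 << bit

    def estimate(self, flow: str) -> float:
        """Return the smallest per-row Linear Counting estimate."""
        estimates = []
        for r in range(self.rows):
            c = h64(flow, r) % self.cols
            zeros = self.bits - bin(self.grid[r][c]).count("1")
            zeros = max(zeros, 1)  # avoid log(0) on a saturated bitmap
            estimates.append(-self.bits * math.log(zeros / self.bits))
        return min(estimates)


sketch = MinOfSharedEstimators()
for i in range(50):                      # flow-A has 50 distinct elements
    sketch.add("flow-A", f"dst-{i}")
for f in range(20):                      # 20 small flows acting as noise
    for i in range(5):
        sketch.add(f"noise-{f}", f"dst-{i}")
estimate_a = sketch.estimate("flow-A")
```

With these assumed sizes, `estimate_a` should land close to the true count of 50; a row in which a noise flow happens to share flow-A's bitmap reads higher and is discarded by the `min`.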

Efficient distributed estimation based on non-regular quantized data

  • Kim, Yoon Hak
    • Journal of IKEEE / v.23 no.2 / pp.710-715 / 2019
  • We consider parameter estimation in distributed systems in which measurements at local nodes are quantized in a non-regular manner, where multiple codewords are mapped into a single local measurement. In a system with non-regular quantization, to ensure perfectly independent encoding at local nodes, a local measurement can be encoded into a set containing a large number of codewords. These sets are transmitted to a fusion node, where estimation incurs an enormous computational cost because of the large cardinality of the sets. In this paper, we propose an efficient estimation technique that handles non-regular quantized data by efficiently finding the feasible combination of codewords without searching all possible combinations. Our experiments show that the proposed technique performs well compared with previous techniques at a reasonable complexity.

Effect of Sampling for Multi-set Cardinality Estimation (멀티셋의 크기 추정 기법에서 샘플링의 효과)

  • Dao, DinhNguyen;Nyang, DaeHun;Lee, KyungHee
    • KIPS Transactions on Computer and Communication Systems / v.4 no.1 / pp.15-22 / 2015
  • Estimating the number of distinct values is a well-known problem in network data measurement, and many effective algorithms have been suggested. Recent works have built upon a technique called Linear Counting to solve the estimation problem for massive sets or spreaders in small memory. Sampling is used to reduce the measurement data, and it is generally assumed that sampling degrades accuracy. In this paper, however, we show that, for multi-set estimation, CSE with sampling sometimes gives better results in terms of accuracy and estimation range than MCSE, which examines all packets without sampling. To demonstrate this, we present a mathematical analysis, conduct experiments with real data, and compare the results of CSE, MCSE, and CSES.
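
For reference, the Linear Counting technique named in the abstract above is usually stated as follows. The symbols $m$, $z$, and $\hat{n}$ are notation chosen here, not taken from the paper: each item is hashed to one of $m$ bits in a bitmap, and the number of distinct items is estimated from the count $z$ of bits that remain zero.

$$
\hat{n} = -m \ln\!\left(\frac{z}{m}\right)
$$

As a worked example under these assumptions, a bitmap with $m = 1024$ bits of which $z = 975$ are still zero (49 bits set) gives $\hat{n} = -1024 \ln(975/1024) \approx 50.2$, slightly more than 49 because the estimator accounts for items that collided on the same bit.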

Selectivity Estimation for Spatio-Temporal a Overlap Join (시공간 겹침 조인 연산을 위한 선택도 추정 기법)

  • Lee, Myoung-Sul;Lee, Jong-Yun
    • Journal of KIISE:Databases / v.35 no.1 / pp.54-66 / 2008
  • A spatio-temporal join is an expensive operation that is commonly used in spatio-temporal database systems. To generate an efficient query plan for queries involving spatio-temporal join operations, it is crucial to estimate the selectivity of the join accurately. Given two datasets $S_1,\;S_2$ of discrete data and a timestamp $t_q$, a spatio-temporal join retrieves all pairs of objects that intersect each other at $t_q$. The selectivity of the join operation equals the number of retrieved pairs divided by the cardinality of the Cartesian product $S_1{\times}S_2$. In this paper, we propose a spatio-temporal histogram that estimates the selectivity of a spatio-temporal join by extending the existing geometric histogram. Using a wide spectrum of both uniform and skewed datasets, we show that the proposed method, called Spatio-Temporal Histogram, can accurately estimate the selectivity of a spatio-temporal join. Our contributions can be summarized as follows. First, this is the first attempt at selectivity estimation of spatio-temporal joins for discrete data. Second, we propose an efficient maintenance method that reconstructs histograms by compressing spatial statistical information during the lifespan of discrete data.
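
The selectivity definition in the abstract above can be made concrete with a brute-force Python sketch. It computes the exact selectivity that a histogram would approximate; it is not the proposed Spatio-Temporal Histogram. The object model is an assumption made for illustration: each object is an axis-aligned rectangle that is valid during the half-open interval [`t_start`, `t_end`), and the names `STObject`, `overlaps`, and `join_selectivity` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class STObject:
    """Rectangle with a validity interval [t_start, t_end) (assumed model)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    t_start: float
    t_end: float

    def alive_at(self, t: float) -> bool:
        return self.t_start <= t < self.t_end


def overlaps(a: STObject, b: STObject) -> bool:
    """True if the two rectangles intersect (touching edges count)."""
    return (a.xmin <= b.xmax and b.xmin <= a.xmax and
            a.ymin <= b.ymax and b.ymin <= a.ymax)


def join_selectivity(s1, s2, t_q: float) -> float:
    """Intersecting pairs at t_q divided by |S1| * |S2|."""
    if not s1 or not s2:
        return 0.0
    hits = sum(
        1
        for a in s1 if a.alive_at(t_q)
        for b in s2 if b.alive_at(t_q) and overlaps(a, b)
    )
    return hits / (len(s1) * len(s2))


s1 = [STObject(0, 0, 2, 2, 0, 10), STObject(5, 5, 6, 6, 0, 10)]
s2 = [STObject(1, 1, 3, 3, 0, 10), STObject(1, 1, 3, 3, 20, 30)]
sel_at_5 = join_selectivity(s1, s2, t_q=5)    # one of four pairs intersects
sel_at_25 = join_selectivity(s1, s2, t_q=25)  # no object of s1 is alive
```

The nested loop costs time proportional to $|S_1| \cdot |S_2|$, which is the expense a query optimizer wants to avoid and the reason a compact histogram estimate is attractive.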