Analyzing errors in selectivity estimation using the multilevel grid file

계층 그리드 화일을 이용한 선택률 추정에서 발생되는 오차 분석

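  • 김상욱 (Department of Information and Telecommunications Engineering, Kangwon National University) ;
  • 황환규 (Department of Information and Telecommunications Engineering, Kangwon National University) ;
  • 황규영 (Department of Computer Science, Korea Advanced Institute of Science and Technology)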
  • Published : 1996.09.01

Abstract

In this paper, we discuss the errors in selectivity estimation using the multilevel grid file (MLGF). We first demonstrate that the estimation errors stem from the uniformity assumption, namely that records are uniformly distributed within the region they belong to, where each region is represented by an entry in a level of an MLGF directory. Based on this demonstration, we then investigate five factors affecting the accuracy of estimation: (1) the data distribution in a region, (2) the number of records stored in an MLGF, (3) the page size, (4) the query region size, and (5) the level of an MLGF directory. Next, we present through experiments how the estimation errors tend to change as the value of each factor changes. The results show that the errors decrease when (1) the distribution of records in a region becomes closer to uniform, (2) the number of records in an MLGF increases, (3) the page size decreases, (4) the query region size increases, and (5) the level of the MLGF directory employed as data distribution information becomes lower. After defining the granule ratio, the core formula representing the basic relationship between the estimation errors and the above five factors, we finally examine through experiments how the estimation errors change as the value of the granule ratio changes. The results indicate that the errors tend to be similar for a given value of the granule ratio, regardless of how the values of the five factors vary.
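
The uniformity assumption described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Region` class, the function names, and the flat list of regions are hypothetical stand-ins for the entries of one MLGF directory level. Each region contributes its record count scaled by the fraction of its volume that the query overlaps, which is exact only when records really are uniformly distributed inside the region.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Interval = Tuple[float, float]  # (low, high) along one dimension


@dataclass
class Region:
    """Hypothetical stand-in for one directory entry: a region and its record count."""
    bounds: Sequence[Interval]  # one interval per dimension
    count: int                  # number of records stored in the region


def overlap_fraction(region_bounds: Sequence[Interval],
                     query_bounds: Sequence[Interval]) -> float:
    """Fraction of the region's volume that lies inside the query region."""
    fraction = 1.0
    for (r_lo, r_hi), (q_lo, q_hi) in zip(region_bounds, query_bounds):
        width = r_hi - r_lo
        overlap = min(r_hi, q_hi) - max(r_lo, q_lo)
        if width <= 0 or overlap <= 0:
            return 0.0  # degenerate region or no overlap in this dimension
        fraction *= overlap / width
    return fraction


def estimate_selectivity(regions: Sequence[Region],
                         query_bounds: Sequence[Interval]) -> float:
    """Estimated fraction of all records that fall inside the query region.

    Uniformity assumption: a region overlapped by a fraction f of its volume
    is assumed to contribute f * count records to the query result.
    """
    total = sum(r.count for r in regions)
    if total == 0:
        return 0.0
    expected = sum(r.count * overlap_fraction(r.bounds, query_bounds)
                   for r in regions)
    return expected / total


if __name__ == "__main__":
    regions = [
        Region(bounds=((0.0, 1.0), (0.0, 1.0)), count=100),
        Region(bounds=((1.0, 2.0), (0.0, 1.0)), count=300),
    ]
    query = ((0.5, 1.5), (0.0, 1.0))
    print(estimate_selectivity(regions, query))
```

In this example the query covers half of each region, so the estimate is (50 + 150) / 400 = 0.5. If the records in either region were in fact clustered toward one side, the true selectivity would differ from this value, which is the source of error the abstract analyzes.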

Keywords