An Efficient Bulk Loading for High Dimensional Index Structures

Bok, Kyoung-Soo; Lee, Seok-Hee; Cho, Ki-Hyung; Yoo, Jae-Soo

The Transactions of the Korea Information Processing Society (한국정보처리학회논문지)

Volume 7, Issue 8, pp. 2327-2340, 2000, pISSN 1226-9190

Korea Information Processing Society (한국정보처리학회)


Bok, Kyoung-Soo (Dept. of Information Communication Engineering, Graduate School of Chungbuk National University);
Lee, Seok-Hee (Dept. of Internet Broadcasting, Dongah Broadcasting College);
Cho, Ki-Hyung (Dept. of Electrical and Electronic Engineering, Chungbuk National University);
Yoo, Jae-Soo (Dept. of Information Communication Engineering, Chungbuk National University)


Published: 2000.08.01


Abstract

Existing bulk loading algorithms for multi-dimensional index structures fail to deliver both short index construction time and good retrieval performance. In this paper, we propose an efficient bulk loading algorithm that constructs high dimensional index structures for large data sets and overcomes this problem. Although several bulk loading algorithms have been proposed for this purpose, none of them improves both construction time and search performance. To improve the construction time, we do not sort the whole data set; instead, we use a bisection algorithm that divides the whole data set or a subset into two partitions according to a specific pivot value. We also improve the search performance by selecting split positions according to the distribution properties of the data set. Through various experiments, we show that the proposed algorithm is superior to existing algorithms in terms of construction time and search performance.

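Existing bulk loading algorithms for multi-dimensional index structures fail to improve both index construction time and search performance. This paper proposes a new bulk loading algorithm for index structures over large high-dimensional data sets that resolves this problem. To shorten index construction time, the proposed algorithm does not sort the entire data set; instead, it examines the characteristics of the data and partitions it according to a pivot value. To improve search performance, it selects split positions according to the distribution of the data. Experiments show that the proposed algorithm outperforms existing algorithms in both index construction time and search performance.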
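The abstract describes the bisection idea only at a high level, so the following is a minimal sketch of what pivot-based partitioning without a full sort might look like, not the authors' algorithm. The function names (`widest_dimension`, `bisect_by_pivot`, `bulk_load`), the choice of the widest dimension as the split axis, and the midpoint pivot are all assumptions made for illustration; the paper's actual split-position rule is not given here.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def widest_dimension(points: List[Point]) -> int:
    """Return the dimension with the largest value range (assumed heuristic)."""
    best_dim, best_spread = 0, -1.0
    for d in range(len(points[0])):
        values = [p[d] for p in points]
        spread = max(values) - min(values)
        if spread > best_spread:
            best_dim, best_spread = d, spread
    return best_dim


def bisect_by_pivot(
    points: List[Point], dim: int, pivot: float
) -> Tuple[List[Point], List[Point]]:
    """Split points into two partitions around a pivot value in one linear pass."""
    left = [p for p in points if p[dim] < pivot]
    right = [p for p in points if p[dim] >= pivot]
    return left, right


def bulk_load(points: List[Point], capacity: int) -> List[List[Point]]:
    """Recursively bisect the data set until each partition fits in one leaf page."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    if len(points) <= capacity:
        return [list(points)]
    dim = widest_dimension(points)
    values = [p[dim] for p in points]
    # Assumed pivot rule: midpoint of the value range in the chosen dimension.
    pivot = (min(values) + max(values)) / 2.0
    left, right = bisect_by_pivot(points, dim, pivot)
    if not left or not right:
        # Degenerate case (e.g. all points identical): fall back to a count split.
        mid = len(points) // 2
        left, right = list(points[:mid]), list(points[mid:])
    return bulk_load(left, capacity) + bulk_load(right, capacity)


if __name__ == "__main__":
    data = [(0.0, 0.0), (1.0, 0.0), (9.0, 1.0), (10.0, 1.0)]
    for leaf in bulk_load(data, capacity=2):
        print(leaf)
```

Each level of the recursion scans its partition a constant number of times instead of sorting it, which is the source of the construction-time saving the abstract claims. A midpoint pivot follows the value range rather than the point count, so partitions can be unbalanced on skewed data; a distribution-aware split rule such as the one the paper proposes would replace that line.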


