Abstracted Partitioned-Layer Index: A Top-k Query Processing Method Reducing the Number of Random Accesses of the Partitioned-Layer Index

  • Jun-Seok Heo (Department of Computer Science, KAIST)
  • Received : 2010.05.25
  • Accepted : 2010.09.13
  • Published : 2010.09.30

Abstract

Top-k queries return the k objects in a database that users most prefer. The Partitioned-Layer Index (simply, the PL-index) is a representative method for processing top-k queries efficiently. The PL-index partitions the database into a number of smaller databases and then, for each partitioned database, constructs a list of sublayers. Here, the $i^{th}$ sublayer of a partitioned database consists of the objects that can be the top-i object in that partitioned database. To retrieve the top-k results, the PL-index merges the sublayer lists according to the user's query. The PL-index has the advantage of reading a very small number of objects from the database when processing queries. However, because many random accesses occur while merging the sublayer lists, its query performance degrades in disk-based database environments. In this paper, we propose the Abstracted Partitioned-Layer Index (simply, the APL-index), which significantly improves the query performance of the PL-index in disk-based environments by reducing the number of random accesses. First, by abstracting each sublayer of the PL-index into a virtual (point) object, we transform the lists of sublayers into lists of virtual objects (i.e., the APL-index). Then, we virtually process the given query using the APL-index and thereby predict the sublayers that will be read when the query is actually processed. Next, we read the predicted sublayers of each sublayer list all at once, which reduces the number of random accesses that occur in the PL-index. Experimental results on synthetic and real data sets show that the proposed APL-index significantly reduces the number of random accesses occurring in the PL-index.
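The predict-then-batch-read idea described above can be illustrated with a minimal Python sketch. It is not the paper's algorithm, and several parts are assumptions made only for illustration: the scoring function is taken to be a linear sum with non-negative weights where lower scores are better; each virtual point is taken to be the component-wise minimum of a sublayer and all deeper sublayers in the same list (the abstract does not say how a sublayer is abstracted); a verification loop is added so the sketch stays correct when the prediction reads too little; and all function names (`abstract_lists`, `predict_depths`, `top_k`) are hypothetical. In-memory lists stand in for disk-resident sublayers, and the returned counters stand in for random accesses.

```python
import heapq
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Sublayer = List[Point]


def score(weights: Sequence[float], p: Sequence[float]) -> float:
    # Assumed scoring function: linear, non-negative weights, lower is better.
    return sum(w * x for w, x in zip(weights, p))


def abstract_lists(sublayer_lists: List[List[Sublayer]]) -> List[List[Point]]:
    # Assumption: the virtual point of sublayer i is the component-wise minimum
    # over sublayer i and every deeper sublayer of the same list, so its score
    # is a lower bound for every object at depth i or deeper.
    # Sublayers are assumed to be non-empty.
    virtual_lists = []
    for layers in sublayer_lists:
        virtual: List[Point] = []
        running = None
        for layer in reversed(layers):
            for p in layer:
                running = p if running is None else tuple(map(min, running, p))
            virtual.append(running)
        virtual.reverse()
        virtual_lists.append(virtual)
    return virtual_lists


def predict_depths(virtual_lists: List[List[Point]],
                   weights: Sequence[float], k: int) -> List[int]:
    # "Virtual" query processing: merge the virtual-object lists and take the
    # k best virtual objects; the count taken from each list is the predicted
    # number of sublayers to read from that list.
    depths = [0] * len(virtual_lists)
    heap = [(score(weights, v[0]), j) for j, v in enumerate(virtual_lists) if v]
    heapq.heapify(heap)
    for _ in range(k):
        if not heap:
            break
        _, j = heapq.heappop(heap)
        depths[j] += 1
        if depths[j] < len(virtual_lists[j]):
            heapq.heappush(heap, (score(weights, virtual_lists[j][depths[j]]), j))
    return depths


def top_k(sublayer_lists: List[List[Sublayer]],
          virtual_lists: List[List[Point]],
          weights: Sequence[float], k: int):
    depths = predict_depths(virtual_lists, weights, k)
    # One batched read per list: all predicted sublayers of a list together.
    candidates = [p for j, layers in enumerate(sublayer_lists)
                  for layer in layers[:depths[j]] for p in layer]
    batch_reads = sum(1 for d in depths if d > 0)
    extra_reads = 0
    while True:
        candidates.sort(key=lambda p: score(weights, p))
        # Lower bound on the score of every object not yet read.
        unread = [(score(weights, virtual_lists[j][d]), j)
                  for j, d in enumerate(depths) if d < len(virtual_lists[j])]
        if not unread:
            break
        bound, j = min(unread)
        if len(candidates) >= k and score(weights, candidates[k - 1]) <= bound:
            break
        # Prediction was too shallow: read one more sublayer (added safeguard).
        candidates.extend(sublayer_lists[j][depths[j]])
        depths[j] += 1
        extra_reads += 1
    return candidates[:k], batch_reads, extra_reads
```

A caller would build the virtual lists once with `abstract_lists(sublayer_lists)` and then call `top_k(sublayer_lists, virtual_lists, weights, k)` per query. The design point the sketch tries to show is that the prediction step touches only the small virtual lists, so the real sublayers of each list can be fetched in one batch instead of one access per merge step.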

Acknowledgement

Supported by: Defense Acquisition Program Administration, Agency for Defense Development
