A High-Dimensional Indexing Method for Nearest Neighbor Queries

  • 김상욱 (Kangwon National University, Division of Computer, Information, and Communications)
  • Published : 2001.12.01

Abstract

The nearest neighbor query is an important operation widely used in multimedia databases for finding the object that is most similar to a given object. Most techniques for processing nearest neighbor queries employ multidimensional indexes for effective indexing of objects. However, the performance of previous multidimensional indexes, which use N-dimensional rectangles or spheres to represent the capsule of an object cluster, deteriorates seriously as the number of dimensions gets higher. In this paper, we first point out that this simple representation of capsules incurs performance degradation in processing nearest neighbor queries. To alleviate this problem, we propose (1) adopting new axis systems appropriate to a given cluster, (2) representing various shapes of capsules by combining rectangles and spheres, and (3) maintaining outliers separately. We also verify the superiority of our approach through extensive experiments.

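The nearest neighbor query is a very important operation in multimedia databases for finding the object most similar to a given query object. Most nearest neighbor query processing techniques use a multidimensional index to index objects effectively. However, the search performance of existing multidimensional indexes, which represent the capsule of an object cluster with an N-dimensional rectangle or sphere, degrades sharply as the number of dimensions grows. This paper points out that this simple capsule representation is a major cause of performance degradation in nearest neighbor query processing, and proposes three remedies: (1) adopting a new axis system suited to each cluster, (2) representing diverse capsule shapes by combining spheres and rectangles, and (3) managing outliers separately. We also present an indexing structure that adopts these concepts and propose a nearest neighbor query processing method that uses it. Finally, we verify the superiority of the proposed technique through performance evaluation based on a variety of experiments.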
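
To make idea (2) concrete, the following is a minimal sketch, not the paper's actual index structure. It assumes a capsule defined as the intersection of an axis-aligned bounding rectangle and a bounding sphere around the cluster centroid, and it shows why the combination can prune more than either shape alone: the lower bound on the distance from a query point to the cluster is the larger of the two individual bounds. All function names (`bound_cluster`, `mindist_rect`, `mindist_sphere`, `mindist_capsule`) are hypothetical, and the axis-system change of idea (1) and the outlier handling of idea (3) are deliberately left out.

```python
import math


def bound_cluster(points):
    """Compute a bounding rectangle and a centroid-based bounding sphere.

    Returns (lo, hi, center, radius). This is an illustrative choice of
    capsule; the paper's own construction is not given in the abstract.
    """
    dims = range(len(points[0]))
    lo = [min(p[i] for p in points) for i in dims]
    hi = [max(p[i] for p in points) for i in dims]
    center = [sum(p[i] for p in points) / len(points) for i in dims]
    radius = max(math.dist(p, center) for p in points)
    return lo, hi, center, radius


def mindist_rect(q, lo, hi):
    """Minimum Euclidean distance from q to the box [lo, hi]."""
    total = 0.0
    for x, low, high in zip(q, lo, hi):
        if x < low:
            total += (low - x) ** 2
        elif x > high:
            total += (x - high) ** 2
    return math.sqrt(total)


def mindist_sphere(q, center, radius):
    """Minimum Euclidean distance from q to the sphere (0 if q is inside)."""
    return max(0.0, math.dist(q, center) - radius)


def mindist_capsule(q, lo, hi, center, radius):
    """Lower bound on the distance from q to any point in the capsule.

    Every cluster point lies in both shapes, so its distance to q is at
    least each individual bound, and therefore at least their maximum.
    """
    return max(mindist_rect(q, lo, hi), mindist_sphere(q, center, radius))


if __name__ == "__main__":
    cluster = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    lo, hi, center, radius = bound_cluster(cluster)
    query = (2, 2)
    print("rectangle bound:", mindist_rect(query, lo, hi))
    print("sphere bound:   ", mindist_sphere(query, center, radius))
    print("capsule bound:  ", mindist_capsule(query, lo, hi, center, radius))
```

For this diamond-shaped cluster, the sphere gives the tighter bound for a query near a rectangle corner, while the rectangle is tighter for elongated clusters. A nearest neighbor search can skip any cluster whose capsule bound already exceeds the best distance found so far.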
