An Effective Method for Dimensionality Reduction in High-Dimensional Space

고차원 공간에서 효과적인 차원 축소 기법

  • Jeong Seung-Do (Dept. of Electronics and Computer Engineering, Hanyang University) ;
  • Kim Sang-Wook (College of Information and Communications, Hanyang University) ;
  • Choi Byung-Uk (College of Information and Communications, Hanyang University)
  • 정승도 (한양대학교 전자통신컴퓨터공학과) ;
  • 김상욱 (한양대학교 정보통신대학) ;
  • 최병욱 (한양대학교 정보통신대학)
  • Published : 2006.07.01

Abstract

In multimedia information retrieval, multimedia data are represented as vectors in a high-dimensional space. A variety of indexing methods have been proposed to search these vectors efficiently. However, the performance of these indexing methods degrades dramatically as dimensionality increases, a problem known as the curse of dimensionality. To address it, dimensionality reduction methods map feature vectors in the high-dimensional space to vectors in a low-dimensional space before the data are indexed. This paper proposes a dimensionality reduction method based on a function that approximates the Euclidean distance using the norm and angle components of a vector. First, we identify the causes of errors in the angle estimation used to approximate the Euclidean distance, and discuss basic directions for reducing those errors. Then, we propose a novel dimensionality reduction method that partitions a feature vector into a set of subvectors and maintains only the norm and the estimated angle of each subvector. Selecting a good reference vector is important for accurate estimation of the angle component. We present criteria for a good reference vector and propose a method that chooses one using the Levenberg-Marquardt algorithm. We also define a novel distance function and formally prove that it lower-bounds the Euclidean distance. This implies that our approach reduces dimensionality effectively without incurring any false dismissals. Finally, we verify the superiority of the proposed method through extensive experiments.
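
The norm-and-angle idea can be illustrated with a minimal sketch. It is not the authors' exact distance function, which the abstract does not spell out. It relies only on a standard fact: if two vectors x and y make angles a_x and a_y with a common reference vector, the angle between x and y is at least |a_x - a_y|, so ||x||^2 + ||y||^2 - 2 ||x|| ||y|| cos(a_x - a_y) never exceeds ||x - y||^2. Summing this bound over subvectors gives a lower bound on the full Euclidean distance. The function names, the equal-length partitioning, and the all-ones reference vector below are assumptions made for illustration; in particular, the reference vector is not chosen by Levenberg-Marquardt here.

```python
import math
import random


def reduce_vector(vec, num_sub, ref):
    """Split vec into num_sub equal-length subvectors and keep only
    (norm, angle to the matching slice of ref) for each one.
    Assumes len(vec) is divisible by num_sub and ref has no all-zero slice."""
    size = len(vec) // num_sub
    reduced = []
    for i in range(num_sub):
        sub = vec[i * size:(i + 1) * size]
        r = ref[i * size:(i + 1) * size]
        norm = math.sqrt(sum(v * v for v in sub))
        ref_norm = math.sqrt(sum(v * v for v in r))
        if norm == 0.0:
            angle = 0.0  # angle is irrelevant when the norm is zero
        else:
            cos = sum(a * b for a, b in zip(sub, r)) / (norm * ref_norm)
            angle = math.acos(max(-1.0, min(1.0, cos)))  # clamp rounding noise
        reduced.append((norm, angle))
    return reduced


def lower_bound_distance(rx, ry):
    """Distance between two reduced vectors that does not exceed the
    Euclidean distance between the original vectors."""
    total = 0.0
    for (nx, ax), (ny, ay) in zip(rx, ry):
        total += nx * nx + ny * ny - 2.0 * nx * ny * math.cos(ax - ay)
    return math.sqrt(max(0.0, total))


def euclidean(x, y):
    """Plain Euclidean distance, used here only for comparison."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


if __name__ == "__main__":
    random.seed(1)
    dim, num_sub = 16, 4
    ref = [1.0] * dim  # hypothetical reference vector, not an optimized one
    x = [random.random() for _ in range(dim)]
    y = [random.random() for _ in range(dim)]
    lb = lower_bound_distance(reduce_vector(x, num_sub, ref),
                              reduce_vector(y, num_sub, ref))
    print("lower bound:", lb, "true distance:", euclidean(x, y))
```

In this sketch a 16-dimensional vector is stored as 4 (norm, angle) pairs, that is, 8 numbers. Because the reduced distance never exceeds the true distance, filtering candidates by it in a range query cannot discard a true answer, which is the no-false-dismissal property the abstract refers to. How tight the bound is depends on the reference vector.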

