An Effective Method for Dimensionality Reduction in High-Dimensional Space

  • Jeong Seung-Do (Dept. of Electronics and Computer Engineering, Hanyang University) ;
  • Kim Sang-Wook (College of Information and Communications, Hanyang University) ;
  • Choi Byung-Uk (College of Information and Communications, Hanyang University)
  • Published : 2006.07.01

Abstract

In multimedia information retrieval, multimedia data are represented as vectors in a high-dimensional space. A variety of indexing methods have been proposed to search these vectors effectively. However, the performance of these indexing methods degrades dramatically as dimensionality increases, a phenomenon known as the dimensionality curse. Dimensionality reduction methods have been proposed to resolve it: they map feature vectors in the high-dimensional space to vectors in a low-dimensional space before the data are indexed. This paper proposes a dimensionality reduction method based on a function that approximates the Euclidean distance using the norm and angle components of a vector. First, we identify the causes of errors in the angle estimation used to approximate the Euclidean distance, and discuss basic directions for reducing those errors. Then, we propose a novel dimensionality reduction method that composes a set of subvectors from a feature vector and maintains only the norm and the estimated angle of each subvector. Selecting a good reference vector is important for accurate estimation of the angle component. We present criteria for a good reference vector and propose a method that chooses one by using the Levenberg-Marquardt algorithm. We also define a novel distance function and formally prove that it lower-bounds the Euclidean distance. This implies that our approach reduces the dimensionality effectively without incurring any false dismissals. Finally, we verify the superiority of the proposed method through extensive experiments.
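The norm-and-angle idea can be illustrated with a minimal Python sketch. It is not the distance function defined in the paper, which the abstract does not state. It assumes a standard bound instead: if two subvectors make angles t1 and t2 with a common reference vector, the angle between them is at least |t1 - t2|, so the law of cosines evaluated at t1 - t2 cannot exceed their true squared distance. The function names, the equal-length split into subvectors, and the all-ones reference vectors are assumptions made for illustration; the paper chooses its reference vector with the Levenberg-Marquardt algorithm, which is not reproduced here.

```python
import math
import random


def reduce_vector(vec, refs):
    """Map a vector to one (norm, angle) pair per subvector.

    refs is a list of reference subvectors (an assumption of this sketch);
    their lengths define how vec is split into subvectors.
    """
    reduced, start = [], 0
    for ref in refs:
        sub = vec[start:start + len(ref)]
        start += len(ref)
        norm = math.sqrt(sum(v * v for v in sub))
        ref_norm = math.sqrt(sum(r * r for r in ref))
        if norm == 0.0 or ref_norm == 0.0:
            angle = 0.0  # angle undefined; 0 keeps the bound valid
        else:
            cos = sum(v * r for v, r in zip(sub, ref)) / (norm * ref_norm)
            angle = math.acos(max(-1.0, min(1.0, cos)))  # clamp rounding error
        reduced.append((norm, angle))
    return reduced


def lower_bound_distance(a, b):
    """Distance estimate from two reduced vectors; never above the true one."""
    total = 0.0
    for (na, ta), (nb, tb) in zip(a, b):
        # Law of cosines with the angle difference in place of the true angle.
        total += na * na + nb * nb - 2.0 * na * nb * math.cos(ta - tb)
    return math.sqrt(max(0.0, total))


def euclidean(x, y):
    """Exact Euclidean distance between two full-dimensional vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical setup: 16-dimensional vectors, four 4-dimensional subvectors.
    refs = [[1.0] * 4 for _ in range(4)]
    x = [random.random() for _ in range(16)]
    y = [random.random() for _ in range(16)]
    estimate = lower_bound_distance(reduce_vector(x, refs), reduce_vector(y, refs))
    print("lower bound:", estimate, "exact:", euclidean(x, y))
```

In this sketch each 16-dimensional vector is stored as eight numbers. Because the estimate never exceeds the exact distance, a range query filtered on the estimate can return false alarms that a later exact check removes, but it cannot miss a true answer.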
