Locality-Sensitive Hashing Techniques for Nearest Neighbor Search

Lee, Keon Myung

doi:10.5391/IJFIS.2012.12.4.300

International Journal of Fuzzy Logic and Intelligent Systems

Volume 12 Issue 4 / Pages 300-307 / 2012 / 1598-2645 (pISSN) / 2093-744X (eISSN)

Korean Institute of Intelligent Systems (한국지능시스템학회)

Lee, Keon Myung (Dept. of Computer Science and PT-ERC, Chungbuk National University)

Received : 2012.12.01
Accepted : 2012.12.24
Published : 2012.12.25

https://doi.org/10.5391/IJFIS.2012.12.4.300

Abstract

When the volume of data grows large, even simple tasks can become a significant concern. Nearest neighbor search is one such task: given a query, it finds the k nearest data points in a data set. Locality-sensitive hashing techniques have been developed for approximate but fast nearest neighbor search. This paper introduces the notion of locality-sensitive hashing and surveys locality-sensitive hashing techniques. It categorizes them by several criteria, presents their characteristics, and compares their performance.
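
As a rough illustration of the idea summarized above, the following Python sketch hashes points so that similar points tend to share a bucket, then searches only the query's bucket. The abstract does not name a specific hash family, so random-hyperplane (sign random projection) hashing is an assumption chosen here for brevity. All function names are hypothetical, and the vectors are assumed to be nonzero.

```python
import math
import random
from collections import defaultdict


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian hyperplane normals in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def hash_point(point, hyperplanes):
    """Map a point to a tuple of bits, one sign bit per hyperplane."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, point)) >= 0 else 0
        for plane in hyperplanes
    )


def build_index(points, hyperplanes):
    """Group point indices into buckets keyed by their hash code."""
    buckets = defaultdict(list)
    for i, p in enumerate(points):
        buckets[hash_point(p, hyperplanes)].append(i)
    return buckets


def cosine(a, b):
    """Cosine similarity of two nonzero vectors (assumption: no zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def query(q, points, buckets, hyperplanes, k=1):
    """Return up to k indices from q's bucket, most similar first.

    Only points that collide with q are examined, so the result is
    approximate: a true neighbor in another bucket is missed.
    """
    candidates = buckets.get(hash_point(q, hyperplanes), [])
    return sorted(candidates, key=lambda i: -cosine(points[i], q))[:k]


if __name__ == "__main__":
    planes = make_hyperplanes(num_bits=4, dim=3, seed=1)
    data = [[1.0, 2.0, 3.0], [-1.0, 0.5, -2.0], [3.0, -1.0, 0.2], [2.0, 4.0, 6.0]]
    index = build_index(data, planes)
    print(query([1.0, 2.0, 3.0], data, index, planes, k=2))
```

A single table like this trades recall for speed. Practical schemes usually build several tables with independent hash functions and merge their candidates, which this sketch leaves out.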


