• Title/Summary/Keyword: k Approximate Nearest Neighbor Search

Search Result 8, Processing Time 0.022 seconds

An Approximate k-Nearest Neighbor Search Algorithm for Content- Based Multimedia Information Retrieval (내용 기반 멀티미디어 정보 검색을 위한 근사 k-최근접 데이타 탐색 알고리즘)

  • Song, Kwang-Taek;Chang, Jae-Woo
    • Journal of KIISE:Databases
    • /
    • v.27 no.2
    • /
    • pp.199-208
    • /
    • 2000
  • The k-nearest neighbor search query based on similarity is very important for content-based multimedia information retrieval(MIR). The conventional exact k-nearest neighbor search algorithm is not efficient for the MIR application because multimedia data should be represented as high dimensional feature vectors. Thus, an approximate k-nearest neighbor search algorithm is required for the MIR applications because the performance increase may outweigh the drawback of receiving approximate results. For this, we propose a new approximate k-nearest neighbor search algorithm for high dimensional data. In addition, the comparison of the conventional algorithm with our approximate k-nearest neighbor search algorithm is performed in terms of retrieval performance. Results show that our algorithm is more efficient than the conventional ones.

  • PDF

Locality-Sensitive Hashing Techniques for Nearest Neighbor Search

  • Lee, Keon Myung
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.12 no.4
    • /
    • pp.300-307
    • /
    • 2012
  • When the volume of data grows big, some simple tasks could become a significant concern. Nearest neighbor search is such a task which finds from a data set the k nearest data points to queries. Locality-sensitive hashing techniques have been developed for approximate but fast nearest neighbor search. This paper introduces the notion of locality-sensitive hashing and surveys the locality-sensitive hashing techniques. It categories them based on several criteria, presents their characteristics, and compares their performance.

Optimization of Warp-wide CUDA Implementation for Parallel Shifted Sort Algorithm (병렬 Shifted Sort 알고리즘의 Warp 단위 CUDA 구현 최적화)

  • Park, Taejung
    • Journal of Digital Contents Society
    • /
    • v.18 no.4
    • /
    • pp.739-745
    • /
    • 2017
  • This paper presents and discusses an implementation of the GPU shifted sorting method to find approximate k nearest neighbors which executes within "warp", the minimum execution unit in GPU parallel architecture. Also, this paper presents the comparison results with other two common nearest neighbor searching methods, GPU-based kd-tree and ANN (Approximate Nearest Neighbor) library. The proposed implementation focuses on the cases when k is small, i.e. 2, 4, 8, and 16, which are handled efficiently within warp to consider it is very common for applications to handle small k's. Also, this paper discusses optimization ways to implementation by improving memory management in a loop for the CUB open library and adopting CUDA commands which are supported by GPU hardware. The proposed implementation shows more than 16-fold speed-up against GPU-based other methods in the tests, implying that the improvement would become higher for more larger input data.

Approximate k-Nearest Neighbor Search Algorithms for Content-Based Retrieval of Multimedia Data (대용량 멀티미디어 데이터의 내용-기반 검색을 위한 근사 k-최근접 데이터 탐색 알고리즘)

  • 송광택;심춘보;장재우
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 1998.10b
    • /
    • pp.256-258
    • /
    • 1998
  • 대용량의 멀티미디어 자료를 기반으로 하는 내용-기반 멀티미디어 검색 시스템에서 k-최근접 탐색 질의는 사용자의 매우 중요한 검색 질의 중에 하나이다. 하지만, 방대한 양의 멀티미디어 데이터베이스를 기반으로하는 경우에는 적중 에러 없는 정확(exact) k-최근접 데이터 탐색을 위해서 상당히 많은 디스크 접근 횟수가 요구된다. 본 논문에서는 X-트리에서의 정확 k-최근접 탐색 질의를 개선하고, 또한 사용자의 빠른 검색 성능을 위해 다소의 적중 에러는 허용한다 하더라도 디스크 접근 횟수를 줄이는 근사(approximate) k-최근접 탐색 알고리즘을 제안한다.

An Improvement in K-NN Graph Construction using re-grouping with Locality Sensitive Hashing on MapReduce (MapReduce 환경에서 재그룹핑을 이용한 Locality Sensitive Hashing 기반의 K-Nearest Neighbor 그래프 생성 알고리즘의 개선)

  • Lee, Inhoe;Oh, Hyesung;Kim, Hyoung-Joo
    • KIISE Transactions on Computing Practices
    • /
    • v.21 no.11
    • /
    • pp.681-688
    • /
    • 2015
  • The k nearest neighbor (k-NN) graph construction is an important operation with many web-related applications, including collaborative filtering, similarity search, and many others in data mining and machine learning. Despite its many elegant properties, the brute force k-NN graph construction method has a computational complexity of $O(n^2)$, which is prohibitive for large scale data sets. Thus, (Key, Value)-based distributed framework, MapReduce, is gaining increasingly widespread use in Locality Sensitive Hashing which is efficient for high-dimension and sparse data. Based on the two-stage strategy, we engage the locality sensitive hashing technique to divide users into small subsets, and then calculate similarity between pairs in the small subsets using a brute force method on MapReduce. Specifically, generating a candidate group stage is important since brute-force calculation is performed in the following step. However, existing methods do not prevent large candidate groups. In this paper, we proposed an efficient algorithm for approximate k-NN graph construction by regrouping candidate groups. Experimental results show that our approach is more effective than existing methods in terms of graph accuracy and scan rate.

Performance Enhancement of a DVA-tree by the Independent Vector Approximation (독립적인 벡터 근사에 의한 분산 벡터 근사 트리의 성능 강화)

  • Choi, Hyun-Hwa;Lee, Kyu-Chul
    • The KIPS Transactions:PartD
    • /
    • v.19D no.2
    • /
    • pp.151-160
    • /
    • 2012
  • Most of the distributed high-dimensional indexing structures provide a reasonable search performance especially when the dataset is uniformly distributed. However, in case when the dataset is clustered or skewed, the search performances gradually degrade as compared with the uniformly distributed dataset. We propose a method of improving the k-nearest neighbor search performance for the distributed vector approximation-tree based on the strongly clustered or skewed dataset. The basic idea is to compute volumes of the leaf nodes on the top-tree of a distributed vector approximation-tree and to assign different number of bits to them in order to assure an identification performance of vector approximation. In other words, it can be done by assigning more bits to the high-density clusters. We conducted experiments to compare the search performance with the distributed hybrid spill-tree and distributed vector approximation-tree by using the synthetic and real data sets. The experimental results show that our proposed scheme provides consistent results with significant performance improvements of the distributed vector approximation-tree for strongly clustered or skewed datasets.

Analysis of GPU-based Parallel Shifted Sort Algorithm by comparing with General GPU-based Tree Traversal (일반적인 GPU 트리 탐색과의 비교실험을 통한 GPU 기반 병렬 Shifted Sort 알고리즘 분석)

  • Kim, Heesu;Park, Taejung
    • Journal of Digital Contents Society
    • /
    • v.18 no.6
    • /
    • pp.1151-1156
    • /
    • 2017
  • It is common to achieve lower performance in traversing tree data structures in GPU than one expects. In this paper, we analyze the reason of lower-than-expected performance in GPU tree traversal and present that the warp divergences is caused by the branch instructions ("if${\ldots}$ else") which appear commonly in tree traversal CUDA codes. Also, we compare the parallel shifted sort algorithm which can reduce the number of warp divergences with a kd-tree CUDA implementation to show that the shifted sort algorithm can work faster than the kd-tree CUDA implementation thanks to less warp divergences. As the analysis result, the shifted sort algorithm worked about 16-fold faster than the kd-tree CUDA implementation for $2^{23}$ query points and $2^{23}$ data points in $R^3$ space. The performance gaps tend to increase in proportion to the number of query points and data points.

Automatic Selection of Similar Sentences for Teaching Writing in Elementary School (초등 글쓰기 교육을 위한 유사 문장 자동 선별)

  • Park, Youngki
    • Journal of The Korean Association of Information Education
    • /
    • v.20 no.4
    • /
    • pp.333-340
    • /
    • 2016
  • When elementary students write their own sentences, it is often educationally beneficial to compare them with other people's similar sentences. However, it is impractical for use in most classrooms, because it is burdensome for teachers to look up all of the sentences written by students. To cope with this problem, we propose a novel approach for automatic selection of similar sentences based on a three-step process: 1) extracting the subword units from the word-level sentences, 2) training the model with the encoder-decoder architecture, and 3) using the approximate k-nearest neighbor search algorithm to find the similar sentences. Experimental results show that the proposed approach achieves the accuracy of 75% for our test data.