• Title/Summary/Keyword: multiple hashing

Search Result 22, Processing Time 0.025 seconds

Fast Search with Data-Oriented Multi-Index Hashing for Multimedia Data

  • Ma, Yanping;Zou, Hailin;Xie, Hongtao;Su, Qingtang
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.9 no.7
    • /
    • pp.2599-2613
    • /
    • 2015
  • Multi-index hashing (MIH) is the state-of-the-art method for indexing binary codes, as it di-vides long codes into substrings and builds multiple hash tables. However, MIH is based on the dataset codes uniform distribution assumption, and will lose efficiency in dealing with non-uniformly distributed codes. Besides, there are lots of results sharing the same Hamming distance to a query, which makes the distance measure ambiguous. In this paper, we propose a data-oriented multi-index hashing method (DOMIH). We first compute the covariance ma-trix of bits and learn adaptive projection vector for each binary substring. Instead of using substrings as direct indices into hash tables, we project them with corresponding projection vectors to generate new indices. With adaptive projection, the indices in each hash table are near uniformly distributed. Then with covariance matrix, we propose a ranking method for the binary codes. By assigning different bit-level weights to different bits, the returned bina-ry codes are ranked at a finer-grained binary code level. Experiments conducted on reference large scale datasets show that compared to MIH the time performance of DOMIH can be improved by 36.9%-87.4%, and the search accuracy can be improved by 22.2%. To pinpoint the potential of DOMIH, we further use near-duplicate image retrieval as examples to show the applications and the good performance of our method.

k-NN Join Based on LSH in Big Data Environment

  • Ji, Jiaqi;Chung, Yeongjee
    • Journal of information and communication convergence engineering
    • /
    • v.16 no.2
    • /
    • pp.99-105
    • /
    • 2018
  • k-Nearest neighbor join (k-NN Join) is a computationally intensive algorithm that is designed to find k-nearest neighbors from a dataset S for every object in another dataset R. Most related studies on k-NN Join are based on single-computer operations. As the data dimensions and data volume increase, running the k-NN Join algorithm on a single computer cannot generate results quickly. To solve this scalability problem, we introduce the locality-sensitive hashing (LSH) k-NN Join algorithm implemented in Spark, an approach for high-dimensional big data. LSH is used to map similar data onto the same bucket, which can reduce the data search scope. In order to achieve parallel implementation of the algorithm on multiple computers, the Spark framework is used to accelerate the computation of distances between objects in a cluster. Results show that our proposed approach is fast and accurate for high-dimensional and big data.

Improving TCP Performance in Multipath Packet Forwarding Networks

  • Lee, Youngseok;Park, Ilkyu;Park, Yanghee
    • Journal of Communications and Networks
    • /
    • v.4 no.2
    • /
    • pp.148-157
    • /
    • 2002
  • This paper investigates schemes to improve TCP performance in multipath forwarding networks. In multipath routing, packets to the same destination are sent to multiple next-hops in either packet-level or flow-level forwarding mode. Effective bandwidth is increased since we can utilize unused capacity of multiple paths to the destination. In packet-level multipath forwarding networks, TCP performance may not be enhanced due to frequent out-of-order segment arrivals at the receiver because of different delays among paths. To overcome this problem, we propose simple TCP modifications. At the sender, the fast retransmission threshold is adjusted taking the number of paths into consideration. At the receiver, the delayed acknowledgment scheme is modified such that an acknowledgment for an out-of-order segment arrival is delayed in the same way for the in-order one. The number of unnecessary retransmissions and congestion window reductions is diminished, which is verified by extensive simulations. In flow-level multipath forwarding networks, hashing is used at routers to select outgoing link of a packet. Here, we show by simulations that TCP performance is increased in proportion to the number of paths regardless of delay differences.

Software Measurement by Analyzing Multiple Time-Series Patterns (다중 시계열 패턴 분석에 의한 소프트웨어 계측)

  • Kim Gye-Young
    • Journal of Internet Computing and Services
    • /
    • v.6 no.1
    • /
    • pp.105-114
    • /
    • 2005
  • This paper describes a new measuring technique by analysing multiple time-series patterns. This paper's goal is that extracts a really measured value having a sample pattern which is the best matched with an inputted time-series, and calculates a difference ratio with the value. Therefore, the proposed technique is not a recognition but a measurement. and not a hardware but a software. The proposed technique is consisted of three stages, initialization, learning and measurement. In the initialization stage, it decides weights of all parameters using importance given by an operator. In the learning stage, it classifies sample patterns using LBG and DTW algorithm, and then creates code sequences for all the patterns. In the measurement stage, it creates a code sequence for an inputted time-series pattern, finds samples having the same code sequence by hashing, and then selects the best matched sample. Finally it outputs the really measured value with the sample and the difference ratio. For the purpose of performance evaluation, we tested on multiple time-series patterns obtained from etching machine which is a semiconductor manufacturing.

  • PDF

Parellel Computation of the Shift Table of a Hashing-Based Algorithm for the Order-Preserving Multiple Pattern Matching (순위다중패턴매칭을 위한 해싱기반 알고리즘의 이동테이블 병렬계산)

  • Park, Jeonghoon;Kim, Youngho;Kwan, Sanghoon;Sim, Jeong Seop
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2017.04a
    • /
    • pp.36-39
    • /
    • 2017
  • 길이가 같은 두 문자열의 같은 위치에 있는 문자의 순위가 모두 일치할 때, 두 문자열은 순위동형이라 한다. 순위다중패턴매칭문제는 텍스트 T와 k개의 패턴들의 집합 $P^{\prime}=\{P_1,P_2{\ldots},P_k\}$이 주어졌을 때, P'의 패턴들과 순위동형인 T의 모든 부분문자열의 위치를 찾는 문제이다. 최근 전처리단계에서 P'에 대한 이동테이블을 O(kmqlogq) 시간에 계산하여 순위다중패턴매칭문제를 해결하는 해싱기반 알고리즘이 제시되었다. 이때 P'에서 가장 짧은 패턴의 길이를 m, q-그램의 길이를 q라고 한다. 본 논문에서는 P'이 주어졌을 때, 이동테이블을 O(mqlogq) 시간에 계산하는 병렬알고리즘을 제시한다. 실험결과, 본 논문에서 제시하는 병렬알고리즘은 k개의 스레드를 이용하여 m=100, q=5에 대해 k=100일때와 k=1,000일 때 순차알고리즘보다 각각 약 12.9배, 약 215배 빠른 수행시간을 보였다.

Two Predecessor Nongroup Cellular Automata (두 개의 직전자를 갖는 Nongroup CA)

  • Cho, Sung-Jin;Choi, Un-Sook;Kim, Han-Doo;Hwang, Yoon-Hee;Kim, Jin-Gyoung
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2007.06a
    • /
    • pp.423-426
    • /
    • 2007
  • In this paper, we propose an algorithm for finding 90/150 Two Predecessor Nongroup Cellular Automata(TPNCA). Especially, we synthesize TPNCA for the minimal polynomial whose type is of the form xp(x) or x(x+1)p(x) using the proposed algorithm which is useful to study pseudorandom number generation, where p(x) is some primitive polynomial. Also we synthesize Two Predecessor Single Attractor CA and Two Predecessor Multiple Attractor CA which are useful to study hashing.

  • PDF

Multi-match Packet Classification Scheme Combining TCAM with an Algorithmic Approach

  • Lim, Hysook;Lee, Nara;Lee, Jungwon
    • IEIE Transactions on Smart Processing and Computing
    • /
    • v.6 no.1
    • /
    • pp.27-38
    • /
    • 2017
  • Packet classification is one of the essential functionalities of Internet routers in providing quality of service. Since the arrival rate of input packets can be tens-of-millions per second, wire-speed packet classification has become one of the most challenging tasks. While traditional packet classification only reports a single matching result, new network applications require multiple matching results. Ternary content-addressable memory (TCAM) has been adopted to solve the multi-match classification problem due to its ability to perform fast parallel matching. However, TCAM has a fundamental issue: high power dissipation. Since TCAM is designed for a single match, the applicability of TCAM to multi-match classification is limited. In this paper, we propose a cost- and energy-efficient multi-match classification architecture that combines TCAM with a tuple space search algorithm. The proposed solution uses two small TCAM modules and requires a single-cycle TCAM lookup, two SRAM accesses, and several Bloom filter query cycles for multi-match classifications.

OFPT: OpenFlow based Parallel Transport in Datacenters

  • Liu, Bo;XU, Bo;Hu, Chao;Hu, Hui;Chen, Ming
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.10 no.10
    • /
    • pp.4787-4807
    • /
    • 2016
  • Although the dense interconnection datacenter networks (DCNs) (e.g. FatTree) provide multiple paths and high bisection bandwidth for each server pair, the single-path TCP (SPT) and ECMP which are widely used currently neither achieve high bandwidth utilization nor have good load balancing. Due to only one available transmission path, SPT cannot make full use of all available bandwidth, while ECMP's random hashing results in many collisions. In this paper, we present OFPT, an OpenFlow based Parallel Transport framework, which integrates precise routing and scheduling for better load balancing and higher network throughput. By adopting OpenFlow based centralized control mechanism, OFPT computes the optimal path and bandwidth provision for each flow according to the global network view. To guarantee high throughput, OFPT dynamically schedules flows with Seamless Flow Migration Mechanism (SFMM), which can avoid packet loss in flow rerouting. Finally, we test OFPT on Mininet and implement it in a real testbed. The experimental results show that the average network throughput in OFPT is up to 97.5% of bisection bandwidth, which is higher than ECMP by 36%. Besides, OFPT decreases the average flow completion time (AFCT) and achieves better scalability.

An Efficient Multi-Layer Encryption Framework with Authentication for EHR in Mobile Crowd Computing

  • kumar, Rethina;Ganapathy, Gopinath;Kang, GeonUk
    • International journal of advanced smart convergence
    • /
    • v.8 no.2
    • /
    • pp.204-210
    • /
    • 2019
  • Mobile Crowd Computing is one of the most efficient and effective way to collect the Electronic health records and they are very intelligent in processing them. Mobile Crowd Computing can handle, analyze and process the huge volumes of Electronic Health Records (EHR) from the high-performance Cloud Environment. Electronic Health Records are very sensitive, so they need to be secured, authenticated and processed efficiently. However, security, privacy and authentication of Electronic health records(EHR) and Patient health records(PHR) in the Mobile Crowd Computing Environment have become a critical issue that restricts many healthcare services from using Crowd Computing services .Our proposed Efficient Multi-layer Encryption Framework(MLEF) applies a set of multiple security Algorithms to provide access control over integrity, confidentiality, privacy and authentication with cost efficient to the Electronic health records(HER)and Patient health records(PHR). Our system provides the efficient way to create an environment that is capable of capturing, storing, searching, sharing, analyzing and authenticating electronic healthcare records efficiently to provide right intervention to the right patient at the right time in the Mobile Crowd Computing Environment.

Metadata Management of a SAN-Based Linux Cluster File System (SAN 기반 리눅스 클러스터 파일 시스템을 위한 메타데이터 관리)

  • Kim, Shin-Woo;Park, Sung-Eun;Lee, Yong-Kyu;Kim, Gyoung-Bae;Shin, Bum-Joo
    • The KIPS Transactions:PartA
    • /
    • v.8A no.4
    • /
    • pp.367-374
    • /
    • 2001
  • Recently, LINUX cluster file systems based on the storage area network (SAN) have been developed. In those systems, without using a central file server, multiple clients sharing the whole disk storage through Fibre Channel can freely access disk storage and act as file servers. Accordingly, they can offer advantages such as availability, load balancing, and scalability. In this paper, we describe metadata management schemes designed for a new SAN-based LINUX cluster file system. First, we present a new inode structure which is better than previous ones in disk block access time. Second, a new directory structure which uses extendible hashing is described. Third, we describe a novel scheme to manage free disk blocks, which is suitable for very large file systems. Finally, we present how we handle metadata journaling. Through performance evaluation, we show that our proposed schemes have better performance than previous ones.

  • PDF