• Title/Summary/Keyword: Parallel Processing Architecture

Search Result 396, Processing Time 0.026 seconds

Design of Lightweight Parallel BCH Decoder for Sensor Network (센서네트워크 활용을 위한 경량 병렬 BCH 디코더 설계)

  • Choi, Won-Jung;Lee, Je-Hoon
    • Journal of Sensor Science and Technology
    • /
    • v.24 no.3
    • /
    • pp.188-193
    • /
    • 2015
  • This paper presents a new byte-wise BCH (4122, 4096, 2) decoder, which treats byte-wise parallel operations so as to enhance its throughput. In particular, we evaluate the parallel processing technique for the most time-consuming components such as syndrome generator and Chien search owing to the iterative operations. Even though a syndrome generator is based on the conventional LFSR architecture, it allows eight consecutive bit inputs in parallel and it treats them in a cycle. Thus, it can reduce the number of cycles that are needed. In addition, a Chien search eliminates the redundant operations to reduce the hardware complexity. The proposed BCH decoder is implemented with VHDL and it is verified using a Xilinx FPGA. From the simulation results, the proposed BCH decoder can enhance the throughput as 43% and it can reduce the hardware complexity as 67% compared to its counterpart employing parallel processing architecture.

A Functional Design of Programmable Logic Controller Based on Parallel Architecture (병렬 구조에 의한 가변 논리제어장치의 기능적 설계)

  • 이정훈;신현식
    • The Transactions of the Korean Institute of Electrical Engineers
    • /
    • v.40 no.8
    • /
    • pp.836-844
    • /
    • 1991
  • PLC(programmable logic controller) system is widely used for the control of factory. PLC system receives ladder diagram which is drawn by the user to implement hardware logic, converts the ladder diagram into sequence program which is executable in the PLC system, and executes the sequence program indefinitely unless user breaks. The sequence program processes the data of on/off signal, and endures 1 scan delay and missing of pulse-type signal shorter than a scan time. So, data dependency doesn't exist. By applying theis characteristics to multiprocessor architecture, we design parellel PLC functionally and evaluate performance upgrade. Parallel PLC consists of central processing module, N general processing unit, and a shared memory by master-slave type. Each module executes allocated sequence program by the control of central processing module. We can expect performance upgrade by parallel processing, and reliability by relocation of sequence program when error occurs in processing module.

  • PDF

Parallel Processing of the Fuzzy Fingerprint Vault based on Geometric Hashing

  • Chae, Seung-Hoon;Lim, Sung-Jin;Bae, Sang-Hyun;Chung, Yong-Wha;Pan, Sung-Bum
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.4 no.6
    • /
    • pp.1294-1310
    • /
    • 2010
  • User authentication using fingerprint information provides convenience as well as strong security. However, serious problems may occur if fingerprint information stored for user authentication is used illegally by a different person since it cannot be changed freely as a password due to a limited number of fingers. Recently, research in fuzzy fingerprint vault system has been carried out actively to safely protect fingerprint information in a fingerprint authentication system. In addition, research to solve the fingerprint alignment problem by applying a geometric hashing technique has also been carried out. In this paper, we propose the hardware architecture for a geometric hashing based fuzzy fingerprint vault system that consists of the software module and hardware module. The hardware module performs the matching for the transformed minutiae in the enrollment hash table and verification hash table. On the other hand, the software module is responsible for hardware feature extraction. We also propose the hardware architecture which parallel processing technique is applied for high speed processing. Based on the experimental results, we confirmed that execution time for the proposed hardware architecture was 0.24 second when number of real minutiae was 36 and number of chaff minutiae was 200, whereas that of the software solution was 1.13 second. For the same condition, execution time of the hardware architecture which parallel processing technique was applied was 0.01 second. Note that the proposed hardware architecture can achieve a speed-up of close to 100 times compared to a software based solution.

A study on the Cost-effective Architecture Design of High-speed Soft-decision Viterbi Decoder for Multi-band OFDM Systems (Multi-band OFDM 시스템용 고속 연판정 비터비 디코더의 효율적인 하드웨어 구조 설계에 관한 연구)

  • Lee, Seong-Joo
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.43 no.11 s.353
    • /
    • pp.90-97
    • /
    • 2006
  • In this paper, we present a cost-effective architecture of high-speed soft-decision Viterbi decoder for Multi-band OFDM(MB-OFDM) systems. In the design of modem for MB-OFDM systems, a parallel processing architecture is general]y used for the reliable hardware implementation, because the systems should support a very high-speed data rate of at most 480Mbps. A Viterbi decoder also should be designed by using a parallel processing structure and support a very high-speed data rate. Therefore, we present a optimized hardware architecture for 4-way parallel processing Viterbi decoder in this paper. In order to optimize the hardware of Viterbi decoder, we compare and analyze various ACS architectures and find the optimal one among them with respect to hardware complexity and operating frequency The Viterbi decoder with a optimal hardware architecture is designed and verified by using Verilog HDL, and synthesized into gate-level circuits with TSMC 0.13um library. In the synthesis results, we find that the Viterbi decoder contains about 280K gates and works properly at the speed required in MB-OFDM systems.

Architecture design for speeding up Multi-Access Memory System(MAMS) (Multi-Access Memory System(MAMS)의 속도 향상을 위한 아키텍처 설계)

  • Ko, Kyung-sik;Kim, Jae Hee;Lee, S-Ra-El;Park, Jong Won
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.54 no.6
    • /
    • pp.55-64
    • /
    • 2017
  • High-capacity, high-definition image applications need to process considerable amounts of data at high speed. Accordingly, users of these applications demand a high-speed parallel execution system. To increase the speed of a parallel execution system, Park (2004) proposed a technique, called MAMS (Multi-Access Memory System), to access data in several execution units without the conflict of parallel processing memories. Since then, many studies on MAMS have been conducted, furthering the technique to MAMS-PP16 and MAMS-PP64, among others. As a memory architecture for parallel processing, MAMS must be constructed in one chip; therefore, a method to achieve the identical functionality as the existing MAMS while minimizing the architecture needs to be studied. This study proposes a method of miniaturizing the MAMS architecture in which the architectures of the ACR (Address Calculation and Routing) circuit and MMS (Memory Module Selection) circuit, which deliver data in memories to parallel execution units (PEs), do not use the MMS circuit, but are constructed as one shift and conditional statements whose number is the same as that of memory modules inside the ACR circuit. To verify the performance of the realized architecture, the study conducted the processing time of the proposed MAMS-PP64 through an image correlation test, the results of which demonstrated that the ratio of the image correlation from the proposed architecture was improved by 1.05 on average.

Parallel Evaluation of Linearly Recursive Rules using a Shared-Nothing Paralled Architecture (비공유 병렬구조를 이용한 선형적 재귀규칙의 병렬평가)

  • Cho, Woo-Hyun;Kim, Hang-Joon
    • The Transactions of the Korea Information Processing Society
    • /
    • v.4 no.12
    • /
    • pp.3069-3077
    • /
    • 1997
  • This paper is concerned with a new paradigm for parallel evaluation of linear recursion rules which contain transitive dependency in a shared-nothing parallel architecture. For parallel evaluation of rules, we consider a shared-nothing parallel architecture that consists of a set of nodes and a message passing network to these nodes. An evaluation of normalized rules is a computation of the proof theoretic meaning of a collection of rules. We shall here define normalized recursion rules which contain transitive dependency, present an equivalent expression for the rule, propose a paradigm for Parallel evaluation of normalized rule based on the equivalent expression using join, partition, and transitive closure operations, and analyze response-time complexity.

  • PDF

Parallel Implementation of the Recursive Least Square for Hyperspectral Image Compression on GPUs

  • Li, Changguo
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.11 no.7
    • /
    • pp.3543-3557
    • /
    • 2017
  • Compression is a very important technique for remotely sensed hyperspectral images. The lossless compression based on the recursive least square (RLS), which eliminates hyperspectral images' redundancy using both spatial and spectral correlations, is an extremely powerful tool for this purpose, but the relatively high computational complexity limits its application to time-critical scenarios. In order to improve the computational efficiency of the algorithm, we optimize its serial version and develop a new parallel implementation on graphics processing units (GPUs). Namely, an optimized recursive least square based on optimal number of prediction bands is introduced firstly. Then we use this approach as a case study to illustrate the advantages and potential challenges of applying GPU parallel optimization principles to the considered problem. The proposed parallel method properly exploits the low-level architecture of GPUs and has been carried out using the compute unified device architecture (CUDA). The GPU parallel implementation is compared with the serial implementation on CPU. Experimental results indicate remarkable acceleration factors and real-time performance, while retaining exactly the same bit rate with regard to the serial version of the compressor.

Implementation of Pixel Subword Parallel Processing Instructions for Embedded Parallel Processors (임베디드 병렬 프로세서를 위한 픽셀 서브워드 병렬처리 명령어 구현)

  • Jung, Yong-Bum;Kim, Jong-Myon
    • The KIPS Transactions:PartA
    • /
    • v.18A no.3
    • /
    • pp.99-108
    • /
    • 2011
  • Processor technology is currently continued to parallel processing techniques, not by only increasing clock frequency of a single processor due to the high technology cost and power consumption. In this paper, a SIMD (Single Instruction Multiple Data) based parallel processor is introduced that efficiently processes massive data inherent in multimedia. In addition, this paper proposes pixel subword parallel processing instructions for the SIMD parallel processor architecture that efficiently operate on the image and video pixels. The proposed pixel subword parallel processing instructions store and process four 8-bit pixels on the partitioned four 12-bit registers in a 48-bit datapath architecture. This solves the overflow problem inherent in existing multimedia extensions and reduces the use of many packing/unpacking instructions. Experimental results using the same SIMD-based parallel processor architecture indicate that the proposed pixel subword parallel processing instructions achieve a speedup of $2.3{\times}$ over the baseline SIMD array performance. This is in contrast to MMX-type instructions (a representative Intel multimedia extension), which achieve a speedup of only $1.4{\times}$ over the same baseline SIMD array performance. In addition, the proposed instructions achieve $2.5{\times}$ better energy efficiency than the baseline program, while MMX-type instructions achieve only $1.8{\times}$ better energy efficiency than the baseline program.

Novel Parallel Approach for SIFT Algorithm Implementation

  • Le, Tran Su;Lee, Jong-Soo
    • Journal of information and communication convergence engineering
    • /
    • v.11 no.4
    • /
    • pp.298-306
    • /
    • 2013
  • The scale invariant feature transform (SIFT) is an effective algorithm used in object recognition, panorama stitching, and image matching. However, due to its complexity, real-time processing is difficult to achieve with current software approaches. The increasing availability of parallel computers makes parallelizing these tasks an attractive approach. This paper proposes a novel parallel approach for SIFT algorithm implementation using a block filtering technique in a Gaussian convolution process on the SIMD Pixel Processor. This implementation fully exposes the available parallelism of the SIFT algorithm process and exploits the processing and input/output capabilities of the processor, which results in a system that can perform real-time image and video compression. We apply this implementation to images and measure the effectiveness of such an approach. Experimental simulation results indicate that the proposed method is capable of real-time applications, and the result of our parallel approach is outstanding in terms of the processing performance.

Implementation of Reed-Solomon Decoder Using the efficient Modified Euclid Module (효율적 구조의 수정 유클리드 구조를 이용한 Reed-Solomon 복호기의 설계)

  • Kim, Dong-Sun;Chung, Duck-Jin
    • Proceedings of the KIEE Conference
    • /
    • 1998.11b
    • /
    • pp.575-578
    • /
    • 1998
  • In this paper, we propose a VLSI architecture of Reed-Solomon decoder. Our goal is the development of an architecture featuring parallel and pipelined processing to improve the speed and low power design. To achieve the this goal, we analyze the RS decoding algorithm to be used parallel and pipelined processing efficiently, and modified the Euclid's algorithm arithmetic part to apply the parallel structure in RS decoder. The overall RS decoder are compared to Shao's, and we show the 10% area efficiency than Shao's time domain decoder and three times faster, in addition, we approve the proposed RS decoders with Altera FPGA Flex 10K-50, and Implemeted with LG 0.6{\mu}$ processing.

  • PDF