• Title/Summary/Keyword: Parallel data processing

A Memory Intensive Real-time 3x3 Neighborhood processor for Image Processing (Memory Intensive 실시간 영상신호처리용 3 $\times$ 3 Neighborhood VLSI 처리기)

  • 김진홍;남철우;우성일;김용태
    • Journal of the Korean Institute of Telematics and Electronics / v.27 no.6 / pp.963-971 / 1990
  • This paper proposes a memory-intensive VLSI architecture for a real-time 3x3 neighborhood processor based on distributed arithmetic. The architecture is characterized by bit-serial, multi-kernel parallel processing that exploits pixel-kernel parallelism and concurrency. The chip implements 8 neighborhood processing elements in parallel, with efficient input and output modules that operate concurrently. Besides the architectural design of the neighborhood processor, a design methodology based on the module generator concept is considered, and MOGOT (MOdule Generator Oriented VLSI design Tool) has been constructed on a workstation. Using the MOGOT design environment, it is shown that the main part of the proposed architecture can be designed efficiently in 2 μm double-metal CMOS technology. The design includes an input delay and data conversion module, a look-up table for the inner product operation, a carry-save accumulator, an output data converter and delay module, and a control module.

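
The distributed-arithmetic idea in the abstract above (a look-up table for the inner product, fed bit-serially and followed by an accumulator) can be illustrated with a minimal software sketch. It assumes unsigned 8-bit pixels and integer kernel coefficients; the function names `build_lut` and `da_inner_product` are hypothetical and are not taken from the paper, and the sketch does not model the chip's carry-save accumulator or its 8 parallel processing elements.

```python
def build_lut(coeffs):
    """Precompute, for every selection mask, the sum of the selected coefficients.

    For a 3x3 kernel this yields 2**9 = 512 entries.
    """
    n = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (mask >> k) & 1)
            for mask in range(1 << n)]


def da_inner_product(pixels, lut, bits=8):
    """Bit-serial inner product of unsigned pixels with the kernel behind `lut`."""
    acc = 0
    for b in range(bits):
        # Gather bit b of every pixel into one look-up-table address.
        addr = 0
        for k, p in enumerate(pixels):
            addr |= ((p >> b) & 1) << k
        # Shift-and-accumulate: weight the table output by 2**b.
        acc += lut[addr] << b
    return acc


# Usage: a Sobel-like kernel applied to one 3x3 window (row-major order).
kernel = [-1, 0, 1, -2, 0, 2, -1, 0, 1]
window = [10, 20, 30, 40, 50, 60, 70, 80, 90]
lut = build_lut(kernel)
result = da_inner_product(window, lut)
```

No multiplier is needed: each of the 8 bit planes costs one table look-up and one shifted addition, which is why the approach trades memory for arithmetic.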

A Parallel Processing Method for Partial Nodes in R*-tree Using GPU (GPU를 활용한 R*-tree에서의 부분 노드 병렬 처리 방법)

  • Kim, Seong;Oh, Byoung-Woo
    • Spatial Information Research / v.20 no.6 / pp.139-144 / 2012
  • The R*-tree manages hierarchical nodes for efficient access to spatial data. We propose a method that keeps part of the R*-tree's nodes in GPU memory to improve efficiency through parallel processing. The proposed method attempts to load as many nodes as possible into GPU memory. New nodes are inserted to manage the remaining R*-tree nodes in main memory. The experimental results show that the proposed method is more efficient than a main-memory-based R*-tree.

Hardware Design and Implementation of a Parallel Processor for High-Performance Multimedia Processing (고성능 멀티미디어 처리용 병렬프로세서 하드웨어 설계 및 구현)

  • Kim, Yong-Min;Hwang, Chul-Hee;Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information / v.16 no.5 / pp.1-11 / 2011
  • As the use of mobile multimedia devices has increased in recent years, so has the need for high-performance multimedia processors. In this regard, we propose a SIMD (Single Instruction Multiple Data) based parallel processor that supports high-performance multimedia applications with low energy consumption. The proposed parallel processor consists of 16 processing elements (PEs) and uses a 3-stage pipeline. Experimental results indicate that the proposed parallel processor outperforms conventional parallel processors. In addition, it outperforms the commercial high-performance TI C6416 DSP in both performance (1.4-31.4x better) and energy efficiency (5.9-8.1x better) with the same 130 nm technology and 720 MHz clock frequency. The proposed parallel processor was developed in Verilog HDL and verified with an FPGA prototype system.

High Throughput Parallel KMP Algorithm Considering CPU-GPU Memory Hierarchy (CPU-GPU 메모리 계층을 고려한 고처리율 병렬 KMP 알고리즘)

  • Park, Soeun;Kim, Daehee;Lee, Myungho;Park, Neungsoo
    • The Transactions of The Korean Institute of Electrical Engineers / v.67 no.5 / pp.656-662 / 2018
  • Pattern matching algorithms are widely used in application fields such as bioinformatics and intrusion detection. Among the many string matching algorithms, the KMP (Knuth-Morris-Pratt) algorithm is commonly used because of its fast execution time on large texts. However, the processing speed of the KMP algorithm is still limited when the text size increases significantly. In this paper, we propose a high-throughput parallel KMP algorithm that considers the CPU-GPU memory hierarchy, based on OpenCL for GPGPU (General-Purpose computing on Graphics Processing Units). We focus on optimizing the allocation of work-items and work-groups, copying the pattern data and the failure table to local memory, and overlapping data transfer with the string matching operations. The experimental results show that the optimized parallel KMP algorithm is about 3.6 times faster than the non-optimized parallel KMP algorithm.
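
As a rough illustration of the data decomposition behind a parallel KMP search, the sketch below builds the failure table once and then searches fixed-size chunks of the text independently. It is a sequential stand-in, not the paper's OpenCL kernel: the names `failure_table`, `kmp_search`, and `chunked_search` and the `chunk_size` parameter are assumptions made for illustration, and a non-empty pattern is assumed. Each chunk is extended by `len(pattern) - 1` characters so that matches straddling a chunk boundary are found exactly once.

```python
def failure_table(pattern):
    """fail[i] = length of the longest proper prefix of pattern[:i+1] that is also its suffix."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text, pattern, fail, base=0):
    """Return the start offsets (shifted by `base`) of all matches of pattern in text."""
    hits, k = [], 0
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(base + i - k + 1)
            k = fail[k - 1]  # keep going to find overlapping matches
    return hits


def chunked_search(text, pattern, chunk_size):
    """Search independent chunks; each loop iteration could run as a separate worker."""
    fail = failure_table(pattern)      # shared, read-only table
    overlap = len(pattern) - 1         # lets a chunk finish a match it started
    hits = []
    for start in range(0, len(text), chunk_size):
        piece = text[start:start + chunk_size + overlap]
        hits.extend(kmp_search(piece, pattern, fail, base=start))
    return hits


matches = chunked_search("abababcabab", "abab", chunk_size=3)
```

Because the pattern and the failure table are small and read by every worker, they are natural candidates for the fast local memory mentioned in the abstract, while the text chunks stream in from global memory.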

Efficient Quantitative Association Rules with Parallel Processing (병렬처리를 이용한 효율적인 수량 연관규칙)

  • Lee, Hye-Jung;Hong, Min;Park, Doo-Soon
    • Journal of Korea Multimedia Society / v.10 no.8 / pp.945-957 / 2007
  • Quantitative association rules apply binary association to data with relatively strong quantitative attributes in a large database system. When the domain range of quantitative data that carries significant meaning for the association is too broad, the domain must be divided into suitable intervals that satisfy the minimum support in order to generate large interval items. The reliability of the resulting rules is strongly influenced by how the large interval items are generated. Therefore, this paper proposes a new method to generate large interval items efficiently. Compared with existing methods, the proposed method does not lose any meaningful intervals, provides accurate large interval items that are close to the minimum support, and minimizes the loss of data characteristics. In addition, since the method merges data where the frequency of the data is high enough, it runs faster than other methods on broad quantitative domains. To verify the proposed method, real national census data are used for the performance analysis and a Clunix HPC system is used for the parallel processing.


An Iterative Algorithm for the Bottom Up Computation of the Data Cube using MapReduce (맵리듀스를 이용한 데이터 큐브의 상향식 계산을 위한 반복적 알고리즘)

  • Lee, Suan;Jo, Sunhwa;Kim, Jinho
    • Journal of Information Technology and Architecture / v.9 no.4 / pp.455-464 / 2012
  • Due to the recent data explosion, methods that can meet the requirements of large-scale data analysis have been studied. This paper proposes the MRIterativeBUC algorithm, which enables efficient computation of large data cubes through distributed parallel processing with the MapReduce framework. MRIterativeBUC is designed for efficient iterative operation of the BUC method on MapReduce, and overcomes the limitations on storage size and processing capacity caused by large data cube computation. It adopts the idea of the iceberg cube, which computes only the aspects of interest to analysts, and distributes the cube computation in parallel by partitioning and sorting. It thereby reduces the amount of emitted data, which in turn reduces the network load, the processing load on each node, and ultimately the cube computation cost. The bottom-up cube computation and the iterative algorithm using MapReduce proposed in this paper can be extended in various ways and applied to many applications.
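
For readers unfamiliar with bottom-up iceberg cube computation, the following single-machine sketch shows the basic BUC recursion with a count measure: partition the rows on one dimension, prune partitions below the minimum support, and recurse only into the survivors. It is not the MRIterativeBUC algorithm itself and does not use MapReduce; the function name `buc`, the `min_sup` parameter, and the sample data are assumptions made for illustration.

```python
def buc(rows, dims, min_sup, prefix=(), start=0, out=None):
    """Return {group-by cell: row count} for every cell with at least min_sup rows.

    rows: list of tuples, one value per dimension.
    dims: dimension names, aligned with the tuple positions.
    A cell is a tuple of (dimension, value) pairs; () is the grand total.
    """
    if out is None:
        out = {}
    out[prefix] = len(rows)
    for d in range(start, len(dims)):
        # Partition the current rows on dimension d.
        groups = {}
        for r in rows:
            groups.setdefault(r[d], []).append(r)
        for value, part in groups.items():
            # Iceberg pruning: a partition below min_sup cannot contain
            # any qualifying finer cell, so it is never expanded.
            if len(part) >= min_sup:
                buc(part, dims, min_sup, prefix + ((dims[d], value),), d + 1, out)
    return out


sample = [("a", "x"), ("a", "y"), ("b", "x"), ("a", "x")]
cube = buc(sample, ("D1", "D2"), min_sup=2)
```

The pruning step is what limits the amount of intermediate data; in a distributed setting, fewer surviving partitions means less data emitted between nodes.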

Design and Implementation of a Latency Efficient Encoder for LTE Systems

  • Hwang, Soo-Yun;Kim, Dae-Ho;Jhang, Kyoung-Son
    • ETRI Journal / v.32 no.4 / pp.493-502 / 2010
  • The operation time of an encoder is one of the critical implementation issues for satisfying the timing requirements of Long Term Evolution (LTE) systems because the encoder is based on binary operations. In this paper, we propose a design and implementation of a latency-efficient encoder for LTE systems. By virtue of 8-bit parallel processing of the cyclic redundancy check attachment, code block (CB) segmentation, and a parallel processor, we are able to construct engines for turbo coding and rate matching of each CB in a parallel fashion. Experimental results show that although the total area and clock period of the proposed scheme are 19% and 6% larger, respectively, than those of a conventional serial scheme, our parallel structure decreases the latency by about 32% to 65% compared with a serial structure. In particular, our approach is more latency efficient when the encoder processes a number of CBs. In addition, we apply the proposed scheme to a real LTE-based system, so that the timing requirement for ACK/NACK transmission is met by employing the encoder based on the parallel structure.
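
The benefit of processing the CRC 8 bits at a time can be seen in a small software analogue: a table-driven update consumes one byte per step and gives the same remainder as the bit-serial shift register. The sketch assumes the 24-bit generator polynomial 0x864CFB (the CRC24A polynomial used for LTE transport blocks) with a zero initial state; the abstract does not state which polynomial the design uses, and the function names are hypothetical.

```python
POLY = 0x864CFB   # assumed 24-bit generator (x^24 term implicit)
MASK = 0xFFFFFF   # keep the register at 24 bits


def crc24_bitwise(data):
    """Reference bit-serial CRC: one shift per input bit, MSB first."""
    crc = 0
    for byte in data:
        for i in range(7, -1, -1):
            bit = (byte >> i) & 1
            top = (crc >> 23) & 1
            crc = (crc << 1) & MASK
            if top ^ bit:
                crc ^= POLY
    return crc


def make_table():
    """Precompute the effect of 8 consecutive shifts for every possible byte."""
    table = []
    for byte in range(256):
        crc = byte << 16
        for _ in range(8):
            crc = ((crc << 1) ^ POLY) if crc & 0x800000 else (crc << 1)
            crc &= MASK
        table.append(crc)
    return table


TABLE = make_table()


def crc24_bytewise(data):
    """8-bit parallel CRC: one table look-up per input byte."""
    crc = 0
    for byte in data:
        idx = ((crc >> 16) ^ byte) & 0xFF
        crc = ((crc << 8) & MASK) ^ TABLE[idx]
    return crc


payload = bytes(range(64))
same = crc24_bitwise(payload) == crc24_bytewise(payload)
```

In hardware the table is typically replaced by XOR logic derived from the same 8-step recurrence, but the step count per byte drops from eight to one in both cases.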

Development of Parallel Signal Processing Algorithm for FMCW LiDAR based on FPGA (FPGA 고속병렬처리 구조의 FMCW LiDAR 신호처리 알고리즘 개발)

  • Jong-Heon Lee;Ji-Eun Choi;Jong-Pil La
    • The Journal of the Korea institute of electronic communication sciences / v.19 no.2 / pp.335-343 / 2024
  • This paper describes real-time target signal processing techniques for FMCW LiDAR. FMCW LiDAR is gaining attention as the next-generation LiDAR for self-driving cars because of its robust detection even in adverse environmental conditions such as rain, snow, and fog, in addition to its long-range measurement capability. The paper describes the hardware architecture required for high-speed data acquisition, data transfer, and parallel frequency-domain signal processing. The Fourier transform of the acquired time-domain signal is implemented on an FPGA in real time. The paper also details the CFAR (constant false alarm rate) algorithm used to ensure robust target detection from the transformed target spectrum, and describes how the frequency measurement resolution of the target spectrum is enhanced and how the measured frequencies are converted into range and velocity data. A 3D image is generated and displayed using the 2D scanner position and the target distance data. The real-time target signal processing and high-resolution image acquisition capability of FMCW LiDAR using the proposed FPGA-based parallel signal processing algorithms are verified.
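
The detection step can be illustrated with a one-dimensional cell-averaging CFAR, one common CFAR variant; the abstract does not say which variant the authors implement. In this sketch the noise level around each spectrum bin is estimated from nearby training cells, skipping a few guard cells next to the bin under test, and a detection is declared when the bin exceeds a scaled noise estimate. The function name `ca_cfar` and the `guard`, `train`, and `scale` values are assumptions chosen for illustration.

```python
def ca_cfar(power, guard=2, train=8, scale=4.0):
    """Return indices of spectrum bins that exceed an adaptive threshold.

    power: sequence of non-negative power values, one per frequency bin.
    guard: cells on each side of the bin under test that are ignored.
    train: cells on each side (beyond the guard cells) used to estimate noise.
    scale: threshold multiplier applied to the mean training-cell power.
    """
    hits = []
    n = len(power)
    for i in range(n):
        lo = max(0, i - guard - train)
        hi = min(n, i + guard + train + 1)
        # Training cells: inside the window but outside the guard region.
        cells = [power[j] for j in range(lo, hi) if abs(j - i) > guard]
        if not cells:
            continue
        noise = sum(cells) / len(cells)
        if power[i] > scale * noise:
            hits.append(i)
    return hits


# Usage: a flat noise floor with one strong beat-frequency peak at bin 20.
spectrum = [1.0] * 64
spectrum[20] = 50.0
detections = ca_cfar(spectrum)
```

Each bin's threshold depends only on a fixed local window, so the per-bin computations are independent and map naturally onto parallel hardware.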

Frequency-Code Domain Contention in Multi-antenna Multicarrier Wireless Networks

  • Lv, Shaohe;Zhang, Yiwei;Li, Wen;Lu, Yong;Dong, Xuan;Wang, Xiaodong;Zhou, Xingming
    • Journal of Communications and Networks / v.18 no.2 / pp.218-226 / 2016
  • Coordination among users is an inevitable but time-consuming operation in wireless networks. It severely limits system performance when the data rate is high. We present FC-MAC, a novel MAC protocol that can complete a contention within one contention slot over a joint frequency-code domain. When a node takes part in the contention, it randomly generates a contention vector (CV), which is a binary sequence whose length equals the number of available orthogonal frequency division multiplexing (OFDM) subcarriers. In FC-MAC, each user is assigned a distinct signature (i.e., a PN sequence). A node sends its signature on specific subcarriers and uses the sequence of ON/OFF states of all subcarriers to indicate the chosen CV. Meanwhile, every node uses its redundant antennas to detect the CVs of other nodes. The node with the minimum CV becomes the winner. The experimental results show that the collision probability of FC-MAC is as low as 0.05% when the network has 100 nodes. In comparison with IEEE 802.11, contention time is reduced by 50-80% and the throughput gain is up to 200%.
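
The winner-selection rule can be mimicked in a few lines. The sketch below models only the abstract's decision logic (each node draws a random binary contention vector and the minimum wins) and none of the physical layer, signatures, or antenna processing. The function name `contend`, the choice of 52 subcarriers, and the treatment of a tie on the minimum CV as a collision are assumptions made for illustration, and the sketch makes no claim to reproduce the collision probability reported in the paper.

```python
import random


def contend(num_nodes, num_subcarriers, rng):
    """Simulate one contention slot.

    Each node draws a random contention vector (CV) with one bit per
    subcarrier, represented here as an integer. Returns the indices of
    the nodes holding the minimum CV, plus all drawn CVs.
    """
    cvs = [rng.getrandbits(num_subcarriers) for _ in range(num_nodes)]
    best = min(cvs)
    winners = [i for i, cv in enumerate(cvs) if cv == best]
    return winners, cvs


rng = random.Random(0)            # fixed seed for a repeatable run
winners, cvs = contend(100, 52, rng)
collided = len(winners) > 1       # assumed: a shared minimum means a collision
```

Because every node can observe all CVs within the same slot, the comparison needs no further rounds of signaling, which is the source of the single-slot contention described above.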