• Title/Summary/Keyword: Parallel data processing

A Novel Reconfigurable Processor Using Dynamically Partitioned SIMD for Multimedia Applications

  • Lyuh, Chun-Gi;Suk, Jung-Hee;Chun, Ik-Jae;Roh, Tae-Moon
    • ETRI Journal / v.31 no.6 / pp.709-716 / 2009
  • In this paper, we propose a novel reconfigurable processor using dynamically partitioned single-instruction multiple-data (DP-SIMD), which is able to process multimedia data. The SIMD processor and the parallel SIMD (P-SIMD) processor, which is composed of a number of SIMD processors, are commonly used today, but they are inefficient because all processing units (PUs) must perform the same operation at all times. Moreover, the PUs can perform different operations only when every SIMD group operation is predefined. We propose a processor control method that dynamically partitions parallel processors into multiple SIMD-based processors to enhance efficiency. To evaluate the proposed method, we carried out the inverse transform, inverse quantization, and motion compensation operations of H.264 using processors based on SIMD, P-SIMD, and DP-SIMD. Experimental results show that the DP-SIMD control method is more efficient than the SIMD and P-SIMD control methods by about 15% and 14%, respectively.

A Study on Parallel Processing by Multi-Microprocessors (마이크로프로세서복합에 의한 병렬처리에 관한 연구)

  • Chung, Yon-Tack;Song, Young-Jae
    • Journal of the Korean Institute of Telematics and Electronics / v.17 no.5 / pp.36-42 / 1980
  • In this study, a multi-microprocessor system, in which slave microprocessors are connected to the master microprocessor bus through a DMA controller, is designed using four 8085 CPUs. A high degree of processing efficiency could be obtained by having this system perform parallel processing. Measurements of the relation between the working microprocessors and system throughput were 70-80 percent lower than the ideal value. The master microprocessor takes charge of resource allocation and scheduling, while the common memory handles communication between microprocessors and storage of common data. A method of detecting parallelism in a sequentially written source program is also suggested.

An FPGA Implementation of Parallel Hardware Architecture for the Real-time Window-based Image Processing (실시간 윈도우 기반 영상 처리를 위한 병렬 하드웨어 구조의 FPGA 구현)

  • Jin S.H.;Cho J.U.;Kwon K.H.;Jeon J.W.
    • The KIPS Transactions:PartB / v.13B no.3 s.106 / pp.223-230 / 2006
  • Window-based image processing is an elementary part of the image processing field. Because it is both computationally intensive and data intensive, it is hard to perform all of its operations in real time with a software program on a general-purpose computer. This paper proposes a parallel hardware architecture that can perform window-based image processing in real time using an FPGA (Field Programmable Gate Array). A dynamic threshold circuit and a local histogram equalization circuit of the proposed architecture are designed in VHDL (VHSIC Hardware Description Language) and implemented on an FPGA. The performance of both implementations is measured.
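
To make the term "window-based" concrete, here is a minimal software sketch of one such operation: a dynamic (local) threshold that compares each pixel with the mean of its surrounding window. It is not the paper's circuit. The function name `local_mean_threshold`, the `radius` parameter, and the choice of the window mean as the threshold are assumptions made for illustration.

```python
def local_mean_threshold(image, radius=1):
    """Binarize each pixel against the mean of its (2*radius+1)^2 window.

    `image` is a list of rows of grey levels. Hypothetical helper for
    illustration only; the thresholding rule is an assumption.
    """
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Clamp the window to the image border.
            y0, y1 = max(0, y - radius), min(height, y + radius + 1)
            x0, x1 = max(0, x - radius), min(width, x + radius + 1)
            window = [image[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            mean = sum(window) / len(window)
            # A pixel is set only if it is brighter than its neighbourhood mean.
            out[y][x] = 1 if image[y][x] > mean else 0
    return out


image = [
    [10, 10, 10],
    [10, 200, 10],
    [10, 10, 10],
]
result = local_mean_threshold(image)
```

Each output pixel depends only on the input pixels inside its own window, so the per-pixel computations are independent of one another. That independence is what makes this class of operation a natural fit for parallel hardware.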

A Study of How to Improve Execution Speed of Grabcut Using GPGPU (GPGPU를 이용한 Grabcut의 수행 속도 개선 방법에 관한 연구)

  • Kim, Ji-Hoon;Park, Young-Soo;Lee, Sang-Hun
    • Journal of Digital Convergence / v.12 no.11 / pp.379-386 / 2014
  • In this paper, we propose a method that uses the GPU (Graphics Processing Unit) to efficiently improve the processing speed of the Grabcut algorithm. Grabcut is an algorithm with excellent object detection performance. The existing Grabcut algorithm splits the image into a foreground area and a background area, and then assigns the background and foreground to K clusters. This assignment is repeated to improve the result gradually. A drawback of the Grabcut algorithm, however, is the time consumed by the repeated clustering. The proposed method therefore uses GPGPU (General-Purpose computing on Graphics Processing Units) to process the repeated operations in parallel and so improve the processing speed of Grabcut. The proposed method reduced the execution time of the algorithm by about 95.58% on average.

Hadoop System Design for Big data Processing of RFID Distribution (RFID/NFC 물류의 빅 데이터 처리를 위한 하둡 시스템의 설계)

  • Kim, Nam-Ho;Noh, Jin-Heon;Jeong, Hee-Ja
    • Smart Media Journal / v.2 no.3 / pp.47-53 / 2013
  • Recently, with the convergence of IT in logistics systems, RFID/NFC technology is being used as a typical application, and the flow of distribution generates a large amount of big data. A Hadoop distributed system can collect the data items produced and, through its parallel processing capabilities, create logistics information and records for logistics information management. We designed a Hadoop system to support this and developed a prototype to explore the possibility of its utilization.

SQL Data Transport Technique for Efficient Hybrid Data Processing on Distributed and Parallel Environment (분산 병렬 환경에서 효율적인 이종 데이터 처리를 위한 SQL 데이터 전송 기법)

  • Yang, HyeonSik;Baek, Naeun;Sung, Mirae;Chang, Jae-woo
    • Annual Conference of KIPS / 2015.10a / pp.1102-1105 / 2015
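  • Since the growth of the Internet accelerated and SNS became widespread, data traffic incomparably larger than in the past has been generated. Because existing DBMSs could not handle it effectively, NoSQL systems such as Hadoop emerged, and recent research has pursued flexible and powerful data management through the cooperation of NoSQL and existing SQL DBMSs. Representative work on efficient query processing includes SQL-based distributed parallel query processing techniques and Hive. However, existing techniques do not consider the distributed parallel environment and therefore cannot transfer the query results of an SQL DBMS to Hive efficiently. In this paper, we propose a technique that minimizes network cost for efficient SQL data movement from an SQL DBMS to Hive, and we present the superiority of the proposed technique.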

Parallel LDPC Decoder for CMMB on CPU and GPU Using OpenCL (OpenCL을 활용한 CPU와 GPU 에서의 CMMB LDPC 복호기 병렬화)

  • Park, Joo-Yul;Hong, Jung-Hyun;Chung, Ki-Seok
    • IEMEK Journal of Embedded Systems and Applications / v.11 no.6 / pp.325-334 / 2016
  • Recently, Open Computing Language (OpenCL) has been proposed to provide a framework that supports heterogeneous computing platforms. By using an OpenCL framework, digital communication systems can support various protocols in a unified computing environment to achieve both high portability and high performance. This article introduces a parallel software decoder of Low Density Parity Check (LDPC) codes for China Multimedia Mobile Broadcasting (CMMB) on a heterogeneous platform. Each step of LDPC decoding has different parallelization characteristics. In this paper, steps suitable for task-level parallelization are executed on the CPU, and steps suitable for data-level parallelization are processed by the GPU. To improve the performance of the proposed OpenCL kernels for LDPC decoding operations, explicit thread scheduling, loop-unrolling, and effective data transfer techniques are applied. The proposed LDPC decoder achieves high performance by using heterogeneous multi-core processors on a unified computing framework.

Design of a 20 Gb/s CMOS Demultiplexer Using Redundant Multi-Valued Logic (중복 다치논리를 이용한 20 Gb/s CMOS 디멀티플렉서 설계)

  • Kim, Jeong-Beom
    • The KIPS Transactions:PartA / v.15A no.3 / pp.135-140 / 2008
  • This paper describes a high-speed CMOS demultiplexer using redundant multi-valued logic (RMVL). The proposed circuit receives serial binary data and converts it to parallel redundant multi-valued data using RMVL. The converted data are then reconverted to parallel binary data. Through the redundant multi-valued data conversion, RMVL makes it possible to achieve higher operating speeds than conventional binary logic. The implemented demultiplexer consists of eight integrators. Each integrator is composed of an accumulator, a window comparator, a decoder, and a D flip-flop. The demultiplexer is designed in a TSMC 0.18 μm standard CMOS process. Its validity and effectiveness are verified through HSPICE simulation. The demultiplexer achieves a maximum data rate of 20 Gb/s with an average power consumption of 95.85 mW.
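
The serial-to-parallel step described above can be illustrated with a small behavioural sketch. It models only plain binary demultiplexing, not the RMVL conversion. The function name `demux`, the round-robin bit ordering, and the assumption of eight output lanes (suggested by, but not stated as following from, the eight integrators) are all illustrative choices.

```python
def demux(serial_bits, lanes=8):
    """Distribute a serial bit stream round-robin over `lanes` parallel outputs.

    Behavioural model only: bit k of the stream goes to lane k % lanes.
    The lane count and ordering are assumptions, not taken from the paper.
    """
    if len(serial_bits) % lanes != 0:
        raise ValueError("stream length must be a multiple of the lane count")
    return [serial_bits[i::lanes] for i in range(lanes)]


stream = [1, 0, 1, 1, 0, 0, 1, 0,
          0, 1, 1, 0, 1, 1, 0, 1]
parallel = demux(stream)

# With 8 lanes, each lane carries 1/8 of the serial rate.
serial_rate_gbps = 20.0
lane_rate_gbps = serial_rate_gbps / 8
```

Under the eight-lane assumption, a 20 Gb/s serial input corresponds to 2.5 Gb/s on each parallel output, which is why demultiplexing relaxes the speed required of the downstream logic.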

Design of Trajectory Data Indexing and Query Processing for Real-Time LBS in MapReduce Environments (MapReduce 환경에서의 실시간 LBS를 위한 이동궤적 데이터 색인 및 검색 시스템 설계)

  • Chung, Jaehwa
    • Journal of Digital Contents Society / v.14 no.3 / pp.313-321 / 2013
  • Recently, the proliferation of mobile smart devices has led to the big-data era, and the importance of location-based services (LBS) is increasing due to the exponential growth of trajectory-related data. Processing trajectory data requires parallel processing platforms such as cloud computing and MapReduce. Research based on MapReduce is in progress, but because MapReduce relies on batch processing and a simple key-value structure, applying the MapReduce framework to real-time LBS is difficult. Therefore, based on a detailed analysis of the properties of MapReduce, we propose a system design with efficient indexing and search techniques suitable for real-time services.

Parallel Range Query Processing with R-tree on Multi-GPUs (다중 GPU를 이용한 R-tree의 병렬 범위 질의 처리 기법)

  • Ryu, Hongsu;Kim, Mincheol;Choi, Wonik
    • Journal of KIISE / v.42 no.4 / pp.522-529 / 2015
  • Ever since the R-tree was proposed to index multi-dimensional data, many efforts have been made to improve its query performance. One common approach is to parallelize query processing using multi-core architectures. To this end, a GPU-based R-tree has recently been proposed. However, even though a GPU-based R-tree can improve query performance, it is limited in its ability to handle large volumes of data because GPUs have limited physical memory. To address this problem, we propose the MGR-tree (Multi-GPU R-tree), which can manage large volumes of data by distributing nodes across multiple GPUs. Our experiments show that the MGR-tree is up to 9.1 times faster than a sequential search on a GPU and up to 1.6 times faster than a conventional GPU-based R-tree.
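
The general idea of splitting data across several devices and merging the per-device answers can be sketched on the CPU as follows. This is not the MGR-tree: each partition is searched with a plain linear scan standing in for a per-GPU index, and worker threads stand in for GPUs. The names `intersects`, `partition`, and `range_query`, and the round-robin partitioning, are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def intersects(a, b):
    """True if axis-aligned rectangles (xmin, ymin, xmax, ymax) overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def partition(rects, parts):
    """Split the data round-robin into `parts` partitions (one per device)."""
    return [rects[i::parts] for i in range(parts)]


def range_query(partitions, query):
    """Search every partition independently, then merge the partial results."""
    def scan(part):
        # Linear scan stands in for a per-device index search.
        return [r for r in part if intersects(r, query)]

    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partial_hits = list(pool.map(scan, partitions))
    return sorted(r for hits in partial_hits for r in hits)


rects = [(0, 0, 1, 1), (2, 2, 3, 3), (5, 5, 6, 6), (0.5, 0.5, 2.5, 2.5)]
query = (0, 0, 2, 2)
hits = range_query(partition(rects, 2), query)
```

Because every partition holds a disjoint subset of the data, the merged result is the same regardless of how many partitions are used. Only the amount of data each device must hold, and the work it must do, changes.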