• Title/Summary/Keyword: Parallel data processing

Search Result 751, Processing Time 0.032 seconds

Implementation of a 3D Graphics Hardwired T&L Accelerator based on a SoC Platform for a Mobile System (SoC 플랫폼 기반 모바일용 3차원 그래픽 Hardwired T&L Accelerator 구현)

  • Lee, Kwang-Yeob;Koo, Yong-Seo
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.44 no.9
    • /
    • pp.59-70
    • /
    • 2007
  • In this paper, we proposed an effective T&L(Transform & Lighting) Processor architecture for a real time 3D graphics acceleration SoC(System on a Chip) in a mobile system. We designed Floating point arithmetic IPs for a T&L processor. And we verified IPs using a SoC Platform. Designed T&L Processor consists of 24 bit floating point data format and 16 bit fixed point data format, and supports the pipeline keeping the balance between Transform process and Lighting process using a parallel computation of 3D graphics. The delay of pipeline processing only Transform operation is almost same as the delay processing both Transform operation and Lighting operation. Designed T&L Processor is implemented and verified using a SoC Platform. The T&L Processor operates at 80MHz frequency in Xilinx-Virtex4 FPGA. The processing speed is measured at the rate of 20M Vertexes/sec.

Performance Analysis of Distributed Hadoop Systems (분산 하둡 시스템의 성능 비교 분석)

  • Bae, Byoung-Jin;Kim, Young-Joo;Kim, Young-Kuk
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2014.05a
    • /
    • pp.479-482
    • /
    • 2014
  • Nowadays open-source hadoop systems have been using widely to efficiently manage a fast-growing big data. Hadoop systems consist of distributed file processing system called HDFS (Hadoop Distributed File System) and distributed parallel processing system called MapReduce. The MapReduce reads and processes big data from HDFS and then processed results are written in HDFS again by the MapReduce. Such a processing method has different system structure respectively according to hadoop version. Therefore, this paper shows analysis results for performance of hadoop systems. For this, we devise a way which monitors hadoop systems and measure occurrence frequency of processes, threads, and variables generated in hadoop system itself using the devised way. So, by using the measured results as analysis indicator, we help the indicator predict inner performance of hadoop systems.

  • PDF

Development of Information Technology Infrastructures through Construction of Big Data Platform for Road Driving Environment Analysis (도로 주행환경 분석을 위한 빅데이터 플랫폼 구축 정보기술 인프라 개발)

  • Jung, In-taek;Chong, Kyu-soo
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.19 no.3
    • /
    • pp.669-678
    • /
    • 2018
  • This study developed information technology infrastructures for building a driving environment analysis platform using various big data, such as vehicle sensing data, public data, etc. First, a small platform server with a parallel structure for big data distribution processing was developed with H/W technology. Next, programs for big data collection/storage, processing/analysis, and information visualization were developed with S/W technology. The collection S/W was developed as a collection interface using Kafka, Flume, and Sqoop. The storage S/W was developed to be divided into a Hadoop distributed file system and Cassandra DB according to the utilization of data. Processing S/W was developed for spatial unit matching and time interval interpolation/aggregation of the collected data by applying the grid index method. An analysis S/W was developed as an analytical tool based on the Zeppelin notebook for the application and evaluation of a development algorithm. Finally, Information Visualization S/W was developed as a Web GIS engine program for providing various driving environment information and visualization. As a result of the performance evaluation, the number of executors, the optimal memory capacity, and number of cores for the development server were derived, and the computation performance was superior to that of the other cloud computing.

Implementation of Viterbi Decoder on Massively Parallel GPU for DVB-T Receiver (DVB-T 수신기를 위한 대규모 병렬처리 GPU 기반의 비터비 복호기 구현)

  • Lee, KyuHyung;Lee, Ho-Kyoung;Heo, Seo Weon
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.50 no.9
    • /
    • pp.3-11
    • /
    • 2013
  • Recently, a plenty of researches have been conducted using the massively parallel processing of GPU for the implementation of communication system. In this paper, we tried to reduce software simulation time applying GPU with sliding block method to Viterbi decoder in DVB-T system which is one of European DTV standards. First of all, we implement DVB-T system by CPU and estimate cost time whereby the system processes one OFDM symbol. Secondly, we implement Viterbi decoder by software using NVIDIA's massive GPU processor. In our work, stream process method is applied to reduce the overhead for data transfer between CPU and GPU, as well as coalescing method to lower the global memory access time. In addition, data structure design method is used to maximize the shared memory usage. Consequently, our proposed method is approximately 11 times faster in 2K mode and 60 times faster in 8K mode for the process in Viterbi decoder.

Implementation of Channel Coding System using Viterbi Decoder of Pipeline-based Multi-Window (파이프라인 기반 다중윈도방식의 비터비 디코더를 이용한 채널 코딩 시스템의 구현)

  • Seo Young-Ho;Kim Dong-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.9 no.3
    • /
    • pp.587-594
    • /
    • 2005
  • In the paper, after we propose a viterbi decoder which has multiple buffering and parallel processing decoding scheme through expanding time-divided imput signal, and map a FPGA, we implement a channel coding system together with PC-based software. Continuous input signal is buffered as order of decoding length and is parallel decoded using a high speed cell for viterbi decoding. Output data rate increases linearly with the cell formed the viterbi decoder, and flexible operation can be satisfied by programming controller and modifying input buffer. The tell for viterbi decoder consists of HD block for calculating hamming distance, CM block for calculating value in each state, TB block for trace-back operation, and LIFO. The implemented cell of viterbi decoder used 351 LAB(Logic Arrary Block) and stably operated in maximum 139MHz in APEX20KC EP20K600CB652-7 FPGA of ALTERA. The whole viterbi decoder including viterbi decoding cells, input/output buffers, and a controller occupied the hardware resource of $23\%$ and has the output data rate of 1Gbps.

Design and Implementation of a Grid System META for Executing CFD Analysis Programs on Distributed Environment (분산 환경에서 CFD 분석 프로그램 수행을 위한 그리드 시스템 META 설계 및 구현)

  • Kang, Kyung-Woo;Woo, Gyun
    • The KIPS Transactions:PartA
    • /
    • v.13A no.6 s.103
    • /
    • pp.533-540
    • /
    • 2006
  • This paper describes the design and implementation of a grid system META (Metacomputing Environment using Test-run of Application) which facilitates the execution of a CFD (Computational Fluid Dynamics) analysis program on distributed environment. The grid system META allows the CFD program developers can access the computing resources distributed over the network just like one computer system. The research issues involved in the grid computing include fault-tolerance, computing resource selection, and user-interface design. In this paper, we exploits an automatic resource selection scheme for executing the parallel SPMD (Single Program Multiple Data) application written in MPI (Message Passing Interface). The proposed resource selection scheme is informed from the network latency time and the elapsed time of the kernel loop attained from test-run. The network latency time highly influences the executional performance when a parallel program is distributed and executed over several systems. The elapsed time of the kernel loop can be used as an estimator of the whole execution time of the CFD Program due to a common characteristic of CFD programs. The kernel loop consumes over 90% of the whole execution time of a CFD program.

GLOVE: Distributed Shared Memory Based Parallel Visualization Tool for Massive Scientific Dataset (GLOVE: 대용량 과학 데이터를 위한 분산공유메모리 기반 병렬 가시화 도구)

  • Lee, Joong-Youn;Kim, Min Ah;Lee, Sehoon;Hur, Young Ju
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.5 no.6
    • /
    • pp.273-282
    • /
    • 2016
  • Visualization tool can be divided by three components - data I/O, visual transformation and interactive rendering. In this paper, we present requirements of three major components on visualization tools for massive scientific dataset and propose strategies to develop the tool which satisfies those requirements. In particular, we present how to utilize open source softwares to efficiently realize our goal. Furthermore, we also study the way to combine several open source softwares which are separately made to produce a single visualization software and optimize it for realtime visualization of massiv espatio-temporal scientific dataset. Finally, we propose a distributed shared memory based scientific visualization tool which is called "GLOVE". We present a performance comparison among GLOVE and well known open source visualization tools such as ParaView and VisIt.

A Labeling Scheme for Efficient On-the-fly Detection of Race Conditions in Parallel Programs (병렬프로그램의 경합조건을 수행 중에 효율적으로 탐지하기 위한 레이블링 기법)

  • Park, So-Hee;Woo, Jong-Jung;Bae, Jong-Min;Jun, Yong-Kee
    • The KIPS Transactions:PartA
    • /
    • v.9A no.4
    • /
    • pp.525-534
    • /
    • 2002
  • Race conditions, races in short, need to be detected for debugging parallel programs, because the races result in unintended non-deterministic executions. To detect the races in an execution of program, previous techniques use a centralized data structure which may incur serious bottleneck in generating concurrency information, or show inefficient time complexity which depends on the degree of nested parallelism in comparing any two of them. We propose a new labeling scheme in this paper, which is scalable in generating the concurrency information without bottleneck by using private data structure, and improves time complexity into constant in checking concurrency. The scalability and time efficiency therfore makes on-the-fly race detection efficient not only for programs with either shared-memory or message-passing, but also for programs with mixed model of the two.

Efficient FPGA Implementation of AES-CCM for IEEE 1609.2 Vehicle Communications Security

  • Jeong, Chanbok;Kim, Youngmin
    • IEIE Transactions on Smart Processing and Computing
    • /
    • v.6 no.2
    • /
    • pp.133-139
    • /
    • 2017
  • Vehicles have increasingly evolved and become intelligent with convergence of information and communications technologies (ICT). Vehicle communications (VC) has become one of the major necessities for intelligent vehicles. However, VC suffers from serious security problems that hinder its commercialization. Hence, the IEEE 1609 Wireless Access Vehicular Environment (WAVE) protocol defines a security service for VC. This service includes Advanced Encryption Standard-Counter with CBC-MAC (AES-CCM) for data encryption in VC. A high-speed AES-CCM crypto module is necessary, because VC requires a fast communication rate between vehicles. In this study, we propose and implement an efficient AES-CCM hardware architecture for high-speed VC. First, we propose a 32-bit substitution table (S_Box) to reduce the AES module latency. Second, we employ key box register files to save key expansion results. Third, we save the input and processed data to internal register files for secure encryption and to secure data from external attacks. Finally, we design a parallel architecture for both cipher block chaining message authentication code (CBC-MAC) and the counter module in AES-CCM to improve performance. For implementation of the field programmable gate array (FPGA) hardware, we use a Xilinx Virtex-5 FPGA chip. The entire operation of the AES-CCM module is validated by timing simulations in Xilinx ISE at a speed of 166.2 MHz.

STUDY OF CORE SUPPORT BARREL VIBRATION MONITORING USING EX-CORE NEUTRON NOISE ANALYSIS AND FUZZY LOGIC ALGORITHM

  • CHRISTIAN, ROBBY;SONG, SEON HO;KANG, HYUN GOOK
    • Nuclear Engineering and Technology
    • /
    • v.47 no.2
    • /
    • pp.165-175
    • /
    • 2015
  • The application of neutron noise analysis (NNA) to the ex-core neutron detector signal for monitoring the vibration characteristics of a reactor core support barrel (CSB) was investigated. Ex-core flux data were generated by using a nonanalog Monte Carlo neutron transport method in a simulated CSB model where the implicit capture and Russian roulette technique were utilized. First and third order beam and shell modes of CSB vibration were modeled based on parallel processing simulation. A NNA module was developed to analyze the ex-core flux data based on its time variation, normalized power spectral density, normalized cross-power spectral density, coherence, and phase differences. The data were then analyzed with a fuzzy logic module to determine the vibration characteristics. The ex-core neutron signal fluctuation was directly proportional to the CSB's vibration observed at 8Hz and15Hzin the beam mode vibration, and at 8Hz in the shell mode vibration. The coherence result between flux pairs was unity at the vibration peak frequencies. A distinct pattern of phase differences was observed for each of the vibration models. The developed fuzzy logic module demonstrated successful recognition of the vibration frequencies, modes, orders, directions, and phase differences within 0.4 ms for the beam and shell mode vibrations.