• Title/Summary/Keyword: Parallel Implementation

Search Result 883, Processing Time 0.025 seconds

A Fast SIFT Implementation Based on Integer Gaussian and Reconfigurable Processor

  • Su, Le Tran;Lee, Jong Soo
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.2 no.3
    • /
    • pp.39-52
    • /
    • 2009
  • Scale Invariant Feature Transform (SIFT) is an effective algorithm in object recognition, panorama stitching, and image matching, however, due to its complexity, real time processing is difficult to achieve with software approaches. This paper proposes using a reconfigurable hardware processor with integer half kernel. The integer half kernel Gaussian reduces the Gaussian pyramid complexity in about half [] and the reconfigurable processor carries out a parallel implementation of a full search Fast SIFT algorithm. We use a low memory, fine grain single instruction stream multiple data stream (SIMD) pixel processor that is currently being developed. This implementation fully exposes the available parallelism of the SIFT algorithm process and exploits the processing and I/O capabilities of the processor which results in a system that can perform real time image and video compression. We apply this novel implementation to images and measure the effectiveness. Experimental simulation results indicate that the proposed implementation is capable of real time applications.

  • PDF

Development of Real time Air Quality Prediction System

  • Oh, Jai-Ho;Kim, Tae-Kook;Park, Hung-Mok;Kim, Young-Tae
    • Proceedings of the Korean Environmental Sciences Society Conference
    • /
    • 2003.11a
    • /
    • pp.73-78
    • /
    • 2003
  • In this research, we implement Realtime Air Diffusion Prediction System which is a parallel Fortran model running on distributed-memory parallel computers. The system is designed for air diffusion simulations with four-dimensional data assimilation. For regional air quality forecasting a series of dynamic downscaling technique is adopted using the NCAR/Penn. State MM5 model which is an atmospheric model. The realtime initial data have been provided daily from the KMA (Korean Meteorological Administration) global spectral model output. It takes huge resources of computation to get 24 hour air quality forecast with this four step dynamic downscaling (27km, 9km, 3km, and lkm). Parallel implementation of the realtime system is imperative to achieve increased throughput since the realtime system have to be performed which correct timing behavior and the sequential code requires a large amount of CPU time for typical simulations. The parallel system uses MPI (Message Passing Interface), a standard library to support high-level routines for message passing. We validate the parallel model by comparing it with the sequential model. For realtime running, we implement a cluster computer which is a distributed-memory parallel computer that links high-performance PCs with high-speed interconnection networks. We use 32 2-CPU nodes and a Myrinet network for the cluster. Since cluster computers more cost effective than conventional distributed parallel computers, we can build a dedicated realtime computer. The system also includes web based Gill (Graphic User Interface) for convenient system management and performance monitoring so that end-users can restart the system easily when the system faults. Performance of the parallel model is analyzed by comparing its execution time with the sequential model, and by calculating communication overhead and load imbalance, which are common problems in parallel processing. Performance analysis is carried out on our cluster which has 32 2-CPU nodes.

  • PDF

Design and Implementation of Real-Time Parallel Engine for Discrete Event Wargame Simulation (이산사건 워게임 시뮬레이션을 위한 실시간 병렬 엔진의 설계 및 구현)

  • Kim, Jin-Soo;Kim, Dae-Seog;Kim, Jung-Guk;Ryu, Keun-Ho
    • The KIPS Transactions:PartA
    • /
    • v.10A no.2
    • /
    • pp.111-122
    • /
    • 2003
  • Military wargame simulation models must support the HLA in order to facilitate interoperability with other simulations, and using parallel simulation engines offer efficiency in reducing system overhead generated by propelling interoperability. However, legacy military simulation model engines process events using sequential event-driven method. This is due to problems generated by parallel processing such as synchronous reference to global data domains. Additionally. using legacy simulation platforms result in insufficient utilization of multiple CPUs even if a multiple CPU system is under use. Therefore, in this paper, we propose conversing the simulation engine to an object model-based parallel simulation engine to ensure military wargame model's improved system processing capability, synchronous reference to global data domains, external simulation time processing, and the sequence of parallel-processed events during a crash recovery. The converted parallel simulation engine is designed and implemented to enable parallel execution on a multiple CPU system (SMP).

Performance Enhancement and Evaluation of AES Cryptography using OpenCL on Embedded GPGPU (OpenCL을 이용한 임베디드 GPGPU환경에서의 AES 암호화 성능 개선과 평가)

  • Lee, Minhak;Kang, Woochul
    • KIISE Transactions on Computing Practices
    • /
    • v.22 no.7
    • /
    • pp.303-309
    • /
    • 2016
  • Recently, an increasing number of embedded processors such as ARM Mali begin to support GPGPU programming frameworks, such as OpenCL. Thus, GPGPU technologies that have been used in PC and server environments are beginning to be applied to the embedded systems. However, many embedded systems have different architectural characteristics compare to traditional PCs and low-power consumption and real-time performance are also important performance metrics in these systems. In this paper, we implement a parallel AES cryptographic algorithm for a modern embedded GPU using OpenCL, a standard parallel computing framework, and compare performance against various baselines. Experimental results show that the parallel GPU AES implementation can reduce the response time by about 1/150 and the energy consumption by approximately 1/290 compare to OpenMP implementation when 1000KB input data is applied. Furthermore, an additional 100 % performance improvement of the parallel AES algorithm was achieved by exploiting the characteristics of embedded GPUs such as removing copying data between GPU and host memory. Our results also demonstrate that higher performance improvement can be achieved with larger size of input data.

A Low Complexity Bit-Parallel Multiplier over Finite Fields with ONBs (최적정규기저를 갖는 유한체위에서의 저 복잡도 비트-병렬 곱셈기)

  • Kim, Yong-Tae
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.9 no.4
    • /
    • pp.409-416
    • /
    • 2014
  • In H/W implementation for the finite field, the use of normal basis has several advantages, especially the optimal normal basis is the most efficient to H/W implementation in $GF(2^m)$. The finite field $GF(2^m)$ with type I optimal normal basis(ONB) has the disadvantage not applicable to some cryptography since m is even. The finite field $GF(2^m)$ with type II ONB, however, such as $GF(2^{233})$ are applicable to ECDSA recommended by NIST. In this paper, we propose a bit-parallel multiplier over $GF(2^m)$ having a type II ONB, which performs multiplication over $GF(2^m)$ in the extension field $GF(2^{2m})$. The time and area complexity of the proposed multiplier is the same as or partially better than the best known type II ONB bit-parallel multiplier.

NTGST-Based Parallel Computer Vision Inspection for High Resolution BLU (NTGST 병렬화를 이용한 고해상도 BLU 검사의 고속화)

  • 김복만;서경석;최흥문
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.41 no.6
    • /
    • pp.19-24
    • /
    • 2004
  • A novel fast parallel NTGST is proposed for high resolution computer vision inspection of the BLUs in a LCD production line. The conventional computation- intensive NTGST algorithm is modified and its C codes are optimized into fast NTGST to be adapted to the SIMD parallel architecture. And then, the input inspection image is partitioned and allocated to each of the P processors in multi-threaded implementation, and the NTGST is executed on SIMD architecture of N data items simultaneously in each thread. Thus, the proposed inspection system can achieve the speedup of O(NP). Experiments using Dual-Pentium III processor with its MMX and extended MMX SIMD technology show that the proposed parallel NTGST is about Sp=8 times faster than the conventional NTGST, which shows the scalability of the proposed system implementation for the fast, high resolution computer vision inspection of the various sized BLUs in LCD production lines.

Reusing Technique of Primitive Motions for Effective Implementation of Complex Action (복합적 행동들을 효율적으로 구현하기 위한 기본 동작의 재활용 기법)

  • Choi, Jun-Seong;Park, Jong-Hee
    • The Journal of the Korea Contents Association
    • /
    • v.14 no.9
    • /
    • pp.1-13
    • /
    • 2014
  • Apart from the physical realism, the implementation of various physical actions of an agent to respond to dynamically changing situations is essential for the design of an agent in a cyber world. To achieve a maximum diversity in actions, we develop a mechanism that allows composite actions to be constructed by reusing a set of primitive motions and enables an agent to instantly react to changes in the ambient states. Specifically we model an agent's body in terms of joints, and a primitive or composite motion is performed in a real time. To implement this mechanism, we produce an animation for basic joint movements and develop a method to construct overall motions out of the primitive motions. These motions can be assembled into a plan by which an agent can achieve a goal. In this manner, diverse actions can be implemented without excessive efforts. This approach has conspicuous advantages when constructing a parallel action, e.g., eating while walking, that is, two or more parallel actions can be naturally merged into a parallel action according to their priority. We implement several composite and parallel actions to demonstrate the viability of our approach.

Parallel Processing Architecture for Parity Checksum Generator Complying with ITU-T J.83 ANNEX B (ITU-T J.83 ANNEX B의 Parity Checksum Generator를 위한 병렬 처리 구조)

  • Lee, Jong-Yeop;Hong, Eon-Pyo;Har, Dong-Soo;Lim, Hai-Jeong
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.34 no.6C
    • /
    • pp.619-625
    • /
    • 2009
  • This paper proposes a parallel architecture of a Parity Checksum Generator adopted for packet synchronization and error detection in the ITU-T Recommendation J.83 Annex B. The proposed parallel processing architecture removes a performance bottleneck occurred in a conventional serial processing architecture, leading to significant decrease in processing time for generating a Parity Checksum. The implementation results show that the proposed parallel processing architecture reduces the processing time by 83.1% at the expense of 16% area increase.

Efficient Parallel TLD on CPU-GPU Platform for Real-Time Tracking

  • Chen, Zhaoyun;Huang, Dafei;Luo, Lei;Wen, Mei;Zhang, Chunyuan
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.14 no.1
    • /
    • pp.201-220
    • /
    • 2020
  • Trackers, especially long-term (LT) trackers, now have a more complex structure and more intensive computation for nowadays' endless pursuit of high accuracy and robustness. However, computing efficiency of LT trackers cannot meet the real-time requirement in various real application scenarios. Considering heterogeneous CPU-GPU platforms have been more popular than ever, it is a challenge to exploit the computing capacity of heterogeneous platform to improve the efficiency of LT trackers for real-time requirement. This paper focuses on TLD, which is the first LT tracking framework, and proposes an efficient parallel implementation based on OpenCL. In this paper, we firstly make an analysis of the TLD tracker and then optimize the computing intensive kernels, including Fern Feature Extraction, Fern Classification, NCC Calculation, Overlaps Calculation, Positive and Negative Samples Extraction. Experimental results demonstrate that our efficient parallel TLD tracker outperforms the original TLD, achieving the 3.92 speedup on CPU and GPU. Moreover, the parallel TLD tracker can run 52.9 frames per second and meet the real-time requirement.

The Effect of Series and Shunt Redundancy on Power Semiconductor Reliability

  • Nozadian, Mohsen Hasan Babayi;Zarbil, Mohammad Shadnam;Abapour, Mehdi
    • Journal of Power Electronics
    • /
    • v.16 no.4
    • /
    • pp.1426-1437
    • /
    • 2016
  • In different industrial and mission oriented applications, redundant or standby semiconductor systems can be implemented to improve the reliability of power electronics equipment. The proper structure for implementation can be one of the redundant or standby structures for series or parallel switches. This selection is determined according to the type and failure rate of the fault. In this paper, the reliability and the mean time to failure (MTTF) for each of the series and parallel configurations in two redundant and standby structures of semiconductor switches have been studied based on different failure rates. The Markov model is used for reliability and MTTF equation acquisitions. According to the different values for the reliability of the series and parallel structures during SC and OC faults, a comprehensive comparison between each of the series and parallel structures for different failure rates will be made. According to the type of fault and the structure of the switches, the reliability of the switches in the redundant structure is higher than that in the other structures. Furthermore, the performance of the proposed series and parallel structures of switches during SC and OC faults, results in an improvement in the reliability of the boost dc/dc converter. These studies aid in choosing a configuration to improve the reliability of power electronics equipment depending on the specifications of the implemented devices.