• Title/Summary/Keyword: Parallel Implementation

Search Result 883, Processing Time 0.023 seconds

GPU-Accelerated Single Image Depth Estimation with Color-Filtered Aperture

  • Hsu, Yueh-Teng;Chen, Chun-Chieh;Tseng, Shu-Ming
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.8 no.3
    • /
    • pp.1058-1070
    • /
    • 2014
  • There are two major ways to implement depth estimation, multiple image depth estimation and single image depth estimation, respectively. The former has a high hardware cost because it uses multiple cameras but it has a simple software algorithm. Conversely, the latter has a low hardware cost but the software algorithm is complex. One of the recent trends in this field is to make a system compact, or even portable, and to simplify the optical elements to be attached to the conventional camera. In this paper, we present an implementation of depth estimation with a single image using a graphics processing unit (GPU) in a desktop PC, and achieve real-time application via our evolutional algorithm and parallel processing technique, employing a compute shader. The methods greatly accelerate the compute-intensive implementation of depth estimation with a single view image from 0.003 frames per second (fps) (implemented in MATLAB) to 53 fps, which is almost twice the real-time standard of 30 fps. In the previous literature, to the best of our knowledge, no paper discusses the optimization of depth estimation using a single image, and the frame rate of our final result is better than that of previous studies using multiple images, whose frame rate is about 20fps.

Parallel Distributed Implementation of GHT on MPI-based PC Cluster (MPI 기반 PC 클러스터에서 GHT의 병렬 분산 구현)

  • Kim, Yeong-Soo;Kim, Jeong-Sahm;Choi, Heung-Moon
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.44 no.3
    • /
    • pp.81-89
    • /
    • 2007
  • This paper presents a parallel distributed implementation of the GHT (generalized Hough transform) for the fast processing on the MPI-based PC cluster. We tried to achieve the higher speedup mainly by alleviating the communication overhead through the pipelined broadcast and accumulator array partition strategy and by time overlapping of the communication and the computation over entire process. Experimental results show that nearly linear speedup is reachable by the proposed method on the MPI-based PC clusters connected through 100Mbps Ethernet switch.

Design and Implementation of a DSP Chip for Portable Multimedia Applications (휴대 멀티미디어 응용을 위한 DSP 칩 설계 및 구현)

  • 윤성현;선우명훈
    • Journal of the Korean Institute of Telematics and Electronics C
    • /
    • v.35C no.12
    • /
    • pp.31-39
    • /
    • 1998
  • This paper presents the design and implementation of a new multimedia fixed-point DSP (MDSP) core for portable multimedia applications. The MDSP instruction set is designed through the analysis of multimedia algorithms and DSP instruction sets. The MDSP architecture employs parallel processing techniques, such as SIMD and vector processing as well as DSP techniques. The instruction set can handle various data formats and MDSP can perform two MAC operations in parallel. The switching network and packing network can increase the performance by overlapping data rearrangement cycles with computation cycles. We have designed Verilog HDL models and the 0.6 $\mu\textrm{m}$ Samsung KG75000 SOG library is used. The total gate count is 68,831 and the clock frequency is 30 MHz.

  • PDF

Implementation of LTE uplink System for SDR Platform using CUDA and UHD (CUDA와 UHD를 이용한 SDR 플랫폼 용 LTE 상향링크 시스템 구현)

  • Ahn, Chi Young;Kim, Yong;Choi, Seung Won
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.9 no.2
    • /
    • pp.81-87
    • /
    • 2013
  • In this paper, we present an implementation of Long Term Evolution (LTE) Uplink (UL) system on a Software Defined Radio (SDR) platform using a conventional Personal Computer (PC), which adopts Graphic Processing Units (GPU) and Universal Software Radio Peripheral2 (USRP2) with URSP Hardware Driver (UHD) for SDR software modem and Radio Frequency (RF) transceiver, respectively. We have adopted UHD because UHD provides flexibility in the design of transceiver chain. Also, Cognitive Radio (CR) engine have been implemented by using libraries from UHD. Meanwhile, we have implemented the software modem in our system on GPU which is suitable for parallel computing due to its powerful Arithmetic and Logic Units (ALUs). From our experiment tests, we have measured the total processing time for a single frame of both transmit and receive LTE UL data to find that it takes about 5.00ms and 6.78ms for transmit and receive, respectively. It particularly means that the implemented system is capable of real-time processing of all the baseband signal processing algorithms required for LTE UL system.

A 18-Mbp/s, 8-State, High-Speed Turbo Decoder

  • Jung Ji-Won;Kim Min-Hyuk;Jeong Jin-Hee
    • Journal of electromagnetic engineering and science
    • /
    • v.6 no.3
    • /
    • pp.147-154
    • /
    • 2006
  • In this paper, we propose and present implementation results of a high-speed turbo decoding algorithm. The latency caused by (de) interleaving and iterative decoding in a conventional maximum a posteriori(MAP) turbo decoder can be dramatically reduced with the proposed design. The source of the latency reduction is come from the combination of the radix-4, dual-path processing, parallel decoding, and rearly-stop algorithms. This reduced latency enables the use of the turbo decoder as a forward error correction scheme in real-time wireless communication services. The proposed scheme results in a slight degradation in bit-error rate(BER) performance for large block sizes because the effective interleaver size in a radix-4 implementation is reduced to half, relative to the conventional method. Fixed on the parameters of N=212, iteration=3, 8-states, 3 iterations, and QPSK modulation scheme, we designed the adaptive high-speed turbo decoder using the Xilinx chip (VIRTEX2P (XC2VP30-5FG676)) with the speed of 17.78 Mb/s. From the results, we confirmed that the decoding speed of the proposed decoder is faster than conventional algorithms by 8 times.

Optical Implementation of Triple DES Algorithm Based on Dual XOR Logic Operations

  • Jeon, Seok Hee;Gil, Sang Keun
    • Journal of the Optical Society of Korea
    • /
    • v.17 no.5
    • /
    • pp.362-370
    • /
    • 2013
  • In this paper, we propose a novel optical implementation of a 3DES algorithm based on dual XOR logic operations for a cryptographic system. In the schematic architecture, the optical 3DES system consists of dual XOR logic operations, where XOR logic operation is implemented by using a free-space interconnected optical logic gate method. The main point in the proposed 3DES method is to make a higher secure cryptosystem, which is acquired by encrypting an individual private key separately, and this encrypted private key is used to decrypt the plain text from the cipher text. Schematically, the proposed optical configuration of this cryptosystem can be used for the decryption process as well. The major advantage of this optical method is that vast 2-D data can be processed in parallel very quickly regardless of data size. The proposed scheme can be applied to watermark authentication and can also be applied to the OTP encryption if every different private key is created and used for encryption only once. When a security key has data of $512{\times}256$ pixels in size, our proposed method performs 2,048 DES blocks or 1,024 3DES blocks cipher in this paper. Besides, because the key length is equal to $512{\times}256$ bits, $2^{512{\times}256}$ attempts are required to find the correct key. Numerical simulations show the results to be carried out encryption and decryption successfully with the proposed 3DES algorithm.

Development and Implementation of Measures for Structural and Reliability Importance by Using Minimal Cut Sets and Minimal Path Sets (최소절단집합과 최소경로집합을 이용한 구조 및 신뢰성 중요도 척도의 개발 및 적용)

  • Choi, Sung-Woon
    • Journal of the Korea Safety Management & Science
    • /
    • v.14 no.1
    • /
    • pp.225-233
    • /
    • 2012
  • The research discusses interrelationship of structural and reliability importance measures which used in the probabilistic safety assessment. The most frequently used component importance measures, such as Birnbaum's Importance (BI), Risk Reduction (RR), Risk Reduction Worth (RRW), RA (Risk Achievement), Risk Achievement Worth (RAW), Fussel Vesely (FV) and Critically Importance (CI) can be derived from two structure importance measures that are developed based on the size and the number of Minimal Path Set (MPS) and Minimal Cut Set (MCS). In order to show an effectiveness of importance measures which is developed in this paper, the three representative functional structures, such as series-parallel, k out of n and bridge are used to compare with Birnbaum's Importance measure. In addition, the study presents the implementation examples of Total Productive Maintenance (TPM) metrics and alternating renewal process models with exponential distribution to calculate the availability and unavailability of component facility for improving system performances. System state structure functions in terms of component states can be converted into the system availability (unavailability) functions by substituting the component reliabilities (unavailabilities) for the component states. The applicable examples are presented in order to help the understanding of practitioners.

Multi-Objective Pareto Optimization of Parallel Synthesis of Embedded Computer Systems

  • Drabowski, Mieczyslaw
    • International Journal of Computer Science & Network Security
    • /
    • v.21 no.3
    • /
    • pp.304-310
    • /
    • 2021
  • The paper presents problems of optimization of the synthesis of embedded systems, in particular Pareto optimization. The model of such a system for its design for high-level of abstract is based on the classic approach known from the theory of task scheduling, but it is significantly extended, among others, by the characteristics of tasks and resources as well as additional criteria of optimal system in scope structure and operation. The metaheuristic algorithm operating according to this model introduces a new approach to system synthesis, in which parallelism of task scheduling and resources partition is applied. An algorithm based on a genetic approach with simulated annealing and Boltzmann tournaments, avoids local minima and generates optimized solutions. Such a synthesis is based on the implementation of task scheduling, resources identification and partition, allocation of tasks and resources and ultimately on the optimization of the designed system in accordance with the optimization criteria regarding cost of implementation, execution speed of processes and energy consumption by the system during operation. This paper presents examples and results for multi-criteria optimization, based on calculations for specifying non-dominated solutions and indicating a subset of Pareto solutions in the space of all solutions.

A VISION SYSTEM IN ROBOTIC WELDING

  • Absi Alfaro, S. C.
    • Proceedings of the KWS Conference
    • /
    • 2002.10a
    • /
    • pp.314-319
    • /
    • 2002
  • The Automation and Control Group at the University of Brasilia is developing an automatic welding station based on an industrial robot and a controllable welding machine. Several techniques were applied in order to improve the quality of the welding joints. This paper deals with the implementation of a laser-based computer vision system to guide the robotic manipulator during the welding process. Currently the robot is taught to follow a prescribed trajectory which is recorded a repeated over and over relying on the repeatability specification from the robot manufacturer. The objective of the computer vision system is monitoring the actual trajectory followed by the welding torch and to evaluate deviations from the desired trajectory. The position errors then being transfer to a control algorithm in order to actuate the robotic manipulator and cancel the trajectory errors. The computer vision systems consists of a CCD camera attached to the welding torch, a laser emitting diode circuit, a PC computer-based frame grabber card, and a computer vision algorithm. The laser circuit establishes a sharp luminous reference line which images are captured through the video camera. The raw image data is then digitized and stored in the frame grabber card for further processing using specifically written algorithms. These image-processing algorithms give the actual welding path, the relative position between the pieces and the required corrections. Two case studies are considered: the first is the joining of two flat metal pieces; and the second is concerned with joining a cylindrical-shape piece to a flat surface. An implementation of this computer vision system using parallel computer processing is being studied.

  • PDF

Safety of Industrial Overhead Doors : A Review of Maintenance and Parallel Safety Devices (산업용 오버헤드 도어의 사고 예방 : 유지관리 및 병렬구조 안전장치를 중심으로)

  • Bok Ki Kim;Jaewook Jeong
    • Journal of the Korean Society of Safety
    • /
    • v.39 no.1
    • /
    • pp.33-40
    • /
    • 2024
  • This study analyzes the impact of regular preventive maintenance (PM) on reducing the failure rate and occurrence of falling accidents of industrial overhead doors. A reliable safety device model with an additional safety device, which is installed to replace a defective one, is proposed. The research methodology involves collecting breakdown and falling accident records, comparing and analyzing data before and after regular PM implementation, and experimenting with two types of retrofittable safety devices. Key findings are as follows. 1. Regular PM implementation significantly reduces the failure rate of old overhead doors. 2. A parallel structured model with two alternative safety devices can minimize falling accident risks. The study's contributions include the following. 1. The positive impact of PM on extending overhead door lifespan is quantified. 2. A general safety device model that can be retrofitted and used as replacement with a fail-safe function is proposed.