• Title/Summary/Keyword: Parallel Implementation

Load Balancing Based on Transform Unit Partition Information for High Efficiency Video Coding Deblocking Filter

  • Ryu, Hochan;Park, Seanae;Ryu, Eun-Kyung;Sim, Donggyu
    • ETRI Journal / v.39 no.3 / pp.301-309 / 2017
  • In this paper, we propose a parallelization method for a High Efficiency Video Coding (HEVC) deblocking filter that uses transform unit (TU) split information. HEVC employs a deblocking filter to boost perceptual quality and coding efficiency, and the filter was designed for data-level parallelism. We demonstrate a method of distributing equal workloads to all cores or threads by anticipating the deblocking filter complexity from the coding unit depth and TU split information. On average, the proposed method achieves a speed-up factor that is 2% better than that of the uniformly distributed parallel deblocking filter and 6% better than that of coding tree unit row distribution parallelism. In addition, its speed-up factor is up to 3.1 relative to the run-time of the sequential deblocking filter in HEVC test model 12.0.
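
The workload-balancing idea in this abstract can be illustrated with a minimal sketch. It is not the authors' method: the cost model in `estimated_cost` and the greedy longest-processing-time assignment in `balance_workloads` are assumptions chosen only to show how per-block complexity estimates can drive the distribution of blocks across threads.

```python
import heapq


def estimated_cost(cu_depth, tu_split_count):
    # Hypothetical cost model (an assumption, not from the paper): units that
    # are split more deeply have more edges for the deblocking filter to visit.
    return 1 + cu_depth + tu_split_count


def balance_workloads(costs, num_threads):
    """Assign block indices to threads so that estimated loads are close.

    Greedy longest-processing-time rule: take blocks from most to least
    expensive and always give the next block to the least-loaded thread.
    """
    heap = [(0, t) for t in range(num_threads)]  # (current load, thread id)
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_threads)]
    order = sorted(range(len(costs)), key=lambda i: costs[i], reverse=True)
    for block in order:
        load, t = heapq.heappop(heap)
        assignment[t].append(block)
        heapq.heappush(heap, (load + costs[block], t))
    return assignment


def thread_loads(costs, assignment):
    # Total estimated cost handled by each thread.
    return [sum(costs[i] for i in blocks) for blocks in assignment]
```

A uniform split by block count would ignore the cost estimates entirely; the sketch differs only in that it balances the estimated cost rather than the number of blocks.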

Application of Multi-Frontal Method in Collaborative Engineering Environment

  • Cho, Seong-Wook;Choi, Young;Lee, Gyu-Bong;Kwon, Ki-Eak
    • International Journal of CAD/CAM / v.3 no.1_2 / pp.51-60 / 2003
  • The growth of the World Wide Web and advances in high-speed network access have greatly changed the existing CAD/CAE environment. The WWW has enabled us to share distributed product data and to collaborate in the design process. STEP, an international standard for product model data, and CORBA, a standard for distributed object technology, are important technological components for interoperability in advanced design and manufacturing environments. These two technologies provide the background for sharing product data and integrating applications on the network. This paper describes a distributed CAD/CAE environment that is integrated on the network by CORBA and the product model data standard STEP. Several prototype application modules were implemented to verify the proposed concept, and the test results are discussed. The finite element analysis server is further distributed into several frontal servers to implement a distributed parallel solution of the finite element system equations. Distributed computation on the analysis server is also implemented using CORBA to generalize the proposed method.

Implementation and miniaturization of High Order Derivative Gaussian Pulse Generator for DS-UWB (DS-UWB를 위한 고차 미분 가우시안 펄스 생성기의 소형화와 구현)

  • Kim, Dong-Ho;Bang, Gyeong-Nam;Park, Chong-Dae
    • Journal of IKEEE / v.10 no.2 s.19 / pp.109-115 / 2006
  • In this paper, a high-order derivative Gaussian pulse generator for DS-UWB communication that satisfies FCC regulations is proposed and fabricated. To transform a 100 Mbps rectangular signal into a Gaussian pulse, the fabricated generator uses only two SRDs. The output pulse had a width of 330 psec and an amplitude of 920 mV. In addition, the designed and fabricated dual bandpass filter shows high-order derivative characteristics, using a microstrip line and a parallel stub to remove the WLAN band. We generated a 13th-order derivative Gaussian pulse whose frequency spectrum in the WLAN band is restricted by more than -25 dB. The pulse had a width of 1 nsec and an amplitude of 25 mV.

Implementation of 2,048-bit RSA Based on RNS(Residue Number Systems) (RNS(Residue Number Systems) 기반의 2,048 비트 RSA 설계)

  • 권택원;최준림
    • Journal of the Institute of Electronics Engineers of Korea SD / v.41 no.4 / pp.57-66 / 2004
  • This paper proposes the design of a 2,048-bit RSA based on an RNS (residue number system) Montgomery modular multiplier. Because an RNS performs fast parallel modular multiplication on a large word partitioned into small words, we introduce a Montgomery reduction method (MRM)[1] based on a Wallace tree modular multiplier and 33 RNS bases of 64-bit size for RNS Montgomery modular multiplication. In addition, for fast RNS modular multiplication, a modified method based on the Chinese remainder theorem (CRT)[2] is presented. We verified the 2,048-bit RSA based on RNS using Samsung 0.35 $\mu\mathrm{m}$ technology; the 2,048-bit RSA operation completes in 2.54 ms at 100 MHz.
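
For readers unfamiliar with Montgomery reduction, the following is a minimal sketch on ordinary Python integers. It is not the RNS or Wallace-tree variant this abstract describes; the function names and the choice of `R = 2**r_bits` are assumptions made for illustration, and the modulus `n` must be odd and smaller than `R`.

```python
def montgomery_setup(n, r_bits):
    # R = 2**r_bits must exceed n, and n must be odd so that n is invertible mod R.
    r = 1 << r_bits
    n_prime = (-pow(n, -1, r)) % r  # n * n_prime == -1 (mod R)
    return r, n_prime


def redc(t, n, r_bits, n_prime):
    # Montgomery reduction: returns t * R^-1 mod n for 0 <= t < R * n,
    # using only shifts and masks for the operations modulo R.
    r_mask = (1 << r_bits) - 1
    m = ((t & r_mask) * n_prime) & r_mask
    u = (t + m * n) >> r_bits  # exact division: t + m*n is a multiple of R
    return u - n if u >= n else u


def mont_mul(a, b, n, r_bits):
    # Multiply a and b modulo n by passing through the Montgomery domain.
    _, n_prime = montgomery_setup(n, r_bits)
    a_bar = (a << r_bits) % n  # a * R mod n
    b_bar = (b << r_bits) % n  # b * R mod n
    prod_bar = redc(a_bar * b_bar, n, r_bits, n_prime)  # a * b * R mod n
    return redc(prod_bar, n, r_bits, n_prime)  # leave the Montgomery domain
```

In an exponentiation loop the conversions into and out of the Montgomery domain are done once, so each step costs a single `redc` call; the sketch converts on every call only to stay short.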

Analysis on Spatial Sampling and Implementation for Primal Trees (Primal Tree의 공간 분할 샘플링 분석 및 구현)

  • Park, Taejung
    • Journal of Digital Contents Society / v.15 no.3 / pp.347-355 / 2014
  • The general octree structure is common in various applications, including computer graphics, geometry information analysis, and queries. Unfortunately, when applied to sampling continuous spatial information such as signed distance fields, the general octree approach causes duplicated sample data and a discrepancy between sampling and representation positions. To address these issues, some researchers introduced the dual octree. In this paper, the weakness of the dual octree approach is illustrated by focusing on the fact that the dual octree cannot access some specific continuous zones asymptotically. This paper shows that the primal tree presented by Lefebvre and Hoppe can solve all of the problems above. This paper also presents a three-dimensional primal tree traversal algorithm based on Morton codes, which will help to parallelize the primal tree method.
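
As background for the Morton codes mentioned in this abstract, here is a minimal sketch of three-dimensional Morton (Z-order) encoding by bit interleaving. It is not the paper's traversal algorithm; the function names, the bit order (x in the lowest position), and the default of 10 bits per axis are assumptions.

```python
def morton_encode_3d(x, y, z, bits=10):
    # Interleave the low `bits` bits of x, y and z: bit i of x goes to
    # position 3*i, bit i of y to 3*i + 1, and bit i of z to 3*i + 2.
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code


def morton_decode_3d(code, bits=10):
    # Inverse of morton_encode_3d: pull every third bit back out per axis.
    x = y = z = 0
    for i in range(bits):
        x |= ((code >> (3 * i)) & 1) << i
        y |= ((code >> (3 * i + 1)) & 1) << i
        z |= ((code >> (3 * i + 2)) & 1) << i
    return x, y, z
```

Because each group of three code bits selects one of the eight children at a tree level, sorting cells by code yields a linear order that can be cut into contiguous chunks, which is one common reason such codes are used when parallelizing tree methods.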

SPCBC: A Secure Parallel Cipher Block Chaining Mode of Operation based on logistic Chaotic Map

  • El-Semary, Aly M.;Azim, Mohamed Mostafa A.;Diab, Hossam
    • KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.7 / pp.3608-3628 / 2017
  • Several block cipher modes of operation have been proposed in the literature to protect sensitive information. However, different security analysis models have been presented for attacking them. The analysis indicated that most of the current modes of operation are vulnerable to several attacks such as known plaintext and chosen plaintext/cipher-text attacks. Therefore, this paper proposes a secure block cipher mode of operation to thwart such attacks. In general, the proposed mode combines one-time chain keys with each plaintext before its encryption. The challenge of the proposed mode is the generation of the chain keys. The proposed mode employs the logistic map together with a nonce to dynamically generate a unique set of chain keys for every plaintext. Utilizing the logistic map assures the dynamic behavior while employing the nonce guarantees the uniqueness of the chain keys even if the same message is encrypted again. In this way, the proposed mode called SPCBC can resist the most powerful attacks including the known plaintext and chosen plaintext/cipher-text attacks. In addition, the SPCBC mode improves encryption time performance through supporting parallelized implementation. Finally, the security analysis and experimental results demonstrate that the proposed mode is robust compared to the current modes of operation.
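
The role of the logistic map and the nonce in this abstract can be pictured with a toy sketch. It is not the SPCBC construction, and it is not a secure design: the seeding through SHA-256, the parameter `r = 3.99`, the byte extraction, and the function names are all assumptions, and the block cipher step that follows the key mixing is omitted entirely.

```python
import hashlib


def chain_keys(nonce, num_blocks, block_size=16, r=3.99):
    # Seed the logistic map x -> r * x * (1 - x) from the nonce (assumed
    # seeding scheme); the seed lies strictly between 0 and 1.
    digest = hashlib.sha256(nonce).digest()
    x = (int.from_bytes(digest[:6], "big") + 1) / (2**48 + 2)
    keys = []
    for _ in range(num_blocks):
        key = bytearray()
        for _ in range(block_size):
            x = r * x * (1.0 - x)           # one logistic-map iteration
            key.append(int(x * 256) & 0xFF)  # crude byte extraction
        keys.append(bytes(key))
    return keys


def mask_blocks(data, nonce, block_size=16):
    # XOR every block with its own chain key. Once the keys exist, the
    # blocks are independent of one another and could be handled in parallel.
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    keys = chain_keys(nonce, len(blocks), block_size)
    return b"".join(
        bytes(p ^ k for p, k in zip(block, key))
        for block, key in zip(blocks, keys)
    )
```

Applying `mask_blocks` twice with the same nonce restores the input, since XOR is its own inverse; a different nonce starts the map from a different seed.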

Implementation of the Squared-Error Pattern Clustering Processor Using the Residue Number System (剩餘數體系를 이용한 자승오차 패턴 클러스터링 프로세서의 실현)

  • Kim, Hyeong-Min;Cho, Won-Kyung
    • Journal of the Korean Institute of Telematics and Electronics / v.26 no.2 / pp.87-93 / 1989
  • The squared-error pattern clustering algorithm, used in unsupervised pattern recognition and image processing applications, demands substantial processing time for operations on the feature vector matrix. This paper therefore proposes a fast squared-error pattern clustering processor using the residue number system, which is inherently suited to parallel processing and pipelining. In image segmentation experiments, the proposed processor shows a satisfactory error rate for cluster numbers that divide the image into meaningful regions and is about 200 times faster than the 80287 coprocessor. These results indicate that it is useful for real-time processing of large data.

Implementation of a 'Rasterization based on Vector Algorithm' suited for a Multi-thread Shader architecture (Multi-Thread 쉐이더 구조에 적합한 Vector 기반의 Rasterization 알고리즘의 구현)

  • Lee, Ju-Suk;Kim, Woo-Young;Lee, Bo-Haeng;Lee, Kwang-Yeob
    • Journal of the Institute of Electronics Engineers of Korea SD / v.46 no.10 / pp.46-52 / 2009
  • A Multi-Core/Multi-Thread architecture is adopted for the Shader processor to enhance processing performance. The Shader processor is designed to use its processing core IP for multiple purposes, such as Vertex-Shading, Rasterization, and Pixel-Shading. In this paper, we propose a 'Rasterization based on Vector Algorithm' that makes parallel pixel processing possible with a Multi-Core and Multi-Thread architecture on the Shader Core. The proposed algorithm requires only 2% of the operation count of the Scan-Line Algorithm and processes pixels independently.

A Parallel Implementation of Purge Process for Lustre File System (Lustre 파일 시스템을 위한 Purge 기능의 병렬화 구현)

  • Kwon, Min-Woo;Yoon, Jun-Weon;Hong, Tae-Young;Park, Chan-Yeol
    • Proceedings of the Korea Information Processing Society Conference / 2016.10a / pp.64-65 / 2016
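  • Supercomputers use high-performance parallel file systems such as the Lustre file system to manage large volumes of data efficiently. On supercomputers accessed by many users, such as the Tachyon phase-2 system of the fourth supercomputer at the Korea Institute of Science and Technology Information, user data accumulates without limit, which degrades the performance of the Lustre file system. In this paper, to prevent user data from accumulating, we implemented a Purge function that automatically deletes data that has not been used for a long time. In particular, to cope with the exponentially growing capacity of parallel file systems, we implemented a high-speed Purge function using parallel computing techniques. Comparing the single-node and parallel implementations, deleting 1,517 GB took 221.2 seconds on a single computing node and 49.9 seconds in a parallel environment using 16 computing nodes. This corresponds to a performance improvement of about 4.4 times for the parallel environment over the single computing node.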
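
The reported improvement follows directly from the two run times given in this abstract. The symbols $S$ (speed-up), $E$ (parallel efficiency), $T_1$, and $T_{16}$ are notation introduced here for illustration; the efficiency figure is derived from the abstract's numbers and is not stated in it.

$$
S = \frac{T_1}{T_{16}} = \frac{221.2\ \mathrm{s}}{49.9\ \mathrm{s}} \approx 4.43,
\qquad
E = \frac{S}{16} \approx 0.28
$$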

Multiplier Using CRT and Overlapped Multiple-bit Scanning Method (CRT와 중첩다중비트 주사기법을 접목한 승산기)

  • 김우완;장상동
    • Journal of KIISE:Computer Systems and Theory / v.30 no.12 / pp.749-755 / 2003
  • Digital signal processing hardware based on RNS is currently considered an important approach to high-speed, low-cost hardware realization. This research designs and implements a method for conversion from a specific residue number system with moduli of the form $(2^k-1, 2^k, 2^k+1)$ to a weighted number system. It then simulates the implementation using an overlapped multiple-bit scanning method in the CRT conversion process. The simulation shows that the CRT method adopted in this research performs arithmetic operations faster than traditional approaches, owing to the advantages of parallel processing and carry-free arithmetic.
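
The conversion described in this abstract can be illustrated with a minimal sketch that uses the textbook Chinese remainder theorem for the moduli $(2^k-1, 2^k, 2^k+1)$. It does not implement the overlapped multiple-bit scanning method; the function names are assumptions, and the product must stay below $2^k(2^{2k}-1)$ to be recovered exactly.

```python
from math import prod


def moduli(k):
    # The three pairwise coprime moduli (2^k - 1, 2^k, 2^k + 1).
    return ((1 << k) - 1, 1 << k, (1 << k) + 1)


def to_rns(x, mods):
    # Weighted (ordinary) number -> tuple of residues, one per modulus.
    return tuple(x % m for m in mods)


def rns_mul(a, b, mods):
    # Carry-free multiplication: each residue channel is independent of the
    # others, which is what makes the operation parallel.
    return tuple((x * y) % m for x, y, m in zip(a, b, mods))


def from_rns(residues, mods):
    # Residues -> weighted number via the Chinese remainder theorem.
    big_m = prod(mods)
    total = 0
    for r, m in zip(residues, mods):
        m_i = big_m // m
        total += r * m_i * pow(m_i, -1, m)  # pow(..., -1, m): inverse mod m
    return total % big_m
```

With `k = 4` the moduli are `(15, 16, 17)` and the representable range is `0` to `4079`, so a product such as `45 * 67 = 3015` survives the round trip through the residue domain.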