• Title/Summary/Keyword: 병렬 구현 (parallel implementation)

Optimized Implementation of Lightweight Block Cipher SIMECK and SIMON Counter Operation Mode on 32-Bit RISC-V Processors (32-bit RISC-V 프로세서 상에서의 경량 블록 암호 SIMECK, SIMON 카운터 운용 모드 최적 구현)

  • Min-Joo Sim;Hyeok-Dong Kwon;Yu-Jin Oh;Min-Ho Song;Hwa-Jeong Seo
    • Journal of the Korea Institute of Information Security & Cryptology, v.33 no.2, pp.165-173, 2023
  • In this paper, we propose optimized implementations of the counter (CTR) operation mode of the lightweight block ciphers SIMECK and SIMON on a 32-bit RISC-V processor. Exploiting the characteristics of CTR mode, we propose a round function optimization that precomputes some values, a single-plaintext optimization, and a two-plaintext parallel optimization. Since there are no previous results for SIMECK and SIMON on RISC-V, we compared the performance of the single-plaintext and two-plaintext parallel implementations with and without the precomputation technique. The implementations with precomputation showed a performance improvement of 1% over those without it.
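The precomputation idea can be illustrated with a minimal Python sketch. It is not the paper's RISC-V implementation. It assumes 32-bit words, a CTR input block whose left word is a fixed nonce and whose right word is the counter, and two arbitrary round keys `k0` and `k1` (no key schedule is modeled). The function names are hypothetical. Under these assumptions, the nonce-dependent parts of the first two Feistel rounds are constant and can be computed once per nonce.

```python
MASK = 0xFFFFFFFF  # work on 32-bit words


def rotl(x, r):
    """Rotate a 32-bit word left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK


def f_simon(x):
    """SIMON round function: (x <<< 1 & x <<< 8) ^ (x <<< 2)."""
    return (rotl(x, 1) & rotl(x, 8)) ^ rotl(x, 2)


def f_simeck(x):
    """SIMECK round function: (x & x <<< 5) ^ (x <<< 1)."""
    return (x & rotl(x, 5)) ^ rotl(x, 1)


def two_rounds(left, right, k0, k1, f):
    """Reference: two Feistel rounds, each (l, r) -> (r ^ f(l) ^ k, l)."""
    left, right = right ^ f(left) ^ k0, left
    left, right = right ^ f(left) ^ k1, left
    return left, right


def precompute(nonce, k0, k1, f):
    """Constants that depend only on the fixed nonce and the round keys."""
    return f(nonce) ^ k0, nonce ^ k1


def two_rounds_precomputed(counter, c0, c1, f):
    """Same result as two_rounds(nonce, counter, ...) using the constants."""
    l1 = counter ^ c0   # round 1 needs one XOR instead of f plus two XORs
    l2 = c1 ^ f(l1)     # round 2 saves one XOR
    return l2, l1


c0, c1 = precompute(0x12345678, 0xA5A5A5A5, 0x0F0F0F0F, f_simon)
block = two_rounds_precomputed(7, c0, c1, f_simon)
```

Only a few operations are saved per block in this sketch, which is at least consistent with a small overall gain, although the paper's actual optimization may differ.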

Design and Implementation of a Communication Module of the Parallel Operating File System based on MISIX (MISIX 기반의 병렬 파일 시스템의 통신 모듈 설계 및 구현)

  • Jin, Sung-Kn;Cho, Jong-Hyun;Kim, Hae-Jin;Seo, Dae-Wha
    • Journal of KIISE:Computing Practices and Letters, v.6 no.4, pp.373-382, 2000
  • This paper concerns the development of the communication module of POFS (Parallel Operating File System), a parallel file system that runs on the SPAX computer. SPAX is a multiprocessor computer with a clustered SMP architecture that is being developed by ETRI. The operating system for SPAX is MISIX, which is based on the Chorus microkernel. Because POFS has a client/server architecture, the design of its communication module is important. The communication module is so easily affected by the network environment that a poor design is a major cause of reduced portability and performance in a parallel file system. This paper describes the structure and performance of the POFS communication module, issues that arose in the course of designing and developing POFS. The communication module was designed to support portability and the architecture of the parallel file system.

MRQUTER : A Parallel Qualitative Temporal Reasoner Using MapReduce Framework (MRQUTER: MapReduce 프레임워크를 이용한 병렬 정성 시간 추론기)

  • Kim, Jonghoon;Kim, Incheol
    • KIPS Transactions on Software and Data Engineering, v.5 no.5, pp.231-242, 2016
  • In order to keep up with rapid changes in Web information, current Web technologies must be extended to represent both the valid time and location of each fact and piece of knowledge, and to reason about their relationships. Until recently, most research on qualitative temporal reasoning has been conducted at laboratory scale, dealing with small knowledge bases. In this paper, we propose the design and implementation of a parallel qualitative temporal reasoner, MRQUTER, which can reason over Web-scale knowledge bases. This parallel temporal reasoner was built on a Hadoop cluster using the MapReduce parallel programming framework. It decomposes the entire qualitative temporal reasoning process into several MapReduce jobs, such as the encoding and decoding job, the inverse and equal reasoning job, the transitive reasoning job, and the refining job, and applies optimization techniques to each component reasoning job, each implemented as a pair of Map and Reduce functions. In experiments using large benchmark temporal knowledge bases, MRQUTER shows high reasoning performance and scalability.
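The transitive reasoning job can be pictured with a small in-memory Python sketch. It is not MRQUTER's code and does not use Hadoop. It assumes a single relation, "before", whose composition with itself is again "before", and it simulates the shuffle step with a dictionary. All function names are hypothetical.

```python
from collections import defaultdict


def map_phase(facts):
    """Emit every fact (a before b) under both endpoints as join keys."""
    for a, b in facts:
        yield b, ("in", a)    # a comes before the key
        yield a, ("out", b)   # the key comes before b


def reduce_phase(grouped):
    """For each middle entity, join predecessors with successors."""
    for values in grouped.values():
        preds = [v for tag, v in values if tag == "in"]
        succs = [v for tag, v in values if tag == "out"]
        for a in preds:
            for c in succs:
                yield a, c    # a before key, key before c => a before c


def transitive_closure(facts):
    """Repeat the map and reduce steps until no new fact is derived."""
    known = set(facts)
    while True:
        grouped = defaultdict(list)          # stands in for the shuffle step
        for key, value in map_phase(known):
            grouped[key].append(value)
        new_facts = set(reduce_phase(grouped)) - known
        if not new_facts:
            return known
        known |= new_facts


closure = transitive_closure({("a", "b"), ("b", "c"), ("c", "d")})
```

A full qualitative temporal reasoner would need a composition table over all temporal relations; this sketch only shows how a join on the shared middle entity maps onto Map and Reduce functions.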

A Design of CMOS Transceiver for noncoherent UWB Communication system (비동기방식 UWB통신용 CMOS 아날로그 송수신단의 설계)

  • Park, Jung-Wan;Moon, Yong;Choi, Sung-Soo
    • Journal of the Institute of Electronics Engineers of Korea SD, v.42 no.12, pp.71-78, 2005
  • In this paper, we propose a transceiver for a noncoherent OOK (On-Off Keying) Ultra Wide Band system based on magnitude detection. The proposed transceiver is designed in 0.18 micron CMOS technology and verified by SPICE simulation and measurement. It consists of a parallelizer, an analog-to-digital converter, a clock generator, a PLL, and an impulse generator. A time resolution of 1 ns is obtained with a 125 MHz system clock and 8x parallelization. The eight synchronized outputs, each with 2-bit resolution, are delivered to the baseband. The impulse generator produces a 1 ns wide pulse using digital CMOS gates. The simulation and measurement results show the feasibility of the proposed transceiver for UWB communication systems.

A Parallel Implementation of JPEG2000 4K Ultra High Definition Image using OpenCL (OpenCL을 이용한 JPEG2000 4K 초고화질 영상처리의 병렬고속화 구현)

  • Park, Daeseung;Kim, Cheong Ghil
    • Journal of Satellite, Information and Communications, v.10 no.1, pp.1-5, 2015
  • With fast-growing multimedia technology and users' strong preference for large screens, the newest video coding standard for high-quality video compression, HEVC (High Efficiency Video Coding), has been introduced. As a result, high-definition image services that are four times clearer than conventional HD video are becoming popular. JPEG 2000 is also said to support 4K and 8K UHD. This calls for fast processing technology to read and write UHD images. This paper presents a study of fast parallel processing for UHD images. To this end, JPEG 2000 is first reviewed, and then a GPU-based parallel implementation is proposed for the color conversion stage of preprocessing. The parallelized algorithm is implemented with OpenCL (Open Computing Language). The simulation results show that the proposed method processes 4K UHD images 5 times faster than a method using threads.
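To show why the color conversion stage parallelizes well, here is a minimal Python sketch of the reversible color transform (RCT) defined by JPEG 2000. The abstract does not say which JPEG 2000 color transform was used, so choosing the RCT is an assumption, and the function names are hypothetical. Each pixel is converted independently, so a GPU version could assign one work-item per pixel.

```python
def rct_forward(r, g, b):
    """JPEG 2000 reversible color transform for one RGB pixel."""
    y = (r + 2 * g + b) >> 2   # floor((R + 2G + B) / 4)
    cb = b - g
    cr = r - g
    return y, cb, cr


def rct_inverse(y, cb, cr):
    """Exact inverse of rct_forward (integer arithmetic, no loss)."""
    g = y - ((cb + cr) >> 2)   # >> floors for negative values in Python
    r = cr + g
    b = cb + g
    return r, g, b


def convert_image(pixels):
    """Apply the transform to every pixel; no pixel depends on another."""
    return [rct_forward(r, g, b) for r, g, b in pixels]


converted = convert_image([(255, 255, 255), (10, 200, 30), (0, 0, 0)])
```

Because the loop body has no cross-pixel dependency, the list comprehension in `convert_image` corresponds directly to a data-parallel kernel launch over the image.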

Development of 8kW ZVZCS Full Bridge DC-DC Converter by Parallel Operation (병렬제어를 적용한 8kW급 영전압/영전류 풀 브릿지 DC-DC 컨버터 개발)

  • Rho, Min-Sik
    • The Transactions of the Korean Institute of Power Electronics, v.12 no.5, pp.400-408, 2007
  • This paper presents the development of an 8 kW parallel-module converter. For an effective configuration of the FB-PWM converter, it proposes 4-parallel operation of 2 kW modules. The FB converter of each 2 kW module is controlled by phase-shift PWM, and a simple auxiliary circuit is applied on the secondary side to achieve ZVZCS. To achieve ZCS, the control logic for the auxiliary circuit is designed to reset the primary current during the free-wheeling period. Charge control is employed for output current sharing among the four modules, and the charge control logic is designed together with the phase-shift PWM logic. The voltage controller is implemented on a DSP (TMS320LF2406) using A/D conversion data of the output current and voltage of each module. The developed converter is installed in the PCU (Power Conditioning Unit) for an HSG (High Speed Generator) in a vehicle, and a health monitoring system is implemented for vehicle operation tests. Finally, the performance of the developed converter is verified under practical operation of the HSG.

Implementation of Efficient Power Method on CUDA GPU (CUDA 기반 GPU에서 효율적인 Power Method의 구현)

  • Kim, Jung-Hwan;Kim, Jin-Soo
    • Journal of the Korea Society of Computer and Information, v.16 no.2, pp.9-16, 2011
  • GPU computing is emerging in high-performance applications because it can exploit massive parallelism cost-effectively. The power method, which finds the dominant eigenvector of a given matrix, is widely used in applications such as PageRank, which calculates the importance of web pages. In this research, we parallelized the power method efficiently on a GPU and also suggested how its performance can be improved further. The power method consists mainly of matrix-vector products, which are easily parallelized. However, it must also test the eigenvector for convergence and then scale the vector. These operations incur several GPU kernel calls and data movement between host and GPU memory. We improved the performance of the power method by reducing the number of GPU kernel calls, optimizing thread allocation, and improving the convergence test.
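The three steps named in the abstract (matrix-vector product, scaling, convergence test) can be seen in a plain Python sketch of the power method. This is a sequential illustration, not the paper's CUDA code. It assumes a square matrix given as a list of rows and a positive dominant eigenvalue, and it scales by the largest absolute component. The names `mat_vec` and `power_method` are hypothetical.

```python
def mat_vec(a, x):
    """Matrix-vector product; each output row is independent of the others."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def power_method(a, tol=1e-10, max_iter=1000):
    """Return (eigenvalue estimate, eigenvector scaled to max component 1)."""
    x = [1.0] * len(a)
    eigenvalue = 0.0
    for _ in range(max_iter):
        y = mat_vec(a, x)                      # step 1: product
        norm = max(abs(v) for v in y)
        if norm == 0.0:
            return 0.0, x                      # x lies in the null space
        y = [v / norm for v in y]              # step 2: scaling
        diff = max(abs(p - q) for p, q in zip(x, y))
        x, eigenvalue = y, norm
        if diff < tol:                         # step 3: convergence test
            break
    return eigenvalue, x


value, vector = power_method([[2.0, 0.0], [0.0, 1.0]])
```

On a GPU, steps 2 and 3 each involve a reduction (the maximum) and would typically be separate kernel launches, which is the overhead the abstract describes reducing.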

Hardware Implementation of Minimized Serial-Divider for Image Frame-Unit Processing in Mobile Phone Camera (Mobile Phone Camera의 이미지 프레임 단위 처리를 위한 소형화된 Serial-Divider의 하드웨어 구현)

  • Kim, Kyung-Rin;Lee, Sung-Jin;Kim, Hyun-Soo;Kim, Kang-Joo;Kang, Bong-Soon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference, 2007.10a, pp.119-122, 2007
  • In this paper, we propose a hardware design method for the division operation used in frame-unit image processing in a mobile phone camera. In general, there are two types of data processing: parallel and serial. The parallel type can process in real time, but it requires significant hardware because of its many comparators and buffer memories. Compared with the parallel type, the serial type needs less hardware because it uses only one comparator, but it cannot process in real time. To use hardware resources efficiently, we employ a serial divider, since frame-unit operations in image processing do not require real-time processing. At the same bit size and operating frequency, the hardware size of the serial divider is approximately 13 percent of that of the parallel divider.
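The trade-off described above can be made concrete with a Python model of a bit-serial divider. This is a sketch of the standard restoring shift-and-subtract scheme, assumed here for illustration; the abstract does not specify the paper's exact circuit. Each loop iteration stands for one clock cycle and reuses the same comparison, which is why a single comparator suffices at the cost of `bits` cycles per division. The function name and the 16-bit default width are hypothetical.

```python
def serial_divide(dividend, divisor, bits=16):
    """Unsigned restoring division, producing one quotient bit per cycle.

    The dividend is assumed to fit in `bits` bits.
    """
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    remainder = 0
    quotient = 0
    for i in range(bits - 1, -1, -1):
        # Shift the next dividend bit (most significant first) into the remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        if remainder >= divisor:   # the one comparator, reused every cycle
            remainder -= divisor
            quotient |= 1
    return quotient, remainder


q, r = serial_divide(100, 7)
```

A parallel divider would unroll this loop into `bits` comparator stages that all exist in hardware at once, which gives a result per cycle but multiplies the area.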

The Design of Parallel Processing S/W Using CUDA for Realtime 3D Laser Ladar Imaging System (실시간 3차원 레이저 레이더 영상 생성을 위한 CUDA 기반 병렬처리 소프트웨어 설계)

  • Cho, Yong Il;Ha, Choong Lim;Yang, Ji Hyeon;Kim, Jae Hyup
    • Journal of the Korea Society of Computer and Information, v.18 no.1, pp.1-10, 2013
  • In this paper, we propose a CUDA (Compute Unified Device Architecture) based software design method for a parallel CPU (Central Processing Unit) and GPU (Graphics Processing Unit) structure to achieve real-time processing in a 3D laser radar (LADAR) imaging system. LADAR is a complex system that generates 3-dimensional images from laser ranging information and requires massive processing resources in each phase. Designing and implementing a parallel structure is therefore crucial to achieving real-time processing within limited system resources. By analyzing the processing algorithm of each phase and allocating the separable workload to the CUDA GPU, we meet the required real-time processing speed and confirm a 46% increase in processing speed.

The study of striping size according to the amount of storage nodes in the Parallel Media Stream Server (병렬 미디어 스트림 서버에서 저장노드수의 변화에 따른 스트라이핑 크기 결정에 관한 연구)

  • Kim, Seo-Gyun;Nam, Ji-Seung
    • The KIPS Transactions:PartC, v.8C no.6, pp.765-774, 2001
  • In this paper, we propose a striping policy for the storage nodes of a Linux-based parallel media stream server. We developed a new storage clustering architecture, which we call the system RAID architecture. In this system, many storage cluster nodes are grouped to operate as a single server. The system uses its own striping policy to distribute multimedia files across the parallel storage nodes. When a service request occurs, each storage cluster node transmits its striped files concurrently to the clients. This scheme distributes the preprocessing load fairly across all storage cluster nodes. The distinguishing feature of this system is a relative striping policy that takes the file type, service type, and number of storage nodes into account to provide the best service.
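How striping spreads a file over storage nodes can be sketched in a few lines of Python. This shows plain round-robin striping with a fixed stripe size, which is an assumption made for illustration. The paper's relative policy, which adapts to file type, service type, and node count, is not reproduced here. The function names are hypothetical.

```python
def stripe(data, num_nodes, stripe_size):
    """Split data into stripe units and deal them to nodes round-robin."""
    nodes = [[] for _ in range(num_nodes)]
    for index, offset in enumerate(range(0, len(data), stripe_size)):
        nodes[index % num_nodes].append(data[offset:offset + stripe_size])
    return nodes


def reassemble(nodes):
    """Client side: read unit i from every node in turn to rebuild the file."""
    units = []
    longest = max(len(node) for node in nodes)
    for i in range(longest):
        for node in nodes:
            if i < len(node):
                units.append(node[i])
    return b"".join(units)


layout = stripe(bytes(range(10)), num_nodes=3, stripe_size=2)
restored = reassemble(layout)
```

With more nodes or a smaller stripe size, each node holds more and shorter units, so choosing the stripe size as a function of the node count changes how evenly the load is shared.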
