• Title/Summary/Keyword: pipelining

Search Results: 141

Priority Data Handling in Pipeline-based Workflow (파이프라인 기반 워크플로우의 우선 데이터 처리 방안)

  • Jeon, Wonpyo;Heo, Daeyoung;Hwang, Suntae
    • KIISE Transactions on Computing Practices / v.23 no.12 / pp.691-697 / 2017
  • Volcanic ash is predicted to be the main source of damage in a potential volcanic disaster around Mount Baekdu and across the Korean peninsula. Computer simulations that predict the diffusion of volcanic ash must be completed within a predetermined time under the prevailing meteorological conditions, so a workflow that uses pipelining is proposed to parallelize the simulation software. Because of the nature of volcanic disasters, the simulations must be run for a range of plausible conditions, since the parameters cannot be determined precisely, even at the time of an eruption. Among these conditions, the one with the highest probability should be computed first so that its results can guide the initial disaster response; further action can then be taken as subsequent results arrive. The computations must run in a volcanic disaster damage prediction system on a computing server with limited performance, so computing resources must be allocated optimally. We propose a method that lets specific data be processed first in the proposed pipeline-based workflow.
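
The abstract does not describe the mechanism, so what follows is only a minimal sketch, under stated assumptions, of one way priority data can move through a pipeline: each stage reads from a priority queue keyed on scenario probability instead of a FIFO. The names `Job` and `run_pipeline`, the one-job-per-tick stage model, and the `(arrival_tick, name, probability)` input format are hypothetical and are not the authors' design.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Job:
    sort_key: float                   # -probability, so the most likely scenario pops first
    seq: int                          # arrival order; breaks ties first-in, first-out
    name: str = field(compare=False)  # scenario label, ignored when ordering


def run_pipeline(arrivals, n_stages=3, use_priority=True):
    """Simulate a synchronous pipeline in which every stage finishes one job per tick.

    arrivals: list of (arrival_tick, name, probability) tuples (hypothetical input).
    Each stage pulls from its own priority queue, so a high-probability scenario
    that arrives late can overtake queued low-probability ones.
    Returns scenario names in completion order.
    """
    order = sorted(range(len(arrivals)), key=lambda i: arrivals[i][0])
    queues = [[] for _ in range(n_stages)]
    done, tick, nxt = [], 0, 0
    while nxt < len(order) or any(queues):
        # Admit every scenario that has arrived by this tick into stage 0.
        while nxt < len(order) and arrivals[order[nxt]][0] <= tick:
            i = order[nxt]
            _, name, prob = arrivals[i]
            key = -prob if use_priority else 0.0
            heapq.heappush(queues[0], Job(key, i, name))
            nxt += 1
        # Walk stages back to front so a job advances at most one stage per tick.
        for s in reversed(range(n_stages)):
            if queues[s]:
                job = heapq.heappop(queues[s])
                if s + 1 < n_stages:
                    heapq.heappush(queues[s + 1], job)
                else:
                    done.append(job.name)
        tick += 1
    return done
```

With arrivals `[(0, "low-A", 0.2), (0, "low-B", 0.1), (1, "high", 0.7)]`, `run_pipeline` returns `["low-A", "high", "low-B"]`, while `use_priority=False` yields `["low-A", "low-B", "high"]`: the late but more probable scenario overtakes a queued one.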

Implementation of Optimizing Compiler for Bus-based VLIW Processors (버스기반의 VLIW형 프로세서를 위한 최적화 컴파일러 구현)

  • Hong, Seung-Pyo;Moon, Soo-Mook
    • Journal of KIISE:Computer Systems and Theory / v.27 no.4 / pp.401-407 / 2000
  • Modern microprocessors exploit instruction-level parallelism to increase performance. In particular, VLIW processors backed by parallelizing compilers are increasingly used in specialized applications such as high-end DSP and graphics processing. The bus-based VLIW architecture was proposed for such applications and designed to reduce forwarding-unit overhead and instruction width. In this paper, we introduce an optimizing scheduling compiler developed for the proposed bus-based VLIW processor. First, we describe how interconnections between buses and resource-usage patterns are modeled. Then, on the basis of this model, machine-dependent optimizations such as bus-to-register promotion, copy coalescing, and operand substitution were implemented. Optimizations for general-purpose VLIW microprocessors, such as selective scheduling and enhanced pipeline scheduling (EPS), were also implemented. Experimental results show a performance gain of about 20% on multimedia application benchmarks.
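
Copy coalescing and operand substitution both aim to remove register-to-register copies. As a rough illustration only, here is a generic copy-elimination sketch for straight-line code in SSA form; it is not the paper's bus-aware algorithm. The tuple IR `(dst, op, srcs)`, the `"mov"` opcode, and the `live_out` parameter are hypothetical, and full coalescing of non-SSA code (which needs an interference check) is not shown.

```python
def eliminate_copies(block, live_out=frozenset()):
    """Remove register copies from a straight-line block in SSA form.

    block: list of (dst, op, srcs) tuples; op == "mov" marks a copy dst <- srcs[0].
    Because each name is defined exactly once (SSA assumption), every later use of a
    copy's destination can read the copy's source instead; copies whose destination
    is live out of the block are kept.
    """
    alias = {}  # copy destination -> original value name
    out = []
    for dst, op, srcs in block:
        # Operand substitution: rewrite each source through earlier copies.
        srcs = tuple(alias.get(s, s) for s in srcs)
        if op == "mov" and dst not in live_out:
            alias[dst] = srcs[0]  # srcs[0] is already resolved, so copy chains collapse
        else:
            out.append((dst, op, srcs))
    return out
```

For example, in `t2 = mov t1; t3 = mov t2; t4 = mul t3, c`, both copies disappear and the multiply reads `t1` directly.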

A Vectorization Technique at Object Code Level (목적 코드 레벨에서의 벡터화 기법)

  • Lee, Dong-Ho;Kim, Ki-Chang
    • The Transactions of the Korea Information Processing Society / v.5 no.5 / pp.1172-1184 / 1998
  • ILP (instruction-level parallelism) processors rely on code-reordering algorithms to expose the parallelism in a sequential program. Applied to a loop, such an algorithm produces a software-pipelined loop, in which each iteration of the new loop body consists of parallel instruction groups made up of data-independent instructions drawn from several original iterations. For vector loops, however, software pipelining cannot expose the maximum parallelism, because it schedules the program based only on data dependencies. This paper proposes a different scheduling approach for vector loops. We develop an algorithm that detects vector loops at the object-code level and propose a new vector scheduling algorithm for them. Our vector scheduling improves performance because it schedules based not only on data dependencies but also on loop structure and iteration conditions at the object-code level. We compare the performance of the resulting schedules with that of schedules produced by software-pipelining techniques.
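
The structure of a software-pipelined loop described above can be made concrete with a small sketch. It assumes every operation takes one cycle and that the stages of one iteration form a simple dependence chain; the function name `software_pipeline` and the stage names are illustrative, and the paper's vector scheduling algorithm is not reproduced.

```python
def software_pipeline(n_iters, stages):
    """Return the cycle-by-cycle issue schedule of a software-pipelined loop.

    stages: the operations of one iteration in dependence order, each assumed
    to take one cycle. At cycle t, stage s of iteration t - s is issued, so one
    row mixes independent operations taken from up to len(stages) iterations.
    """
    k = len(stages)
    schedule = []
    for t in range(n_iters + k - 1):
        row = [(stages[s], t - s) for s in range(k) if 0 <= t - s < n_iters]
        schedule.append(row)
    return schedule


for cycle, row in enumerate(software_pipeline(4, ["load", "mul", "store"])):
    print(cycle, row)
```

The first `k - 1` rows form the prologue, the full rows in the middle are the kernel (for example, cycle 2 issues `load` of iteration 2, `mul` of iteration 1, and `store` of iteration 0), and the last `k - 1` rows form the epilogue.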

A VLSI Architecture of Systolic Array for FFT Computation (고속 퓨리어 변환 연산용 VLSI 시스토릭 어레이 아키텍춰)

  • 신경욱;최병윤;이문기
    • Journal of the Korean Institute of Telematics and Electronics / v.25 no.9 / pp.1115-1124 / 1988
  • A two-dimensional systolic array for the fast Fourier transform (FFT), with a regular and recursive VLSI architecture, is presented. The array is built from identical processing elements (PEs) arranged in a mesh and, owing to its modularity, can be expanded to an arbitrary size. A PE consists of two data-routing units, a butterfly arithmetic unit, and a simple control unit. The array computes the FFT through three procedures: I/O pipelining, data shuffling, and butterfly arithmetic. By exploiting parallelism, pipelining, and local communication geometry during data movement, the two-dimensional systolic array eliminates the global and irregular communication problems that have limited VLSI implementations of FFT processors. The systolic array executes half-butterfly arithmetic based on distributed arithmetic, which performs multiplication with adders only. The systolic array also provides 100% PE activity, i.e., no PE is ever idle. A chip for half-butterfly arithmetic, consisting of two BLC adders and registers, has been fabricated in a 3-µm single-metal P-well CMOS technology. With a half-butterfly execution time of about 500 ns, obtained by critical-path delay simulation, the total execution time for a 1024-point FFT is estimated at about 16.6 µs at a clock frequency of 20 MHz. A one-PE chip expandable to an array of any size is being fabricated in a 2-µm double-metal P-well CMOS process. The chip was laid out using a standard-cell library and a BLC-adder macrocell with the aid of auto-routing software. It contains around 6000 transistors and 68 I/O pads in a 3.4 × 2.8 mm² area. A built-in self-test circuit, BILBO (Built-In Logic Block Observer), was employed at the cost of 3% hardware overhead.
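
The idea of multiplying "with adders only" through distributed arithmetic can be illustrated with a generic DA inner product: coefficient sums are precomputed into a lookup table, and the inputs are consumed one bit-plane at a time with shifts and adds. This is a textbook sketch in Python, assuming integer coefficients and `bits`-bit two's-complement inputs; the function name and parameters are hypothetical, and the paper's half-butterfly circuit is not modeled.

```python
def da_inner_product(coeffs, xs, bits=8):
    """Compute sum(c * x) by distributed arithmetic: table lookups, shifts and adds only.

    coeffs: fixed integer coefficients (known when the hardware is built).
    xs: integer inputs in `bits`-bit two's-complement range.
    """
    k = len(coeffs)
    # LUT entry m holds the sum of the coefficients selected by the bits of m (2**k entries).
    lut = [sum(c for j, c in enumerate(coeffs) if (m >> j) & 1) for m in range(1 << k)]
    acc = 0
    for b in range(bits):
        # Bit b of every input forms a k-bit LUT address.
        addr = 0
        for j, x in enumerate(xs):
            addr |= ((x >> b) & 1) << j
        term = lut[addr] << b
        # The two's-complement sign bit carries weight -2**(bits-1), so subtract it.
        acc += -term if b == bits - 1 else term
    return acc
```

In hardware the table is fixed at design time, so each bit-plane costs one lookup and one shift-add, which is why no multiplier is needed.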

Investigation of Small MPU Design and its Pipelining by Research CAD Tools (연구용 CAD툴에 의한 소형 MPU의 설계 및 파이프라인화의 고찰)

  • Lee, Su-Jeong;Park, Do-Sun;Song, Nak-Yun
    • The Transactions of the Korea Information Processing Society / v.1 no.4 / pp.517-530 / 1994
  • In this paper, a small microprocessor unit (MPU) is designed top-down using research-purpose VHDL and CAD tools. To this end, a basic MPU and a pipelined version of its architecture are proposed. Once the design target, instruction set, and architecture are fixed, operation is first verified by a C-language simulation and then by VHDL simulation, checking the internal register contents for given inputs. Layouts are then produced with full-custom and semi-custom design methods using the research CAD tools and are simulated. Simulation confirms that the proposed pipelined structure is feasible for improving performance, and related problems and future research directions are discussed. As a result, this paper establishes an MPU design methodology under which the architecture can also be changed.
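
The abstract does not report the pipeline depth or the measured gain, so as a reference point only, here is the textbook speedup of an ideal k-stage pipeline. It assumes equal stage times τ, n instructions, no stalls or hazards, and no latch overhead; real MPU pipelines fall below this bound.

```latex
% Ideal k-stage pipeline: stage time \tau, n instructions, no stalls (assumed)
T_{\text{seq}} = n\,k\,\tau, \qquad
T_{\text{pipe}} = (k + n - 1)\,\tau, \qquad
S = \frac{T_{\text{seq}}}{T_{\text{pipe}}} = \frac{n\,k}{k + n - 1}
\;\xrightarrow{\;n \to \infty\;}\; k
% Example: k = 5, n = 100 gives S = 500 / 104 \approx 4.81
```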

Block-wise Skipping for Embedded Database System (임베디드 데이터베이스 시스템을 위한 블록 단위 스키핑 기법)

  • Chong, Jae-Hyok;Park, Hyoung-Min;Hong, Seok-Jin;Shim, Kyu-Seok
    • The KIPS Transactions:PartD / v.16D no.6 / pp.835-844 / 2009
  • Today, most query processors use pipelining to achieve fast response time (first-record latency) and low memory usage. Each operator node in the query execution plan (QEP) exposes Open(), Next(), and Close() as its interface, implementing the iterator mechanism. However, embedded database systems for flash-memory-based mobile devices usually also require a Previous() function, which returns the record preceding the current position. This is because mobile devices in embedded environments cannot provide enough main memory to store all query results. So whenever previously read records are needed again, the user (program) must re-fetch them with Previous(): a backward data fetch. In this paper, we introduce the 'Direction Switching Problem' caused by Previous() and propose a 'Block-wise Skipping' method that fully exploits the block-based data transfer mechanism used by most of today's relational database management systems.
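
To make the iterator interface concrete, here is a minimal sketch of a leaf scan operator with Open/Next/Close plus a Previous() that reads records through a one-block buffer, so moving backward inside the current block costs no extra transfer. The class `BlockScan`, its `fetches` counter, and the in-memory `rows` list standing in for flash storage are hypothetical; the paper's Block-wise Skipping algorithm itself is not reproduced here.

```python
class BlockScan:
    """Leaf operator reading a table in fixed-size blocks (hypothetical storage)."""

    def __init__(self, rows, block_size=4):
        self.rows, self.bs = rows, block_size

    def open(self):
        self.pos = -1          # index of the last record returned
        self.block_id = None   # which block is currently buffered
        self.block = []
        self.fetches = 0       # number of block transfers performed

    def _load(self, block_id):
        # Transfer a whole block only when the cursor leaves the buffered one.
        if block_id != self.block_id:
            start = block_id * self.bs
            self.block = self.rows[start:start + self.bs]
            self.block_id = block_id
            self.fetches += 1

    def next(self):
        if self.pos + 1 >= len(self.rows):
            return None
        self.pos += 1
        self._load(self.pos // self.bs)
        return self.block[self.pos % self.bs]

    def previous(self):
        if self.pos <= 0:
            return None
        self.pos -= 1
        self._load(self.pos // self.bs)  # same block: no transfer; else load the earlier block
        return self.block[self.pos % self.bs]

    def close(self):
        self.block = []
```

After six `next()` calls over ten rows with `block_size=4`, two blocks have been transferred; the first `previous()` is served from the buffer, and the second crosses back into block 0 and triggers a third transfer.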

PDA-based Text Extraction System using Client/Server Architecture (Client/Server구조를 이용한 PDA기반의 문자 추출 시스템)

  • Park, Anjin;Jung, Keechul
    • Journal of KIISE:Software and Applications / v.32 no.2 / pp.85-98 / 2005
  • Recently, much research has been carried out on mobile vision using personal digital assistants (PDAs). Many PDA CPUs are integer-only and lack a floating-point unit, so vision and image-processing algorithms, which involve heavy floating-point computation, run slowly. To resolve this weakness, we propose a client (PDA)/server (PC) architecture connected over a wireless LAN, and we build the system to pipeline the processing of an image sequence across the two CPUs of the client (PDA) and the server (PC). The client (PDA) extracts tentative text regions using edge density (ED). The server (PC) uses both a multi-layer perceptron (MLP)-based texture classifier and connected-component (CC)-based filtering to extract the final text from the client's tentatively extracted results. The proposed method achieves both efficient text extraction, by combining the MLP and CC, and fast running time, through the pipelined client (PDA)/server (PC) architecture.
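
The client-side edge density step can be sketched as follows: mark pixels with a strong horizontal or vertical intensity difference as edges, then flag tiles whose fraction of edge pixels is high. The abstract does not define ED precisely, so the function name, the difference-based edge test, the tile size `win`, and the thresholds `edge_thr` and `density_thr` are all hypothetical choices.

```python
def edge_density_regions(img, win=4, edge_thr=40, density_thr=0.3):
    """Return (row, col) of the top-left corner of every win x win tile whose
    fraction of edge pixels exceeds density_thr.

    img: 2-D list of grayscale values in 0-255.
    """
    h, w = len(img), len(img[0])
    # Mark a pixel as an edge if its horizontal or vertical difference is large.
    edge = [[0] * w for _ in range(h)]
    for y in range(h - 1):
        for x in range(w - 1):
            gx = abs(img[y][x + 1] - img[y][x])
            gy = abs(img[y + 1][x] - img[y][x])
            edge[y][x] = 1 if max(gx, gy) > edge_thr else 0
    regions = []
    for ty in range(0, h - win + 1, win):
        for tx in range(0, w - win + 1, win):
            count = sum(edge[y][x] for y in range(ty, ty + win) for x in range(tx, tx + win))
            if count / (win * win) > density_thr:
                regions.append((ty, tx))  # tentative text region for the server to verify
    return regions
```

Only integer arithmetic and one division per tile are needed, which suits an integer-only PDA CPU; the tiles it returns would then be passed to the server-side MLP and CC stages.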

A Design of Giga-bit security module Using Fully pipelined CTR-AES (Full-pipelined CTR-AES를 이용한 Giga-bit 보안모듈 설계)

  • Vinh, T.Q.;Park, Ju-Hyun;Kim, Young-Chul
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2008.05a / pp.225-228 / 2008
  • In this paper, we present an implementation of counter-mode (CTR) AES on a Virtex-4 FPGA. For a fully pipelined architecture, our design exploits three advanced features: composite-field-arithmetic SubBytes, an efficient MixColumns transformation, and on-the-fly key scheduling. By pipelining the composite-field implementation of the S-box, the area cost is reduced to 17 percent on average. Using on-the-fly key scheduling, we implemented an efficient key-expander module specialized for the pipelined architecture.
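
A pipelined S-box has to reproduce the standard AES SubBytes mapping, so a small reference model is useful for checking it. The sketch below computes the S-box directly as the multiplicative inverse in GF(2^8) followed by the AES affine transform; the paper instead maps the inverse into a composite field GF((2^4)^2) to save hardware, which is not shown here. The helper names `gf_mul`, `gf_inv`, `rotl8`, and `sbox` are illustrative.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        hi = a & 0x80
        a = (a << 1) & 0xFF
        if hi:
            a ^= 0x1B  # reduce by the AES polynomial
        b >>= 1
    return p


def gf_inv(a):
    """Return a^254, which is a^-1 for a != 0 and maps 0 to 0, as AES requires."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def rotl8(x, n):
    return ((x << n) | (x >> (8 - n))) & 0xFF


def sbox(x):
    """AES SubBytes for one byte: field inverse, then the affine transform."""
    b = gf_inv(x)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63
```

For instance, `sbox(0x53)` gives `0xED`, matching the worked example in FIPS-197, so the model can serve as a golden reference when simulating a composite-field, pipelined S-box.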

Enhanced Pipeline Scheduling for IA-64 (IA-64를 위한 향상된 소프트웨어 파이프라인 명령어 스케줄링)

  • Lee, Jae-Mok;Moon, Soo-Mook
    • Proceedings of the Korean Information Science Society Conference / 2005.11a / pp.826-828 / 2005
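  • Intel's IA-64 processor adopts the EPIC (Explicitly Parallel Instruction Computing) architecture, which supports instruction-level parallel execution and relies on the compiler to schedule, from sequential code, independent instructions that can execute in parallel. In this paper, we present the results of applying the Enhanced Pipeline Scheduling (EPS) technique [1] to IA-64 scheduling. EPS is a software pipelining technique for loop-level parallelization that is based on global scheduling. We implemented EPS in ORC (Open Research Compiler), an open-source compiler for the IA-64 processor, and ran experiments on an actual Itanium processor. We describe the experience gained while implementing and tuning EPS in a commercial processor and compiler, and we report and analyze the performance improvement over the existing ORC compiler.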

Comparative Performance Analysis of High Speed Low Power Area Efficient FIR Adaptive Filter

  • Jaiswal, Manish
    • IEIE Transactions on Smart Processing and Computing / v.3 no.5 / pp.267-270 / 2014
  • This paper compares the performance of adaptive FIR filter implementations of the Delayed LMS algorithm. The delayed error signal is used to obtain the Delayed LMS algorithm, which allows efficient pipelining and thus a short critical path and an area-efficient implementation. We report hardware efficiency (device utilization parameters) and power consumption for the Artix-7, Virtex-7, and Kintex-7 FPGA families from a low-voltage perspective. The synthesis results show that the Artix-7 CMOS family achieves the lowest power consumption, 1.118 mW, at 83.18% device utilization. Different Precision strategies, such as speed optimization and power optimization, were applied to achieve these results. The algorithm was implemented in MATLAB (2013b) and synthesized with Leonardo Spectrum.
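
The Delayed LMS update can be illustrated with a small floating-point behavioral model that identifies an unknown FIR system. It applies the weight update with the error and input from `delay` samples earlier, w(n+1) = w(n) + μ·e(n−D)·x(n−D), which is the slack that lets hardware pipeline the error-feedback path. The function name, the step size `mu`, the delay, the sample count, and the uniform white-noise input are all hypothetical choices; this is not the paper's MATLAB code or its fixed-point FPGA design.

```python
import random
from collections import deque


def dlms_identify(true_taps, mu=0.05, delay=2, n_samples=4000, seed=0):
    """Identify an unknown FIR system with the delayed LMS (DLMS) update
    w(n+1) = w(n) + mu * e(n - D) * x(n - D), where D = delay.

    Returns the adapted weight vector.
    """
    rng = random.Random(seed)
    taps = len(true_taps)
    w = [0.0] * taps
    x = [0.0] * taps                   # tap-delay line, newest sample first
    history = deque(maxlen=delay + 1)  # (error, input vector) for the last D+1 samples
    for _ in range(n_samples):
        x = [rng.uniform(-1.0, 1.0)] + x[:-1]
        d = sum(t * v for t, v in zip(true_taps, x))  # output of the unknown system
        y = sum(wi * v for wi, v in zip(w, x))        # output of the adaptive filter
        history.append((d - y, x))
        if len(history) == delay + 1:
            # Use the error and input from `delay` samples ago; in hardware this
            # slack is what allows the error-feedback path to be pipelined.
            e_old, x_old = history[0]
            w = [wi + mu * e_old * v for wi, v in zip(w, x_old)]
    return w
```

With a small step size the delayed update still converges to the unknown taps, at the cost of a tighter stability bound on `mu` than plain LMS; `delay=0` reduces it to the ordinary LMS update.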