• Title/Summary/Keyword: Vector instruction

Search Result 34, Processing Time 0.024 seconds

Study on LLVM application in Parallel Computing System (병렬 컴퓨팅 시스템에서 LLVM 응용 연구)

  • Cho, Jungseok;Cho, Doosan;Kim, Yongyeon
    • The Journal of the Convergence on Culture Technology
    • /
    • v.5 no.1
    • /
    • pp.395-399
    • /
    • 2019
  • In order to support various parallel computing systems, it is necessary to extend LLVM IR to more efficiently support vector / matrix and to design LLVM IR to machine code as a new algorithm. As shown in the IR example, RISC instruction generation is naturally generated because the RISC instruction is basically composed of the RISC instruction, and the vector instruction is also not supported. There is a need for new IR structures, command generation algorithms and related extensions to support vector / matrix more robustly. To do this, it is important to map each instruction in the LLVM IR to the appropriate instruction in the target architecture (vector / matrix) (instruction selection algorithm). It is necessary to understand the meaning of LLVM IR command, to compare the meaning of each instruction of the target architecture with syntax, and to select the instruction that matches the pattern to make mapping efficient.

Design of Vector Register Architecture in DSP Processor for Efficient Multimedia Processing

  • Wu, Chou-Pin;Wu, Jen-Ming
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.7 no.4
    • /
    • pp.229-234
    • /
    • 2007
  • In this paper, we present an efficient instruction set architecture using vector register file hardware to accelerate operation of general matrix-vector operations in DSP microprocessor. The technique enables in-situ row-access as well as column access to the register files. It can reduce the number of memory access significantly. The technique is especially useful for block-based video signal processing kernels such as FFT/IFFT, DCT/IDCT, and two-dimensional filtering. We have applied the new instruction set architecture to in-loop deblocking filter processing in H.264 decoder. Performance comparisons show that the required load/store operations for the in-loop deblocking filter can be reduced about 42%. The architecture would improve the processing speed, and code density in DSP microprocessor especially for video signal processing substantially.

An Architecture of Vector Processor Concept using Dimensional Counting Mechanism of Structured Data (구조성 데이터의 입체식 계수기법에 의한 벡터 처리개념의 설계)

  • Jo, Yeong-Il;Park, Jang-Chun
    • The Transactions of the Korea Information Processing Society
    • /
    • v.3 no.1
    • /
    • pp.167-180
    • /
    • 1996
  • In the scalar processing oriented machine scalar operations must be performed for the vector processing as many as the number of vector components. So called a vector processing mechanism by the von Neumann operational principle. Accessing vector data hasto beperformed by theevery pointing ofthe instruction or by the address calculation of the ALU, because there is only a program counter(PC) for the sequential counting of the instructions as a memory accessing device. It should be here proposed that an access unit dimensionally to address components has to be designed for the compensation of the organizational hardware defect of the conventional concept. The necessity for the vector structuring has to be implemented in the instruction set and be performed in the mid of the accessing data memory overlapped externally to the data processing unit at the same time.

  • PDF

A Vector Instruction-based RISC Architecture for a Photovoltaic System Monitoring Camera

  • Choi, Youngho;Ahn, Hyungkeun
    • Transactions on Electrical and Electronic Materials
    • /
    • v.13 no.6
    • /
    • pp.278-282
    • /
    • 2012
  • Photovoltaic systems have emerged to be one of the cleanest energy systems. Therefore, many large scale solar parks and PV farms have been built to prepare for the post fossil fuel ages. However, due to their large scale, to efficiently manage and operate PV systems, they need to be visually monitored within the range of infrared ray through the Internet. To satisfy this need, the efficient implementation of a high performance video compression standard is required. This paper therefore presents an implementation of H.264 motion estimation, which is one of the most data-intensive and complicated functions in H.264. To achieve this, this work implements vector instructions in hardware and incorporates them in a generic RISC processor architecture, thus increasing the processing speed while minimizing hardware and software design efforts. Extensive simulation results show that this proposed implementation can process motion estimations up to 13 times faster.

Instruction Queue Architecture for Low Power Microprocessors (마이크로프로세서 전력소모 절감을 위한 명령어 큐 구조)

  • Choi, Min;Maeng, Seung-Ryoul
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.45 no.11
    • /
    • pp.56-62
    • /
    • 2008
  • Modern microprocessors must deliver high application performance, while the design process should not subordinate power. In terms of performance and power tradeoff, the instructions window is particularly important. This is because a large instruction window leads to achieve high performance. However, naive scaling conventional instruction window can severely affect the complexity and power consumption. This paper explores an architecture level approach to reduce power dissipation. We propose a low power issue logic with an efficient tag translation. The direct lookup table (DTL) issue logic eliminates the associative wake-up of conventional instruction window. The tag translation scheme deals with data dependencies and resource conflicts by using bit-vector based structure. Experimental results show that, for SPEC2000 benchmarks, the proposed design reduces power consumption by 24.45% on average over conventional approach.

Simulation on a test vector Implementation of a pipeline processor using a HDL (HDL을 이용한 파이프라인 프로세서의 테스트 벡터 구현에 의한 시뮬레이션)

  • 박두열
    • Journal of the Korea Society of Computer and Information
    • /
    • v.5 no.3
    • /
    • pp.16-28
    • /
    • 2000
  • In this paper, we implemented by describing a pipeline processor using a HDL in functional level, simulated and verified it's operation. When simulating a implemented processor. We first specify assembly instruction that is Performed in the processor. entered by programming using the instruction sets at the experimental framework. Thus, the procedure that is presented in this paper can easily identify and verify the purpose for implementation and operation of a system by using test vector. Also, it was possible that exactly simulate a system. The method was comfortable that document a system operation to implement.

  • PDF

A Vectorization Technique at Object Code Level (목적 코드 레벨에서의 벡터화 기법)

  • Lee, Dong-Ho;Kim, Ki-Chang
    • The Transactions of the Korea Information Processing Society
    • /
    • v.5 no.5
    • /
    • pp.1172-1184
    • /
    • 1998
  • ILP(Instruction Level Parallelism) processors use code reordering algorithms to expose parallelism in a given sequential program. When applied to a loop, this algorithm produces a software-pipelined loop. In a software-pipelined loop, each iteration contains a sequence of parallel instructions that are composed of data-independent instructions collected across from several iterations. For vector loops, however the software pipelining technique can not expose the maximum parallelism because it schedules the program based only on data-dependencies. This paper proposes to schedule differently for vector loops. We develop an algorithm to detect vector loops at object code level and suggest a new vector scheduling algorithm for them. Our vector scheduling improves the performance because it can schedule not only based on data-dependencies but on loop structure or iteration conditions at the object code level. We compare the resulting schedules with those by software-pipelining techniques in the aspect of performance.

  • PDF

Implementation of Mobile WiMAX Receiver using Mobile Computing Platform for SDR System (모바일 컴퓨팅 플랫폼을 이용한 SDR 기반 MOBILE WIMAX 수신기 구현)

  • Kim, Han Taek;Ahn, Chi Young;Kim, June;Choi, Seung Won
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.8 no.1
    • /
    • pp.117-123
    • /
    • 2012
  • This paper implements mobile Worldwide Interoperability for Microwave Access (WiMAX) receiver using Software Defined Radio (SDR) technology. SDR system is difficult to implement on the mobile handset because of restrictions that are computing power and under space constraints. The implemented receiver processes mobile WiMAX software modem on Open Multimedia Application Platform (OMAP) System on Chip (SoC) and Field Programmable Gate Array (FPGA). OMAP SoC is composed of ARM processor and Digital Signal Processor (DSP). ARM processor supports Single Instruction Multiple Data (SIMD) instruction which could operate on a vector of data with a single instruction and DSP is powerful image and video accelerators. For this reason, we suggest the possibility of SDR technology in the mobile handset. In order to verify the performance of the mobile WiMAX receiver, we measure the software modem runtime respectively. The experimental results show that the proposed receiver is able to do real-time signal processing.

Is vector theory prior to matrix theory in teaching of linear algebra (선형대수학의 학습에서 벡터이론은 행렬이론보다 선행되어야 하는가)

  • Pak, Hong-Kyung;Kim, Tae-Wan
    • Journal for History of Mathematics
    • /
    • v.23 no.2
    • /
    • pp.89-99
    • /
    • 2010
  • Today linear algebra is one of compulsory courses for university mathematics by virtue of its theoretical fundamentals and fruitful applications. Vector theory and matrix theory constitute of main topics in linear algebra. In the present paper we consider the question which of the two topics is prior in teaching of linear algebra. We suggest that vector theory should be prior to matrix theory contrary to the historical order of them.

SIMD Instruction-based Fast HEVC RExt Decoder (SIMD 명령어 기반 HEVC RExt 복호화기 고속화)

  • Mok, Jung-Soo;Ahn, Yong-Jo;Ryu, Hochan;Sim, Donggyu
    • Journal of Broadcast Engineering
    • /
    • v.20 no.2
    • /
    • pp.224-237
    • /
    • 2015
  • In this paper, we introduce the fast decoding method with the SIMD (Single Instruction Multiple Data) instructions for HEVC RExt (High Efficiency Video Coding Range Extensions). Several tools of HEVC RExt such as intra prediction, interpolation, inverse-quantization, inverse-transform, and clipping modules can be classified as the proper modules for applying the SIMD instructions. In consideration of bit-depth increasement of RExt, intra prediction, interpolation, inverse-quantization, inverse-transform, and clipping modules are accelerated by SSE (Streaming SIMD Extension) instructions. In addition, we propose effective implementations for interpolation filter, inverse-quantization, and clipping modules by utilizing a set of AVX2 (Advanced Vector eXtension 2) instructions that can use 256 bits register. The evaluation of the proposed methods were performed on the private HEVC RExt decoder developed based on HM 16.0. The experimental results show that the developed RExt decoder reduces 12% average decoding time, compared with the conventional sequential method.