• Title/Summary/Keyword: SIMD 구조 (SIMD architecture)

Design of SIMD-DSP/FPU for a High-Performance Embedded Microprocessor (고성능 내장형 마이크로프로세서를 위한 SIMD-DSP/FPU의 설계)

  • 정우경;홍인표;이용주;이용석
    • The Journal of Korean Institute of Communications and Information Sciences / v.27 no.4C / pp.388-397 / 2002
  • We designed a SIMD-DSP/FPU that efficiently improves multimedia processing performance when integrated into high-performance embedded microprocessors. We propose partitioned architectures and new schemes for several functional units to reduce chip area; sharing functional units significantly reduces the area of the FPU. The proposed architecture is modeled in HDL and synthesized with a 0.35 $\mu\textrm{m}$ standard cell library. The chip area is estimated at about 100,000 equivalent gates. The designed unit can run faster than the 50 MHz clock frequency of the CPU core under worst-case operating conditions.

Design of Compiler & Variable-Length Instructions for SIMD Structured Shader (가변길이 SIMD구조 쉐이더 명령어 및 컴파일러 설계)

  • Kwak, Jae-Chang;Park, Tae-Ryoung
    • Journal of the Korea Institute of Information and Communication Engineering / v.14 no.12 / pp.2691-2697 / 2010
  • Shader instructions and a compiler are designed to support the 3D graphics Shader 3.0 API. Variable-length instructions are proposed to reduce the hardware size of a SIMD-structured graphics processor by shortening instruction length. The designed shader compiler supports variable-length, two-phase structured instructions and is programmable at the ESSL level. The conformance test provided by the Khronos Group was run to verify the designed instructions and compiler. The test results show an overall average performance improvement of 37% across the 16 basic GL shader functions.

A Parallel Memory Suitable for SIMD Architecture Processing High-Definition Image Haze Removal in High-Speed (고화질 영상에서 고속 안개 제거를 위한 SIMD 구조에 적합한 병렬메모리)

  • Lee, Hyung
    • Journal of the Korea Society of Computer and Information / v.19 no.7 / pp.9-16 / 2014
  • Since the haze removal algorithm using the dark channel prior was introduced, much research has addressed improving its processing speed, even though it produces impressive results. A notable approach uses the median dark channel prior. Although it is considered a very attractive method, its processing speed remains low. This paper therefore introduces a parallel memory model suited to a SIMD architecture that performs haze removal on high-definition images at high speed. The proposed parallel memory model allows n pixels to be accessed simultaneously. It also supports strides of 3, 5, 7, and 11 in order to execute convolution mask operations such as median filtering. The proposed parallel memory model can therefore supply enough data bandwidth to run the median dark channel prior algorithm at high speed.
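The strided, simultaneous access described in this abstract can be illustrated with a small sketch. It assumes a plain interleaved mapping (module index = address mod m), which is a common textbook scheme and not necessarily the mapping the paper uses; the names `modules_hit` and `conflict_free` and the choice of m = 16 modules are hypothetical.

```python
def modules_hit(base, stride, n, m):
    """Memory module touched by each of n accesses starting at base.

    Assumes a simple interleaved layout: module = address mod m.
    """
    return [(base + k * stride) % m for k in range(n)]


def conflict_free(base, stride, n, m):
    """True if all n accesses land in distinct modules (one cycle, no stalls)."""
    hits = modules_hit(base, stride, n, m)
    return len(set(hits)) == n


if __name__ == "__main__":
    m = 16  # hypothetical number of memory modules
    for stride in (3, 5, 7, 11, 2):
        print(stride, conflict_free(0, stride, n=16, m=m))
```

Under this assumed mapping, the odd strides 3, 5, 7, and 11 are coprime to 16, so sixteen strided accesses reach sixteen different modules, while an even stride such as 2 collides.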

A Study on the 3 Dimension Graphics Accelerator for Phong Shading Algorithm (Phong Shading 알고리즘을 적용한 3차원 영상을 위한 고속 그래픽스 가속기 연구)

  • Park, Youn-Ok;Park, Jong-Won
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.10 no.5 / pp.97-103 / 2010
  • Many algorithms for 2D-to-3D graphics conversion have high complexity and a large amount of iterative computation. This paper therefore proposes a parallel algorithm and a high-speed graphics accelerator architecture using Park's MAMS (Multiple Access Memory System) for Phong shading, one of many 3D algorithms. The proposed SIMD processor architecture was simulated in HDL and produced a result 30 times faster. This suggests that any kind of 3D algorithm can be parallelized and accelerated by a SIMD processor with Park's MAMS for real-time processing.

SIMD Instruction-based Fast HEVC RExt Decoder (SIMD 명령어 기반 HEVC RExt 복호화기 고속화)

  • Mok, Jung-Soo;Ahn, Yong-Jo;Ryu, Hochan;Sim, Donggyu
    • Journal of Broadcast Engineering / v.20 no.2 / pp.224-237 / 2015
  • In this paper, we introduce a fast decoding method using SIMD (Single Instruction, Multiple Data) instructions for HEVC RExt (High Efficiency Video Coding Range Extensions). Several modules of HEVC RExt, such as intra prediction, interpolation, inverse quantization, inverse transform, and clipping, are well suited to SIMD instructions. To account for the increased bit depth of RExt, these modules are accelerated with SSE (Streaming SIMD Extensions) instructions. In addition, we propose effective implementations of the interpolation filter, inverse quantization, and clipping modules using AVX2 (Advanced Vector Extensions 2) instructions, which operate on 256-bit registers. The proposed methods were evaluated on a private HEVC RExt decoder developed based on HM 16.0. The experimental results show that the developed RExt decoder reduces average decoding time by 12% compared with the conventional sequential method.
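As a rough illustration of why clipping suits SIMD, the sketch below models a 256-bit register as 16 lanes of 16-bit samples and clips each lane to the valid range for a given bit depth. It is a lane-wise model in plain Python, not the authors' implementation; `clip_vector` and `clip_samples` are hypothetical names. On AVX2 hardware, each `clip_vector` call would roughly correspond to one packed maximum and one packed minimum (for example `_mm256_max_epi16` followed by `_mm256_min_epi16`).

```python
LANES_16BIT = 256 // 16  # sixteen 16-bit samples fit in one 256-bit register


def clip_vector(lanes, bit_depth):
    """Clip every lane to [0, 2**bit_depth - 1], as one packed max + min would."""
    hi = (1 << bit_depth) - 1
    return [min(max(x, 0), hi) for x in lanes]


def clip_samples(samples, bit_depth):
    """Process samples one register-sized group at a time."""
    out = []
    for i in range(0, len(samples), LANES_16BIT):
        out.extend(clip_vector(samples[i:i + LANES_16BIT], bit_depth))
    return out


if __name__ == "__main__":
    print(clip_samples([-5, 0, 1023, 1024, 4000], bit_depth=10))
```

Because every lane applies the same two operations with no data-dependent branching, the work per register is constant regardless of sample values.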

The Design of low-cost SIMD MAC/MAS for Embedded Systems (임베디드 시스템을 위한 저비용 SIMD MAC/MAS 블록 설계)

  • Lee Yong Joo;Jung Jin Woo;Lee Yong Surk
    • The Journal of Korean Institute of Communications and Information Sciences / v.29 no.10C / pp.1460-1468 / 2004
  • In this paper, we developed a low-area, low-cost SIMD MAC/MAS (Single Instruction Multiple Data Multiply and ACcumulate/Multiply And Subtract) block for multimedia applications, which are widely used in everyday life. We compare the result with a previously developed larger, higher-performance SIMD MAC/MAS. This paper consists of five parts: an introduction, the design of the SIMD MAC/MAS hardware, its distinguishing features relative to previous work, the synthesis results, and a conclusion. The design reduces overall hardware size by 32% compared with the 64-bit SIMD MAC/MAS block designed for high performance. The ISA (Instruction Set Architecture) was improved to suit an embedded DSP (Digital Signal Processor), and the data width was shortened from 64 bits to 32 bits for a more optimal implementation.

SIMD instruction-based fast HEVC interpolation filter for high bit-depth (High bit-depth 를 위한 SIMD 명령어 기반 HEVC 보간 필터 고속화)

  • Mok, Jung-Soo;Ahn, Yong-Jo;Ryu, Hochan;Sim, Dong-Gyu
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2014.11a / pp.200-202 / 2014
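  • This paper proposes a SIMD (Single Instruction, Multiple Data) instruction-based method for accelerating the interpolation filter for high bit-depth. Interpolation filtering, which is based on pixel operations, accounts for a large share of the complexity of an HEVC decoder, but because it performs repetitive arithmetic operations, its structure is well suited to acceleration with SIMD. For this reason, this paper proposes a method that accelerates the interpolation filter operations by using SIMD instructions while using memory efficiently. On an in-house ANSI C HEVC RExt decoder based on the HEVC reference software HM 12.0-RExt 4.1, the proposed technique improved decoding speed by 8.5% on average and improved the execution time of the interpolation filter by 24.8% on average.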

Optimized Implementing A new fast secure hash function LSH using SIMD supported by the Intel CPU (Intel CPU에서 지원하는 SIMD를 이용한 고속해시함수 LSH 최적화 구현)

  • Song, Haeng-Gwon;Lee, Ok-yeon
    • Proceedings of the Korea Information Processing Society Conference / 2015.10a / pp.701-704 / 2015
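  • A hash function is a cryptographically important function used throughout society to provide integrity and authentication. In this paper, we implement the hash function LSH, developed in 2014 by the National Security Research Institute, in software rather than in hardware, and optimize the LSH algorithm using SSE, a SIMD technique available on Intel CPUs. The main operation used in the fast hash function LSH is ARX (Addition, Rotation, XOR), whose structure lends itself to SIMD. In this paper, the LSH algorithm, which originally operates on 32-bit units, was reworked to operate on 128-bit units using SIMD. As a result, on an Intel Xeon CPU, the SIMD version achieved up to a 2.79-fold performance improvement over the LSH algorithm without SIMD.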
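The move from 32-bit to 128-bit ARX operations can be sketched by packing four 32-bit words into one 128-bit integer and applying add, rotate, and XOR to all four lanes at once. This is a plain-Python model of lane-wise behavior, not SSE code and not the actual LSH step function; the function names, the way the three operations are combined, and the rotation amount 7 are illustrative assumptions.

```python
LANES = 4
MASK32 = 0xFFFFFFFF
MASK128 = (1 << 128) - 1
HIGH = sum(0x80000000 << (32 * i) for i in range(LANES))  # top bit of each lane
LOW = MASK128 ^ HIGH                                      # low 31 bits of each lane


def pack(words):
    """Pack four 32-bit words into one 128-bit integer (lane 0 is lowest)."""
    return sum((w & MASK32) << (32 * i) for i, w in enumerate(words))


def unpack(v):
    return [(v >> (32 * i)) & MASK32 for i in range(LANES)]


def add32x4(a, b):
    """Lane-wise addition mod 2**32: carries never cross a lane boundary."""
    low = (a & LOW) + (b & LOW)          # add low 31 bits of every lane
    return low ^ ((a ^ b) & HIGH)        # fold in the top bits without carry-out


def rotl32x4(v, r):
    """Rotate every 32-bit lane left by r bits (0 <= r < 32)."""
    if r == 0:
        return v
    keep_left = pack([(MASK32 << r) & MASK32] * LANES)
    keep_right = pack([MASK32 >> (32 - r)] * LANES)
    return ((v << r) & keep_left) | ((v >> (32 - r)) & keep_right)


def arx32x4(a, b, r):
    """One illustrative ARX combination applied to four lanes at once."""
    return rotl32x4(add32x4(a, b), r) ^ b


def arx_scalar(a, b, r):
    """The same combination on a single 32-bit word, for comparison."""
    s = (a + b) & MASK32
    rot = ((s << r) | (s >> (32 - r))) & MASK32 if r else s
    return rot ^ b


if __name__ == "__main__":
    a = [0xFFFFFFFF, 1, 0x80000000, 0x12345678]
    b = [1, 2, 0x80000000, 0x9ABCDEF0]
    print([hex(w) for w in unpack(arx32x4(pack(a), pack(b), 7))])
```

Each packed call does the work of four scalar calls, which is the source of the speedup the abstract reports; the measured gain is below four-fold, as is typical once loads, stores, and non-vectorized steps are included.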

A Design of a Shader Processor based on a dual-phase pipeline architecture (듀얼 페이즈 명령어 파이프라인구조의 쉐이더 프로세서 설계)

  • Jeong, Hyung-Ki;Nam, Ki-Hun;Lee, Gwang-Yeob
    • Journal of IKEEE / v.12 no.4 / pp.246-254 / 2008
  • This paper presents the design of a 4-way SIMD processor with multithreading and a dual-phase instruction pipeline. Eight threads execute in round-robin order, so no hazards can occur. The dual-phase pipeline makes one pipeline operate as two, and it can fetch up to four unit instructions at once. The variable-length instruction set is divided into first-phase and second-phase instructions, which allows complex branching and addressing to execute in one clock cycle. The processor reduces code size to a quarter and doubles performance compared with a conventional SIMD architecture.

A Reconfigurable Parallel Processor for Efficient Processing of Mobile Multimedia (모바일 멀티미디어의 효율적 처리를 위한 재구성형 병렬 프로세서의 구조)

  • Yoo, Se-Hoon;Kim, Ki-Chul;Yang, Yil-Suk;Roh, Tae-Moon
    • Journal of the Institute of Electronics Engineers of Korea SD / v.44 no.10 / pp.23-32 / 2007
  • This paper proposes a reconfigurable parallel processor architecture which can efficiently implement various multimedia applications, such as 3D graphics, H.264/H.263/MPEG-4, JPEG/JPEG2000, and MP3. The proposed architecture directly connects memories and processors so that memory access time and power consumption are reduced. It supports floating-point operations needed in the geometry stage of 3D graphics. It adopts partitioned SIMD to reduce hardware costs. Conditional execution of instructions is used for easy development of parallel algorithms.