• Title/Summary/Keyword: SIMD Computer

Search Result 64, Processing Time 0.041 seconds

SIMD Optimization for Improving the Performance of a CPU-Based Graph Engine (SIMD 최적화를 이용한 CPU 기반 그래프 엔진의 성능 개선)

  • Ikhyeon Jo;Myung-Hwan Jang;Sang-Wook Kim
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2023.05a
    • /
    • pp.383-385
    • /
    • 2023
  • Single-machine-based 그래프 엔진의 state-of-the-art 모델인 RealGraph 는 쓰레드를 이용한 병렬화로 성능을 향상하였으나 쓰레드 내부에서의 병렬성은 고려되지 않았다. 본 논문은 SIMD 명령어를 이용해 RealGraph 의 병렬성을 향상시켰다. 쓰레드 내부의 효율성을 높이기 위해 RealGraph 의 구조와 그래프 알고리즘의 분석을 통한 SIMD 명령어의 적용 가능한 영역을 탐색하였다. 실험으로 SIMD 명령어의 적용을 통해 쓰레드 내부에서 벡터 연산을 수행하여 평균 7.6%, 11.7%, 9.2%의 수행 시간 단축을 이끌어냈으며 SIMD 명령어의 적용이 그래프 엔진의 분석 성능에 얼마나 도움이 될 수 있는지 확인하였다.

SIMD Instruction-based Fast HEVC RExt Decoder (SIMD 명령어 기반 HEVC RExt 복호화기 고속화)

  • Mok, Jung-Soo;Ahn, Yong-Jo;Ryu, Hochan;Sim, Donggyu
    • Journal of Broadcast Engineering
    • /
    • v.20 no.2
    • /
    • pp.224-237
    • /
    • 2015
  • In this paper, we introduce the fast decoding method with the SIMD (Single Instruction Multiple Data) instructions for HEVC RExt (High Efficiency Video Coding Range Extensions). Several tools of HEVC RExt such as intra prediction, interpolation, inverse-quantization, inverse-transform, and clipping modules can be classified as the proper modules for applying the SIMD instructions. In consideration of bit-depth increasement of RExt, intra prediction, interpolation, inverse-quantization, inverse-transform, and clipping modules are accelerated by SSE (Streaming SIMD Extension) instructions. In addition, we propose effective implementations for interpolation filter, inverse-quantization, and clipping modules by utilizing a set of AVX2 (Advanced Vector eXtension 2) instructions that can use 256 bits register. The evaluation of the proposed methods were performed on the private HEVC RExt decoder developed based on HM 16.0. The experimental results show that the developed RExt decoder reduces 12% average decoding time, compared with the conventional sequential method.

Implementation of SIMD-based Many-Core Processor for Efficient Image Data Processing (효율적인 영상데이터 처리를 위한 SIMD기반 매니코어 프로세서 구현)

  • Choi, Byong-Kook;Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.16 no.1
    • /
    • pp.1-9
    • /
    • 2011
  • Recently, as mobile multimedia devices are used more and more, the needs for high-performance and low-energy multimedia processors are increasing. Application-specific integrated circuits (ASIC) can meet the needed high performance for mobile multimedia, but they provide limited, if any, generality needed for various application requirements. DSP based systems can used for various types of applications due to their generality, but they require higher cost and energy consumption as well as less performance than ASICs. To solve this problem, this paper proposes a single instruction multiple data (SIMD) based many-core processor which supports high-performance and low-power image data processing while keeping generality. The proposed SIMD based many-core processor composed of 16 processing elements (PEs) exploits large data parallelism inherent in image data processing. Experimental results indicate that the proposed SIMD-based many-core processor higher performance (22 times better), energy efficiency (7 times better), and area efficiency (3 times better) than conversional commercial high-performance processors.

PC-Based Realtime Implementation of H.263 CODEC Using SIMD Method (SIMD기법에 의한 H.263 코덱의 PC기반 실시간 구현)

  • 하교동;남수영;김남철
    • Proceedings of the IEEK Conference
    • /
    • 2001.09a
    • /
    • pp.947-950
    • /
    • 2001
  • This paper implements H.263 codec using SIMD(single instruction multiple data) method in real time based on PC. This system uses INS algorithm previously proposed by the authors as motion estimation module. SIMD method is used in DCT, IDCT, quantization, motion estimation, and display module. The developed algorithms are implemented using TMN5. Using the above algorithm, H.263 Codec can communicate more than 15 frames/sec in CIF resolution on a Pentium-IV 1.7GHz computer.

  • PDF

Hardware Implementation of Rasterizer with SIMD Architecture Applicable to Mobile 3D Graphics System (모바일 3차원 그래픽스 시스템에 적용 가능한 SIMD 구조를 갖는 래스터라이저의 하드웨어 구현)

  • Ha, Chang-Soo;Sung, Kwang-Ju;Choi, Byeong-Yoon
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2010.05a
    • /
    • pp.313-315
    • /
    • 2010
  • In this paper, we describe research results of developing hardware rasterizer that is applicable to mobile 3D graphics system, designed in SIMD architecture and verified in FPGA. Tile-based scan conversion unit is designed like SIMD architecture running four tiles simultaneously and each tile traverses pixels hierarchical in 3-level so that visiting counts is minimized. As experimental results, $8{\times}8$ is the most efficient size of tile and the last step of tile traversing is performed on $2{\times}2$ sized subtile. The rasterizer supports flat shading and gouraud shading and texture mapper supports affine mapping and perspective corrected mapping. Also, texture mapper supports point sampling mode and bilinear interpolating sampling mode and two types of wrapping modes and various blending modes. The rasterzer operates as 120Mhz on xilinx vertex4 $l{\times}100$ device. To easy verification, texture memory and frame buffer are generated as block rom and block ram.

  • PDF

Parallel Factorization using Quadratic Sieve Algorithm on SIMD machines (SIMD상에서의 이차선별법을 사용한 병렬 소인수분해 알고리즘)

  • Kim, Yang-Hee
    • The KIPS Transactions:PartA
    • /
    • v.8A no.1
    • /
    • pp.36-41
    • /
    • 2001
  • In this paper, we first design an parallel quadratic sieve algorithm for factoring method. We then present parallel factoring algorithm for factoring a large odd integer by repeatedly using the parallel quadratic sieve algorithm based on the divide-and-conquer strategy on SIMD machines with DMM. We show that this algorithm is optimal in view of the product of time and processor numbers.

  • PDF

A Speed-up Method of HOG Pedestrian Detector in Advanced SIMD Architecture (Advanced SIMD 아키텍처에서의 HOG 보행자 검출기 고속화 방법)

  • Kwon, Ki-Pyo;Lee, Jae-Heung
    • Journal of IKEEE
    • /
    • v.18 no.1
    • /
    • pp.106-113
    • /
    • 2014
  • A pedestrian detector can be applied for various purposes such as monitoring or counting the number of people in some place, or detecting the people plunging in the driveway. There was a lot of related research. But, the detection speed is slow in embedded system because of the limited computing power. An algorithm for fast pedestrian detector using HOG in ARM SIMD architecture is presented in this paper. There is a way to quickly remove the background of image and to improve the detection speed using NEON parallel technique. When we tested with INRIA Person Dataset, the proposed pedestrian detector improves the speed by 3.01 times than previous one.

Acceleration Method of Inter Prediction using Advanced SIMD (Advanced SIMD를 이용한 화면 간 예측 고속화방법)

  • Kim, Wan-Su;Lee, Jae-Heung
    • Journal of IKEEE
    • /
    • v.16 no.4
    • /
    • pp.382-388
    • /
    • 2012
  • An H.264/AVC fast motion estimation methodology is presented in this paper. Advanced SIMD based NEON which is one of the parallel processing methods is supported under the ARM Cortex-A9 dual-core platform. NEON is applied to a full search technique with one of the various motion estimation methods and SAD operation count of each macroblock is reduced to 1/4. Pixel values of the corresponding macroblock are assigned to eight 16-bit NEON registers and Intrinsic function in NEON architecture carried out 128 bits arithmetic operations at the same time. In this way, the exact motion vector with the minimum SAD value among the calculated SAD values can be designated. Experimental results show that performance gets improved 30% above average in accordance with the size of image and macroblock.

Photon Mapping SIMD Processor Design using Reconfigurable Cell (재구성 Cell을 이용한 Photon mapping SIMD프로세서 설계)

  • Ryu, Hyun-Woo;Kim, Young-Jin;Lee, Hyon-Soo
    • Proceedings of the IEEK Conference
    • /
    • 2005.11a
    • /
    • pp.719-722
    • /
    • 2005
  • The synthesis of the 3D images is the most important part of the virtual reality. The photon mapping is the best method for reality in the 3D graphics. This paper presents an architecture for photon mapping applications on SOC devices. The proposed architecture reduces the computation time to photonmap search and radiance estimation. Also this architecture is implemented by a SIMD processor which trades parallelism for frequency of operation.

  • PDF

A Parallel Memory Suitable for SIMD Architecture Processing High-Definition Image Haze Removal in High-Speed (고화질 영상에서 고속 안개 제거를 위한 SIMD 구조에 적합한 병렬메모리)

  • Lee, Hyung
    • Journal of the Korea Society of Computer and Information
    • /
    • v.19 no.7
    • /
    • pp.9-16
    • /
    • 2014
  • Since the haze removal algorithm using dark channel prior was introduced, many researches for improving processing speed have been addressed even if it presented impressive results. Remarkable one is using median dark channel prior. Although it has been considered as a very attactive method, processing speed is as low as ever. So, a parallel memory model which is suitable for SIMD architecture processing haze removal on high-definition images in high-speed is introduced in this paper. The proposed parallel memory model allows to access n pixels simultaneously. It is also support stride 3, 5, 7, and 11 in order to execute convolution mask operations, e.g., median filter. The proposed parallel memory model can therefore support enough data bandwidth to process the algorithm using median dark channel prior in high-speed.