• Title/Summary/Keyword: SIMD Architecture

Search Result 60, Processing Time 0.038 seconds

A Study on the 3 Dimension Graphics Accelerator for Phong Shading Algorithm (Phong Shading 알고리즘을 적용한 3차원 영상을 위한 고속 그래픽스 가속기 연구)

  • Park, Youn-Ok;Park, Jong-Won
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.10 no.5
    • /
    • pp.97-103
    • /
    • 2010
  • There are many algorithms for 2D to 3D graphic conversion technology which have the high complexity and large scale of iterative computation. So in this paper propose parallel algorithm and high speed graphics accelerator architecture using Park's MAMS(Multiple Access Memory System) for Phong Shading, one of many 3D algorithms. The Proposed SIMD processor architecture is simulated by HDL and simulated and got 30 times faster result. It means any kinds of 3D algorithm can make parallel algorithm and accelerated by SIMD processor with Park's MAMS for real time processing.

Programming Model for SODA-II: a Baseband Processor for Software Defined Radio Systems (SDR용 기저대역 프로세서를 위한 프로그래밍 모델)

  • Lee, Hyun-Seok;Yi, Joon-Hwan;Oh, Hyuk-Jun
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.47 no.7
    • /
    • pp.78-86
    • /
    • 2010
  • This paper discusses the programming model of SODA-II that is a baseband processor for software defined radio (SDR) systems. Signal processing On-Demand Architecture Ⅱ (SODA-II) is an on-chip multiprocessor architecture consisting of four processor cores and each core has both an wide SIMD datapath and a scalar datapath. This architecture is appropriate for baseband processing that is a mixture of vector computations and scalar computations. The programming model of the SODA-II is based on C library routines. Because the library routines hide the details of complex SIMD datapath control procedures, end users can easily program the SODA-II without deep understanding on its architecture. In this paper, we discuss the details of library routines and how these routines are exploited in the implementation of baseband signal processing algorithms. As application examples, we show the implementation result of W-CDMA multipath searcher and OFDM demodulator on the SODA-II.

An Implementation of 3D Graphic Accelerator for Phong Shading (퐁 음영법을 위한 3차원 그래픽 가속기의 구현)

  • Lee, Hyung;Park, Youn-Ok;Park, Jong-Won
    • Journal of Korea Multimedia Society
    • /
    • v.3 no.5
    • /
    • pp.526-534
    • /
    • 2000
  • There have been many researches on the 3D graphic accelerator for high speed by needs of CAD/CAM,3D modeling, virtual reality or medical image. In this paper, an SIMD processor architecture for 3D graphic accelerator is proposed in order to improve the processing time of the 3D graphics, and a parallel Phong shading algorithm is presented to estimate performance of the proposed architecture. The proposed SIMD processor architecture for 3D graphic accelerator consists of PCI local bus interface, 16 Processing Elements (PE's), and Park's multi-access memory system (NAMS) that has 17 memory modules. A serial algorithm for Phong shading is modified for the architecture and the main key is to divide a polygon into $4\times{4}$ squares. And, for processing a square, 4 PE's are regarded as a PE Grou logically. Since MAMS can support block access type with interval 1, it is possible that 4 PE Groups process a square at a time. In consequence, 16 pixels are processed simultaneously. The proposed SIMD processor architecture is simulated by CADENCE Verilog-XL that is a package for the hardware simulation. With the same simulated results as that of the serial algorithm, the speed enhancement by the parallel algorithm to the serial one is 5.68.

  • PDF

FPGA Implementation of Scan Conversion Unit using SIMD Architecture and Hierarchical Tile-based Traversing Method (계층적 타일기반 탐색기법과 SIMD 구조가 적용된 스캔변환회로의 FPGA 구현)

  • Ha, Chang-Soo;Choi, Byeong-Yoon
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.14 no.9
    • /
    • pp.2023-2030
    • /
    • 2010
  • In this paper, we present research results of developing high performance scan conversion unit and implementing it on FPGA chip. To increase performance of scan conversion unit, we propose an architecture of scan converter that is a SIMD architecture and uses tile-based traversing method. The proposed scan conversion unit can operate about 124Mhz clock frequency on Xilinx Vertex4 LX100 device. To verify the scan conversion unit, we also develop shader unit, texture mapping unit and $240{\times}320$ color TFT-LCD controller to display outputs of the scan conversion unit on TFT-LCD. Because the scan conversion unit implemented on FPGA has 311Mpixels/sec pixel rate, it is applicable to desktop pc's 3d graphics system as well as mobile 3d graphics system needing high pixel rates.

A Design of Modified SIMD Architecture for block level parallelism (Block level parallelism 을 위한 수정된 SIMD Architecture 설계)

  • Jonghee Youn;Daeho Kim;Minwook Ahn;Youngkyu Choi;Hokyun Kim;Seungjun Yang;Yunheung Paek
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2008.11a
    • /
    • pp.857-859
    • /
    • 2008
  • 미디어 어플리케이션, 특히 비디오 어플리케이션의 경우 커널 코드를 얼마나 효과적으로 처리하느냐에 따라 전체적인 성능에 큰 차이가 생긴다. 이러한 커널 코드를 효과적으로 처리하기 위해, 일반적인 DSP co-processor 에 SIMD 구조를 추가한 아키텍처를 설계하여 비디오 어플리케이션의 전체적인 성능을 향상할 수 있도록 하였다.

A Design of a Shader Processor based on a dual-phase pipeline architecture (듀얼 페이즈 명령어 파이프라인구조의 쉐이더 프로세서 설계)

  • Jeong, Hyung-Ki;Nam, Ki-Hun;Lee, Gwang-Yeob
    • Journal of IKEEE
    • /
    • v.12 no.4
    • /
    • pp.246-254
    • /
    • 2008
  • This paper represents a design of a 4 way SIMD processor with multi-thread and dual phase instruction pipeline. 8 threads can be performing in round-robin order, so any hazards can’t occur. The dual phase pipeline makes a pipeline operate as two pipelines, and it can fetch maximum 4 unit instructions at once. This variable length instruction set divide into first phase and second phase instructions, and with this function, complex branch and addressing can be executed at one clock cycle. This processor reduces the code size to quarter, pull out the doubled performance improvement than normal SIMD architecture.

  • PDF

A Reconfigurable Parallel Processor for Efficient Processing of Mobile Multimedia (모바일 멀티미디어의 효율적 처리를 위한 재구성형 병렬 프로세서의 구조)

  • Yoo, Se-Hoon;Kim, Ki-Chul;Yang, Yil-Suk;Roh, Tae-Moon
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.44 no.10
    • /
    • pp.23-32
    • /
    • 2007
  • This paper proposes a reconfigurable parallel processor architecture which can efficiently implement various multimedia applications, such as 3D graphics, H.264/H.263/MPEG-4, JPEG/JPEG2000, and MP3. The proposed architecture directly connects memories and processors so that memory access time and power consumption are reduced. It supports floating-point operations needed in the geometry stage of 3D graphics. It adopts partitioned SIMD to reduce hardware costs. Conditional execution of instructions is used for easy development of parallel algorithms.

Color Media Instructions for Embedded Parallel Processors (임베디드 병렬 프로세서를 위한 칼라미디어 명령어 구현)

  • Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.35 no.7
    • /
    • pp.305-317
    • /
    • 2008
  • As a mobile computing environment is rapidly changing, increasing user demand for multimedia-over-wireless capabilities on embedded processors places constraints on performance, power, and sire. In this regard, this paper proposes color media instructions (CMI) for single instruction, multiple data (SIMD) parallel processors to meet the computational requirements and cost goals. While existing multimedia extensions store and process 48-bit pixels in a 32-bit register, CMI, which considers that color components are perceptually less significant, supports parallel operations on two-packed compressed 16-bit YCbCr (6 bit Y and 5 bits Cb, Cr) data in a 32-bit datapath processor. This provides greater concurrency and efficiency for YCbCr data processing. Moreover, the ability to reduce data format size reduces system cost. The reduction in data bandwidth also simplifies system design. Experimental results on a representative SIMD parallel processor architecture show that CMI achieves an average speedup of 6.3x over the baseline SIMD parallel processor performance. This is in contrast to MMX (a representative Intel's multimedia extensions), which achieves an average speedup of only 3.7x over the same baseline SIMD architecture. CMI also outperforms MMX in both area efficiency (a 52% increase versus a 13% increase) and energy efficiency (a 50% increase versus an 11% increase). CMI improves the performance and efficiency with a mere 3% increase in the system area and a 5% increase in the system power, while MMX requires a 14% increase in the system area and a 16% increase in the system power.

Design of SIMD-DSP/PPU for a High-Performance Embedded Microprocessor (고성능 내장형 마이크로프로세서를 위한 SIMD-DSP/FPU의 설계)

  • 정우경;홍인표;이용주;이용석
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.27 no.4C
    • /
    • pp.388-397
    • /
    • 2002
  • We designed a SIMD-DSP/FPU that can efficiently improve multimedia processing performance when integrated into high-performance embedded microprocessors. We proposed partitioned architectures and new schemes for several functional units to reduce chip area. Sharing functional units reduces the area of FPU significantly. The proposed architecture is modeled in HDL and synthesized with a 0.35$\mu\textrm{m}$ standard cell library. The chip area is estimated to be about 100,000 equivalent gates. The designed unit can run at higher than 50MHz clock frequency of CPU core under the worst-case operating conditions.

NTGST-Based Parallel Computer Vision Inspection for High Resolution BLU (NTGST 병렬화를 이용한 고해상도 BLU 검사의 고속화)

  • 김복만;서경석;최흥문
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.41 no.6
    • /
    • pp.19-24
    • /
    • 2004
  • A novel fast parallel NTGST is proposed for high resolution computer vision inspection of the BLUs in a LCD production line. The conventional computation- intensive NTGST algorithm is modified and its C codes are optimized into fast NTGST to be adapted to the SIMD parallel architecture. And then, the input inspection image is partitioned and allocated to each of the P processors in multi-threaded implementation, and the NTGST is executed on SIMD architecture of N data items simultaneously in each thread. Thus, the proposed inspection system can achieve the speedup of O(NP). Experiments using Dual-Pentium III processor with its MMX and extended MMX SIMD technology show that the proposed parallel NTGST is about Sp=8 times faster than the conventional NTGST, which shows the scalability of the proposed system implementation for the fast, high resolution computer vision inspection of the various sized BLUs in LCD production lines.