• Title/Summary/Keyword: SIMD Computer

On the Conceptual Design of the SIMD Vector Machine Attachable to SISD Machine

  • Cho Young-Il; Ko Young-Woong
    • The KIPS Transactions: Part A, v.12A no.3 s.93, pp. 263-272, 2005
  • In a von Neumann (SISD) computer, operand addressing is performed in software, because the hardware provides no address counter for operands. Consequently, when a vector is addressed, one variable per element must also be specified and used in software. This is because a quasi program counter (PC) is physically implemented in hardware only for instructions, not for operands. A vector, however, has structure and dimension. In this paper we propose a hardware unit, physically external to the CPU, that addresses only the elements of a vector according to its structure and dimension. To achieve high-speed vector processing, the unit is designed as a SIMD pipeline. The proposed mechanism is evaluated through simulation; the results show a 12% to 30% performance improvement over the CRAY architecture with the same hardware (processing units).
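  • The abstract does not give the unit's internal design. As a rough illustration of what a hardware operand address counter does, the sketch below (a hypothetical model, assuming each vector is described by a base address plus one extent and one stride per dimension) is configured once and then steps through every element address, so software no longer needs one variable per element.

```python
from typing import Iterator, Sequence


def vector_addresses(base: int, extents: Sequence[int],
                     strides: Sequence[int]) -> Iterator[int]:
    """Yield the address of every element of a multi-dimensional vector.

    Hypothetical software model of an operand address counter: the caller
    sets a base address plus an extent and a stride per dimension once, and
    the counter then steps through all element addresses, with the last
    dimension varying fastest.
    """
    if len(extents) != len(strides):
        raise ValueError("extents and strides must have the same length")
    if any(e <= 0 for e in extents):
        return  # empty vector: no addresses
    index = [0] * len(extents)
    while True:
        yield base + sum(i * s for i, s in zip(index, strides))
        # Advance the counter like an odometer, carrying into slower dimensions.
        dim = len(extents) - 1
        while dim >= 0:
            index[dim] += 1
            if index[dim] < extents[dim]:
                break
            index[dim] = 0
            dim -= 1
        if dim < 0:
            return  # every element has been addressed


# Column 1 of a 3 x 4 row-major matrix of 4-byte words stored at address 0.
print(list(vector_addresses(4, [3], [16])))  # [4, 20, 36]
```

  • The (extent, stride) description is an assumption chosen because it covers rows, columns, and sub-blocks of a matrix with the same counter logic.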

NTGST-Based Parallel Computer Vision Inspection for High Resolution BLU

  • 김복만; 서경석; 최흥문
    • Journal of the Institute of Electronics Engineers of Korea SP, v.41 no.6, pp. 19-24, 2004
  • A fast parallel NTGST is proposed for high-resolution computer vision inspection of backlight units (BLUs) in an LCD production line. The conventional, computation-intensive NTGST algorithm is modified, and its C code is optimized into a fast NTGST adapted to the SIMD parallel architecture. The input inspection image is then partitioned and allocated to each of P processors in a multi-threaded implementation, and within each thread the NTGST is executed on N data items simultaneously using SIMD. The proposed inspection system can therefore achieve a speedup of O(NP). Experiments on a dual Pentium III system with MMX and extended MMX SIMD technology show that the proposed parallel NTGST is about 8 times faster (Sp = 8) than the conventional NTGST, which demonstrates the scalability of the proposed implementation for fast, high-resolution inspection of BLUs of various sizes in LCD production lines.
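  • The two-level decomposition (P threads, each processing N items per SIMD step) can be sketched as below. This is not the NTGST itself: `op` is a hypothetical per-pixel placeholder, and the inner `lanes`-wide slice only stands in for one SIMD instruction. Python threads will not show real speedup because of the GIL; the sketch only illustrates how the image is split and recombined.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Image = List[List[int]]


def partition_rows(height: int, p: int) -> List[Tuple[int, int]]:
    """Split rows [0, height) into p contiguous strips of nearly equal size."""
    base, extra = divmod(height, p)
    strips, start = [], 0
    for k in range(p):
        stop = start + base + (1 if k < extra else 0)
        strips.append((start, stop))
        start = stop
    return strips


def process_strip(image: Image, rows: Tuple[int, int], lanes: int,
                  op: Callable[[int], int]) -> Image:
    """Apply op to one strip, taking `lanes` pixels per step.

    Each step stands in for one SIMD instruction operating on N = lanes items.
    """
    out = []
    for r in range(*rows):
        row = image[r]
        new_row: List[int] = []
        for c in range(0, len(row), lanes):
            new_row.extend(op(v) for v in row[c:c + lanes])
        out.append(new_row)
    return out


def parallel_process(image: Image, p: int, lanes: int,
                     op: Callable[[int], int]) -> Image:
    """Run process_strip on p strips in p threads and stitch the results."""
    strips = partition_rows(len(image), p)
    with ThreadPoolExecutor(max_workers=p) as pool:
        parts = list(pool.map(lambda rows: process_strip(image, rows, lanes, op),
                              strips))
    return [row for part in parts for row in part]
```

  • With P strips and N lanes per step, the ideal speedup is N·P, which matches the O(NP) figure above; the measured Sp = 8 reflects real overheads.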

A Scalable Parallel Labeling Algorithm on Mesh Connected SIMD Computers

  • 박은진; 이갑섭; 성효경; 최흥문
    • Proceedings of the IEEK Conference, 1998.10a, pp. 731-734, 1998
  • A scalable parallel algorithm is proposed for efficient image component labeling with local operators on a mesh-connected SIMD computer. Unlike conventional parallel labeling algorithms, which assign a single pixel to each PE, the algorithm presented here is scalable and can assign an m$\times$m pixel set to each PE according to the input image size. Each assigned pixel set is converted to a single pixel with a representative value, which greatly reduces the required memory and processing time. For an N$\times$N image, if an m$\times$m pixel set is assigned to each PE of a P$\times$P mesh, where $P=N/m$, the communication time complexity and the computation complexity are reduced to $O(P\log P)$ and $O(P)$ bit operations, respectively, which is 1/m of those of the conventional method. The method also reduces the memory in each PE to $O(P)$ and the number of PEs to $O(P^2)=\Theta(N^2/m^2)$, compared with $O(N^2)$ for the conventional method. Because the algorithm is scalable, it can adapt to larger images without changing the hardware of the given mesh-connected SIMD computer.
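  • The data-reduction step and the PE-count arithmetic can be sketched as follows. The abstract does not say how the representative value is chosen; this sketch assumes a hypothetical rule (foreground if any pixel in the set is foreground) and does not implement the labeling itself.

```python
from typing import List, Tuple

Bitmap = List[List[int]]


def reduce_blocks(image: Bitmap, m: int) -> Bitmap:
    """Collapse each m x m pixel set of an N x N binary image into one pixel.

    Assumption: the representative value is 1 if any pixel in the set is
    foreground, else 0. The result is P x P with P = N / m, one pixel per PE.
    """
    n = len(image)
    if n % m != 0:
        raise ValueError("N must be a multiple of m")
    p = n // m
    return [[int(any(image[bi * m + i][bj * m + j]
                     for i in range(m) for j in range(m)))
             for bj in range(p)]
            for bi in range(p)]


def pe_counts(n: int, m: int) -> Tuple[int, int]:
    """PE count with one pixel per PE (N^2) vs one m x m set per PE (P^2)."""
    p = n // m
    return n * n, p * p


print(pe_counts(512, 4))  # (262144, 16384)
```

  • For a 512$\times$512 image with m = 4, the mesh shrinks from 262,144 PEs to 128$\times$128 = 16,384 PEs, the $\Theta(N^2/m^2)$ reduction stated above.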

A Reconfigurable Parallel Processor for Efficient Processing of Mobile Multimedia

  • Yoo, Se-Hoon; Kim, Ki-Chul; Yang, Yil-Suk; Roh, Tae-Moon
    • Journal of the Institute of Electronics Engineers of Korea SD, v.44 no.10, pp. 23-32, 2007
  • This paper proposes a reconfigurable parallel processor architecture that can efficiently implement various multimedia applications, such as 3D graphics, H.264/H.263/MPEG-4, JPEG/JPEG2000, and MP3. The architecture connects memories directly to processors, which reduces memory access time and power consumption. It supports the floating-point operations needed in the geometry stage of 3D graphics, adopts partitioned SIMD to reduce hardware cost, and uses conditional execution of instructions to simplify the development of parallel algorithms.
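  • Partitioned SIMD and conditional execution can be illustrated with a common subword-parallel (SWAR) technique, assuming four 8-bit lanes in a 32-bit word. This is a generic sketch, not the paper's instruction set: `padd8` adds all lanes in one operation without carries crossing lane boundaries, and `cond_padd8` uses a hypothetical 4-bit per-lane predicate to decide which lanes receive the result.

```python
MASK_LOW7 = 0x7F7F7F7F   # low 7 bits of each 8-bit lane
MASK_HIGH = 0x80808080   # top bit of each 8-bit lane


def padd8(a: int, b: int) -> int:
    """Add four 8-bit lanes packed in 32-bit words, wrapping within each lane.

    Adding only the low 7 bits can never carry into the next lane; the top
    bit of each lane is then fixed up with XOR.
    """
    low = (a & MASK_LOW7) + (b & MASK_LOW7)
    return (low ^ ((a ^ b) & MASK_HIGH)) & 0xFFFFFFFF


def lane_mask(pred: int) -> int:
    """Expand a 4-bit per-lane predicate into a 32-bit byte mask."""
    return sum(0xFF << (8 * k) for k in range(4) if (pred >> k) & 1)


def cond_padd8(pred: int, a: int, b: int) -> int:
    """Conditionally executed add: lanes whose predicate bit is set get a + b,
    the other lanes keep a."""
    m = lane_mask(pred)
    return (padd8(a, b) & m) | (a & ~m & 0xFFFFFFFF)


print(hex(padd8(0x01FF7F80, 0x01018001)))  # 0x200ff81
```

  • Predicated lanes let data-dependent branches be written as straight-line code, which is one reason conditional execution eases parallel algorithm development on SIMD hardware.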

An Optimization Method of Motion Estimation using Advanced SIMD

  • Kim, Wan-Su; Lee, Jae-Heung
    • Proceedings of the Korea Information Processing Society Conference, 2012.11a, pp. 54-56, 2012
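  • Multicore processors have recently emerged that improve performance and reduce power consumption by increasing the number of cores at the same clock rate instead of raising the CPU core clock. With the emergence of such multicore platforms, parallel processing techniques that use the resources of these cores efficiently and concurrently are being actively researched. In this paper, we study and propose a fast motion estimation (ME) method that applies NEON, which is based on Advanced SIMD, one of the parallel processing techniques. To find the minimum SAD and select an accurate motion vector, the full search method, among the various ME methods, was applied to NEON so that operations are performed 128 bits at a time. As a result, we verified that computational performance improves by up to more than 60%, depending on the image size.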
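  • Full-search block matching with SAD can be sketched as the scalar reference below. A 128-bit NEON register holds sixteen 8-bit pixels, so a NEON version would typically replace the inner per-pixel loop with vector absolute-difference operations (for example the `vabdq_u8` intrinsic) followed by accumulation. The frame layout (lists of 8-bit rows) and the function names are assumptions for illustration.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]


def sad(cur: Frame, ref: Frame, bx: int, by: int,
        dx: int, dy: int, bs: int) -> int:
    """Sum of absolute differences between a bs x bs block of cur at (bx, by)
    and the block of ref displaced by (dx, dy)."""
    total = 0
    for y in range(bs):
        c_row = cur[by + y]
        r_row = ref[by + dy + y]
        # A NEON version would handle 16 of these 8-bit differences per instruction.
        for x in range(bs):
            total += abs(c_row[bx + x] - r_row[bx + dx + x])
    return total


def full_search(cur: Frame, ref: Frame, bx: int, by: int, bs: int,
                search_range: int) -> Tuple[Tuple[int, int], int]:
    """Exhaustively test every displacement within +/- search_range and
    return ((dx, dy), minimum SAD)."""
    h, w = len(ref), len(ref[0])
    best: Optional[Tuple[Tuple[int, int], int]] = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates whose block would fall outside the reference frame.
            if not (0 <= by + dy and by + dy + bs <= h
                    and 0 <= bx + dx and bx + dx + bs <= w):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, bs)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    if best is None:
        raise ValueError("no candidate block fits inside the reference frame")
    return best
```

  • Full search evaluates every candidate, so its cost is dominated by the SAD inner loop; that is exactly the part that maps well onto 128-bit SIMD.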

Pipelined Parallel Processing System for Image Processing

  • Lee, Hyung; Kim, Jong-Bae; Choi, Sung-Hyk; Park, Jong-Won
    • Journal of IKEEE, v.4 no.2 s.7, pp. 212-224, 2000
  • In this paper, a parallel processing system is proposed to improve the processing speed of image-related applications. The proposed system is a fully synchronous SIMD computer with a pipelined architecture, consisting of processing elements and a multi-access memory system. The multi-access memory system is made up of memory modules and a memory controller. The controller consists of a memory module selection module, a data routing module, and an address calculation and routing module, and it supports parallel memory accesses of several types: block, horizontal, and vertical. A morphological filter was applied to verify the parallel processing system, and it achieved satisfactory processing speed.
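  • The abstract does not specify the module-selection function. One classic skewing scheme that makes block (p$\times$q), horizontal (1$\times$pq), and vertical (pq$\times$1) accesses all conflict-free uses M = pq + 1 modules with module(i, j) = (iq + j) mod M. The sketch below checks that property; it is an illustrative assumption, not necessarily the paper's design, and it omits the per-module address calculation.

```python
from typing import List, Tuple


def module_of(i: int, j: int, p: int, q: int) -> int:
    """Module holding element (i, j): f(i, j) = (i*q + j) mod (p*q + 1)."""
    return (i * q + j) % (p * q + 1)


def access_pattern(kind: str, i: int, j: int, p: int, q: int) -> List[Tuple[int, int]]:
    """Coordinates of the p*q elements fetched by one parallel access at (i, j)."""
    n = p * q
    if kind == "block":
        return [(i + a, j + b) for a in range(p) for b in range(q)]
    if kind == "horizontal":
        return [(i, j + k) for k in range(n)]
    if kind == "vertical":
        return [(i + k, j) for k in range(n)]
    raise ValueError(f"unknown access type: {kind}")


def conflict_free(kind: str, i: int, j: int, p: int, q: int) -> bool:
    """True if every element of the access lives in a different module."""
    modules = [module_of(r, c, p, q) for r, c in access_pattern(kind, i, j, p, q)]
    return len(set(modules)) == len(modules)


for kind in ("block", "horizontal", "vertical"):
    print(kind, conflict_free(kind, 3, 5, p=2, q=4))  # True for all three
```

  • The extra module matters: with only pq modules and the same formula, a vertical access would repeatedly hit the same few modules, whereas pq + 1 is coprime with q, so consecutive rows land in distinct modules.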

Performance Evaluation and Verification of MMX-type Instructions on an Embedded Parallel Processor

  • Jung, Yong-Bum; Kim, Yong-Min; Kim, Cheol-Hong; Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information, v.16 no.10, pp. 11-21, 2011
  • This paper introduces an SIMD (Single Instruction Multiple Data) based parallel processor that efficiently processes the massive data inherent in multimedia. In addition, it implements MMX (MultiMedia eXtension)-type instructions on this data-parallel processor and evaluates and analyzes their performance. The reference data-parallel processor consists of 16 processors, each with a 32-bit datapath. Experimental results for a JPEG compression application with a 1280x1024-pixel image indicate that the MMX-type instructions achieve a 50% performance improvement over the baseline instructions on the same data-parallel architecture. The MMX-type instructions also achieve 100% and 51% improvements over the baseline instructions in energy efficiency and area efficiency, respectively. These results demonstrate that multimedia-specific instructions, including MMX-type instructions, have potential for widely used many-core GPUs (Graphics Processing Units) and other types of parallel processors.
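  • A typical MMX-style instruction is a packed unsigned saturating add (the semantics of MMX `PADDUSB`). The sketch below models it on a 32-bit word holding four 8-bit pixels, an assumption based on the 32-bit datapath mentioned above (MMX registers themselves are 64 bits wide). One packed operation replaces four scalar add-and-clamp sequences, which is where such speedups come from.

```python
from typing import List


def paddusb(a: int, b: int, lanes: int = 4) -> int:
    """Packed unsigned saturating add (PADDUSB-style) of `lanes` 8-bit lanes."""
    result = 0
    for k in range(lanes):
        x = (a >> (8 * k)) & 0xFF
        y = (b >> (8 * k)) & 0xFF
        result |= min(x + y, 0xFF) << (8 * k)  # clamp at 255 instead of wrapping
    return result


def pack4(pixels: List[int]) -> int:
    """Pack four 8-bit pixels into one 32-bit word (pixels[0] in the low byte)."""
    return sum((v & 0xFF) << (8 * k) for k, v in enumerate(pixels[:4]))


def unpack4(word: int) -> List[int]:
    """Unpack a 32-bit word into four 8-bit pixels."""
    return [(word >> (8 * k)) & 0xFF for k in range(4)]


def brighten_scalar(pixels: List[int], delta: int) -> List[int]:
    """Baseline: one add-and-clamp per pixel."""
    return [min(p + delta, 255) for p in pixels]


pixels = [250, 10, 128, 255]
print(unpack4(paddusb(pack4(pixels), pack4([20] * 4))))  # [255, 30, 148, 255]
```

  • Saturation matters for pixel data: a wrapping add would turn 250 + 20 into a dark pixel (14) instead of clamping it to white.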

Optimized Implementation of Interpolation Filters for HEVC Encoder

  • Hwang, Taejin; Ahn, Yongjo; Ryu, Jiwoo; Sim, Donggyu
    • Journal of the Institute of Electronics and Information Engineers, v.50 no.10, pp. 199-203, 2013
  • In this paper, a fast algorithm for the discrete cosine transform-based interpolation filter (DCT-IF) of an HEVC (High Efficiency Video Coding) encoder is proposed. According to a computational complexity analysis of the HEVC reference software, the DCT-IF accounts for around 30% of encoder complexity. In this work, the DCT-IF is optimized by applying frame-level interpolation, SIMD optimization, and task-level parallelization via OpenMP in a C-based HEVC encoder developed for this study. Performance is analyzed by measuring the speed-up factor of each optimization technique on this encoder. The speed-up factors of frame-level interpolation, SIMD, and OpenMP are approximately 38-46, 3.6-4.4, and 3.0-3.7, respectively; overall, the proposed fast algorithm achieves a speed-up factor of 498.4.
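  • For reference, HEVC's luma half-sample DCT-IF uses the 8 taps (-1, 4, -11, 40, 40, -11, 4, -1), which sum to 64. The sketch below is a simplified single-pass, 8-bit version (edge replication, rounding by (acc + 32) >> 6, clipping to 0..255) and omits the standard's intermediate-precision and 2-D separable steps. `interpolate_frame` illustrates the idea behind frame-level interpolation: filter each reference frame once so that motion search can look values up instead of re-filtering for every candidate block.

```python
from typing import List

# HEVC luma half-sample DCT-IF taps; they sum to 64, hence the >> 6 below.
HALF_PEL_TAPS = (-1, 4, -11, 40, 40, -11, 4, -1)


def interpolate_half_pel_row(row: List[int]) -> List[int]:
    """Return the half-sample values between row[x] and row[x + 1] for every x.

    Simplified single-pass, 8-bit version: edge samples are replicated and the
    result is rounded and clipped to 0..255.
    """
    n = len(row)
    out = []
    for x in range(n):
        acc = 0
        for k, c in enumerate(HALF_PEL_TAPS):
            idx = min(max(x - 3 + k, 0), n - 1)  # taps cover row[x-3] .. row[x+4]
            acc += c * row[idx]
        out.append(min(max((acc + 32) >> 6, 0), 255))
    return out


def interpolate_frame(frame: List[List[int]]) -> List[List[int]]:
    """Frame-level interpolation: filter every row once, so that motion search
    can look up half-sample values instead of recomputing them per candidate."""
    return [interpolate_half_pel_row(row) for row in frame]


print(interpolate_half_pel_row([0, 0, 0, 0, 100, 100, 100, 100]))
```

  • The inner 8-tap multiply-accumulate is the part that SIMD accelerates, while OpenMP can distribute rows or frames across cores; the three optimizations compose, which is consistent with the large overall speed-up reported.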

Color Media Instructions for Embedded Parallel Processors

  • Kim, Cheol-Hong; Kim, Jong-Myon
    • Journal of KIISE: Computer Systems and Theory, v.35 no.7, pp. 305-317, 2008
  • As the mobile computing environment changes rapidly, increasing user demand for multimedia-over-wireless capabilities on embedded processors places constraints on performance, power, and size. To meet these computational requirements and cost goals, this paper proposes color media instructions (CMI) for single instruction, multiple data (SIMD) parallel processors. While existing multimedia extensions store and process 48-bit pixels in a 32-bit register, CMI, which exploits the fact that color components are perceptually less significant, supports parallel operations on two packed, compressed 16-bit YCbCr pixels (6-bit Y and 5-bit Cb and Cr) in a 32-bit datapath processor. This provides greater concurrency and efficiency for YCbCr data processing. Moreover, the smaller data format reduces system cost, and the reduced data bandwidth simplifies system design. Experimental results on a representative SIMD parallel processor architecture show that CMI achieves an average speedup of 6.3x over the baseline SIMD parallel processor, whereas MMX (Intel's representative multimedia extension) achieves an average speedup of only 3.7x over the same baseline. CMI also outperforms MMX in both area efficiency (a 52% increase versus 13%) and energy efficiency (a 50% increase versus 11%). CMI achieves these gains with only a 3% increase in system area and a 5% increase in system power, while MMX requires a 14% increase in system area and a 16% increase in system power.
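  • The 6-5-5 YCbCr format and the two-pixels-per-word packing can be sketched as follows. The bit layout (Y in bits 15..10, Cb in 9..5, Cr in 4..0), the truncation of low-order bits, and the placement of the first pixel in the low half-word are assumptions; the abstract gives only the field widths.

```python
from typing import Tuple


def pack_ycbcr655(y: int, cb: int, cr: int) -> int:
    """Compress 8-bit Y, Cb, Cr into 16 bits: 6-bit Y | 5-bit Cb | 5-bit Cr.

    Assumed layout: Y in bits 15..10, Cb in 9..5, Cr in 4..0; precision is
    reduced by dropping low-order bits.
    """
    return ((y >> 2) << 10) | ((cb >> 3) << 5) | (cr >> 3)


def unpack_ycbcr655(p: int) -> Tuple[int, int, int]:
    """Return the (6-bit Y, 5-bit Cb, 5-bit Cr) fields of a packed pixel."""
    return (p >> 10) & 0x3F, (p >> 5) & 0x1F, p & 0x1F


def pack_pair(p0: int, p1: int) -> int:
    """Place two 16-bit pixels in one 32-bit word (p0 in the low half)."""
    return ((p1 & 0xFFFF) << 16) | (p0 & 0xFFFF)


word = pack_pair(pack_ycbcr655(128, 64, 32), pack_ycbcr655(255, 255, 255))
print(hex(word))  # 0xffff8104
```

  • Because two complete pixels fit in one 32-bit register, a single lane-wise instruction can operate on both, which is the source of the concurrency claimed for CMI.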

A Study of Printed Score Recognition and its Parallel Algorithm

  • 황영길; 김성천
    • The Journal of Korean Institute of Communications and Information Sciences, v.19 no.5, pp. 959-970, 1994
  • In this thesis, a printed score is read with a handheld scanner, and the recognition process is ultimately executed in parallel on a mesh-connected computer. The scanned input is classified into patterns and recognized using domain knowledge. The proposed algorithm minimizes preprocessing and uses only simple operations. Score symbols can be recognized regardless of their size, but their diversity makes it difficult to recognize all of them, so the program recognizes a subset of essential, frequently used symbols. The recognition result is converted into the standard MIDI file format. Because high-speed image processing is required, a parallel processing system with multiple processors is needed, and a digitized two-dimensional image is well suited to processing on a SIMD mesh-connected computer (MCC). We therefore describe this architecture and present a parallel algorithm for a SIMD MCC with n processors that achieves a time complexity of $O(n)$.
