• Title/Summary/Keyword: SIMD Computer

Search Result 64, Processing Time 0.022 seconds

A Design of a Vertex Shader for Mobile Devices (Mobile 기기에 적합한 Vertex Shader 의 설계 및 구현)

  • Jeong, Hyung-Ki;Nam, Ki-Hun;Lee, Kwang-Yeob;Hur, Hyun-Min;Lee, Byung-Ok;Lee, James
    • Proceedings of the IEEK Conference
    • /
    • 2005.11a
    • /
    • pp.751-754
    • /
    • 2005
  • In this paper, we designed a vertex shader for mobile devices. Proposed Vertex shader is compatible with the OpenGL ARB & DirectX 8.0 Vertex Shader 1.1 and is organized of modified IEEE-754 24 bits float point SIMD architecture. All float point arithmetic unit process 1 cycle operation with 100Mhz frequency more. We made a vertex shader demo system with Xilinx-Virtex II and get synthesis result that confirm 11M gates size at TSMC 0.13um @ 115MHz.

  • PDF

Implementation of an Optimal Many-core Processor for Beamforming Algorithm of Mobile Ultrasound Image Signals (모바일 초음파 영상신호의 빔포밍 기법을 위한 최적의 매니코어 프로세서 구현)

  • Choi, Byong-Kook;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.16 no.8
    • /
    • pp.119-128
    • /
    • 2011
  • This paper introduces design space exploration of many-core processors that meet high performance and low power required by the beamforming algorithm of image signals of mobile ultrasound. For the design space exploration of the many-core processor, we mapped different number of ultrasound image data to each processing element of many-core, and then determined an optimal many-core processor architecture in terms of execution time, energy efficiency and area efficiency. Experimental results indicate that PE=4096 and 1024 provide the highest energy efficiency and area efficiency, respectively. In addition, PE=4096 achieves 46x and 10x better than TI DSP C6416, which is widely used for ultrasound image devices, in terms of energy efficiency and area efficiency, respectively.

Control Unit Design and Implementation for SIMD Programmable Unified Shader (SIMD 프로그래머블 통합 셰이더를 위한 제어 유닛 설계 및 구현)

  • Kim, Kyeong-Seob;Lee, Yun-Sub;Yu, Byung-Cheol;Jung, Jin-Ha;Choi, Sang-Bang
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.48 no.7
    • /
    • pp.37-47
    • /
    • 2011
  • Real picture like high quality computer graphic is widely used in various fields and shader processor, a key part of a graphic processor, has been advanced to programmable unified shader. However, The existing graphic processors have been optimized to commercial algorithms, so development of an algorithm which is not based on it requires an independent shader processor. In this paper, we have designed and implemented a control unit to support high quality 3 dimensional computer graphic image on programmable integrated shader processor. We have done evaluation through functional level simulation of designed control unit. Hardware resource usage rate are measured by implementing directly on FPGA Virtex-4 and execution speed are verified by applying ASIC library. the result of an evaluation shows that the control unit has the commands more about 1.5 times compared to the other shader processors that is a behavior similar to the control unit and with a number of processing units used in a shader processor, compared with the other processors, overall performance of the control unit is improved about 3.1 GFLOPS.

Software Implementation of Lightweight Block Cipher CHAM for Fast Encryption

  • Kim, Taeung;Hong, Deukjo
    • Journal of the Korea Society of Computer and Information
    • /
    • v.23 no.10
    • /
    • pp.111-117
    • /
    • 2018
  • CHAM is a lightweight block cipher, proposed in ICISC 2017. CHAM-n/k has the n-bit block and the k-bit key, and designers recommend CHAM-64/128, CHAM-128/128, and CHAM-128/256. In this paper, we study how to make optimal software implementation of CHAM such that it has high encryption speed on CPUs with high computing power. The best performances of our CHAM implementations are 1.6 cycles/byte for CHAM-64/128, 2.3 cycles/byte for CHAM-128/128, and 3.8 cycles/byte for CHAM-128/256. The comparison with existing software implementation results for well-known block ciphers shows that our results are competitive.

Parallel Fuzzy Information Processing System - KAFA : KAist Fuzzy Accelerator -

  • Kim, Young-Dal;Lee, Hyung-Kwang;Park, Kyu-Ho
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 1993.06a
    • /
    • pp.981-984
    • /
    • 1993
  • During the past decade, several specific hardwares for fast fuzzy inference have been developed. Most of them are dedicated to a specific inference method and thus cannot support other inference methods. In this paper, we present a hardware architecture called KAFA(KAist Fuzzy Accelerator) which provides various fuzzy inference methods and fuzzy set operators. The architecture has SIMD structure, which consists of two parts; system control/interface unit(Main Controller) and arithmetic units(FPEs). Using the parallel processing technology, the KAFA has the high performance for fuzzy information processing. The speed of the KAFA holds promise for the development of the new fuzzy application systems.

  • PDF

Feature Extraction System for High-Speed Fingerprint Recognition using the Multi-Access Memory System (다중 접근 메모리 시스템을 이용한 고속 지문인식 특징추출 시스템)

  • Park, Jong Seon;Kim, Jea Hee;Ko, Kyung-Sik;Park, Jong Won
    • Journal of Korea Multimedia Society
    • /
    • v.16 no.8
    • /
    • pp.914-926
    • /
    • 2013
  • Among the recent security systems, security system with fingerprint recognition gets many people's interests through the strengths such as exclusiveness, convenience, etc, in comparison with other security systems. The most important matters for fingerprint recognition system are reliability of matching between the fingerprint in database and user's fingerprint and rapid process of image processing algorithms used for fingerprint recognition. The existing fingerprint recognition system reduces the processing time by removing some processes in the feature extraction algorithms but has weakness of a reliability. This paper realizes the fingerprint recognition algorithm using MAMS(Multi-Access Memory System) for both the rapid processing time and the reliability in feature extraction and matching accuracy. Reliability of this process is verified by the correlation between serial processor's results and MAMS-PP64's results. The performance of the method using MAMS-PP64 is 1.56 times faster than compared serial processor.

Implementation of Parallel Volume Rendering Using the Sequential Shear-Warp Algorithm (순차 Shear-Warp 알고리즘을 이용한 병렬볼륨렌더링의 구현)

  • Kim, Eung-Kon
    • The Transactions of the Korea Information Processing Society
    • /
    • v.5 no.6
    • /
    • pp.1620-1632
    • /
    • 1998
  • This paper presents a fast parallel algorithm for volume rendering and its implementation using C language and MPI MasPar Programming Language) on the 4,096 processor MasPar MP-2 machine. This parallel algorithm is a parallelization hased on the Lacroute' s sequential shear - warp algorithm currently acknowledged to be the fastest sequential volume rendering algorithm. This algorithm reduces communication overheads by using the sheared space partition scheme and the load balancing technique using load estimates from the previous iteration, and the number of voxels to be processed by using the run-length encoded volume data structure.Actual performance is 3 to 4 frames/second on the human hrain scan dataset of $128\times128\times128$ voxels. Because of the scalability of this algorithm, performance of ]2-16 frames/sc.'cond is expected on the 16,384 processor MasPar MP-2 machine. It is expected that implementation on more current SIMD or MIMD architectures would provide 3O~60 frames/second on large volumes.

  • PDF

Multi-Port Register File Design and Implementation for the SIMD Programmable Shader (SIMD 프로그래머블 셰이더를 위한 멀티포트 레지스터 파일 설계 및 구현)

  • Yoon, Wan-Oh;Kim, Kyeong-Seob;Cheong, Jin-Ha;Choi, Sang-Bang
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.45 no.9
    • /
    • pp.85-95
    • /
    • 2008
  • Characteristically, 3D graphic algorithms have to perform complex calculations on massive amount of stream data. The vertex and pixel shaders have enabled efficient execution of graphic algorithms by hardware, and these graphic processors may seem to have achieved the aim of "hardwarization of software shaders." However, the hardware shaders have hitherto been evolving within the limits of Z-buffer based algorithms. We predict that the ultimate model for future graphic processors will be an algorithm-independent integrated shader which combines the functions of both vertex and pixel shaders. We design the register file model that supports 3-dimensional computer graphic on the programmable unified shader processor. we have verified the accurate calculated value using FPGA Virtex-4(xcvlx200) made by Xilinx for operating binary files made by the implementation progress based on synthesis results.

Design of Special Function Unit for Vectorized SIMD Programmable Unified Shader (벡터화된 SIMD 프로그램어블 통합 셰이더를 위한 특수 함수 유닛 설계)

  • Jung, Jin-Ha;Kim, Kyeong-Seob;Yun, Jeong-Hee;Seo, Jang-Won;Choi, Sang-Bang
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.47 no.5
    • /
    • pp.56-70
    • /
    • 2010
  • Rendering technique generating 2 dimensional image to give reality and high performance graphical processor for efficient processing of massive data are necessary to support realistic 3 dimensional graphical image. Recently, graphical hardwares have evolved rapidly. This enables high quality rendering effect that we were unable to process in realtime. Improving shading technique enabled us to render realistic images but still much time is required for this process. Multiple operational units are being integrated in a graphical processor for effective floating point operation using massive data to process almost real looking images. In this paper, we have designed and implemented a special functional unit to support high quality 3 dimensional computer graphic image on programmable integrated shader processor. We have done evaluation through functional level simulation of designed special functional unit. Hardware resource usage rate and execution speed are measured implementing directly on FPGA Virtex-4(xc4vlx200).

IEEE-754 Floating-Point Divider for Embedded Processors (내장형 프로세서를 위한 IEEE-754 고성능 부동소수점 나눗셈기의 설계)

  • Jeong, Jae-Won;Hong, In-Pyo;Jeong, Woo-Kyong;Lee, Yong-Surk
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.39 no.7
    • /
    • pp.66-73
    • /
    • 2002
  • As floating-point operations become widely used in various applications such as computer graphics and high-definition DSP, the needs for fast division become increased. However, conventional floating-point dividers occupy a large hardware area, and bring bottle-becks to the entire floating-point operations. In this paper, a high-performance and small-area floating-point divider, which is suitable for embedded processors, is designed using he series expansion algorithm. The algorithm is selected to utilize two MAC(Multiply-ACcumulate) units for quadratic convergence to the correct quotient. The two MAC units for SIMD-DSP features are shared and the additional area for the division only is very small. The proposed divider supports all rounding modes defined by IEEE 754 standard, and error estimations are performed for appropriate precision.