Search | Korea Science

Design and Verification of Adder Module for Fast Floating-Point Unit (부동 소수점 유닛의 고속처리를 위한 가산기 모듈의 설계 및 검증)

Jung, Myung-Su;Sonh, Seung-Il
- Proceedings of the Korean Institute of Information and Commucation Sciences Conference
- /
- v.9 no.2
- /
- pp.611-614
- /
- 2005
1970년대 말까지 초창기에 출시된 컴퓨터들은 부동 소수점을 표현하기 위한 자신들의 내부적 표현방식을 사용하였다. 따라서 각 컴퓨터마다 부동 소수점 연산에 대한 계산 결과가 약간씩 차이가 나기도 하였다. 이러한 문제점을 해결하기 위해 IEEE에서는 부동 소수점에 대한 표준안을 제안하였다. 이는 서로 다른 컴퓨터 간에 부동 소수점 데이터의 교환이 가능하게 할 뿐만 아니라 하드웨어 설계자들에게도 정확한 모델을 제공하는 것이 목적이었다. 이 당시 제정된 부동 소수점 표준안은 IEEE Standard 754 부동 소수점이며, 오늘날 인텔 CPU 기반의 PC, 매킨토시 및 대부분의 유닉스 플랫폼에서 컴퓨터 상의 실수를 표현하기 위해 사용하는 가장 일반적인 표현 방식으로 발전하였다. 본 논문에서는 부동 소수점의 기본적인 표현방식에 대해 연구하고, 이 중 32 bit 단일 정밀도 부동 소수점 가산기를 Microsoft Visual C++ 6.0을 이용해 시뮬레이션하고 이를 VHDL로 구현한다.
PDF

Design of a high-performance floating-point unit adopting a new divide/square root implementation (새로운 제산/제곱근기를 내장한 고성능 부동 소수점 유닛의 설계)

Lee, Tae-Young;Lee, Sung-Youn;Hong, In-Pyo;Lee, Yong-Surk
- Journal of the Institute of Electronics Engineers of Korea SD
- /
- v.37 no.12
- /
- pp.79-90
- /
- 2000
In this paper, a high-performance floating point unit, which is suitable for high-performance superscalar microprocessors and supports IEEE 754 standard, is designed. Floating-point arithmetic unit (AU) supports all denormalized number processing through hardware, while eliminating the additional delay time due to the denormalized number processing by proposing the proposed gradual underflow prediction (GUP) scheme. Contrary to the existing fixed-radix implementations, floating-point divide/square root unit adopts a new architecture which determines variable length quotient bits per cycle. The new architecture is superior to the SRT implementations in terms of performance and design complexity. Moreover, sophisticated exception prediction scheme enables precise exception to be implemented with ease on various superscalar microprocessors, and removes the stall cycles in division. Designed floating-point AU and divide/square root unit are integrated with and instruction decoder, register file, memory model and multiplier to form a floating-point unit, and its function and performance is verified.
PDF

Design of Transformation Engine for Mobile 3D Graphics (모바일 3차원 그래픽을 위한 기하변환 엔진 설계)

Kim, Dae-Kyoung;Lee, Jee-Myong;Lee, Chan-Ho
- Journal of the Institute of Electronics Engineers of Korea SD
- /
- v.44 no.10
- /
- pp.49-54
- /
- 2007
As digital contents based on 3D graphics are increased, the requirement for low power 3D graphic hardware for mobile devices is increased. We design a transformation engine for mobile 3D graphic processor. We propose a simplified transformation engine for mobile 3D graphic processor. The area of the transformation engine is reduced by merging a mapping transformation unit into a projective transformation unit and by replacing a clipping unit with a selection unit. It consists of a viewing transformation unit a projective transformation unit a divide by w nit, and a selection unit. It can process 32 bit floating point format of the IEEE-754 standard or a reduced 24 bit floating point format. It has a pipelined architecture so that a vertex is processed every 4 cycles except for the initial latency. The RTL code is verified using an FPGA.
PDF KSCI

Design of Floating Point Adder and Verification through PCI Interface (부동 소수점 가산기 모듈의 설계와 PCI 인터페이스를 통한 검증)

Jung Myung-Su;Sonh Seung-Il
- Proceedings of the Korean Institute of Information and Commucation Sciences Conference
- /
- 2006.05a
- /
- pp.886-889
- /
- 2006
수치연산 보조프로세서로도 알려져 있는 부동 소수점 연산장치(FPU)는 컴퓨터가 사용하는 기본 마이크로프로세서보다 더 빠르게 숫자를 다를 수 있는 특별한 회로 설계 또는 마이크로프로세서를 말한다. FPU는 전적으로 대형 수학적 연산에만 초점을 맞춘 특별한 명령 셋을 가지고 있어서 그렇게 빠르게 계산을 수행할 수 있는 것이다. FPU는 오늘날의 거의 모든 PC에 장착되고 있지만, 실은 그것은 그래픽 이미지 처리나 표현 등과 같은 특별할 일을 수행할 때에 필요하다. 초창기 컴퓨터 회사들은 각기 다른 연산방식을 사용했다. 이에 따라 연산결과가 컴퓨터마다 다른 문제점을 해결하기 위해 IEEE에서는 부동 소수점에 대한 표준안을 제안하였다. 이 표준안은 IEEE Standard 754 이며, 오늘날 인텔 CPU 기반의 PC, 매킨토시 및 대부분의 유닉스 플랫폼에서 컴퓨터 상의 실수를 표현하기 위해 사용하는 가장 일반적인 표현 방식으로 발전하였다. 본 논문에서는 부동 소수점 표준안 중 32-bit 단일 정밀도 부동 소수점 가산기를 VHDL로 구현하여 FPGA칩으로 다운하고 PCI 인터페이스를 통해 Visual C++로 데이터의 입출력을 검증하였다.
PDF

A design of Floating Point Arithmetic Unit for Geometry Operation of Mobile 3D Graphic Processor (모바일 3D 그래픽 프로세서의 지오메트리 연산을 위한 부동 소수점 연산기 구현)

Lee, Jee-Myong;Lee, Chan-Ho
- Proceedings of the IEEK Conference
- /
- 2005.11a
- /
- pp.711-714
- /
- 2005
We propose floating point arithmetic units for geometry operation of mobile 3D graphic processor. The proposed arithmetic units conform to the single precision format of IEEE standard 754-1985 that is a standard of floating point arithmetic. The rounding algorithm applies the nearest toward zero form. The proposed adder/subtraction unit and multiplier have one clock cycle latency, and the inversion unit has three clock cycle latency. We estimate the required numbers of arithmetic operation for Viewing transformation. The first stage of geometry operation is composed with translation, rotation and scaling operation. The translation operation requires three addition and the rotation operation needs three addition and six multiplication. The scaling operation requires three multiplication. The viewing transformation is performed in 15 clock cycles. If the adder and the multiplier have their own in/out ports, the viewing transformation can be done in 9 clock cycles. The error margin of proposed arithmetic units is smaller than $10^{-5}$ that is the request in the OpenGL standard. The proposed arithmetic units carry out operations in 100MHz clock frequency.
PDF

A Design of Parallel Processing for Wavelet Transformation on FPGA (ICCAS 2005)

Ngowsuwan, Krairuek;Chisobhuk, Orachat;Vongchumyen, Charoen
- 제어로봇시스템학회:학술대회논문집
- /
- 2005.06a
- /
- pp.864-867
- /
- 2005
In this paper we introduce a design of parallel architecture for wavelet transformation on FPGA. We implement wavelet transforms though lifting scheme and apply Daubechies4 transform equations. This technique has an advantage that we can obtain perfect reconstruction of the data. We divide our process to high pass filter and low pass filter. With this division, we can find coefficients from low and high pass filters simultaneously using parallel processing properties of FPGA to reduce processing time. From the equations, we have to design real number computation module, referred to IEEE754 standard. We choose 32 bit computation that is fine enough to reconstruct data. After that we arrange the real number module according to Daubechies4 transform though lifting scheme.
PDF

A design of floating-point multiplier for superscalar microprocessor (수퍼스칼라 마이크로프로세서용 부동 소수점 승산기의 설계)

최병윤;이문기
- The Journal of Korean Institute of Communications and Information Sciences
- /
- v.21 no.5
- /
- pp.1332-1344
- /
- 1996
This paper presents a pipelined floating point multiplier(FMUL) for superscalar microprocessors that conbines radix-16 recoding scheme based on signed-digit(SD) number system and new rouding and normalization scheme. The new rounding and normalization scheme enable the FMUL to compute sticky bit in parallel with multiple operation and elminate timing delay due to post-normalization. By expoliting SD radix-16 recoding scheme, we can achieves further reduction of silicon area and computation time. The FMUL can execute signle-precision or double-precision floating-point multiply operation through three-stage pipelined datapath and support IEEE standard 754. The algorithm andstructure of the designed multiplier have been successfully verified through Verilog HOL modeling and simulation.
PDF

A design of floating-point arithmetic unit for superscalar microprocessor (수퍼스칼라 마이크로프로세서용 부동 소수점 연산회로의 설계)

최병윤;손승일;이문기
- The Journal of Korean Institute of Communications and Information Sciences
- /
- v.21 no.5
- /
- pp.1345-1359
- /
- 1996
This paper presents a floating point arithmetic unit (FPAU) for supescalar microprocessor that executes fifteen operations such as addition, subtraction, data format converting, and compare operation using two pipelined arithmetic paths and new rounding and normalization scheme. By using two pipelined arithmetic paths, each aritchmetic operation can be assigned into appropriate arithmetic path which high speed operation is possible. The proposed normalization an rouding scheme enables the FPAU to execute roundig operation in parallel with normalization and to reduce timing delay of post-normalization. And by predicting leading one position of results using input operands, leading one detection(LOD) operation to normalize results in the conventional arithmetic unit can be eliminated. Because the FPAU can execuate fifteen single-precision or double-precision floating-point arithmetic operations through three-stage pipelined datapath and support IEEE standard 754, it has appropriate structure which can be ingegrated into superscalar microprocessor.
PDF

A Parallel-Architecture Processor Design for the Fast Multiplication of Homogeneous Transformation Matrices (Homogeneous Transformation Matrix의 곱셈을 위한 병렬구조 프로세서의 설계)

Kwon Do-All;Chung Tae-Sang
- The Transactions of the Korean Institute of Electrical Engineers D
- /
- v.54 no.12
- /
- pp.723-731
- /
- 2005
The $4{\times}4$ homogeneous transformation matrix is a compact representation of orientation and position of an object in robotics and computer graphics. A coordinate transformation is accomplished through the successive multiplications of homogeneous matrices, each of which represents the orientation and position of each corresponding link. Thus, for real time control applications in robotics or animation in computer graphics, the fast multiplication of homogeneous matrices is quite demanding. In this paper, a parallel-architecture vector processor is designed for this purpose. The processor has several key features. For the accuracy of computation for real application, the operands of the processors are floating point numbers based on the IEEE Standard 754. For the parallelism and reduction of hardware redundancy, the processor takes column vectors of homogeneous matrices as multiplication unit. To further improve the throughput, the processor structure and its control is based on a pipe-lined structure. Since the designed processor can be used as a special purpose coprocessor in robotics and computer graphics, additionally to special matrix/matrix or matrix/vector multiplication, several other useful instructions for various transformation algorithms are included for wide application of the new design. The suggested instruction set will serve as standard in future processor design for Robotics and Computer Graphics. The design is verified using FPGA implementation. Also a comparative performance improvement of the proposed design is studied compared to a uni-processor approach for possibilities of its real time application.
PDF KSCI

Search Result 19, Processing Time 0.028 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)