Search | Korea Science

A design of floating-point arithmetic unit for superscalar microprocessor (수퍼스칼라 마이크로프로세서용 부동 소수점 연산회로의 설계)

최병윤;손승일;이문기
- The Journal of Korean Institute of Communications and Information Sciences / v.21 no.5 / pp.1345-1359 / 1996
This paper presents a floating-point arithmetic unit (FPAU) for a superscalar microprocessor that executes fifteen operations, such as addition, subtraction, data format conversion, and comparison, using two pipelined arithmetic paths and a new rounding and normalization scheme. With two pipelined arithmetic paths, each arithmetic operation can be assigned to the appropriate path, which makes high-speed operation possible. The proposed normalization and rounding scheme enables the FPAU to perform rounding in parallel with normalization and reduces the timing delay of post-normalization. In addition, by predicting the leading-one position of the result from the input operands, the leading-one detection (LOD) step that a conventional arithmetic unit needs to normalize results can be eliminated. Because the FPAU can execute fifteen single-precision or double-precision floating-point arithmetic operations through a three-stage pipelined datapath and supports IEEE Standard 754, its structure is suitable for integration into a superscalar microprocessor.

Recent Progress in Development of SFQ Arithmetic Logic Unit in Korea

Park, Jong-Hyuk;Jung, Ku-Rak;Lim, Hae-Ryong;Hahn, Taek-Sang
- 한국초전도학회:학술대회논문집 / v.13 / pp.15-15 / 2003

Design of a Floating Point Multiplier for IEEE 754 Single-Precision Operations (IEEE 754 단정도 부동 소수점 연산용 곱셈기 설계)

Lee, Ju-Hun;Chung, Tae-Sang
- Proceedings of the KIEE Conference / 1999.11c / pp.778-780 / 1999
Arithmetic unit speed depends strongly on the algorithms employed to realize the basic arithmetic operations (add, subtract, multiply, and divide) and on the logic design. Recent advances in VLSI have increased the feasibility of hardware implementation of floating-point arithmetic units, and microprocessors require a powerful floating-point processing unit as a standard option. This paper describes the design of a floating-point multiplier for IEEE 754-1985 single-precision operation. Booth encoding is used to reduce the number of partial products, and a Wallace tree of 4-2 CSAs is adopted in the fraction multiplication part to generate the $32{\times}32$ single-precision product. A new scheme of rounding and sticky-bit generation is adopted to reduce area and timing. The design also includes a true sign generator. The multiplier has been implemented in an ALTERA FLEX EPF10K70RC240-4.
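The partial-product reduction mentioned in this abstract can be illustrated with a small software model. The sketch below assumes radix-4 (modified) Booth recoding, which the abstract does not specify; the function names `booth_radix4_digits` and `booth_multiply` are hypothetical and are not taken from the paper.

```python
def booth_radix4_digits(multiplier: int, bits: int) -> list[int]:
    """Recode a signed two's-complement multiplier of width `bits`
    into radix-4 Booth digits in {-2, -1, 0, 1, 2}, least significant first.

    Assumption: radix-4 recoding; the abstract only says "Booth encoding".
    """
    if bits % 2:
        bits += 1  # sign-extend to an even width
    digits = []
    prev = 0  # implicit bit to the right of the least significant bit
    for i in range(0, bits, 2):
        # Python ints act as sign-extended two's complement under >> and &.
        b0 = (multiplier >> i) & 1
        b1 = (multiplier >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_multiply(a: int, b: int, bits: int) -> int:
    """Multiply by summing one partial product per Booth digit of `b`."""
    return sum(d * a * 4**k for k, d in enumerate(booth_radix4_digits(b, bits)))
```

With this recoding an n-bit multiplier yields about n/2 partial products instead of n. An unsigned 24-bit significand needs at least one leading zero bit so that it is read as non-negative, for example `booth_radix4_digits(m, 26)`, which gives 13 digits.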

A design of Floating Point Arithmetic Unit for Geometry Operation of Mobile 3D Graphic Processor (모바일 3D 그래픽 프로세서의 지오메트리 연산을 위한 부동 소수점 연산기 구현)

Lee, Jee-Myong;Lee, Chan-Ho
- Proceedings of the IEEK Conference / 2005.11a / pp.711-714 / 2005
We propose floating-point arithmetic units for the geometry operations of a mobile 3D graphics processor. The proposed arithmetic units conform to the single-precision format of IEEE Standard 754-1985, the standard for floating-point arithmetic. The rounding algorithm applies the nearest toward zero form. The proposed adder/subtractor and multiplier have a latency of one clock cycle, and the inversion unit has a latency of three clock cycles. We estimate the number of arithmetic operations required for the viewing transformation. The first stage of the geometry operation is composed of translation, rotation, and scaling operations. The translation operation requires three additions, the rotation operation needs three additions and six multiplications, and the scaling operation requires three multiplications. The viewing transformation is performed in 15 clock cycles. If the adder and the multiplier have their own input/output ports, the viewing transformation can be done in 9 clock cycles. The error margin of the proposed arithmetic units is smaller than $10^{-5}$, which is the requirement of the OpenGL standard. The proposed arithmetic units operate at a clock frequency of 100 MHz.
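The cycle counts in this abstract are consistent with a simple tally of the listed operations. The sketch below is one reading of those numbers and is not stated in the abstract: it assumes every addition and multiplication takes one cycle, that operations run strictly one after another when the units share ports, and that the adder and multiplier overlap fully (ignoring data dependencies) when they have separate ports. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpCount:
    adds: int
    muls: int


# Operation counts as quoted in the abstract.
STAGES = {
    "translation": OpCount(adds=3, muls=0),
    "rotation": OpCount(adds=3, muls=6),
    "scaling": OpCount(adds=0, muls=3),
}


def total_ops(stages: dict[str, OpCount]) -> OpCount:
    """Sum additions and multiplications over all stages."""
    return OpCount(
        adds=sum(s.adds for s in stages.values()),
        muls=sum(s.muls for s in stages.values()),
    )


def cycles_shared_ports(total: OpCount) -> int:
    """Assumed model: one operation issues per cycle overall."""
    return total.adds + total.muls


def cycles_separate_ports(total: OpCount) -> int:
    """Assumed model: adder and multiplier run fully in parallel."""
    return max(total.adds, total.muls)


total = total_ops(STAGES)
print(cycles_shared_ports(total), cycles_separate_ports(total))
```

Under these assumptions the tally gives 6 additions and 9 multiplications, that is, 15 cycles with shared ports and 9 cycles with separate ports, which matches the figures quoted in the abstract.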

Study of the Superconductive Pipelined Multi-Bit ALU (초전도 Pipelined Multi-Bit ALU에 대한 연구)

Kim, Jin-Young;Ko, Ji-Hoon;Kang, Joon-Hee
- Progress in Superconductivity / v.7 no.2 / pp.109-113 / 2006
The Arithmetic Logic Unit (ALU) is a core element of a computer processor that performs arithmetic and logic operations on the operands in computer instruction words. We have developed and tested an RSFQ multi-bit ALU constructed with half adder unit cells, which we used to reduce the complexity of the ALU. Each unit cell was constructed of one half adder and three dc switches. Timing in complex circuits has been a very important issue. We calculated the delay time of all components in the circuit by using the Josephson circuit simulation tools XIC, $WRspice^{TM}$, and Julia. To make the circuit work faster, we used a forward clocking scheme, which required a careful design of the timing between clock and data pulses in the ALU. The designed ALU had a limited set of operation functions: OR, AND, XOR, and ADD. It had a pipeline structure. The fabricated 1-bit, 2-bit, and 4-bit ALU circuits were tested at clock frequencies of a few kilohertz as well as a few tens of gigahertz. For high-speed tests, we used an eye-diagram technique. Our 4-bit ALU operated correctly at clock frequencies of up to 5 GHz.

A Design of Dual-Phase Instructions for an Effective Logarithm and Exponent Arithmetic (효율적인 로그와 지수 연산을 위한 듀얼 페이즈 명령어 설계)

Kim, Chi-Yong;Lee, Kwang-Yeob
- Journal of IKEEE / v.14 no.2 / pp.64-68 / 2010
This paper proposes efficient logarithm and exponent calculation methods for a mobile environment that use a dual-phase instruction set without an additional ALU. Using the dual-phase instruction set, the method extracts the exponent and mantissa from the floating-point representation and calculates a 24-bit single-precision floating-point logarithm approximation using the Taylor series expansion algorithm. The dual-phase instruction set also reduces the number of instruction execution cycles. The proposed dual-phase architecture reduces performance degradation and keeps the hardware size small.
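The general idea of splitting a floating-point value into exponent and mantissa and then applying a Taylor series can be sketched in software. The example below is only an illustration under stated assumptions: it computes a base-2 logarithm (the abstract does not name the base), uses the series for ln(1 + t) with a range reduction chosen here for fast convergence, and does not model the dual-phase instructions themselves. The function name `log2_taylor` and the `terms` parameter are hypothetical.

```python
import math

LN2 = math.log(2.0)


def log2_taylor(x: float, terms: int = 16) -> float:
    """Approximate log2(x) from the exponent and mantissa of x.

    x = m * 2**e, so log2(x) = e + ln(m) / ln(2), where ln(m) is
    evaluated with the Taylor series ln(1 + t) = t - t^2/2 + t^3/3 - ...
    """
    if x <= 0.0:
        raise ValueError("log2_taylor requires x > 0")
    m, e = math.frexp(x)  # x = m * 2**e with 0.5 <= m < 1
    # Range reduction (an assumption of this sketch): keep m close to 1
    # so that |t| stays below about 0.42 and the series converges quickly.
    if m < math.sqrt(0.5):
        m *= 2.0
        e -= 1
    t = m - 1.0
    ln_m = 0.0
    power = t
    for k in range(1, terms + 1):
        ln_m += power / k if k % 2 else -power / k
        power *= t
    return e + ln_m / LN2
```

For exact powers of two the mantissa reduces to 1, so the series contributes nothing and the result is just the exponent. For other inputs the accuracy depends on `terms`; a hardware design would fix the number of terms to meet its precision target.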

A VLSI Architecture of Systolic Array for FFT Computation (고속 퓨리어 변환 연산용 VLSI 시스토릭 어레이 아키텍춰)

신경욱;최병윤;이문기
- Journal of the Korean Institute of Telematics and Electronics / v.25 no.9 / pp.1115-1124 / 1988
A two-dimensional systolic array for the fast Fourier transform (FFT), which has a regular and recursive VLSI architecture, is presented. The array is constructed of identical processing elements (PEs) arranged in a mesh, and due to its modularity, it can be expanded to an arbitrary size. A processing element consists of two data routing units, a butterfly arithmetic unit, and a simple control unit. The array computes the FFT through three procedures: I/O pipelining, data shuffling, and butterfly arithmetic. By utilizing parallelism, pipelining, and local communication geometry during data movement, the two-dimensional systolic array eliminates global and irregular communication problems, which have been a limiting factor in VLSI implementation of FFT processors. The systolic array executes a half butterfly arithmetic based on distributed arithmetic, which can carry out multiplication with only adders. The systolic array also provides 100% PE activity, i.e., none of the PEs are idle at any time. A chip for half butterfly arithmetic, which consists of two BLC adders and registers, has been fabricated using a 3-µm single-metal P-well CMOS technology. With a half butterfly arithmetic execution time of about 500 ns, obtained by critical path delay simulation, the total FFT execution time for 1024 points is estimated at about 16.6 µs at a clock frequency of 20 MHz. A one-PE chip expandable to an array of any size is being fabricated using a 2-µm double-metal P-well CMOS process. The chip was laid out using a standard cell library and a BLC adder macrocell with the aid of auto-routing software. It consists of around 6000 transistors and 68 I/O pads on a $3.4{\times}2.8$ mm$^2$ area. A built-in self-testing circuit, BILBO (Built-In Logic Block Observer), was employed at the expense of 3% hardware overhead.

VLSI Design of Demodulating Fingers with Low Hardware Complexity for MC-CDMA Mobile System (MC-CDMA 이동국의 하드웨어 복잡도를 줄이기 위한 다중경로 복조기의 설계)

황상윤;이성주;김재석
- Proceedings of the IEEK Conference / 1998.10a / pp.1113-1116 / 1998
This paper presents an efficient hardware architecture for the demodulating fingers that demodulate multipath signals in an MC-CDMA mobile system. We design a new architecture in which the demodulating fingers share a single arithmetic unit to reduce hardware complexity. This arithmetic unit performs the MAC (multiplication and accumulation) operations of all demodulating fingers. The proposed architecture is suitable for an IS-95 based CDMA PCS system. Three demodulating fingers for MC-CDMA, which demodulate 7 channels, contain about 42K logic gates. The proposed system is shown to be very useful for a Multi-Code CDMA system in which several channels are demodulated simultaneously.

Simulation and Layout of Single Flux Quantum AND gate (단자속 양자 AND gate의 시뮬레이션과 Layout)

정구락;박종혁;임해용;강준희;한택상
- Proceedings of the Korea Institute of Applied Superconductivity and Cryogenics Conference / 2002.02a / pp.141-143 / 2002
We have simulated and laid out a Single Flux Quantum (SFQ) AND gate for an Arithmetic Logic Unit by using XIC, WRspice, and Lmeter. This circuit is a combination of two D flip-flops. A D flip-flop and a dc SQUID have a similar structure in that each has a loop inductor and two Josephson junctions. We also obtained operating margins and completed the layout of the AND gate. We obtained a margin of more than $\pm$42%.

Design of Single Flux Quantum D2 Cell and Inverter for ALU (ALU를 위한 단자속 양자 D2 Cell과 Inverter의 설계)

정구락;박종혁;임해용;강준희;한택상
- Proceedings of the Korea Institute of Applied Superconductivity and Cryogenics Conference / 2003.02a / pp.140-142 / 2003
We have designed an SFQ (Single Flux Quantum) D2 Cell and Inverter (NOT) for a superconducting ALU (Arithmetic Logic Unit). To optimize the circuit, we used Julia, XIC, and Lmeter for simulations and layouts. We obtained a circuit margin larger than $\pm$25%. After layout, we drew the chip for fabrication of the SFQ D2 Cell and Inverter. We connected the D2 Cell and Inverter to JTL, DC/SFQ, SFQ/DC, and RS flip-flop circuits for measurement.

