Search | Korea Science

A Design of a 8-Thread Graphics Processor Unit with Variable-Length Instructions

Lee, Kwang-Yeob;Kwak, Jae-Chang
- Journal of information and communication convergence engineering, v.6 no.3, pp.285-288, 2008
Most multimedia processors for 2D/3D graphics acceleration use many integer and floating-point arithmetic units. We present a new architecture with an efficient ALU that fits in a smaller chip area. It significantly reduces instruction cycles by building on multi-thread operation, variable-length instruction words, dual-phase operation, and coordination of phase instructions. The number of instruction cycles can be reduced by up to 50%, and performance can be doubled.

A Study on the Behavior of Floating-Point Unit Conforming the ANSI/IEEE Std. 754-1985 (ANSI/IEEE Std. 754-1985에 의거한 부동소수점 연산기의 동작원리에 관한 연구)

Kim, Kwang-Uk;Chung, Tae-Sang
- Proceedings of the KIEE Conference, 1999.11c, pp.788-790, 1999
A software implementation of floating-point addition and multiplication is presented. To this end, the ANSI/IEEE standard for binary floating-point arithmetic is briefly reviewed. The architecture and behavior of the Intel(R) 80x87 FPU are studied in detail, and basic algorithms for floating-point addition and multiplication are used for the implementation. Some examples and their verification are also presented.
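
As a rough illustration of the kind of algorithm such a software implementation involves, the sketch below multiplies two single-precision values by working directly on their bit fields. It is not the authors' code: the names `float_to_bits`, `bits_to_float` and `fp32_mul` are hypothetical, and the sketch assumes normal, nonzero inputs and a normal result (no zeros, subnormals, infinities or NaNs), with round-to-nearest-even.

```python
import struct


def float_to_bits(x):
    """Return the IEEE 754 binary32 bit pattern of x as an integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def bits_to_float(bits):
    """Return the float whose binary32 bit pattern is bits."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def fp32_mul(a_bits, b_bits):
    """Multiply two normal binary32 values given as bit patterns.

    Assumption: inputs and result are normal numbers; special values
    are not handled in this sketch.
    """
    sign = ((a_bits ^ b_bits) >> 31) & 1
    exp = ((a_bits >> 23) & 0xFF) + ((b_bits >> 23) & 0xFF) - 127
    # Restore the hidden leading 1 to get 24-bit significands.
    ma = (a_bits & 0x7FFFFF) | 0x800000
    mb = (b_bits & 0x7FFFFF) | 0x800000
    prod = ma * mb  # 48-bit product in [2**46, 2**48)
    if prod & (1 << 47):
        # Product significand is in [2, 4): normalize by one position.
        exp += 1
        shift = 24
    else:
        shift = 23
    mant = prod >> shift
    rem = prod & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    # Round to nearest, ties to even.
    if rem > half or (rem == half and (mant & 1)):
        mant += 1
        if mant == 1 << 24:  # rounding overflowed the significand
            mant >>= 1
            exp += 1
    if not 0 < exp < 255:
        raise ValueError("result outside the normal range")
    return (sign << 31) | (exp << 23) | (mant & 0x7FFFFF)
```

For example, `bits_to_float(fp32_mul(float_to_bits(1.5), float_to_bits(2.5)))` evaluates to `3.75`.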

Design of high speed 64bit adder (고속 연산을 위한 64bit 가산기의 설계)

오재환;이영훈;김상수;상명희
- Proceedings of the IEEK Conference, 1998.06a, pp.843-846, 1998
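The adder that performs arithmetic operations plays a crucial role in the performance of an ALU (arithmetic logic unit), and its performance depends on the carry generation scheme used. The RCA (ripple carry adder) is widely used because it is simple and easy to design, but its carry propagation delay makes it unsuitable for high-speed adder applications. The CLA (carry lookahead adder) has the advantage that its carry delay is independent of the number of adder stages, which raises operation speed, but it has the major drawback that the circuit becomes very complex as the number of bits to be added grows. This paper therefore designs a 64-bit adder that is both simple and high-performing, and demonstrates the merit of the designed circuit through simulation.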
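
The trade-off between the two carry schemes can be sketched in a few lines. This is a bit-level software model, not the circuit from the paper, and the function names are hypothetical. In `ripple_carry_add` each stage waits for the previous carry, while `lookahead_carries` expands every carry directly from the generate and propagate signals, so the number of terms grows with the bit position.

```python
def ripple_carry_add(a, b, width=64):
    """Add bit by bit; stage i cannot finish before stage i-1's carry."""
    carry = 0
    result = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        result |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return result, carry


def lookahead_carries(a, b, width=4, carry_in=0):
    """Compute carries c[0..width] from generate (g) and propagate (p).

    c[i+1] = g[i] | p[i]g[i-1] | ... | p[i]...p[0]c[0], written out so
    that no carry depends on a previously computed carry.
    """
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    carries = [carry_in]
    for i in range(width):
        c = g[i]
        prod = p[i]
        for j in range(i - 1, -1, -1):
            c |= prod & g[j]
            prod &= p[j]
        c |= prod & carry_in
        carries.append(c)
    return carries


def carry_lookahead_add(a, b, width=4):
    """Sum bits are p[i] XOR c[i]; the last carry is the carry-out."""
    carries = lookahead_carries(a, b, width)
    total = 0
    for i in range(width):
        p = ((a >> i) ^ (b >> i)) & 1
        total |= (p ^ carries[i]) << i
    return total, carries[width]
```

The inner loop of `lookahead_carries` is where the complexity shows: the expression for the top carry of a wide adder has as many product terms as there are bits, which is why practical designs apply lookahead to small blocks.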

Design of Single Flux Quantum D2 Cell and Inverter for ALU (ALU를 위한 단자속 양자 D2 Cell과 Inverter의 설계)

정구락;박종혁;임해용;강준희;한택상
- Proceedings of the Korea Institute of Applied Superconductivity and Cryogenics Conference, 2003.02a, pp.140-142, 2003
We have designed an SFQ (Single Flux Quantum) D2 Cell and Inverter (NOT) for a superconducting ALU (Arithmetic Logic Unit). To optimize the circuits, we used Julia, XIC and Lmeter for simulation and layout. We obtained circuit margins larger than $\pm$25%. After layout, we drew the chip for fabrication of the SFQ D2 Cell and Inverter. For measurement, we connected the D2 Cell and Inverter to JTL, DC/SFQ, SFQ/DC and RS flip-flop circuits.

A study on application of GPU-accelerated kinematic wave rainfall-runoff model (GPU 가속 운동파 강우유출모형의 적용 연구)

Kim, Boram;Yun, Gwan Seon;Kim, Hyeong-Jun;Yoon, Kwang Seok
- Proceedings of the Korea Water Resources Association Conference, 2020.06a, pp.323-323, 2020
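A graphics processing unit (GPU) consists of many arithmetic logic units (ALUs) specialized for graphics workloads, so it can perform more operations at once than a central processing unit (CPU). This study applied a GPU-accelerated kinematic wave model to a real watershed and assessed both the accuracy of the GPU-accelerated kinematic wave rainfall-runoff model results and the efficiency of its computation time. The GPU-accelerated kinematic wave model was developed with CUDA Fortran to shorten the numerical simulation time of a distributed rainfall-runoff model. The governing equations of the distributed model consist of the kinematic wave model and the Green-Ampt model, and the kinematic wave model was discretized with the finite volume method. The GPU-accelerated model was used to simulate rainfall-runoff in the Mihocheon watershed of the Geum River and was compared with a CPU-based rainfall-runoff model that uses the same finite volume method. The GPU-accelerated model results were similar to observations at the downstream end of the Mihocheon watershed. Computation time was also shorter than that of the CPU-based model, by up to about 100 times on the equipment used in this study.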

A Scalable Structure for a Multiplier and an Inversion Unit in $GF(2^m)$

Lee, Chan-Ho;Lee, Jeong-Ho
- ETRI Journal, v.25 no.5, pp.315-320, 2003
Elliptic curve cryptography (ECC) offers the highest security per bit among the known public key cryptosystems. The operation of ECC is based on the arithmetic of the finite field. This paper presents the design of a 193-bit finite field multiplier and an inversion unit based on a normal basis representation in which the inversion and the square operation units are easy to implement. This scalable multiplier can be constructed in a variable structure depending on the performance area trade-off. We implement it using Verilog HDL and a 0.35 ${\mu}m$ CMOS cell library and verify the operation by simulation.

IEEE Standard Floating Point ALU with 60MHz Clock Frequency (60MHz Clock 주파수의 IEEE 표준 Floating Point ALU)

Yong Surk Lee
- Journal of the Korean Institute of Telematics and Electronics A, v.28A no.11, pp.915-922, 1991
This paper presents an ALU in 1.0 $\mu$m CMOS technology that performs IEEE standard single- and double-precision floating-point calculations within 32 ns (2 clocks) at a 60 MHz clock. The 32 ns figure was achieved by using a 9 ns, 54-bit carry-select adder with 1's complement arithmetic in place of the 2's complement adders of earlier designs. The adder performs the addition or subtraction in the first cycle and the rounding in the second, which reduces the number of required adders from two to one. The speed improvement is 2 to 5 times over the previous 40 MHz design. The design goal was 60 MHz, but the unit functions at 80 MHz at room temperature.
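
The abstract does not describe the adder internals, so the following is only a sketch of the general 1's complement technique (end-around carry) applied to a magnitude subtraction. The function name and the 54-bit default width are assumptions for illustration. Its appeal for sign-magnitude significands is that a negative difference is turned into a magnitude by plain bit inversion, without a second increment.

```python
def ones_complement_sub(a, b, width=54):
    """Return (magnitude, is_negative) of a - b for unsigned a, b < 2**width.

    Uses 1's complement subtraction: add the bitwise complement of b,
    then either add the end-around carry or invert the sum.
    """
    mask = (1 << width) - 1
    total = a + (~b & mask)
    if total >> width:
        # Carry out: a > b, so add the end-around carry to get a - b.
        return (total + 1) & mask, False
    # No carry: a <= b, so the magnitude is the complement of the sum.
    # Equal inputs give magnitude 0 with the negative flag set, which
    # is the 1's complement "negative zero".
    return ~total & mask, True
```

For example, `ones_complement_sub(10, 3)` returns `(7, False)` and `ones_complement_sub(3, 10)` returns `(7, True)`.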

Development of on-line inverse kinematic algorithm and its experimental implementation (온라인 좌표 역변환 알고리듬의 개발과 이의 실험적 수행)

오준호;박서욱;이두현
- 제어로봇시스템학회:학술대회논문집, 1988.10a, pp.16-20, 1988
This paper presents a new algorithm for solving the inverse kinematics in real-time applications. The end-tip movement of each link can be resolved into the basic resolution unit, $\Delta l$, which depends on the link length, the reduction ratio and the resolution of the incremental encoder attached to the joint. When the x- and y-axis projections of the end-tip movement are expressed in units of $\Delta l$, the projectional increments $\Delta x$ and $\Delta y$ become -1, 0 or 1 by truncation. By using incremental computation with these ternary values and some simple logic rules, a coordinate transformation can be realized. With this approach, floating-point arithmetic and the manipulation of trigonometric functions are completely eliminated. This paper demonstrates the proposed method on a two-link arm of the parallelogram linkage type.

Implementation of Lattice Reduction-aided Detector using GPU on SDR System (SDR 시스템에서 GPU를 사용한 Lattice Reduction-aided 검출기 구현)

Kim, Tae Hyun;Leem, Hyun Seok;Choi, Seung Won
- Journal of Korea Society of Digital Industry and Information Management, v.7 no.3, pp.55-61, 2011
This paper presents an implementation of a Lattice Reduction (LR)-aided detector for a Multiple-Input Multiple-Output (MIMO) system using a Graphics Processing Unit (GPU). A GPU is a parallel processor with many Arithmetic Logic Units (ALUs), so it can minimize the operation time of the LR algorithm through parallelization across multiple GPU threads. Using the implemented LR-aided detector, we verify that it operates much faster than a Maximum Likelihood (ML) detector. The detector has been applied to a WiMAX system to show the feasibility of real-time processing. In addition, we demonstrate that the processing time can be reduced at the cost of a 3 dB SNR loss by limiting the repeating loop in the Lenstra-Lenstra-Lovasz (LLL) algorithm, which is frequently used in LR-aided detectors.
https://doi.org/10.17662/ksdim.2011.7.3.055
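
The idea of capping the LLL loop can be sketched as follows. This is a plain-Python, real-valued textbook LLL, not the paper's GPU implementation (MIMO channel matrices are normally complex). The parameter `max_iterations` and all function names are hypothetical, and the Gram-Schmidt data is recomputed on every step for clarity rather than speed.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def gram_schmidt(basis):
    """Return the orthogonalized vectors and the coefficient matrix mu."""
    n = len(basis)
    ortho = []
    mu = [[0.0] * n for _ in range(n)]
    for i, b in enumerate(basis):
        v = list(b)
        for j in range(i):
            mu[i][j] = dot(b, ortho[j]) / dot(ortho[j], ortho[j])
            v = [x - mu[i][j] * y for x, y in zip(v, ortho[j])]
        ortho.append(v)
    return ortho, mu


def lll_reduce(basis, delta=0.75, max_iterations=None):
    """LLL-reduce the rows of basis (assumed linearly independent).

    max_iterations caps the main loop; if the cap is hit, the basis
    returned may be only partially reduced.
    """
    basis = [[float(x) for x in b] for b in basis]
    n = len(basis)
    k = 1
    iterations = 0
    while k < n:
        if max_iterations is not None and iterations >= max_iterations:
            break
        iterations += 1
        # Size-reduce basis[k] against basis[k-1], ..., basis[0].
        for j in range(k - 1, -1, -1):
            _, mu = gram_schmidt(basis)
            q = round(mu[k][j])
            if q:
                basis[k] = [x - q * y for x, y in zip(basis[k], basis[j])]
        ortho, mu = gram_schmidt(basis)
        lhs = dot(ortho[k], ortho[k])
        rhs = (delta - mu[k][k - 1] ** 2) * dot(ortho[k - 1], ortho[k - 1])
        if lhs >= rhs:
            k += 1  # Lovasz condition holds, move on
        else:
            basis[k], basis[k - 1] = basis[k - 1], basis[k]
            k = max(k - 1, 1)
    return basis, iterations
```

Calling `lll_reduce(rows)` runs to completion, while `lll_reduce(rows, max_iterations=m)` stops after at most `m` passes, trading reduction quality for a bounded running time, in the spirit of the trade-off the abstract reports.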

A Design of Sign-magnitude based Multi-mode LDPC Decoder for WiMAX (Sign-magnitude 수체계 기반의 WiMAX용 다중모드 LDPC 복호기 설계)

Seo, Jin-Ho;Park, Hae-Won;Shin, Kyung-Wook
- Journal of the Korea Institute of Information and Communication Engineering, v.15 no.11, pp.2465-2473, 2011
This paper describes a circuit-level optimization of the DFU (decoding function unit) for an LDPC decoder used in wireless communication systems including WiMAX and WLAN. A new DFU design based on sign-magnitude arithmetic instead of two's complement arithmetic is proposed, resulting in an 18% reduction in gate count for the 96-DFU array used in a mobile WiMAX LDPC decoder. A multi-mode LDPC decoder for the mobile WiMAX standard is designed using the proposed DFU. The LDPC decoder, synthesized using a 0.18-${\mu}m$ CMOS cell library with a 50 MHz clock, has 268,870 gates and 71,424 bits of RAM, and it is verified by FPGA implementation.
https://doi.org/10.6109/jkiice.2011.15.11.2465
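
One reason sign-magnitude numbers suit LDPC decoding can be seen in a min-sum check-node update, sketched below. The abstract does not state which decoding algorithm the DFU implements, so min-sum is an assumption here, as are the 6-bit message width and all function names. With the sign in the top bit and the magnitude in the remaining bits, the update needs only an XOR over the signs and a search for the two smallest magnitudes, with no negation or absolute-value step.

```python
def to_sign_magnitude(value, bits=6):
    """Encode an integer as sign bit plus saturated magnitude."""
    mag_max = (1 << (bits - 1)) - 1
    sign = 1 if value < 0 else 0
    return (sign << (bits - 1)) | min(abs(value), mag_max)


def from_sign_magnitude(word, bits=6):
    """Decode a sign-magnitude word back to a Python integer."""
    mag = word & ((1 << (bits - 1)) - 1)
    return -mag if (word >> (bits - 1)) & 1 else mag


def check_node_update(msgs, bits=6):
    """Min-sum check-node update on sign-magnitude messages.

    For each edge, the output sign is the XOR of all other signs and
    the output magnitude is the minimum of all other magnitudes.
    Requires at least two incoming messages.
    """
    mag_mask = (1 << (bits - 1)) - 1
    signs = [(m >> (bits - 1)) & 1 for m in msgs]
    mags = [m & mag_mask for m in msgs]
    parity = 0
    for s in signs:
        parity ^= s
    # The two smallest magnitudes are enough to serve every edge.
    order = sorted(range(len(msgs)), key=lambda i: mags[i])
    min1_idx, min1, min2 = order[0], mags[order[0]], mags[order[1]]
    out = []
    for i in range(len(msgs)):
        mag = min2 if i == min1_idx else min1
        out.append(((parity ^ signs[i]) << (bits - 1)) | mag)
    return out
```

For incoming values `[3, -5, 2, -7]`, encoding with `to_sign_magnitude`, updating, and decoding with `from_sign_magnitude` gives `[2, -2, 3, -2]`.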
