Search | Korea Science

Design of a Dispatch Unit & Operand Selection Unit for Improving the SIMT Based GP-GPU Instruction Performance (SIMT구조 GP-GPU의 명령어 처리 성능 향상을 위한 Dispatch Unit과 Operand Selection Unit설계)

Kwak, Jae Chang
- Journal of IKEEE
- v.19 no.3
- pp.455-459
- 2015
This paper proposes a dispatch unit for a GP-GPU with a SIMT architecture that supports the acceleration of general-purpose operations as well as graphics processing. If all of the information in an instruction issued from the warp scheduler is decoded, unnecessary operand loads occur, which increases the load on the registers. To resolve this problem, this paper proposes a pre-decoding method that decodes only the operand information, reducing both the operand loads and the load on the registers. The operand information from the dispatch unit is passed to the operand selection unit while preventing register bank collisions. Thus the overall performance is improved. In the simulation test, the total number of clock cycles required to process 10,000 arbitrary instructions issued from the warp scheduler was measured using ModelSim SE 10.0b. The results show that the dispatch unit equipped with the proposed pre-decoding function improves processing performance by about 12% compared to the conventional method.
https://doi.org/10.7471/ikeee.2015.19.3.455 Cite PDF KSCI
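
The register bank collisions mentioned in the abstract above can be illustrated with a small model. This is only a sketch under stated assumptions, not the paper's design: the bank count (`NUM_BANKS`), the interleaved register-to-bank mapping, and the rule that each bank serves one read per cycle are all hypothetical choices made for illustration.

```python
from collections import Counter

NUM_BANKS = 4  # assumed bank count, not taken from the paper


def bank_of(reg: int) -> int:
    """Map a register index to a bank (assumed interleaved layout)."""
    return reg % NUM_BANKS


def read_cycles(source_regs: list[int]) -> int:
    """Cycles needed to read all source operands of one instruction.

    Assumes each bank can serve one read per cycle, so the busiest bank
    determines the total. Repeated registers are read only once.
    """
    if not source_regs:
        return 0
    per_bank = Counter(bank_of(r) for r in set(source_regs))
    return max(per_bank.values())


# Operands spread over different banks are read together;
# operands that share a bank are serialized.
print(read_cycles([0, 1, 2]))
print(read_cycles([0, 4, 8]))
```

In this model an operand selection stage that knows the source registers early can reorder or stagger reads so that the busiest bank is hit as rarely as possible.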

Consecutive Operand-Caching Method for Multiprecision Multiplication, Revisited

Seo, Hwajeong;Kim, Howon
- Journal of information and communication convergence engineering
- v.13 no.1
- pp.27-35
- 2015
Multiprecision multiplication is the most expensive operation in public key-based cryptography. Therefore, many multiplication methods have been studied intensively for several decades. At the Workshop on Cryptographic Hardware and Embedded Systems 2011 (CHES2011), a novel multiplication method called 'operand caching' was proposed. This method reduces the number of required load instructions by caching the operands. However, it does not provide full operand caching when changing the row of partial products. To overcome this problem, a novel method called 'consecutive operand caching' was proposed at the Workshop on Information Security Applications 2012 (WISA2012). It divides a multiplication structure into partial products and reconstructs them to share common operands between previous and next partial products. However, there is still room for improvement; therefore, we propose a finely designed operand-caching mode to minimize useless memory accesses when the first row is changed. As a result, we reduce the number of memory access instructions and boost the speed of the overall multiprecision multiplication for public key cryptography.
https://doi.org/10.6109/jicce.2015.13.1.027 Cite PDF KSCI KPUBS HTML
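
The effect of keeping an operand word in a register instead of reloading it can be shown with a plain schoolbook multiplication. The sketch below is not the operand-caching or consecutive operand-caching schedule from the cited workshops; it is a generic illustration, and the 32-bit word size, the function names, and the load-counting model (only operand loads are counted, not accesses to the result) are assumptions.

```python
WORD_BITS = 32  # assumed word size
MASK = (1 << WORD_BITS) - 1


def to_words(x: int, n: int) -> list[int]:
    """Split x into n words, least significant first."""
    return [(x >> (WORD_BITS * i)) & MASK for i in range(n)]


def from_words(words: list[int]) -> int:
    """Reassemble an integer from little-endian words."""
    return sum(w << (WORD_BITS * i) for i, w in enumerate(words))


def multiply(a: list[int], b: list[int], cache_row_operand: bool) -> tuple[list[int], int]:
    """Schoolbook multiplication of equal-length operands.

    Returns (product words, number of operand loads).
    """
    n = len(a)
    result = [0] * (2 * n)
    loads = 0
    for i in range(n):
        if cache_row_operand:
            ai = a[i]          # loaded once per row, then kept in a register
            loads += 1
        carry = 0
        for j in range(n):
            if not cache_row_operand:
                ai = a[i]      # reloaded for every partial product
                loads += 1
            bj = b[j]
            loads += 1
            t = result[i + j] + ai * bj + carry
            result[i + j] = t & MASK
            carry = t >> WORD_BITS
        result[i + n] = carry  # this word has not been written by earlier rows
    return result, loads
```

For n-word operands this model counts 2n² operand loads without caching and n + n² with one cached word per row; the methods in the abstract go further by also sharing words of the second operand across neighbouring partial products.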

Performance Improvement of Operand Fetching with the Operand Reference Prediction Cache(ORPC) (오퍼랜드 참조 예측 캐쉬(ORPC)를 활용한 오퍼랜드 페치의 성능 개선)

Kim, Heung-Jun;Cho, Kyung-San
- The Transactions of the Korea Information Processing Society
- v.5 no.6
- pp.1652-1659
- 1998
To provide performance gains by reducing the operand referencing latency and data cache bandwidth requirements, we present an operand reference prediction cache (ORPC) which predicts the operand value and address translation during the instruction fetch stage. The prediction is verified at an early stage, which minimizes the performance penalty caused by misprediction. Through trace-driven simulation of six benchmark programs, the performance improvement of the three proposed ORPC structures (ORPC1, ORPC2, ORPC3) is analysed and validated.
PDF

Multi-Operand Radix-2 Signed-Digit Adder using Current Mode MOSFET Circuits

Sakamoto, Masahiro;Hamano, Daisuke;Higuchi, Yuuichi;Kiriya, Takechika;Morisue, Mititada
- Proceedings of the IEEK Conference
- 2000.07a
- pp.167-170
- 2000
This paper describes a novel multi-operand radix-2 signed-digit (SD) adder. The proposed multi-operand addition algorithm eliminates the carry propagation chain by dividing the input operands into an even-position part and an odd-position part and adding each part separately. A multi-operand adder based on this algorithm can add six operands in parallel and is faster than the ordinary binary tree of SD adders. A hardware model of the proposed adder, implemented in current-mode MOSFET circuit technology, is shown. SPICE simulations were carried out to verify the function of the proposed circuit.
PDF
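
Why signed-digit arithmetic avoids a carry chain can be seen in the classic two-operand carry-free addition rule. The sketch below is that textbook rule, not the six-operand even/odd algorithm of the paper; the function names and the least-significant-digit-first list layout are assumptions made for illustration.

```python
def sd_value(digits: list[int]) -> int:
    """Value of a radix-2 signed-digit number, least significant digit first."""
    return sum(d << i for i, d in enumerate(digits))


def sd_add(x: list[int], y: list[int]) -> list[int]:
    """Carry-free addition of two equal-length SD numbers (digits in {-1, 0, 1})."""
    n = len(x)
    transfer = [0] * (n + 1)  # transfer[i] is the transfer digit entering position i
    interim = [0] * n
    for i in range(n):
        p = x[i] + y[i]
        # Only the next-lower position is inspected, so no carry chain forms.
        lower_nonneg = i == 0 or (x[i - 1] >= 0 and y[i - 1] >= 0)
        if p == 2:
            transfer[i + 1], interim[i] = 1, 0
        elif p == 1:
            transfer[i + 1], interim[i] = (1, -1) if lower_nonneg else (0, 1)
        elif p == -1:
            transfer[i + 1], interim[i] = (0, -1) if lower_nonneg else (-1, 1)
        elif p == -2:
            transfer[i + 1], interim[i] = -1, 0
    # Each result digit depends on positions i and i - 1 only.
    return [interim[i] + transfer[i] for i in range(n)] + [transfer[n]]
```

Every output digit stays in {-1, 0, 1} and depends on at most two input positions, so the addition time does not grow with the operand length.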

An Improved Load Operand Referencing Scheme Using A Hybrid Predictor (혼합 예측기를 사용하는 효율적인 적재 명령어의 오퍼랜드 참조 기법)

Choe, Seung-Gyo;Jo, Gyeong-San
- The Transactions of the Korea Information Processing Society
- v.7 no.7
- pp.2196-2203
- 2000
As processor operating frequencies increase and processors execute multiple instructions per cycle, processor performance becomes more dependent on the load operand referencing latency and on data dependencies. To reduce the operand fetch latency and to increase ILP by breaking data dependencies, we propose a value-address hybrid predictor that uses a prediction buffer of reasonable size, and we analyse the performance improvement achieved by the proposed predictor. In an extensive simulation of 5 benchmark programs, the proposed hybrid prediction scheme accurately predicts 62.72% of all loads, which is 12.64% higher than the value prediction scheme, and shows its cost-effectiveness compared to the address prediction scheme. In addition, we analyse the performance improvement achieved by stride management and by the history of previous predictions.
PDF

Implementation of the noise eliminating operators of binary image (이진화상 잡음제거 연산자에 관한 연구)

Hong, Hee-Kyung;Cho, Dung-Sub
- Proceedings of the KIEE Conference
- 1988.07a
- pp.636-639
- 1988
This paper proposes operators that eliminate noise from a binary image. The image is read by a scanner, and an operand is selected according to the size of the input image. The noise in the input image is eliminated through dilation and erosion, elementary vector operations performed with the selected operand.
PDF
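
Noise removal by erosion followed by dilation can be sketched in a few lines. This is a minimal illustration, not the paper's operators: the fixed 3x3 square structuring element is an assumption (the abstract selects the operand according to image size), pixels outside the image are treated as 0, and all function names are hypothetical.

```python
Image = list[list[int]]


def _neighbourhood(img: Image, r: int, c: int) -> list[int]:
    """3x3 neighbourhood of (r, c); pixels outside the image count as 0."""
    rows, cols = len(img), len(img[0])
    return [
        img[rr][cc] if 0 <= rr < rows and 0 <= cc < cols else 0
        for rr in range(r - 1, r + 2)
        for cc in range(c - 1, c + 2)
    ]


def erode(img: Image) -> Image:
    """A pixel survives only if its whole 3x3 neighbourhood is set."""
    return [
        [int(all(_neighbourhood(img, r, c))) for c in range(len(img[0]))]
        for r in range(len(img))
    ]


def dilate(img: Image) -> Image:
    """A pixel is set if any pixel in its 3x3 neighbourhood is set."""
    return [
        [int(any(_neighbourhood(img, r, c))) for c in range(len(img[0]))]
        for r in range(len(img))
    ]


def remove_specks(img: Image) -> Image:
    """Erosion followed by dilation (morphological opening)."""
    return dilate(erode(img))
```

Erosion deletes any foreground region smaller than the structuring element, and the following dilation restores the surviving regions to roughly their original extent.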

Design of Partial Product Accumulator using Multi-Operand Decimal CSA and Improved Decimal CLA (다중 피연산자 십진 CSA와 개선된 십진 CLA를 이용한 부분곱 누산기 설계)

Lee, Yang;Park, TaeShin;Kim, Kanghee;Choi, SangBang
- Journal of the Institute of Electronics and Information Engineers
- v.53 no.11
- pp.56-65
- 2016
In this paper, in order to reduce the delay and area of the partial product accumulation (PPA) of a parallel decimal multiplier, a tree architecture composed of multi-operand decimal CSAs and an improved CLA is proposed. The proposed tree using multi-operand CSAs reduces the partial products quickly. Since the input range of the recoder of the CSA is limited, the CSA can use the simplest logic. In addition, using multi-operand decimal CSAs to add decimal numbers that have a limited range at specific locations of the architecture reduces the partial products efficiently. The final BCD result is also obtained faster by improving the logic of the decimal CLA. To evaluate the performance of the proposed partial product accumulation, synthesis is performed using Design Compiler with a 180 nm CMOS technology library. Synthesis results show that the delay of the proposed partial product accumulation is reduced by 15.6% and the area is reduced by 16.2% compared with the conventional method. The total delay and area are still reduced even though the delay and area of the CLA increase.
https://doi.org/10.5573/ieie.2016.53.11.056 Cite PDF KSCI

The Security Analysis of Previous CRT-RSA Scheme on Modified Opcode and Operand Attack (연산자 조작 공격과 피연산자 조작 공격에 대한 기존 CRT-RSA Scheme의 안전성 분석)

Hur, Soon-Haeng;Lee, Hyung-Sub;Rhee, Hyun-Seung;Choi, Dong-Hyun;Won, Dong-Ho;Kim, Seung-Joo
- Journal of the Korea Institute of Information Security & Cryptology
- v.19 no.6
- pp.185-190
- 2009
As RSA based on the Chinese remainder theorem (CRT-RSA) comes into general use, the security of CRT-RSA has become important. Since Bellcore researchers introduced fault attacks on CRT-RSA, various countermeasures have been proposed. In 1999, Shamir first proposed a countermeasure using a checking procedure. After Shamir's countermeasure was introduced, various countermeasures based on checking procedures were proposed. However, Shamir's countermeasure was shown to be vulnerable to the modified operand attack by Joye et al. in 2001, and the checking procedure was shown to be vulnerable to the modified opcode attack by Yen et al. in 2003. Yen et al. proposed a new countermeasure without a checking procedure, but their countermeasure was also shown to be vulnerable to the modified operand attack by Yen and Kim in 2007. In this paper, we point out that previous countermeasures were vulnerable to the modified operand attack or the modified opcode attack.
https://doi.org/10.13089/JKIISC.2009.19.6.185 Cite PDF KSCI HTML
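
The basic fault attack on CRT-RSA that motivates these countermeasures can be reproduced with toy numbers. The sketch below shows only the well-known Bellcore-style attack, not Shamir's countermeasure or the modified operand and opcode attacks analysed in the paper; the primes, the exponent, the fault model (one half-result corrupted by adding 1), and the function names are assumptions chosen for illustration.

```python
from math import gcd

P, Q = 1009, 1013            # toy primes, far too small for real use
N = P * Q
E = 17
D = pow(E, -1, (P - 1) * (Q - 1))
Q_INV = pow(Q, -1, P)


def crt_sign(m: int, fault_in_q_half: bool = False) -> int:
    """CRT-RSA signature of m; optionally corrupt the mod-Q half-result."""
    sp = pow(m, D % (P - 1), P)
    sq = pow(m, D % (Q - 1), Q)
    if fault_in_q_half:
        sq = (sq + 1) % Q    # assumed fault model: a corrupted half-result
    # Garner recombination: the result equals sp mod P and sq mod Q.
    h = (Q_INV * (sp - sq)) % P
    return sq + Q * h


def recover_factor(m: int, faulty_sig: int) -> int:
    """A faulty signature is still correct mod P, so the gcd exposes P."""
    return gcd((pow(faulty_sig, E, N) - m) % N, N)
```

A single faulty signature is enough to factor the modulus in this model, which is why a device is expected to check or otherwise protect the result before releasing it.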

The CORDIC Circuit of Redundant Signed Binary Number (Redundant Signed Binary Number에 의한 CORDIC 회로)

김승열;김용대;한선경;유영갑
- Journal of the Institute of Electronics Engineers of Korea CI
- v.40 no.6
- pp.1-8
- 2003
A novel CORDIC circuit is presented, based on a redundant number system that eliminates global carry propagation. The number format employs a new recoding scheme, similar to Booth recoding, that resolves carry problems in addition. A pipelined architecture is introduced that has a constant scale factor in its computation of trigonometric functions. The operational time of the circuit is constant, independent of the number of operand digits.
PDF KSCI
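
The role of the constant scale factor is easiest to see in a conventional rotation-mode CORDIC. The sketch below uses ordinary floating-point arithmetic, not the redundant signed binary representation of the paper; the iteration count and the function name are assumptions.

```python
import math

ITERATIONS = 32  # assumed iteration count
ANGLES = [math.atan(2.0 ** -i) for i in range(ITERATIONS)]
# Every iteration rotates by +atan(2^-i) or -atan(2^-i), never by zero,
# so the accumulated gain is the same constant for every input angle.
GAIN = math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(ITERATIONS))


def cordic_sin_cos(theta: float) -> tuple[float, float]:
    """Rotation-mode CORDIC for |theta| <= pi/2; returns (sin, cos)."""
    x, y, z = 1.0 / GAIN, 0.0, theta  # pre-scale so the result has unit length
    for i in range(ITERATIONS):
        sigma = 1.0 if z >= 0 else -1.0
        x, y = x - sigma * y * 2.0 ** -i, y + sigma * x * 2.0 ** -i
        z -= sigma * ANGLES[i]
    return y, x


s, c = cordic_sin_cos(0.5)
print(abs(s - math.sin(0.5)) < 1e-6, abs(c - math.cos(0.5)) < 1e-6)
```

If an iteration were allowed to skip its rotation, the gain would depend on the input angle and could no longer be compensated by a single precomputed constant.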

DSP Performance Maximization with Multisample Technique

Lee, Hosun;Lawrence K.W. Law;Youngyearl Han
- Proceedings of the IEEK Conference
- 2000.09a
- pp.471-474
- 2000
In this paper, we present a multisample DSP coding technique for the StarCore SC140 DSP. Multisample programming is a pipelining technique that exploits the reuse of operands, both coefficients and variables, within a kernel. A coefficient or operand is loaded once from memory, and the value may then be used by multiple ALUs. It is possible to evaluate one intermediate product from each of four output sample calculations in parallel. Parallelization is therefore achieved by processing multiple samples in parallel rather than multiple intermediate products belonging to only one sample. Decreasing the number of memory moves per sample increases algorithm performance. In this paper, the multisample technique has been implemented in an FIR filter calculation using the Motorola StarCore DSP development tool.
PDF
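
The operand reuse described in the abstract above can be mimicked in plain Python by computing four FIR outputs together. This is a sketch under stated assumptions, not StarCore code: the load-counting model (one count per coefficient or sample fetched from the input arrays), the block size of 4, and the function names are hypothetical.

```python
def fir_single(x: list[int], h: list[int]) -> tuple[list[int], int]:
    """One output at a time: every multiply-accumulate loads two operands."""
    taps = len(h)
    out, loads = [], 0
    for n in range(taps - 1, len(x)):
        acc = 0
        for k in range(taps):
            acc += h[k] * x[n - k]
            loads += 2            # one coefficient and one sample
        out.append(acc)
    return out, loads


def fir_multisample(x: list[int], h: list[int], block: int = 4) -> tuple[list[int], int]:
    """Compute `block` outputs together so each loaded value feeds several MACs."""
    taps = len(h)
    first = taps - 1
    count = len(x) - first
    assert count % block == 0, "sketch handles whole blocks only"
    out, loads = [0] * count, 0
    for base in range(first, len(x), block):
        acc = [0] * block
        window = [x[base + s] for s in range(block)]  # samples held in registers
        loads += block
        for k in range(taps):
            hk = h[k]             # loaded once, used by all `block` accumulators
            loads += 1
            for s in range(block):
                acc[s] += hk * window[s]
            if k + 1 < taps:
                # Slide the window: load one older sample, reuse the rest.
                window = [x[base - k - 1]] + window[:-1]
                loads += 1
        out[base - first:base - first + block] = acc
    return out, loads
```

With 8 taps the model counts 16 loads per output for the single-sample loop and 19 loads per block of four outputs for the multisample loop, while both produce identical results.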

