• Title/Summary/Keyword: 64bit


On the design of 64bit CLSA adder using the optimized algorithm (최적 알고리즘을 이용한 64비트 CLSA 가산기 설계)

  • 이영훈;김상수
    • Journal of the Korea Society of Computer and Information / v.4 no.3 / pp.47-52 / 1999
  • The efficiency of an adder, which plays an important role in microprocessors and DSPs, depends greatly on its carry generation method. This paper therefore uses both the CLA, which excels in speed, and the CSA, which is best in chip size, and designs a high-speed 64-bit adder as an optimized combination of the two. It proposes the CLSA as a way to improve both speed and chip size, and demonstrates the merit of the designed circuit.
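
The abstract does not describe the circuit itself, so the following is only a bit-level sketch of one way to combine the two ideas. It assumes that CSA denotes a carry-select adder (the abstract does not expand the acronym), that the 64-bit word is split into 8-bit blocks, and that each block is evaluated with generate/propagate (lookahead) terms for both possible carry-ins. The names `cla_block` and `clsa_add64` and the block width are illustrative choices, not taken from the paper.

```python
MASK64 = (1 << 64) - 1


def cla_block(a, b, carry_in, width):
    """Add two width-bit values using generate/propagate terms.

    Hardware flattens the carry recurrence into parallel logic;
    this software model simply evaluates it bit by bit.
    """
    carry = carry_in
    total = 0
    for i in range(width):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        g = ai & bi                 # generate: this bit produces a carry
        p = ai ^ bi                 # propagate: this bit passes a carry on
        total |= (p ^ carry) << i   # sum bit
        carry = g | (p & carry)     # c[i+1] = g[i] OR (p[i] AND c[i])
    return total, carry


def clsa_add64(a, b, block_width=8):
    """64-bit add: lookahead inside each block, carry-select between blocks.

    block_width is assumed to divide 64 evenly.
    """
    a &= MASK64
    b &= MASK64
    block_mask = (1 << block_width) - 1
    result = 0
    carry = 0
    for shift in range(0, 64, block_width):
        xa = (a >> shift) & block_mask
        xb = (b >> shift) & block_mask
        # Both candidates are computed without waiting for the real carry-in.
        sum0, cout0 = cla_block(xa, xb, 0, block_width)
        sum1, cout1 = cla_block(xa, xb, 1, block_width)
        # The incoming carry only selects between the two candidates.
        block_sum, carry = (sum1, cout1) if carry else (sum0, cout0)
        result |= block_sum << shift
    return result, carry
```

For example, `clsa_add64(MASK64, 1)` returns `(0, 1)`: the sum wraps to zero and the carry-out is set. In this model the selection step is what shortens the critical path between blocks, at the cost of computing each block twice.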

Implementation of 2,048-bit RSA Based on RNS(Residue Number Systems) (RNS(Residue Number Systems) 기반의 2,048 비트 RSA 설계)

  • 권택원;최준림
    • Journal of the Institute of Electronics Engineers of Korea SD / v.41 no.4 / pp.57-66 / 2004
  • This paper proposes the design of a 2,048-bit RSA based on an RNS (residue number system) Montgomery modular multiplier. Because an RNS performs fast parallel modular multiplication on a large word partitioned into small words, we introduce the Montgomery reduction method (MRM) [1] based on a Wallace tree modular multiplier and 33 RNS bases of 64-bit size for RNS Montgomery modular multiplication. In addition, for fast RNS modular multiplication, a modified method based on the Chinese remainder theorem (CRT) [2] is presented. We have verified the 2,048-bit RSA based on RNS using Samsung 0.35 µm technology, and a 2,048-bit RSA operation is performed in 2.54 ms at 100 MHz.
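
To illustrate why an RNS allows a large multiplication to be split into small parallel ones, here is a toy sketch. It is not the paper's design: it uses four small pairwise coprime moduli instead of 33 bases of 64 bits, it shows plain channel-wise multiplication rather than RNS Montgomery multiplication, and it reconstructs the result with the textbook CRT rather than the modified method the abstract mentions. The names `BASES`, `to_rns`, `rns_mul`, and `from_rns` are hypothetical.

```python
from math import prod

# Toy pairwise coprime bases (assumption): 251 is prime,
# 253 = 11 * 23, 255 = 3 * 5 * 17, 256 = 2**8.
BASES = (251, 253, 255, 256)


def to_rns(x):
    """Represent x by its residues modulo each base."""
    return tuple(x % m for m in BASES)


def rns_mul(xs, ys):
    """Multiply channel by channel; no carries cross between channels."""
    return tuple((x * y) % m for x, y, m in zip(xs, ys, BASES))


def from_rns(residues):
    """Reconstruct the integer with the Chinese remainder theorem."""
    big_m = prod(BASES)
    total = 0
    for r, m in zip(residues, BASES):
        m_i = big_m // m
        # pow(m_i, -1, m) is the modular inverse (Python 3.8+).
        total += r * m_i * pow(m_i, -1, m)
    return total % big_m
```

The result is exact only while the true product stays below the product of the bases, so `from_rns(rns_mul(to_rns(12345), to_rns(54321)))` recovers `12345 * 54321`. Each channel works on small words independently, which is the property that makes a parallel hardware implementation attractive.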

Hardware Design of the Synchronizer and the Demodulator of a 18000-3 PJM Mode Tag (18000-3 PJM 모드 태그의 동기부 및 복조부 하드웨어 설계)

  • Jeon, Don-Guk;Yang, Hoon-Gee
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.10 no.2 / pp.77-83 / 2011
  • In this paper, we present the design procedure for the synchronizer and the demodulator of a 13.56 MHz RFID PJM tag, which is standardized in ISO 18000-3 mode 3. We optimize the algorithms to minimize the number of registers and implement them in accordance with the international standard. The designed module is simulated using ModelSim and an FPGA. The synchronizer is composed of 3 correlators implemented with 1,024 (16 bits × 64 cycles) registers. The demodulator is composed of 2 correlators implemented with 128 (2 bits × 64 cycles) registers. A simulation of the demodulator integrated with the synchronizer shows a success rate of about 87% with test data at an SNR of -2 dB and 100% at an SNR of 4 dB.

A module generator for variable-precision multiplier core with error compensation for low-power DSP applications (저전력 DSP 응용을 위한 오차보상을 갖는 가변 정밀도 승산기 코어 생성기)

  • Hwang, Seok-Ki;Lee, Jin-Woo;Shin, Kyung-Wook
    • The Journal of Korean Institute of Communications and Information Sciences / v.30 no.2A / pp.129-136 / 2005
  • A multiplier generator, VPM_Gen (Variable-Precision Multiplier Generator), which generates Verilog-HDL models of multiplier cores with a user-defined bit-width specification, is described. The operand bit-widths are parameterized in the range of 8 to 32 bits in 1-bit steps, and the product from the multiplier core can be truncated in the range of 8 to 64 bits in 2-bit steps, so that VPM_Gen can generate 3,455 multiplier cores. When the multiplier output is truncated, eliminating the circuits that correspond to the truncated part reduces the gate count and power dissipation by about 40% and 30%, respectively, compared with a full-precision multiplier. As a result, an area-efficient and low-power multiplier core can be obtained. To minimize truncation error, an adaptive error-compensation method that considers the number of truncated bits is employed. The multiplier cores generated by VPM_Gen have been verified using a Xilinx FPGA board and a logic analyzer.

An implementation of block cipher algorithm HIGHT for mobile applications (모바일용 블록암호 알고리듬 HIGHT의 하드웨어 구현)

  • Park, Hae-Won;Shin, Kyung-Wook
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2011.05a / pp.125-128 / 2011
  • This paper describes an efficient hardware implementation of the HIGHT block cipher algorithm, which was approved as a cryptographic algorithm standard by KATS (Korean Agency for Technology and Standards) and ISO/IEC. The HIGHT algorithm, which is suitable for ubiquitous computing devices such as a sensor in a USN or an RFID tag, encrypts a 64-bit data block with a 128-bit cipher key to produce a 64-bit ciphertext, and vice versa. For an area-efficient and low-power implementation, we optimize the round transform block and key scheduler to share hardware resources between encryption and decryption. The HIGHT64 core, synthesized using a 0.35-µm CMOS cell library, consists of 3,226 gates, and the estimated throughput is 150 Mbps with an 80 MHz clock at 2.5 V.


The Design of A Fast Two's Complement Adder with Redundant Binary Arithmetic (RB 연산을 이용한 고속 2의 보수 덧셈기의 설계)

  • Lee, Tae-Uk;Jo, Sang-Bok
    • Journal of the Institute of Electronics Engineers of Korea SD / v.37 no.5 / pp.55-65 / 2000
  • In this paper, a new architecture for a 24-bit two's complement adder is designed using RB (Redundant Binary) arithmetic, which has the advantage of being CPF (Carry-Propagation-Free). An MPPL (Modified PPL) XOR/XNOR gate is applied to improve the speed of the TC2RB (Two's Complement to RB SUM converter) and to reduce the number of transistors, and we propose two types of adder that use a fast RB2TC (RB SUM to Two's Complement converter). The two types have the following properties. The TYPE 1 adder achieves its speed improvement through the VGS (Variable Group Select) method, and the TYPE 2 adder through a 64-bit GCG (Group Change bit Generator) circuit and an 8-bit TYPE 1 adder. For 64 bits, the TYPE 1 adder is expected to improve speed by 23.5% and 25.7% compared with the CLA and CSA, respectively, and the TYPE 2 adder by 41.2% and 45.9%, respectively. The propagation delay of the designed 24-bit TYPE 1 adder is 1.4 ns and that of the TYPE 2 adder is 1.2 ns. The implementation is highly regular with repeated modules and is well suited to microprocessor systems and fast DSP units.


Hardware Implementation of the 3GPP KASUMI crypto algorithm

  • Kim, Ho-Won;Park, Yong-Je;Kim, Moo-Seop;Ryu, Hui-Su
    • Proceedings of the IEEK Conference / 2002.07a / pp.317-320 / 2002
  • In this paper, we present the design and implementation of the KASUMI crypto algorithm and the confidentiality algorithm (f8) on a hardware chip for the 3GPP system. The f8 algorithm is based on KASUMI, a block cipher that produces a 64-bit output from a 64-bit input under the control of a 128-bit key. Various architectures of KASUMI (a low hardware complexity version and a high performance version) are built with a Xilinx FPGA, and characteristics such as their hardware complexity and performance are analyzed.


JPEG-based Still Image Codec Architecture for Display Systems at FHD@240Hz (FHD@240Hz 디스플레이 시스템을 위한 JPEG 기반 정지영상 코덱의 구조)

  • Park, Hyun Sang
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2011.11a / pp.117-120 / 2011
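  • In LCD-based flat-panel FHD display systems with frame rates of 240 Hz or higher, the high frame rate pushes the effective data rate that must be delivered to the display panel up to 1.9 GB/s, and once horizontal and vertical synchronization is taken into account, a data transfer bandwidth of more than 2 GB/s is required. Providing this bandwidth with DRAM requires multiple memory devices, which raises cost and power consumption. To solve these problems, this paper proposes a JPEG-based still image compression system. The proposed system has a structure in which eight decoders operate simultaneously, and it uses a 64-bit marker aligned to 128-bit data so that eight data streams can easily be distinguished within a single data stream. Because the proposed 64-bit marker is designed not to cause marker emulation, it lowers the implementation complexity of the encoder and decoder and makes it very easy to separate a single data stream into eight data streams.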


Branch Predictor Design and Its Performance Evaluation for A High Performance Embedded Microprocessor (고성능 내장형 마이크로프로세서를 위한 분기예측기의 설계 및 성능평가)

  • Lee, Sang-Hyuk;Kim, Il-Kwan;Choi, Lynn
    • Proceedings of the IEEK Conference / 2002.06b / pp.129-132 / 2002
  • AE64000 is a 64-bit high-performance microprocessor that ADC Co. Ltd. is developing for embedded environments. It has a 5-stage pipeline and uses a Harvard architecture with separate instruction and data caches. It also provides SIMD-like DSP and FP operations by enabling 8/16/32/64-bit MAC operations on 64-bit registers. The AE64000 processor implements the EISC ISA and uses an instruction folding mechanism (Instruction Folding Unit) that deals effectively with the LERI instruction in the EISC ISA. However, this unit makes branch prediction difficult. In this paper, we design a branch predictor optimized for the AE64000 pipeline and develop an AE64000 simulator with cycle-level precision to validate the performance of the designed branch predictor. We separate the TAC (target address cache) from the BPT (branch prediction table) for effective branch prediction and use a BPT(removed indexed) that has no address tags.


On a Reduction of Computation Time of FFT Cepstrum (FFT 켑스트럼의 처리시간 단축에 관한 연구)

  • Jo, Wang-Rae;Kim, Jong-Kuk;Bae, Myung-Jin
    • Speech Sciences / v.10 no.2 / pp.57-64 / 2003
  • Cepstrum coefficients are the most popular feature for speech recognition and speaker recognition. They are also used for speech synthesis and speech coding, but have the major drawback of a long processing time. In this paper, we propose a new method that reduces the processing time of FFT cepstrum analysis. We use normal-ordered inputs for the FFT function and bit-reversed inputs for the IFFT function. We can therefore omit the bit-reversing process and reduce the processing time of FFT cepstrum analysis.
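
The abstract does not say which FFT structures are used, so the following is a sketch of one standard pairing that has the stated property. A decimation-in-frequency (DIF) forward FFT takes natural-order input and leaves its output in bit-reversed order; a decimation-in-time (DIT) inverse FFT accepts bit-reversed input and produces natural-order output. Because the log-magnitude step is elementwise, no reordering pass is needed between them. The function names and the small constant added before the logarithm are assumptions for illustration, and the input length is assumed to be a power of two.

```python
import cmath
import math


def fft_dif(x):
    """Radix-2 DIF FFT: natural-order input, bit-reversed-order output."""
    a = [complex(v) for v in x]
    n = len(a)
    span = n // 2
    while span >= 1:
        for start in range(0, n, 2 * span):
            for k in range(span):
                w = cmath.exp(-2j * cmath.pi * k / (2 * span))
                u, v = a[start + k], a[start + k + span]
                a[start + k] = u + v
                a[start + k + span] = (u - v) * w
        span //= 2
    return a  # deliberately left in bit-reversed order


def ifft_dit(x_bitrev):
    """Radix-2 DIT inverse FFT: bit-reversed input, natural-order output."""
    a = [complex(v) for v in x_bitrev]
    n = len(a)
    span = 1
    while span < n:
        for start in range(0, n, 2 * span):
            for k in range(span):
                w = cmath.exp(2j * cmath.pi * k / (2 * span))
                u, v = a[start + k], a[start + k + span] * w
                a[start + k] = u + v
                a[start + k + span] = u - v
        span *= 2
    return [v / n for v in a]


def real_cepstrum(x, eps=1e-12):
    """Real cepstrum with no bit-reversal pass between FFT and IFFT."""
    spectrum = fft_dif(x)
    # Elementwise, so the bit-reversed ordering is preserved as-is.
    # eps (assumption) guards against log(0) for empty bins.
    log_mag = [math.log(abs(v) + eps) for v in spectrum]
    return [v.real for v in ifft_dit(log_mag)]
```

Calling `real_cepstrum` on a frame of 2^k samples returns the cepstral coefficients in natural order. The saving comes from dropping the two permutation passes that a conventional FFT/IFFT pair would perform.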
