• Title/Summary/Keyword: 5-stage pipeline

Search Result 73, Processing Time 0.028 seconds

Branch Predictor Design and Its Performance Evaluation for A High Performance Embedded Microprocessor (고성능 내장형 마이크로프로세서를 위한 분기예측기의 설계 및 성능평가)

  • Lee, Sang-Hyuk;Kim, Il-Kwan;Choi, Lynn
    • Proceedings of the IEEK Conference
    • /
    • 2002.06b
    • /
    • pp.129-132
    • /
    • 2002
  • AE64000 is the 64-bit high-performance microprocessor that ADC Co. Ltd. is developing for an embedded environment. It has a 5-stage pipeline and uses Havard architecture with a separated instruction and data caches. It also provides SIMD-like DSP and FP operation by enabling the 8/16/32/64-bit MAC operation on 64-bit registers. AE64000 processor implements the EISC ISA and uses the instruction folding mechanism (Instruction Folding Unit) that effectively deals with LERI instruction in EISC ISA. But this unit makes branch prediction behavior difficult. In this paper, we designs a branch predictor optimized for AE64000 Pipeline and develops a AES4000 simulator that has cycle-level precision to validate the performance of the designed branch predictor. We makes TAC(Target address cache) and BPT(branch prediction table) seperated for effective branch prediction and uses the BPT(removed indexed) that has no address tags.

  • PDF

Optimal Clock Period Selection Algorithm for Low Power Register Transfer Level Design (저전력 레지스티 전송 단계 설계를 위한 최적 클럭 주기 선택 알고리듬)

  • 최지영;김희석
    • Journal of the Korea Society of Computer and Information
    • /
    • v.8 no.4
    • /
    • pp.111-116
    • /
    • 2003
  • We proposed a optimal clock period selection algorithm for low power Register Transfer Level design. The proposed algorithm use the way of maintaining the throughput by reducing supply voltage after improve the system performance in order to minimize the power consumption. In this paper, it select the low power to use pipeline in the transformation of architecture. Also, the proposed algorithm is important the clock period selection in order to maximize the resource sharing. however, it execute the optimal clock period selection algorithm. The experiment result is to set the same result AR and HAL filter on the high level benchmark and to reduce in the case of two pipe stage 10.5% and three pipe stage as many as 33.4%.

  • PDF

A Design and Implementation of 32-bit Five-Stage RISC-V Processor Using FPGA (FPGA를 이용한 32-bit RISC-V 5단계 파이프라인 프로세서 설계 및 구현)

  • Jo, Sangun;Lee, Jonghwan;Kim, Yongwoo
    • Journal of the Semiconductor & Display Technology
    • /
    • v.21 no.4
    • /
    • pp.27-32
    • /
    • 2022
  • RISC-V is an open instruction set architecture (ISA) developed in 2010 at UC Berkeley, and active research is being conducted as a processor to compete with ARM. In this paper, we propose an SoC system including an RV32I ISA-based 32-bit 5-stage pipeline processor and AHB bus master. The proposed RISC-V processor supports 37 instructions, excluding FENCE, ECALL, and EBREAK instructions, out of a total of 40 instructions based on RV32I ISA. In addition, the RISC-V processor can be connected to peripheral devices such as BRAM, UART, and TIMER using the AHB-lite bus protocol through the proposed AHB bus master. The proposed SoC system was implemented in Arty A7-35T FPGA with 1,959 LUTs and 1,982 flip-flops. Furthermore, the proposed hardware has a maximum operating frequency of 50 MHz. In the Dhrystone benchmark, the proposed processor performance was confirmed to be 0.48 DMIPS.

간단한 데이터 스케줄링 기법을 이용한 2차원 DWT 처리기 설계

  • Kim, Gi-Yeong;Sin, Ho-Cheol;Lee, Sang-Beom;Kim, Yeong-Seop
    • Proceedings of the Korean Society Of Semiconductor Equipment Technology
    • /
    • 2006.10a
    • /
    • pp.174-177
    • /
    • 2006
  • 본 논문에서는 리프팅 기반의 DWT 구조의 단점을 개선하고자 플립핑(Flipping)기법과 5 단 파이프라인 구조(5 Stage Pipeline)를 적용하여 임계경로가 획기적으로 줄어든 1 차원 DWT 구조를 제안하고 이를 활용하여 JPEG2000 표준의 손실 압축 모드에서 이용되는 9/7 필터계수의 2 차원 DWT 를 수행 할 수 있도록 열 방향 DWT 처리기를 설계하였다. 2 차원 DWT 는 1 차원 DWT 의 처리 결과에 대해 열상의 열(Column) 방향으로 2 차원 처리를 수행해야 하므로 1 차원 결과를 저장하기 위한 한 영상 사이즈 만큼의 메모리 버퍼를 필요로 한다. 기존 $N^2$이 필요하던 메모리 사이즈를 14N으로 줄인 2차원 구조를 제안한다.

  • PDF

A Study on the 32 bit RISC/DSP Microprocessor Appropriate for Embedded Systems (내장형 시스템에 적합한 32 비트 RISC/DSP 마이크로프로세서에 관한 연구)

  • 유동열;문병인;홍종욱;이태영;이용석
    • Proceedings of the IEEK Conference
    • /
    • 1999.06a
    • /
    • pp.257-260
    • /
    • 1999
  • We have designed a 32-bit RISC microprocessor with 16/32-bit fixed-point DSP functionality. This processor, called YRD-5, combines both general-purpose microprocessor and digital signal processor (DSP) functionality using the reduced instruction set computer (RISC) design principles. It has functional units for arithmetic operation, digital signal processing (DSP) and memory access. They operate in parallel in order to remove stall cycles after DSP and load/store instructions with one or more issue latency cycles. High performance was achieved with these parallel functional units while adopting a sophisticated 5-stage pipeline structure and an improved DSP unit.

  • PDF

Redesign of Stream Cipher Salsa20/8 (스트림 암호 Salsa20/8의 재설계)

  • Kim, Gil-Ho;Kim, Sung-Gi;Cho, Gyeong-Yeon
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.18 no.8
    • /
    • pp.1904-1913
    • /
    • 2014
  • Was develop 256bit output stream cipher of improving for same key reuse prohibition and integrity. The developed stream cipher used Salsa20 round function was implemented to hardware of applying a 5-stage pipeline architecture, such as WSN and DMB for real-time processing can satisfy the speed and security requirements.

A study on the low power architecture of multi-giga bit synchronous DRAM's (Giga Bit급 저전력 synchronous DRAM 구조에 대한 연구)

  • 유회준;이정우
    • Journal of the Korean Institute of Telematics and Electronics C
    • /
    • v.34C no.11
    • /
    • pp.1-11
    • /
    • 1997
  • The transient current components of the dRAM are analyzed and the sensing current, data path operation current and DC leakage current are revealed to be the major curretn components. It is expected that the supply voltage of less than 1.5V with low VT MOS witll be used in multi-giga bit dRAM. A low voltage dual VT self-timed CMOS logic in which the subthreshold leakage current path is blocked by a large high-VT MOS is proposed. An active signal at each node of the nature speeds up the signal propagation and enables the synchronous DRAM to adopt a fast pipelining scheme. The sensing current can be reduced by adopting 8 bit prefetch scheme with 1.2V VDD. Although the total cycle time for the sequential 8 bit read is the same as that of the 3.3V conventional DRAM, the sensing current is loered to 0.7mA or less than 2.3% of the current of 3.3V conventional DRAM. 4 stage pipeline scheme is used to rduce the power consumption in the 4 giga bit DRAM data path of which length and RC delay amount to 3 cm and 23.3ns, respectively. A simple wave pipeline scheme is used in the data path where 4 sequential data pulses of 5 ns width are concurrently transferred. With the reduction of the supply voltage from 3.3V to 1.2V, the operation current is lowered from 22mA to 2.5mA while the operation speed is enhanced more than 4 times with 6 ns cycle time.

  • PDF

A Design of Pipelined Adaptive Decision-Feedback Equalized using Delayed LMS and Redundant Binary Complex Filter Structure (Delayed LMS와 Redundant Binary 복소수 필터구조를 이용한 파이프라인 적응 결정귀환 등화기 설계)

  • An, Byung-Gyu;Lee, Jong-Nam;Shin, Kyung-Wook
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.37 no.12
    • /
    • pp.60-69
    • /
    • 2000
  • This paper describes a single-chip full-custom implementation of pipelined adaptive decision-feedback equalizer(PADFE) using a 0.25-${\mu}m$ CMOS technology for wide-band wireless digital communication systems. To enhance the throughput rate of ADFE, two pipeline stages are inserted into the critical path of the ADFE by using delayed least-mean-square(DLMS) algorithm. Redundant binary (RB) arithmetic is applied to all the data processing of the PADFE including filter taps and coefficient update blocks. When compared with conventional methods based on two's complement arithmetic, the proposed approach reduces arithmetic complexity, as well as results in a very simple complex-valued filter structure, thus suitable for VLSI implementation. The design parameters including pipeline stage, filter tap, coefficient and internal bit-width, and equalization performance such as bit error rate (BER) and convergence speed are analyzed by algorithm-level simulation using COSSAP. The single-chip PADFE contains about 205,000 transistors on an area of about $1.96\times1.35-mm^2$. Simulation results show that it can safely operate with 200-MHz clock frequency at 2.5-V supply, and its estimated power dissipation is about 890-mW. Test results show that the fabricated chip works functionally well.

  • PDF

Design and Verification of Pipelined Face Detection Hardware (파이프라인 구조의 얼굴 검출 하드웨어 설계 및 검증)

  • Kim, Shin-Ho;Jeong, Yong-Jin
    • Journal of Korea Multimedia Society
    • /
    • v.15 no.10
    • /
    • pp.1247-1256
    • /
    • 2012
  • There are many filter based image processing algorithms and they usually require a huge amount of computations and memory accesses making it hard to attain a real-time performance, expecially in embedded applications. In this paper, we propose a pipelined hardware structure of the filter based face detection algorithm to show that the real time performance can be achieved by hardware design. In our design, the whole computation is divided into three pipeline stages: resizing the image (Resize), Transforming the image (ICT), and finding candidate area (Find Candidate). Each stage is optimized by considering the parallelism of the computation to reduce the number of cycles and utilizing the line memory to minimize the memory accesses. The resulting hardware uses 507 KB internal SRAM and occupies 9,039 LUTs when synthesized and configured on Xilinx Virtex5LX330 FPGA. It can operate at maximum 165MHz clock, giving the performance of 108 frame/sec, while detecting up to 20 faces.

A Calibration-Free 14b 70MS/s 0.13um CMOS Pipeline A/D Converter with High-Matching 3-D Symmetric Capacitors (높은 정확도의 3차원 대칭 커패시터를 가진 보정기법을 사용하지 않는 14비트 70MS/s 0.13um CMOS 파이프라인 A/D 변환기)

  • Moon, Kyoung-Jun;Lee, Kyung-Hoon;Lee, Seung-Hoon
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.43 no.12 s.354
    • /
    • pp.55-64
    • /
    • 2006
  • This work proposes a calibration-free 14b 70MS/s 0.13um CMOS ADC for high-performance integrated systems such as WLAN and high-definition video systems simultaneously requiring high resolution, low power, and small size at high speed. The proposed ADC employs signal insensitive 3-D fully symmetric layout techniques in two MDACs for high matching accuracy without any calibration. A three-stage pipeline architecture minimizes power consumption and chip area at the target resolution and sampling rate. The input SHA with a controlled trans-conductance ratio of two amplifier stages simultaneously achieves high gain and high phase margin with gate-bootstrapped sampling switches for 14b input accuracy at the Nyquist frequency. A back-end sub-ranging flash ADC with open-loop offset cancellation and interpolation achieves 6b accuracy at 70MS/s. Low-noise current and voltage references are employed on chip with optional off-chip reference voltages. The prototype ADC implemented in a 0.13um CMOS is based on a 0.35um minimum channel length for 2.5V applications. The measured DNL and INL are within 0.65LSB and l.80LSB, respectively. The prototype ADC shows maximum SNDR and SFDR of 66dB and 81dB and a power consumption of 235mW at 70MS/s. The active die area is $3.3mm^2$.