• Title/Summary/Keyword: FPGA Implementation

Search Result 961, Processing Time 0.027 seconds

Implementation of a Grant Processor for Upstream Cell Transmission at the ONU in the ATM-PON (ATM-PON의 ONU에서 상향 셀 전송을 위한 승인처리기의 구현)

  • 우만식;정해;유건일
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.27 no.5C
    • /
    • pp.454-464
    • /
    • 2002
  • In the ATM-PON (Asynchronous Transfer Mode-Passive Optical Network), the downstream cell transmitted by an OLT is broadcast to all ONUs. The ONU receives selectively its own cells by VP filtering. On the other hand, the upstream cell can be transmitted by ONU in the case of receiving a grant from the OLT. After providing the grant to an ONU, the OLT expects the arrival of a cell after an elapse of the equalized round trip delay. ITU-T G.983.1 recommends that one bit error is allowed between the expected arrival time and the actual arrival time at the OLT. Because the ONU processes the different delay to each type of grant (ranging, user cell, and mimi-slot grant), it is not simple to design the transmission part of ONU. In this paper, we implement a grant processor which provides the delay accurately in the ONU TC chip with the FPGA. For the given equalized delay, it deals with the delay for the cell, the byte, and the bit unit by using the shift register, the byte counter, and the D flip-flop, respectively. We verify the operation of the grant processor by the time simulation and the measurement of the optical board output.

VLSI Design of Reed-Solomon Decoder over GF($2^8$) with Extreme Use of Resource Sharing (하드웨어 공유 극대화에 의한 GF($2^8$) Reed-Solomon Decoder의 VLSI설계)

  • 이주태;이승우;조중휘
    • Journal of the Korean Institute of Telematics and Electronics C
    • /
    • v.36C no.3
    • /
    • pp.8-16
    • /
    • 1999
  • This paper describes a VLSI design of Reed-Solomon(RS) decoder using the modified Euclid algorithm, with the main theme focused on the $\textit{GF}(2^8)$. To get area-efficient design, a number of new architectures have been devised with maximal register and Euclidean ALU unit sharing. One ALU is shared to replace 18 ALUs which computes an error locator polynomial and an error evaluation polynomial. Also, 18 registers are shared to replace 24 registers which stores coefficients of those polynomials. The validity and efficiency of the proposed architecture have been verified by simulation and by FLEX$^TM$ FPGA implementation in hardware description language VHDL. The proposed Reed-Solomon decoder, which has the capability of decoding RS(208,192,17) and RS(182,172,11) for Digital Versatile Disc(DVD), has been designed by using O.6$\mu\textrm{m}$ CMOS TLM Compass$^TM$ technology library, which contains totally 17k gates with a core area of 2.299$\times$2.284 (5.25$\textrm{mm}^2$). The chip can run at 20MHz while the DVD requirement is 3.74MHz.

  • PDF

Design of Modified JTAG for Debuggers of RISC Processors (RISC 프로세서의 디버거를 위한 변형된 JTAG 설계)

  • Xu, Jingzhe;Park, Hyung-Bae;Jung, Seung-Pyo;Park, Ju-Sung
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.48 no.7
    • /
    • pp.65-75
    • /
    • 2011
  • As the technology of SoC design has been developed, the debugging is more and more important and users want a fast and reliable debugger. This paper deals with an implementation of the fast debugger which can reduce a debugging processing cycle by designing a modified JTAG suitable for a new RISC processor debugger. Designed JTAG is embedded to the OCD of Core-A and works with SW debugger. We confirmed the functions and reliability of the debugger. By comparing to the original JTAG system, the debugging processing cycle of the proposed JTAG is reduced at 8.5~72.2% by each debugging function. Further more, the gate count is reduced at 31.8%.

The analysis of the detection probability of FMCW radar and implementation of signal processing part (차량용 FMCW 레이더의 탐지 성능 분석 및 신호처리부 개발)

  • Kim, Sang-Dong;Hyun, Eu-Gin;Lee, Jong-Hun;Choi, Jun-Hyeok;Park, Jung-Ho;Park, Sang-Hyun
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.14 no.12
    • /
    • pp.2628-2635
    • /
    • 2010
  • This paper analyzes the detection probability of FMCW (Frequency Modulated Continuous Wave) radar based on Doppler frequency and analog-digital converter bit and designs and implements signal processing part of FMCW radar. For performance evaluation, the FMCW radar system consists of a transmitted part and a received part and uses AWGN channel. The system model is verified through analysis and simulation. Frequency offset occurs in the received part caused by the mismatching between the received signal and the reference signal. In case of Doppler frequency less than about 38KHz, performance degradation of detection does not occur in FMCW radar with 75cm resolution The analog-digital converter needs at least 6 bit in order not to degrade the detection probability. And, we design and implement digital signal processing part based on DDS chip of digital transmitted signal generator for FMCW radar.

Implementation of Optical-based Measuring Instrument for Overhead Contact Wire in Railway (전기철도 전차선로의 광학기반 형상 검측 하드웨어 구현)

  • Park, Young;Cho, Yong-Hyeon;Park, Hyun-June;Kwon, Sam-Young
    • Proceedings of the Korean Institute of Electrical and Electronic Material Engineers Conference
    • /
    • 2008.06a
    • /
    • pp.518-518
    • /
    • 2008
  • We propose an optical-based measuring instrument of catenary system in electric railway. This system was made to utilize line scan camera as inspecting system to measure the stagger and height of overhead contact wire in railway and composed with optical type source and FPGA-based image acquisition system with PCI slot. Vision acquisition software has been used for the application to programming interface for image acquisition, display, and storage with a frequency of sampling. The proposed optical-based measuring instrument to measure the contact wire geometry shows promising on-field applications for online condition motoring. Also, this system can be applied to measure the hight and stagger or other geometry of different type of overhead catenary system.

  • PDF

Design and Implementation of a Hardware-based Transmission/Reception Accelerator for a Hybrid TCP/IP Offload Engine (하이브리드 TCP/IP Offload Engine을 위한 하드웨어 기반 송수신 가속기의 설계 및 구현)

  • Jang, Han-Kook;Chung, Sang-Hwa;Yoo, Dae-Hyun
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.34 no.9
    • /
    • pp.459-466
    • /
    • 2007
  • TCP/IP processing imposes a heavy load on the host CPU when it is processed by the host CPU on a very high-speed network. Recently the TCP/IP Offload Engine (TOE), which processes TCP/IP on a network adapter instead of the host CPU, has become an attractive solution to reduce the load in the host CPU. There have been two approaches to implement TOE. One is the software TOE in which TCP/IP is processed by an embedded processor and the other is the hardware TOE in which TCP/IP is processed by a dedicated ASIC. The software TOE has poor performance and the hardware TOE is neither flexible nor expandable enough to add new features. In this paper we designed and implemented a hybrid TOE architecture, in which TCP/IP is processed by cooperation of hardware and software, based on an FPGA that has two embedded processor cores. The hybrid TOE can have high performance by processing time-critical operations such as making and processing data packets in hardware. The software based on the embedded Linux performs operations that are not time-critical such as connection establishment, flow control and congestions, thus the hybrid TOE can have enough flexibility and expandability. To improve the performance of the hybrid TOE, we developed a hardware-based transmission/reception accelerator that processes important operations such as creating data packets. In the experiments the hybrid TOE shows the minimum latency of about $19{\mu}s$. The CPU utilization of the hybrid TOE is below 6 % and the maximum bandwidth of the hybrid TOE is about 675 Mbps.

Implementation of High-Throughput SHA-1 Hash Algorithm using Multiple Unfolding Technique (다중 언폴딩 기법을 이용한 SHA-1 해쉬 알고리즘 고속 구현)

  • Lee, Eun-Hee;Lee, Je-Hoon;Jang, Young-Jo;Cho, Kyoung-Rok
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.47 no.4
    • /
    • pp.41-49
    • /
    • 2010
  • This paper proposes a new high speed SHA-1 architecture using multiple unfolding and pre-computation techniques. We unfolds iterative hash operations to 2 continuos hash stage and reschedules computation timing. Then, the part of critical path is computed at the previous hash operation round and the rest is performed in the present round. These techniques reduce 3 additions to 2 additions on the critical path. It makes the maximum clock frequency of 118 MHz which provides throughput rate of 5.9 Gbps. The proposed architecture shows 26% higher throughput with a 32% smaller hardware size compared to other counterparts. This paper also introduces a analytical model of multiple SHA-1 architecture at the system level that maps a large input data on SHA-1 block in parallel. The model gives us the required number of SHA-1 blocks for a large multimedia data processing that it helps to make decision hardware configuration. The hs fospeed SHA-1 is useful to generate a condensed message and may strengthen the security of mobile communication and internet service.

A Model-based Methodology for Application Specific Energy Efficient Data path Design Using FPGAs (FPGA에서 에너지 효율이 높은 데이터 경로 구성을 위한 계층적 설계 방법)

  • Jang Ju-Wook;Lee Mi-Sook;Mohanty Sumit;Choi Seonil;Prasanna Viktor K.
    • The KIPS Transactions:PartA
    • /
    • v.12A no.5 s.95
    • /
    • pp.451-460
    • /
    • 2005
  • We present a methodology to design energy-efficient data paths using FPGAs. Our methodology integrates domain specific modeling, coarse-grained performance evaluation, design space exploration, and low-level simulation to understand the tradeoffs between energy, latency, and area. The domain specific modeling technique defines a high-level model by identifying various components and parameters specific to a domain that affect the system-wide energy dissipation. A domain is a family of architectures and corresponding algorithms for a given application kernel. The high-level model also consists of functions for estimating energy, latency, and area that facilitate tradeoff analysis. Design space exploration(DSE) analyzes the design space defined by the domain and selects a set of designs. Low-level simulations are used for accurate performance estimation for the designs selected by the DSE and also for final design selection We illustrate our methodology using a family of architectures and algorithms for matrix multiplication. The designs identified by our methodology demonstrate tradeoffs among energy, latency, and area. We compare our designs with a vendor specified matrix multiplication kernel to demonstrate the effectiveness of our methodology. To illustrate the effectiveness of our methodology, we used average power density(E/AT), energy/(area x latency), as themetric for comparison. For various problem sizes, designs obtained using our methodology are on average $25\%$ superior with respect to the E/AT performance metric, compared with the state-of-the-art designs by Xilinx. We also discuss the implementation of our methodology using the MILAN framework.

An Improvement of Implementation Method for Multi-Layer AHB BusMatrix (ML-AHB 버스 매트릭스 구현 방법의 개선)

  • Hwang Soo-Yun;Jhang Kyoung-Sun
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.32 no.11_12
    • /
    • pp.629-638
    • /
    • 2005
  • In the System on a Chip design, the on chip bus is one of the critical factors that decides the overall system performance. Especially, in the case or reusing the IPs such as processors, DSPs and multimedia IPs that requires higher bandwidth, the bandwidth problems of on chip bus are getting more serious. Recently ARM proposes the Multi-Layer AHB BusMatrix that is a highly efficient on chip bus to solve the bandwidth problems. The Multi-Layer AHB BusMatrix allows parallel access paths between multiple masters and slaves in a system. This is achieved by using a more complex interconnection matrix and gives the benefit of increased overall bus bandwidth, and a more flexible system architecture. However, there is one clock cycle delay for each master in existing Multi-Layer AHB BusMatrix whenever the master starts new transactions or changes the slave layers because of the Input Stage and arbitration logic realized with Moore type. In this paper, we improved the existing Multi-Layer AHB BusMatrix architecture to solve the one clock cycle delay problems and to reduce the area overhead of the Input Stage. With the elimination of the Input Stage and some restrictions on the arbitration scheme, we tan take away the one clock cycle delay and reduce the area overhead. Experimental results show that the end time of total bus transaction and the average latency time of improved Multi-Layer AHB BusMatrix are improved by $20\%\;and\;24\%$ respectively. in ease of executing a number of transactions by 4-beat incrementing burst type. Besides the total area and the clock period are reduced by $22\%\;and\;29\%$ respectively, compared with existing Multi-layer AHB BusMatrix.

Optimization of Approximate Modular Multiplier for R-LWE Cryptosystem (R-LWE 암호화를 위한 근사 모듈식 다항식 곱셈기 최적화)

  • Jae-Woo, Lee;Youngmin, Kim
    • Journal of IKEEE
    • /
    • v.26 no.4
    • /
    • pp.736-741
    • /
    • 2022
  • Lattice-based cryptography is the most practical post-quantum cryptography because it enjoys strong worst-case security, relatively efficient implementation, and simplicity. Ring learning with errors (R-LWE) is a public key encryption (PKE) method of lattice-based encryption (LBC), and the most important operation of R-LWE is the modular polynomial multiplication of rings. This paper proposes a method for optimizing modular multipliers based on approximate computing (AC) technology, targeting the medium-security parameter set of the R-LWE cryptosystem. First, as a simple way to implement complex logic, LUT is used to omit some of the approximate multiplication operations, and the 2's complement method is used to calculate the number of bits whose value is 1 when converting the value of the input data to binary. We propose a total of two methods to reduce the number of required adders by minimizing them. The proposed LUT-based modular multiplier reduced both speed and area by 9% compared to the existing R-LWE modular multiplier, and the modular multiplier using the 2's complement method reduced the area by 40% and improved the speed by 2%. appear. Finally, the area of the optimized modular multiplier with both of these methods applied was reduced by up to 43% compared to the previous one, and the speed was reduced by up to 10%.