• Title/Summary/Keyword: Verilog-A model

Search Result 65, Processing Time 0.021 seconds

A Novel Low Power Design of ALU Using Ad Hoc Techniques

  • Agarwa, Ankur;Pandya, A.S.;Lho, Young-Uhg
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.5 no.2
    • /
    • pp.102-107
    • /
    • 2005
  • This paper presents the comparison and performance analysis for CPL and CMOS based designs. We have developed the Verilog-HDL codes for the proposed designs and simulated them using ModelSim for verifying the logical correctness and the timing properties of the proposed designs. The proposed designs are then analyzed at the layout level using LASI. The layouts of the proposed designs are simulated in Winspice for timing and power characteristics. The result shows that the new circuits presented consistently consume less power than the conventional design of the same circuits. It can also be seen that these circuits have the lesser propagation delay and thus higher speed than the conventional designs.

Efficient Integrated Design of AES Crypto Engine Based on Unified Data-Path Architecture (단일 데이터패스 구조에 기반한 AES 암호화 및 복호화 엔진의 효율적인 통합설계)

  • Jeong, Chan-Bok;Moon, Yong-Ho
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.7 no.3
    • /
    • pp.121-127
    • /
    • 2012
  • An integrated crypto engine for encryption and decryption of AES algorithm based on unified data-path architecture is efficiently designed and implemented in this paper. In order to unify the design of encryption and decryption, internal steps in single round is adjusted so as to operate with columns after row operation is completed and efficient method for a buffer is developed to simplify the Shift Rows operation. Also, only one S-box is used for both key expansion and crypto operation and Key-Box saving expended key is introduced provide the key required in encryption and decryption. The functional simulation based on ModelSim simulator shows that 164 clocks are required to process the data of 128bits in the proposed engine. In addition, the proposed engine is implemented with 6,801 gates by using Xilinx Synthesizer. This demonstrate that 40% gates savings is achieved in the proposed engine, compared to individual designs of encryption and decryption engine.

Hardware Implementation of Time Skew Calibration Block for Time Interleaved ADC (TI ADC를 위한 시간 왜곡 교정 블록의 하드웨어 구현)

  • Khan, Sadeque Reza;Choi, Goangseog
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.13 no.3
    • /
    • pp.35-42
    • /
    • 2017
  • This paper presents hardware implementation of background timing-skew calibration technique for time-interleaved analog-to-digital converters (TI ADCs). The timing skew between any two adjacent analog-digital (A/D) channels is detected by using pure digital Finite Impulse Response (FIR) delay filter. This paper includes hardware architecture of the system, main units and small sub-blocks along with control logic circuits. Moreover, timing diagrams of logic simulations using ModelSim are provided and discussed for further understanding about simulations. Simulation process in MATLAB and Verilog is also included and provided with basic settings need to be done. For hardware implementation it not practical to work with all samples. Hence, the simulation is conducted on 512 TI ADC output samples which are stored in the buffer simultaneously and the correction arithmetic is done on those samples according to the time skew algorithm. Through the simulated results, we verified the implemented hardware is working well.

A Study on the Implementation of Low Power DCT Architecture for MPEG-4 AVC (저전력 DCT를 이용한 MPEG-4 AVC 압축에 관한 연구)

  • Kim, Dong-Hoon;Seo, Sang-Jin;Park, Sang-Bong;Jin, Hyun-Joon;Park, Nho-Kyung
    • Proceedings of the KIEE Conference
    • /
    • 2007.10a
    • /
    • pp.371-372
    • /
    • 2007
  • In this paper we present performance and implementation comparisons of high performance two dimensional forward and inverse Discrete Cosine Transform (2D-DCT/IDCT) algorithm and low power algorithm for $8{\times}8$ 20 DCT and quantization based on partial sum and its corresponding hardware architecture for FPGA in MPEG-4. The architecture used in both low power 20 DCT and 2D IDCT is based on the conventional row-column decomposition method. The use of Fast algorithm and distributed arithmetic(DA) technique to implement the DCT/IDCT reduces the hardware complexity. The design was made using Mentor Graphics Tools for design entry and implementation. Mentor Graphics ModelSim SE6.1f was used for Verilog HDL entry, behavioral Simulation and Synthesis. The 2D DCT/IDCT consumes only 50% of the Operating Power.

  • PDF

Radix-2 Booth-based Variable Precision Multiplier for Lightweight CNN Accelerators (경량 CNN 가속기를 위한 Radix-2 Booth 기반 가변 정밀도 곱셈기)

  • Guem, Duck-Hyun;Jeon, Seung-Jin;Choi, Jae-Young;Kim, Ji-Hyeok;Kim, Sunhee
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2022.05a
    • /
    • pp.494-496
    • /
    • 2022
  • 엣지 디바이스에서 딥러닝을 활용하기 위하여 CNN 경량화 연구들이 진행되고 있다. 경량 CNN 은 대부분 고정 소수점을 사용하며, 계층에 따라 정밀도는 달라진다. 본 논문에서는 경량 CNN 을 지원하기 위하여, 사용 계층에 따라 정밀도를 선택할 수 있는 가변 정밀도 곱셈기를 제안한다. 제안하는 가변 정밀도 곱셈기는 낮은 정밀도 곱셈기를 병합하는 구조로, 정밀도가 낮을 때는 병렬 처리를 통해 효율을 높인다. 제안하는 곱셈기를 Verilog HDL로 설계하고 ModelSim 에서 동작을 확인하였다. 설계된 곱셈기는 계층별로 정밀도가 다른 CNN 가속기에서 효율적으로 적용될 것으로 기대된다.

Design of Message Passing Engine Based on Processing Node Status for MPI Collective Communication (MPI 집합통신을 위한 프로세싱 노드 상태 기반의 메시지 전달 엔진 설계)

  • Chung, Won-Young;Lee, Yong-Surk
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.37 no.8B
    • /
    • pp.668-676
    • /
    • 2012
  • In this paper, on the assumption that MPI collective communication function is converted into a group of point-to-point communication functions in the transaction level, an algorithm that optimizes broadcast, scatter and gather function among MPI collective communication is proposed. The MPI hardware engine that operates the proposed algorithm was designed, and it was named the OCC-MPE (Optimized Collective Communication Message Passing Engine). The OCC-MPE operates point-to-point communication by using the standard send mode. The transmission order is arranged according to the algorithm that proposes the most frequently used broadcast, scatter and gather functions among the collective communications, so the whole communication time is reduced. To measure the performance of the proposed algorithm, the OCC-MPE with the Bus Functional Model (BFM) based on SystemC was designed. After evaluating the performance through the BFM based on SystemC, the proposed OCC-MPE is designed by using VerilogHDL. As a result of synthesizing with the TSMC $0.18{\mu}m$, the gate count of each OCC-MPE is approximately 1978.95 with four processing nodes. That occupies approximately 4.15% in the whole system, which means it takes up a relatively small amount. Improved performance is expected with relatively small amounts of area increase if the OCC-MPE operated by the proposed algorithm is added to the MPSoC (Multi-Processor System on a Chip).

Hardware Design of Rate Control for H.264/AVC Real-Time Video Encoding (실시간 영상 부호화를 위한 H.264/AVC의 비트율 제어 하드웨어 설계)

  • Kim, Changho;Ryoo, Kwangki
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.49 no.12
    • /
    • pp.201-208
    • /
    • 2012
  • In this paper, the hardware design of rate control for real-time video encoded is proposed. In the proposed method, a quadratic rate distortion model with high-computational complexity is not used when quantization parameter values are being decided. Instead, for low-computational complexity, average complexity weight values of frames are used to calculate QP. For high speed and low computational prediction, the MAD is predicted based on the coded basic unit, using spacial and temporal correlation in sequences. The rate control is designed with the hardware for fast QP decision. In the proposed method, a quadratic rate distortion model with high-computational complexity is not used when quantization parameter values are being decided. Instead, for low-computational complexity, average complexity weight values of frames are used to calculate QP. In addition, the rate control is designed with the hardware for fast QP decision. The execution cycle and gate count of the proposed architecture were reduced about 65% and 85% respectively compared with those of previous architecture. The proposed RC was implemented using Verilog HDL and synthesized with UMC $0.18{\mu}m$ standard cell library. The synthesis result shows that the gate count of the architecture is about 19.1k with 108MHz clock frequency.

A Simulation Framework for Mobile 3D Graphics Architecture (모바일 3차원 그래픽 아키덱쳐를 위한 시뮬레이션 프레임웍)

  • Lee Won-Jong;Park Jeong-Soo;Han Tack-Don
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2006.06a
    • /
    • pp.226-228
    • /
    • 2006
  • In this paper we describe a simulation and development framework for designing mobile 3D graphics architectures. We are developing a simple and flexible simulation and verification environment (SVE) that uses gITrace's ability to intercept and redirect an OpenGL/ES streams. In combination wlth gITrace to trace OpenGL/ES commands, the SVE simulates the behavior of mobile 3D graphics pipeline during playback of traces, and then produces the second geometry trace that can be used as a test vector for the Verilog/HDL RT-level model. By comparing the frame-by-frame results, we can conduct architectural verification. To demonstrate the functionality of the SVE, we show the implementation of the verified mobile 3D architecture on a FPGA board. For this, we also present an application development environment (ADE) includes a mobile graphics API and a device driver interface (DDI). The proposed two software environments, the SVE and the ADE could be used fer developing and testing mobile applications, architectural study and speculative hardware designs.

  • PDF

Design and Implementation of Seismic Data Acquisition System using MEMS Accelerometer (MEMS형 가속도 센서를 이용한 지진 데이터 취득 시스템의 설계 및 구현)

  • Choi, Hun;Bae, Hyeon-Deok
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.61 no.6
    • /
    • pp.851-858
    • /
    • 2012
  • In this paper, we design a seismic data acquisition system(SDAS) and implement it. This system is essential for development of a noble local earthquake disaster preventing system in population center. In the system, we choose a proper MEMS-type triaxial accelerometer as a sensor, and FPGA and ARM processor are used for implementing the system. In the SDAS, each module is realized by Verilog HDL and C Language. We carry out the ModelSim simulation to verify the performances of important modules. The simulation results show that the FPGA-based data acquisition module can guarantee an accurate time-synchronization for the measured data from each axis sensor. Moreover, the FPGA-ARM based embedded technology in system hardware design can reduce the system cost by the integration of data logger, communication sever, and facility control system. To evaluate the data acquisition performance of the SDAS, we perform experiments for real seismic signals with the exciter. Performances comparison between the acquired data of the SDAS and the reference sensor shows that the data acquisition performance of the SDAS is valid.

Design of Low Area Decimation Filters Using CIC Filters (CIC 필터를 이용한 저면적 데시메이션 필터 설계)

  • Kim, Sunhee;Oh, Jaeil;Hong, Dae-ki
    • Journal of the Semiconductor & Display Technology
    • /
    • v.20 no.3
    • /
    • pp.71-76
    • /
    • 2021
  • Digital decimation filters are used in various digital signal processing systems using ADCs, including digital communication systems and sensor network systems. When the sampling rate of digital data is reduced, aliasing occurs. So, an anti-aliasing filter is necessary to suppress aliasing before down-sampling the data. Since the anti-aliasing filter has to have a sharp transition band between the passband and the stopband, the order of the filter is very high. However, as the order of the filter increases, the complexity and area of the filter increase, and more power is consumed. Therefore, in this paper, we propose two types of decimation filters, focusing on reducing the area of the hardware. In both cases, the complexity of the circuit is reduced by applying the required down-sampling rate in two times instead of at once. In addition, CIC decimation filters without a multiplier are used as the decimation filter of the first stage. The second stage is implemented using a CIC filter and a down sampler with an anti-aliasing filter, respectively. It is designed with Verilog-HDL and its function and implementation are validated using ModelSim and Quartus, respectively.