• Title/Summary/Keyword: CMOS VLSI

Search Result 200, Processing Time 0.025 seconds

A Single-Chip Video/Audio CODEC for Low Bit Rate Application

  • Park, Seong-Mo;Kim, Seong-Min;Kim, Ig-Kyun;Byun, Kyung-Jin;Cha, Jin-Jong;Cho, Han-Jin
    • ETRI Journal
    • /
    • v.22 no.1
    • /
    • pp.20-29
    • /
    • 2000
  • In this paper, we present a design of video and audio single chip encoder/decoder for portable multimedia application. The single-chip called as video audio signal processor (VASP) consists of a video signal processing block and an audio single processing block. This chip has mixed hardware/software architecture to combine performance and flexibility. We designed the chip by partitioning between video and audio block. The video signal processing block was designed to implement hardware solution of pixel input/output, full pixel motion estimation, half pixel motion estimation, discrete cosine transform, quantization, run length coding, host interface, and 16 bits RISC type internal controller. The audio signal processing block is implemented with software solution using a 16 bits fixed point DSP. This chip contains 142,300 gates, 22 Kbits FIFO, 107 kbits SRAM, and 556 kbits ROM, and the chip size is $9.02mm{\times}9.06mm$ which is fabricated using 0.5 micron 3-layer metal CMOS technology.

  • PDF

32비트 VLSI프로세서 HARP의 마이크로 아키텍츄어 최적설계에 관한 연구

  • Park, Seong-Bae;Kim, Jong-Hyeon;O, Gil-Rok
    • ETRI Journal
    • /
    • v.11 no.4
    • /
    • pp.105-118
    • /
    • 1989
  • HARP(High performance Architecture for RISC type Processor)는 고유의 명령어 세트, 데이터 타입, 메모리 입출력, 예외 처리 기능을갖는 32비트 VLSI 프로세서 구조이다. 마이크로 아키텍츄어는 설계된 구조를 기대할 수 있는최고 성능을 갖도록 구조(architecture)와 구현(implementation) 사이의 최적 모델링을 통해 정의되는 구조체로서 구조의 개념 설계를 구현의 실물 설계로 변환 시켜주는 조율(tuning)모델이다. HARP의 고유한 명령어 세트를 비롯한 구조적 기능들을 최적 구현 하기위해 32비트 크기의 명령어 입력 유니트(Instruction Fetch Unit), 데이터 입출력 유니트(Data I/O Unit), 명령어/데이터 처리유니트(Instruction/Data Processing Unit), 예외 상황 처리 유니트(Exception Processing Unit)등 4개 유니트가 설계되었으며 이들 4개 유니트의 동작을 최대 속도로 유지시키기 위해 각급 주요 설계 변수들이 시뮬레이션을 통해 최적화 되었다. 유효 채널길이 $0.7\mum$급 3층 메탈 배선의 HCMOS(High performance CMOS)공정 기술을 구현 기준 기술로 사용하여 50MHz외 동작 주파수에서 최대50 MIPS(Million Instructions Per Second)의 성능을 갖도록 3단계 파이프라인이 설계되었다. 단일 위상의 50MHz클럭 입력과 동기화된 명령어/데이터 입출력을 위해 액세스 타임 20nsec이내의 고속 메모리 입출력 구조가 시뮬레이션되었으며 설계된 마이크로 아키텍츄어를 이용하여 HARP구조의 기대된 최대 성능을 검증하였다.

  • PDF

Area and Power Efficient VLSI Architecture for Two Dimensional 16-point Modified Gate Diffusion Input Discrete Cosine Transform

  • Thiruveni, M.;Shanthi, D.
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.16 no.4
    • /
    • pp.497-505
    • /
    • 2016
  • The two-dimensional (2D) Discrete Cosine Transform (DCT) is used widely in image and video processing systems. The perception of human visualization permits us to design approximate rather than exact DCT. In this paper, we propose a digital implementation of 16-point approximate 2D DCT architecture based on one-dimensional (1D) DCT and Modified Gate Diffusion Input (MGDI) technique. The 8-point 1D Approximate DCT architecture requires only 12 additions for realization in digital VLSI. Additions can be performed using the proposed 8 transistor (8T) MGDI Full Adder which reduces 2 transistors than the existing 10 transistor (10T) MGDI Full Adder. The Approximate MGDI 2D DCT using 8T MGDI Full adders is simulated in Tanner SPICE for $0.18{\mu}m$ CMOS process technology at 100MHZ.The simulation result shows that 13.9% of area and 15.08 % of power is reduced in the 8-point approximate 2D DCT, 10.63 % of area and 15.48% of power is reduced in case of 16-point approximate 2D DCT using 8 Transistor MGDI Full Adder than 10 Transistor MGDI Full Adder. The proposed architecture enhances results in terms of hardware complexity, regularity and modularity with a little compromise in accuracy.

VLSI Design of DWT-based Image Processor for Real-Time Image Compression and Reconstruction System (실시간 영상압축과 복원시스템을 위한 DWT기반의 영상처리 프로세서의 VLSI 설계)

  • Seo, Young-Ho;Kim, Dong-Wook
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.29 no.1C
    • /
    • pp.102-110
    • /
    • 2004
  • In this paper, we propose a VLSI structure of real-time image compression and reconstruction processor using 2-D discrete wavelet transform and implement into a hardware which use minimal hardware resource using ASIC library. In the implemented hardware, Data path part consists of the DWT kernel for the wavelet transform and inverse transform, quantizer/dequantizer, the huffman encoder/huffman decoder, the adder/buffer for the inverse wavelet transform, and the interface modules for input/output. Control part consists of the programming register, the controller which decodes the instructions and generates the control signals, and the status register for indicating the internal state into the external of circuit. According to the programming condition, the designed circuit has the various selective output formats which are wavelet coefficient, quantization coefficient or index, and Huffman code in image compression mode, and Huffman decoding result, reconstructed quantization coefficient, and reconstructed wavelet coefficient in image reconstructed mode. The programming register has 16 stages and one instruction can be used for a horizontal(or vertical) filtering in a level. Since each register automatically operated in the right order, 4-level discrete wavelet transform can be executed by a programming. We synthesized the designed circuit with synthesis library of Hynix 0.35um CMOS fabrication using the synthesis tool, Synopsys and extracted the gate-level netlist. From the netlist, timing information was extracted using Vela tool. We executed the timing simulation with the extracted netlist and timing information using NC-Verilog tool. Also PNR and layout process was executed using Apollo tool. The Implemented hardware has about 50,000 gate sizes and stably operates in 80MHz clock frequency.

A VLSI Design of High Performance H.264 CAVLC Decoder Using Pipeline Stage Optimization (파이프라인 최적화를 통한 고성능 H.264 CAVLC 복호기의 VLSI 설계)

  • Lee, Byung-Yup;Ryoo, Kwang-Ki
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.12
    • /
    • pp.50-57
    • /
    • 2009
  • This paper proposes a VLSI architecture of CAVLC hardware decoder which is a tool eliminating statistical redundancy in H.264/AVC video compression. The previous CAVLC hardware decoder used four stages to decode five code symbols. The previous CAVLC hardware architectures decreased decoding performance because there was an unnecessary idle cycle in between state transitions. Likewise, the computation of valid bit length includes an unnecessary idle cycle. This paper proposes hardware architecture to eliminate the idle cycle efficiently. Two methods are applied to the architecture. One is a method which eliminates an unnecessary things of buffers storing decoded codes and then makes efficient pipeline architecture. The other one is a shifter control to simplify operations and controls in the process of calculating valid bit length. The experimental result shows that the proposed architecture needs only 89 cycle in average for one macroblock decoding. This architecture improves the performance by about 29% than previous designs. The synthesis result shows that the design achieves the maximum operating frequency at 140Mhz and the hardware cost is about 11.5K under a 0.18um CMOS process. Comparing with the previous design, it can achieve low-power operation because this design is implemented with high throughputs and low gate count.

A VLSI Architecture Design of CDMA/TDMA Modem Chipsets for Wireless Telemetry Systems (CDMA/TDMA 기반 무선 원격계측 시스템용 모뎀의 VLSI 구조 설계)

  • 이원재;이성주;이서구;정석호;김재석
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.41 no.5
    • /
    • pp.107-114
    • /
    • 2004
  • In this paper, we present the architecture design of CDMA/TDMA modem chipset for wireless telemetry system. The wireless telemetry system a measuring data collecting system from many RTs(Remote Terminal) installed at the specific area using wireless communication technology. It consists of CU single CU (Central Unit) for collecting data and a large amount of RTs for transmitting the measuring data. We propose the hardware architecture of the modem for RT and CU. We also design those modem using Verilog HDL and synthesis them using Synopsys$^{TM}$ CAD tool. The modem of RT is implemented with 27K gates and that of CU is implemented around 220k gates using 0.6${\mu}{\textrm}{m}$ CMOS standard cell. The proposed system is implemented and tested using Altera$^{TM}$ FPGA.PGA.

Design of Multiplierless Lifting-based Wavelet Transform using Pattern Search Methods (패턴 탐색 기법을 사용한 Multiplierless 리프팅 기반의 웨이블릿 변환의 설계)

  • Son, Chang-Hoon;Park, Seong-Mo;Kim, Young-Min
    • Journal of Korea Multimedia Society
    • /
    • v.13 no.7
    • /
    • pp.943-949
    • /
    • 2010
  • This paper presents some improvements on VLSI implementation of lifting-based 9/7 wavelet transform by optimization hardware multiplication. The proposed solution requires less logic area and power consumption without performance loss compared to previous wavelet filter structure based on lifting scheme. This paper proposes a better approach to the hardware implementation using Lefevre algorithm based on extensions of Pattern search methods. To compare the proposed structure to the previous solutions on full multiplier blocks, we implemented them using Verilog HDL. For a hardware implementation of the two solutions, the logical synthesis on 0.18 um standard cells technology show that area, maximum delay and power consumption of the proposed architecture can be reduced up to 51%, 43% and 30%, respectively, compared to previous solutions for a 200 MHz target clock frequency. Our evaluation show that when design VLSI chip of lifting-based 9/7 wavelet filter, our solution is better suited for standard-cell application-specific integrated circuits than prior works on complete multiplier blocks.

VLIS Design of OCB-AES Cryptographic Processor (OCB-AES 암호 프로세서의 VLSI 설계)

  • Choi Byeong-Yoon;Lee Jong-Hyoung
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.9 no.8
    • /
    • pp.1741-1748
    • /
    • 2005
  • In this paper, we describe VLSI design and performance evaluation of OCB-AES crytographic algorithm that simulataneously provides privacy and authenticity. The OCB-AES crytographic algorithm sovles the problems such as long operation time and large hardware of conventional crytographic system, because the conventional system must implement the privancy and authenticity sequentially with seqarated algorithms and hardware. The OCB-AES processor with area-efficient modular offset generator and tag generator is designed using IDEC Samsung 0.35um standard cell library and consists of about 55,700 gates. Its cipher rate is about 930Mbps and the number of clock cycles needed to generate the 128-bit tags for authenticity and integrity is (m+2)${\times}$(Nr+1), where m and Nr represent the number of block for message and number of rounds for AES encryption, respectively. The OCB-AES processor can be applicable to soft cryptographic IP of IEEE 802.11i wireless LAN and Mobile SoC.

An I/O Interface Circuit Using CTR Code to Reduce Number of I/O Pins (CTR 코드를 사용한 I/O 핀 수를 감소 시킬 수 있는 인터페이스 회로)

  • Kim, Jun-Bae;Kwon, Oh-Kyong
    • Journal of the Korean Institute of Telematics and Electronics D
    • /
    • v.36D no.1
    • /
    • pp.47-56
    • /
    • 1999
  • As the density of logic gates of VLSI chips has rapidly increased, more number of I/O pins has been required. This results in bigger package size and higher packager cost. The package cost is higher than the cost of bare chips for high I/O count VLSI chips. As the density of logic gates increases, the reduction method of the number of I/O pins for a given complexity of logic gates is required. In this paper, we propose the novel I/O interface circuit using CTR (Constant-Transition-Rate) code to reduce 50% of the number of I/O pins. The rising and falling edges of the symbol pulse of CTR codes contain 2-bit digital data, respectively. Since each symbol of the proposed CTR codes contains 4-bit digital data, the symbol rate can be reduced by the factor of 2 compared with the conventional I/O interface circuit. Also, the simultaneous switching noise(SSN) can be reduced because the transition rate is constant and the transition point of the symbols is widely distributed. The channel encoder is implemented only logic circuits and the circuit of the channel decoder is designed using the over-sampling method. The proper operation of the designed I/O interface circuit was verified using. HSPICE simulation with 0.6 m CMOS SPICE parameters. The simulation results indicate that the data transmission rate of the proposed circuit using 0.6 m CMOS technology is more than 200 Mbps/pin. We implemented the proposed circuit using Altera's FPGA and confimed the operation with the data transfer rate of 22.5 Mbps/pin.

  • PDF

A Low Power Charge Sharing ROM using Dummy Bit Lines (더미 비트라인을 이용한 저전력 전하공유 롬)

  • 양병도;김이서
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.41 no.5
    • /
    • pp.99-105
    • /
    • 2004
  • A shared-capacitor charge-sharing ROM (SCCS-ROM) using dummy bit lines is proposed. The SCCS-ROM reduces the bit line swing voltage using the charge-sharing technique of the conventional charge-sharing ROM (CS-ROM). Although the CS-ROM needs three small capacitors per output bit, the proposed SCCS-ROM shares the capacitors so that it needs only three capacitors. The SCCS-ROM implements the capacitors using dummy bit lines. This not only increases noise immunity but also reduces power. A SCCS-ROM with 8K${\times}$15bits implemented in a 0.35${\mu}{\textrm}{m}$ CMOS process. The SCCS-ROM consumes 8.63㎽ at 100MHz with 3.3V The simulation results show that the SCCS-ROM reduces 8.4% power compared to the CS-ROM.