• Title/Summary/Keyword: Bit-Parallel

Search Result 406, Processing Time 0.025 seconds

Color Media Instructions for Embedded Parallel Processors (임베디드 병렬 프로세서를 위한 칼라미디어 명령어 구현)

  • Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.35 no.7
    • /
    • pp.305-317
    • /
    • 2008
  • As a mobile computing environment is rapidly changing, increasing user demand for multimedia-over-wireless capabilities on embedded processors places constraints on performance, power, and sire. In this regard, this paper proposes color media instructions (CMI) for single instruction, multiple data (SIMD) parallel processors to meet the computational requirements and cost goals. While existing multimedia extensions store and process 48-bit pixels in a 32-bit register, CMI, which considers that color components are perceptually less significant, supports parallel operations on two-packed compressed 16-bit YCbCr (6 bit Y and 5 bits Cb, Cr) data in a 32-bit datapath processor. This provides greater concurrency and efficiency for YCbCr data processing. Moreover, the ability to reduce data format size reduces system cost. The reduction in data bandwidth also simplifies system design. Experimental results on a representative SIMD parallel processor architecture show that CMI achieves an average speedup of 6.3x over the baseline SIMD parallel processor performance. This is in contrast to MMX (a representative Intel's multimedia extensions), which achieves an average speedup of only 3.7x over the same baseline SIMD architecture. CMI also outperforms MMX in both area efficiency (a 52% increase versus a 13% increase) and energy efficiency (a 50% increase versus an 11% increase). CMI improves the performance and efficiency with a mere 3% increase in the system area and a 5% increase in the system power, while MMX requires a 14% increase in the system area and a 16% increase in the system power.

A Study on the 32 bit RISC/DSP Microprocessor Appropriate for Embedded Systems (내장형 시스템에 적합한 32 비트 RISC/DSP 마이크로프로세서에 관한 연구)

  • 유동열;문병인;홍종욱;이태영;이용석
    • Proceedings of the IEEK Conference
    • /
    • 1999.06a
    • /
    • pp.257-260
    • /
    • 1999
  • We have designed a 32-bit RISC microprocessor with 16/32-bit fixed-point DSP functionality. This processor, called YRD-5, combines both general-purpose microprocessor and digital signal processor (DSP) functionality using the reduced instruction set computer (RISC) design principles. It has functional units for arithmetic operation, digital signal processing (DSP) and memory access. They operate in parallel in order to remove stall cycles after DSP and load/store instructions with one or more issue latency cycles. High performance was achieved with these parallel functional units while adopting a sophisticated 5-stage pipeline structure and an improved DSP unit.

  • PDF

A high speed huffman decoder using new ternary CAM (새로운 Ternary CAM을 이용한 고속 허프만 디코더 설계)

  • 이광진;김상훈;이주석;박노경;차균현
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.21 no.7
    • /
    • pp.1716-1725
    • /
    • 1996
  • In this paper, the huffman decoder which is a part of the decoder in JPEG standard format is designed by using a new Ternary CAM. First, the 256 word * 16 bit-size new bit-word all parallel Ternary CAM system is designed and verified using SPICE and CADENCE Verilog-XL, and then the verified novel Ternary CAM is applied to the new huffman decoder architecture of JPEG. So the performnce of the designed CAM cell and it's block is verified. The new Ternary CAM has various applications because it has search data mask and storing data mask function, which enable bit-wise search and don't care state storing. When the CAM is used for huffman look-up table in huffman decoder, the CAM is partitioned according to the decoding symbol frequency. The scheme of partitioning CAM for huffman table overcomes the drawbacks of all-parallel CAM with much power and load. So operation speed and power consumption are improved.

  • PDF

On-Chip Bus Serialization Method for Low-Power Communications

  • Lee, Jae-Sung
    • ETRI Journal
    • /
    • v.32 no.4
    • /
    • pp.540-547
    • /
    • 2010
  • One of the critical issues in on-chip serial communications is increased power consumption. In general, serial communications tend to dissipate more energy than parallel communications due to bit multiplexing. This paper proposes a low-power bus serialization method. This encodes bus signals prior to serialization so that they are converted into signals that do not greatly increase in transition frequency when serialized. It significantly reduces the frequency by making the best use of word-to-word and bit-by-bit correlations presented in original parallel signals. The method is applied to the revision of an MPEG-4 processor, and the simulation results show that the proposed method surpasses the existing one. In addition, it is cost-effective when implemented as a hardware circuit since its algorithm is very simple.

Parallel BCH Encoding/decoding Method and VLSI Design for Nonvolatile Memory (비휘발성 메모리를 위한 병렬 BCH 인코딩/디코딩 방법 및 VLSI 설계)

  • Lee, Sang-Hyuk;Baek, Kwang-Hyun
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.47 no.5
    • /
    • pp.41-47
    • /
    • 2010
  • This paper has proposed parallel BCH, one of error correction coding methods which has been used to NAND flash memory for SSD(solid state disk). To alter error correction capability, the proposed design improved reliability on data block has higher error rate as used frequency increasingly. Decoding parallel process bit width is as two times as encoding parallel process bit width, that could reduce decoding processing time, accordingly resulting in one half reduction over conventional ECC.

Proposal for Decoding-Compatible Parallel Deflate Algorithm by Inserting Control Header Composed of Non-Compressed Blocks (비 압축 블록으로 구성된 제어 헤더 삽입을 통한 압축 해제 호환성 있는 병렬 처리 Deflate 알고리즘 제안)

  • Kim Jung Hoon
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.12 no.5
    • /
    • pp.207-216
    • /
    • 2023
  • For decoding-compatible parallel Deflate algorithm, this study proposed a new method of the control header being made in such a way that essential information for parallel compression and decompression are stored in the Disposed Bit Area (DBA) of the non-compression block and being inserted into the compressed blocks. Through this, parallel compression and decompression are possible while maintaining perfect compatibility with the existing decoder. After applying this method, the compression time was reduced by up to 71.2% compared to the sequential processing method, and the parallel decompression time was reduced by up to 65.7%. In particular, it is well known that parallel decompression is impossible due to the structural limitations of the Deflate algorithm. However, the decoder equipped with the proposed method enables high-speed parallel decompression at the algorithm level and maintains compatibility, so that parallelly compressed data can be decoded normally by existing decoder programs.

Implementation of Pixel Subword Parallel Processing Instructions for Embedded Parallel Processors (임베디드 병렬 프로세서를 위한 픽셀 서브워드 병렬처리 명령어 구현)

  • Jung, Yong-Bum;Kim, Jong-Myon
    • The KIPS Transactions:PartA
    • /
    • v.18A no.3
    • /
    • pp.99-108
    • /
    • 2011
  • Processor technology is currently continued to parallel processing techniques, not by only increasing clock frequency of a single processor due to the high technology cost and power consumption. In this paper, a SIMD (Single Instruction Multiple Data) based parallel processor is introduced that efficiently processes massive data inherent in multimedia. In addition, this paper proposes pixel subword parallel processing instructions for the SIMD parallel processor architecture that efficiently operate on the image and video pixels. The proposed pixel subword parallel processing instructions store and process four 8-bit pixels on the partitioned four 12-bit registers in a 48-bit datapath architecture. This solves the overflow problem inherent in existing multimedia extensions and reduces the use of many packing/unpacking instructions. Experimental results using the same SIMD-based parallel processor architecture indicate that the proposed pixel subword parallel processing instructions achieve a speedup of $2.3{\times}$ over the baseline SIMD array performance. This is in contrast to MMX-type instructions (a representative Intel multimedia extension), which achieve a speedup of only $1.4{\times}$ over the same baseline SIMD array performance. In addition, the proposed instructions achieve $2.5{\times}$ better energy efficiency than the baseline program, while MMX-type instructions achieve only $1.8{\times}$ better energy efficiency than the baseline program.

An Efficient Built-in Self-Test Algorithm for Neighborhood Pattern- and Bit-Line-Sensitive Faults in High-Density Memories

  • Kang, Dong-Chual;Park, Sung-Min;Cho, Sang-Bock
    • ETRI Journal
    • /
    • v.26 no.6
    • /
    • pp.520-534
    • /
    • 2004
  • As the density of memories increases, unwanted interference between cells and the coupling noise between bit-lines become significant, requiring parallel testing. Testing high-density memories for a high degree of fault coverage requires either a relatively large number of test vectors or a significant amount of additional test circuitry. This paper proposes a new tiling method and an efficient built-in self-test (BIST) algorithm for neighborhood pattern-sensitive faults (NPSFs) and new neighborhood bit-line sensitive faults (NBLSFs). Instead of the conventional five-cell and nine-cell physical neighborhood layouts to test memory cells, a four-cell layout is utilized. This four-cell layout needs smaller test vectors, provides easier hardware implementation, and is more appropriate for both NPSFs and NBLSFs detection. A CMOS column decoder and the parallel comparator proposed by P. Mazumder are modified to implement the test procedure. Consequently, these reduce the number of transistors used for a BIST circuit. Also, we present algorithm properties such as the capability to detect stuck-at faults, transition faults, conventional pattern-sensitive faults, and neighborhood bit-line sensitive faults.

  • PDF

Design of an 8-bit Color Adjustor for SDTV Using Verilog HDL (Verilog HDL을 이용한 SDTV용 8bit 색상 보정기의 설계)

  • Jeon, Byoung-Woong;Song, In-Chae
    • Proceedings of the IEEK Conference
    • /
    • 2005.11a
    • /
    • pp.801-804
    • /
    • 2005
  • In this paper, we designed an 8-bit color adjustor for SDTV using Verilog HDL. The conversion block requires a lot of multiplication. So we adopted Booth algorithm to reduce amount of operation and processing time. To improve speed, we designed the system output as parallel structure. We synthesized the designed system using Xilinx ISE and verified the operation through simulation using Modelsim.

  • PDF

VLSI Design of 3-Bit Soft Decision Viterbi Decoder (3-Bit Soft Decision Viterbi 복호기의 VLSI 설계)

  • 김기명;송인채
    • Proceedings of the IEEK Conference
    • /
    • 1999.11a
    • /
    • pp.863-866
    • /
    • 1999
  • In this paper, we designed a Viterbi decoder with constraint length K=7, code rate R=1/2, encoder generator polynomial (171, 133)$_{8}$. This decoder makes use of 3-bit soft decision. We designed the Viterbi decoder using VHDL. We employed conventional logic circuit instead of ROM for branch metric units(BMUs) to reduce the number of gates. We adopted fully parallel structures for add-compare-select units(ACSUs). The size of the designed decoder is about 200, 000 gates.s.

  • PDF