• Title/Summary/Keyword: Bit-Parallel

Search Result 406, Processing Time 0.027 seconds

ASIC Design of OpenRISC-based Multimedia SoC Platform (OpenRISC 기반 멀티미디어 SoC 플랫폼의 ASIC 설계)

  • Kim, Sun-Chul;Ryoo, Kwang-Ki
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2008.10a
    • /
    • pp.281-284
    • /
    • 2008
  • This paper describes ASIC design of multimedia SoC Platform. The implemented Platform consists of 32-bit OpenRISC1200 Microprocessor, WISHBONE on-chip bus, VGA Controller, Debug Interface, SRAM Interface and UART. The 32-bit OpenRISC1200 processor has 5 stage pipeline and Harvard architecture with separated instruction/data bus. The VGA Controller can display RCB data on a CRT or LCD monitor. The Debug Interface supports a debugging function for the Platform. The SRAM Interface supports 18-bit address bus and 32-bit data bus. The UART provides RS232 protocol, which supports serial communication function. The Platform is design and verified on a Xilinx VERTEX-4 XC4VLX80 FPGA board. Test code is generated by a cross compiler' and JTAG utility software and gdb are used to download the test code to the FPGA board through parallel cable. Finally, the Platform is implemented into a single ASIC chip using Chatered 0.18um process and it can operate at 100MHz clock frequency.

  • PDF

A Study on the Instruction Set Architecture of Multimedia Extension Processor (멀티미디어 확장 프로세서의 명령어 집합 구조에 관한 연구)

  • O, Myeong-Hun;Lee, Dong-Ik;Park, Seong-Mo
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.38 no.6
    • /
    • pp.420-435
    • /
    • 2001
  • As multimedia technology has rapidly grown recently, many researches to process multimedia data efficiently using general-purpose processors have been studied. In this paper, we proposed multimedia instructions which can process multimedia data effectively, and suggested a processor architecture for those instructions. The processor was described with Verilog-HDL in the behavioral level and simulated with CADENCE$^{TM}$ tool. Proposed multimedia instructions are total 48 instructions which can be classified into 7 groups. Multimedia data have 64-bit format and are processed as parallel subwords of 8-bit 8 bytes, 16-bit 4 half words or 32-bit 2 words. Modeled processor is developed based on the Integer Unit of SPARC V.9. It has five-stage pipeline RISC architecture with Harvard principle.e.

  • PDF

A New Complex-Number Multiplication Algorithm using Radix-4 Booth Recoding and RB Arithmetic, and a 10-bit CMAC Core Design (Radix-4 Booth Recoding과 RB 연산을 이용한 새로운 복소수 승산 알고리듬 및 10-bit CMAC코어 설계)

  • 김호하;신경욱
    • Journal of the Korean Institute of Telematics and Electronics C
    • /
    • v.35C no.9
    • /
    • pp.11-20
    • /
    • 1998
  • High-speed complex-number arithmetic units are essential to baseband signal processing of modern digital communication systems such as channel equalization, timing recovery, modulation and demodulation. In this paper, a new complex-number multiplication algorithm is proposed, which is based on redundant binary (RB) arithmetic combined with radix-4 Booth recoding scheme. The proposed algorithm reduces the number of partial product by one-half as compared with the conventional direct method using real-number multipliers and adders. It also leads to a highly parallel architecture and simplified circuit, resulting in high-speed operation and low power dissipation. To demonstrate the proposed algorithm, a prototype complex-number multiplier-accumulator (CMAC) core with 10-bit operands has been designed using 0.8-$\mu\textrm{m}$ N-Well CMOS technology. The designed CMAC core contains about 18,000 transistors on the area of about 1.60 ${\times}$ 1.93 $\textrm{mm}^2$. The functional and speed test results show that it can operate with 120-MHz clock at V$\sub$DD/=3.3-V, and its power consumption is given to about 63-mW.

  • PDF

On a High-speed Implementation of LILI-II Stream Cipher (LILI-II 스트림 암호의 고속화 구현에 관한 연구)

  • 이훈재;문상재
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.29 no.8C
    • /
    • pp.1210-1217
    • /
    • 2004
  • LILI-II stream cipher is an upgraded version of the LILI-128, one of candidates in NESSIE. Since the algorithm is a clock-controlled, the speed of the keystream data is degraded structurally in a clock-synchronized hardware logic design. Accordingly, this paper proposes a 4-bit parallel LFSR, where each register bit includes four variable data routines for feedback or shifting within the LFSR. furthermore, the timing of the proposed design is simulated using a Max+plus II from the ALTERA Co., the logic circuit is implemented for an FPGA device (EPF10K20RC240-3), and apply to the Lucent ASIC device (LV160C, 0.13${\mu}{\textrm}{m}$ CMOS & 1.5v technology), and it could achieve a throughput of about 500 Mbps with a 0.13${\mu}{\textrm}{m}$ semiconductor for the maximum path delay below 1.8㎱. Finally, we propose the m-parallel implementation of LILI-II, throughput with 4, 8 or 16 Gbps (m=8, 16 or 32).

An Implementation of Efficient Quicksort Utilizing SIMD-Based VBP Technique (SIMD 기반의 VBP 기법을 적용한 효율적인 퀵정렬의 구현)

  • Hong, Gilseok;Kim, Hongyeon;Kang, Seonghyeon;Min, Jun-Ki
    • KIISE Transactions on Computing Practices
    • /
    • v.23 no.8
    • /
    • pp.498-503
    • /
    • 2017
  • SIMD (Single Instruction Multiple Data) is a representative parallelization architecture that processes multiple data loaded in a SIMD register with a single instruction. Quicksort is a sorting algorithm that picks an element as a pivot from the array and reorders the array such that all elements having the values less than the pivot value are located in the left side on the pivot as well as all elements having the value greater than the pivot value are located in the right side on the pivot and then the algorithm performs the same task on both sublist recursively. In this paper, we propose an efficient Quicksort algorithm applying the SIMD instructions which minimally invokes conditional branches to avoid the performance degradation incurred by branch misprediction in a pipeline architecture. In addition, we improve the performance of the Quicksort algorithm by fetching data into a SIMD register as a byte unit to apply VBP (Vertical Bit Parallel) and the early pruning technique.

Performance Analysis of Optical CDMA System with Cross-Layer Concept (계층간 교차 개념을 적용한 광 부호분할 다중접속 시스템의 성능 분석)

  • Kim, Jin-Young;Kim, Eun-Cheol
    • Journal of the Institute of Electronics Engineers of Korea TC
    • /
    • v.46 no.7
    • /
    • pp.13-23
    • /
    • 2009
  • In this paper, the network performance of a turbo coded optical code division multiple access (CDMA) system with cross-layer, which is between physical and network layers, concept is analyzed and simulated. We consider physical and MAC layers in a cross-layer concept. An intensity-modulated/direct-detection (IM/DD) optical system employing pulse position modulation (PPM) is considered. In order to increase the system performance, turbo codes composed of parallel concatenated convolutional codes (PCCCs) is utilized. The network performance is evaluated in terms of bit error probability (BEP). From the simulation results, it is demonstrated that turbo coding offers considerable coding gain with reasonable encoding and decoding complexity. Also, it is confirmed that the performance of such an optical CDMA network can be substantially improved by increasing e interleaver length and e number of iterations in e decoding process. The results of this paper can be applied to implement the indoor optical wireless LANs.

History-Aware RED for Relieving the Bandwidth Monopoly of a Station Employing Multiple Parallel TCP flows (다수의 병렬 TCP Flow를 가진 스테이션에 의한 대역폭 독점을 감소시키는 History-Aware RED)

  • Jun, Kyung-Koo
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.34 no.11B
    • /
    • pp.1254-1260
    • /
    • 2009
  • This paper proposes history-aware random early detection (HRED), a modified version of RED, to lessen bandwidth monopoly by a few of stations employing multiple parallel TCP flows. Stations running peer-to-peer file sharing applications such as BitTorrent use multiple TCP flows. If those stations share a link with other stations with only a small number of TCP flows, the stations occupy most of link bandwidth leading to undesirable bandwidth monopoly. HRED like RED determines whether to drop incoming packets according to probability which changes based on queue length. However it adjusts the drop probability based on bandwidth occupying ratio of stations, thus able to impose harder drop penalty on monopoly stations. The results of simulations assuming various scenarios show that HRED is at least 60% more effective than RED in supporting the bandwidth fairness among stations and at least 4% in utilization.

Code Rate 1/2, 2304-b LDPC Decoder for IEEE 802.16e WiMAX (IEEE 802.16e WiMAX용 부호율 1/2, 2304-비트 LDPC 복호기)

  • Kim, Hae-Ju;Shin, Kyung-Wook
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.36 no.4A
    • /
    • pp.414-422
    • /
    • 2011
  • This paper describes a design of low-density parity-check(LDPC) decoder supporting block length 2,304-bit and code rate 1/2 of IEEE 802.16e mobile WiMAX standard. The designed LDPC decoder employs the min-sum algorithm and partially parallel layered-decoding architecture which processes a sub-matrix of $96{\times}96$ in parallel. By exploiting the properties of the min-sum algorithm, a new memory reduction technique is proposed, which reduces check node memory by 46% compared to conventional method. Functional verification results show that it has average bit-error-rate(BER) of $4.34{\times}10^{-5}$ for AWGN channel with Fb/No=2.1dB. Our LDPC decoder synthesized with a $0.18{\mu}m$ CMOS cell library has 174,181 gates and 52,992 bits memory, and the estimated throughput is about 417 Mbps at 100-MHz@l.8-V.

Bit-Parallel Systolic Divider in Finite Field GF(2m) (유한 필드 GF(2m)상의 비트-패러럴 시스톨릭 나눗셈기)

  • 김창훈;김종진;안병규;홍춘표
    • The KIPS Transactions:PartA
    • /
    • v.11A no.2
    • /
    • pp.109-114
    • /
    • 2004
  • This paper presents a high-speed bit-parallel systolic divider for computing modular division A($\chi$)/B($\chi$) mod G($\chi$) in finite fields GF$(2^m)$. The presented divider is based on the binary GCD algorithm and verified through FPGA implementation. The proposed architecture produces division results at a rate of one every 1 clock cycles after an initial delay of 5m-2. Analysis shows that the proposed divider provides a significant reduction in both chip area and computational delay time compared to previously proposed systolic dividers with the same I/O format. In addition, since the proposed architecture does not restrict the choice of irreducible polynomials and has regularity and modularity, it provides a high flexibility and Scalability with respect to the field size m. Therefore, the proposed divider is well suited to VLSI implementation.

Systolic Arrays for Lattice-Reduction-Aided MIMO Detection

  • Wang, Ni-Chun;Biglieri, Ezio;Yao, Kung
    • Journal of Communications and Networks
    • /
    • v.13 no.5
    • /
    • pp.481-493
    • /
    • 2011
  • Multiple-input multiple-output (MIMO) technology provides high data rate and enhanced quality of service for wireless communications. Since the benefits from MIMO result in a heavy computational load in detectors, the design of low-complexity suboptimum receivers is currently an active area of research. Lattice-reduction-aided detection (LRAD) has been shown to be an effective low-complexity method with near-maximum-likelihood performance. In this paper, we advocate the use of systolic array architectures for MIMO receivers, and in particular we exhibit one of them based on LRAD. The "Lenstra-Lenstra-Lov$\acute{a}$sz (LLL) lattice reduction algorithm" and the ensuing linear detections or successive spatial-interference cancellations can be located in the same array, which is considerably hardware-efficient. Since the conventional form of the LLL algorithm is not immediately suitable for parallel processing, two modified LLL algorithms are considered here for the systolic array. LLL algorithm with full-size reduction-LLL is one of the versions more suitable for parallel processing. Another variant is the all-swap lattice-reduction (ASLR) algorithm for complex-valued lattices, which processes all lattice basis vectors simultaneously within one iteration. Our novel systolic array can operate both algorithms with different external logic controls. In order to simplify the systolic array design, we replace the Lov$\acute{a}$sz condition in the definition of LLL-reduced lattice with the looser Siegel condition. Simulation results show that for LR-aided linear detections, the bit-error-rate performance is still maintained with this relaxation. Comparisons between the two algorithms in terms of bit-error-rate performance, and average field-programmable gate array processing time in the systolic array are made, which shows that ASLR is a better choice for a systolic architecture, especially for systems with a large number of antennas.