• Title/Summary/Keyword: 연산지연

Search Result 451, Processing Time 0.022 seconds

A Low Power FPGA Architecture using Three-dimensional Structure (3차원 구조를 이용한 저전력 FPGA 구조)

  • Kim, Pan-Ki;Lee, Hyoung-Pyo;Kim, Hyun-Pil;Jun, Ho-Yoon;Lee, Yong-Surk
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.34 no.12
    • /
    • pp.656-664
    • /
    • 2007
  • Field-Programmable Gate Arrays (FPGAs) are a revolutionary new type of user-programmable integrated circuits that provide fast, inexpensive access to customized VLSI. However, as the target application speed increases, power-consumption and wire-delay on interconnection become more critical factors during programming an FPGA. Especially, the interconnection of the FPGA consumes 65% of the total FPGA power consumption. A previous research show that if the length of interconnection is shirked, power-consumption can be reduced because an interconnection has a lot of effect on power-consumption. For solving this problem that reducing the number of wires routed, the three dimension FPGA is proposed. However, this structure physical wires and an area of switches is increased by making topology complex. This paper propose a novel FPGA architecture that modifies the three dimension FPGA and compare the number of interconnection of Virtex II and 3D FPGA with the proposed FPGA architecture using the FPGA Editor of Xilinx ISE and a global routing and length estimation program.

Performance Comparison of Fast Distributed Video Decoding Methods Using Correlation between LDPCA Frames (LDPCA 프레임간 상관성을 이용한 고속 분산 비디오 복호화 기법의 성능 비교)

  • Kim, Man-Jae;Kim, Jin-Soo
    • The Journal of the Korea Contents Association
    • /
    • v.12 no.4
    • /
    • pp.31-39
    • /
    • 2012
  • DVC(Distributed Video Coding) techniques have been attracting a lot of research works since these enable us to implement the light-weight video encoder and to provide good coding efficiency by introducing the feedback channel. However, the feedback channel causes the decoder to increase the decoding complexity and requires very high decoding latency because of numerous iterative decoding processes. So, in order to reduce the decoding delay and then to implement in a real-time environment, this paper proposes several parity bit estimation methods which are based on the temporal correlation, spatial correlation and spatio-temporal correlations between LDPCA frames on each bit plane in the consecutive video frames in pixel-domain Wyner-Ziv video coding scheme and then the performances of these methods are compared in fast DVC scheme. Through computer simulations, it is shown that the adaptive spatio-temporal correlation-based estimation method and the temporal correlation-based estimation method outperform others for the video frames with the highly active contents and the low active contents, respectively. By using these results, the proposed estimation schemes will be able to be effectively used in a variety of different applications.

Modeling and Simulation of the Efficient Certificate Status Validation System on Public Key Infrastructure (공개키 기반 구조에서의 효율적인 인증서 상태 검증 방법의 모델링 및 시뮬레이션)

  • Seo, Hee-Suk;Kim, Tae-Kyoung;Kim, Hee-Wan
    • Journal of the Korea Computer Industry Society
    • /
    • v.5 no.5
    • /
    • pp.721-728
    • /
    • 2004
  • OCSP (Online Certificate Status Protocol) server which checks the certificate status provides the real time status verification in the PKI (Public Key Infrastructure) system which is the essential system of certificate. However, OCSP server need the message authentication with the server and client, so it has some shortcomings that has slow response time for the demands of many clients concurrently and has complexity of the mathematical process in the public encryption system. In this research, simulation model of the certificate status vertification server is constructed of the DEVS (Discrete EVent system Specification) formalism. This sever model is constructed to practice the authentication with hash function when certificate is checked. Simulation results shows the results of increase of the certificate status verification speed and decrease of the response time to the client.

  • PDF

Analysis on the GPU Performance according to Hierarchical Memory Organization (계층적 메모리 구성에 따른 GPU 성능 분석)

  • Choi, Hongjun;Kim, Jongmyon;Kim, Cheolhong
    • The Journal of the Korea Contents Association
    • /
    • v.14 no.3
    • /
    • pp.22-32
    • /
    • 2014
  • Recently, GPGPU has been widely used for general-purpose processing as well as graphics processing by providing optimized hardware for parallel processing. Memory system has big effects on the performance of parallel processing units such as GPU. In the GPU, hierarchical memory architecture is implemented for high memory bandwidth. Moreover, both memory address coalescing and memory request merging techniques are widely used. This paper analyzes the GPU performance according to various memory organizations. According to our simulation results, GPU performance improves by 15.5%, 21.5%, 25.5%, 30.9% as adding 8KB L1, 16KB L1, 32KB L1, 64KB L1 cache, respectively, compared to case without L1 cache. However, experimental results show that some benchmarks decrease performance since memory transaction increases due to data dependency. Moreover, average memory access latency is increased as the depth of hierarchical cache level increases when cache miss occurs significantly.

An Efficient Data-reuse Deblocking Filter Algorithm for H.264/AVC (H.264/AVC 비디오 코덱을 위한 효율적인 자료 재사용 디블록킹 필터 알고리즘)

  • Lee, Hyoung-Pyo;Lee, Yong-Surk
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.44 no.6
    • /
    • pp.30-35
    • /
    • 2007
  • H.264/AVC provides better quality than other algorithms by using a deblocking filter to remove blocking distortion on block boundary of the decoded picture. However, this filtering process includes lots of memory accesses, which cause delay of overall decoding time. In this paper, we propose a data-reuse algorithm to speed up the process for the deblocking filter. To reuse the data, a new filtering order is suggested. By using this order, we reduce the memory access and accelerate the deblocking filter. The modeling of proposed algorithm is compiled under ARM ADS1.2 and simulated with Armulator. The results of the experiment compared with H.264/AVC standard are achieved on average 58.45% and 57.93% performance improvements at execution cycles and memory access cycles, respectively.

Design of Redundant Binary Adder based on Memristor-CMOS (멤리스터-CMOS 기반의 잉여 이진 가산기 설계)

  • Ahn, Yeongyu;Lee, Sang-Jin;Kim, Seokman;Eshraghian, Kamran;Cho, Kyoungrok
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.51 no.9
    • /
    • pp.67-74
    • /
    • 2014
  • This paper presents a memristor-CMOS based RBSD adder. Conventional RBSD adders suffer bigger hardware due to the extra logic handling larger number of bits. The purpose of this paper is to improve the silicon surface area and the computation delay of conventional RBSD adders. The proposed method employs memristor-CMOS based circuit. The implementation results shows that the proposed memristor-CMOS based RBSD adder saves the cell area by 45%, and reduces time delay 24% compared to conventional RBSD adders. The proposed RBSD adder design can bring further area saving for large scale designs.

Estimation Techniques for Sampling Frequency Offset in OFDM Systems (OFDM 시스템의 샘플링 주파수 옵셋 추정기법)

  • 전원기;조용수
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.24 no.9B
    • /
    • pp.1795-1805
    • /
    • 1999
  • In an OFDM (Orthogonal Frequency-Division Multiplexing) system, the sampling frequency offset between the transmitter and receiver is known to cause the interchannel interference (ICI), resulting in performance degradation. In this paper, we propose two time-domain techniques to estimate the sampling frequency offset, especially for a high data-rate OFDM system. The first technique estimates the sampling frequency offset by using the phase difference between two received samples with a fixed amount of time interval, corresponding to the transmitted training symbol, under the assumption of perfect symbol and carrier offset synchronization. The second technique estimates the sampling frequency offset and carrier frequency offset jointly, when the two offsets exist together, by using two training symbols with different frequency components and using a sample algebraic calculation. The proposed estimation techniques for sampling frequency offset cause no time delay due to all time-domain processing, and have a good performance due to no ICI effect. The performances of the proposed techniques are demonstrated by various simulations.

  • PDF

A Study on the Rake Finger System Design for the System Performance Improvement in the Mobile Communications (시스템 효율향상을 위한 이동통신망 Rake Finger 시스템 설계에 관한 연구)

  • Lee Seon-Keun;Lim Soon-Ja
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.29 no.1A
    • /
    • pp.31-36
    • /
    • 2004
  • In this paper, we proposed the new structure of the Rake Finger using Walsh Switch, the shared accumulator, and the pipeline-FWHT algorithm for reducing the signal processing complexity resulting from the increase of the number of data correlator. The function simulation of the proposed architecture is performed by Synopsys tool and the timing simulation is performed by Compass tool. The number of computational operation in the proposed data correlators is 160 additions and the conventional ones is 512 additions when the number of walsh code N=4. As a result, it is reduced about 3.2 times other than the number of computational operation of the conventional ones. Also, the result shows that the data processing time of the proposed Rake Finger architecture is 90,496[ns] and the conventional ones is 110,696[ns]. It is $18.3\%$ faster than the data processing time of the conventional Rake Finger architecture.

VLSI Design of a 2048 Point FFT/IFFT by Sequential Data Processing for Digital Audio Broadcasting System (순차적 데이터 처리방식을 이용한 디지틀 오디오 방송용 2048 Point FFT/IFFT의 VLSI 설계)

  • Choe, Jun-Rim
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.39 no.5
    • /
    • pp.65-73
    • /
    • 2002
  • In this paper, we propose and verify an implementation method for a single-chip 2048 complex point FFT/IFFT in terms of sequential data processing. For the sequential processing of 2048 complex data, buffers to store the input data are necessary. Therefore, DRAM-like pipelined commutator architecture is used as a buffer. The proposed structure brings about the 60% chip size reduction compared with conventional approach by using this design method. The 16-point FFT is a basic building block of the entire FFT chip, and the 2048-point FFT consists of the cascaded blocks with five stages of radix-4 and one stage of radix-2. Since each stage requires rounding of the resulting bits while maintaining the proper S/N ratio, the convergent block floating point (CBFP) algorithm is used for the effective internal bit rounding and their method contributed to a single chip design of digital audio broadcasting system.

A Design and Implementation of the Division/square-Root for a Redundant Floating Point Binary Number using High-Speed Quotient Selector (고속 지수 선택기를 이용한 여분 부동 소수점 이진수의 제산/스퀘어-루트 설계 및 구현)

  • 김종섭;조상복
    • Journal of the Institute of Electronics Engineers of Korea TE
    • /
    • v.37 no.5
    • /
    • pp.7-16
    • /
    • 2000
  • This paper described a design and implementation of the division/square-root for a redundant floating point binary number using high-speed quotient selector. This division/square-root used the method of a redundant binary addition with 25MHz clock speed. The addition of two numbers can be performed in a constant time independent of the word length since carry propagation can be eliminated. We have developed a 16-bit VLSI circuit for division and square-root operations used extensively in each iterative step. It performed the division and square-toot by a redundant binary addition to the shifted binary number every 16 cycles. Also the circuit uses the nonrestoring method to obtain a quotient. The quotient selection logic used a leading three digits of partial remainders in order to be implemented in a simple circuit. As a result, the performance of the proposed scheme is further enhanced in the speed of operation process by applying new quotient selection addition logic which can be parallelly process the quotient decision field. It showed the speed-up of 13% faster than previously presented schemes used the same algorithms.

  • PDF