• Title/Summary/Keyword: Memory Buffer

Search Result 369, Processing Time 0.029 seconds

Power-Minimizing DVFS Algorithm for a Video Decoder with Buffer Constraints (영상 디코더의 제한된 버퍼를 고려한 전력 최소화 DVFS 방식)

  • Jeong, Seung-Ho;Ahn, Hee-June
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.36 no.9B
    • /
    • pp.1082-1091
    • /
    • 2011
  • Power-reduction techniques based on DVFS(Dynamic Voltage and Frequency Scaling) are crucial for lengthening operating times of battery powered mobile systems. This paper proposes an optimal DVFS scheduling algorithm for decoders with memory size limitation on display buffer, which is realistic constraints not properly touched in the previous works. Furthermore, we mathematically prove that the proposed algorithm is optimal in the limited display buffer and limited clock frequency model, and also can be used for feasibility check. Simulation results show the proposed algorithm outperformed the previous heuristic algorithms by 7% in average, and the performance of all algorithms using display buffers saturates at about 10 frame size.

Resolving Cycle Extension Overhead Multimedia Data Retrieval

  • Won, Youjip;Cho, Kyungsun
    • Transactions on Control, Automation and Systems Engineering
    • /
    • v.4 no.2
    • /
    • pp.164-168
    • /
    • 2002
  • In this article, we present the novel approach of avoiding temporal insufficiency of data blocks, jitter, which occurs due to the commencement of new session. We propose to make the sufficient amount of data blocks available on memory such that the ongoing session can survive the cycle extension. This technique is called ″pre-buffering″. We examine two different approaches in pre-buffering: (i) loads all required data blocks prior to starting playback and (ii) incrementally accumulates the data blocks in each cycle. We develop an elaborate model to determine the appropriate amount of data blocks necessary to survive the cycle extension and to compute startup latency involved in loading these data blocks. The simulation result shows that limiting the disk bandwidth utilization to 60% can greatly improve the startup latency as well as the buffer requirement for individual streams.

Dual Cache Architecture for Low Cost and High Performance

  • Lee, Jung-Hoon;Park, Gi-Ho;Kim, Shin-Dug
    • ETRI Journal
    • /
    • v.25 no.5
    • /
    • pp.275-287
    • /
    • 2003
  • We present a high performance cache structure with a hardware prefetching mechanism that enhances exploitation of spatial and temporal locality. Temporal locality is exploited by selectively moving small blocks into the direct-mapped cache after monitoring their activity in the spatial buffer. Spatial locality is enhanced by intelligently prefetching a neighboring block when a spatial buffer hit occurs. We show that the prefetch operation is highly accurate: over 90% of all prefetches generated are for blocks that are subsequently accessed. Our results show that the system enables the cache size to be reduced by a factor of four to eight relative to a conventional direct-mapped cache while maintaining similar performance.

  • PDF

A diffusion approximation for time-dependent queue size distribution for M/G/m/N system

  • Park, Bong-Dae;Shin, Yang-Woo
    • Journal of the Korean Mathematical Society
    • /
    • v.32 no.2
    • /
    • pp.211-236
    • /
    • 1995
  • The purpose of this paper is to provide a transient diffusion approximation of queue size distribution for M/G/m/N system. The M/G/m/N system can be expressed as follows. The interarrival times of customers are exponential and the service times of customers have general distribution. The system can hold at most a total of N customers (including the customers in service) and any further arriving customers will be refused entry to the system and will depart immediately without service. The queueing system with finite capacity is more practical model than queueing system with infinite capacity. For example, in the design of a computer system one of the important problems is how much capacity is required for a buffer memory. It its capacity is too little, then overflow of customers (jobs) occurs frequently in heavy traffic and the performance of system deteriorates rapidly. On the other hand, if its capacity is too large, then most buffer memories remain unused.

  • PDF

Analysis and solution to the phase concentration and DC-like component of correlation result in Daejeon correlator (대전 상관기의 상관 결과에 나타난 유사 DC 성분과 위상 집중 현상에 대한 원인 분석과 해결 방법)

  • Roh, Duk-Gyoo;Oh, Se-Jin;Yeom, Jae-Hwan;Oh, Chung-Sik;Jung, Jin-Seung;Chung, Dong-Kyu;Yun, Young-Joo;Oyama, Tomoaki;Ozeki, Kensuke;Onuki, Hirofumi
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.14 no.3
    • /
    • pp.191-204
    • /
    • 2013
  • In this paper, we investigated the correlation outputs of Daejeon correlator at the viewpoints of the buffer memory setting related to the fine delay tracking and the under/overflow issue in FFT modules, in order to eliminate DC-like component and phase concentration to 0 degree. As the ring buffer memory is being used for the fine delay tracking, the DC-like component in correlation outputs is generated by improper setting of data read/write address, and then that address setting method is modified to exclude a polluted FFT segment in correlation processing when crossing the port/stream boundary. The phase concentration to 0 degree at beginning of bandpass is caused by inadequate scaling factors, which may be the origins of under/overflow occurred at internal computation of FFT stage. With the revised method of the ring buffer memory setting and the scaling factors in FFT, we could obtain higher signal-to-noise ratio and flux density, compared to the previous method, through the correlation processing of true observational data.

Parallel Range Query processing on R-tree with Graphics Processing Units (GPU를 이용한 R-tree에서의 범위 질의의 병렬 처리)

  • Yu, Bo-Seon;Kim, Hyun-Duk;Choi, Won-Ik;Kwon, Dong-Seop
    • Journal of Korea Multimedia Society
    • /
    • v.14 no.5
    • /
    • pp.669-680
    • /
    • 2011
  • R-trees are widely used in various areas such as geographical information systems, CAD systems and spatial databases in order to efficiently index multi-dimensional data. As data sets used in these areas grow in size and complexity, however, range query operations on R-tree are needed to be further faster to meet the area-specific constraints. To address this problem, there have been various research efforts to develop strategies for acceleration query processing on R-tree by using the buffer mechanism or parallelizing the query processing on R-tree through multiple disks and processors. As a part of the strategies, approaches which parallelize query processing on R-tree through Graphics Processor Units(GPUs) have been explored. The use of GPUs may guarantee improved performances resulting from faster calculations and reduced disk accesses but may cause additional overhead costs caused by high memory access latencies and low data exchange rate between GPUs and the CPU. In this paper, to address the overhead problems and to adapt GPUs efficiently, we propose a novel approach which uses a GPU as a buffer to parallelize query processing on R-tree. The use of buffer algorithm can give improved performance by reducing the number of disk access and maximizing coalesced memory access resulting in minimizing GPU memory access latencies. Through the extensive performance studies, we observed that the proposed approach achieved up to 5 times higher query performance than the original CPU-based R-trees.

Multiple ASR for efficient defense against brute force attacks (무차별 공격에 효과적인 다중 Address Space Randomization 방어 기법)

  • Park, Soo-Hyun;Kim, Sun-Il
    • The KIPS Transactions:PartC
    • /
    • v.18C no.2
    • /
    • pp.89-96
    • /
    • 2011
  • ASR is an excellent program security technique that protects various data memory areas without run-time overhead. ASR hides the addresses of variables from attackers by reordering variables within a data memory area; however, it can be broken by brute force attacks because of a limited data memory space. In this paper, we propose Multiple ASR to overcome the limitation of previous ASR approaches. Multiple ASR separates a data memory area into original and duplicated areas, and compares variables in each memory area to detect an attack. In original and duplicated data memory areas variables are arranged in the opposite order. This makes it impossible to overwrite the same variables in the different data areas in a single attack. Although programs with Multiple ASR show a relatively high run-time overhead due to duplicated execution, programs with many I/O operations such as web servers, a favorite attack target, show 40~50% overhead. In this paper we develop and test a tool that transforms a program into one with Multiple ASR applied.

A Materials Approach to Resistive Switching Memory Oxides

  • Hasan, M.;Dong, R.;Lee, D.S.;Seong, D.J.;Choi, H.J.;Pyun, M.B.;Hwang, H.
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.8 no.1
    • /
    • pp.66-79
    • /
    • 2008
  • Several oxides have recently been reported to have resistance-switching characteristics for nonvolatile memory (NVM) applications. Both binary and ternary oxides demonstrated great potential as resistive-switching memory elements. However, the switching mechanisms have not yet been clearly understood, and the uniformity and reproducibility of devices have not been sufficient for gigabit-NVM applications. The primary requirements for oxides in memory applications are scalability, fast switching speed, good memory retention, a reasonable resistive window, and constant working voltage. In this paper, we discuss several materials that are resistive-switching elements and also focus on their switching mechanisms. We evaluated non-stoichiometric polycrystalline oxides ($Nb_2O_5$, and $ZrO_x$) and subsequently the resistive switching of $Cu_xO$ and heavily Cu-doped $MoO_x$ film for their compatibility with modem transistor-process cycles. Single-crystalline Nb-doped $SrTiO_3$ (NbSTO) was also investigated, and we found a Pt/single-crystal NbSTO Schottky junction had excellent memory characteristics. Epitaxial NbSTO film was grown on an Si substrate using conducting TiN as a buffer layer to introduce single-crystal NbSTO into the CMOS process and preserve its excellent electrical characteristics.

Memory data layout and DMA transfer technique research For efficient data transfer of CNN accelerator (CNN 가속기의 효율적인 데이터 전송을 위한 메모리 데이터 레이아웃 및 DMA 전송기법 연구)

  • Cho, Seok-Jae;Park, Sungkyung;Park, Chester Sungchung
    • Journal of IKEEE
    • /
    • v.24 no.2
    • /
    • pp.559-569
    • /
    • 2020
  • One of the deep-running algorithms, CNN's artificial intelligence application uses off-chip memory to store data on the Convolution Layer. DMA can reduce processor load at every data transfer. It can also reduce application performance degradation by varying the order in which data from the Convolution layer is transmitted to the global buffer of the accelerator. For basic layouts with continuous memory addresses, SG-DMA showed about 3.4 times performance improvement in pre-setting DMA compared to using ordinaly DMA, and for Ideal layouts with discontinuous memory addresses, the ordinal DMA was about 1396 cycles faster than SG-DMA. Experiments have shown that a combination of memory data layout and DMA can reduce the DMA preset load by about 86 percent.

Parallel Implementation of Radon Transform on TMS320C80-based System (TMS320C80시스템에서 Radon 변환의 병렬 구현)

  • 송정호;성효경최흥문
    • Proceedings of the IEEK Conference
    • /
    • 1998.10a
    • /
    • pp.727-730
    • /
    • 1998
  • In this paper, we propose an implementation of an efficient parallel Radon transform on TMS320C80-based system. For an N$\times$N SAR image, we can obtain O(NM/p) of the conventional parallel Radon transform, by representing the projection patterns in Radon space variables instead of the image space variables, and pipelining the algorithm, where p is the number of processors and M is the number of projection angles. Also, we can reduce the time for the dynamic load distribution among the nodes and the communication overheads of accessing the global memories, by pipelining the memory and processing operations by using tripple buffer structure. Experimental results show an efficient parallel Radon transform of speedup Sp=3.9 and efficiency E=97.5% for 256$\times$256 image, when implemented on TMS320C80 composed of four parallel slave processors with three memory blocks.

  • PDF