• Title/Summary/Keyword: Direct Memory Access

Advanced Victim Cache with Processor Reuse Information (프로세서의 재사용 정보를 이용하는 개선된 고성능 희생 캐쉬)

  • Kwak Jong Wook;Lee Hyunbae;Jhang Seong Tae;Jhon Chu Shik
    • Journal of KIISE:Computer Systems and Theory / v.31 no.12 / pp.704-715 / 2004
  • Recent single-processor and multiprocessor systems use a hierarchical memory structure to narrow the gap between processor clock rate and memory access time. In particular, a cache memory system includes two or three levels of caches to reduce this gap. One of the most important factors in a hierarchical memory system is the hit rate of the level 1 cache, because the level 1 cache interfaces directly with the processor; a high level 1 hit rate is therefore critical to system performance. A victim cache, another high-level cache, also assists the level 1 cache by reducing conflict misses in the high-level cache. In this paper, we propose an advanced high-level cache management scheme based on processor reuse information. The technique is a cache replacement policy that uses the frequency of the processor's memory accesses and keeps frequently accessed cache blocks resident longer than less frequently accessed ones. We simulate this policy with Augmint, an event-driven simulator, and analyze the results. The simulation results show that the modified processor-reuse-information scheme (LIVMR) outperforms a level 1 cache with a simple victim cache (LIV) by up to 6.7% and by 0.5% on average, and that the performance benefit grows as the number of processors increases.
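
A minimal Python sketch of the idea, assuming a direct-mapped L1 backed by a small fully associative victim cache in which every block carries a processor access counter. When the victim cache is full it evicts the block with the lowest count rather than simply the oldest one. The class name, the counting rule, and the tie-breaking are illustrative assumptions, not the LIVMR policy as published.

```python
from __future__ import annotations

from collections import OrderedDict


class L1WithVictimCache:
    """Direct-mapped L1 plus a frequency-aware, fully associative victim cache.

    Assumption: each block keeps a counter of processor accesses, and the
    victim cache evicts the block with the lowest count (oldest wins ties).
    This is an illustrative policy, not the published LIVMR algorithm.
    """

    def __init__(self, l1_sets: int, victim_entries: int) -> None:
        self.l1_sets = l1_sets
        self.victim_entries = victim_entries
        self.l1: list[tuple[int, int] | None] = [None] * l1_sets  # (block, count)
        self.victim: OrderedDict[int, int] = OrderedDict()  # block -> count, oldest first
        self.l1_hits = self.victim_hits = self.misses = 0

    def _insert_victim(self, block: int, count: int) -> None:
        if len(self.victim) >= self.victim_entries:
            # min() returns the first (oldest) block among equal lowest counts.
            coldest = min(self.victim, key=self.victim.__getitem__)
            del self.victim[coldest]
        self.victim[block] = count

    def access(self, block: int) -> str:
        index = block % self.l1_sets
        line = self.l1[index]
        if line is not None and line[0] == block:
            self.l1[index] = (block, line[1] + 1)  # record one more reuse
            self.l1_hits += 1
            return "l1"
        if block in self.victim:
            count = self.victim.pop(block) + 1  # swap back into L1, keep history
            self.victim_hits += 1
            result = "victim"
        else:
            count = 1
            self.misses += 1
            result = "miss"
        # The displaced L1 block moves to the victim cache together with its count.
        if line is not None:
            self._insert_victim(*line)
        self.l1[index] = (block, count)
        return result
```

Blocks that the processor reused often survive in the victim cache, while blocks touched once are the first to be dropped.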

Energy-Performance Efficient 2-Level Data Cache Architecture for Embedded System (내장형 시스템을 위한 에너지-성능 측면에서 효율적인 2-레벨 데이터 캐쉬 구조의 설계)

  • Lee, Jong-Min;Kim, Soon-Tae
    • Journal of KIISE:Computer Systems and Theory / v.37 no.5 / pp.292-303 / 2010
  • On-chip cache memories play an important role in both performance and energy consumption in resource-constrained embedded systems because they filter out many off-chip memory accesses. We propose a 2-level data cache architecture with a low energy-delay product tailored to embedded systems. The L1 data cache is small and direct-mapped and uses a write-through policy, whereas the L2 data cache is set-associative and uses a write-back policy. Consequently, the L1 data cache is accessed in one cycle and provides high cache bandwidth, while the L2 data cache effectively reduces the global miss rate. To reduce the penalty of the high miss rate caused by the small L1 cache and the power consumed by address generation, we propose an ECP (Early Cache hit Predictor) scheme, which predicts whether the L1 cache holds the requested data using both fast address generation and L1 cache hit prediction. To reduce the high energy cost of accessing the L2 data cache caused by heavy write-through traffic from the write buffer placed between the two cache levels, we propose a one-way write scheme. In simulation-based experiments using a cycle-accurate simulator and embedded benchmarks, the proposed 2-level data cache architecture shows average improvements of 3.6% in overall system performance and 50% in data cache energy consumption.
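
The write policies described above can be modeled in a few lines. This Python sketch assumes a direct-mapped write-through L1 that does not allocate on writes, over a set-associative write-back L2 with LRU replacement, and counts L2 accesses and off-chip transfers. It does not model the write buffer, the ECP, or the one-way write scheme, and all names are hypothetical.

```python
from __future__ import annotations

from collections import OrderedDict


class TwoLevelDataCache:
    """Direct-mapped write-through L1 over a set-associative write-back L2.

    Assumptions: L1 does not allocate on writes; L2 allocates on reads and
    writes and uses LRU. Write buffer, ECP and one-way write are not modeled.
    """

    def __init__(self, l1_sets: int, l2_sets: int, l2_ways: int) -> None:
        self.l1: list[int | None] = [None] * l1_sets  # block held by each L1 set
        # Each L2 set maps block -> dirty flag, least recently used first.
        self.l2: list[OrderedDict[int, bool]] = [OrderedDict() for _ in range(l2_sets)]
        self.l2_ways = l2_ways
        self.stats = {"l1_hit": 0, "l2_access": 0, "mem_read": 0, "mem_write": 0}

    def _l2_access(self, block: int, write: bool) -> None:
        self.stats["l2_access"] += 1
        ways = self.l2[block % len(self.l2)]
        if block in ways:
            ways.move_to_end(block)  # mark as most recently used
            ways[block] = ways[block] or write
            return
        self.stats["mem_read"] += 1  # fetch the block from off-chip memory
        if len(ways) >= self.l2_ways:
            _, dirty = ways.popitem(last=False)  # evict the LRU block
            if dirty:
                self.stats["mem_write"] += 1  # write-back happens only on eviction
        ways[block] = write

    def read(self, block: int) -> None:
        index = block % len(self.l1)
        if self.l1[index] == block:
            self.stats["l1_hit"] += 1
            return
        self._l2_access(block, write=False)
        self.l1[index] = block

    def write(self, block: int) -> None:
        index = block % len(self.l1)
        if self.l1[index] == block:
            self.stats["l1_hit"] += 1  # update the L1 copy in place
        # Write-through: every store is also sent to L2, hit or miss in L1.
        self._l2_access(block, write=True)
```

Every store reaches L2 even when it hits in L1, which is exactly the write-through traffic that the paper's one-way write scheme targets; off-chip writes, by contrast, occur only when a dirty L2 block is evicted.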

Performance Analysis of NVMe SSDs and Design of Direct Access Engine on Virtualized Environment (가상화 환경에서 NVMe SSD 성능 분석 및 직접 접근 엔진 개발)

  • Kim, Sewoog;Choi, Jongmoo
    • KIISE Transactions on Computing Practices / v.24 no.3 / pp.129-137 / 2018
  • An NVMe (Non-Volatile Memory Express) SSD (Solid State Drive) is high-performance storage that uses flash memory as its storage cells, PCIe as its interface, and NVMe as the protocol over that interface. It supports multiple I/O queues, which makes it possible to process I/Os in parallel on multi-core systems and to deliver higher bandwidth than SATA SSDs. Hence, NVMe SSDs are considered next-generation storage for data centers and cloud computing systems. In virtualized systems, however, the performance of NVMe SSDs is not fully utilized because of bottlenecks in the software I/O stack. In particular, when I/O passes through the I/O stack of the hypervisor or host operating system, as in Xen and KVM, performance degrades seriously because the I/O stack is doubled between the host and the virtual machine. In this paper, we propose a new I/O engine, called the Direct-AIO (Direct Asynchronous I/O) engine, that accesses NVMe SSDs directly to improve I/O performance on the QEMU emulator. We implement the proposed engine and analyze the I/O performance differences between the existing I/O engine and the Direct-AIO engine.
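
The multiple I/O queues mentioned above are what let each core submit I/O independently. The following Python toy models one NVMe-style queue pair (a circular submission queue plus a completion list) per core. It sketches the queueing idea only, not the Direct-AIO engine, QEMU, or a real driver, and the class and method names are hypothetical.

```python
class QueuePair:
    """Toy NVMe-style I/O queue pair (illustrative, not a driver).

    The submission queue is a circular buffer: the host writes commands at
    `tail` and the device consumes them at `head`. Following the NVMe
    convention, a queue of `size` slots is full when advancing the tail would
    make it equal the head, so at most size - 1 commands are outstanding.
    """

    def __init__(self, size: int) -> None:
        self.size = size
        self.slots = [None] * size
        self.head = 0
        self.tail = 0
        self.completions = []

    def submit(self, command) -> bool:
        next_tail = (self.tail + 1) % self.size
        if next_tail == self.head:
            return False  # submission queue full
        self.slots[self.tail] = command
        self.tail = next_tail  # a real host would now ring the tail doorbell
        return True

    def process(self) -> int:
        """Device side: consume every pending command and post a completion."""
        handled = 0
        while self.head != self.tail:
            self.completions.append(("success", self.slots[self.head]))
            self.head = (self.head + 1) % self.size
            handled += 1
        return handled


# One queue pair per core: cores never contend for a shared queue or lock.
queues = [QueuePair(size=4) for _ in range(2)]
for core, qp in enumerate(queues):
    qp.submit(("read", core, 0))
```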

Direct Mapping of the Executable Code in Single-tier Memory Operating System using SCM (SCM을 이용한 단일계층 메모리 운영체제에서의 실행 코드 직접 매핑 기법)

  • Park, Jong Woo;Jung, Seung Wan;Yoon, Jun young;Seo, Dae-Wha
    • Proceedings of the Korea Information Processing Society Conference / 2013.11a / pp.81-82 / 2013
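  • Research is actively under way on operating system techniques that use SCM (Storage Class Memory), which is byte-addressable and non-volatile, both as the working space of processes and as file storage. Building on how files are stored in such a system, this paper proposes a technique that, when a process is created, maps the executable code of an executable file, which is read-only by nature, directly into the process address space.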
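
As a loose analogy on a conventional system, the sketch below maps a file's bytes read-only into the process address space with Python's `mmap` module instead of copying them into a buffer. It assumes an ordinary temporary file standing in for an executable on SCM; the helper names and placeholder bytes are hypothetical, and the paper's kernel-level mechanism is not shown.

```python
import mmap
import os
import tempfile


def map_code_readonly(path: str) -> mmap.mmap:
    """Map a file's bytes read-only into the address space instead of copying them."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the mapping stays valid after f is closed.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def demo() -> tuple[bytes, bool]:
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\x90\x90\xc3")  # placeholder bytes standing in for machine code
        path = tmp.name
    code = map_code_readonly(path)
    try:
        first = code[:3]
        try:
            code[0] = 0  # rejected: the mapping is read-only
            write_rejected = False
        except TypeError:
            write_rejected = True
    finally:
        code.close()
        os.unlink(path)
    return first, write_rejected
```

Because the code pages are read-only, sharing the stored bytes directly is safe: no process can modify them through the mapping.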

Development of the Mobile AddressBook Web Services (모바일 주소록 웹서비스 개발)

  • Ku Yong-Mo;Lee Eun-Jung
    • Proceedings of the Korea Information Processing Society Conference / 2006.05a / pp.443-446 / 2006
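  • As wireless Internet environments have recently become widespread, web services technology, a platform- and language-independent communication method, has also been drawing attention in mobile environments. However, although new technologies for mobile web services are being introduced, there are still few real-world cases of web services in mobile environments. This paper presents the design and implementation of an address book web service system that supports an integrated wired and wireless environment. The address book web service consists of a PIMS address book web server, which provides web services over personal address book data, and mobile and online client applications that use those web services. The wired and wireless client applications were designed with different access functions, confirming the practicality of mobile web service clients. To implement the address book web service, we used Axis, Apache's web services project, and the Axis MIRAE platform, a J2ME implementation of Axis. In particular, to cope with the limited memory capacity of mobile devices, the mobile web service client does not use auto-generated stub code; instead, it constructs and accesses SOAP messages directly through a parser API.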
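
The stub-free approach can be illustrated with Python's `xml.etree.ElementTree`: the client assembles the SOAP 1.1 envelope element by element and reads replies by walking the parsed tree. This is an analogy only; the original client used the Axis MIRAE parser API on J2ME, and the operation name `getEntry` and the service namespace below are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:addressbook"  # hypothetical service namespace

ET.register_namespace("soapenv", SOAP_ENV)


def build_request(operation: str, **params: str) -> bytes:
    """Assemble a SOAP request element by element, with no generated stub classes."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, name).text = value
    return ET.tostring(envelope, encoding="utf-8")


def read_field(message: bytes, field: str) -> Optional[str]:
    """Pull one unqualified element's text out of the SOAP Body."""
    body = ET.fromstring(message).find(f"{{{SOAP_ENV}}}Body")
    if body is None:
        return None
    element = body.find(f".//{field}")
    return None if element is None else element.text


request = build_request("getEntry", name="Kim")
```

Only the elements the client actually needs are created or inspected, which is the memory saving the abstract attributes to skipping generated stubs.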

Merging Memory Address Space and Block Device using Byte-Addressable NV-RAM (파일 시스템 마운트 단계의 제거: NV-RAM을 이용한 메모리 영역과 파일 시스템 영역의 융합)

  • Shin, Hyung-Jong;Kim, Eun-Ki;Jeon, Byung-Gil;Won, You-Jip
    • Proceedings of the Korean Information Science Society Conference / 2007.10b / pp.296-301 / 2007
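  • This paper addresses mount latency, a chronic problem of NAND flash devices, by using byte-addressable non-volatile storage. To use a NAND flash device, the metadata scattered across the entire device must be scanned at mount time to build, in main memory, the usage and configuration data structures of the file system partition. With large NAND flash devices this process takes a very long time, which makes NAND flash devices difficult to adopt in practice. This paper exploits the byte addressability of next-generation non-volatile storage. By storing the final data structures produced at NAND flash mount time directly in NVRAM, we completely eliminate the step of scanning the NAND flash metadata. That is, once the in-memory data structures of the metadata needed to mount the NAND flash device are stored in NVRAM, that information persists there, so mounting the NAND flash device requires only a simple, fast operation on the order of memory pointer mapping. We successfully developed a merged file system in which the non-volatile memory device is mapped simultaneously into the block device and the memory address space. Mount time measurements confirmed that, compared with YAFFS, an efficient existing NAND flash file system, the mount time is very small and constant regardless of the partition size or the number of files in the partition.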
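
A rough Python sketch of the difference between the two mount paths, assuming a byte array standing in for NVRAM and a hypothetical fixed-size metadata record per flash block. The scan path decodes every record; the NVRAM path simply reinterprets a table stored earlier, in the spirit of the "memory pointer mapping" mount described above. The record layout and function names are assumptions, and YAFFS internals are not modeled.

```python
from __future__ import annotations

import struct

# Hypothetical per-block metadata record: (file_id, logical_page), native byte
# order with standard 4-byte fields so it lines up with memoryview's "I" format
# on common platforms.
RECORD = struct.Struct("=II")


def scan_mount(spare_areas: list[bytes]) -> list[tuple[int, int]]:
    """Conventional mount: read and decode the metadata of every flash block."""
    return [RECORD.unpack(spare) for spare in spare_areas]


def persist_table(table: list[tuple[int, int]], nvram: bytearray) -> None:
    """Store the already-built in-memory table in byte-addressable NVRAM once."""
    for i, (file_id, page) in enumerate(table):
        RECORD.pack_into(nvram, i * RECORD.size, file_id, page)


def nvram_mount(nvram: bytearray, entries: int) -> memoryview:
    """NVRAM mount: reinterpret the stored table in place; nothing is scanned or copied."""
    return memoryview(nvram)[: entries * RECORD.size].cast("I")


spares = [RECORD.pack(7, page) for page in range(4)]
nvram = bytearray(64)  # stand-in for an NVRAM region
persist_table(scan_mount(spares), nvram)  # done once, before later mounts
table = nvram_mount(nvram, 4)  # entry i is (table[2 * i], table[2 * i + 1])
```

The cost of `scan_mount` grows with the number of blocks, whereas `nvram_mount` does not decode any records, which mirrors the constant mount time reported in the abstract.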

WWCLOCK: Page Replacement Algorithm Considering Asymmetric I/O Cost of Flash Memory (WWCLOCK: 플래시 메모리의 비대칭적 입출력 비용을 고려한 페이지 교체 알고리즘)

  • Park, Jun-Seok;Lee, Eun-Ji;Seo, Hyun-Min;Koh, Kern
    • Journal of KIISE:Computing Practices and Letters / v.15 no.12 / pp.913-917 / 2009
  • Flash memories have asymmetric read and write I/O costs in terms of latency and energy consumption, and the ratio of these costs depends on the type of storage. Moreover, it is becoming more common for a system to use two flash memories, an internal memory and an external memory card. For this reason, buffer cache replacement algorithms should consider the I/O costs of the device as well as the probability of reference. This paper presents the WWCLOCK (Write-Weighted CLOCK) algorithm, which directly uses the I/O costs of devices, together with the recency and frequency of cache blocks, to select a victim to evict from the buffer cache. WWCLOCK can be used with a wide range of storage devices with different I/O costs and in systems that use two or more memory devices at the same time. In addition, its time and space complexity is low, comparable to that of the CLOCK algorithm. Trace-driven simulations show that the proposed algorithm reduces total I/O time by 36.2% on average compared with LRU.
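
A minimal cost-aware CLOCK sketch in Python. It assumes each frame keeps a small counter that a reference raises by 1 for a clean page and by a larger weight, derived from the write/read cost ratio, for a dirty page, capped at `cap`; the clock hand decrements counters and evicts the first frame it finds at zero. This is an illustrative variant written for this summary, not the published WWCLOCK rules, and all parameters are hypothetical.

```python
from __future__ import annotations


class WriteWeightedClock:
    """Illustrative cost-aware CLOCK; not the published WWCLOCK algorithm."""

    def __init__(self, frames: int, read_cost: float, write_cost: float, cap: int = 4) -> None:
        self.frames = frames
        self.read_cost = read_cost
        self.write_cost = write_cost
        # Dirty pages earn more credit per reference when writes are dearer.
        self.dirty_weight = max(1, round(write_cost / read_cost))
        self.cap = cap
        self.slots: list[list] = []  # each slot: [page, counter, dirty]
        self.where: dict[int, int] = {}  # page -> slot index
        self.hand = 0
        self.io_cost = 0.0

    def _weight(self, dirty: bool) -> int:
        return self.dirty_weight if dirty else 1

    def access(self, page: int, write: bool = False) -> bool:
        """Reference a page; return True on a buffer hit."""
        slot = self.where.get(page)
        if slot is not None:
            entry = self.slots[slot]
            entry[2] = entry[2] or write
            # Frequency: repeated references accumulate credit up to the cap.
            entry[1] = min(entry[1] + self._weight(entry[2]), self.cap)
            return True
        self.io_cost += self.read_cost  # miss: read the page from the device
        new_entry = [page, min(self._weight(write), self.cap), write]
        if len(self.slots) < self.frames:
            self.where[page] = len(self.slots)
            self.slots.append(new_entry)
            return False
        # Recency: the hand drains credit until some frame reaches zero.
        while self.slots[self.hand][1] > 0:
            self.slots[self.hand][1] -= 1
            self.hand = (self.hand + 1) % self.frames
        victim = self.slots[self.hand]
        if victim[2]:
            self.io_cost += self.write_cost  # dirty victim must be written back
        del self.where[victim[0]]
        self.slots[self.hand] = new_entry
        self.where[page] = self.hand
        self.hand = (self.hand + 1) % self.frames
        return False
```

With equal read and write costs the class behaves like a counter-based CLOCK; raising `write_cost` lets dirty pages survive more sweeps, trading occasional extra reads of clean pages for fewer expensive write-backs.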

A Reconfigurable Parallel Processor for Efficient Processing of Mobile Multimedia (모바일 멀티미디어의 효율적 처리를 위한 재구성형 병렬 프로세서의 구조)

  • Yoo, Se-Hoon;Kim, Ki-Chul;Yang, Yil-Suk;Roh, Tae-Moon
    • Journal of the Institute of Electronics Engineers of Korea SD / v.44 no.10 / pp.23-32 / 2007
  • This paper proposes a reconfigurable parallel processor architecture that can efficiently implement various multimedia applications, such as 3D graphics, H.264/H.263/MPEG-4, JPEG/JPEG2000, and MP3. The proposed architecture connects memories directly to the processors, which reduces memory access time and power consumption. It supports the floating-point operations needed in the geometry stage of 3D graphics, adopts partitioned SIMD to reduce hardware cost, and uses conditional execution of instructions to ease the development of parallel algorithms.
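
Partitioned SIMD treats one wide datapath as several narrow lanes by preventing carries from crossing lane boundaries. The Python sketch below emulates this in software (the SWAR technique) for four 8-bit lanes in a 32-bit word, plus a mask-based select as a stand-in for conditional execution. It is an analogy, not the processor's instruction set, and the function names are hypothetical.

```python
LANE_LOW7 = 0x7F7F7F7F  # low 7 bits of each 8-bit lane
LANE_TOP = 0x80808080  # top bit of each 8-bit lane
WORD = 0xFFFFFFFF  # keep results to 32 bits


def padd8(a: int, b: int) -> int:
    """Add four packed 8-bit lanes, wrapping within each lane."""
    # Adding only the low 7 bits can never carry into the neighbouring lane.
    low = (a & LANE_LOW7) + (b & LANE_LOW7)
    # Each lane's top bit is a7 XOR b7 XOR (carry out of bit 6).
    return (low ^ ((a ^ b) & LANE_TOP)) & WORD


def select(mask: int, x: int, y: int) -> int:
    """Predicated execution: take x's lanes where mask is 0xFF, y's where it is 0x00."""
    return ((mask & x) | (~mask & y)) & WORD
```

For example, `padd8(0x01FF7F80, 0x01010101)` yields `0x02008081`: the `0xFF` lane wraps to `0x00` without disturbing the lanes beside it.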

Design of a DMA Controller for Augmented Reality in Embedded System (증강현실을 위한 임베디드 시스템의 DMA 컨트롤러 설계)

  • Jang, Su Yeon;Oh, Jung Hwan;Yoon, Young Hyun;Lee, Seong Mo;Lee, Seung Eun
    • Journal of the Korea Institute of Information and Communication Engineering / v.23 no.7 / pp.822-828 / 2019
  • Augmented Reality (AR) combines virtual information with a real environment, and the processor must access memory to run an AR system. However, as technology advances increase the size of the data, the processor's workload becomes heavy, so a dedicated module is needed to reduce this workload and overcome the limitation. In this paper, we propose a Direct Memory Access (DMA) controller that displays images in place of the processor. We implemented the proposed DMA controller on a Field Programmable Gate Array (FPGA) and demonstrated its functionality using an Avalon Memory-Mapped (Avalon-MM) interface. In addition, the DMA controller was fabricated in Magnachip/Hynix 0.35 µm CMOS technology, which verified its feasibility for embedded systems.
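
A behavioral Python sketch of the offloading idea: the processor programs a source, destination, and length, and the controller then moves data in bursts on its own until it raises a done flag. The register names, burst size, and byte-array memory are hypothetical; neither the Avalon-MM protocol nor the authors' design is modeled.

```python
class DmaController:
    """Behavioral model of a memory-to-memory DMA engine (hypothetical registers)."""

    def __init__(self, memory: bytearray, burst: int = 4) -> None:
        self.memory = memory
        self.burst = burst  # bytes moved per bus cycle
        self.src = self.dst = self.length = 0
        self.busy = False
        self.done = False

    def start(self, src: int, dst: int, length: int) -> None:
        """The processor's only job: program the registers and start the transfer."""
        self.src, self.dst, self.length = src, dst, length
        self.busy, self.done = True, False

    def tick(self) -> None:
        """One controller cycle: copy up to one burst without the processor."""
        if not self.busy:
            return
        n = min(self.burst, self.length)
        self.memory[self.dst:self.dst + n] = self.memory[self.src:self.src + n]
        self.src += n
        self.dst += n
        self.length -= n
        if self.length == 0:
            self.busy, self.done = False, True


memory = bytearray(32)
memory[0:10] = bytes(range(1, 11))  # source image data
dma = DmaController(memory)
dma.start(src=0, dst=16, length=10)  # e.g. a display buffer at offset 16
cycles = 0
while not dma.done:
    dma.tick()  # the processor is free to do other work during these cycles
    cycles += 1
```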

An Energy-Delay Efficient System with Adaptive Victim Caches (선택적 희생 캐쉬를 이용한 저전력 고성능 시스템 설계 방안)

  • Kim Cheol Hong;Shim Sunghoon;Jhon Chu Shik;Jhang Seong Tae
    • Journal of KIISE:Computer Systems and Theory / v.32 no.11_12 / pp.663-674 / 2005
  • We propose a system that achieves high energy-delay efficiency by using adaptive victim caches. In particular, we investigate methods to improve the hit rate in the first level of the memory hierarchy, which reduces the number of accesses to more power-consuming memory structures such as the L2 cache. A victim cache is a memory element that reduces conflict misses in a direct-mapped L1 cache. We present two techniques that fill the victim cache with blocks that have a higher probability of being re-requested by the processor. The hit-based victim cache is filled with blocks that were referenced frequently by the processor. The replacement-based victim cache is filled with blocks evicted from sets in which block replacements occurred frequently. According to our simulations, the replacement-based victim cache scheme outperforms the conventional victim cache scheme by about 2% on average and reduces power consumption by up to 8%.
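
The replacement-based filling rule can be sketched in Python as follows, assuming a per-set replacement counter and a fixed threshold: a block evicted from L1 enters the LRU victim cache only if its set has already seen `threshold` replacements, and is otherwise simply dropped to L2. The threshold value, the never-reset counters, and all names are assumptions; the hit-based variant is not shown.

```python
from __future__ import annotations

from collections import OrderedDict


class ReplacementBasedVictimCache:
    """Direct-mapped L1 with a victim cache fed only by conflict-heavy sets."""

    def __init__(self, l1_sets: int, victim_entries: int, threshold: int = 2) -> None:
        self.l1: list[int | None] = [None] * l1_sets
        self.replacements = [0] * l1_sets  # per-set count of L1 replacements
        self.victim: OrderedDict[int, None] = OrderedDict()  # LRU order, oldest first
        self.victim_entries = victim_entries
        self.threshold = threshold

    def access(self, block: int) -> str:
        index = block % len(self.l1)
        if self.l1[index] == block:
            return "l1"
        result = "miss"
        if block in self.victim:
            del self.victim[block]  # swap the block back into L1
            result = "victim"
        evicted = self.l1[index]
        if evicted is not None:
            self.replacements[index] += 1
            # Only blocks from frequently replaced sets are worth keeping.
            if self.replacements[index] >= self.threshold:
                if len(self.victim) >= self.victim_entries:
                    self.victim.popitem(last=False)  # drop the LRU entry
                self.victim[evicted] = None
        self.l1[index] = block
        return result
```

Filtering keeps blocks from rarely conflicting sets out of the victim cache, so its few entries are reserved for the ping-ponging blocks that are most likely to be re-requested.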