• Title/Summary/Keyword: Level 2 Cache

Search Result 68, Processing Time 0.023 seconds

Performance evaluation and analysis of TILE-Gx36 many-core processor with PARSEC benchmark (PARSEC을 이용한 TILE-Gx36 다중코어 프로세서의 성능 평가 및 분석)

  • Lee, Boseon;Kim, Han-Yee;Yu, Heonchang;Suh, Taeweon
    • The Journal of Korean Association of Computer Education
    • /
    • v.17 no.1
    • /
    • pp.107-115
    • /
    • 2014
  • This paper evaluates and analyzes the performance of TILE-Gx36(Gx36), a many-core processor. The PARSEC parallel benchmark suite was used to measure the performance, and Core i7 (i7) and Atom are used for the performance comparison. When experimented with the maximum number of threads that can be executed concurrently on each machine, Gx36 showed a 2.73${\times}$ inferior performance to Core i7 and a 1.93${\times}$ superior performance to Atom. Gx36 has the largest Last Level Cache(LLC) among the compared processors. Nevertheless, it reported the biggest number of LLC misses, which, we strongly believe, is the major culprit for lower performance than expected. Our study suggests that the DDC employed in Gx36 is not a favorable cache structure for the general-purpose high-performance computing. The actual measurement with off-the-shelf machine provides non-biased data for polishing the future many-core architecture.

  • PDF

An Address Translation Technique Large NAND Flash Memory using Page Level Mapping (페이지 단위 매핑 기반 대용량 NAND플래시를 위한 주소변환기법)

  • Seo, Hyun-Min;Kwon, Oh-Hoon;Park, Jun-Seok;Koh, Kern
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.16 no.3
    • /
    • pp.371-375
    • /
    • 2010
  • SSD is a storage medium based on NAND Flash memory. Because of its short latency, low power consumption, and resistance to shock, it's not only used in PC but also in server computers. Most SSDs use FTL to overcome the erase-before-overwrite characteristic of NAND flash. There are several types of FTL, but page mapped FTL shows better performance than others. But its usefulness is limited because of its large memory footprint for the mapping table. For example, 64MB memory space is required only for the mapping table for a 64GB MLC SSD. In this paper, we propose a novel caching scheme for the mapping table. By using the mapping-table-meta-data we construct a fully associative cache, and translate the address within O(1) time. The simulation results show more than 80 hit ratio with 32KB cache and 90% with 512KB cache. The overall memory footprint was only 1.9% of 64MB. The time overhead of cache miss was measured lower than 2% for most workload.

A Study on the Performance of Prefetching Web Cache Proxy (Prefetch하는 웹 캐쉬 프록시의 성능에 대한 연구)

  • 백윤철
    • Journal of the Korea Computer Industry Society
    • /
    • v.2 no.11
    • /
    • pp.1453-1464
    • /
    • 2001
  • Explosive growth of Internet populations results performance degradations of web service. Popular web sites cannot provide proper level of services to many requests, and such poor services cannot give user a satisfaction. Web caching, the remedy of this problem, reduces the amount of network traffic and gives fast response to user. In this paper, we analyze the characteristics of web cache traffics using traces of NLANR(National Laboratory for Applied Network Research) root caches and Education network cache in Seoul National University. Based on this analysis, we suggest a method of prefetching and we discuss the gains of our prefetching. As a result, we find proposed prefetching enhances hit rate up to 3%, response time up to 5%.

  • PDF

Measurement and Analysis on Operating System Supports for Web Proxy Cache (웹 프록시 캐쉬에 대한 운영체제 지원 성능의 측정과 분석)

  • Baek, Yun-Cheol;Chu, Yeon-Jun
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.8 no.4
    • /
    • pp.450-456
    • /
    • 2002
  • Open source web service softwares are usually running on UNIX-based operating systems. Choosing an operating system can affect web system performance. In this paper, we describe a source - code-level time measurement tool and measure a service time of each system call that Squid - open source web proxy cache - makes. We run Squid 2.4.STABLEl on Linux 2.4.2 and Solaris 8, and we compare the measured time. As a result, Linux 2.4.2 performs better than Solaris 8. It can be used as a guide for selecting system software on building web service using open source software, and also be used as a basis for enhancing the operating system performance of supporting web service software.

An On-chip Multiprocessor Miroprocessor with Shared MMU and Cache

  • Lee, Yong-Hwan;Jeong, Woo-Kyeong;An, Sang-Jun;Lee, Yong-Surk
    • Journal of Electrical Engineering and information Science
    • /
    • v.2 no.4
    • /
    • pp.1-7
    • /
    • 1997
  • A multiprocessor microprocessor named SMPC(scaleable multiprocessor chip) that contains tow IU (integer unit) is presented in this paper. It can execute multiple instructions from several tasks exploiting task-level parallelism that is free from instruction dependencies, and provide high performance and throughput on both single program and multiprogramming environments. the IU is a 32-bit scalar processor expecially designed to boost up the performance of string manipulations which are frequently used in RDBMS(relational data base management system) applications. A memory management unit and a data cache shared by two IUs improve the performance and reduce the chip area required. ETH SMPC is implemented in VLSI circuit by custom design and automated design tools.

  • PDF

Designing a RAID 5 Controller with Two-Level Disk Cache (2단계 디스크 캐시를 이용한 RAID 5 제어기 설계)

  • 허정호;장태무
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2003.04a
    • /
    • pp.25-27
    • /
    • 2003
  • RAID 시스템에서 디스크 캐시는 시스템 성능 향상에 중요한 요소 중 하나이다. 2단계 캐시는 1단계 캐시에 비해 우수한 성능을 보이고 시간적, 공간적 지역성에도 효율적이다. 제안된 캐시 시스템은 2 단계로 구성되어 1단계 캐시는 작은 블록 크기로 구성되어 세트 연관 사상 방식을 이용하고 2단계 캐시는 큰 블록 크기로 구성되어 전 연관 사상 방식을 사용한다. 본 논문에서는 특히 대용량 디스크 캐시에서 디스크입출력 시간을 향상시키고 효율적으로 일관성을 유지할 수 있는 디스크 제어기 상에 위치하는 RAID 5 디스크 캐시의 모델을 제시하여 적중률을 향상시켜 수행속도를 개선시키고자 한다.

  • PDF

A Performance Improvement of Linux TCP/IP Stack based on Flow-Level Parallelism in a Multi-Core System (멀티코어 시스템에서 흐름 수준 병렬처리에 기반한 리눅스 TCP/IP 스택의 성능 개선)

  • Kwon, Hui-Ung;Jung, Hyung-Jin;Kwak, Hu-Keun;Kim, Young-Jong;Chung, Kyu-Sik
    • The KIPS Transactions:PartA
    • /
    • v.16A no.2
    • /
    • pp.113-124
    • /
    • 2009
  • With increasing multicore system, much effort has been put on the performance improvement of its application. Because multicore system has multiple processing devices in one system, its processing power increases compared to the single core system. However in many cases the advantages of multicore can not be exploited fully because the existing software and hardware were designed to be suitable for single core. When the existing software runs on multicore, its performance improvement is limited by the bottleneck of sharing resources and the inefficient use of cache memory on multicore. Therefore, according as the number of core increases, it doesn't show performance improvement and shows performance drop in the worst case. In this paper we propose a method of performance improvement of multicore system by applying Flow-Level Parallelism to the existing TCP/IP network application and operating system. The proposed method sets up the execution environment so that each core unit operates independently as much as possible in network application, TCP/IP stack on operating system, device driver, and network interface. Moreover it distributes network traffics to each core unit through L2 switch. The proposed method allows to minimize the sharing of application data, data structure, socket, device driver, and network interface between each core. Also it allows to minimize the competition among cores to take resources and increase the hit ratio of cache. We implemented the proposed methods with 8 core system and performed experiment. Experimental results show that network access speed and bandwidth increase linearly according to the number of core.

Analysis on the Performance Impact of Partitioned LLC for Heterogeneous Multicore Processors (이종 멀티코어 프로세서에서 분할된 공유 LLC가 성능에 미치는 영향 분석)

  • Moon, Min Goo;Kim, Cheol Hong
    • The Journal of Korean Institute of Next Generation Computing
    • /
    • v.15 no.2
    • /
    • pp.39-49
    • /
    • 2019
  • Recently, CPU-GPU integrated heterogeneous multicore processors have been widely used for improving the performance of computing systems. Heterogeneous multicore processors integrate CPUs and GPUs on a single chip where CPUs and GPUs share the LLC(Last Level Cache). This causes a serious cache contention problem inside the processor, resulting in significant performance degradation. In this paper, we propose the partitioned LLC architecture to solve the cache contention problem in heterogeneous multicore processors. We analyze the performance impact varying the LLC size of CPUs and GPUs, respectively. According to our simulation results, the bigger the LLC size of the CPU, the CPU performance improves by up to 21%. However, the GPU shows negligible performance difference when the assigned LLC size increases. In other words, the GPU is less likely to lose the performance when the LLC size decreases. Because the performance degradation due to the LLC size reduction in GPU is much smaller than the performance improvement due to the increase of the LLC size of the CPU, the overall performance of heterogeneous multicore processors is expected to be improved by applying partitioned LLC to CPUs and GPUs. In addition, if we develop a memory management technique that can maximize the performance of each core in the future, we can greatly improve the performance of heterogeneous multicore processors.

DJFS: Providing Highly Reliable and High-Performance File System with Small-Sized NVRAM

  • Kim, Junghoon;Lee, Minho;Song, Yongju;Eom, Young Ik
    • ETRI Journal
    • /
    • v.39 no.6
    • /
    • pp.820-831
    • /
    • 2017
  • File systems and applications try to implement their own update protocols to guarantee data consistency, which is one of the most crucial aspects of computing systems. However, we found that the storage devices are substantially under-utilized when preserving data consistency because they generate massive storage write traffic with many disk cache flush operations and force-unit-access (FUA) commands. In this paper, we present DJFS (Delta-Journaling File System) that provides both a high level of performance and data consistency for different applications. We made three technical contributions to achieve our goal. First, to remove all storage accesses with disk cache flush operations and FUA commands, DJFS uses small-sized NVRAM for a file system journal. Second, to reduce the access latency and space requirements of NVRAM, DJFS attempts to journal compress the differences in the modified blocks. Finally, to relieve explicit checkpointing overhead, DJFS aggressively reflects the checkpoint transactions to file system area in the unit of the specified region. Our evaluation on TPC-C SQLite benchmark shows that, using our novel optimization schemes, DJFS outperforms Ext4 by up to 64.2 times with only 128 MB of NVRAM.

Enhancing Location Privacy through P2P Network and Caching in Anonymizer

  • Liu, Peiqian;Xie, Shangchen;Shen, Zihao;Wang, Hui
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.5
    • /
    • pp.1653-1670
    • /
    • 2022
  • The fear that location privacy may be compromised greatly hinders the development of location-based service. Accordingly, some schemes based on the distributed architecture in peer-to-peer network for location privacy protection are proposed. Most of them assume that mobile terminals are mutually trusted, but this does not conform to realistic scenes, and they cannot make requirements for the level of location privacy protection. Therefore, this paper proposes a scheme for location attribute-based security authentication and private sharing data group, so that they trust each other in peer-to-peer network and the trusted but curious mobile terminal cannot access the initiator's query request. A new identifier is designed to allow mobile terminals to customize the protection strength. In addition, the caching mechanism is introduced considering the cache capacity, and a cache replacement policy based on deep reinforcement learning is proposed to reduce communications with location-based service server for achieving location privacy protection. Experiments show the effectiveness and efficiency of the proposed scheme.