• Title/Summary/Keyword: data cache

A Real-time Single-Pass Visibility Culling Method Based on a 3D Graphics Accelerator Architecture (실시간 단일 패스 가시성 선별 기법 기반의 3차원 그래픽스 가속기 구조)

  • Choo, Catherine;Choi, Moon-Hee;Kim, Shin-Dug
    • The KIPS Transactions:PartA / v.15A no.1 / pp.1-8 / 2008
  • Occlusion culling, a form of visibility culling, excludes objects or triangles that are hidden behind other objects. Because it reduces the amount of computation, it is an effective way to handle complex scenes in real time. However, a common existing approach, the hardware occlusion query method, sends each object's data to the GPU twice, once for the occlusion test and once for rendering, which causes processing overhead. Another existing hardware occlusion culling method, VCBP, tests object visibility quickly, but it neither tests bounding volumes nor returns the test result to the application stage. In this paper, we propose a single-pass occlusion culling method that exploits temporal and spatial coherence, together with a hardware architecture that supports it efficiently. In our approach, the hardware performs the occlusion test rapidly using a cache in the rasterization stage, where triangles are converted into fragments. At the same time, the hardware sends each primitive's visibility information back to the application stage. The application stage then reduces the amount of data transmitted by excluding occluded objects, using the previous frame's visibility information and a hierarchical spatial tree. The proposed method improves performance by 14% to 44% over the S&W method, which is based on hardware occlusion queries. Compared with the maximum and minimum performance of the CHC method, which is also an occlusion culling method, performance improves by 25% and 17%, respectively.

The Efficient Merge Operation in Log Buffer-Based Flash Translation Layer for Enhanced Random Writing (임의쓰기 성능향상을 위한 로그블록 기반 FTL의 효율적인 합병연산)

  • Lee, Jun-Hyuk;Roh, Hong-Chan;Park, Sang-Hyun
    • The KIPS Transactions:PartD / v.19D no.2 / pp.161-186 / 2012
  • Flash memory capacity has been increasing steadily while its price has been falling, which has made high-capacity SSDs (Solid State Drives) popular. Flash memory, however, has a number of drawbacks. To compensate for them, a dedicated layer called the FTL (Flash Translation Layer) is needed. The FTL handles the hardware's restrictions efficiently by translating the logical sector numbers used by file systems into physical sector numbers in flash memory. Among these restrictions, erase-before-write is a major cause of poor performance, and although many log block-based schemes have been studied, several problems remain when they are applied to high-capacity flash memory. When FAST, a log block-based FTL, receives random writes spread over a wide locality, merge operations are triggered even though sectors in the data blocks remain unused. In other words, ineffective block thrashing occurs and flash memory performance degrades. When writes to a log block are overwrites, the log block acts like a cache, which improves flash memory performance. To improve random write performance, this study operates the log block not only like a cache but also like the entire flash memory, using a separate mapping table called the offset mapping table, so that merge and erase operations are reduced. We call the new FTL XAST (extensively-Associative Sector Translation). XAST manages the offset mapping table efficiently by exploiting spatial and temporal locality.
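The abstract's observation that a log block behaves like a cache for overwrites can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not XAST or FAST: the class name `ToyLogBlockFTL`, the single shared log buffer, and the page-count capacity are all hypothetical, and real erase and block-level behaviour is reduced to a merge counter.

```python
class ToyLogBlockFTL:
    """Toy model: overwrites are appended to a log buffer; a full log forces a merge."""

    def __init__(self, log_capacity: int):
        self.data = {}            # logical sector -> value held in data blocks
        self.log = {}             # logical sector -> newest value held in the log buffer
        self.log_capacity = log_capacity
        self.log_pages_used = 0   # flash pages are write-once, so every overwrite uses a page
        self.merges = 0           # each merge stands in for costly copy-and-erase work

    def write(self, sector: int, value: str) -> None:
        if sector not in self.data:
            # First write to this sector: a clean page is available in the data block.
            self.data[sector] = value
            return
        if self.log_pages_used == self.log_capacity:
            self.merge()
        # Overwrite: cannot update in place (erase-before-write), so append to the log.
        self.log[sector] = value
        self.log_pages_used += 1

    def merge(self) -> None:
        # Fold the newest copies back into the data blocks and reclaim the log.
        self.data.update(self.log)
        self.log.clear()
        self.log_pages_used = 0
        self.merges += 1

    def read(self, sector: int):
        # The log holds the newest copy, so it is consulted first, like a cache.
        return self.log.get(sector, self.data.get(sector))
```

In this model, a larger log buffer or fewer forced merges directly lowers the `merges` counter, which is the cost the abstract says the offset mapping table is meant to reduce.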

A Kernel Module to Support High-Performance Intra-Node Communication for Multi-Core Systems (멀티 코어 시스템을 위한 고속 노드내 통신 지원 모듈)

  • Jin, Hyun-Wook;Kang, Hyun-Goo;Kim, Jong-Soon
    • Journal of KIISE:Computer Systems and Theory / v.34 no.9 / pp.407-415 / 2007
  • In parallel cluster computing systems, the efficiency of communication between computing nodes is one of the important factors that determine overall system performance. Accordingly, many researchers have studied high-performance inter-node communication. The recent introduction of multi-core processors, however, increases the importance of intra-node communication as well, because the more cores a node has, the more parallel processes run on that node. Although intra-node communication has been studied before, those studies give limited consideration to state-of-the-art systems. In this paper, we propose a Linux kernel module that minimizes the number of data copies by exploiting the memory mapping mechanism for high-performance intra-node communication. The proposed kernel module supports Linux kernel version 2.6. Performance measurements on a multi-core system show that the proposed kernel module achieves up to 62% lower latency and up to 144% higher throughput than an existing kernel module approach. In addition, the measurements reveal that intra-node communication performance can vary significantly depending on whether the cores running the communicating processes belong to the same processor package (i.e., share the L2 cache).

Service Worker Technology and Standardization (서비스워커 기술 및 표준화 동향)

  • Hwang, Hyun-seo;Kim, Sung-hyun;Jung, Yong-jin;Park, Jong-geun;Kim, Tae-yong;Kim, Tae-hwan;Moon, Il-young
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2015.05a / pp.656-659 / 2015
  • With Service Worker, a new browser standard developed by Google and Mozilla, users are expected to be able to use their favorite websites offline. Google's aim in developing the standard is for websites to always be able to respond to user requests. Service Worker gives a website space in the user's browser for offline work, where it stores various documents and serves the necessary resources. Because this greatly reduces data exchange between the browser and the server, web pages load faster. It not only allows a web application to be used offline like a native app, but is also a disruptive technology in that it further strengthens an existing characteristic of web applications: they run without installation. The Service Worker specification can greatly improve the user experience of web applications and is an innovative technology that shows how the web is evolving as a platform. Service Worker is not included in the final HTML5 standard, and its standardization is still in progress. This paper considers what can be expected when Service Worker technology is applied to web browsers.

A Development of Fusion Processor Architecture for Efficient Main Memory Access in CPU-GPU Environment (CPU-GPU환경에서 효율적인 메인메모리 접근을 위한 융합 프로세서 구조 개발)

  • Park, Hyun-Moon;Kwon, Jin-San;Hwang, Tae-Ho;Kim, Dong-Sun
    • The Journal of the Korea institute of electronic communication sciences / v.11 no.2 / pp.151-158 / 2016
  • The HSA resolves an old problem with existing CPU and GPU architectures by allowing both units to directly access each other's memory pools via unified virtual memory. In a physically realized system, however, frequent data exchanges between the CPU and GPU for a virtual memory block result in bottlenecks and coherence request overheads. In this paper, we propose a Fusion Processor Architecture for efficient access to main memory from both the CPU and the GPU. It consists of a Job Manager, a Re-mapper, and a Pre-fetcher, which control, organize, and distribute workloads and working areas for the GPU cores. These components help reduce memory exchanges between the two processors and improve overall efficiency by eliminating faulty page table requests. To verify the proposed architecture, we develop an emulator based on QEMU and compare several architectures such as CUDA (Compute Unified Device Architecture), OpenMP, and OpenCL. The results show that the proposed fusion processor architecture is 198% faster than the others because it removes unnecessary memory copies and cache-miss overheads.

Effective Prioritized HRW Mapping in Heterogeneous Web Server Cluster (이질적 웹 서버 클러스터 환경에서 효율적인 우선순위 가중치 맵핑)

  • 김진영;김성천
    • Journal of KIISE:Computer Systems and Theory / v.30 no.12 / pp.708-713 / 2003
  • For many years, clustered heterogeneous web server architectures have been deployed on the internet because of the explosive growth of internet services and the varied quality requirements of requests. The critical point in a cluster environment is the scheme that maps requests to servers, and this has recently become a central issue in internet architecture. Previous mapping methods aimed to assign equal loads to the servers in a cluster based on the number of requests. However, the recent growth of diverse services makes it hard to rely on simple load balancing to meet latency requirements. As a result, so-called "content-based" mapping, which maps requests according to the requested content in order to decrease response time and increase cache hit rates across all servers, has recently become highly valued. This paper proposes Prioritized Highest Random Weight mapping (PHRW mapping), which improves content-based mapping to fit heterogeneous environments. This mapping scheme, which assigns requests to servers according to priority, is very effective in heterogeneous web server clusters, and especially effective at decreasing the latency of reactive data services that have a latency limit. Through an algorithm description and simulation, this paper shows that the proposed PHRW mapping achieves higher performance by decreasing latency.

A Dynamic Transaction Routing Algorithm with Primary Copy Authority (주사본 권한을 이용한 동적 트랜잭션 분배 알고리즘)

  • Kim, Ki-Hyung;Cho, Hang-Rae;Nam, Young-Hwan
    • The KIPS Transactions:PartD / v.10D no.7 / pp.1067-1076 / 2003
  • A database sharing system (DSS) is a system for high-performance transaction processing. In a DSS, the processing nodes are locally coupled via a high-speed network and share a common database at the disk level. Each node has its own local memory and a separate copy of the operating system. To reduce the number of disk accesses, each node caches database pages in its local memory buffer. In this paper, we propose a dynamic transaction routing algorithm that balances the load of the nodes in the DSS. The proposed algorithm is novel in that it supports node-specific locality of reference by utilizing the primary copy authority assigned to each node; hence, it achieves better cache hit ratios and thus fewer disk I/Os. Furthermore, the proposed algorithm prevents any specific node from being overloaded by considering the current workload of each node. To evaluate the performance of the proposed algorithm, we develop a simulation model of the DSS and analyze the simulation results. The results show that the proposed algorithm outperforms existing algorithms in transaction processing rate. In particular, it performs better when the number of concurrently executing transactions is high and the data page access patterns of the transactions are not evenly distributed.
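The two goals named in the abstract, routing by primary copy authority and avoiding overloaded nodes, can be sketched as a simple affinity-then-load rule. This is an illustrative assumption, not the paper's algorithm: the function `route`, the `owner_of` table, the `load` counts, and the `max_load` threshold are all hypothetical names introduced here.

```python
from collections import Counter
from typing import Iterable, Mapping


def route(pages: Iterable[int],
          owner_of: Mapping[int, str],
          load: Mapping[str, int],
          max_load: int) -> str:
    """Pick a node for a transaction that will access the given pages."""
    # Count how many of the transaction's pages each node holds primary copy authority for.
    votes = Counter(owner_of[p] for p in pages)
    # Prefer the node owning the most pages (better cache locality);
    # break ties in favour of the less loaded node.
    ranked = sorted(votes, key=lambda node: (-votes[node], load[node]))
    for node in ranked:
        if load[node] < max_load:
            return node
    # Every node with affinity is saturated: fall back to the least loaded node overall.
    return min(load, key=lambda node: load[node])
```

For example, a transaction touching pages mostly owned by node A goes to A while A is below the threshold, and is diverted once A's current workload reaches `max_load`.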