Search | Korea Science

Memory Latency Hiding Techniques (메모리 지연을 감추는 기법들)

Ki, An-Do
- Electronics and Telecommunications Trends
- /
- v.13 no.3 s.51
- /
- pp.61-70
- /
- 1998
The obvious way to make a computer system more powerful is to make the processor as fast as possible. Furthermore, adopting a large number of such fast processors would be the next step. This multiprocessor system could be useful only if it distributes workload uniformly and if its processors are fully utilized. To achieve a higher processor utilization, memory access latency must be reduced as much as possible and even more the remaining latency must be hidden. The actual latency can be reduced by using fast logic and the effective latency can be reduced by using cache. This article discusses what the memory latency problem is, how serious it is by presenting analytical and simulation results, and existing techniques for coping with it; such as write-buffer, relaxed consistency model, multi-threading, data locality optimization, data forwarding, and data prefetching.
https://doi.org/10.22648/ETRI.1998.J.130305 인용 PDF

A new direct-mapped cache with fully associative buffer for low power consumption by using bank-selection mechanism (저 전력을 위한 뱅크 선택 메커니즘과 새로운 동작 메커니즘을 이용한 직접사상 캐쉬 및 버퍼 시스템)

이종성;이정훈;김신덕
- Proceedings of the Korean Information Science Society Conference
- /
- 2003.10a
- /
- pp.223-225
- /
- 2003
본 논문은 서로 다른 두 구조의 캐쉬와 새로운 뱅크선별기를 이용하여, 보다 효율적인 뱅크관리 메커니즘을 응용한 새로운 개념의 캐쉬 구조에 대한 설명을 한다. 크기가 작음에도 불구하고, 낮은 접근 실패율(Miss ratio)와 높은 저전력 효과는 기존의 일반적인 직접사상 캐쉬와 비교했을 때, 성능면에서 월등한 차이를 나타내고 있다. 이러한 결과의 원인은 직접사상 캐쉬와 완전연관 버퍼의 최적화된 구성과. 효과적인 뱅크선별기를 사용하여 적은 전력에도 높은 성능을 발휘하는 새로운 메커니즘을 사용하였기 때문이다. 제안한 구조의 성능은 다양한 크기의 직접사상 캐쉬와 비교하였으며, 접근 실패율, 평균 메모리 접근 시간, 전력소비, Energy * Delay Product 등 모두 4가지의 지표를 사용하였다.
PDF

Reconsidering Performance Measurement when Non-Volatile RAM is used in the Buffer Cache (차세대 비휘발성 메모리가 추가된 버퍼캐쉬에서 성능 측정 방법의 재조명)

Lee Kyuhyung;Choi Jongmoo;Lee Donghee;Noh SamH.
- Proceedings of the Korean Information Science Society Conference
- /
- 2005.07a
- /
- pp.793-795
- /
- 2005
영속적인 데이터 저장이 가능한 차세대 비휘발성 메모리를 휘발성 메모리와 혼용하여 버퍼캐처로 사용하면, 안정성과 성능향상의 효과를 얻을 수 있다. 본 연구에서는 기존의 연구에서 제시한 캐처관리 정책을 시뮬레이터를 이용하여 실험하고 실험 결과를 분석하여 비휘발성 메모리가 추가된 캐처의 새로운 특성을 밝혀냈다. 비휘발성 메모리가 캐쉬에 포함되면 읽기 쓰기의 요청의 종류, 미스(miss)되었을 경우 캐쉬될 블록의 더티(dirty)여부, 읽기 요청이 적중(hit)되었을 때, 적중된 블록의 메모리 종류에 따라 각각의 요청을 처리하기 위한 디스크 접근횟수가 달라지는 특성을 나타낸다. 이 특성 때문에 비휘발성 메모리가 추가된 버퍼캐처는 적중률(hit rate) 보다는 디스크 접근횟수를 측정하는 것이 정확한 성능측정을 가능하게 한다.
PDF

A Enhanced Set-Associative Page Cache Scheme using Pollute Buffer (오염 버퍼를 적용한 집합 연상 페이지 캐시 기법)

An, Deukhyeon;Kim, Jeehong;Eom, Young Ik
- Proceedings of the Korea Information Processing Society Conference
- /
- 2012.11a
- /
- pp.241-242
- /
- 2012
큰 데이터 트래픽을 일으키는 I/O 작업을 수행할 경우에 많은 디스크 접근과 데이터 처리가 발생하며 이는 컴퓨팅 성능의 하락을 일으킨다. 이를 위해 메모리와 디스크 사이에 버퍼 역할을 하는 페이지 캐시 기법이 사용된다. 그러나 LRU 를 사용하는 페이지 캐시의 특성상, 많은 양의 데이터가 한번만 접근되고 다시 사용되지 않는다면 성능상의 큰 효과가 없다. 본 논문에서는 집합 연상 페이지 캐시에 오염 버퍼를 둠으로써, 재사용되지 못하고 페이지 캐시의 크기만 커지는 현상을 최소화시켜 I/O 성능을 개선시킬 수 있는 방법을 제안한다.
https://doi.org/10.3745/PKIPS.y2012m11a.241 인용 PDF

NVM-based Write Amplification Reduction to Avoid Performance Fluctuation of Flash Storage (플래시 스토리지의 성능 지연 방지를 위한 비휘발성램 기반 쓰기 증폭 감소 기법)

Lee, Eunji;Jeong, Minseong;Bahn, Hyokyung
- The Journal of the Institute of Internet, Broadcasting and Communication
- /
- v.16 no.4
- /
- pp.15-20
- /
- 2016
Write amplification is a critical factor that limits the stable performance of flash-based storage systems. To reduce write amplification, this paper presents a new technique that cooperatively manages data in flash storage and nonvolatile memory (NVM). Our scheme basically considers NVM as the cache of flash storage, but allows the original data in flash storage to be invalidated if there is a cached copy in NVM, which can temporarily serve as the original data. This scheme eliminates the copy-out operation for a substantial number of cached data, thereby enhancing garbage collection efficiency. Experimental results show that the proposed scheme reduces the copy-out overhead of garbage collection by 51.4% and decreases the standard deviation of response time by 35.4% on average.
https://doi.org/10.7236/JIIBC.2016.16.4.15 인용 PDF KSCI

A Design of Fractional Motion Estimation Engine with 4×4 Block Unit of Interpolator & SAD Tree for 8K UHD H.264/AVC Encoder (8K UHD(7680×4320) H.264/AVC 부호화기를 위한 4×4블럭단위 보간 필터 및 SAD트리 기반 부화소 움직임 추정 엔진 설계)

Lee, Kyung-Ho;Kong, Jin-Hyeung
- Journal of the Institute of Electronics and Information Engineers
- /
- v.50 no.6
- /
- pp.145-155
- /
- 2013
In this paper, we proposed a $4{\times}4$ block parallel architecture of interpolation for high-performance H.264/AVC Fractional Motion Estimation in 8K UHD($7680{\times}4320$) video real time processing. To improve throughput, we design $4{\times}4$ block parallel interpolation. For supplying the $10{\times}10$ reference data for interpolation, we design 2D cache buffer which consists of the $10{\times}10$ memory arrays. We minimize redundant storage of the reference pixel by applying the Search Area Stripe Reuse scheme(SASR), and implement high-speed plane interpolator with 3-stage pipeline(Horizontal Vertical 1/2 interpolation, Diagonal 1/2 interpolation, 1/4 interpolation). The proposed architecture was simulated in 0.13um standard cell library. The gate count is 436.5Kgates. The proposed H.264/AVC Fractional Motion Estimation can support 8K UHD at 30 frames per second by running at 187MHz.
https://doi.org/10.5573/ieek.2013.50.6.145 인용 PDF KSCI

A File System for User Special Functions using Speed-based Prefetch in Embedded Multimedia Systems (임베디드 멀티미디어 재생기에서 속도기반 미리읽기를 이용한 사용자기능 지원 파일시스템)

Choe, Tae-Young;Yoon, Hyeon-Ju
- Journal of KIISE:Computing Practices and Letters
- /
- v.14 no.7
- /
- pp.625-635
- /
- 2008
Portable multimedia players have some different properties compared to general multimedia file server. Some of those properties are single user ownership, relatively low hardware performance, I/O burst by user special functions, and short software development cycles. Though suitable for processing multiple user requests at a time, the general multimedia file systems are not efficient for special user functions such as fast forwards/backwards. Soml' methods has been proposed to improve the performance and functionality, which the application programs give prediction hints to the file system. Unfortunately, they require the modification of all applications and recompilation. In this paper, we present a file system that efficiently supports user special functions in embedded multimedia systems using file block allocation, buffer-cache, and prefetch. A prefetch algorithm, SPRA (SPeed-based PRefetch Algorithm) predicts the next block using I/O patterns instead of hints from applications and it is resident in the file system, so doesn't affect application development process. From the experimental file system implementation and comparison with Linux readahead-based algorithms, the proposed system shows $4.29%{\sim}52.63%$ turnaround time and 1.01 to 3,09 times throughput in average.
PDF KSCI

A Design of 4×4 Block Parallel Interpolation Motion Compensation Architecture for 4K UHD H.264/AVC Decoder (4K UHD급 H.264/AVC 복호화기를 위한 4×4 블록 병렬 보간 움직임보상기 아키텍처 설계)

Lee, Kyung-Ho;Kong, Jin-Hyeung
- Journal of the Institute of Electronics and Information Engineers
- /
- v.50 no.5
- /
- pp.102-111
- /
- 2013
In this paper, we proposed a $4{\times}4$ block parallel architecture of interpolation for high-performance H.264/AVC Motion Compensation in 4K UHD($3840{\times}2160$) video real time processing. To improve throughput, we design $4{\times}4$ block parallel interpolation. For supplying the $9{\times}9$ reference data for interpolation, we design 2D cache buffer which consists of the $9{\times}9$ memory arrays. We minimize redundant storage of the reference pixel by applying the Search Area Stripe Reuse scheme(SASR), and implement high-speed plane interpolator with 3-stage pipeline(Horizontal Vertical 1/2 interpolation, Diagonal 1/2 interpolation, 1/4 interpolation). The proposed architecture was simulated in 0.13um standard cell library. The maximum operation frequency is 150MHz. The gate count is 161Kgates. The proposed H.264/AVC Motion Compensation can support 4K UHD at 72 frames per second by running at 150MHz.
https://doi.org/10.5573/ieek.2013.50.5.102 인용 PDF KSCI

Implementation and Evaluation of Proxy Caching Mechanisms with Video Qualify Adjustment

Sasabe, Masahiro;Taniguchi, Yoshiaki;Wakamiya, Naoki;Murata, Masayuki;Miyahara, Hideo
- Proceedings of the IEEK Conference
- /
- 2002.07a
- /
- pp.121-124
- /
- 2002
The proxy mechanism widely used in WWW systems offers low-delay data delivery by means of "proxy server". By applying the proxy mechanisms to the video streaming system, we expect that high-quality and low-delay video distribution can be accomplished without introducing extra load on the system. In addition, it is effective to adapt the quality of cached video data appropriately in the proxy if user requests are diverse due to heterogeneity in terms of the available bandwidth, end-system performance, and user′s preferences on the perceived video quality. We have proposed proxy caching mechanisms to accomplish the high-quality and highly-interactive video streaming services. In our proposed system, a video stream is divided into blocks for efficient use of the cache buffer. The proxy server is assumed to be able to adjust the quality of a cached or retrieved video block to the request through video filters. In this paper, to verify the practicality of our mechanisms, we implemented them on a real system and conducted experiments. Through evaluations from several performance aspects, it was shown that our proposed mechanisms can provide users with a low-latency and high-quality video streaming service in a heterogeneous environment.
PDF

A New File System for Multimedia Data Stream (멀티미디어 데이터 스트림을 위한 파일 시스템의 설계 및 구현)

Lee, Minsuk;Song, Jin-Seok
- IEMEK Journal of Embedded Systems and Applications
- /
- v.1 no.2
- /
- pp.90-103
- /
- 2006
There are many file systems in various operating systems. Those are usually designed for server environments, where the common cases are usually 'multiple active users', 'great many small files' And they assume a big main memory to be used as buffer cache. So the existing file systems are not suitable for resource hungry embedded systems that process multimedia data streams. In this study, we designed and implemented a new file system which efficiently stores and retrieves multimedia data steams. The proposed file system has a very simple disk layout, which guarantees a quick disk initialization and file system recovery. And we introduced a new indexing-scheme, called the time-based indexing scheme, with the file system. With the indexing scheme, the file system maintains the relation between time and the location for all the multimedia streams. The scheme is useful in searching and playing the compressed multimedia streams by locating exact frame position with given time, resulting in reduction of CPU processing and power consumption. The proposed file system and its APIs utilizing the time-based indexing schemes were implemented firstly on a Linux environment, though it is operating system independent. In the performance evaluation on a real DVR system, which measured the execution time of multi-threaded reading and writing, we found the proposed file system is maximum 38.7% faster than EXT2 file system.
PDF

Search Result 131, Processing Time 0.032 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)