Search | Korea Science

Instructions and Data Prefetch Mechanism using Displacement History Buffer (변위 히스토리 버퍼를 이용한 명령어 및 데이터 프리페치 기법)

Jeong, Yong Su;Kim, JinHyuk;Cho, Tae Hwan;Choi, SangBang
- Journal of the Institute of Electronics and Information Engineers / v.52 no.10 / pp.82-94 / 2015
In this paper, we propose a hardware prefetch mechanism that generates a spatial region using a displacement field, combined with an efficient cache replacement policy that gives priority to the trigger block of each spatial region. Because the history is recorded relative to the trigger block, the mechanism can take the program's access sequence into account, and because the history is stored as displacement values, instruction or data addresses can be prefetched quickly by adding the stored displacement to the trigger address. We also propose a replacement policy that, after giving priority to trigger blocks, evicts at random from among the low-priority blocks when the cache is full. We evaluated the hardware prefetcher using the gem5 simulator and the PARSEC benchmark. Compared with an existing hardware prefetcher that generates the spatial region using a bit vector, the L1 data cache miss rate was reduced by about 44.5% on average and L1 instruction cache misses by 26.1% on average. In addition, IPC (instructions per cycle) improved by about 23.7% on average.
https://doi.org/10.5573/ieie.2015.52.10.082
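
The two ideas in this abstract, storing history as displacements from a trigger block and evicting at random among non-trigger blocks, can be illustrated with a minimal Python sketch. This is not the paper's hardware design: the 64-byte block size, the LRU-managed history table indexed by trigger block, and the names `DisplacementHistoryBuffer` and `choose_victim` are all assumptions made for illustration.

```python
import random
from collections import OrderedDict

BLOCK_SIZE = 64  # bytes per cache block (assumed value)


class DisplacementHistoryBuffer:
    """Hypothetical history table: trigger block -> ordered signed displacements."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.table = OrderedDict()

    def record(self, trigger_addr, accessed_addrs):
        """Store the blocks touched after a trigger as displacements, in access order."""
        trigger_block = trigger_addr // BLOCK_SIZE
        displacements = []
        for addr in accessed_addrs:
            d = addr // BLOCK_SIZE - trigger_block
            if d != 0 and d not in displacements:
                displacements.append(d)
        self.table[trigger_block] = displacements
        self.table.move_to_end(trigger_block)
        if len(self.table) > self.capacity:
            self.table.popitem(last=False)  # drop the oldest entry (assumed LRU policy)

    def prefetch_addresses(self, trigger_addr):
        """Rebuild prefetch addresses by adding each displacement to the trigger block."""
        trigger_block = trigger_addr // BLOCK_SIZE
        return [(trigger_block + d) * BLOCK_SIZE
                for d in self.table.get(trigger_block, [])]


def choose_victim(cache_blocks, rng):
    """Pick a random victim among non-trigger blocks; fall back to any block.

    cache_blocks maps block address -> True if the block is a trigger block.
    """
    low_priority = [addr for addr, is_trigger in cache_blocks.items() if not is_trigger]
    candidates = low_priority if low_priority else list(cache_blocks)
    return rng.choice(candidates)
```

For example, after `record(0x1000, [0x1040, 0x1100, 0x0FC0])` the table holds the displacements `[1, 4, -1]`, and a later access to `0x1000` yields the same three addresses in the recorded order, which a bit vector alone would not preserve.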

Efficient DRAM Buffer Access Scheduling Techniques for SSD Storage System (SSD 스토리지 시스템을 위한 효율적인 DRAM 버퍼 액세스 스케줄링 기법)

Park, Jun-Su;Hwang, Yong-Joong;Han, Tae-Hee
- Journal of the Institute of Electronics Engineers of Korea SD / v.48 no.7 / pp.48-56 / 2011
Recently, the SSD (Solid State Disk), a new storage device based on NAND flash memory, has been gradually replacing the HDD (Hard Disk Drive) in mobile devices, and a variety of research efforts are under way to find cost-effective ways to improve its performance. As the number of NAND flash channels is increased to raise bandwidth through parallel processing, the DRAM buffer, which acts as a buffer cache between the host (PC) and NAND flash, has become the bottleneck. To resolve this problem, this paper proposes an efficient low-cost scheme that increases SSD performance by improving DRAM buffer bandwidth through scheduling techniques that exploit DRAM multi-banks. When the host and the NAND flash channels request access to the DRAM buffer concurrently, the proposed technique checks their destinations and schedules the requests according to the properties of DRAM. It significantly reduces the overheads of bank active time and row latency and thus optimizes DRAM buffer bandwidth utilization. The results show that the proposed technique improves SSD performance by 47.4% for reads and 47.7% for writes compared to conventional methods, with negligible changes and increases in hardware.
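
The idea of checking each request's destination before scheduling can be sketched as follows. This is a simplified row-hit-first reordering in the style of first-ready, first-come-first-served scheduling, not the scheduler described in the paper. The cycle costs, the `Request` fields, and the function names are assumptions, and the cost model ignores overlap between banks.

```python
from collections import namedtuple

# source is "host" or "nand"; bank and row identify the DRAM destination.
Request = namedtuple("Request", ["source", "bank", "row"])

ROW_HIT_COST = 1   # assumed cycles when the requested row is already open
ROW_MISS_COST = 3  # assumed cycles to close the old row and activate a new one


def service_cost(order, num_banks):
    """Total cost of serving requests in the given order, one open row per bank."""
    open_row = [None] * num_banks
    cost = 0
    for req in order:
        if open_row[req.bank] == req.row:
            cost += ROW_HIT_COST
        else:
            cost += ROW_MISS_COST
            open_row[req.bank] = req.row
    return cost


def schedule_row_hits_first(pending, num_banks):
    """Serve the oldest request that hits an open row; otherwise the oldest request."""
    open_row = [None] * num_banks
    queue = list(pending)
    order = []
    while queue:
        pick = next((r for r in queue if open_row[r.bank] == r.row), queue[0])
        queue.remove(pick)
        open_row[pick.bank] = pick.row
        order.append(pick)
    return order
```

With host and NAND requests alternating between two rows of the same bank, arrival order pays the row-miss cost on every access, while the reordered schedule groups accesses to the same row and pays it only once per row.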

3D Texture-Based Volume Graphic Architecture using Visibility-Ordered Division Rendering Algorithm (가시 순차적 분할 렌더링 알고리즘을 이용한 3차원 텍스쳐 기반의 볼륨 그래픽 구조)

김정우;이원종;박우찬;김형래;한탁돈
- Proceedings of the Korean Information Science Society Conference / 2002.10c / pp.706-708 / 2002
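3D texture-based volume rendering has the advantage of low development cost because it requires no additional hardware, but it has the disadvantage of low performance because it uses general-purpose graphics hardware, which is optimized for polygon-based rendering, as is. In this paper, we propose a new 3D texture-based volume graphics architecture that applies the volume data partitioning technique, previously used in high-performance parallel volume rendering systems, to general-purpose graphics hardware. Processing the volume data in partitions with the proposed architecture overcomes the physical memory size limit of general-purpose graphics hardware. In addition, by performing alpha blending not at full resolution but at a small resolution equal to the area occupied by a single partitioned volume, the architecture reduces the data transfer between the rendering stage and the frame buffer to 1/30 and raises the pixel cache hit rate to nearly 99.9%.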

Comparison of performance between MariaDB and PostgreSQL in terms of CPU overhead (CPU 오버헤드 분석을 통한 MariaDB와 PostgreSQL 성능 비교)

Lee, Dong-Ho;Song, Min-Chang;Cho, Young-Tae;Kim, Seung-Won
- Proceedings of the Korea Information Processing Society Conference / 2018.05a / pp.297-299 / 2018
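Not only IT companies but a wide range of companies are turning technologies that demand large amounts of computing resources (CPU, RAM, etc.), such as big data, artificial intelligence, and blockchain, into services. Operating services efficiently with limited resources is therefore becoming a major issue. In this paper, we profile the open-source RDBMSs MariaDB and PostgreSQL and compare them in terms of CPU resource efficiency. The results demonstrate that, in an Internet service environment, MariaDB has a lower page cache reference rate than PostgreSQL owing to its buffer pool, incurs fewer page faults, and therefore has lower CPU overhead.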
https://doi.org/10.3745/PKIPS.y2018m05a.297

A Study on Optimizing LRU lock for Improving Parallel I/O Throughout in Manycore CPU Systems (매니코어 CPU 시스템에서의 병렬 I/O 성능 향상을 위한 LRU 최적화 기법 연구)

Byun, Eun-Kyu;Bang, Jiwoo;Gu, Gibeom;Oh, Kwang-Jin
- Proceedings of the Korea Information Processing Society Conference / 2022.11a / pp.2-4 / 2022
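Parallel I/O on manycore CPU systems has scalability problems because of the limitations of the LRU management method in current Linux systems. In this study, we propose an improved FinerLRU to solve this problem. It increases the number of LRU locks up to the number of cores and uses finer-grained lock management to improve the parallel I/O performance of file systems that use the buffer cache. We implemented the proposed method in Linux 5.18.11 and confirmed, through experiments on an Intel Knights Landing processor with 64 physical cores and 256 logical cores, that it can roughly double performance.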
https://doi.org/10.3745/PKIPS.y2022m11a.2
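
The general idea of replacing one LRU lock with many can be illustrated in user space with lock striping. This sketch is not FinerLRU and does not reflect the Linux kernel's data structures: the `ShardedLRU` class, the hash-based choice of shard, and the per-shard capacity are assumptions for illustration only.

```python
import threading
from collections import OrderedDict


class ShardedLRU:
    """Hypothetical LRU cache split into shards, each guarded by its own lock."""

    def __init__(self, num_shards, capacity_per_shard):
        self.shards = [OrderedDict() for _ in range(num_shards)]
        self.locks = [threading.Lock() for _ in range(num_shards)]
        self.capacity = capacity_per_shard

    def _index(self, key):
        # Threads touching keys in different shards never contend on the same lock.
        return hash(key) % len(self.shards)

    def get(self, key):
        i = self._index(key)
        with self.locks[i]:
            shard = self.shards[i]
            if key not in shard:
                return None
            shard.move_to_end(key)  # mark as most recently used
            return shard[key]

    def put(self, key, value):
        i = self._index(key)
        with self.locks[i]:
            shard = self.shards[i]
            shard[key] = value
            shard.move_to_end(key)
            if len(shard) > self.capacity:
                shard.popitem(last=False)  # evict this shard's least recently used entry
```

The trade-off in this sketch is that recency is tracked per shard rather than globally, so eviction order only approximates a single global LRU list in exchange for less lock contention.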

A File System for User Special Functions using Speed-based Prefetch in Embedded Multimedia Systems (임베디드 멀티미디어 재생기에서 속도기반 미리읽기를 이용한 사용자기능 지원 파일시스템)

Choe, Tae-Young;Yoon, Hyeon-Ju
- Journal of KIISE:Computing Practices and Letters / v.14 no.7 / pp.625-635 / 2008
Portable multimedia players have several properties that differ from those of general multimedia file servers: single-user ownership, relatively low hardware performance, I/O bursts caused by user special functions, and short software development cycles. Although suitable for processing multiple user requests at a time, general multimedia file systems are not efficient for user special functions such as fast forward and rewind. Some methods have been proposed to improve performance and functionality by having application programs give prediction hints to the file system. Unfortunately, they require all applications to be modified and recompiled. In this paper, we present a file system that efficiently supports user special functions in embedded multimedia systems using file block allocation, a buffer cache, and prefetching. A prefetch algorithm, SPRA (SPeed-based PRefetch Algorithm), predicts the next block using I/O patterns instead of hints from applications, and because it resides in the file system it does not affect the application development process. In an experimental file system implementation compared with Linux readahead-based algorithms, the proposed system shows 4.29% to 52.63% turnaround time and 1.01 to 3.09 times the throughput on average.
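
Predicting the next block from observed I/O patterns rather than application hints can be sketched as a simple stride detector. This is not the SPRA algorithm itself, whose details the abstract does not give. The function name `predict_next_blocks`, the three-access window, and the `depth` parameter are assumptions.

```python
def predict_next_blocks(recent_blocks, depth=2):
    """Return up to `depth` block numbers to prefetch, or [] if no steady speed is seen.

    recent_blocks is the list of block numbers read so far, oldest first.
    The "speed" is the constant step between the last three reads: 1 for normal
    playback, larger for fast forward, negative for rewind.
    """
    if len(recent_blocks) < 3:
        return []
    step = recent_blocks[-1] - recent_blocks[-2]
    previous_step = recent_blocks[-2] - recent_blocks[-3]
    if step == 0 or step != previous_step:
        return []  # no stable pattern, so do not prefetch
    last = recent_blocks[-1]
    predicted = [last + step * k for k in range(1, depth + 1)]
    return [block for block in predicted if block >= 0]
```

For instance, reads of blocks 10, 14, 18 (a fast-forward pattern) lead to prefetching blocks 22 and 26, whereas plain sequential readahead would fetch blocks 19 and 20 that playback then skips.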

A Transaction Level Simulator for Performance Analysis of Solid-State Disk (SSD) in PC Environment (PC향 SSD의 성능 분석을 위한 트랜잭션 수준 시뮬레이터)

Kim, Dong;Bang, Kwan-Hu;Ha, Seung-Hwan;Chung, Sung-Woo;Chung, Eui-Young
- Journal of the Institute of Electronics Engineers of Korea SD / v.45 no.12 / pp.57-64 / 2008
In this paper, we propose a system-level simulator for the performance analysis of a Solid-State Disk (SSD) in a PC environment using the TLM (Transaction Level Modeling) method. Our method provides quantitative analysis for a variety of architectural choices in the PC system as well as in the SSD. It also drastically reduces analysis time compared to the conventional RTL (Register Transfer Level) modeling method. To show the effectiveness of the proposed simulator, we performed several explorations of the PC architecture as well as the SSD. More specifically, we measured the performance impact of the hit rate of a cache buffer that temporarily stores data from the PC. We also used our simulator to analyze how SSD performance varies across NAND flash memories with different response times. These experimental results show that our simulator can be used effectively for architecture exploration of both the SSD and the PC.

An Optimization Technique in Memory System Performance for RealTime Embedded Systems (실시간 임베디드 시스템을 위한 메모리 시스템 성능 최적화 기법)

Yongin Kwon;Doosan Cho;Jongwon Lee;Yongjoo Kim;Jonghee Youn;Sanghyun Park;Yunheung Paek
- Proceedings of the Korea Information Processing Society Conference / 2008.11a / pp.882-884 / 2008
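Randomly accessing data that is tens to hundreds of times larger than the hardware cache typically causes cache performance to degrade sharply because of low memory access locality. For example, in the automotive Global Positioning System (GPS) programs in common use today, the part that receives data from up to 32 satellites and computes the receiver's position is one of the core modules, and it accounts for more than 50% of overall performance. This module receives satellite signals in real time and stores them in buffer memory; because the required data cannot be stored sequentially, it is read in random order. As a result, the low locality makes it difficult to process the data in real time. Improving the low memory access locality inherent in the algorithms of typical communication applications requires an algorithm-level approach. Because that is costly, this study instead transforms the data structures used so as to increase locality. As a result, we achieved a 2x improvement in the core module and a 14% improvement in overall system performance.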
https://doi.org/10.3745/PKIPS.y2008m011a.882

Performance Evaluation of Catalog Management Schemes for Distributed Main Memory Databases (분산 주기억장치 데이터베이스에서 카탈로그 관리 기법의 성능평가)

Jeong, Han-Ra;Hong, Eui-Kyeong;Kim, Myung
- Journal of Korea Multimedia Society / v.8 no.4 / pp.439-449 / 2005
Distributed main memory database management systems (DMM-DBMSs) store the database in the main memories of the participating sites. They provide high performance through fast access to the local databases and high-speed communication among the sites. Recently, many research results on DMM-DBMSs have been reported. However, to the best of our knowledge, there is no known research result on the performance of catalog management schemes for DMM-DBMSs. In this work, we evaluated the performance of partitioned catalog management schemes through experimental analysis. First, we classified the partitioned catalog management schemes into three categories: Partitioned Catalogs Without Caching (PCWC), Partitioned Catalogs With Incremental Caching (PCWIC), and Partitioned Catalogs With Full Caching (PCWFC). Experiments were conducted by varying the number of sites, the number of terminals per site, buffer size, write query ratio, and local query ratio. The experiments show that PCWFC outperforms the other two schemes in all cases. This also implies that the performance of PCWIC gradually increases over time. It should be noted that PCWFC does not guarantee high performance for disk-based distributed DBMSs when the workload of an individual site is high, the catalog write ratio is high, or remote data objects are accessed very frequently. The main reason PCWFC performs best for DMM-DBMSs is that query compilation and remote catalog access can be done at very high speed, even when the catalogs of the remote data objects are frequently updated.

Implementation and Performance Analysis of Event Processing and Buffer Managing Techniques for DDS (고성능 데이터 발간/구독 미들웨어의 이벤트, 버퍼 처리 기술 및 성능 분석)

Yoon, Gunjae;Choi, Hoon
- Journal of KIISE / v.44 no.5 / pp.449-459 / 2017
Data Distribution Service (DDS) is a communication middleware that supports flexible, scalable, real-time communication. This paper describes several techniques to improve the performance of DDS middleware. Detailed events for the internal behavior of the middleware are defined. To reduce processing complexity, a DDS message is disassembled into several submessages, each an independent, meaningful unit, for event-driven structuring. The proposed history cache management technique is also described. It exploits the fact that status access and random access to the history cache occur more frequently in DDS. These methods have been implemented in EchoDDS, the DDS implementation developed by our team, and showed improved performance.
https://doi.org/10.5626/JOK.2017.44.5.449

