• Title/Abstract/Keywords: Memory Latency

362 search results (processing time: 0.018 s)

An Implementation of a Memory Operation System Architecture for Memory Latency Penalty Reduction in SIMT Based Stream Processor

  • 이광엽
    • 전기전자학회논문지
    • /
    • Vol. 18, No. 3
    • /
    • pp.392-397
    • /
    • 2014
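  • This paper proposes a Memory Operation System Architecture for a Stream Processor based on the SIMT architecture that reduces the memory latency penalty. The proposed design applies a Non-Blocking Cache Architecture to reduce the cache miss penalty incurred by a conventional Blocking Cache Architecture. The performance gain of a Stream Processor using the proposed Memory Operation System Architecture was verified by comparing processing speeds across various algorithms. The experiments measured the improvement as a function of the proportion of memory instructions in each algorithm and confirmed that Stream Processor performance improved by a minimum of 8.2% and a maximum of 46.5%.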

A Memory-efficient Hand Segmentation Architecture for Hand Gesture Recognition in Low-power Mobile Devices

  • Choi, Sungpill;Park, Seongwook;Yoo, Hoi-Jun
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • Vol. 17, No. 3
    • /
    • pp.473-482
    • /
    • 2017
  • Hand gesture recognition is regarded as a new Human Computer Interaction (HCI) technology for the next generation of mobile devices. Previous hand gesture implementations require large memory and computation power for hand segmentation, which prevents real-time interaction between users and mobile devices. Therefore, in this paper, we present a low-latency and memory-efficient hand segmentation architecture for natural hand gesture recognition. To obtain both high memory efficiency and low latency, we propose a streaming hand contour tracing unit and a fast contour filling unit. As a result, it achieves 7.14 ms latency with only 34.8 KB of on-chip memory, which are 1.65 times lower latency and 1.68 times less on-chip memory, respectively, compared to the best-in-class design.

Analysis of the GPGPU Performance for Various Combinations of Workloads Executed Concurrently

  • 김동환;엄현상
    • 정보과학회 컴퓨팅의 실제 논문지
    • /
    • Vol. 23, No. 3
    • /
    • pp.165-170
    • /
    • 2017
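  • There have been many attempts to use the high computational throughput of GPGPUs for long, complex calculations. By the nature of GPGPU programs, memory must be copied between the host and the device. When the latency of this memory copy is long, it strongly affects program performance. As a result, the performance of GPGPU programs varies widely with the degree of optimization. When several GPGPU programs run concurrently, memory copies and GPGPU computation overlap, so a memory-copy latency hiding effect can be expected. This paper analyzes memory-copy latency hiding. It also proposes a performance prediction model and algorithm that account for the constraints of using pinned memory to speed up memory copies. Selecting the workloads to run on this basis yields a 41% performance improvement.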
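
The latency hiding effect described in this abstract can be illustrated with a toy timing model. This is a minimal sketch under stated assumptions, not the prediction model from the paper: each workload is a hypothetical `(copy_ms, compute_ms)` pair, the device has one copy engine and one compute engine, and workloads are pipelined in submission order. The function names are illustrative.

```python
def serial_time(workloads):
    """Total time when every copy and compute step runs one after another."""
    return sum(copy_ms + compute_ms for copy_ms, compute_ms in workloads)


def overlapped_time(workloads):
    """Total time when the copy of one workload may overlap the compute of another.

    Assumption: one copy engine and one compute engine, each handling a single
    task at a time, with workloads processed in the given order.
    """
    copy_done = 0.0     # time at which the copy engine finishes its latest copy
    compute_done = 0.0  # time at which the compute engine finishes its latest kernel
    for copy_ms, compute_ms in workloads:
        copy_done += copy_ms  # copies run back to back
        # A kernel starts once its data has arrived and the engine is free.
        compute_done = max(compute_done, copy_done) + compute_ms
    return compute_done


# Hypothetical workloads: 4 ms of host-to-device copy, 6 ms of compute each.
workloads = [(4.0, 6.0), (4.0, 6.0), (4.0, 6.0)]
print(serial_time(workloads), overlapped_time(workloads))
```

In this model only the first copy is fully exposed; later copies are hidden behind the compute phase of the preceding workload as long as compute time is at least as long as copy time.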

Latency Hiding based Warp Scheduling Policy for High Performance GPUs

  • Kim, Gwang Bok;Kim, Jong Myon;Kim, Cheol Hong
    • 한국컴퓨터정보학회논문지
    • /
    • Vol. 24, No. 4
    • /
    • pp.1-9
    • /
    • 2019
  • The LRR (Loose Round Robin) warp scheduling policy for GPU architectures results in high warp-level parallelism and balanced loads across multiple warps. However, the traditional LRR policy makes multiple warps execute long-latency operations at the same time. When no more warps can be issued during long-latency operations, GPU throughput may be degraded significantly. In this paper, we propose a new warp scheduling policy that exploits latency hiding, leading to better-utilized memory resources in high-performance GPUs. The proposed warp scheduler prioritizes memory instructions based on the GTO (Greedy Then Oldest) policy in order to reduce memory stalls. When no warp can execute a memory instruction any more, the warp scheduler selects a warp for a computation instruction in a round-robin manner. Furthermore, our proposed technique achieves high performance by using additional information about recently committed warps. According to our experimental results, our proposed technique improves GPU performance by 12.7% and 5.6% over LRR and GTO on average, respectively.
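
The selection rule outlined in this abstract (memory instructions first under GTO, then computation instructions in round-robin order) can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the `Warp` fields, the `select_warp` signature, and the `mem_unit_free` flag are assumptions, warp ids are assumed to be `0..n-1`, and the paper's use of information about recently committed warps is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Warp:
    wid: int            # warp id, assumed to be 0..n-1 (round-robin position)
    age: int            # smaller value means an older warp
    next_kind: str      # "mem" or "alu": kind of the next instruction
    ready: bool = True  # False while the warp is stalled


def select_warp(warps: List[Warp], last_mem_wid: Optional[int],
                rr_last: int, mem_unit_free: bool) -> Optional[Warp]:
    """Pick the next warp to issue, or None if nothing can issue."""
    # Step 1: prioritize memory instructions using Greedy Then Oldest.
    mem = [w for w in warps if w.ready and w.next_kind == "mem"]
    if mem_unit_free and mem:
        for w in mem:
            if w.wid == last_mem_wid:   # greedy: keep issuing from the same warp
                return w
        return min(mem, key=lambda w: w.age)  # otherwise the oldest warp

    # Step 2: no memory instruction can issue, so pick a compute warp
    # in round-robin order, starting just after the last one issued.
    alu = [w for w in warps if w.ready and w.next_kind == "alu"]
    if not alu:
        return None
    n = len(warps)
    return min(alu, key=lambda w: (w.wid - rr_last - 1) % n)


warps = [Warp(0, 3, "alu"), Warp(1, 1, "mem"), Warp(2, 0, "mem"), Warp(3, 2, "alu")]
print(select_warp(warps, None, 0, True).wid)   # oldest memory warp
print(select_warp(warps, None, 0, False).wid)  # round-robin compute warp
```

Issuing memory instructions early gives long-latency requests more time to complete while compute instructions from other warps fill the pipeline, which is the latency hiding idea the abstract refers to.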

Analyzing the Overhead of the Memory Mapped File I/O for In-Memory File Systems

  • 최정식;한환수
    • 정보과학회 컴퓨팅의 실제 논문지
    • /
    • Vol. 22, No. 10
    • /
    • pp.497-503
    • /
    • 2016
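  • With the arrival of next-generation storage devices such as non-volatile memory, storage latency will all but disappear. In the past, storage latency was the dominant problem, so software efficiency was not an important concern. Now, however, software overhead is emerging as the problem to be solved. To minimize software overhead, many researchers have proposed file I/O based on memory mapping. Memory-mapped file I/O not only bypasses the complex file I/O stack of conventional operating systems but also minimizes frequent user/kernel mode switches. It also minimizes the overhead of multiple memory copies. However, memory-mapped file I/O has problems of its own, because the memory-mapped file I/O mechanism is itself part of a conventional operating system designed to manage slow block devices efficiently. This paper describes the overhead problems of memory-mapped file I/O and confirms them through experiments.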
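
The contrast between conventional file I/O and memory-mapped file I/O can be illustrated with Python's standard `mmap` module. This is a minimal sketch for orientation only, not the experimental setup of the paper; the function names and the 4096-byte chunk size are assumptions.

```python
import mmap


def read_with_syscalls(path, chunk=4096):
    """Sum all bytes using explicit read() calls.

    Each read() is a system call that copies data from the kernel
    into a user-space buffer.
    """
    total = 0
    with open(path, "rb", buffering=0) as f:
        while True:
            block = f.read(chunk)
            if not block:
                break
            total += sum(block)
    return total


def read_with_mmap(path):
    """Sum all bytes through a memory mapping of the file.

    After the mapping is set up, the file contents are accessed like
    ordinary memory, without a read() call per chunk. The file must be
    non-empty, since an empty file cannot be mapped.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # The memoryview exposes the mapping as a sequence of ints
            # and is released before the mapping is closed.
            with memoryview(m) as view:
                return sum(view)
```

Both functions return the same value for the same file. The mapped version avoids per-chunk system calls and the extra copy into a user buffer, but it still pays for page faults and page-table management, which is the kind of overhead the abstract points to.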

Memory Latency Hiding Techniques

  • 기안도
    • 전자통신동향분석
    • /
    • Vol. 13, No. 3 (Serial No. 51)
    • /
    • pp.61-70
    • /
    • 1998
  • The obvious way to make a computer system more powerful is to make the processor as fast as possible. Adopting a large number of such fast processors would be the next step. Such a multiprocessor system is useful only if it distributes the workload uniformly and its processors are fully utilized. To achieve higher processor utilization, memory access latency must be reduced as much as possible, and the remaining latency must be hidden. The actual latency can be reduced by using fast logic, and the effective latency can be reduced by using a cache. This article discusses what the memory latency problem is, how serious it is (by presenting analytical and simulation results), and existing techniques for coping with it, such as write buffers, relaxed consistency models, multithreading, data locality optimization, data forwarding, and data prefetching.

Effects of Red Ginseng Saponin on Normal and Scopolamine-induced Memory Impairment of Mice in Passive Avoidance Task

  • 진승하;경종수
    • Journal of Ginseng Research
    • /
    • Vol. 20, No. 1
    • /
    • pp.7-14
    • /
    • 1996
  • This study was performed to examine the effect of red ginseng total saponin and extract on memory in mice using a one-trial step-down type passive avoidance method. Red ginseng total saponins (No. 1: PD/PT ratio=1.24, No. 2: PD/PT ratio=1.47) were prepared with different mixing ratios by using the parts of red ginseng. Single administration of total saponin No. 1 (100 mg/kg, bw) or No. 2 (50 mg/kg, bw) increased the latency time compared with the control group, but the increase was not statistically significant. Treatment with total saponin No. 1 (50 mg/kg, bw) for 10 days produced an increase in latency time that was not statistically significant. In scopolamine-induced memory-deficient mice, total saponin No. 1 (50 mg/kg, bw) and No. 2 (100 mg/kg, bw) significantly improved the latency time. These results show that red ginseng total saponin may improve the memory of scopolamine-induced memory-deficient mice and have nootropic activity.

Low-latency SAO Architecture and its SIMD Optimization for HEVC Decoder

  • Kim, Yong-Hwan;Kim, Dong-Hyeok;Yi, Joo-Young;Kim, Je-Woo
    • IEIE Transactions on Smart Processing and Computing
    • /
    • Vol. 3, No. 1
    • /
    • pp.1-9
    • /
    • 2014
  • This paper proposes a low-latency Sample Adaptive Offset filter (SAO) architecture and its Single Instruction Multiple Data (SIMD) optimization scheme to achieve fast High Efficiency Video Coding (HEVC) decoding in a multi-core environment. According to the HEVC standard and its Test Model (HM), the SAO operation is performed only at the picture level. Most real-time decoders, however, execute their sub-modules on a Coding Tree Unit (CTU) basis to reduce latency and memory bandwidth. The proposed low-latency SAO architecture has the following advantages over picture-based SAO: 1) significantly lower memory requirements, and 2) a low-latency property that enables efficient pipelined multi-core decoding. In addition, SIMD optimization of SAO filtering can reduce the SAO filtering time significantly. The simulation results showed that the proposed low-latency SAO architecture, with significantly less memory usage, produces a decoding time similar to that of picture-based SAO in single-core decoding. Furthermore, the SIMD optimization scheme reduces the SAO filtering time by approximately 509% and increases the total decoding speed by approximately 7% compared to the existing look-up table approach of HM.

Gen-Z memory pool system implementation and performance measurement

  • Kwon, Won-ok;Sok, Song-Woo;Park, Chan-ho;Oh, Myeong-Hoon;Hong, Seokbin
    • ETRI Journal
    • /
    • Vol. 44, No. 3
    • /
    • pp.450-461
    • /
    • 2022
  • The Gen-Z protocol is a memory semantic protocol between the memory and CPU used in computer architectures with large memory pools. This study presents the implementation of the Gen-Z hardware system configured using Gen-Z specification 1.0 and reports its performance. A hardware prototype of a DDR4 Gen-Z memory pool with an optimized character, a block device driver, and a file system for the Gen-Z hardware was designed. The Gen-Z IP was targeted to the FPGA, and a 512 GB Gen-Z memory pool was configured on an x86 server. In the experiments, the latency and throughput of the Gen-Z memory were measured and compared with those of the local memory, SATA SSD, and NVMe using character or block device interfaces. The Gen-Z hardware exhibited superior throughput and latency performance compared with SATA SSD and NVMe at block sizes under 4 kB. The MySQL and File IO benchmarks of Gen-Z showed good write performance at all block sizes and thread counts. In addition, it showed low latency in RocksDB's fillseq dbbench using the ext4 direct access filesystem.

Impairments of Learning and Memory Following Intracerebroventricular Administration of AF64A in Rats

  • Lim, Dong-Koo;Oh, Youm-Hee;Kim, Han-Soo
    • Archives of Pharmacal Research
    • /
    • Vol. 24, No. 3
    • /
    • pp.234-239
    • /
    • 2001
  • Three types of learning and memory tests (Morris water maze, active and passive avoidance) were performed in rats following intracerebroventricular infusion of ethylcholine aziridinium (AF64A). In the Morris water maze, AF64A-treated rats showed delayed latencies to find the platform from the 6th day after the infusion. In pretrained rats, AF64A caused a significant delay of latency on the 7th day but not on the 8th day. In the active avoidance test for the pretrained rats, the escape latency was significantly delayed by AF64A treatment. The percentages of avoidance in AF64A-treated rats increased less than those in the control. In particular, the percentage of no response in the AF64A-treated rats was markedly increased in the first half of the trials. In the passive avoidance test, AF64A-treated rats showed shortened latency 1.5 h after the electric shock, but not at 24 h. AF64A also caused the pretrained rats to show shortened latency on the 7th day after the infusion, but not on the 8th day. These results indicate that AF64A might impair learning and memory. However, they also indicate that the memory disturbed by AF64A might rapidly recover after the first retraining. Furthermore, these results suggest that AF64A may be a useful agent for an animal model of learning for spatial cognition.