Search | Korea Science

Evaluation of the Data Migration between CPU Memory and GPU Memory for a NVIDIA Pascal GPU Using Unified Memory (통합 메모리를 사용하는 NVIDIA 파스칼 GPU에서의 CPU 메모리와 GPU 메모리 간 데이터 통신 분석)

Shin, Philkyue;Hong, Seongsoo
- Proceedings of the Korean Society of Computer Information Conference
- /
- 2018.07a
- /
- pp.7-10
- /
- 2018
통합 메모리는 CPU 메모리와 GPU 메모리 간의 데이터 통신을 개발자에게 투명하게 내재적으로 수행하는 소프트웨어 런타임 환경으로 개발자에게 CPU 메모리와 GPU 메모리가 통합된 하나의 메모리로 보이게 해준다. 통합 메모리는 장점에도 불구하고 아직 널리 사용되지 못하고 있는데 그 이유는 내재적으로 수행되는 데이터 통신의 오버헤드가 큰 것으로 알려져 있기 때문이다. 하지만 이 데이터 통신이 구체적으로 어떻게 이루어지고 오버헤드는 어떻게 발생하는지 분석한 연구는 아직 존재하지 않는다. 우리는 NVIDIA 사의 최신 GPU 마이크로아키텍처 중 하나인 파스칼을 사용하는 GPU를 대상으로 하여, 통합 메모리를 사용할 시 데이터 통신이 이루어지는 조건과 GPU 응용의 수행시간에 데이터 통신이 끼치는 영향을 실험을 통해 분석한다. 실험 결과 통합 메모리의 오버헤드는 두 가지 원인 때문에 발생한다. 첫째, 통합 메모리를 사용하면 CPU 또는 GPU가 데이터에 접근할 때마다 이 데이터는 CPU 또는 GPU 메모리로 옮겨지고 옮겨진 데이터는 제거된다. 따라서 재사용할 데이터도 제거되어 추가적인 데이터 통신이 발생하고, 이 데이터 통신의 지연시간은 GPU 응용의 수행시간에 더해진다. 둘째, 통합 메모리를 사용하면 데이터 통신과 커널들이 서로 다른 스트림에 할당되어도 동시에 수행되지 못한다. 따라서 GPU 응용의 수행시간은 동시에 수행되던 데이터 통신과 커널의 수행시간만큼 증가한다.
PDF

A Performance Study on CPU-GPU Data Transfers of Unified Memory Device (통합메모리 장치에서 CPU-GPU 데이터 전송성능 연구)

Kwon, Oh-Kyoung;Gu, Gibeom
- KIPS Transactions on Computer and Communication Systems
- /
- v.11 no.5
- /
- pp.133-138
- /
- 2022
Recently, as GPU performance has improved in HPC and artificial intelligence, its use is becoming more common, but GPU programming is still a big obstacle in terms of productivity. In particular, due to the difficulty of managing host memory and GPU memory separately, research is being actively conducted in terms of convenience and performance, and various CPU-GPU memory transfer programming methods are suggested. Meanwhile, recently many SoC (System on a Chip) products such as Apple M1 and NVIDIA Tegra that bundle CPU, GPU, and integrated memory into one large silicon package are emerging. In this study, data between CPU and GPU devices are used in such an integrated memory device and performance-related research is conducted during transmission. It shows different characteristics from the existing environment in which the host memory and GPU memory in the CPU are separated. Here, we want to compare performance by CPU-GPU data transmission method in NVIDIA SoC chips, which are integrated memory devices, and NVIDIA SMX-based V100 GPU devices. For the experimental workload for performance comparison, a two-dimensional matrix transposition example frequently used in HPC applications was used. We analyzed the following performance factors: the difference in GPU kernel performance according to the CPU-GPU memory transfer method for each GPU device, the transfer performance difference between page-locked memory and pageable memory, overall performance comparison, and performance comparison by workload size. Through this experiment, it was confirmed that the NVIDIA Xavier can maximize the benefits of integrated memory in the SoC chip by supporting I/O cache consistency.
https://doi.org/10.3745/KTCCS.2022.11.5.133 인용 PDF KSCI

Evaluation of Unified Index for Moving Object Databases (이동체 데이터베이스를 위한 통합 색인의 성능 평가)

Jung, Ji-Won;Ahn, Kyung-Hwan;Hong, Bong-Hee
- 한국공간정보시스템학회:학술대회논문집
- /
- 2004.12a
- /
- pp.98-104
- /
- 2004
이동체 데이터베이스에서 이동체의 과거 궤적을 저장하기 위해 메인 메모리 기반 색인을 이용하면 시간이 지남에 따라 데이터의 방대함으로 인해 주어진 메모리 용량이 부족하게 될 수 도 있다. 이를 해결하기 위해서는 메인 메모리에 상주하는 색인의 일부를 계속해서 디스크로 이주하는 정책이 필요하다. 이런 이주 정책을 지원하는 메인 메모리 이동체 색인이 통합 색인이다. 기존 통합 색인의 색인 이주 정책인 이동 서브트리 정책은 시간 축으로 가장 오래된 엔트리를 seed 노드로 선정하여 이동 서브트리를 구성한다. 이때 항상 시간적으로 가장 오래된 노드만을 디스크로 옮김으로써 과거에 대한 질의 시 비효율적이라는 문제점을 가진다. 본 논문에서는 이주를 위한 서브트리 구성에 필요한 seed 노드를 선택하기 위해, 질의 및 삽입 시에 참조되는 단말 노드들을 유지하는 LRU 버퍼를 이용한 색인 이주 정책을 제시한다. 이를 바탕으로 메인 메모리 기반 색인의 장점과 메모리 용량 부족의 문제를 해결한 통합색인을 구현하고, 다양한 성능 평가를 통하여 제시된 이주 정책이 기존의 이주 정책에 비해 삽입 성능뿐만 아니라 영역 질의에서도 우수함을 보인다.
PDF

A Study on Mechanism for Topology-aware based Granted Memory Management (토폴로지 인지 기반 공여 메모리 관리 메커니즘 연구)

Kim, Young-Ho;Ahn, Shin-Young;Lim, Eun-Ji;Cha, Gyu-Il
- Proceedings of the Korea Information Processing Society Conference
- /
- 2014.11a
- /
- pp.95-98
- /
- 2014
본 논문에서는 고속 저지연 네트워크로 연결된 다수의 분산 메모리 공여 노드를 통해 분산 통합 메모리 서비스를 제공하는 메모리 가상화 시스템에서, 대용량 메모리와 다수의 호스트 채널 어댑터(HCA)를 장착한 공여 노드의 프로세서, 물리 메모리, 그리고 HCA의 연결구조와 정보로부터 토폴로지 구조를 추출하고, 프로세서 중심으로 자원 연관성 정보를 나타내는 토폴로지 맵을 생성한다. 토폴로지 맵을 기반으로 공여 메모리의 초기화, 등록, 할당 및 메모리 데이터 전송 등을 수행하는 공여 메모리 관리 메커니즘을 제안한다. 이를 통해 대용량 분산 통합 메모리를 이용하는 빅데이터 처리 환경에서 참조 데이터 대한 메모리의 응답 시간 및 접근 지연 시간을 최소화시킬 수 있다.
https://doi.org/10.3745/PKIPS.y2014m11a.95 인용 PDF

A Study for Effective Management of Memory to MicroC/OS-II (MicroC/OS-II에서의 효율적인 메모리 관리에 관한 연구)

Young-Sik JEON;Shin Heu
- Proceedings of the Korea Information Processing Society Conference
- /
- 2008.11a
- /
- pp.798-801
- /
- 2008
MicroC/OS-II에서는 연속된 메모리 공간으로 구성된 파티션에서 고정 크기의 메모리 블록을 할당할 수 있는 방법을 제공하며, 이 파티션은 사용 가능한 메모리 블록의 개수를 유지하고, 모두 같은 크기를 갖는 메모리 블록을 단일 연결 리스트의 형태로 관리 한다. 이런 형태의 메모리 관리 시스템은 메모리 단편화 현상이 잘 일어나지 않지만 이런 단순한 구조로 메모리 공간을 통합 관리, 블록을 할당하고 반환하는데 필요한 검사등을 효율적으로 수행할 수 없다. 본 논문에서는 MicroC/OS-II에서의 단편화문제를 해결하는 방법에 더 나아가 효율적으로 메모리를 통합하고 관리하는 방법에 대해 제안하고 자 한다.
https://doi.org/10.3745/PKIPS.y2008m011a.798 인용 PDF

A Performance Study on CPU-GPU Data Transfers of NVIDIA Tegra and Tesla GPUs (NVIDIA Tegra와 Tesla GPU에서의 CPU-GPU 데이터 전송성능 연구)

Kwon, Oh-Kyoung;Gu, Gibeom
- Proceedings of the Korea Information Processing Society Conference
- /
- 2021.11a
- /
- pp.39-42
- /
- 2021
최근 HPC, 인공지능에서 GPU 성능이 향상되면서 사용이 보편화되고 있지만 GPU 프로그래밍은 난이도 측면에서 여전히 큰 장애물이다. 특히 호스트(host) 메모리와 GPU 메모리를 따로 관리해야 하는 어려움 때문에 편의성과 성능 측면에서 연구가 활발히 진행되고 있으며, 다양한 CPU-GPU 메모리 전송프로그래밍 방법들이 제시되고 있다. 본 연구는 NVIDIA Tegra 장치들과 NVIDIA SMX 기반 V100 GPU 카드에서 CPU-GPU 데이터 전송 기법별로 성능비교를 하고자 한다. 특히 NVIDIA Tegra 장치는 CPU와 GPU 통합메모리를 제공하고 있어서 CPU-GPU 메모리 전송방법의 관점에서 기존 GPU 장치와 다른 성능 특징을 보여준다. 성능비교를 위한 실험 워크로드는 HPC 응용프로그램에서 빈번하게 사용하는 2차원 행렬 전치 예제를 사용하였다. 실험을 통해 각 GPU 장치별로 CPU-GPU 메모리 전송 방법에 따른 GPU 커널 성능차이, 페이지 잠긴 메모리와 페이지 가능 메모리의 전송 성능차이, 마지막으로 전체 성능비교를 하였다.
https://doi.org/10.3745/PKIPS.y2021m11a.39 인용 PDF

Co-Validation Environment for Memory Card Compatibility Test (메모리 카드 호환성 테스트를 위한 통합 검증 환경)

Sung, Min-Young
- Journal of the Korea Society of Computer and Information
- /
- v.13 no.3
- /
- pp.57-63
- /
- 2008
As diverse memory cards based on NAND flash memory are getting popularity with consumer electronics such as digital camera, camcorder and MP3 player the compatibility problems between a newly developed memory card and existent host systems have become a main obstacle to time-to-market delivery of product. The common practice for memory card compatibility test is to use a real host system as a test bed. As an improved solution, an FPGA-based prototyping board can be used for emulating host systems. However, the above approaches require a long set-up time and have limitations in representing various host and device systems. In this paper, we propose a co-validation environment for compatibility test between memory card and host system using formal modeling based on Esterel language and co-simulation methodology. Finally, we demonstrate the usefulness of the proposed environment with a case study of real memory card development.
PDF

Processing-in-Memory Architecture for Enhanced Convolutional Neural Network Performance (합성곱 신경망 성능 향상을 위한 메모리 내 연산 구조)

Kun-Mo Jeong;Ho-Yun Youm;Han-Jun Kim
- Proceedings of the Korea Information Processing Society Conference
- /
- 2024.05a
- /
- pp.61-64
- /
- 2024
최근 고성능 컴퓨팅 장치의 수요 증가와 함께, 메모리 내에 연산을 가능하게 하는 하드웨어 구조가 새로이 발표되고 있다. 본 논문은 기존 DRAM 에 계산 유닛을 통합하는 새로운 메모리 내 연산 구조를 제안한다. 특히, 데이터 집약적인 합성곱 신경망 작업을 위해 최적화된 이 구조는 기존 메모리 구조를 사용하면서도 기존 구조에 분기를 추가함으로서 CNN 연산의 속도와 에너지 효율을 향상시킨다. VGG19, AlexNet, ResNet-50 과 같은 다양한 CNN 모델을 활용한 실험 결과, PINN 아키텍처는 기존 연구에 비해 최대 2.95 배까지의 성능 향상을 달성할 수 있음을 확인하였다. 이러한 결과는 PINN 기술이 저장 및 연산 성능의 한계를 극복하고, 머신 러닝과 같은 고급 어플리케이션의 요구를 충족시킬 수 있는 방안임을 시사한다.
https://doi.org/10.3745/PKIPS.y2024m05a.61 인용 PDF

Memory Exploration utilizing Scheduling Effects in High-level Synthesis (상위 단계 합성에서의 스케줄링 효과를 이용한 메모리 탐색)

서재원;김태환
- Proceedings of the Korean Information Science Society Conference
- /
- 2002.04a
- /
- pp.1-3
- /
- 2002
본 논문에서는 상위 단계 합성(high-level synthesis)에서의 메모리 탐색(exploration) 문제를 푸는 데 있어, 현존하는 메모리 합성 시스템들이 간과했던 한 가지 중요한 성질인 메모리 탐색에서의 스케줄링 효과(scheduling effect)를 말하고자 한다. 그리고 이 성질을 충분히 활용할 수 있는 새로운 형태의 통합된 알고리즘을 제안한다. 이 알고리즘은 메모리 구성(configuration)과 스케줄을 동시에 고려한다는 것을 가장 큰 특징으로 하는데, 몇 개의 벤치마크 필터 회로에 대한 실험을 통해 제안된 탐색 기법이 빠른 시간 안에 최적에 가까운 메모리 구성을 찾는다는 것을 보일 수 있었다.
PDF

A Study of the Merging Layers of the Storage System for Flash-Based DBMS (플래시 메모리용 DBMS를 위한 스토리지 시스템의 계층 통합에 대한 연구)

Sim, Hyo-Gi;Yoon, Kyoung-Hon;Park, Sung-Min;Jung, Ho-Young;Cha, Jae-Hyuk;Kang, Soo-Yong
- Journal of Digital Contents Society
- /
- v.8 no.4
- /
- pp.593-600
- /
- 2007
Small computer systems such as mobile devices adopt NAND flash memories as their storage media. However, DBMS running on such systems are optimized to hard disks. When small computer systems use DBMS they usually use additional system layer, like FTL, that emulates flash memories as normal hard disks and DBMS cannot control flash memories directly. In this paper, we propose unified storage system that DBMS controls flash memories directly. We implemented the system in a real environment and proved the proposed system outperforms legacy systems.
PDF

Search Result 179, Processing Time 0.026 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)