• Title/Summary/Keyword: knights landing

Search results: 6

Using the On-Package Memory of Manycore Processor for Improving Performance of MPI Intra-Node Communication (MPI 노드 내 통신 성능 향상을 위한 매니코어 프로세서의 온-패키지 메모리 활용)

  • Cho, Joong-Yeon;Jin, Hyun-Wook;Nam, Dukyun
    • Journal of KIISE, v.44 no.2, pp.124-131, 2017
  • The emerging next-generation manycore processors for high-performance computing are equipped with high-bandwidth on-package memory in addition to the traditional host memory. The Multi-Channel DRAM (MCDRAM), for example, is the on-package memory of the Intel Xeon Phi Knights Landing (KNL) processor and theoretically provides four times the bandwidth of conventional DDR4 memory. In this paper, we suggest a mechanism to exploit MCDRAM for improving the performance of MPI intra-node communication. The experimental results show that MPI intra-node communication performance can be improved by up to 272% compared with the case where DDR4 is used. Moreover, we analyze not only the performance impact of different MCDRAM-utilization mechanisms but also that of core affinity for processes.
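One way user code can reach MCDRAM is the memkind library's hbwmalloc interface. The sketch below is a minimal, hedged illustration, not the paper's actual mechanism: it places an MPI message buffer in MCDRAM when high-bandwidth memory is available and falls back to DDR4 otherwise; the build line is an assumption.

```c
/* Minimal sketch, assuming the memkind library is installed; not the paper's
 * mechanism. The message buffer goes to KNL MCDRAM via hbwmalloc when
 * available, otherwise to DDR4.
 * Assumed build line: mpicc mcdram_pingpong.c -lmemkind -o mcdram_pingpong */
#include <mpi.h>
#include <hbwmalloc.h>   /* memkind high-bandwidth-memory allocator */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    size_t len = 1 << 22;                        /* 4 MiB message */
    int use_hbw = (hbw_check_available() == 0);  /* 0 means MCDRAM is usable */
    char *buf = use_hbw ? hbw_malloc(len) : malloc(len);
    memset(buf, rank, len);

    /* Simple intra-node ping-pong between ranks 0 and 1. */
    if (rank == 0) {
        MPI_Send(buf, (int)len, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(buf, (int)len, MPI_CHAR, 1, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    } else if (rank == 1) {
        MPI_Recv(buf, (int)len, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Send(buf, (int)len, MPI_CHAR, 0, 1, MPI_COMM_WORLD);
    }

    if (use_hbw) hbw_free(buf); else free(buf);
    MPI_Finalize();
    return 0;
}
```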

Enhancing the Performance of Multiple Parallel Applications using Heterogeneous Memory on the Intel's Next-Generation Many-core Processor (인텔 차세대 매니코어 프로세서에서의 다중 병렬 프로그램 성능 향상기법 연구)

  • Rho, Seungwoo;Kim, Seoyoung;Nam, Dukyun;Park, Geunchul;Kim, Jik-Soo
    • Journal of KIISE, v.44 no.9, pp.878-886, 2017
  • This paper discusses performance bottlenecks that may occur when executing high-performance-computing MPI applications on Intel's next-generation many-core processor, Knights Landing (KNL), as well as effective resource allocation techniques to solve this problem. KNL adds a self-booting host processor to the existing many-core accelerator design, and it was released with a new type of on-package memory with improved bandwidth on top of the existing DDR4-based memory. We empirically verified improvements in the execution performance of multiple MPI applications and in overall system utilization by studying a resource allocation method optimized for this new many-core processor architecture.
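In KNL's flat memory mode the on-package MCDRAM is exposed as a separate NUMA node, so one way to steer a process's allocations and core placement, in the spirit of the resource allocation studied above, is libnuma plus sched_setaffinity. The sketch below is illustrative only; the NUMA node number and the core number are assumptions, not values from the paper.

```c
/* Hedged sketch of per-process binding on KNL in flat mode: prefer the
 * MCDRAM NUMA node (assumed node 1) for allocations and pin the process to
 * a core. Numbers are illustrative assumptions.
 * Assumed build line: gcc bind_knl.c -lnuma -o bind_knl */
#define _GNU_SOURCE
#include <numa.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "libnuma not available\n");
        return 1;
    }

    /* Prefer the MCDRAM node; allocations fall back to DDR4 when it is full. */
    int mcdram_node = 1;                     /* assumption: flat-mode layout */
    if (mcdram_node <= numa_max_node())
        numa_set_preferred(mcdram_node);

    /* Pin this process to core 0 (illustrative) to fix its affinity. */
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(0, &mask);
    if (sched_setaffinity(0, sizeof(mask), &mask) != 0)
        perror("sched_setaffinity");

    /* Memory allocated after this point is served from MCDRAM when possible. */
    double *a = malloc(64 * 1024 * 1024 * sizeof(double));
    if (a) { a[0] = 1.0; free(a); }
    return 0;
}
```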

Optimizing LRU Lock Management in the Linux Kernel for Improving Parallel Write Throughput in Many-Core CPU Systems (매니코어 CPU 시스템의 병렬 쓰기 성능 향상을 위한 리눅스 커널의 LRU 관리 최적화 기법)

  • Eun-Kyu Byun;Gibeom Gu;Kwang-Jin Oh;Jiwoo Bang
    • KIPS Transactions on Computer and Communication Systems, v.12 no.7, pp.209-216, 2023
  • Modern HPC systems are equipped with many-core CPUs with dozens of cores. When performing parallel I/O on such a system, scalability is limited by the LRU lock management policy of the Linux kernel. This study proposes FinerLRU to solve this problem. FinerLRU improves the parallel write performance of file systems that use the buffer cache through fine-grained lock management, increasing the number of LRU locks up to the maximum number of cores. The proposed method was implemented in Linux 5.18.11, and its performance was measured on two CPUs with different characteristics, Intel Ice Lake Xeon and Intel Knights Landing; a performance improvement of about two times was obtained on both types of systems.
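The FinerLRU idea of splitting one contended lock into up to per-core locks can be illustrated outside the kernel. The following userspace C sketch (not the kernel patch itself) keeps one LRU list and spinlock per logical core and picks a shard by the caller's current CPU, so concurrent writers rarely collide on the same lock.

```c
/* Userspace illustration of lock sharding in the spirit of FinerLRU;
 * not the actual Linux kernel change. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define MAX_LISTS 256   /* upper bound on logical cores, as in KNL */

struct lru_node { struct lru_node *next; long page; };

struct lru_shard {
    pthread_spinlock_t lock;
    struct lru_node *head;
} shards[MAX_LISTS];

static int nshards;

/* Pick a shard by the CPU the caller runs on, like per-core LRU locks. */
static struct lru_shard *my_shard(void)
{
    int cpu = sched_getcpu();
    return &shards[(cpu < 0 ? 0 : cpu) % nshards];
}

static void lru_add(long page)
{
    struct lru_shard *s = my_shard();
    struct lru_node *n = malloc(sizeof(*n));
    n->page = page;
    pthread_spin_lock(&s->lock);   /* contention limited to one shard */
    n->next = s->head;
    s->head = n;
    pthread_spin_unlock(&s->lock);
}

static void *writer(void *arg)
{
    (void)arg;
    for (long i = 0; i < 100000; i++)
        lru_add(i);
    return NULL;
}

int main(void)
{
    nshards = (int)sysconf(_SC_NPROCESSORS_ONLN);
    if (nshards < 1 || nshards > MAX_LISTS) nshards = MAX_LISTS;
    for (int i = 0; i < nshards; i++) {
        pthread_spin_init(&shards[i].lock, PTHREAD_PROCESS_PRIVATE);
        shards[i].head = NULL;
    }
    pthread_t t[8];
    for (int i = 0; i < 8; i++) pthread_create(&t[i], NULL, writer, NULL);
    for (int i = 0; i < 8; i++) pthread_join(t[i], NULL);
    puts("done");
    return 0;
}
```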

A Study on Optimizing LRU Lock for Improving Parallel I/O Throughput in Manycore CPU Systems (매니코어 CPU 시스템에서의 병렬 I/O 성능 향상을 위한 LRU 최적화 기법 연구)

  • Byun, Eun-Kyu;Bang, Jiwoo;Gu, Gibeom;Oh, Kwang-Jin
    • Proceedings of the Korea Information Processing Society Conference, 2022.11a, pp.2-4, 2022
  • Parallel I/O on many-core CPU systems has a scalability problem due to the limitations of the LRU management method of the current Linux kernel. This study proposes an improved FinerLRU to solve this problem. It increases the number of LRU locks up to the maximum number of cores and, through fine-grained lock management, improves the parallel I/O performance of file systems that use the buffer cache. The proposed method was implemented in Linux 5.18.11, and experiments on an Intel Knights Landing processor with 64 physical cores and 256 logical cores confirmed that a performance improvement of about two times can be obtained.
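A parallel-write microbenchmark in the spirit of this experiment can be sketched as follows; the thread count, file paths, and sizes are illustrative assumptions, not the paper's configuration. Each thread writes its own file through the buffer cache, and the aggregate bandwidth is reported.

```c
/* Hedged sketch of a parallel buffered-write microbenchmark; parameters are
 * illustrative, not the paper's setup.
 * Assumed build line: gcc parwrite.c -lpthread -o parwrite */
#define _GNU_SOURCE
#include <pthread.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define NTHREADS 16               /* set toward the machine's core count */
#define CHUNK    (1 << 20)        /* 1 MiB per write() */
#define NCHUNKS  128              /* 128 MiB per thread */

static void *writer(void *arg)
{
    long id = (long)arg;
    char path[64];
    snprintf(path, sizeof(path), "/tmp/parwrite.%ld", id);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return NULL; }

    char *buf = malloc(CHUNK);
    memset(buf, 'x', CHUNK);
    for (int i = 0; i < NCHUNKS; i++)
        if (write(fd, buf, CHUNK) != CHUNK) { perror("write"); break; }

    close(fd);
    free(buf);
    unlink(path);
    return NULL;
}

int main(void)
{
    pthread_t t[NTHREADS];
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, writer, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    double mib = (double)NTHREADS * NCHUNKS * CHUNK / (1024.0 * 1024.0);
    printf("%.0f MiB in %.2f s = %.1f MiB/s aggregate\n", mib, sec, mib / sec);
    return 0;
}
```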

A study on how to generate GPU usage statistics for each task in a cluster system operated by shared node policy (공유노드 정책으로 운영 중인 클러스터 시스템에서 작업별 GPU 사용 통계 생성 방안에 대한 연구)

  • Kwon, Min-Woo;Yoon, JunWeon;Hong, TaeYoung
    • Proceedings of the Korea Information Processing Society Conference, 2022.11a, pp.37-39, 2022
  • KISTI (Korea Institute of Science and Technology Information) provides researchers with Nurion, the main system of its 5th supercomputer, and Neuron, the auxiliary system. Because Nurion is a cluster built with Intel Knights Landing processors, Neuron is configured as a heterogeneous GPU-equipped cluster to meet the demand for research infrastructure for artificial intelligence and big data. To distribute compute resources to researchers efficiently, Neuron is operated under the shared-node policy of the SLURM batch job scheduler, so that multiple jobs can run on a single compute node. This paper introduces a technique for generating per-job GPU usage statistics on a cluster system operated under a shared-node policy.
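The abstract does not state how the per-job statistics are collected; one common building block for per-process GPU statistics on NVIDIA hardware is the NVML accounting API, sketched below. The PID argument stands in for a process belonging to a SLURM job step, and accounting mode is assumed to have been enabled by an administrator; this is not necessarily the authors' method.

```c
/* Hedged sketch: query NVML per-process accounting statistics for one PID.
 * Assumed build line: gcc gpustats.c -lnvidia-ml -o gpustats */
#include <nvml.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    if (argc < 2) { fprintf(stderr, "usage: %s <pid>\n", argv[0]); return 1; }
    unsigned int pid = (unsigned int)atoi(argv[1]);

    if (nvmlInit() != NVML_SUCCESS) { fprintf(stderr, "nvmlInit failed\n"); return 1; }

    nvmlDevice_t dev;
    if (nvmlDeviceGetHandleByIndex(0, &dev) == NVML_SUCCESS) {
        nvmlAccountingStats_t st;
        /* Accounting mode must have been enabled beforehand
         * (nvmlDeviceSetAccountingMode, root only). */
        if (nvmlDeviceGetAccountingStats(dev, pid, &st) == NVML_SUCCESS)
            printf("pid %u: gpu util %u%%, mem util %u%%, max mem %llu B, time %llu ms\n",
                   pid, st.gpuUtilization, st.memoryUtilization,
                   (unsigned long long)st.maxMemoryUsage,
                   (unsigned long long)st.time);
        else
            fprintf(stderr, "no accounting record for pid %u\n", pid);
    }
    nvmlShutdown();
    return 0;
}
```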

A Study of Dark Photon at the Electron-Positron Collider Experiments Using KISTI-5 Supercomputer

  • Park, Kihong;Cho, Kihyeon
    • Journal of Astronomy and Space Sciences, v.38 no.1, pp.55-63, 2021
  • The universe is well known to consist of dark energy, dark matter, and the standard model (SM) particles. Dark matter dominates the matter density of the universe and is thought to be linked with the dark photon, a hypothetical hidden-sector particle similar to the photon of electromagnetism but proposed as a force carrier. Because the cross-section of dark matter is extremely small, a large amount of data needs to be processed, so the central processing unit (CPU) time must be optimized. In this work, using MadGraph5 as a simulation toolkit, we examined the CPU time and the cross-section of dark matter at the electron-positron collider while varying three parameters: the center-of-mass energy, the dark photon mass, and the coupling constant. The signal process pertained to a dark photon that couples only to heavy leptons, and we dealt only with the case of the dark photon decaying into two muons. We used a simplified model that covers dark matter and dark photon particles as well as the SM particles. To compare the CPU time of the simulation, one or more cores of the Knights Landing and Skylake partitions of the KISTI-5 supercomputer Nurion, as well as a local Linux machine, were used. Our results can help optimize high-energy physics software through high-performance computing and enable users to incorporate parallel processing.
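The following is not the authors' MadGraph5 workflow, only a minimal MPI sketch of the underlying idea of cutting wall-clock time by splitting independent event-generation work across cores and timing it with MPI_Wtime; run_events() is a hypothetical stand-in for calling an external generator, and the event counts are arbitrary.

```c
/* Hedged sketch of distributing independent simulation work across MPI ranks
 * and measuring the wall-clock time; not the paper's workflow.
 * Assumed build/run: mpicc events.c -o events && mpirun -np 64 ./events */
#include <mpi.h>
#include <stdio.h>

/* Hypothetical placeholder for generating `n` events (e.g. invoking an
 * external generator); here it just burns CPU proportionally to n. */
static double run_events(long n)
{
    volatile double x = 0.0;
    for (long i = 0; i < n * 10000; i++)
        x += 1.0 / (double)(i + 1);
    return x;
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    long total_events = 64000;               /* illustrative workload */
    long my_events = total_events / size;    /* even split across ranks */

    double t0 = MPI_Wtime();
    run_events(my_events);
    double elapsed = MPI_Wtime() - t0, worst;

    /* The slowest rank determines the wall-clock time of the whole run. */
    MPI_Reduce(&elapsed, &worst, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("%d ranks: %.2f s wall time\n", size, worst);

    MPI_Finalize();
    return 0;
}
```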