• Title/Abstract/Keywords: deep learning compiler


인공지능 프로세서 컴파일러 개발 동향 (Trends of Compiler Development for AI Processor)

  • 김진규;김혜지;조용철;김현미;여준기;한진호;권영수
    • Electronics and Telecommunications Trends (전자통신동향분석) / Vol. 36, No. 2 / pp. 32-42 / 2021
  • The rapid growth of deep-learning applications has spurred the R&D of artificial intelligence (AI) processors. A dedicated software framework, such as a compiler and runtime APIs, is required to achieve maximum processor performance. There are various compilers and frameworks for AI training and inference. In this study, we present the features and characteristics of AI compilers, training frameworks, and inference engines. In addition, we focus on the internals of compiler frameworks, which are based on either basic linear algebra subprograms or intermediate representation. For an in-depth insight, we present the compiler infrastructure, internal components, and operation flow of ETRI's "AI-Ware." The software framework's significant role is evidenced by the optimized neural processing unit code produced by the compiler after various optimization passes, such as scheduling, architecture-aware optimization, schedule selection, and power optimization. We conclude the study with thoughts about the future of state-of-the-art AI compilers.
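The pass-based structure described above (an IR transformed by a sequence of optimization passes such as scheduling and architecture-aware fusion) can be sketched in miniature. This is an illustrative toy, not AI-Ware's actual pipeline; the op names, pass names, and IR shape are all assumptions.

```python
# Toy pass-based compiler pipeline: an IR (list of op dicts) flows through
# a sequence of passes. All names here are hypothetical, not from AI-Ware.

def scheduling_pass(ir):
    # Hypothetical scheduling pass: tag each op with an execution order.
    return [dict(op, order=i) for i, op in enumerate(ir)]

def fuse_pass(ir):
    # Hypothetical architecture-aware pass: fuse adjacent conv + relu pairs.
    out, i = [], 0
    while i < len(ir):
        if i + 1 < len(ir) and ir[i]["op"] == "conv" and ir[i + 1]["op"] == "relu":
            out.append({"op": "conv_relu", "order": ir[i]["order"]})
            i += 2
        else:
            out.append(ir[i])
            i += 1
    return out

def run_pipeline(ir, passes):
    # Apply each pass in order, threading the IR through.
    for p in passes:
        ir = p(ir)
    return ir

ir = [{"op": "conv"}, {"op": "relu"}, {"op": "pool"}]
optimized = run_pipeline(ir, [scheduling_pass, fuse_pass])
```

Real pipelines differ mainly in scale: many more passes, a richer IR, and pass ordering chosen per target architecture.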

NEST-C: A deep learning compiler framework for heterogeneous computing systems with artificial intelligence accelerators

  • Jeman Park;Misun Yu;Jinse Kwon;Junmo Park;Jemin Lee;Yongin Kwon
    • ETRI Journal / Vol. 46, No. 5 / pp. 851-864 / 2024
  • Deep learning (DL) has significantly advanced artificial intelligence (AI); however, frameworks such as PyTorch, ONNX, and TensorFlow are optimized for general-purpose GPUs, leading to inefficiencies on specialized accelerators such as neural processing units (NPUs) and processing-in-memory (PIM) devices. These accelerators are designed to optimize both throughput and energy efficiency but they require more tailored optimizations. To address these limitations, we propose the NEST compiler (NEST-C), a novel DL framework that improves the deployment and performance of models across various AI accelerators. NEST-C leverages profiling-based quantization, dynamic graph partitioning, and multi-level intermediate representation (IR) integration for efficient execution on diverse hardware platforms. Our results show that NEST-C significantly enhances computational efficiency and adaptability across various AI accelerators, achieving higher throughput, lower latency, improved resource utilization, and greater model portability. These benefits contribute to more efficient DL model deployment in modern AI applications.
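The graph-partitioning idea mentioned in the abstract (routing each operator to the accelerator that supports it and falling back to the CPU otherwise) can be sketched as follows. The supported-op set and the greedy grouping are illustrative assumptions, not NEST-C's actual algorithm.

```python
# Hedged sketch of accelerator-aware graph partitioning: consecutive ops
# with the same target device are grouped into one subgraph. Hypothetical.

NPU_SUPPORTED = {"conv", "matmul", "relu"}  # assumed NPU op coverage

def partition(graph):
    """Greedily group consecutive ops by target device (npu or cpu)."""
    parts = []
    for op in graph:
        target = "npu" if op in NPU_SUPPORTED else "cpu"
        if parts and parts[-1][0] == target:
            parts[-1][1].append(op)   # extend the current subgraph
        else:
            parts.append((target, [op]))  # start a new subgraph
    return parts

graph = ["conv", "relu", "softmax", "matmul"]
parts = partition(graph)
```

A production compiler would additionally weigh data-transfer cost between devices before committing to a cut point.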

PartitionTuner: An operator scheduler for deep-learning compilers supporting multiple heterogeneous processing units

  • Misun Yu;Yongin Kwon;Jemin Lee;Jeman Park;Junmo Park;Taeho Kim
    • ETRI Journal / Vol. 45, No. 2 / pp. 318-328 / 2023
  • Recently, embedded systems, such as mobile platforms, have multiple processing units that can operate in parallel, such as central processing units (CPUs) and neural processing units (NPUs). We can use deep-learning compilers to generate machine code optimized for these embedded systems from a deep neural network (DNN). However, the deep-learning compilers proposed so far generate code that sequentially executes DNN operators on a single processing unit, or parallel code for graphics processing units (GPUs). In this study, we propose PartitionTuner, an operator scheduler for deep-learning compilers that supports multiple heterogeneous PUs, including CPUs and NPUs. PartitionTuner can generate an operator-scheduling plan that uses all available PUs simultaneously to minimize overall DNN inference time. Operator scheduling is based on an analysis of the DNN architecture and the performance profiles of individual and grouped operators measured on the heterogeneous processing units. In experiments on seven DNNs, PartitionTuner generates scheduling plans that perform 5.03% better than a static type-based operator-scheduling technique for SqueezeNet. In addition, PartitionTuner outperforms recent profiling-based operator-scheduling techniques for ResNet50, ResNet18, and SqueezeNet by 7.18%, 5.36%, and 2.73%, respectively.
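The core idea of profile-based operator scheduling can be illustrated with a minimal sketch: given measured per-device latencies, assign each operator to its cheapest processing unit. The profile numbers are made up, and this greedy assignment ignores data-transfer overhead and inter-operator parallelism, both of which a scheduler like PartitionTuner must model.

```python
# Illustrative profile-based operator scheduling. The profile (op -> latency
# per device, in ms) is hypothetical; a real scheduler also accounts for
# transfer costs and running devices in parallel.

profile = {
    "conv1": {"cpu": 9.0, "npu": 2.0},
    "pool1": {"cpu": 1.0, "npu": 1.5},
    "fc1":   {"cpu": 4.0, "npu": 3.0},
}

def schedule(profile):
    # Pick, for each op, the device with the lowest measured latency.
    plan = {op: min(costs, key=costs.get) for op, costs in profile.items()}
    total = sum(profile[op][dev] for op, dev in plan.items())
    return plan, total

plan, total = schedule(profile)
```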

지시문을 활용한 동형암호 프로그램 코드 분할 컴파일러 (Annotation-guided Code Partitioning Compiler for Homomorphic Encryption Program)

  • 김동관;이용우;천선영;최희림;이재호;염호윤;김한준
    • 정보처리학회 논문지 (KIPS Transactions) / Vol. 13, No. 7 / pp. 291-298 / 2024
  • As cloud computing has become widely used, concern about data leakage has grown alongside it. Homomorphic encryption can address this problem by performing computations on the cloud server while the data remain encrypted. However, running an entire program under homomorphic encryption incurs large overhead. Applying homomorphic encryption to only part of the program can reduce this overhead, but manually partitioning program code is time-consuming and error-prone. This study presents Heapa, a compiler that partitions homomorphic encryption program code using directives. When the user annotates the cloud-computing regions of a program with directives, Heapa plans the communication and encryption between the cloud server and the host and generates the partitioned program. Heapa supports not only region-level directives but also operation-level directives, enabling programs to be partitioned at a finer granularity. We evaluated the compiler on six machine learning and deep learning applications; Heapa achieved a 3.61x performance improvement over conventional cloud computing with homomorphic encryption.
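The annotation-guided partitioning idea can be sketched minimally: statements tagged for the cloud are grouped into the encrypted remote section, while untagged statements stay on the host. The tag representation below is a hypothetical stand-in for Heapa's actual directive syntax, and the sketch omits the communication and encryption planning the paper describes.

```python
# Sketch of annotation-guided code partitioning. The CLOUD marker and the
# statement list are illustrative assumptions, not Heapa's real directives.

CLOUD = "cloud"  # hypothetical annotation marker

program = [
    ("load_data", None),        # plaintext, host side
    ("dot_product", CLOUD),     # runs on encrypted data, server side
    ("add_bias", CLOUD),
    ("print_result", None),
]

def partition(program):
    # Split annotated statements into host and cloud sections.
    host, cloud = [], []
    for stmt, tag in program:
        (cloud if tag == CLOUD else host).append(stmt)
    return host, cloud

host, cloud = partition(program)
```

Finer, operation-level tags (as the abstract notes) would attach markers to individual operations within a statement rather than whole statements.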