• Title/Summary/Keyword: Memory Useage

Search Result 1, Processing Time 0.018 seconds

Lightweight of ONNX using Quantization-based Model Compression (양자화 기반의 모델 압축을 이용한 ONNX 경량화)

  • Chang, Duhyeuk;Lee, Jungsoo;Heo, Junyoung
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.21 no.1
    • /
    • pp.93-98
    • /
    • 2021
  • Due to the development of deep learning and AI, the scale of the model has grown, and it has been integrated into other fields to blend into our lives. However, in environments with limited resources such as embedded devices, it is exist difficult to apply the model and problems such as power shortages. To solve this, lightweight methods such as clouding or offloading technologies, reducing the number of parameters in the model, or optimising calculations are proposed. In this paper, quantization of learned models is applied to ONNX models used in various framework interchange formats, neural network structure and inference performance are compared with existing models, and various module methods for quantization are analyzed. Experiments show that the size of weight parameter is compressed and the inference time is more optimized than before compared to the original model.