• Title/Summary/Keyword: inference acceleration (추론 가속화)

Analysis of CNN Inference Using Xilinx DPU (Xilinx DPU를 사용한 CNN 추론 분석)

  • Kim, Chaeyoung;Suh, Taeweon
    • Proceedings of the Korea Information Processing Society Conference / 2019.10a / pp.60-62 / 2019
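  • To use intelligent IoT applications effectively, the inference engine must be ported to edge devices. However, running computationally expensive CNN inference in real time is difficult in edge environments with limited computing resources. Hardware acceleration of CNN inference has therefore become necessary and is being actively researched, and vendors such as Xilinx and Intel have developed, and continue to upgrade, tools that support hardware acceleration. In this study, we ran inference on the 10,000 test images of the CIFAR-10 database on a Zynq UltraScale+ board using DPU, Xilinx's CNN inference engine, and compared and analyzed the results across DPU architectures. A DPU configured with a higher degree of parallelism used more than three times the power and resources of a DPU with a lower degree of parallelism, but delivered 1.65 times the performance, confirming a trade-off between the two.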
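The trade-off can be made concrete by dividing the two reported ratios to get relative performance per watt. This is a minimal sketch; the function name is hypothetical, and because power is reported only as "more than three times", using 3.0 yields an upper bound on the efficiency of the high-parallelism DPU.

```python
def efficiency_ratio(perf_ratio: float, power_ratio: float) -> float:
    """Performance per watt of config B relative to config A,
    given B/A ratios for performance and for power."""
    return perf_ratio / power_ratio

# Ratios from the abstract: 1.65x performance, more than 3x power.
# 3.0 is a lower bound on the power ratio, so the result is an upper bound.
upper_bound = efficiency_ratio(1.65, 3.0)
print(f"perf/W of high-parallelism DPU: at most {upper_bound:.2f}x")
```

The same division applies to resource usage, which the abstract also reports as more than three times higher.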

An Evaluation of Inference Acceleration for Drone-based Real-time Object Detection (드론 기반 실시간 객체 식별을 위한 추론 가속화 평가)

  • Kwon, Seung-Sang;Moon, Yong-Hyuk
    • Proceedings of the Korea Information Processing Society Conference / 2022.11a / pp.408-410 / 2022
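  • Demand is growing to run deep learning inference directly on edge devices, which sit closest to where data is acquired but provide only modest computing power. In this paper, we deploy a model that identifies multiple vehicle types and pedestrians in drone-captured traffic video on a Jetson Nano and measure its baseline performance. We also implement and run experiments that use TensorRT and DeepStream to maximize computational lightweighting and inference acceleration of object detection models on resource-constrained devices, and we evaluate and discuss the accuracy and real-time responsiveness of anchor-based and anchor-free object detection models.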
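Judging real-time responsiveness comes down to converting measured per-frame latency into throughput and comparing it with a target frame rate. The sketch below is illustrative only: the 30 FPS target, the helper name `summarize_latency`, and the sample latencies are hypothetical, not values from the paper, and it does not use TensorRT or DeepStream APIs.

```python
from statistics import mean

def summarize_latency(latencies_ms, target_fps=30.0):
    """Summarize per-frame inference latencies in milliseconds.

    Returns the mean latency, the throughput it implies for sequential
    single-stream processing, and whether that meets the (assumed) target.
    """
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    avg_ms = mean(latencies_ms)
    fps = 1000.0 / avg_ms
    return {"mean_ms": avg_ms, "fps": fps, "real_time": fps >= target_fps}

# Hypothetical measurements, not results from the paper.
baseline = summarize_latency([80.0, 85.0, 90.0])
accelerated = summarize_latency([30.0, 32.0, 34.0])
print(baseline)
print(accelerated)
```

Treating FPS as the reciprocal of mean latency ignores pipelining and batching, which a streaming framework can use to push throughput above that figure.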

A Study on Realtime Drone Object Detection Using On-board Deep Learning (온-보드에서의 딥러닝을 활용한 드론의 실시간 객체 인식 연구)

  • Lee, Jang-Woo;Kim, Joo-Young;Kim, Jae-Kyung;Kwon, Cheol-Hee
    • Journal of the Korean Society for Aeronautical & Space Sciences / v.49 no.10 / pp.883-892 / 2021
  • This paper presents a process for developing deep learning-based aerial object detection models that can run in real time on board. To improve detection performance, we pre-process and augment the training data during training. We also apply transfer learning and a weighted cross-entropy loss to reduce the variation in detection performance across classes. To improve inference speed, we generate inference acceleration engines with quantization. We then analyze real-time performance and detection performance on a custom aerial image dataset to verify generalization.
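The two techniques named above can be sketched in plain Python: a class-weighted cross-entropy, which gives rare classes more influence on the loss, and symmetric per-tensor INT8 quantization, a common basis for quantized inference engines. The abstract does not say how the class weights or the quantization scheme were chosen, so inverse-frequency weights and symmetric scaling are assumptions, and all function names are hypothetical.

```python
import math

def inverse_frequency_weights(class_counts):
    """Assumed weighting: w_c proportional to 1 / count_c, rescaled to average 1."""
    inv = [1.0 / c for c in class_counts]
    scale = len(inv) / sum(inv)
    return [w * scale for w in inv]

def weighted_cross_entropy(probs, labels, weights):
    """Mean weighted cross-entropy: sum_i w[y_i] * -log p_i[y_i] / sum_i w[y_i]."""
    num = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    den = sum(weights[y] for y in labels)
    return num / den

def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization; returns (integer codes, scale)."""
    max_abs = max(abs(v) for v in values) or 1.0  # avoid a zero scale
    scale = max_abs / 127.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale

def dequantize(codes, scale):
    """Map integer codes back to approximate real values."""
    return [c * scale for c in codes]

weights = inverse_frequency_weights([900, 100])  # a 9:1 class imbalance
loss = weighted_cross_entropy([[0.7, 0.3], [0.4, 0.6]], [0, 1], weights)
codes, scale = quantize_int8([1.0, -0.5, 0.25])
restored = dequantize(codes, scale)
```

With a 9:1 imbalance the minority class receives nine times the weight of the majority class, and each dequantized value differs from the original by at most half a quantization step.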

Penalized variable selection in mean-variance accelerated failure time models (평균-분산 가속화 실패시간 모형에서 벌점화 변수선택)

  • Kwon, Ji Hoon;Ha, Il Do
    • The Korean Journal of Applied Statistics / v.34 no.3 / pp.411-425 / 2021
  • An accelerated failure time (AFT) model represents a linear relationship between the log survival time and covariates. We are interested in inference on how covariates affect the variation of survival times in the AFT model, so we need to model the variance as well as the mean of survival times. We call the resulting model the mean-variance AFT (MV-AFT) model. In this paper, we propose a variable selection procedure for the regression parameters of the mean and variance in the MV-AFT model using a penalized likelihood function. For variable selection, we study four penalty functions: the least absolute shrinkage and selection operator (LASSO), adaptive lasso (ALASSO), smoothly clipped absolute deviation (SCAD), and hierarchical likelihood (HL). With this procedure, important covariates can be selected and the regression parameters estimated simultaneously. The performance of the proposed method is evaluated through simulation studies, and the method is illustrated with a clinical example dataset.
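Three of the four penalties can be written down directly; the HL penalty comes from a hierarchical-likelihood formulation and is not sketched here. The SCAD form and the default a = 3.7 follow Fan and Li (2001); the helper names, the unscaled objective (some formulations multiply the penalty sum by n), and the adaptive-lasso exponent gamma are illustrative assumptions rather than the paper's exact setup.

```python
def lasso_penalty(beta, lam):
    return lam * abs(beta)

def adaptive_lasso_penalty(beta, lam, beta_init, gamma=1.0):
    # Coefficient-specific weight 1 / |beta_init|^gamma from an initial estimate.
    return lam * abs(beta) / abs(beta_init) ** gamma

def scad_penalty(beta, lam, a=3.7):
    """SCAD: linear near zero, quadratic transition, constant for large |beta|."""
    b = abs(beta)
    if b <= lam:
        return lam * b
    if b <= a * lam:
        return (2 * a * lam * b - b * b - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2

def penalized_objective(neg_loglik, betas, lam, penalty=scad_penalty):
    """-log L + sum_j p_lam(beta_j) over mean and variance coefficients together."""
    return neg_loglik + sum(penalty(b, lam) for b in betas)
```

Because SCAD levels off for large coefficients, it shrinks small effects toward zero without over-penalizing large ones, unlike LASSO whose penalty grows without bound.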

Buffering analysis of CNN module based on RISC-V platform (RISC-V 플랫폼 기반 CNN 모듈의 버퍼링 분석)

  • Kim, Jin-Young;Lim, Seung-Ho
    • Proceedings of the Korea Information Processing Society Conference / 2021.05a / pp.9-11 / 2021
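  • AI computations are increasingly performed on embedded edge computing devices, which accelerates and decentralizes AI inference. Edge devices are built around an embedded processor and include an internal deep learning accelerator to speed up AI computation. A deep learning accelerator moves large amounts of data for complex neural network operations, so data movement and buffering between external memory and the internal accelerator must be efficient. In this study, we modeled the buffer structure inside a deep learning accelerator for edge devices and analyzed the buffering effect as a function of buffer size. The accelerator buffer structure was implemented on a RISC-V processor-based virtual platform. This makes it possible to analyze how the accelerator buffer is used by different deep learning models.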
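The effect of buffer size on external-memory traffic can be illustrated with a small simulation that counts how many block fetches a given access trace causes. The abstract does not describe the paper's buffer organization, so the LRU replacement policy, the block-level trace, and the helper name `count_misses` are all hypothetical.

```python
from collections import OrderedDict

def count_misses(trace, buffer_blocks):
    """Simulate an LRU on-chip buffer holding `buffer_blocks` blocks.
    Each miss stands for one transfer from external memory."""
    if buffer_blocks < 1:
        raise ValueError("buffer must hold at least one block")
    buf = OrderedDict()
    misses = 0
    for block in trace:
        if block in buf:
            buf.move_to_end(block)       # hit: mark as most recently used
        else:
            misses += 1                  # miss: fetch from external memory
            if len(buf) >= buffer_blocks:
                buf.popitem(last=False)  # evict the least recently used block
            buf[block] = True
    return misses

# Hypothetical trace: a layer re-reads the same 4 weight blocks for 3 input tiles.
trace = [0, 1, 2, 3] * 3
for size in (2, 4, 8):
    print(size, count_misses(trace, size))
```

A buffer smaller than the reused working set thrashes and refetches every block, while one that holds the whole set fetches each block only once; beyond that point, extra capacity gives no further benefit for this trace.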

Nonparametric Inference for Accelerated Life Testing (가속화 수명 실험에서의 비모수적 추론)

  • Kim, Tai Kyoo
    • Journal of Korean Society for Quality Management / v.32 no.4 / pp.242-251 / 2004
  • Several statistical methods are introduced to analyze accelerated failure time data. The most frequently used method is the log-linear approach with a parametric assumption. Because accelerated failure time experiments are subject to many environmental restrictions, a parametric log-linear relationship may not work well for analyzing the resulting data. The models proposed by Buckley and James (1979) and Stute (1993) can be useful when the parametric log-linear method is not applicable. These methods are introduced for an accelerated experiment under thermal acceleration and discussed through an illustrative example.
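Stute's (1993) approach fits the log-linear model by least squares with Kaplan-Meier jump weights, so censored observations get zero weight and pass their mass to later observations. The sketch below covers a single covariate and does not treat tied times specially; the helper names are hypothetical, and the iterative Buckley-James estimator is not shown.

```python
def km_weights(times, delta):
    """Kaplan-Meier jump weights for each observation.
    delta[i] = 1 if observation i is uncensored, 0 if censored."""
    n = len(times)
    order = sorted(range(n), key=lambda i: times[i])
    w = [0.0] * n
    surv = 1.0  # running product over earlier order statistics
    for k, i in enumerate(order):
        w[i] = delta[i] / (n - k) * surv
        surv *= ((n - k - 1) / (n - k)) ** delta[i]
    return w

def stute_fit(x, log_t, delta):
    """Weighted least squares for log T = a + b * x using KM weights."""
    w = km_weights(log_t, delta)
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, log_t)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, log_t))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return ybar - b * xbar, b

a, b = stute_fit([0, 1, 2, 3], [1.0, 3.0, 5.0, 7.0], [1, 1, 1, 1])
```

With no censoring every weight equals 1/n and the fit reduces to ordinary least squares.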

Accelerated reasoning method for fuzzy control (퍼지제어를 위한 가속화 추론 방법)

  • 남세규;정인수
    • Institute of Control, Robotics and Systems: Conference Proceedings / 1993.10a / pp.1058-1062 / 1993
  • A fuzzy reasoning method is proposed for implementing control systems on non-fuzzy microprocessors. The essence of the proposed method is to search the locally active rules instead of the global rule base. Reasoning is thus conveniently performed on a master cell, obtained by transforming an active fuzzy cell, which serves as a fuzzy accelerating kernel. Interpolative reasoning is simplified by adopting the algebraic product of fulfillment for the conditional connective AND and the weighted average for the rule-sentence connective ALSO.
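The two simplifications in the last sentence, product for AND and weighted average for ALSO, can be sketched with triangular membership functions and singleton rule outputs. This is not the paper's master-cell scheme: the rule base, the membership shapes, and the helper names are hypothetical, and skipping zero-weight rules only mimics the idea of restricting work to locally active rules (a real implementation would index rules by input cell instead of scanning them all).

```python
def tri(a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu

def infer(rules, x1, x2):
    """Each rule (mu_A, mu_B, z): IF x1 is A AND x2 is B THEN u = z.
    AND uses the algebraic product; ALSO uses the weighted average."""
    num = den = 0.0
    for mu_a, mu_b, z in rules:
        w = mu_a(x1) * mu_b(x2)  # degree of fulfillment
        if w > 0.0:              # only active rules contribute
            num += w * z
            den += w
    if den == 0.0:
        raise ValueError("no rule is active for this input")
    return num / den

rules = [
    (tri(-1, 0, 1), tri(-1, 0, 1), 0.0),   # hypothetical rule 1
    (tri(0, 1, 2), tri(0, 1, 2), 10.0),    # hypothetical rule 2
]
u = infer(rules, 0.5, 0.5)
```

Both steps need only multiplications, additions, and one division, which is what makes this style of interpolative reasoning cheap on a non-fuzzy microprocessor.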

Multi-DNN Acceleration Techniques for Embedded Systems with Tucker Decomposition and Hidden-layer-based Parallel Processing (터커 분해 및 은닉층 병렬처리를 통한 임베디드 시스템의 다중 DNN 가속화 기법)

  • Kim, Ji-Min;Kim, In-Mo;Kim, Myung-Sun
    • Journal of the Korea Institute of Information and Communication Engineering / v.26 no.6 / pp.842-849 / 2022
  • With the development of deep learning technology, DNNs are increasingly used in embedded systems such as unmanned vehicles, drones, and robots. In an autonomous driving system, for example, it is crucial to run several DNNs with high accuracy and large computational cost at the same time. However, running multiple DNNs simultaneously on a relatively low-performance embedded system increases inference time. This can cause the system to malfunction, because actions that depend on the inference results are not performed in time. To solve this problem, the proposed solution first reduces computation by applying Tucker decomposition to computation-heavy DNN models, and then runs the DNN models in parallel as much as possible, at hidden-layer granularity, on the GPU. Experimental results show that DNN inference time decreases by up to 75.6% compared with the case before the proposed technique is applied.
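Why Tucker decomposition reduces computation can be shown by counting multiply-accumulates (MACs) for a convolution before and after a Tucker-2 factorization into a 1x1 convolution, a smaller k x k core convolution, and another 1x1 convolution. The layer sizes and ranks below are hypothetical, stride 1 is assumed so all three stages run at the output resolution, and the paper's hidden-layer parallel scheduling on the GPU is not modeled.

```python
def conv_macs(h, w, c_in, c_out, k):
    """Multiply-accumulates for a k x k convolution producing an h x w output."""
    return h * w * c_in * c_out * k * k

def tucker2_macs(h, w, c_in, c_out, k, r_in, r_out):
    """Tucker-2 factorization: 1x1 (c_in -> r_in), k x k core (r_in -> r_out),
    1x1 (r_out -> c_out). Assumes stride 1, so every stage runs at h x w."""
    return (conv_macs(h, w, c_in, r_in, 1)
            + conv_macs(h, w, r_in, r_out, k)
            + conv_macs(h, w, r_out, c_out, 1))

orig = conv_macs(56, 56, 256, 256, 3)
fact = tucker2_macs(56, 56, 256, 256, 3, 64, 64)
print(orig, fact, round(orig / fact, 2))
```

Per output pixel the original layer needs 256 x 256 x 9 = 589,824 MACs, while the factorized version needs 16,384 + 36,864 + 16,384 = 69,632, roughly 8.5 times fewer; the accuracy cost depends on the chosen ranks.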

Classification of the Architectures of Web based Expert Systems (웹기반 전문가시스템의 구조 분류)

  • Lim, Gyoo-Gun
    • Journal of Intelligence and Information Systems / v.13 no.4 / pp.1-16 / 2007
  • With the expansion of Internet use and e-business, a growing number of studies address intelligent systems in preparation for ubiquitous environments. Expert systems have also evolved from stand-alone types to web-based client-server types, which are now used in various Internet environments. In this paper, we investigated the development environment for web-based expert systems, classified and analyzed them by type, and proposed general typical models of web-based expert systems and their architectures. We classified web-based expert systems from two perspectives. First, based on load balancing between client and server, we classified them into the Server Oriented model and the Client Oriented model. Second, based on the degree of knowledge and inference sharing, we classified them into the No Sharing, Server Sharing, Client Sharing, and Client-Server Sharing models. Combining the two perspectives yields eight types of web-based expert systems. We also analyzed where Knowledge Bases, Fact Bases, and Inference Engines should be located on the Internet, and analyzed the pros and cons, technologies, considerations, and service types of each model. The framework proposed in this study can support the development of more efficient expert systems in future environments.
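The eight types arise as the cross product of the two classification axes (2 load-balancing models times 4 sharing models). The sketch below simply enumerates them; the "axis / axis" labels are an illustrative naming, not the paper's names for the combined types.

```python
from itertools import product

LOAD_BALANCING = ("Server Oriented", "Client Oriented")
SHARING = ("No Sharing", "Server Sharing", "Client Sharing", "Client-Server Sharing")

def architecture_types():
    """Every combination of the load-balancing and sharing axes."""
    return [f"{lb} / {sh}" for lb, sh in product(LOAD_BALANCING, SHARING)]

for t in architecture_types():
    print(t)
```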

Deep Learning Based On-Device Augmented Reality System using Multiple Images (다중영상을 이용한 딥러닝 기반 온디바이스 증강현실 시스템)

  • Jeong, Taehyeon;Park, In Kyu
    • Journal of Broadcast Engineering / v.27 no.3 / pp.341-350 / 2022
  • In this paper, we propose a deep learning-based on-device augmented reality (AR) system that uses multiple input images to render correct occlusion in a real environment. The proposed system consists of three technical steps: camera pose estimation, depth estimation, and object augmentation. Each step uses mobile frameworks to optimize processing in the on-device environment. First, in the camera pose estimation stage, the heavy computation involved in feature extraction is parallelized on the GPU using OpenCL. Next, in depth estimation, monocular and multi-image depth inference is accelerated using the mobile deep learning framework TensorFlow Lite. Finally, object augmentation and occlusion handling are performed with the OpenGL ES mobile graphics framework. The proposed AR system is implemented as an Android application. We evaluate its performance in terms of augmentation accuracy and processing time in both mobile and PC environments.
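Occlusion handling with an estimated depth map reduces to a per-pixel depth test: a virtual pixel is drawn only where the virtual surface is closer to the camera than the real scene. In the paper this runs in OpenGL ES, where a depth buffer performs the test on the GPU; the pure-Python version below is a CPU sketch for illustration, assumes both depth maps use the same units and camera, and its function name is hypothetical.

```python
def composite(camera_rgb, real_depth, virtual_rgb, virtual_depth):
    """Per-pixel occlusion test on row-major images (lists of lists).
    virtual_depth holds None wherever no virtual object covers the pixel."""
    out = []
    for r in range(len(camera_rgb)):
        row = []
        for c in range(len(camera_rgb[r])):
            vd = virtual_depth[r][c]
            if vd is not None and vd < real_depth[r][c]:
                row.append(virtual_rgb[r][c])  # virtual object is in front
            else:
                row.append(camera_rgb[r][c])   # real scene occludes or no object
        out.append(row)
    return out

black, red = (0, 0, 0), (255, 0, 0)
frame = composite([[black, black]], [[2.0, 0.5]], [[red, red]], [[1.0, 1.0]])
```

In the example the virtual object at depth 1.0 appears in front of a real surface at 2.0 but is hidden behind one at 0.5, which is the occlusion behavior the depth estimation step is meant to enable.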