Search | Korea Science

AB9: A neural processor for inference acceleration

Cho, Yong Cheol Peter;Chung, Jaehoon;Yang, Jeongmin;Lyuh, Chun-Gi;Kim, HyunMi;Kim, Chan;Ham, Je-seok;Choi, Minseok;Shin, Kyoungseon;Han, Jinho;Kwon, Youngsu
- ETRI Journal, v.42 no.4, pp.491-504, 2020
We present AB9, a neural processor for inference acceleration. AB9 consists of a systolic tensor core (STC) neural network accelerator designed to accelerate artificial intelligence applications by exploiting the data reuse and parallelism inherent in neural networks while providing fast access to large on-chip memory. Complementing the hardware is an intuitive and user-friendly development environment that includes a simulator and an implementation flow, providing a high degree of programmability with a short development time. Along with a 40-TFLOP STC that includes 32k arithmetic units and over 36 MB of on-chip SRAM, our baseline implementation of AB9 consists of a 1-GHz quad-core setup with various other industry-standard peripheral intellectual properties. The acceleration performance and power efficiency were evaluated using YOLOv2, and the results show that AB9 outperforms a general-purpose graphics processing unit implementation in both performance and power efficiency. AB9 has been taped out in the TSMC 28-nm process with a chip size of 17 × 23 mm². Delivery is expected later this year.
https://doi.org/10.4218/etrij.2020-0134

Trends in AI Processor Technology (인공지능프로세서 기술 동향)

Lee, M.Y.;Chung, J.;Lee, J.H.;Han, J.H.;Kwon, Y.S.
- Electronics and Telecommunications Trends, v.35 no.3, pp.66-75, 2020
As rising expectations for practical artificial intelligence (AI) services make AI algorithms more complex, efficient processors for running these algorithms are required. To meet this requirement, processors optimized for parallel processing, such as graphics processing units (GPUs), have been widely employed. However, the GPU has a generalized structure intended for various applications, so it is not optimized for AI algorithms. Therefore, research on processors optimized for AI algorithm processing has been actively conducted. This paper briefly introduces an AI processor for inference acceleration developed by the Electronics and Telecommunications Research Institute, South Korea, along with processors from other global vendors for mobile and server platforms.
https://doi.org/10.22648/ETRI.2020.J.350307

Acceleration of 2D Image Based Flow Visualization using GPU (GPU를 이용한 2차원 영상 기반 유동 가시화 기법의 가속)

Lee, Joong-Youn
- Proceedings of the Korea Contents Association Conference, 2007.11a, pp.543-546, 2007
Flow visualization is a visualization technique that expresses vector data visually using 2D or 3D graphics. It aims to help people easily find and understand distinctive features of the vector data. Image Based Flow Visualization (IBFV) is one of the fastest dense, integration-based flow visualization techniques. In this paper, IBFV is accelerated and implemented on a commodity GPU. In particular, mesh advection is accelerated in the vertex program.

Trends of Hardware Acceleration Technology in Web Browser (HW 가속 기반 웹 고속화 기술동향)

Lee, J.H.;Cho, H.W.;Kim, D.H.;Lee, H.S.;Yoon, S.J.;Ryu, C.;Cho, C.S.
- Electronics and Telecommunications Trends, v.31 no.4, pp.65-76, 2016
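The platform-independent web, which does not depend on any particular manufacturer's device or operating system, offers advantages in portability, software reuse, development productivity, a large developer base, and maintenance. However, because its performance is lower than that of native applications with rich UI/UX, the development and adoption of web-based applications has not taken off. Meanwhile, higher hardware specifications, such as multicore graphics processing units (GPUs) and CPUs in desktops as well as mobile devices, together with standards that let web applications use hardware acceleration, have made it possible to overcome this performance limitation. This article reviews GPU development trends and summarizes the Khronos Group standards that let web applications requiring fast rendering and parallel computation use GPU-based hardware acceleration: Web Graphics Library (WebGL) for graphics acceleration and Web Computing Language (WebCL) for computing. It also examines Vulkan, recently announced as a next-generation GPU application programming interface (API), and considers its applicability to web acceleration technology.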

Acceleration Hardware Technology of 3D Graphics (3D 그래픽스 가속 하드웨어 기술)

Cho, S.H.;Park, S.M.;Eum, N.W.
- Electronics and Telecommunications Trends, v.22 no.5, pp.69-77, 2007
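The remarkable growth of the 3D graphics industry has been built on advances in GPU technology. GPUs have moved beyond the fixed-function pipeline to a programmable form, and their programmability and performance continue to improve steadily. Recent research addresses the imbalance of computational load inside the GPU and the use of GPU computing power in other application areas. Industry-standard APIs exist for developing 3D graphics applications on GPUs, and mobile-device profiles, which simplify the desktop APIs by keeping only the essential functions, are also being defined. GPUs for mobile devices are likewise evolving toward programmable architectures, and efforts to reduce power consumption are needed for them to become widespread.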
https://doi.org/10.22648/ETRI.2007.J.220507

Acceleration of Radial Gradient Paint Processor for Mobile Device (모바일 기기에서의 방사형 그라디언트 페인트 가속)

Kim, Jin-Woo;Park, Jin-Hong;Han, Tack-Don
- Proceedings of the Korea Information Processing Society Conference, 2011.04a, pp.530-533, 2011
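Radial gradient paint is a method in vector graphics that can apply a variety of effects with little information. Because it fundamentally requires complex operations such as multiplication, division, and square roots, it was not suited to low-performance environments such as mobile devices. Recently, however, mobile device performance has improved through SIMD support and high-performance GPUs, making it possible to solve this problem. This paper achieves a speedup of up to 2.6 times using NEON, ARM's SIMD extension, and a speedup of 4.9 times using GPU shaders.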
https://doi.org/10.3745/PKIPS.y2011m04a.530
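
To illustrate why the abstract above calls radial gradient paint arithmetic-heavy, here is a minimal scalar sketch of the per-pixel work. It is not the paper's NEON or shader implementation. It assumes a simple gradient with a single center, no focal point, and clamped ("pad") spreading; the function names `radial_gradient_t`, `lerp_color`, and `paint_pixel` are hypothetical.

```python
import math


def radial_gradient_t(x, y, cx, cy, radius):
    """Gradient parameter in [0, 1] for pixel (x, y); clamps beyond the radius."""
    # One square root and one division per pixel: the costly operations
    # that the abstract mentions.
    dist = math.sqrt((x - cx) ** 2 + (y - cy) ** 2)
    return min(dist / radius, 1.0)


def lerp_color(c0, c1, t):
    """Linearly interpolate two RGB tuples, rounding to integer channels."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c0, c1))


def paint_pixel(x, y, cx, cy, radius, inner, outer):
    """Color of one pixel for a two-stop radial gradient (assumed model)."""
    return lerp_color(inner, outer, radial_gradient_t(x, y, cx, cy, radius))
```

Every pixel repeats the same independent computation, which is the property that makes the workload a natural fit for SIMD lanes or a fragment shader.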

Position Based Triangulation for High Performance Particle Based Fluid Simulation (위치 기반 삼각화를 이용한 입자 기반 유체 시뮬레이션 가속화 기법)

Hong, Manki;Im, Jaeho;Kim, Chang-Hun;Byun, Hae Won
- Journal of the Korea Computer Graphics Society, v.23 no.1, pp.25-32, 2017
This paper proposes a novel acceleration method for large-scale particle-based fluid simulation. Traditional particle-based fluid simulation has been implemented by having each particle interact with the physical quantities of its neighbor particles through the Smoothed Particle Hydrodynamics (SPH) technique [1]. The SPH method has the drawback that, in regions where particles move little, such as a calm surface or the interior of the fluid, the visible change is small relative to the amount of computation. This becomes more prominent as the number of particles increases. Previous work has attempted to reduce this wasted computation by adaptively dividing the fluid into parts. In this paper, we sample the particles inside the fluid at regular intervals, use them as reference points of the fluid motion, and compute the motion of all particles from the physical quantities of nearby sampled particles. To search for nearby particles efficiently, we adaptively generate a triangle map based on the positions of the sampled particles, and we interpolate the physical quantities of particles using the barycentric coordinate system. The proposed acceleration technique does not perform any additional correction for the two classes of fluid particles. Our technique shows a large improvement in speed as the number of particles increases. It also does not interfere with the fine movement of the fluid surface particles.
https://doi.org/10.15701/kcgs.2017.23.1.25
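
The barycentric interpolation mentioned in the abstract above can be sketched as follows. This is a generic 2D illustration rather than the authors' code. It assumes that a particle lies inside a triangle whose three vertices are sampled particles carrying a scalar quantity; `barycentric_weights` and `interpolate` are hypothetical names.

```python
def barycentric_weights(p, a, b, c):
    """Barycentric weights of 2D point p with respect to triangle (a, b, c)."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    if det == 0:
        raise ValueError("degenerate triangle")
    wa = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    wb = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    # The three weights always sum to 1.
    return wa, wb, 1.0 - wa - wb


def interpolate(p, triangle, values):
    """Blend the quantities stored at the triangle's vertices at point p."""
    wa, wb, wc = barycentric_weights(p, *triangle)
    return wa * values[0] + wb * values[1] + wc * values[2]
```

For a point inside the triangle all three weights are non-negative, so the result is a convex combination of the sampled values. The same weights can be applied per component to vector quantities such as velocity.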

A study on the ZF-buffer algorithm for Ray-tracing Acceleration (광선추적법의 속도개선을 위한 ZF-버퍼 알고리즘 연구)

Kim, Sehyun;Yoon, Kyung-hyun
- Journal of the Korea Computer Graphics Society, v.6 no.1, pp.29-36, 2000
In this work, we propose the ZF-buffer algorithm to accelerate the intersection test of the ray-tracing algorithm. The ZF-buffer is used in the preprocessing stage of ray tracing and records a pointer to the parent face of each depth value (z value) of an object determined in the Z-buffer. As a result, the face that the first ray intersects can be determined easily by using the pointer stored in the F-buffer. Although the ZF-buffer and the vista-buffer resemble each other, the difference is that the ZF-buffer records not a bounding volume but a pointer to a displayable face. We applied the ZF-buffer algorithm for the first ray to the Utah teapot, which consists of 9216 polygons. Comparing elapsed times, our method is 3 times faster than the vista-buffer algorithm. We also extended our algorithm to the second ray.

Geometric Processing for Freeform Surfaces Based on High-Precision Torus Patch Approximation (토러스 패치 기반의 정밀 근사를 이용한 자유곡면의 기하학적 처리)

Park, Youngjin;Hong, Q Youn;Kim, Myung-Soo
- Journal of the Korea Computer Graphics Society, v.25 no.3, pp.93-103, 2019
We introduce a geometric processing method for freeform surfaces based on high-precision torus patch approximation, a new spatial data structure for efficient geometric operations on freeform surfaces. A torus patch fits the freeform surface flexibly: it can handle not only positive and negative curvature but also zero curvature, so the surface can be approximated precisely regardless of its convexity or concavity. Unlike the traditional method, a torus patch easily bounds the surface normal, and the offset of a torus is again a torus, which helps accelerate various geometric operations. We show that the torus patches approximate the freeform surface with high accuracy by measuring the upper bound of the two-sided Hausdorff distance between the freeform surface and the set of torus patches. Using this method, it is easy to detect the intersection curve between two freeform surfaces and to find the offset surface of a freeform surface.
https://doi.org/10.15701/kcgs.2019.25.3.93

Quad Tree Based 2D Smoke Super-resolution with CNN (CNN을 이용한 Quad Tree 기반 2D Smoke Super-resolution)

Hong, Byeongsun;Park, Jihyeok;Choi, Myungjin;Kim, Changhun
- Journal of the Korea Computer Graphics Society, v.25 no.3, pp.105-113, 2019
Physically based fluid simulation takes a long time at high resolution. To solve this problem, some studies compensate for the limitations of low-resolution fluid simulation by using deep learning. Among them, research on super-resolution, which converts low-resolution simulation data to high resolution, is under way. However, traditional techniques require computation over the entire space, including regions where there are no density data, so they are inefficient in terms of overall simulation speed and cannot run when GPU memory is insufficient as the input resolution increases. In this paper, we propose a new method that divides and classifies 2D smoke simulation data spatially using a quad tree, a spatial partitioning method, and performs super-resolution only on the required space. This technique accelerates the simulation by computing only the necessary space. It also processes the input data in divided pieces, which can solve the GPU memory problem.
https://doi.org/10.15701/kcgs.2019.25.3.105
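
The quad tree partitioning step described in the abstract above can be sketched as follows. This is an illustrative simplification, not the authors' method. It assumes a square density grid whose side is a power of two, and it treats a tile as "required" when it contains any non-zero density; the function `occupied_tiles` and its parameters are hypothetical.

```python
def occupied_tiles(density, x0, y0, size, min_size):
    """Return (x, y, size) leaf tiles that contain any non-zero density.

    density is a list of rows (density[y][x]); the region examined is the
    size-by-size square whose top-left corner is (x0, y0).
    """
    has_smoke = any(
        density[y][x] > 0
        for y in range(y0, y0 + size)
        for x in range(x0, x0 + size)
    )
    if not has_smoke:
        return []  # empty region: skipped entirely
    if size <= min_size:
        return [(x0, y0, size)]  # small enough to hand to the network
    half = size // 2
    tiles = []
    for dy in (0, half):
        for dx in (0, half):
            tiles += occupied_tiles(density, x0 + dx, y0 + dy, half, min_size)
    return tiles
```

Only the returned tiles would then be passed to the super-resolution network, so empty regions cost no inference time and each input stays small enough to fit in GPU memory.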
