• Title/Summary/Keyword: NVIDIA

The Oscillation Frequency of CML-based Multipath Ring Oscillators

  • Song, Sanquan;Kim, Byungsub;Xiong, Wei
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.15 no.6
    • /
    • pp.671-677
    • /
    • 2015
  • This paper describes a novel phase interpolator (PI) based linear model of the multipath ring oscillator (MPRO). By modeling each delay cell as an ideal summer followed by a single-pole RC filter, the oscillation frequency is derived for a 4-stage differential MPRO. It is proved analytically that the oscillation frequency increases with the forwarding factor ${\alpha}$, which is also confirmed quantitatively through simulation. Based on the proposed model, it is shown that the power-to-frequency ratio remains constant as the speed increases. Running at the same speed, a 4-stage MPRO can outperform the corresponding single-path ring oscillator (SPRO) with a 27% power saving, making an MPRO with a large forwarding factor ${\alpha}$ an attractive option for low-power applications.

Numerical Computing on Graphics Hardware

  • 임인성
    • 한국가시화정보학회:학술대회논문집
    • /
    • 2004.04a
    • /
    • pp.57-63
    • /
    • 2004
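  • Graphics accelerators from ATI, NVIDIA, and other vendors that ship in today's general-purpose PCs are incomparably faster than those of a few years ago. Alongside this speedup, one of the most rapid changes under way is the emergence of the programmable graphics pipeline, which, unlike the traditional fixed-function graphics pipeline, lets programmers freely program the accelerator's functionality. The GPU (Graphics Processing Unit) on these accelerators can be regarded as a simple form of SIMD processor. Because the pixel shader, one part of the GPU, offers very high processing speed, there are active attempts to use it to parallelize existing numerical algorithms. This talk briefly surveys attempts to solve various numerical computations using graphics accelerators.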

Characterization of one Time-Sequential Stereoscopic 3D Display - Part I: Temporal Analysis -

  • Pierre, Boher;Thierry, Leroux;Collomb-Patton, Veronique
    • Journal of Information Display
    • /
    • v.11 no.2
    • /
    • pp.57-62
    • /
    • 2010
  • A method of characterizing time-sequential stereoscopic 3D displays based on measuring the temporal behavior of the system as a function of grey level is proposed. An Nvidia 3D Vision kit with a 3D-ready SAMSUNG 2233RZ LCD display is characterized in the paper. OPTISCOPE SA, designed specifically for precise measurement of the luminance and temporal behavior of LCD displays, was used. The transmittance and response time of the shutter glasses were evaluated first. Then the grey-to-grey response times of the display were measured, and the 2D and 3D behaviors of the display were compared. Finally, the temporal behavior of the complete system was modeled, and the grey-level variations on one view were deduced as a function of the synchronization and the grey level of the other eye. The main sources of imperfection were identified and quantified, and a full computation of the system performance was carried out.

The GPU-based Parallel Processing Algorithm for Fast Inspection of Semiconductor Wafers (반도체 웨이퍼 고속 검사를 위한 GPU 기반 병렬처리 알고리즘)

  • Park, Youngdae;Kim, Joon Seek;Joo, Hyonam
    • Journal of Institute of Control, Robotics and Systems
    • /
    • v.19 no.12
    • /
    • pp.1072-1080
    • /
    • 2013
  • Many vision inspection techniques are currently used in industrial production. In the semiconductor industry in particular, the vision inspection system for wafers is very important. Inspection techniques for semiconductor wafer production must also deliver both high precision and high speed. To achieve these objectives, parallel processing of the inspection algorithm is essential. In this paper, we propose a GPU (Graphics Processing Unit)-based parallel processing algorithm for the fast inspection of semiconductor wafers. The proposed algorithm is implemented on GPU boards made by NVIDIA. The defect detection performance of the proposed algorithm on the GPU is the same as that of a single-CPU implementation, but its execution is about 210 times faster than on a single CPU.

Computationally Efficient Implementation of a Hamming Code Decoder Using Graphics Processing Unit

  • Islam, Md Shohidul;Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of Communications and Networks
    • /
    • v.17 no.2
    • /
    • pp.198-202
    • /
    • 2015
  • This paper presents a computationally efficient implementation of a Hamming code decoder on a graphics processing unit (GPU) to support real-time software-defined radio, which is a software alternative for realizing wireless communication. The Hamming code algorithm is challenging to parallelize effectively on a GPU because it works on sparsely located data items with several conditional statements, leading to non-coalesced, long-latency global memory accesses and heavy thread divergence. To address these issues, we propose an optimized implementation of the Hamming code on the GPU that exploits the parallelism inherent in the algorithm. Experimental results using a compute unified device architecture (CUDA)-enabled NVIDIA GeForce GTX 560 with 335 cores revealed that the proposed approach achieved a 99x speedup versus the equivalent CPU-based implementation.
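The abstract does not state which Hamming code parameters the authors use. As a minimal sketch, assuming the common (7,4) code with parity bits at positions 1, 2, and 4, syndrome decoding of a single codeword could look like the following. The function names are hypothetical, and this is plain sequential Python rather than the paper's CUDA implementation. Each codeword is decoded independently of the others, which is what makes a one-thread-per-codeword mapping plausible; the conditional correction step is an example of the branching the abstract mentions.

```python
def hamming74_encode(data):
    """Encode 4 data bits into a 7-bit codeword (assumed (7,4) layout).

    Positions are 1-based: parity at 1, 2, 4 and data at 3, 5, 6, 7.
    """
    d3, d5, d6, d7 = data
    p1 = d3 ^ d5 ^ d7  # covers positions 1, 3, 5, 7
    p2 = d3 ^ d6 ^ d7  # covers positions 2, 3, 6, 7
    p4 = d5 ^ d6 ^ d7  # covers positions 4, 5, 6, 7
    return [p1, p2, d3, p4, d5, d6, d7]


def hamming74_decode(codeword):
    """Correct at most one flipped bit and return the 4 data bits."""
    bits = list(codeword)
    # The syndrome is the XOR of the 1-based positions of all set bits.
    # It is 0 for a valid codeword, otherwise it names the flipped position.
    syndrome = 0
    for position, bit in enumerate(bits, start=1):
        if bit:
            syndrome ^= position
    if syndrome:  # data-dependent branch: only corrupted words take it
        bits[syndrome - 1] ^= 1
    return [bits[2], bits[4], bits[5], bits[6]]


def decode_all(codewords):
    """Decode a batch; every item is independent of the others."""
    return [hamming74_decode(word) for word in codewords]
```

A single-bit error in any of the seven positions is corrected; two or more errors in one codeword are not detected by this sketch.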

A Study on demosaicking using DCGAN (DCGAN을 활용한 디모자이킹에 관한 연구)

  • Jang, Young-chae;Anisetti, Marco;Jeon, Gwanggil
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2018.10a
    • /
    • pp.792-794
    • /
    • 2018
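  • This study identifies the problems of conventional methods, which generally attempt color reconstruction by exploiting the high correlation among the R, G, and B color planes, and introduces an approach to demosaicking using DCGAN. About 2,000 $256{\times}256$ images were used as training data. For better results, a separate network was constructed and trained for each of the R, G, and B color channels. The proposed method was run on a laptop with an Intel Core i7-7770 CPU (3.60GHz), 16GB of memory, and an NVIDIA GeForce GTX1080Ti, and achieves an average PSNR of about 22.5dB.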
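For readers unfamiliar with the PSNR figure quoted above, a minimal sketch of the standard definition follows. It assumes 8-bit samples (peak value 255) and images flattened to equal-length sequences; the abstract does not say how the authors computed or averaged the metric, so the function name and defaults are illustrative only.

```python
import math


def psnr(reference, restored, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    if len(reference) != len(restored) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all samples.
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak * peak / mse)
```

Under the 8-bit assumption, 22.5 dB corresponds to a mean squared error of about 366, or roughly 19 grey levels of RMS error.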

Matrix Addition & Scalar Multiplication on the GPU (GPU 기반 행렬 덧셈 및 스칼라 곱셈 알고리즘)

  • Park, Sangkun
    • Journal of Institute of Convergence Technology
    • /
    • v.8 no.1
    • /
    • pp.15-20
    • /
    • 2018
  • GPUs have recently acquired the programmability to perform general-purpose computation quickly by running thousands of threads concurrently. This paper presents a parallel GPU algorithm for dense matrix-matrix addition and scalar multiplication using the OpenGL compute shader. These operations are fundamental building blocks for many high-performance computing applications. Experimental results on an NVIDIA Quadro 4000 show that the proposed algorithm runs 21 times faster than a CPU algorithm and achieves 16 GFLOPS in single precision for dense matrices of size 4,096. This performance suggests that the algorithm is practical for real applications.
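Both operations are element-wise, so each output element can be computed by its own thread. The sketch below emulates that mapping in sequential Python, assuming matrices stored as flat row-major lists and one kernel invocation per element. The names `add_kernel`, `scale_kernel`, and `dispatch` are hypothetical, and the paper's actual compute-shader code and work-group layout are not given in the abstract.

```python
def add_kernel(gid, a, b, out):
    """Work done by one invocation: add a single pair of elements."""
    out[gid] = a[gid] + b[gid]


def scale_kernel(gid, scalar, a, out):
    """Work done by one invocation: scale a single element."""
    out[gid] = scalar * a[gid]


def dispatch(kernel, count, *args):
    """Stand-in for a GPU dispatch: run the kernel once per global index.

    On a GPU these invocations would run concurrently; here they run in a
    loop, which is valid because no invocation reads another's output.
    """
    for gid in range(count):
        kernel(gid, *args)


def matrix_add(a, b):
    out = [0.0] * len(a)
    dispatch(add_kernel, len(a), a, b, out)
    return out


def matrix_scale(scalar, a):
    out = [0.0] * len(a)
    dispatch(scale_kernel, len(a), scalar, a, out)
    return out
```

For example, `matrix_add([1.0, 2.0], [3.0, 4.0])` treats its arguments as flattened matrices of the same shape and returns the flattened sum.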

State-of-the-Art AI Computing Hardware Platform for Autonomous Vehicles (자율주행 인공지능 컴퓨팅 하드웨어 플랫폼 기술 동향)

  • Suk, J.H.;Lyuh, C.G.
    • Electronics and Telecommunications Trends
    • /
    • v.33 no.6
    • /
    • pp.107-117
    • /
    • 2018
  • In recent years, with the development of autonomous driving technology, high-performance artificial intelligence computing hardware platforms have been developed that can handle multi-sensor data processing, object recognition, and vehicle control for autonomous vehicles. Most of these hardware platforms have been developed overseas, such as NVIDIA's DRIVE PX, Audi's zFAS, Intel GO, Mobileye's EyeQ, and Baidu's Apollo Pilot. In Korea, ETRI's artificial intelligence computing platform has been developed. In this paper, we discuss the specifications, structure, performance, and development status of hardware platforms that support autonomous driving, rather than autonomous driving technology as a whole.

Performance Comparison of Parallel Programming Frameworks in Digital Image Transformation

  • Shin, Woochang
    • International Journal of Internet, Broadcasting and Communication
    • /
    • v.11 no.3
    • /
    • pp.1-7
    • /
    • 2019
  • Previously, parallel computing was mainly used in areas requiring high computing performance, but multicore CPUs and GPUs are now widespread, and the advantages of parallel programming can be obtained even in a PC environment. Various parallel programming frameworks for multicore CPUs, such as OpenMP and PPL, have been released. Nvidia and AMD have developed parallel programming platforms and APIs that let program developers take advantage of the multicore GPUs on their graphics cards. In this paper, we develop digital image transformation programs that run on each of the major parallel programming frameworks and measure their execution times. We analyze the characteristics of each framework by comparing the execution times. We also present a constant K that indicates the ratio of program execution time between different parallel computing environments. Using this constant, it is possible to roughly predict execution time without implementing a parallel program.
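The abstract does not give the exact definition of K. As a rough sketch, assuming K is simply the ratio of measured execution times for the same workload in two environments, and assuming that ratio carries over approximately to other workloads, the prediction step might look like this. The function names are hypothetical.

```python
def ratio_k(time_in_baseline, time_in_target):
    """Assumed definition: K = target time / baseline time, same workload."""
    if time_in_baseline <= 0:
        raise ValueError("baseline time must be positive")
    return time_in_target / time_in_baseline


def predict_time(baseline_time_new_workload, k):
    """Rough estimate of the target-environment time for a new workload."""
    return k * baseline_time_new_workload
```

With a calibration run that takes 8.0 s in the baseline environment and 2.0 s in the target, `ratio_k(8.0, 2.0)` gives K = 0.25, and a new workload measured at 40.0 s in the baseline would be estimated at 10.0 s in the target.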

Onboard dynamic RGB-D simultaneous localization and mapping for mobile robot navigation

  • Canovas, Bruce;Negre, Amaury;Rombaut, Michele
    • ETRI Journal
    • /
    • v.43 no.4
    • /
    • pp.617-629
    • /
    • 2021
  • Although current visual simultaneous localization and mapping (SLAM) algorithms provide highly accurate tracking and mapping, most are too heavy to run live on embedded devices. In addition, the maps they produce are often unsuitable for path planning. To mitigate these issues, we propose a completely closed-loop online dense RGB-D SLAM algorithm targeting autonomous indoor mobile robot navigation tasks. The proposed algorithm runs live on an NVIDIA Jetson board embedded on a two-wheel differential-drive robot. It exhibits lightweight three-dimensional mapping, room-scale consistency, accurate pose tracking, and robustness to moving objects. Further, we introduce a navigation strategy based on the proposed algorithm. Experimental results demonstrate the robustness of the proposed SLAM algorithm, its computational efficiency, and its benefits for on-the-fly navigation while mapping.