• Title/Summary/Keyword: CPU Time

Three-dimensional Wave Propagation Modeling using OpenACC and GPU (OpenACC와 GPU를 이용한 3차원 파동 전파 모델링)

  • Kim, Ahreum;Lee, Jongwoo;Ha, Wansoo
    • Geophysics and Geophysical Exploration / v.20 no.2 / pp.72-77 / 2017
  • We calculated 3D frequency- and Laplace-domain wavefields using time-domain modeling followed by a Fourier or Laplace transform. We adopted OpenACC and a GPU for efficient parallel calculation. OpenACC makes it easy to use GPU accelerators by adding directives to programs written in conventional C, C++, and Fortran. Accordingly, one does not have to learn a new GPGPU programming language such as CUDA or OpenCL to use a GPU. An OpenACC program allocates GPU memory, transfers data between the host CPU and GPU devices, and performs GPU operations either automatically or following user-defined directives. Through numerical tests, we compared the performance of 3D wave propagation modeling programs using OpenACC and a GPU with that of programs using a single-core CPU. Results for a homogeneous model and the SEG/EAGE salt model show that the OpenACC programs are approximately 53 and 30 times faster, respectively, than those using a single-core CPU.

Analysis of Tensor Processing Unit and Simulation Using Python (텐서 처리부의 분석 및 파이썬을 이용한 모의실행)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.19 no.3 / pp.165-171 / 2019
  • Studies of computer architecture have shown that major improvements in price-to-energy performance stem from domain-specific hardware development. This paper analyzes the tensor processing unit (TPU), an ASIC that accelerates artificial neural network (NN) inference. The core components of the TPU are a MAC matrix multiplier capable of high-speed operation and a software-managed on-chip memory. The TPU execution model meets the response-time requirements of artificial neural networks better than existing CPU and GPU execution models, with a small area and low power consumption even though it has many MACs and a large memory. On the TensorFlow benchmark framework, the TPU achieves higher performance and better power efficiency than the CPU or GPU. In this paper, we analyze the TPU, simulate OpenTPU, which is modeled in Python, and synthesize the matrix multiplication unit, which is the key hardware.

Parallel Algorithm of Conjugate Gradient Solver using OpenGL Compute Shader

  • Va, Hongly;Lee, Do-keyong;Hong, Min
    • Journal of the Korea Society of Computer and Information / v.26 no.1 / pp.1-9 / 2021
  • The OpenGL compute shader is a shader stage that operates differently from the other shader stages and can be used to perform arbitrary computations on data in parallel. This paper proposes a GPU-based parallel algorithm for solving sparse linear systems with the conjugate gradient method, an iterative method, in which the calculations are performed in an OpenGL compute shader. The solver targets large sparse linear systems whose matrices are symmetric positive definite. Four well-known matrix formats (Dense, COO, ELL, and CSR) are used for matrix storage. A performance comparison on eight sparse matrices shows that the GPU-based solver is much faster than the CPU-based solver, with best average computing times of 0.64 ms for the GPU-based solver and 15.37 ms for the CPU-based solver.
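
As a rough illustration of the technique named in this abstract, the following is a minimal serial sketch of the conjugate gradient method on a matrix stored in CSR format. It is not the authors' compute-shader implementation: the function names (`csr_matvec`, `conjugate_gradient`), the tolerance, and the small test matrix are assumptions chosen for illustration, and a GPU version would parallelize the matrix-vector product and the dot products.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Multiply a CSR matrix (values, col_idx, row_ptr) by a dense vector x."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        # Non-zeros of this row live in values[row_ptr[row]:row_ptr[row + 1]]
        for k in range(row_ptr[row], row_ptr[row + 1]):
            y[row] += values[k] * x[col_idx[k]]
    return y


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def conjugate_gradient(values, col_idx, row_ptr, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A given in CSR form."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x, with the initial guess x = 0
    p = list(r)            # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = csr_matvec(values, col_idx, row_ptr, p)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Hypothetical 3x3 SPD example: [[4, 1, 0], [1, 3, 0], [0, 0, 2]]
values = [4.0, 1.0, 1.0, 3.0, 2.0]
col_idx = [0, 1, 0, 1, 2]
row_ptr = [0, 2, 4, 5]
b = [1.0, 2.0, 4.0]
x = conjugate_gradient(values, col_idx, row_ptr, b)
print(x)
```

The two reported times also imply a speedup of roughly 15.37 / 0.64 ≈ 24 for the best average case.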

Development of the sediment transport model using GPU arithmetic (GPU 연산을 활용한 유사이송 예측모형 개발)

  • Noh, Junsu;Son, Sangyoung
    • Journal of Korea Water Resources Association / v.56 no.7 / pp.431-438 / 2023
  • Many shorelines are facing beach erosion. Given climate change and the growth of coastal populations, the erosion problem could accelerate. To address this issue, it is crucial to develop a sediment transport model that rapidly predicts terrain change. In this study, we introduce a sediment transport model based on GPU parallel arithmetic, intended to simulate terrain change well and at a higher computing speed than a CPU-based model. We also investigate the model performance and the GPU computational efficiency. We applied several dam-break cases to verify the model and found that the simulated results were close to the observed results. The computational efficiency of the GPU was evaluated by comparing its operation time with that of the CPU-based model, which showed that the GPU-based model was more efficient than the CPU-based model.

An Application-Specific and Adaptive Power Management Technique for Portable Systems (휴대장치를 위한 응용프로그램 특성에 따른 적응형 전력관리 기법)

  • Egger, Bernhard;Lee, Jae-Jin;Shin, Heon-Shik
    • Journal of KIISE:Computer Systems and Theory / v.34 no.8 / pp.367-376 / 2007
  • In this paper, we introduce an application-specific and adaptive power management technique for portable systems that support dynamic voltage scaling (DVS). We exploit both the idle time of multitasking systems running soft real-time tasks and memory- or CPU-bound code regions. Detailed power and execution time profiles guide an adaptive power manager (APM) that is linked to the operating system. A post-pass optimizer marks candidate regions for DVS by inserting calls to the APM. At runtime, the APM monitors the CPU's performance counters to dynamically determine the affinity of each marked region. For each region, the APM computes the optimal voltage and frequency setting in terms of energy consumption and switches the CPU to that setting while the region executes. Idle time is exploited by monitoring system idle time and switching to the most energy-efficient setting that does not prolong execution. We show that our method is most effective for periodic workloads such as video or audio decoding. We have implemented our method in a multitasking operating system (Microsoft Windows CE) running on an Intel XScale processor. We achieved total system power savings of up to 9% over the standard power management policy, which puts the CPU in a low-power mode during idle periods.

A real-time acoustic echo canceller implemented on the multimedia PC (멀티미디어 PC상에 구현된 실시간 음향 반향제거기)

  • Cha, Youn-Cheul;Yoo, Jae-Ha;Youn, Dae-Hee
    • Journal of the Korean Institute of Telematics and Electronics S / v.35S no.11 / pp.184-193 / 1998
  • In this paper, a real-time acoustic echo canceller is implemented using only the PC's CPU, without extra help from a DSP chip. The adaptive digital filter is designed efficiently so that it runs in real time while providing adequate cancellation performance. A new double-talk detector is proposed that has low computational complexity and guarantees fast detection and robust operation. The real-time acoustic echo canceller runs on a 166 MHz Pentium PC with a full-duplex sound card and requires less than 10% of the CPU time.
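
The abstract does not say which adaptive algorithm the filter uses. As a hedged sketch only, the example below assumes a normalized LMS (NLMS) FIR filter, a common choice for acoustic echo cancellation, and omits the double-talk detector. The function name `nlms_echo_canceller`, the step size, the tap count, and the simulated echo path are all hypothetical.

```python
import random


def nlms_echo_canceller(far_end, microphone, num_taps=4, step=0.5, eps=1e-8):
    """Return (weights, errors) for an NLMS adaptive FIR filter (assumed algorithm)."""
    weights = [0.0] * num_taps
    history = [0.0] * num_taps          # recent far-end samples, newest first
    errors = []
    for x, d in zip(far_end, microphone):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate           # echo-cancelled output sample
        power = sum(h * h for h in history) + eps
        # Normalized update: step size scaled by the input power
        weights = [w + step * e * h / power for w, h in zip(weights, history)]
        errors.append(e)
    return weights, errors


random.seed(0)
echo_path = [0.6, -0.3, 0.1, 0.05]      # hypothetical echo impulse response
far_end = [random.uniform(-1.0, 1.0) for _ in range(2000)]
# Simulated microphone signal: the far-end signal filtered by the echo path
microphone = [
    sum(echo_path[k] * far_end[n - k] for k in range(len(echo_path)) if n - k >= 0)
    for n in range(len(far_end))
]
weights, errors = nlms_echo_canceller(far_end, microphone)
print(weights)
```

In this noise-free simulation the filter weights should approach the simulated echo path, so the residual echo in `errors` decays toward zero. A real canceller would also need to freeze adaptation during double talk.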

ART : An Implementation on the Active_object RunTime Systems Applicable for the Embedded Systems (ART : 임베디드 시스템에 적용 가능한 능동객체 실행시간 지원 시스템의 구현)

  • Park, Yoon-Young;Lim, Dong-Sun;Jung, Bu-Geum;Lee, Kyung-Oh;Park, Jung-Ho
    • The KIPS Transactions:PartA / v.10A no.4 / pp.295-304 / 2003
  • An active object is an independent runnable unit that is scheduled by the CPU at creation time. In this paper, we define the active object and propose ART (Active object RunTime support system), which controls the creation and execution of active objects. ART provides users with location transparency and supports an easy method-call mechanism. We also designed a communication model among active objects and implemented a communication method that makes distributed programming possible. The target platform for ART is an embedded system that has only limited resources and runs in a distributed computing environment.

A study on game physics engine focused on real time physics (물리 엔진에 관한 고찰 : 실시간 물리 기술을 중심으로)

  • Ha, You-Jong;Park, Kyoung-Ju
    • Journal of Korea Game Society / v.9 no.5 / pp.43-52 / 2009
  • This paper analyzes four game physics engines in terms of real-time techniques. Real-time physics is the technology that simplifies physics-based simulation so that it can be applied to real-time applications such as games. Our study covers two commercial physics engines, Havok's Physics SDK and NVIDIA's PhysX SDK, and two open source projects, Open Dynamics Engine and the Bullet physics engine. We find that most of them cover rigid body dynamics, and some include deformable body simulation, fluid simulation, or both. For real-time simulation, they adopt simplified numerical methods and efficient collision detection/response, and they also use parallel processing hardware, i.e., multi-core CPUs, physics processing units (PPUs), or graphics processing units (GPUs).

Real-time Implementation of Multi-channel AMR Speech Coder (멀티채널 AMR 음성부호화기의 실시간 구현)

  • 지덕구;박만호;김형중;윤병식;최송인
    • The Journal of the Acoustical Society of Korea / v.20 no.8 / pp.19-23 / 2001
  • With the development of high-speed, low-power programmable digital signal processors (DSPs), DSP-based implementation has become pervasive in the wireless communication components of systems and handsets. In this paper, we present a real-time implementation of a multi-channel Adaptive Multi-Rate (AMR) speech coder. The real-time implementation of the AMR algorithm uses a 32-bit fixed-point TMS320C6202 DSP chip that operates at 250 MHz. For real-time operation, we performed cross-compilation, linear assembly optimization, and TMS320C62xx assembly optimization. Furthermore, the AMR speech coder includes a speech data input/output function and a function for communicating with an external CPU. The AMR speech coder, developed on a DSP EVM board, was evaluated in the ETRI IMT-2000 test-bed system.

HIGHER ORDER ITERATIONS FOR MOORE-PENROSE INVERSES

  • Srivastava, Shwetabh;Gupta, D.K.
    • Journal of applied mathematics & informatics / v.32 no.1_2 / pp.171-184 / 2014
  • A higher order iterative method for computing the Moore-Penrose inverses of arbitrary matrices using only the Penrose equation (ii) is developed by extending the iterative method described in [1]. Convergence properties and error estimates of the method are studied. The efficacy of the method is demonstrated with four numerical examples: two involve a full rank matrix and an ill-conditioned Hilbert matrix, and the other two involve randomly generated full rank and rank deficient matrices. The performance measures are the number of iterations and the CPU time in seconds used by the method. For all examples considered, as the order of the method increases, the number of iterations always decreases as expected, while the CPU time first decreases gradually and then increases.
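
The abstract does not give the iteration itself, so the sketch below uses the classical hyperpower family, X_{k+1} = X_k (I + R_k + ... + R_k^{p-1}) with R_k = I - A X_k, as a stand-in for a higher order method of order p. It is not necessarily the authors' scheme. The starting guess X_0 = A^T / (||A||_1 ||A||_inf), the stopping rule, the helper names, and the test matrix are assumptions. Iteration counts and CPU time (via `time.process_time`) are reported to mirror the performance measures named above.

```python
import time


def matmul(a, b):
    """Dense matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def hyperpower_pinv(a, order=3, tol=1e-12, max_iter=100):
    """Approximate the Moore-Penrose inverse; returns (X, iterations used)."""
    rows, cols = len(a), len(a[0])
    a_t = [list(col) for col in zip(*a)]
    # Assumed starting guess: scaled transpose, which keeps the iteration convergent
    norm_1 = max(sum(abs(a[i][j]) for i in range(rows)) for j in range(cols))
    norm_inf = max(sum(abs(v) for v in row) for row in a)
    x = [[v / (norm_1 * norm_inf) for v in row] for row in a_t]
    eye = identity(rows)
    for iteration in range(1, max_iter + 1):
        ax = matmul(a, x)
        residual = [[eye[i][j] - ax[i][j] for j in range(rows)] for i in range(rows)]
        # Horner form of I + R + ... + R^(order - 1)
        series = identity(rows)
        for _ in range(order - 1):
            rs = matmul(residual, series)
            series = [[eye[i][j] + rs[i][j] for j in range(rows)] for i in range(rows)]
        x_new = matmul(x, series)
        change = max(abs(x_new[i][j] - x[i][j])
                     for i in range(cols) for j in range(rows))
        x = x_new
        if change < tol:
            break
    return x, iteration


a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # hypothetical full rank example
for order in (2, 3, 4):
    start = time.process_time()
    x, iterations = hyperpower_pinv(a, order=order)
    cpu_seconds = time.process_time() - start
    print(order, iterations, cpu_seconds)
```

Each iteration of an order-p scheme costs more matrix products than a lower order one, which is one plausible reason CPU time can rise again at high order even as the iteration count keeps falling.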