• Title/Summary/Keyword: CPU Time


3D Holographic Image Recognition by Using Graphic Processing Unit

  • Lee, Jeong-A;Moon, In-Kyu;Liu, Hailing;Yi, Faliu
    • Journal of the Optical Society of Korea / v.15 no.3 / pp.264-271 / 2011
  • In this paper we examine and compare the computational speeds of three-dimensional (3D) object recognition by use of digital holography based on central processing unit (CPU) and graphics processing unit (GPU) computing. The holographic fringe pattern of a 3D object is obtained using an in-line interferometry setup. The Fourier matched filters are applied to the complex image reconstructed from the holographic fringe pattern using a GPU chip for real-time 3D object recognition. It is shown that the computational speed of the 3D object recognition using GPU computing is significantly faster than that of CPU computing. To the best of our knowledge, this is the first report on comparisons of the calculation time of 3D object recognition based on digital holography with CPU versus GPU computing.
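
The recognition step described above amounts to a Fourier matched filter: the reconstructed complex image is correlated with a reference object in the frequency domain, and the correlation peak indicates a match. Below is a minimal CPU-side sketch in NumPy, not the authors' code; the image size, the random test data, and the patch location are placeholders, and the paper's speedup comes from running the same FFTs on a GPU.

```python
# Minimal sketch of Fourier matched filtering (illustrative assumptions only).
import numpy as np

def matched_filter_correlation(reconstructed, reference):
    """Circular cross-correlation of a reconstructed complex image with a reference."""
    R = np.fft.fft2(reconstructed)
    T = np.fft.fft2(reference)
    corr = np.fft.ifft2(R * np.conj(T))        # Fourier matched filter
    return np.abs(corr)

rng = np.random.default_rng(0)
scene = rng.standard_normal((512, 512)) + 1j * rng.standard_normal((512, 512))
template = scene[100:228, 100:228]             # hypothetical reference patch
padded = np.zeros_like(scene)
padded[:128, :128] = template                  # zero-pad template to scene size
peak = matched_filter_correlation(scene, padded).max()
print(f"correlation peak: {peak:.2f}")         # large peak -> object recognized
```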

Efficient Workload Distribution of Photomosaic Using OpenCL into a Heterogeneous Computing Environment (이기종 컴퓨팅 환경에서 OpenCL을 사용한 포토모자이크 응용의 효율적인 작업부하 분배)

  • Kim, Heegon;Sa, Jaewon;Choi, Dongwhee;Kim, Haelyeon;Lee, Sungju;Chung, Yongwha;Park, Daihee
    • KIPS Transactions on Computer and Communication Systems / v.4 no.8 / pp.245-252 / 2015
  • Recently, parallel processing methods using accelerators have been introduced into high-performance computing and mobile computing. The photomosaic application can be parallelized by exploiting its inherent data parallelism with an accelerator. In this paper, we propose a way to distribute the workload of the photomosaic application across a CPU and GPU heterogeneous computing environment. That is, the photomosaic application is parallelized using both CPU and GPU resources with the asynchronous mode of OpenCL, and the optimal workload distribution rate is then estimated from execution times measured with CPU-only and GPU-only execution. The proposed approach is simple but very effective, and can be applied to parallelize other applications in a CPU-GPU heterogeneous computing environment. Based on the experimental results, we confirm that performance improves by 141% in the heterogeneous computing environment with the optimal workload distribution, compared with the GPU-only method.
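
The distribution-rate estimation described in the abstract reduces to simple load-balance arithmetic: given the CPU-only and GPU-only calibration times, both devices finish together when each receives work in proportion to its speed. The sketch below shows that calculation only; the timing values are hypothetical, and in the paper itself the split is applied to OpenCL kernels launched asynchronously on both devices.

```python
# Minimal sketch of estimating the optimal CPU/GPU workload split (assumed values).
def optimal_cpu_share(t_cpu_only: float, t_gpu_only: float) -> float:
    """Fraction of the workload to give the CPU so both devices finish together."""
    return t_gpu_only / (t_cpu_only + t_gpu_only)

def predicted_runtime(alpha: float, t_cpu_only: float, t_gpu_only: float) -> float:
    """Completion time when a fraction alpha runs on the CPU and the rest on the GPU."""
    return max(alpha * t_cpu_only, (1.0 - alpha) * t_gpu_only)

t_cpu, t_gpu = 9.6, 4.0                       # hypothetical calibration times (s)
alpha = optimal_cpu_share(t_cpu, t_gpu)
print(f"CPU share {alpha:.2f}, "
      f"runtime {predicted_runtime(alpha, t_cpu, t_gpu):.2f}s "
      f"vs GPU-only {t_gpu:.2f}s")
```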

Design Considerations on Large-scale Parallel Finite Element Code in Shared Memory Architecture with Multi-Core CPU (멀티코어 CPU를 갖는 공유 메모리 구조의 대규모 병렬 유한요소 코드에 대한 설계 고려 사항)

  • Cho, Jeong-Rae;Cho, Keunhee
    • Journal of the Computational Structural Engineering Institute of Korea / v.30 no.2 / pp.127-135 / 2017
  • The computing environment has changed rapidly, with multi-core CPUs, optimized math kernel libraries implementing BLAS and LAPACK, and the popularization of direct sparse solvers enabling large-scale finite element models to be analyzed at the PC or workstation level. In this paper, design considerations for a parallel finite element code on a shared-memory, multi-core CPU system are proposed: (1) the use of optimized numerical libraries, (2) the use of the latest direct sparse solvers, (3) parallelism using OpenMP for computing element stiffness matrices, and (4) assembly techniques using triplets, a form of sparse matrix storage. In addition, the parallelization effect is examined for the time-consuming tasks using a large-scale finite element model.
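
Item (4), triplet-based assembly, can be illustrated with a small sketch: each element contributes (row, column, value) triplets, and duplicate entries are summed when the triplets are converted to a compressed sparse matrix. The sketch below uses SciPy's COO format as a stand-in for the storage scheme described in the paper; the toy two-element bar and the function names are assumptions for illustration, and the element-stiffness loop is what the paper parallelizes with OpenMP.

```python
# Minimal sketch of triplet (COO) based finite element assembly (illustrative only).
import numpy as np
from scipy.sparse import coo_matrix

def assemble(n_dof, elements, element_stiffness):
    rows, cols, vals = [], [], []
    for dofs in elements:                      # element -> global DOF numbers
        ke = element_stiffness(dofs)           # local stiffness matrix
        for a, ga in enumerate(dofs):
            for b, gb in enumerate(dofs):
                rows.append(ga)
                cols.append(gb)
                vals.append(ke[a, b])
    # duplicate (row, col) triplets are summed on conversion to CSR
    return coo_matrix((vals, (rows, cols)), shape=(n_dof, n_dof)).tocsr()

# toy 1D bar: two 2-node elements with unit stiffness
ke_bar = lambda dofs: np.array([[1.0, -1.0], [-1.0, 1.0]])
K = assemble(3, [(0, 1), (1, 2)], ke_bar)
print(K.toarray())
```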

Performance Evaluation of Interconnection Network in Microservers (마이크로서버의 내부 연결망 성능평가)

  • Oh, Myeong-Hoon
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.21 no.6 / pp.91-97 / 2021
  • A microserver is a type of computing server in which two or more CPU nodes are implemented on each computing board and multiple computing boards are integrated on a main board. In building a cluster system, the microserver has advantages in energy efficiency, occupied area, and ease of management compared to the existing approach of mounting legacy servers in multiple racks. In addition, since the microserver uses a fast interconnection network between CPU nodes, improved data-transfer performance is expected. The proposed microserver can mount a total of 16 computing boards, each with 4 CPU nodes, on the main board, and uses Serial RapidIO (SRIO) as the interconnection network. To analyze the proposed microserver with respect to the interconnection network, which is its core performance issue, we compare and quantify its performance against commercial microservers. In the tests, it showed up to about 7 times higher bandwidth when transmitting data over the interconnection network. In addition, with CloudSuite benchmark programs used in actual cloud computing, a reduction in execution time of up to 60% was obtained compared to commercial microservers with similar CPU performance specifications.

Analysis of Implementing Mobile Heterogeneous Computing for Image Sequence Processing

  • BAEK, Aram;LEE, Kangwoon;KIM, Jae-Gon;CHOI, Haechul
    • KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.10 / pp.4948-4967 / 2017
  • On mobile devices, image sequences are widely used for multimedia applications such as computer vision, video enhancement, and augmented reality. However, real-time processing on mobile devices is still a challenge because of hardware constraints and the demand for higher-resolution images. Recently, heterogeneous computing methods that utilize both a central processing unit (CPU) and a graphics processing unit (GPU) have been researched to accelerate image sequence processing. This paper deals with various optimization techniques such as parallel processing by the CPU and GPU, distributed processing on the CPU, frame buffer objects, and double buffering for parallel and/or distributed tasks. Using these techniques both individually and in combination, several heterogeneous computing structures were implemented and their effectiveness was analyzed. The experimental results show that heterogeneous computing enables execution up to 3.5 times faster than CPU-only processing.
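
Double buffering, one of the techniques listed above, overlaps two pipeline stages by ping-ponging between a pair of buffers: one stage fills a free buffer while the other stage consumes the previously filled one. The sketch below is a minimal CPU-only illustration with two threads and standard-library queues; the buffer size, frame count, and stand-in workloads are assumptions, not the paper's implementation.

```python
# Minimal sketch of double buffering between two pipeline stages (illustrative only).
import threading, queue

free_slots = queue.Queue()
ready_slots = queue.Queue()
for slot in (bytearray(1024), bytearray(1024)):   # two ping-pong buffers
    free_slots.put(slot)

def producer(n_frames):
    for i in range(n_frames):
        buf = free_slots.get()                    # wait for an empty buffer
        buf[0] = i % 256                          # stand-in for CPU-side preprocessing
        ready_slots.put((i, buf))
    ready_slots.put(None)                         # end-of-stream marker

def consumer():
    while (item := ready_slots.get()) is not None:
        i, buf = item
        _ = sum(buf)                              # stand-in for GPU-side processing
        free_slots.put(buf)                       # recycle the buffer

t = threading.Thread(target=producer, args=(8,))
t.start()
consumer()
t.join()
print("pipeline finished")
```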

Design of digital controller of six degree of freedom industrial robot using 16 bit CPU and modula-2 language (16 bit CPU와 Modula-2 언어를 사용한 6측 산업용 로보트의 디지탈 제어기 제작에 관한 연구)

  • 이주장;김양한;윤형우
    • 제어로봇시스템학회:학술대회논문집 / 1987.10b / pp.10-13 / 1987
  • The main work of this paper is the construction of control hardware for a six-degree-of-freedom industrial robot based on a 16-bit CPU and the development of five motion control software modules. The work draws on the extensive experience of the Robotics Laboratory of KIT in these areas, in particular with the 68000 assembler and Modula-2 languages and with existing robot control systems. We found that this controller is suitable as a PID-type robot controller. However, for self-tuning algorithms and real-time calculations, a 32-bit CPU robot controller such as the MC68020 microprocessor is needed.
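
Since the controller is described as PID-type, a minimal sketch of the discrete PID update such a joint controller would execute each sampling period may help. The gains, sampling period, and the crude first-order joint model below are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of a discrete PID control loop (hypothetical gains and plant).
def make_pid(kp, ki, kd, dt):
    integral, prev_error = 0.0, 0.0
    def step(setpoint, measurement):
        nonlocal integral, prev_error
        error = setpoint - measurement
        integral += error * dt
        derivative = (error - prev_error) / dt
        prev_error = error
        return kp * error + ki * integral + kd * derivative
    return step

pid = make_pid(kp=2.0, ki=0.5, kd=0.1, dt=0.01)   # hypothetical gains, 100 Hz loop
angle = 0.0
for _ in range(500):                              # 5 s of simulated control
    torque = pid(setpoint=1.0, measurement=angle)
    angle += 0.01 * torque                        # crude first-order joint model
print(f"joint angle after 5 s: {angle:.3f} rad")
```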


Implementation of Integrated CPU-GPU for Efficient Uniform Memory Access Method and Verification System (CPU-GPU간 긴밀성을 위한 효율적인 공유메모리 접근 방법과 검증 시스템 구현)

  • Park, Hyun-moon;Kwon, Jinsan;Hwang, Tae-ho;Kim, Dong-Sun
    • IEMEK Journal of Embedded Systems and Applications / v.11 no.2 / pp.57-65 / 2016
  • In this paper, we propose a system for efficient use of shared memory between the CPU and GPU. The system, called Fusion Architecture, assures consistency of the shared memory and minimizes the cache misses that frequently occur on Heterogeneous System Architecture or Unified Virtual Memory based systems. It also maximizes performance for memory-intensive jobs by efficiently allocating GPU cores. To compare architectures under various scenarios, we introduce the Fusion Architecture Analyzer, which compares OpenMP, OpenCL, CUDA, and the proposed architecture in terms of memory overhead and processing time. As a result, the proposed Fusion Architecture runs benchmarks 55% faster and reduces memory overhead by 220% on average.

Effective Scheduling Algorithm of Process for Real Time Operating System (실시간 운영체제를 위한 프로세스의 효율적인 스케줄링 알고리즘)

  • 정선아;이지영
    • Proceedings of the Korean Information Science Society Conference / 2002.10c / pp.373-375 / 2002
  • This paper proposes a scheduling algorithm for efficient process management in a real-time operating system. By increasing CPU utilization and reducing scheduling and interrupt-handling time, resources can be managed efficiently. The proposed method places a PIT (Process Information Table) on multiple queues and allocates the CPU according to priority whenever a process enters a queue. Unlike conventional multiple queues, it locates the priority process more accurately and quickly, so the system can respond to external or internal interrupts. In addition, a process that cannot run because of higher-priority processes can preempt the CPU after a certain amount of time has elapsed, so CPU utilization increases and idle time decreases. The method was tested on an ordinary Pentium PC and showed somewhat better results than RTOSes in current use (VxWorks, QNX).
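
A minimal sketch of the behavior described, priority scheduling with aging so that long-waiting processes are boosted and can eventually take the CPU, is given below. The data structure, quantum counts, and aging threshold are assumptions for illustration, not the paper's PIT-based implementation.

```python
# Minimal sketch of priority scheduling with aging (assumed behavior, not the paper's code).
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Process:
    priority: int                                   # lower value = higher priority
    pid: int = field(compare=False)
    remaining: int = field(default=3, compare=False)  # quanta still needed
    waited: int = field(default=0, compare=False)

def schedule(processes, age_after=2):
    ready = list(processes)
    heapq.heapify(ready)
    order = []
    while ready:
        current = heapq.heappop(ready)              # highest-priority ready process
        order.append(current.pid)                   # run it for one quantum
        current.remaining -= 1
        for p in ready:                             # aging: boost waiting processes
            p.waited += 1
            if p.waited >= age_after:
                p.priority -= 1
                p.waited = 0
        if current.remaining > 0:
            ready.append(current)
        heapq.heapify(ready)                        # re-order after priority changes
    return order

procs = [Process(priority=1, pid=1), Process(priority=5, pid=2), Process(priority=2, pid=3)]
print(schedule(procs))                              # run order per quantum
```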


HUGE DIRECT NUMERICAL SIMULATION OF TURBULENT COMBUSTION - TOWARD PERFECT SIMULATION OF IC ENGINE -

  • Tanahashi, Mamoru;Seo, Takehiko;Sato, Makoto;Tsunemi, Akihiko;Miyauchi, Toshio
    • Journal of Computational Fluids Engineering / v.13 no.4 / pp.114-125 / 2008
  • The current state and perspective of DNS of turbulence and turbulent combustion are discussed together with the trend of the fastest supercomputers in the world. Based on this perspective of DNS of turbulent combustion, the possibility of a perfect simulation of an IC engine is shown. In 2020, the perfect simulation will be realizable with 30 billion grid points on a 1 EXAFLOPS supercomputer, which requires 4 months of CPU time. The CPU time would be reduced to about 4 days if several developments in current fundamental research are achieved. To shorten the CPU time required for DNS of turbulent combustion, two numerical methods are introduced into a fully explicit, fully compressible DNS code. One is a compact finite difference filter to reduce spatial resolution requirements and numerical oscillations at small scales, and the other is the well-known point-implicit scheme to avoid the extremely small time steps, on the order of nanoseconds, required for fully explicit DNS. The applicability and accuracy of these numerical methods have been carefully confirmed for auto-ignition, planar laminar flames, and turbulent premixed flames. To realize DNS of an IC engine with a realistic kinetic mechanism, several DNS of elemental combustion processes in IC engines have been conducted.
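
The role of the point-implicit scheme mentioned above can be illustrated with a toy stiff problem: for du/dt = -k u with a very large rate k, explicit Euler is unstable unless the time step is tiny, whereas treating the local source term implicitly stays stable at much larger steps. The sketch below is illustrative only and is not the authors' DNS code; the rate and step size are arbitrary.

```python
# Minimal sketch: explicit Euler vs. point-implicit update for a stiff decay term.
k, dt, steps = 1.0e9, 1.0e-7, 50          # stiff rate and a "large" step (k*dt = 100)

u_explicit = 1.0
u_implicit = 1.0
for _ in range(steps):
    u_explicit += dt * (-k * u_explicit)          # explicit Euler: unstable here
    u_implicit = u_implicit / (1.0 + k * dt)      # point-implicit local solve: stable

print(f"explicit Euler : {u_explicit:.3e}")       # diverges for k*dt > 2
print(f"point-implicit : {u_implicit:.3e}")       # decays toward zero
```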

Huge Direct Numerical Simulation of Turbulent Combustion-Toward Perfect Simulation of IC Engine-

  • Tanahashi, Mamoru
    • 한국전산유체공학회:학술대회논문집 / 2008.03a / pp.359-366 / 2008
  • The current state and perspective of DNS of turbulence and turbulent combustion are discussed together with the trend of the fastest supercomputers in the world. Based on this perspective of DNS of turbulent combustion, the possibility of a perfect simulation of an IC engine is shown. In 2020, the perfect simulation will be realizable with 30 billion grid points on a 1 EXAFLOPS supercomputer, which requires 4 months of CPU time. The CPU time would be reduced to about 4 days if several developments in current fundamental research are achieved. To shorten the CPU time required for DNS of turbulent combustion, two numerical methods are introduced into a fully explicit, fully compressible DNS code. One is a compact finite difference filter to reduce spatial resolution requirements and numerical oscillations at small scales, and the other is the well-known point-implicit scheme to avoid the extremely small time steps, on the order of nanoseconds, required for fully explicit DNS. The applicability and accuracy of these numerical methods have been carefully confirmed for auto-ignition, planar laminar flames, and turbulent premixed flames. To realize DNS of an IC engine with a realistic kinetic mechanism, several DNS of elemental combustion processes in IC engines have been conducted.
