• Title/Summary/Keyword: parallel computer processing

Search Result 651, Processing Time 0.101 seconds

A Parallel Processing Method for Partial Nodes in R*-tree Using GPU (GPU를 활용한 R*-tree에서의 부분 노드 병렬 처리 방법)

  • Kim, Seong;Oh, Byoung-Woo
    • Spatial Information Research
    • /
    • v.20 no.6
    • /
    • pp.139-144
    • /
    • 2012
  • The R*-tree manages hierarchical nodes for efficient access of spatial data. We propose a method that maintains partial nodes of R*-tree in the GPU memory to improve efficiency using parallel processing. The proposed method attempts to load as many nodes as possible to the GPU memory. The new nodes are inserted to manage the rest of R*-tree nodes in the main memory. The experimental result shows that the proposed method is more efficient than the main memory based R*-tree.

Implementation of Underwater Simulation of a Net using OpenMP (OpenMP 병렬프로그램을 이용한 그물의 수중형상 시뮬레이션 구현)

  • Park, Myeong-Chul;Park, Seok-Gyu
    • Journal of the Korea Society of Computer and Information
    • /
    • v.13 no.2
    • /
    • pp.11-17
    • /
    • 2008
  • The net shape effects by the various vectors in underwater. Each particle of the net calculating the effect of all vectors augments an accuracy and reality. But, the time complexity becomes larger because of huge calculation. The previous techniques reduced a physics reality. And embodied the underwater virtual reality which augments visual reality with simulation. In this paper, parallel processing the particles, it embodied the simulation which is satisfied a physical reality and time reality. The parallel processing used the OpenMP, and the reality graphic expression used the OpenGL. The simulation which this paper Proposes will be the possibility becoming the fundamental data for a model analysis or a specialist system from game and marine field.

  • PDF

Design of an efficient routing algorithm on the WK-recursive network

  • Chung, Il-Yong
    • Smart Media Journal
    • /
    • v.11 no.9
    • /
    • pp.39-46
    • /
    • 2022
  • The WK-recursive network proposed by Vecchia and Sanges[1] is widely used in the design and implementation of local area networks and parallel processing architectures. It provides a high degree of regularity and scalability, which conform well to a design and realization of distributed systems involving a large number of computing elements. In this paper, the routing of a message is investigated on the WK-recursive network, which is key to the performance of this network. We present an efficient shortest path algorithm on the WK-recursive network, which is simpler than Chen and Duh[2] in terms of design complexity.

Algorithmic GPGPU Memory Optimization

  • Jang, Byunghyun;Choi, Minsu;Kim, Kyung Ki
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.14 no.4
    • /
    • pp.391-406
    • /
    • 2014
  • The performance of General-Purpose computation on Graphics Processing Units (GPGPU) is heavily dependent on the memory access behavior. This sensitivity is due to a combination of the underlying Massively Parallel Processing (MPP) execution model present on GPUs and the lack of architectural support to handle irregular memory access patterns. Application performance can be significantly improved by applying memory-access-pattern-aware optimizations that can exploit knowledge of the characteristics of each access pattern. In this paper, we present an algorithmic methodology to semi-automatically find the best mapping of memory accesses present in serial loop nest to underlying data-parallel architectures based on a comprehensive static memory access pattern analysis. To that end we present a simple, yet powerful, mathematical model that captures all memory access pattern information present in serial data-parallel loop nests. We then show how this model is used in practice to select the most appropriate memory space for data and to search for an appropriate thread mapping and work group size from a large design space. To evaluate the effectiveness of our methodology, we report on execution speedup using selected benchmark kernels that cover a wide range of memory access patterns commonly found in GPGPU workloads. Our experimental results are reported using the industry standard heterogeneous programming language, OpenCL, targeting the NVIDIA GT200 architecture.

Development of High Performance Massively Parallel Processing Simulator for Semiconductor Etching Process (건식 식각 공정을 위한 초고속 병렬 연산 시뮬레이터 개발)

  • Lee, Jae-Hee;Kwon, Oh-Seob;Ban, Yong-Chan;Won, Tae-Young
    • Journal of the Korean Institute of Telematics and Electronics D
    • /
    • v.36D no.10
    • /
    • pp.37-44
    • /
    • 1999
  • This paper report the implementation results of Monte Carlo numerical calculation for ion distributions in plasma dry etching chamber and of the surface evolution simulator using cell removal method for topographical evolution of the surface exposed to etching ion. The energy and angular distributions of ion across the plasma sheath were calculated by MC(Monte Carlo) algorithm. High performance MPP(Massively Parallel Processing) algorithm developed in this paper enables efficient parallel and distributed simulation with an efficiency of more than 95% and speedup of 16 with 16 processors. Parallelization of surface evolution simulator based on cell removal method reduces simulation time dramatically to 15 minutes and increases capability of simulation required enormous memory size of 600Mb.

  • PDF

Applying Parallel Processing Technique in Parallel Circuit Testing Application for improve Circuit Test Ability in Circuit manufacturing

  • Prabhavat, Sittiporn;Nilagupta, Pradondet
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 2005.06a
    • /
    • pp.792-793
    • /
    • 2005
  • Circuit testing process is very important in IC Manufacturing there are two ways in research for circuit testing improvement. These are ATPG Tool Design and Test simulation application. We are interested in how to use parallel technique such as one-side communication, parallel IO and dynamic Process with data partition for circuit testing improvement and we use one-side communication technique in this paper. The parallel ATPG Tool can reduce the test pattern sets of the circuit that is designed in laboratory for make sure that the fault is not occur. After that, we use result for parallel circuit test simulation to find fault between designed circuit and tested circuit. From the experiment, We use less execution time than non-parallel Process. And we can set more parameter for less test size. Previous experiment we can't do it because some parameter will affect much waste time. But in the research, if we use the best ATPG Tool can optimize to least test sets and parallel circuit testing application will not work. Because there are too little test set for circuit testing application. In this paper we use a standard sequential circuit of ISCAS89.

  • PDF

FPGA Design of a Parallel Canny Edge Detector with Optimized Local Buffers (로컬 버퍼 최적화를 통한 병렬 처리 캐니 경계선 검출기의 FPGA 설계)

  • Ingi Min;Suhyun Sim;Seungwon Hwang;Sunhee Kim
    • Journal of the Semiconductor & Display Technology
    • /
    • v.22 no.4
    • /
    • pp.59-65
    • /
    • 2023
  • Edge detection in image processing and computer vision is one of the most fundamental operations. Canny edge detection algorithm has excellent performance and is currently widely used. However, it is difficult to process the algorithm in real-time because the algorithm is complex. In this study, the equations required in the algorithm were simplified to facilitate hardware implementation, and the calculation speed was increased by using a parallel structure. In particular, the size and management of local buffers were selected in consideration of parallel processing and filter size so that data could be processed without bottlenecks. It was designed in verilog and implemented in FPGA to verify operation and performance.

  • PDF

Fault-tolerant Scheduling of Real-time Parallel Tasks with Energy Efficiency on Multicore Processors (멀티코어 프로세서 상에서 에너지 효율을 고려한 실시간 병렬 작업들의 결함 포용 스케쥴링)

  • Lee, Kwanwoo
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.3 no.6
    • /
    • pp.173-178
    • /
    • 2014
  • By exploiting parallel processing, the proposed scheduling scheme enhances energy saving capability of multicore processors for real-time tasks while satisfying deadline and fault tolerance constraints. The scheme searches for a near minimum-energy schedule within a polynomial time, because finding the minimum-energy schedule on multicore processors is a NP-hard problem. The scheme consumes manifestly less energy than the state-of-the-arts method even with low parallel processing speedup as well as with high parallel processing speedup, and saves the energy consumption up to 86%.

High-speed Fuzzy Inference System in Integrated GUI Environment

  • Lee, Sang-Gu
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.4 no.1
    • /
    • pp.50-55
    • /
    • 2004
  • We propose an intgrated Gill environment system having only integer fuzzy operations in the consequent part and the defuzzification stage. In this paper, we also propose an integrated Gill environment system with 4 parallel fuzzy processing units to be operated in parallel on the classification of the sensed image data. In this, we solve the problems of taking longer times as the fuzzy real computations of [0, 1] by using the integer pixel conversion algorithm to convert lines of each fuzzy linguistic term to the closest integer pixels. This procedure is performed automatically in the GUI application program. As a Gill environment, PCI transmission, image data pre-processing, integer pixel mapping and fuzzy membership tuning are considered. This system can be operated in parallel manner for MIMO or MISO systems.

Integer-Pel Motion Estimation for HEVC on Compute Unified Device Architecture (CUDA)

  • Lee, Dongkyu;Sim, Donggyu;Oh, Seoung-Jun
    • IEIE Transactions on Smart Processing and Computing
    • /
    • v.3 no.6
    • /
    • pp.397-403
    • /
    • 2014
  • A new video compression standard called High Efficiency Video Coding (HEVC) has recently been released onto the market. HEVC provides higher coding performance compared to previous standards, but at the cost of a significant increase in encoding complexity, particularly in motion estimation (ME). At the same time, the computing capabilities of Graphics Processing Units (GPUs) have become more powerful. This paper proposes a parallel integer-pel ME (IME) algorithm for HEVC on GPU using the Compute Unified Device Architecture (CUDA). In the proposed IME, concurrent parallel reduction (CPR) is introduced. CPR performs several parallel reduction (PR) operations concurrently to solve two problems in conventional PR; low thread utilization and high thread synchronization latency. The proposed encoder reduces the portion of IME in the encoder to almost zero with a 2.3% increase in bitrate. In terms of IME, the proposed IME is up to 172.6 times faster than the IME in the HEVC reference model.