• Title/Summary/Keyword: parallel computer processing

Search Result 651, Processing Time 0.029 seconds

Simulation of Deformable Objects using GLSL 4.3

  • Sung, Nak-Jun;Hong, Min;Lee, Seung-Hyun;Choi, Yoo-Joo
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.11 no.8
    • /
    • pp.4120-4132
    • /
    • 2017
  • In this research, we implement a deformable object simulation system using OpenGL's shader language, GLSL4.3. Deformable object simulation is implemented by using volumetric mass-spring system suitable for real-time simulation among the methods of deformable object simulation. The compute shader in GLSL 4.3 which helps to access the GPU resources, is used to parallelize the operations of existing deformable object simulation systems. The proposed system is implemented using a compute shader for parallel processing and it includes a bounding box-based collision detection solution. In general, the collision detection is one of severe computing bottlenecks in simulation of multiple deformable objects. In order to validate an efficiency of the system, we performed the experiments using the 3D volumetric objects. We compared the performance of multiple deformable object simulations between CPU and GPU to analyze the effectiveness of parallel processing using GLSL. Moreover, we measured the computation time of bounding box-based collision detection to show that collision detection can be processed in real-time. The experiments using 3D volumetric models with 10K faces showed the GPU-based parallel simulation improves performance by 98% over the CPU-based simulation, and the overall steps including collision detection and rendering could be processed in real-time frame rate of 218.11 FPS.

A Study on Parallel Processing System for Automatic Segmentation of Moving Object in Image Sequences

  • Lee, Hyung;Park, Jong-Won
    • Proceedings of the IEEK Conference
    • /
    • 2000.07a
    • /
    • pp.429-432
    • /
    • 2000
  • The new MPEG-4 video coding standard enables content-based functionalities. In order to support the philosophy of the MPEG-4 visual standard, each frame of video sequences should be represented in terms of video object planes (VOP’s). In other words, video objects to be encoded in still pictures or video sequences should be prepared before the encoding process starts. Therefore, it requires a prior decomposition of sequences into VOP’s so that each VOP represents a moving object. A parallel processing system is required an automatic segmentation to be processed in real-time, because an automatic segmentation is time consuming. This paper addresses the parallel processing: system for an automatic segmentation for separating moving object from the background in image sequences. The proposed parallel processing system comprises of processing elements (PE’s) and a multi-access memory system (MAMS). Multi-access memory system is a memory controller to perform parallel memory access with the variety of types: horizontal, vertical, and block access way. In order to realize these ways, a multi-access memory system consists of a memory module selection module, data routing modules, and an address calculation and routing module. The proposed system is simulated and evaluated by the CADENCE Verilog-XL hardware simulation package.

  • PDF

A Ray-Tracing Algorithm Based On Processor Farm Model (프로세서 farm 모델을 이용한 광추적 알고리듬)

  • Lee, Hyo Jong
    • Journal of the Korea Computer Graphics Society
    • /
    • v.2 no.1
    • /
    • pp.24-30
    • /
    • 1996
  • The ray tracing method, which is one of many photorealistic rendering techniques, requires heavy computational processing to synthesize images. Parallel processing can be used to reduce the computational processing time. A parallel algorithm for the ray tracing has been implemented and executed for various images on transputer systems. In order to develop a scalable parallel algorithm, a processor farming technique has been exploited. Since each image is divided and distributed to each farming processor, the scalability of the parallel system and load balancing are achieved naturally in the proposed algorithm. Efficiency of the parallel algorithm is obtained up to 95% for nine processors. However, the best size of a distributed task is much higher in simple images due to less computational requirement for every pixel. Efficiency degradation is observed for large granularity tasks because of load unbalancing caused by the large task. Overall, transputer systems behave as good scalable parallel processing system with respect to the cost-performance ratio.

  • PDF

Design and Analysis of User's Libraries for Parallel Computing based on the Internet (인터넷 기반의 병렬 컴퓨팅을 위한 사용자 라이브러리 설계 및 성능 분석)

  • Sin, Pil-Seop;Jeong, Jun-Mok;Maeng, Hye-Seon;Hong, Won-Gi;Kim, Sin-Deok
    • The Transactions of the Korea Information Processing Society
    • /
    • v.6 no.11
    • /
    • pp.2932-2945
    • /
    • 1999
  • As the Internet and Java technology have been growing up, parallel processing approach to utilize those idle resources connected to the Internet has become quite attractive. In this paper, JICE(Java Internet Computing Environment) was implemented as a parallel computing platform based on the Internet using multithreading and RMI mechanisms provided by Java. The basic model of JICE is constructed as three components, such as a client, a set of workers, and a broker. A worker communicates with other workers via a globally shared memory system. It provides users with master-slave programming model and a collection of library functions. The basic model of JICE is also extended as a multimanaging system. This multimanaging system is evaluated by analysis to show its effectiveness. According to numerical analysis and experiments with several benchmarks, it is shown that the performance of basic model depends on the shared memory reference ratio and user's library is a quite promising.

  • PDF

TBBench: A Micro-Benchmark Suite for Intel Threading Building Blocks

  • Marowka, Ami
    • Journal of Information Processing Systems
    • /
    • v.8 no.2
    • /
    • pp.331-346
    • /
    • 2012
  • Task-based programming is becoming the state-of-the-art method of choice for extracting the desired performance from multi-core chips. It expresses a program in terms of lightweight logical tasks rather than heavyweight threads. Intel Threading Building Blocks (TBB) is a task-based parallel programming paradigm for multi-core processors. The performance gain of this paradigm depends to a great extent on the efficiency of its parallel constructs. The parallel overheads incurred by parallel constructs determine the ability for creating large-scale parallel programs, especially in the case of fine-grain parallelism. This paper presents a study of TBB parallelization overheads. For this purpose, a TBB micro-benchmarks suite called TBBench has been developed. We use TBBench to evaluate the parallelization overheads of TBB on different multi-core machines and different compilers. We report in detail in this paper on the relative overheads and analyze the running results.

PASS: A Parallel Speech Understanding System

  • Chung, Sang-Hwa
    • Journal of Electrical Engineering and information Science
    • /
    • v.1 no.1
    • /
    • pp.1-9
    • /
    • 1996
  • A key issue in spoken language processing has become the integration of speech understanding and natural language processing(NLP). This paper presents a parallel computational model for the integration of speech and NLP. The model adopts a hierarchically-structured knowledge base and memory-based parsing techniques. Processing is carried out by passing multiple markers in parallel through the knowledge base. Speech-specific problems such as insertion, deletion, and substitution have been analyzed and their parallel solutions are provided. The complete system has been implemented on the Semantic Network Array Processor(SNAP) and is operational. Results show an 80% sentence recognition rate for the Air Traffic Control domain. Moreover, a 15-fold speed-up can be obtained over an identical sequential implementation with an increasing speed advantage as the size of the knowledge base grows.

  • PDF

Optimal Economic Load Dispatch using Parallel Genetic Algorithms in Large Scale Power Systems (병렬유전알고리즘을 응용한 대규모 전력계통의 최적 부하배분)

  • Kim, Tae-Kyun;Kim, Kyu-Ho;Yu, Seok-Ku
    • The Transactions of the Korean Institute of Electrical Engineers A
    • /
    • v.48 no.4
    • /
    • pp.388-394
    • /
    • 1999
  • This paper is concerned with an application of Parallel Genetic Algorithms(PGA) to optimal econmic load dispatch(ELD) in power systems. The ELD problem is to minimize the total generation fuel cost of power outputs for all generating units while satisfying load balancing constraints. Genetic Algorithms(GA) is a good candidate for effective parallelization because of their inherent principle of evolving in parallel a population of individuals. Each individual of a population evaluates the fitness function without data exchanges between individuals. In application of the parallel processing to GA, it is possible to use Single Instruction stream, Multiple Data stream(SIMD), a kind of parallel system. The architecture of SIMD system need not data communications between processors assigned. The proposed ELD problem with C code is implemented by SIMSCRIPT language for parallel processing which is a powerfrul, free-from and versatile computer simulation programming language. The proposed algorithms has been tested for 38 units system and has been compared with Sequential Quadratic programming(SQP).

  • PDF

Design to Chip with Multi-Access Memory System and Parallel Processor for 16 Processing Elements of Image Processing Purpose (영상처리용 16개의 처리기를 위한 다중접근기억장치 및 병렬처리기의 칩 설계)

  • Lim, Jae-Ho;Park, Seong-Mi;Park, Jong-Won
    • Journal of Korea Multimedia Society
    • /
    • v.14 no.11
    • /
    • pp.1401-1408
    • /
    • 2011
  • This dissertation present a chip with Multi-Access Memory System(MAMS) and parallel processor for 16 Processing Elements of image processing purpose. MAMS is a kind of parallel access memory system and can simultaneously access to random pixel datas with eight types. It is possible to set a interval about pixel datas to access, too. The parallel processor built-in MAMS actually has been realized in 2003 but its performance fell short of a real time process for high-definition images. I designed a improved parallel processing system by means of addition and expansion of Memory Modules and Processing Elements of previous one. It is feasible to perform a Morphological Closing at the speed of 3 times of the previous one and 6 times of serial system.

Design of A Multimedia Bitstream ASIP for Multiple CABAC Standards

  • Choi, Seung-Hyun;Lee, Seong-Won
    • IEIE Transactions on Smart Processing and Computing
    • /
    • v.6 no.4
    • /
    • pp.292-298
    • /
    • 2017
  • The complexity of image compression algorithms has increased in order to improve image compression efficiency. One way to resolve high computational complexity is parallel processing. However, entropy coding, which is lossless compression, does not fit into the parallel processing form because of the correlation between consecutive symbols. This paper proposes a new application-specific instruction set processor (ASIP) platform by adding new context-adaptive binary arithmetic coding (CABAC) instructions to the existing platform to quickly process a variety of entropy coding. The newly added instructions work without conflicts with all other existing instructions of the platform, providing the flexibility to handle many coding standards with fast processing speeds. CABAC software is implemented for High Efficiency Video Coding (HEVC) and the performance of the proposed ASIP platform was verified with a field programmable gate array simulation.

Integrated GUI Environment of Parallel Fuzzy Inference System for Pattern Classification of Remote Sensing Images

  • Lee, Seong-Hoon;Lee, Sang-Gu;Son, Ki-Sung;Kim, Jong-Hyuk;Lee, Byung-Kwon
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.2 no.2
    • /
    • pp.133-138
    • /
    • 2002
  • In this paper, we propose an integrated GUI environment of parallel fuzzy inference system fur pattern classification of remote sensing data. In this, as 4 fuzzy variables in condition part and 104 fuzzy rules are used, a real time and parallel approach is required. For frost fuzzy computation, we use the scan line conversion algorithm to convert lines of each fuzzy linguistic term to the closest integer pixels. We design 4 fuzzy processor unit to be operated in parallel by using FPGA. As a GUI environment, PCI transmission, image data pre-processing, integer pixel mapping and fuzzy membership tuning are considered. This system can be used in a pattern classification system requiring a rapid inference time in a real-time.