• Title/Summary/Keyword: Parallel rendering

Reconfigurable Architecture Design for H.264 Motion Estimation and 3D Graphics Rendering of Mobile Applications (이동통신 단말기를 위한 재구성 가능한 구조의 H.264 인코더의 움직임 추정기와 3차원 그래픽 렌더링 가속기 설계)

  • Park, Jung-Ae;Yoon, Mi-Sun;Shin, Hyun-Chul
    • Journal of KIISE:Computer Systems and Theory / v.34 no.1 / pp.10-18 / 2007
  • Mobile communication devices such as PDAs and cellular phones need to perform several kinds of computation-intensive functions, including H.264 encoding/decoding and 3D graphics processing. In this paper, a new reconfigurable architecture is described that can perform either motion estimation for H.264 or rendering for 3D graphics. The proposed motion estimation techniques use a new efficient SAD computation ordering, DAU, and FDVS algorithms. The new approach reduces computation by 70% on average compared with JM 8.2, without affecting quality. In 3D rendering, a midline traversal algorithm is used for parallel processing to increase throughput. Memories are partitioned into 8 blocks so that 2.4 Mbits (47%) of memory is shared and selective power shutdown is possible during motion estimation and 3D graphics rendering. Processing elements are also shared to further reduce the chip area by 7%.
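The SAD (sum of absolute differences) cost mentioned in the abstract above is the usual matching criterion in block-based motion estimation. The following is a minimal sketch of plain exhaustive block matching with SAD, assuming frames are lists of rows of integer luma values. It does not reproduce the paper's SAD computation ordering, DAU, or FDVS techniques, and the helper names `sad`, `crop`, and `full_search` are hypothetical.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2D blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def crop(frame, top, left, size):
    """Extract a size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def full_search(cur, ref, top, left, size, radius):
    """Exhaustive block matching (illustrative, not the paper's method).

    Returns (dy, dx, cost) for the candidate block in `ref` with the
    lowest SAD against the block of `cur` at (top, left), or None if
    no candidate fits inside the reference frame.
    """
    target = crop(cur, top, left, size)
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            cost = sad(target, crop(ref, y, x, size))
            if best is None or cost < best[2]:
                best = (dy, dx, cost)
    return best
```

Every candidate in the search window costs `size * size` absolute differences, which is why techniques that reorder or terminate the SAD computation early can cut the workload substantially.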

Massive Terrain Rendering Method Using RGBA Channel Indexing of Wavelet Coefficients (웨이블릿 압축 계수의 RGBA채널 인덱싱을 이용한 대용량 지형 렌더링 기법)

  • Kim, Tae-Gwon;Lee, Eun-Seok;Shin, Byeong-Seok
    • Journal of Korea Game Society / v.13 no.5 / pp.55-62 / 2013
  • Since large terrain data cannot be loaded into GPU or CPU memory all at once, out-of-core methods that read the necessary parts from secondary storage such as a hard disk are commonly used. However, long delays may occur because of limited bandwidth while loading data from the hard disk into memory. We propose an efficient rendering method for large terrain data that compresses the data with a wavelet technique, saves the coefficients in the RGBA channels of an image, and then decompresses them in the rendering stage. The entire process is performed on the GPU using Direct Compute. By reducing the amount of data transferred, performing wavelet computations in parallel, and decompressing quickly on the GPU, our method reduces rendering time effectively.
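One way to picture storing wavelet coefficients in RGBA channels is a one-level 2D Haar transform, which turns each 2x2 block of heights into four coefficients that fit one four-channel texel. The sketch below is only an illustration under that assumption: the abstract does not state the wavelet basis, the channel layout, or the quantization used, and the real method runs on the GPU rather than in Python. The names `haar_pack` and `haar_unpack` are hypothetical.

```python
def haar_pack(height_map):
    """Pack each 2x2 block of an even-sized height map into one texel.

    Each texel is a 4-tuple (LL, LH, HL, HH) of one-level Haar
    coefficients, standing in for the R, G, B, A channels.
    """
    texels = []
    for y in range(0, len(height_map), 2):
        row = []
        for x in range(0, len(height_map[0]), 2):
            a, b = height_map[y][x], height_map[y][x + 1]
            c, d = height_map[y + 1][x], height_map[y + 1][x + 1]
            row.append((
                (a + b + c + d) / 4,  # LL: block average
                (a - b + c - d) / 4,  # LH: horizontal detail
                (a + b - c - d) / 4,  # HL: vertical detail
                (a - b - c + d) / 4,  # HH: diagonal detail
            ))
        texels.append(row)
    return texels


def haar_unpack(texels):
    """Invert haar_pack; each texel can be decoded independently."""
    out = []
    for row in texels:
        top, bottom = [], []
        for ll, lh, hl, hh in row:
            top += [ll + lh + hl + hh, ll - lh + hl - hh]
            bottom += [ll + lh - hl - hh, ll - lh - hl + hh]
        out += [top, bottom]
    return out
```

Because every texel decodes without reference to its neighbours, this kind of layout maps naturally onto one GPU thread per texel, which is the property the abstract relies on when it mentions parallel wavelet computation.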

A New Network Bandwidth Reduction Method of Distributed Rendering System for Scalable Display (확장형 디스플레이를 위한 분산 렌더링 시스템의 네트워크 대역폭 감소 기법)

  • Park, Woo-Chan;Lee, Won-Jong;Kim, Hyung-Rae;Kim, Jung-Woo;Han, Tack-Don;Yang, Sung-Bong
    • Journal of KIISE:Computer Systems and Theory / v.29 no.10 / pp.582-588 / 2002
  • Scalable displays generate large, high-resolution images and provide an immersive environment. Recently, scalable displays have been built on networked clusters of PCs, each of which has a fast graphics accelerator, memory, CPU, and storage. However, distributed rendering on clusters is network-bound because of limited network bandwidth. In this paper, we present a new algorithm for reducing network bandwidth and implement it in a conventional distributed rendering system. The algorithm, called geometry tracking, avoids redundant geometry transmission by indexing geometry data. Experimental results show that our algorithm reduces network bandwidth by up to 42%.
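The idea of avoiding redundant geometry transmission by indexing can be sketched as a sender that remembers which geometry each render node already holds and sends only an index on repeat use. This is a hedged illustration of that general idea, not the paper's geometry tracking algorithm; the classes `GeometrySender` and `RenderNode` and the message tuples are hypothetical.

```python
class GeometrySender:
    """Tracks, per render node, which geometry ids were already sent."""

    def __init__(self):
        self._sent = {}  # node_id -> set of geometry ids held by that node

    def message_for(self, node_id, geom_id, vertices):
        """Return a full payload the first time, a short reference after."""
        known = self._sent.setdefault(node_id, set())
        if geom_id in known:
            return ("ref", geom_id)
        known.add(geom_id)
        return ("data", geom_id, vertices)


class RenderNode:
    """Caches received geometry so later references can be resolved."""

    def __init__(self):
        self._cache = {}

    def receive(self, message):
        if message[0] == "data":
            _, geom_id, vertices = message
            self._cache[geom_id] = vertices
            return vertices
        # A "ref" message carries only the index of cached geometry.
        return self._cache[message[1]]
```

In a usage scenario, a mesh that is drawn every frame would cross the network once per node; subsequent frames would carry only its index, so the saving grows with how often geometry is reused.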

Better Bounds in Networks based on Randomly-Wired Expander

  • Park Byoung-Soo;Cho Tae-Kyung;Kim Tae-Woo
    • Proceedings of the IEEK Conference / summer / pp.325-329 / 2004
  • Linear-size expanders have been studied in many fields for their practical use in connecting large numbers of device chips in parallel communication systems. One major limitation on the efficiency of parallel computer designs has been the prohibitively high cost of parallel communication between processors and memories. Linear-size expanders can be used to construct theoretically optimal interconnection networks. Currently, the defined constructions have large constant factors, rendering them impractical for reasonably sized networks. This paper presents an improvement in constructing concentrators using an $(n,\;k,\;2rs/(r^2-s^2))$ expander, which reduces the size of a superconcentrator by a constant factor.

Enhanced Image Mapping Method for Computer-Generated Integral Imaging System (집적 영상 시스템을 위한 향상된 이미지 매핑 방법)

  • Lee, Bin-Na-Ra;Cho, Yong-Joo;Min, Sung-Wook;Park, Kyung-Shin
    • Proceedings of the HCI Society of Korea Conference (한국HCI학회:학술대회논문집) / 2006.02a / pp.535-540 / 2006
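  • Integral imaging is an autostereoscopic binocular-parallax display technique that lets viewers see 3D images with the naked eye, without special glasses, and it provides vertical and horizontal parallax along with full-color images. An integral imaging system stores 3D information in the form of 2D elemental images, which are images captured at a limited size from slightly different directions. Elemental images can also be generated by computer graphics, and an integral imaging approach that uses them is called a CG integral imaging system. The process of obtaining elemental images by computation is called image mapping. Image mapping methods proposed so far include Point to Point, MVR (Multi-Viewpoint Rendering), and PGR (Parallel Group Rendering). However, these methods either require heavy computation or slow down as the number of lenses in the lens array increases, so they remain difficult to use in real-time CG applications such as virtual reality. In this paper, we propose VVR (Viewpoint Vector Rendering), a new image mapping method that improves on existing methods. We first explain the VVR concept in detail, then implement an integral imaging system using VVR, and present experimental results comparing it with the MVR method together with directions for improvement.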

An Efficient Perspective Projection using $\textrm{VolumePro}^{TM}$ Hardware (볼륨프로 하드웨어를 이용한 효율적인 투시투영 방법)

  • 임석현;신병석
    • Journal of KIISE:Computer Systems and Theory / v.31 no.3_4 / pp.195-203 / 2004
  • VolumePro is real-time volume rendering hardware for consumer PCs. However, it cannot be used for applications that require perspective projection, such as virtual endoscopy, since it provides only orthographic projection. Several methods have been presented to approximate perspective projection by decomposing a volume into slabs and applying successive parallel projections to them. However, these take a long time because the entire region of every slab must be processed, including regions that do not contribute to the final image. In this paper, we propose an efficient perspective projection method that uses several sub-volumes together with the cropping feature of VolumePro. It reduces rendering time compared with the slab-based method without degrading image quality, since it processes only the parts contained in the view frustum.

Real-time Image-space Hatching (실시간 영상 공간 해칭 -GPU 기반 실시간 픽셀 단위 영상공간 해칭-)

  • Kim, Yong-Jin;Lee, Seung-Yong
    • Proceedings of the HCI Society of Korea Conference (한국HCI학회:학술대회논문집) / 2009.02a / pp.459-462 / 2009
  • Hatching is an effective artistic tool for conveying shape and shading by placing parallel line strokes on drawn objects. We present a simple and effective per-pixel image-space hatching method that draws line strokes along given stroke directions. Our hatching method runs directly in image space and can efficiently render highly complex scenes in hatching styles. We implement the algorithm using a pixel shader on a modern GPU.

Implementation of Neural Network Accelerator for Rendering Noise Reduction on OpenCL (OpenCL을 이용한 랜더링 노이즈 제거를 위한 뉴럴 네트워크 가속기 구현)

  • Nam, Kihun
    • The Journal of the Convergence on Culture Technology / v.4 no.4 / pp.373-377 / 2018
  • In this paper, we propose an implementation of a neural network accelerator for reducing rendering noise using OpenCL. Among rendering algorithms, we select ray tracing to ensure high-quality graphics. Ray tracing renders a scene by casting rays: using fewer rays results in noise, while using more rays produces a higher-quality image but takes longer to compute. To reduce computation time while using fewer rays, a learning-based filtering algorithm using a neural network was applied. However, it does not always produce an optimal result. In this paper, we present a new approach to matrix multiplication, based on General Matrix Multiplication, for improved performance. For the development environment, we used OpenCL, which is specialized for high-speed parallel processing. The proposed architecture was verified using a Kintex UltraScale XKU6909T-2FDFG1157C FPGA board. Calculating the parameters is about 1.12 times faster than with the Verilog-HDL structure.

Simulation of Deformable Objects using GLSL 4.3

  • Sung, Nak-Jun;Hong, Min;Lee, Seung-Hyun;Choi, Yoo-Joo
    • KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.8 / pp.4120-4132 / 2017
  • In this research, we implement a deformable object simulation system using OpenGL's shader language, GLSL 4.3. Among the methods for deformable object simulation, we use a volumetric mass-spring system, which is suitable for real-time simulation. The compute shader in GLSL 4.3, which provides access to GPU resources, is used to parallelize the operations of existing deformable object simulation systems. The proposed system is implemented using a compute shader for parallel processing, and it includes a bounding box-based collision detection solution. In general, collision detection is one of the most severe computing bottlenecks in the simulation of multiple deformable objects. To validate the efficiency of the system, we performed experiments using 3D volumetric objects. We compared the performance of multiple deformable object simulations on the CPU and the GPU to analyze the effectiveness of parallel processing using GLSL. Moreover, we measured the computation time of bounding box-based collision detection to show that collision detection can be processed in real time. The experiments using 3D volumetric models with 10K faces showed that the GPU-based parallel simulation improves performance by 98% over the CPU-based simulation, and that all steps, including collision detection and rendering, could be processed at a real-time frame rate of 218.11 FPS.
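The two building blocks named in the abstract above, a mass-spring force and a bounding box overlap test, can be sketched on the CPU as follows. This is a minimal illustration assuming Hooke's-law springs and axis-aligned bounding boxes; the paper's GLSL compute shader, integration scheme, and data layout are not shown in the abstract, and the function names are hypothetical.

```python
import math


def spring_force(p_a, p_b, rest_length, stiffness):
    """Hooke's-law force acting on particle a from the spring a-b."""
    delta = [b - a for a, b in zip(p_a, p_b)]
    length = math.sqrt(sum(d * d for d in delta))
    if length == 0.0:
        return [0.0, 0.0, 0.0]  # Direction undefined for coincident points.
    # Positive when stretched (pulls a toward b), negative when compressed.
    scale = stiffness * (length - rest_length) / length
    return [scale * d for d in delta]


def aabb(points):
    """Axis-aligned bounding box of 3D points as (min_corner, max_corner)."""
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def aabb_overlap(box_a, box_b):
    """True if the two boxes overlap (or touch) on all three axes."""
    (a_min, a_max), (b_min, b_max) = box_a, box_b
    return all(
        a_min[i] <= b_max[i] and b_min[i] <= a_max[i] for i in range(3)
    )
```

Each spring force depends only on its two endpoint positions, so the forces can be evaluated independently, which is what makes this kind of system a natural fit for one GPU invocation per spring or per particle.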

Parallel Cell-Connectivity Information Extraction Algorithm for Ray-casting on Unstructured Grid Data (비정렬 격자에 대한 광선 투사를 위한 셀 사이 연결정보 추출 병렬처리 알고리즘)

  • Lee, Jihun;Kim, Duksu
    • Journal of the Korea Computer Graphics Society / v.26 no.1 / pp.17-25 / 2020
  • We present a novel multi-core CPU-based parallel algorithm for cell-connectivity information extraction, which is one of the preprocessing steps for volume rendering of unstructured grid data. We first examine the synchronization issues that arise when the prior serial algorithm is parallelized naively. Then, we propose a 3-step parallel algorithm that achieves high parallelization efficiency by removing synchronization in each step. Our 3-step algorithm also improves cache utilization by increasing spatial locality for the duplicated triangle test, which is the core operation in building cell-connectivity information. We further improve the efficiency of our parallel algorithm by employing a memory pool for each thread. To check the benefit of our approach, we implemented our method on a system consisting of two octa-core CPUs and measured the performance. Our method shows continuous performance improvement as threads are added, and it achieves up to 82.9 times higher performance than the prior serial algorithm when thirty-two threads (sixteen physical cores) are used. These results demonstrate the high parallelization efficiency and high cache utilization efficiency of our method. They also validate the suitability of our algorithm for large-scale unstructured data.
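To make the "duplicated triangle test" concrete, here is a minimal serial sketch, assuming the unstructured grid consists of tetrahedral cells given as 4-tuples of vertex indices. Two cells are connected when they share a triangular face, so a face seen a second time identifies a neighbour pair. This is the naive serial idea only; the paper's 3-step parallel algorithm and per-thread memory pools are not reproduced, and `build_connectivity` is a hypothetical name.

```python
from itertools import combinations


def build_connectivity(cells):
    """Find neighbouring tetrahedra through shared triangular faces.

    Returns (neighbors, boundary_faces), where neighbors[c] maps a
    face (sorted vertex triple) to the index of the adjacent cell and
    boundary_faces holds faces that belong to exactly one cell.
    """
    first_owner = {}  # face -> index of the first cell that produced it
    neighbors = [dict() for _ in cells]
    for cell_id, cell in enumerate(cells):
        for face in combinations(sorted(cell), 3):
            other = first_owner.pop(face, None)
            if other is None:
                # First occurrence: remember which cell owns the face.
                first_owner[face] = cell_id
            else:
                # Duplicate triangle: the two cells share this face.
                neighbors[cell_id][face] = other
                neighbors[other][face] = cell_id
    return neighbors, set(first_owner)
```

The shared dictionary is the point of contention: if several threads inserted faces into it at once, each lookup-and-insert would need synchronization, which is the issue the abstract says arises when the serial algorithm is parallelized naively.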