• Title/Summary/Keyword: Parallel Processing Architecture

A dynamic analysis algorithm for RC frames using parallel GPU strategies

  • Li, Hongyu;Li, Zuohua;Teng, Jun
    • Computers and Concrete / v.18 no.5 / pp.1019-1039 / 2016
  • In this paper, a parallel algorithm for the nonlinear dynamic analysis of three-dimensional (3D) reinforced concrete (RC) frame structures on the graphics processing unit (GPU) platform is proposed. Time integration for the nonlinear implicit dynamic analysis is performed with the Newmark method, and parallelization strategies are presented. Correspondingly, a parallel Preconditioned Conjugate Gradient (PCG) solver on the GPU is introduced for the repeated solution of the equilibrium equations at each time step. The RC frames are simulated with a fiber beam model to capture the nonlinear behavior of concrete and reinforcing bars. The parallel finite element program is developed with the Compute Unified Device Architecture (CUDA). The accuracy of the GPU-based parallel program was verified against ABAQUS in both single and double precision. The numerical results demonstrate that the proposed algorithm takes full advantage of the parallel architecture of the GPU and speeds up the computation compared with the CPU.
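
The abstract does not say which preconditioner the GPU PCG solver uses. As a minimal sketch, assuming a Jacobi (diagonal) preconditioner, small dense matrices, and a symmetric positive-definite system (as CG requires), the loop below solves one of the linear systems that an implicit Newmark step produces. It is sequential Python, and `pcg_jacobi`, `dot`, and `matvec` are hypothetical names; each list comprehension is an elementwise update or reduction of the kind a CUDA version would spread across threads.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    """Dense matrix-vector product A @ v."""
    return [dot(row, v) for row in A]


def pcg_jacobi(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for SPD A with Jacobi-preconditioned conjugate gradients.

    Assumption: M = diag(A) is used as the preconditioner (not stated in the paper).
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)                                   # residual b - A x with x = 0
    b_norm = dot(b, b) ** 0.5
    if b_norm == 0.0:
        return x                                  # trivial right-hand side
    m_inv = [1.0 / A[i][i] for i in range(n)]     # inverse of the diagonal preconditioner
    z = [m * ri for m, ri in zip(m_inv, r)]       # z = M^-1 r
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        ap = matvec(A, p)
        alpha = rz / dot(p, ap)                   # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        if dot(r, r) ** 0.5 <= tol * b_norm:      # relative residual test
            break
        z = [m * ri for m, ri in zip(m_inv, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]  # new A-conjugate direction
        rz = rz_new
    return x


if __name__ == "__main__":
    # Hypothetical effective stiffness matrix and load vector for one time step.
    K_eff = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
    R_eff = [1.0, 2.0, 3.0]
    print(pcg_jacobi(K_eff, R_eff))
```

The Jacobi choice here is only for illustration: applying it is a per-entry scaling with no dependencies between entries, but the paper may use a different preconditioner.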

Design of a Parallel Rendering Processor Architecture with Effective Memory System (효과적인 메모리 구조를 갖는 병렬 렌더링 프로세서 설계)

  • Park Woo-Chan;Yoon Duk-Ki;Kim Kyoung-Su
    • The KIPS Transactions:PartA / v.13A no.4 s.101 / pp.305-316 / 2006
  • Current rendering processors are organized mainly to process a single triangle as fast as possible, and parallel 3D rendering processors, which process multiple triangles in parallel with multiple rasterizers, have recently begun to appear. For high triangle-processing performance, it is desirable for each rasterizer to have its own local pixel cache. However, a consistency problem may occur when more than one rasterizer accesses data at the same address simultaneously. In this paper, we propose a parallel rendering processor architecture that resolves this consistency problem effectively. The proposed architecture also significantly reduces the latency caused by pixel cache misses. To achieve these two goals, effective memory organizations, including a new pixel cache architecture, are presented. The experimental results show that the proposed architecture achieves almost linear speedup in the best case, even with sixteen rasterizers.

High-Performance Low-Power FFT Cores

  • Han, Wei;Erdogan, Ahmet T.;Arslan, Tughrul;Hasan, Mohd.
    • ETRI Journal / v.30 no.3 / pp.451-460 / 2008
  • Recently, the power consumption of integrated circuits has been attracting increasing attention. Many techniques have been studied to improve the power efficiency of digital signal processing units such as fast Fourier transform (FFT) processors, which are widely employed both in traditional research fields, such as satellite communications, and in thriving consumer electronics, such as wireless communications. This paper presents solutions based on parallel architectures for high-throughput, power-efficient FFT cores. Different combinations of hybrid low-power techniques are exploited to reduce power consumption, including multiplierless units that replace the complex multipliers in FFTs, low-power commutators based on an advanced interconnection, and parallel-pipelined architectures. A number of FFT cores are implemented and evaluated for their power/area performance. The results show that power savings of up to 38% and 55% can be achieved by the proposed pipelined FFTs and parallel-pipelined FFTs, respectively, compared with conventional pipelined FFT processor architectures.
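
The abstract does not detail its multiplierless units. One common way to remove a multiplier is to approximate a fixed twiddle factor by a sum of powers of two, so the product becomes shifts and adds. The sketch below applies that idea to the 8-point twiddle factor W8 = exp(-jπ/4) = (1 - j)/√2; the 8-bit fixed-point scaling, the 181/256 approximation of 1/√2, and all function names are illustrative assumptions, not the paper's design.

```python
import cmath

# Assumed fixed-point format: 1/sqrt(2) ~= 181 / 256 = 0.70703125.
# 181 = 2^7 + 2^5 + 2^4 + 2^2 + 2^0, so x * 181 needs only shifts and adds.
SQRT_HALF_SHIFTS = (7, 5, 4, 2, 0)
FRAC_BITS = 8


def mul_const_shift_add(x, shifts):
    """Multiply integer x by sum(2**s for s in shifts) using only shifts and adds."""
    return sum(x << s for s in shifts)


def mul_sqrt_half(x):
    """Approximate x / sqrt(2); the final right shift rounds toward minus infinity."""
    return mul_const_shift_add(x, SQRT_HALF_SHIFTS) >> FRAC_BITS


def mul_w8(re, im):
    """Multiply (re + j*im) by W8 = (1 - j)/sqrt(2) without a general multiplier.

    (re + j*im)(1 - j) = (re + im) + j(im - re); both parts are then scaled by 1/sqrt(2).
    """
    return mul_sqrt_half(re + im), mul_sqrt_half(im - re)


if __name__ == "__main__":
    approx = complex(*mul_w8(1000, 500))
    exact = (1000 + 500j) * cmath.exp(-1j * cmath.pi / 4)
    print(approx, exact)
```

In hardware each shift is just wiring, so the cost reduces to a few adders per constant; the price is a small, fixed approximation error set by how many power-of-two terms are kept.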

Pattern Classification with the Analog Cellular Parallel Processing Networks (아날로그 셀룰라 병렬 처리 회로망(CPPN)을 이용한 Pattern Classification)

  • 오태완;이혜정;김형석
    • Proceedings of the IEEK Conference / 2003.07e / pp.2367-2370 / 2003
  • A fast pattern classification algorithm based on Cellular Parallel Processing Network (CPPN) dynamic programming is proposed. The CPPN is an analog parallel processing architecture, and dynamic programming is an efficient algorithm for optimization problems. Combining the merits of these two technologies yields fast pattern classification with optimization. In this CPPN-based dynamic programming, if exemplars and test patterns are presented as the goals and the start positions, respectively, the optimal paths from the test patterns to their closest exemplars are found. These paths are then used as aggregating keys for classification. Classification performs well regardless of the degree of nonlinearity of the class borders.
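
The CPPN settles the dynamic-programming recursion in analog hardware; the sketch below is only a digital stand-in for the idea as read from the abstract. It assumes the pattern space is discretized into a grid with 4-connected moves of unit cost, computes the shortest-path value function from all exemplars at once (goals), then follows it downhill from a test pattern (start) and returns the label of the exemplar the optimal path reaches. All names are hypothetical.

```python
from collections import deque

STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # assumed 4-connected grid, unit step cost


def value_function(width, height, exemplars):
    """Multi-source BFS: dist[cell] = shortest path length to the nearest exemplar."""
    dist = {}
    queue = deque()
    for cell in exemplars:
        dist[cell] = 0
        queue.append(cell)
    while queue:
        x, y = queue.popleft()
        for dx, dy in STEPS:
            nxt = (x + dx, y + dy)
            if 0 <= nxt[0] < width and 0 <= nxt[1] < height and nxt not in dist:
                dist[nxt] = dist[(x, y)] + 1
                queue.append(nxt)
    return dist


def optimal_path(start, dist):
    """Follow the value function downhill from a test pattern to an exemplar."""
    path = [start]
    while dist[path[-1]] > 0:
        x, y = path[-1]
        # BFS guarantees some neighbor is exactly one step closer to a goal.
        path.append(next((x + dx, y + dy) for dx, dy in STEPS
                         if dist.get((x + dx, y + dy)) == dist[(x, y)] - 1))
    return path


def classify(width, height, exemplars, test_point):
    """exemplars maps grid cells to class labels; returns (label, optimal path)."""
    dist = value_function(width, height, exemplars)
    path = optimal_path(test_point, dist)
    return exemplars[path[-1]], path


if __name__ == "__main__":
    exemplars = {(0, 0): "A", (9, 9): "B"}
    print(classify(10, 10, exemplars, (1, 1)))
```

Because the value function is computed from every exemplar simultaneously, the class border falls wherever the two distance fields meet, whatever its shape; that is one way to read the abstract's claim about nonlinear class borders.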

A 4-parallel Scheduling Architecture for High-performance H.264/AVC Deblocking Filter (고성능 H.264/AVC 디블로킹 필터를 위한 4-병렬 스케줄링 아키텍처)

  • Ko, Byung-Soo;Kong, Jin-Hyeung
    • Journal of the Institute of Electronics Engineers of Korea SD / v.49 no.8 / pp.63-72 / 2012
  • In this paper, we propose a parallel architecture of line and block edge filters for a high-performance H.264/AVC deblocking filter for real-time processing of Quad Full High Definition (Quad FHD) video. To improve throughput, we designed a 4-parallel block edge filter with 16 line edge filters. To reduce the internal buffer size and the processing cycles, we scheduled a 4-parallel zig-zag scan order as the deblocking filtering order. To avoid data conflicts, we inserted one delay cycle between block edge filtering operations. We implemented an interleaving buffer as the internal buffer of the block edge filter and shared it to reduce the buffer size. The proposed architecture was simulated with a 0.18 μm standard cell library. The maximum operating frequency is 108 MHz, and the gate count is 140.16K gates. Running at 90 MHz, the proposed H.264/AVC deblocking filter supports Quad FHD at 113.17 frames per second.
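
As a rough check on the reported throughput, and assuming that Quad FHD means 3840 × 2160 pixels coded in 16 × 16 macroblocks (neither figure is stated in the abstract), the cycle budget per macroblock at 90 MHz works out as follows.

```latex
\begin{aligned}
N_{\mathrm{MB}} &= \frac{3840}{16}\times\frac{2160}{16} = 240\times 135 = 32\,400\ \text{macroblocks/frame},\\
C_{\mathrm{frame}} &= \frac{90\times 10^{6}\ \text{cycles/s}}{113.17\ \text{frames/s}} \approx 7.95\times 10^{5}\ \text{cycles/frame},\\
C_{\mathrm{MB}} &= \frac{C_{\mathrm{frame}}}{N_{\mathrm{MB}}} \approx 24.5\ \text{cycles/macroblock}.
\end{aligned}
```

Under these assumptions, all edge filtering for one macroblock has to complete in roughly 24.5 clock cycles on average, which is the budget the 4-parallel block edge filter and its 16 line edge filters are scheduled against.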

A Reconfigurable Digital Signal Processing Architecture for the Evolvable Hardware System (진화 하드웨어 시스템을 위한 재구성 가능한 디지털 신호처리 구조)

  • Lee, Han-Ho;Choi, Chang-Seok;Lee, Yong-Min;Choi, Jin-Tack;Lee, Chong-Ho;Chung, Duk-Jin
    • Proceedings of the IEEK Conference / 2006.06a / pp.663-664 / 2006
  • This paper presents a reconfigurable digital signal processing (rDSP) architecture that is effective for implementing adaptive digital signal processing in smart health care system applications. The rDSP architecture gives FIR filters an evolution capability by means of a genetic algorithm. Based on a parallel genetic algorithm, the rDSP architecture evolves FIR filters to explore the optimal configuration of the filter combination, the associated parameters, and the structure of the feature space, adapting to noisy environments for adaptive signal processing. The proposed DSP architecture is implemented using a Xilinx Virtex-4 FPGA device and SMIC 0.18 μm CMOS technology.
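
The abstract describes evolving FIR filters with a parallel genetic algorithm but gives no operators or parameters. The sketch below is a generic, sequential GA under assumed settings: 5 taps, population 30, truncation selection of the best 20%, one-point crossover, Gaussian mutation, and MSE against a known clean signal as the fitness. None of these settings come from the paper, and `evolve_fir` and its helpers are hypothetical. Fitness evaluations within one generation are independent of each other, which is the part a parallel GA would distribute.

```python
import math
import random


def fir_filter(coeffs, x):
    """Direct-form FIR: y[n] = sum_k coeffs[k] * x[n - k], zero initial state."""
    return [sum(h * x[n - k] for k, h in enumerate(coeffs) if n - k >= 0)
            for n in range(len(x))]


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


def evolve_fir(noisy, clean, taps=5, pop_size=30, generations=40, seed=0):
    """Truncation-selection GA with elitism; returns (best coeffs, best-MSE history)."""
    rng = random.Random(seed)

    def fitness(h):                       # lower is better: MSE after filtering
        return mse(fir_filter(h, noisy), clean)

    pop = [[rng.uniform(-1.0, 1.0) for _ in range(taps)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness)             # independent evaluations: the parallelizable part
        history.append(fitness(pop[0]))
        elite = pop[: pop_size // 5]      # best 20% survive unchanged (elitism)
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            cut = rng.randrange(1, taps)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Each gene mutates with probability 0.2 by adding Gaussian noise.
            child = [g + rng.gauss(0.0, 0.05) if rng.random() < 0.2 else g for g in child]
            children.append(child)
        pop = elite + children
    pop.sort(key=fitness)
    history.append(fitness(pop[0]))
    return pop[0], history


if __name__ == "__main__":
    noise_rng = random.Random(1)
    clean = [math.sin(2 * math.pi * n / 32) for n in range(128)]
    noisy = [c + noise_rng.gauss(0.0, 0.3) for c in clean]
    best, hist = evolve_fir(noisy, clean)
    print(best, hist[0], hist[-1])
```

Keeping the elite unchanged guarantees that the best MSE never increases from one generation to the next, at the cost of some population diversity.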

A Parallel Processing of Finding Neighbor Agents in Flocking Behaviors Using GPU (GPU를 이용한 무리 짓기에서 이웃 에이전트 찾기의 병렬 처리)

  • Lee, Jae-Moon
    • Journal of Korea Game Society / v.10 no.5 / pp.95-102 / 2010
  • This paper proposes a parallel algorithm for flocking behaviors using the GPU. To do this, we used CUDA as the parallel processing architecture of the GPU and analyzed its characteristics and constraints. Based on this analysis, we improved performance by parallelizing the search for each agent's neighbors, which is the most costly step in flocking behaviors. We implemented the proposed algorithm on a GTX 285 GPU and experimentally compared its performance with the original spatial partitioning method. The results show that the proposed algorithm was up to 9 times faster than the original method in execution time.
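
The costly step named in the abstract is finding each agent's neighbors. The sketch below shows a uniform-grid spatial partition, with the cell size set equal to the neighbor radius so that only the 3 × 3 surrounding cells need checking, next to a brute-force reference. It is sequential Python, not the paper's CUDA algorithm, and the function names are hypothetical. The per-agent loop in `neighbors_grid` has no dependencies between agents, which is the property a one-thread-per-agent GPU version would exploit.

```python
import math
from collections import defaultdict


def neighbors_bruteforce(pos, radius):
    """O(n^2) reference: indices of all other agents within radius of each agent."""
    r2 = radius * radius
    return [[j for j, (xj, yj) in enumerate(pos)
             if j != i and (xj - xi) ** 2 + (yj - yi) ** 2 <= r2]
            for i, (xi, yi) in enumerate(pos)]


def neighbors_grid(pos, radius):
    """Uniform-grid search: bucket agents by cell, then scan only adjacent cells."""
    cell = radius                          # any neighbor lies at most one cell away
    grid = defaultdict(list)
    for i, (x, y) in enumerate(pos):
        grid[(math.floor(x / cell), math.floor(y / cell))].append(i)
    r2 = radius * radius
    result = []
    for i, (xi, yi) in enumerate(pos):     # independent per agent: one GPU thread each
        cx, cy = math.floor(xi / cell), math.floor(yi / cell)
        found = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy), ()):
                    xj, yj = pos[j]
                    if j != i and (xj - xi) ** 2 + (yj - yi) ** 2 <= r2:
                        found.append(j)
        result.append(sorted(found))
    return result
```

With agents spread fairly evenly, each query touches only a handful of cells instead of every agent, which is why grid-based partitioning is the usual baseline that a GPU version then parallelizes.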

Three-Parallel Reed-Solomon based Forward Error Correction Architecture for 100Gb/s Optical Communications (100Gb/s급 광통신시스템을 위한 3-병렬 Reed-Solomon 기반 FEC 구조 설계)

  • Choi, Chang-Seok;Lee, Han-Ho
    • Journal of the Institute of Electronics Engineers of Korea SD / v.46 no.11 / pp.48-55 / 2009
  • This paper presents a high-speed Forward Error Correction (FEC) architecture based on a three-parallel Reed-Solomon (RS) decoder for next-generation 100 Gb/s optical communication systems. A high-speed three-parallel RS(255,239) decoder has been designed, and the derived structure can also be applied to implement the 100 Gb/s RS-FEC architecture. The proposed 100 Gb/s RS-FEC has been implemented in 0.13 μm CMOS standard cell technology with a supply voltage of 1.2 V. The implementation results show that the 16-channel RS-FEC architecture can operate at a clock frequency of 300 MHz with a throughput of 115 Gb/s. As a result, the proposed three-parallel RS-FEC architecture has a much higher data processing rate, with low hardware complexity, compared with the conventional two-parallel, three-parallel, and serial RS-FEC architectures.
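
The abstract does not describe the decoder internals. As one illustration of what "three-parallel" can mean, the sketch below computes RS syndromes three received symbols per step by unrolling Horner's rule, and checks the 115 Gb/s figure under the assumption that each of the 16 channels accepts three 8-bit symbols per 300 MHz clock. The field polynomial 0x11D and the syndrome roots α^1 to α^16 are assumptions (RS(255,239) has 16 parity symbols and corrects up to 8 symbol errors), and all names are hypothetical.

```python
PRIM_POLY = 0x11D  # assumed field polynomial x^8 + x^4 + x^3 + x^2 + 1

# Exponent/log tables for GF(2^8); EXP is doubled so EXP[LOG[a] + LOG[b]] never wraps.
EXP = [0] * 510
LOG = [0] * 256
_v = 1
for _i in range(255):
    EXP[_i] = _v
    LOG[_v] = _i
    _v <<= 1
    if _v & 0x100:
        _v ^= PRIM_POLY
for _i in range(255, 510):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a, b):
    """Multiply two GF(2^8) elements via log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def syndromes_serial(r, n_syn=16):
    """S_j = r(alpha^j) for j = 1..n_syn, one symbol per step (Horner's rule).

    r[0] is the highest-degree coefficient, i.e. the first received symbol.
    """
    result = []
    for j in range(1, n_syn + 1):
        a = EXP[j]
        s = 0
        for sym in r:
            s = gf_mul(s, a) ^ sym
        result.append(s)
    return result


def syndromes_3parallel(r, n_syn=16):
    """Same syndromes, consuming three symbols per step:
    s <- s*alpha^(3j) + r0*alpha^(2j) + r1*alpha^j + r2 (addition is XOR).
    """
    if len(r) % 3:
        raise ValueError("length must be a multiple of 3 (255 = 3 * 85)")
    result = []
    for j in range(1, n_syn + 1):
        a1, a2, a3 = EXP[j], EXP[2 * j], EXP[3 * j]
        s = 0
        for k in range(0, len(r), 3):
            s = gf_mul(s, a3) ^ gf_mul(r[k], a2) ^ gf_mul(r[k + 1], a1) ^ r[k + 2]
        result.append(s)
    return result


# Assumed datapath: 16 channels x 3 symbols/clock x 8 bits x 300 MHz = 115.2 Gb/s.
THROUGHPUT_BPS = 16 * 3 * 8 * 300_000_000
```

Unrolling trades one constant multiplier per syndrome for three, but cuts the number of steps per 255-symbol codeword from 255 to 85; a valid codeword yields all-zero syndromes either way.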

A Study on the CAM Designed by Adopting Best-Match Method using Parallel Processing Architecture (병렬 처리 구조를 이용한 최적 정합 방식 CAM 설계에 관한 연구)

  • 김상복;박노경;차균현
    • The Journal of Korean Institute of Communications and Information Sciences / v.19 no.6 / pp.1056-1063 / 1994
  • In this paper, a content addressable memory (CAM) is designed by adopting the best-match method. It has a single processing element (PE) architecture with high computational efficiency and throughput. It is composed of three main functional blocks (input MUX, best-match CAM, and control part) and supports fully parallel processing. Logic simulation was completed using QUICKSIM, and circuit simulation was performed using HSPICE. The layout is based on the ETRI 3 μm n-well process design rules. The maximum operating frequency is 20 MHz.
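
The abstract does not define "best match". Assuming it means the stored word with the smallest Hamming distance to the search key, the sketch below shows the function a best-match CAM computes. The hardware evaluates all comparisons at once, whereas this list comprehension evaluates them one after another; the names and the lowest-index tie-breaking rule are hypothetical.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def best_match(stored_words, key):
    """Return (index, distance) of the stored word nearest to key.

    Ties resolve to the lowest index. In a CAM every word is compared with the
    key simultaneously; this loop stands in for that parallel comparison.
    """
    distances = [hamming_distance(w, key) for w in stored_words]
    best = min(range(len(distances)), key=distances.__getitem__)
    return best, distances[best]


if __name__ == "__main__":
    words = [0b1010, 0b1111, 0b0000]
    print(best_match(words, 0b0001))
```

An exact-match CAM only reports distance zero; a best-match design additionally needs the minimum-finding step, which is the part this sketch makes explicit.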
