Search | Korea Science

A dynamic analysis algorithm for RC frames using parallel GPU strategies

Li, Hongyu;Li, Zuohua;Teng, Jun
- Computers and Concrete / v.18 no.5 / pp.1019-1039 / 2016
This paper proposes a parallel algorithm for nonlinear dynamic analysis of three-dimensional (3D) reinforced concrete (RC) frame structures on a graphics processing unit (GPU) platform. Time integration uses the Newmark method for nonlinear implicit dynamic analysis, and parallelization strategies are presented. Correspondingly, a parallel Preconditioned Conjugate Gradient (PCG) solver on the GPU is introduced for the repeated solution of the equilibrium equations at each time step. The RC frames are simulated using a fiber beam model to capture the nonlinear behavior of concrete and reinforcing bars. The parallel finite element program is developed using the Compute Unified Device Architecture (CUDA). The accuracy of the GPU-based parallel program, in both single and double precision, was verified by comparison with ABAQUS. The numerical results demonstrate that the proposed algorithm can take full advantage of the GPU's parallel architecture and speeds up the computation compared with the CPU.
https://doi.org/10.12989/cac.2016.18.5.1019
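
As a rough illustration of the solver named in the abstract above, the following is a minimal serial sketch of a preconditioned conjugate gradient iteration. The abstract does not say which preconditioner the authors use, so the Jacobi (diagonal) preconditioner here is an assumption, and the function name `pcg_jacobi` and the dense list-of-lists matrix are hypothetical choices for readability. It is not the paper's CUDA implementation; in a GPU version, the matrix-vector product and the dot products would be the natural candidates for parallel kernels.

```python
def pcg_jacobi(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite matrix A.

    A is a dense list of lists and b is a list (both assumptions made
    for illustration). Returns the solution and the iteration count.
    """
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]  # Jacobi preconditioner (assumed)
    x = [0.0] * n
    r = list(b)  # residual b - A x for the zero initial guess
    z = [inv_diag[i] * r[i] for i in range(n)]
    p = list(z)
    rz = sum(r[i] * z[i] for i in range(n))
    b_norm = sum(v * v for v in b) ** 0.5 or 1.0
    for it in range(max_iter):
        # Stop once the relative residual is small enough.
        if sum(v * v for v in r) ** 0.5 <= tol * b_norm:
            return x, it
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rz / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = sum(r[i] * z[i] for i in range(n))
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x, max_iter


# Small symmetric positive-definite example system.
solution, iterations = pcg_jacobi([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
print(solution, iterations)
```

In an implicit Newmark scheme, a system of this form would be solved again at every time step, which is why the abstract singles out the solver for parallelization.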

Design of a Parallel Rendering Processor Architecture with Effective Memory System (효과적인 메모리 구조를 갖는 병렬 렌더링 프로세서 설계)

Park Woo-Chan;Yoon Duk-Ki;Kim Kyoung-Su
- The KIPS Transactions:PartA / v.13A no.4 s.101 / pp.305-316 / 2006
Current rendering processors are organized mainly to process a triangle as fast as possible, and parallel 3D rendering processors, which process multiple triangles in parallel with multiple rasterizers, have recently begun to appear. For high performance in processing triangles, it is desirable for each rasterizer to have its own local pixel cache. However, a consistency problem may occur when more than one rasterizer accesses data at the same address simultaneously. In this paper, we propose a parallel rendering processor architecture that resolves this consistency problem effectively. Moreover, the proposed architecture significantly reduces the latency due to a pixel cache miss. To achieve these two goals, effective memory organizations, including a new pixel cache architecture, are presented. The experimental results show that the proposed architecture achieves almost linear speedup in the best case, even with sixteen rasterizers.
https://doi.org/10.3745/KIPSTA.2006.13A.4.305

A VLSI Architecture for parallel processing of an ordering type computational structure (Ordering Type 계산구조의 병렬처리를 위한 VLSI Architecture)

Kim, Hyeong-Gon;O, Myeong-Hwan
- Proceedings of the KIEE Conference / 1986.07a / pp.473-477 / 1986

High-Performance Low-Power FFT Cores

Han, Wei;Erdogan, Ahmet T.;Arslan, Tughrul;Hasan, Mohd.
- ETRI Journal / v.30 no.3 / pp.451-460 / 2008
Recently, the power consumption of integrated circuits has been attracting increasing attention. Many techniques have been studied to improve the power efficiency of digital signal processing units such as fast Fourier transform (FFT) processors, which are widely used both in traditional research fields, such as satellite communications, and in thriving consumer electronics, such as wireless communications. This paper presents solutions based on parallel architectures for high-throughput, power-efficient FFT cores. Different combinations of hybrid low-power techniques are exploited to reduce power consumption, such as multiplierless units that replace the complex multipliers in FFTs, low-power commutators based on an advanced interconnection, and parallel-pipelined architectures. A number of FFT cores are implemented and evaluated for their power/area performance. The results show that power savings of up to 38% and 55% can be achieved by the proposed pipelined FFTs and parallel-pipelined FFTs, respectively, compared to conventional pipelined FFT processor architectures.

Pattern Classification with the Analog Cellular Parallel Processing Networks (아날로그 셀룰라 병렬 처리 회로망(CPPN)을 이용한 Pattern Classification)

오태완;이혜정;김형석
- Proceedings of the IEEK Conference / 2003.07e / pp.2367-2370 / 2003
A fast pattern classification algorithm based on dynamic programming with the Cellular Parallel Processing Network (CPPN) is proposed. The CPPN is an analog parallel processing architecture, and dynamic programming is an efficient computational algorithm for optimization problems. Combining the merits of these two technologies yields fast pattern classification with optimization. In CPPN-based dynamic programming, if exemplars and test patterns are presented as the goals and the start positions, respectively, the optimal paths from the test patterns to their closest exemplars are found. These paths are used as aggregating keys for the classification. The pattern classification performs well regardless of the degree of nonlinearity in the class borders.

A 4-parallel Scheduling Architecture for High-performance H.264/AVC Deblocking Filter (고성능 H.264/AVC 디블로킹 필터를 위한 4-병렬 스케줄링 아키텍처)

Ko, Byung-Soo;Kong, Jin-Hyeung
- Journal of the Institute of Electronics Engineers of Korea SD / v.49 no.8 / pp.63-72 / 2012
In this paper, we propose a parallel architecture of line and block edge filters for a high-performance H.264/AVC deblocking filter for real-time processing of Quad Full High Definition (Quad FHD) video. To improve throughput, we designed a 4-parallel block edge filter with 16 line edge filters. To reduce the internal buffer size and the number of processing cycles, we scheduled a 4-parallel zig-zag scan order as the deblocking filtering order. To avoid data conflicts, we placed one delay cycle between block edge filtering operations. We implemented an interleaving buffer as the internal buffer of the block edge filter so that the buffer is shared and its size is reduced. The proposed architecture was simulated with a 0.18 μm standard cell library. The maximum operating frequency is 108 MHz, and the gate count is 140.16K gates. The proposed H.264/AVC deblocking filter can support Quad FHD at 113.17 frames per second when running at 90 MHz.

A Reconfigurable Digital Signal Processing Architecture for the Evolvable Hardware System (진화 하드웨어 시스템을 위한 재구성 가능한 디지털 신호처리 구조)

Lee, Han-Ho;Choi, Chang-Seok;Lee, Yong-Min;Choi, Jin-Tack;Lee, Chong-Ho;Chung, Duk-Jin
- Proceedings of the IEEK Conference / 2006.06a / pp.663-664 / 2006
This paper presents a reconfigurable digital signal processing (rDSP) architecture that is effective for implementing adaptive digital signal processing in smart health care system applications. The rDSP architecture evolves FIR filters using a genetic algorithm. The parallel genetic algorithm-based rDSP architecture evolves FIR filters to explore the optimal configuration of filter combination, associated parameters, and feature-space structure, adapting to noisy environments for adaptive signal processing. The proposed DSP architecture is implemented using a Xilinx Virtex4 FPGA device and SMIC 0.18 μm CMOS technology.

A Parallel Processing of Finding Neighbor Agents in Flocking Behaviors Using GPU (GPU를 이용한 무리 짓기에서 이웃 에이전트 찾기의 병렬 처리)

Lee, Jae-Moon
- Journal of Korea Game Society / v.10 no.5 / pp.95-102 / 2010
This paper proposes a parallel algorithm for flocking behaviors using a GPU. To this end, we used CUDA as the GPU's parallel processing architecture and analyzed its characteristics and constraints. Based on this analysis, we improved performance by parallelizing the search for an agent's neighbors, which is the most costly operation in flocking behaviors. We implemented the proposed algorithm on a GTX 285 GPU and experimentally compared its performance with that of the original spatial partitioning method. The comparison showed that the proposed algorithm ran up to 9 times faster than the original method.
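
The neighbor search that the abstract above identifies as the bottleneck can be sketched serially with a uniform grid, which is one common form of spatial partitioning. This is only an illustration of the baseline idea, not the paper's CUDA algorithm: the 2D positions, the function names `build_grid` and `find_neighbors`, and the requirement that the query radius not exceed the cell size are all assumptions made here. In a GPU version, each agent's query would be a natural unit of work for one thread.

```python
from collections import defaultdict


def build_grid(positions, cell_size):
    """Bucket agent indices by the grid cell containing each 2D position."""
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(positions):
        grid[(int(x // cell_size), int(y // cell_size))].append(idx)
    return grid


def find_neighbors(positions, grid, cell_size, agent, radius):
    """Return sorted indices of agents within `radius` of `agent`.

    Assumes radius <= cell_size, so only the 3x3 block of cells around
    the agent has to be scanned.
    """
    ax, ay = positions[agent]
    cx, cy = int(ax // cell_size), int(ay // cell_size)
    found = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for other in grid.get((cx + dx, cy + dy), ()):
                if other == agent:
                    continue
                ox, oy = positions[other]
                if (ox - ax) ** 2 + (oy - ay) ** 2 <= radius * radius:
                    found.append(other)
    return sorted(found)


# Hypothetical agent positions for demonstration.
agents = [(0.5, 0.5), (1.2, 0.6), (5.0, 5.0), (0.4, 1.4)]
grid = build_grid(agents, cell_size=1.0)
print(find_neighbors(agents, grid, 1.0, agent=0, radius=1.0))
```

Scanning only adjacent cells avoids comparing every agent with every other agent, and each query is independent of the others, which is what makes this step amenable to parallel execution.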

Three-Parallel Reed-Solomon based Forward Error Correction Architecture for 100Gb/s Optical Communications (100Gb/s급 광통신시스템을 위한 3-병렬 Reed-Solomon 기반 FEC 구조 설계)

Choi, Chang-Seok;Lee, Han-Ho
- Journal of the Institute of Electronics Engineers of Korea SD / v.46 no.11 / pp.48-55 / 2009
This paper presents a high-speed Forward Error Correction (FEC) architecture based on a three-parallel Reed-Solomon (RS) decoder for next-generation 100-Gb/s optical communication systems. A high-speed three-parallel RS(255,239) decoder has been designed, and the derived structure can also be applied to implement the 100-Gb/s RS-FEC architecture. The proposed 100-Gb/s RS-FEC has been implemented with 0.13-μm CMOS standard cell technology at a supply voltage of 1.2 V. The implementation results show that the 16-channel RS-FEC architecture can operate at a clock frequency of 300 MHz and has a throughput of 115 Gb/s in 0.13-μm CMOS technology. As a result, the proposed three-parallel RS-FEC architecture has a much higher data processing rate and lower hardware complexity than the conventional two-parallel, three-parallel, and serial RS-FEC architectures.
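
The throughput figure in the abstract above can be sanity-checked with simple arithmetic. The sketch below assumes that each channel processes three 8-bit symbols per clock cycle (RS(255,239) is defined over GF(2^8), so symbols are 8 bits wide) and that the reported throughput is the raw symbol rate summed over all 16 channels; the function name `rs_fec_throughput_gbps` is hypothetical. Under those assumptions, the result comes out at roughly the 115 Gb/s stated.

```python
def rs_fec_throughput_gbps(parallel_symbols, bits_per_symbol, clock_hz, channels):
    """Raw throughput in Gb/s, assuming every channel accepts
    `parallel_symbols` symbols on every clock cycle."""
    return parallel_symbols * bits_per_symbol * clock_hz * channels / 1e9


# Figures from the abstract: three-parallel decoder, 300 MHz clock, 16 channels.
# The 8-bit symbol width follows from RS(255,239) being defined over GF(2^8).
per_channel = rs_fec_throughput_gbps(3, 8, 300e6, 1)
total = rs_fec_throughput_gbps(3, 8, 300e6, 16)
print(per_channel, total)
```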

A Study on the CAM Designed by Adopting Best-Match Method using Parallel Processing Architecture (병렬 처리 구조를 이용한 최적 정합 방식 CAM 설계에 관한 연구)

김상복;박노경;차균현
- The Journal of Korean Institute of Communications and Information Sciences / v.19 no.6 / pp.1056-1063 / 1994
In this paper, a content addressable memory (CAM) is designed using the best-match method. It has a single processing element (PE) architecture with high computational efficiency and throughput. It is composed of three main functional blocks: an input MUX, a best-match CAM, and a control part. It supports fully parallel processing. Logic simulation was carried out using QUICKSIM, and circuit simulation was performed using HSPICE. Its layout is based on the ETRI 3 μm n-well process design rules. Its maximum operating frequency is 20 MHz.

