Search | Korea Science

Parallel Computation of FDTD algorithm using CUDA (CUDA를 이용한 FDTD 알고리즘의 병렬처리)

Lee, Ho-Young;Park, Jong-Hyun;Kim, Jun-Seong
- Journal of the Institute of Electronics Engineers of Korea CI
- v.47 no.4
- pp.82-87
- 2010
Modern GPUs (Graphics Processing Units) provide higher computing capability than general-purpose CPUs (Central Processing Units). Because the graphics pipeline is now programmable, GP-GPU (General-Purpose computation on GPUs) has attracted much attention and has expanded its application areas. This paper compares sequential and massively parallel implementations of the FDTD (Finite-Difference Time-Domain) algorithm using CUDA (Compute Unified Device Architecture). Experimental results show up to a 45X speedup over conventional CPU execution.
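FDTD updates the electric and magnetic fields on a staggered (Yee) grid in alternating half-steps. Within each half-step, every cell depends only on neighboring values from the previous half-step, which is what makes a one-thread-per-cell CUDA mapping natural. The following is a minimal sequential 1D sketch in normalized units, assuming a Courant number of 0.5, an additive Gaussian source, and fixed (PEC-like) ends; the grid size, source position, and pulse parameters are illustrative assumptions, not the authors' setup. The two inner loops mark the work a GPU kernel would parallelize.

```python
import math

def fdtd_1d(n_cells=200, n_steps=150, courant=0.5, src=100):
    """1D FDTD (Yee scheme) in normalized units.

    ez[i] lives on integer grid points, hy[i] on half points (i + 1/2).
    courant <= 1 is required for stability in 1D.
    """
    ez = [0.0] * n_cells
    hy = [0.0] * n_cells
    for t in range(n_steps):
        # Magnetic-field half-step: hy[i] depends only on ez[i] and ez[i+1],
        # so all cells can be updated independently (one GPU thread per cell).
        for i in range(n_cells - 1):
            hy[i] += courant * (ez[i + 1] - ez[i])
        # Electric-field half-step: ez[i] depends only on hy[i-1] and hy[i].
        # ez[0] and ez[-1] are never updated, so they act as fixed (PEC) walls.
        for i in range(1, n_cells - 1):
            ez[i] += courant * (hy[i] - hy[i - 1])
        # Additive Gaussian source at cell `src` (illustrative parameters).
        ez[src] += math.exp(-((t - 30) / 10.0) ** 2)
    return ez
```

With a Courant number of 0.5 the pulse moves half a cell per step, so after 150 steps the two outgoing pulses sit roughly 60 cells on either side of the source.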

Performance Evaluation of Value Predictor in High Performance Microprocessors (고성능 마이크로프로세서에서 값 예측기의 성능평가)

Jeon Byoung-Chan;Kim Hyeock-Jin;RU Dae-Hee
- Journal of the Korea Society of Computer and Information
- v.10 no.2 s.34
- pp.87-95
- 2005
Value prediction in high-performance microprocessors is a technique that exploits instruction-level parallelism (ILP) by predicting the outcome of an instruction, which breaks true data dependences and allows the dependent instructions to execute. In this paper, we compare and analyze value predictors that issue in parallel and execute using predicted values to improve ILP in microprocessors, and we measure and evaluate their prediction accuracy, prediction rate, and mean performance improvement according to when each table is updated. To verify validity, the predictors are compared through simulation on the SPECint95 benchmarks using an execution-driven simulator.
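Value predictors are commonly organized as tables indexed by the instruction address (PC). As an illustration, here is a minimal stride predictor sketch; the abstract does not say which predictor organizations were compared, so the table layout, the 2-bit confidence counter, and the threshold below are assumptions. The `update` call is the table-update point whose timing the paper varies.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    last: int = 0    # last value produced by this instruction
    stride: int = 0  # difference between the last two values
    conf: int = 0    # 2-bit saturating confidence counter (0..3)

class StridePredictor:
    """Hypothetical PC-indexed stride value predictor."""

    def __init__(self, threshold=2):
        self.table = {}              # pc -> Entry (unbounded, for simplicity)
        self.threshold = threshold

    def predict(self, pc):
        """Return a predicted value, or None if confidence is too low."""
        e = self.table.get(pc)
        if e is None or e.conf < self.threshold:
            return None
        return e.last + e.stride

    def update(self, pc, actual):
        """Train the entry with the actual result (e.g. at commit)."""
        e = self.table.get(pc)
        if e is None:
            self.table[pc] = Entry(last=actual)
            return
        new_stride = actual - e.last
        if new_stride == e.stride:
            e.conf = min(e.conf + 1, 3)
        else:
            e.conf = max(e.conf - 1, 0)
            e.stride = new_stride
        e.last = actual
```

For an instruction producing 0, 4, 8, 12, ..., the predictor stays silent while it learns the stride and confidence, then predicts 16 and 20 correctly. Updating earlier (at execute) or later (at commit) changes how quickly such entries become confident, which is the kind of effect the paper measures.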

Comparative and Combined Performance Studies of OpenMP and MPI Codes (OpenMP와 MPI 코드의 상대적, 혼합적 성능 고찰)

Lee Myung-Ho
- The KIPS Transactions:PartA
- v.13A no.2 s.99
- pp.157-162
- 2006
Recent High Performance Computing (HPC) platforms can be classified as Shared-Memory Multiprocessors (SMP), Massively Parallel Processors (MPP), and clusters of computing nodes. These platforms are deployed in many scientific and engineering applications that demand very high computing power. To achieve optimal performance for these applications, it is crucial to find and use suitable computing platforms and programming paradigms. In this paper, we use the SPEC HPC 2002 benchmark suite, implemented in several parallel programming models (MPI, OpenMP, and hybrid MPI/OpenMP), and analyze its performance to identify the optimal computing environments and programming paradigms for these applications.
https://doi.org/10.3745/KIPSTA.2006.13A.2.157

Multicore Processor based Parallel SVM for Video Surveillance System (비디오 감시 시스템을 위한 멀티코어 프로세서 기반의 병렬 SVM)

Kim, Hee-Gon;Lee, Sung-Ju;Chung, Yong-Wha;Park, Dai-Hee;Lee, Han-Sung
- Journal of the Korea Institute of Information Security & Cryptology
- v.21 no.6
- pp.161-169
- 2011
Recent intelligent video surveillance systems call for more advanced technologies for analyzing and recognizing video data. In particular, machine learning algorithms such as the Support Vector Machine (SVM) are used to recognize objects in video. Because SVM training demands a massive amount of computation, parallel processing is necessary to reduce the execution time effectively. In this paper, we propose a method for parallelizing SVM training on a multi-core processor. Results on a 4-core processor show that the proposed method reduces the execution time of sequential training by a factor of 2.5.
https://doi.org/10.13089/JKIISC.2011.21.6.161
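A speedup of 2.5 on 4 cores corresponds to a parallel efficiency of 2.5 / 4 = 62.5%. If one additionally assumes Amdahl's law (a fixed serial fraction plus perfect scaling of the rest, which the abstract does not claim), the speedup also implies what fraction of the training work was effectively parallelized. The helpers below are a back-of-the-envelope sketch of that calculation; the function names are illustrative.

```python
def amdahl_speedup(p, n):
    """Amdahl's law: speedup on n cores when a fraction p of the work is parallel."""
    return 1.0 / ((1.0 - p) + p / n)

def implied_parallel_fraction(speedup, n):
    """Invert Amdahl's law: the parallel fraction p that yields `speedup` on n cores.

    From 1/S = (1 - p) + p/n it follows that p = (1 - 1/S) / (1 - 1/n).
    """
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n)

p = implied_parallel_fraction(2.5, 4)
print(f"efficiency = {2.5 / 4:.1%}, implied parallel fraction = {p:.2f}")
```

Under this assumption about 80% of the work would be parallel, and the model caps the attainable speedup at 1 / (1 - p) = 5 regardless of core count; real SVM training may deviate because of memory bandwidth or load imbalance.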

Interconnection Network Interfaces in Parallel Computer Systems (병렬 컴퓨터 시스템에서의 상호연결망 인터페이스)

Mo, Sang-Man;Sin, Sang-Seok;Han, U-Jong;Yun, Seok-Han
- Electronics and Telecommunications Trends
- v.12 no.5 s.47
- pp.62-72
- 1997
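An interconnection network interface is the adapter that acts as a bridge connecting a node or processor to the interconnection network in a parallel computer system, and it sends and receives messages over the network. This paper describes the structure and operation of interconnection network interfaces, their interface to the processor, the results of a case study of several kinds of network interfaces, and their performance and design considerations. Network interface design should focus on keeping the interface from becoming the system bottleneck; to this end, it should be implemented with the application domain fully in mind, maximizing transmission bandwidth and minimizing latency. It should also provide high transmission reliability through error control, as well as efficient processor and programming interfaces.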
https://doi.org/10.22648/ETRI.1997.J.120506
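The tension between maximizing bandwidth and minimizing latency is often summarized with a first-order model in which sending an n-byte message costs a fixed overhead t0 plus n / B. The sketch below uses that textbook model (an assumption, not taken from the article) with hypothetical parameter values to show why small messages are latency-bound: delivered bandwidth reaches half of B only at n = t0 * B, the half-performance message length.

```python
def transfer_time(n_bytes, t0_s, bw_bytes_per_s):
    """First-order message cost: fixed latency plus serialization time."""
    return t0_s + n_bytes / bw_bytes_per_s

def effective_bandwidth(n_bytes, t0_s, bw_bytes_per_s):
    """Delivered bytes per second for a single n-byte message."""
    return n_bytes / transfer_time(n_bytes, t0_s, bw_bytes_per_s)

# Hypothetical interface parameters: 10 us per-message latency, 100 MB/s link.
T0, BW = 10e-6, 100e6
n_half = T0 * BW  # message size (about 1000 bytes) that achieves BW / 2
for n in (64, 1000, 65536):
    print(n, f"{effective_bandwidth(n, T0, BW) / 1e6:.1f} MB/s")
```

In this model, lowering t0 helps short messages far more than raising B, which is one reason interface designs try to cut per-message overhead in addition to raising link speed.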

A Fast 64×64-bit Multiplier for Crypto-Processor (암호 프로세서용 고속 64×64 곱셈기)

서정욱;이상흥
- Proceedings of the Korea Institutes of Information Security and Cryptology Conference
- 1998.12a
- pp.471-481
- 1998
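Multiplication of a multiplicand by a multiplier is inherently slow because it must add many partial products derived from the multiplier. Cryptographic processors, which operate on large numbers, especially require very fast multipliers. To speed up this operation, previous research has reduced the number of partial products with radix-4, radix-8, or radix-16 modified Booth algorithms, and has summed the partial products quickly with Wallace trees or parallel counters. In this paper, to implement a 64×64-bit multiplier for a cryptographic processor, we propose a fast parallel counter for high-speed multiplication: the partial products are generated with the radix-4 modified Booth algorithm and added with the proposed counter. In the 64×64-bit multiplier implementation, using the proposed counter improves speed by about 31% over the Wallace or Dadda scheme and by about 21% over Mehta's counter.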
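Radix-4 modified Booth recoding, which the multiplier above uses to generate partial products, scans the multiplier in overlapping 3-bit groups and maps each group to a digit in {-2, -1, 0, +1, +2}, so a 64-bit unsigned multiplier yields 33 partial products instead of 64. The sketch below models this at the integer level only; it does not model the paper's proposed parallel counter or any gate-level detail, and `sum()` stands in for the Wallace/Dadda tree or counter that would add the partial products in hardware.

```python
def booth_radix4_digits(y, n_bits):
    """Recode an unsigned n_bits-wide multiplier y into radix-4 Booth digits.

    Digit i uses bits (y[2i+1], y[2i], y[2i-1]) with y[-1] = 0:
        d_i = -2*y[2i+1] + y[2i] + y[2i-1]
    The width is padded with a zero sign bit (and rounded up to an even count)
    so unsigned values recode correctly.
    """
    width = n_bits + 2 if n_bits % 2 == 0 else n_bits + 1
    digits = []
    prev = 0  # y[2i-1], initially y[-1] = 0
    for i in range(0, width, 2):
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits  # least significant digit first

def booth_multiply(x, y, n_bits=64):
    """Return (x * y, number of partial products) using Booth partial products."""
    digits = booth_radix4_digits(y, n_bits)
    # Each partial product is d_i * x shifted left by 2i bits (i.e. times 4**i).
    partial_products = [(d * x) << (2 * i) for i, d in enumerate(digits)]
    return sum(partial_products), len(partial_products)
```

Each digit only requires selecting 0, ±x, or ±2x (a shift and an optional negation), which is why Booth recoding is cheap in hardware while halving the number of rows to add.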

Design and analysis of a parallel high speed DSP system (병렬 고속 디지털 신호처리시스템의 설계 및 성능분석)

박경택;전창호;박성주;이동호;박준석;오원천;한기택
- Proceedings of the IEEK Conference
- 1998.06a
- pp.503-506
- 1998
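This study proposes a parallel high-speed digital signal processing system for processing huge amounts of data in real time. We present a probabilistic analysis method for evaluating the system's performance and apply it to two kinds of algorithms: FFT, which imposes a heavy inter-board or inter-processor communication load, and matrix operations, which impose a light communication load. The performance of both algorithms was evaluated probabilistically for various configurations of the proposed system, and the results were confirmed to be close to the performance figures obtained by algorithm analysis. For FFT, performance decreased as the number of boards grew, even when the number of processors increased, whereas for matrix operations, system performance increased linearly with the number of processors.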

OpenMP Implementation using POSIX thread library on ARM11MPCore (ARM11MPCore에서 POSIX 쓰레드를 이용한 OpenMP 구현)

Lee, Jae-Won;Jeun, Woo-Chul;Ha, Soon-Hoi
- Proceedings of the Korean Information Science Society Conference
- 2007.10b
- pp.414-418
- 2007
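In multiprocessor environments, OpenMP has the advantage over MPI of making parallel programming easier, and it is recognized as the de facto standard in a parallel programming world that otherwise lacks a standard. Because OpenMP must be implemented differently for each target platform, a matching OpenMP implementation has to be built whenever a new processor appears. In this paper, we build an OpenMP environment based on POSIX threads on the ARM11MPCore, a multiprocessor system-on-chip, and measure its performance.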
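An OpenMP runtime layered on POSIX threads typically implements `#pragma omp parallel for schedule(static)` by forking a team of threads, giving each a contiguous block of the iteration space, and joining them at an implicit barrier. The sketch below illustrates that mapping with Python's `threading` module standing in for pthreads; the chunking rule (the first `n % t` threads get one extra iteration) is one common choice and is assumed here rather than taken from the paper. Because of Python's GIL, this shows the structure, not the speedup.

```python
import threading

def static_chunks(n_iters, n_threads):
    """Split range(n_iters) into n_threads contiguous [start, end) blocks."""
    base, extra = divmod(n_iters, n_threads)
    chunks, start = [], 0
    for t in range(n_threads):
        size = base + (1 if t < extra else 0)
        chunks.append((start, start + size))
        start += size
    return chunks

def parallel_for(n_iters, body, n_threads=4):
    """Rough analogue of `omp parallel for schedule(static)`: fork, run, join."""
    def worker(start, end):
        for i in range(start, end):
            body(i)

    threads = [threading.Thread(target=worker, args=chunk)
               for chunk in static_chunks(n_iters, n_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()  # implicit barrier at the end of the parallel region

# Usage: each iteration writes a distinct element, so no locking is needed.
out = [0] * 10

def square_into_out(i):
    out[i] = i * i

parallel_for(10, square_into_out)
```

With 10 iterations and 4 threads the blocks are [0, 3), [3, 6), [6, 8), and [8, 10); a pthreads-based runtime would do the same split in C with `pthread_create` and `pthread_join`.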

Genetic Algorithms for Efficient Multiprocessor Scheduling (효율적인 멀티프로세서 스케줄링을 위한 전자 알고리즘 설계)

Park, Weol-Seon;Park, Sang-Il;Nam, Eun-Mi;Youn, Sung-Dae
- Proceedings of the Korea Information Processing Society Conference
- 2000.04a
- pp.550-556
- 2000
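This paper proposes a genetic algorithm design for efficiently distributing parallel programs with precedence constraints, an NP-complete problem, across a multiprocessor system. Valid solutions are generated with a new chromosome encoding that satisfies the precedence constraints and a heuristic scheduling algorithm, and we also propose an evaluation function that takes processor efficiency into account and a crossover operator that mates superior genes. Experiments with the proposed algorithm confirmed its validity for scheduling precedence-constrained parallel programs of various topologies.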
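A common way to encode a precedence-constrained multiprocessor schedule for a genetic algorithm is a chromosome made of a task order (a valid topological order of the task graph) plus a processor assignment per task, with fitness derived from the resulting makespan. The abstract does not detail the authors' encoding, so the sketch below, including the example task graph, the zero communication cost, and the decoding rule, is a hypothetical illustration of the evaluation step only, not the paper's operators.

```python
def is_topological(order, preds):
    """Check that every task appears after all of its predecessors."""
    pos = {task: i for i, task in enumerate(order)}
    return all(pos[p] < pos[t] for t in order for p in preds[t])

def makespan(order, assign, cost, preds, n_procs):
    """Decode a chromosome (order, assign) into a schedule and return its length.

    Tasks are placed in `order`; each starts when its processor is free and all
    of its predecessors have finished (communication delay assumed to be zero).
    """
    proc_free = [0] * n_procs
    finish = {}
    for t in order:
        ready = max((finish[p] for p in preds[t]), default=0)
        start = max(ready, proc_free[assign[t]])
        finish[t] = start + cost[t]
        proc_free[assign[t]] = finish[t]
    return max(finish.values())

def fitness(order, assign, cost, preds, n_procs):
    """Higher is better; orders that violate precedence get zero fitness."""
    if not is_topological(order, preds):
        return 0.0
    return 1.0 / makespan(order, assign, cost, preds, n_procs)

# Hypothetical task graph: A precedes B and C, which both precede D.
cost = {"A": 2, "B": 3, "C": 2, "D": 1}
preds = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
split = {"A": 0, "B": 0, "C": 1, "D": 0}  # C runs alongside B on processor 1
```

Running B and C on different processors gives a makespan of 6 time units, versus 8 when all four tasks share one processor; a GA would evolve `order` and `assign` toward such shorter schedules.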

Computation-Communication Overlapping in AES-CCM Using Thread-Level Parallelism on a Multi-Core Processor (멀티코어 프로세서의 쓰레드-수준 병렬성을 활용한 AES-CCM 계산-통신 중첩화)

Lee, Eun-Ji;Lee, Sung-Ju;Chung, Yong-Wha;Lee, Myung-Ho;Min, Byoung-Ki
- Journal of KIISE:Computing Practices and Letters
- v.16 no.8
- pp.863-867
- 2010
Multi-core processors are becoming increasingly popular. As they are widely adopted in embedded systems as well as desktop PCs, many multimedia applications are being parallelized on multi-core platforms. However, applications with inherent data dependencies, such as encryption algorithms for multimedia data, are difficult to parallelize. To overcome this limitation, we propose a technique that overlaps computation and communication by using an otherwise idle core. In particular, we formulate multimedia computation and communication as a pipeline design problem at the application-program level and derive the optimal number of pipeline stages.
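The idea of overlapping computation and communication on an otherwise idle core can be sketched as a two-stage producer/consumer pipeline: one thread encrypts chunk i while another transmits chunk i-1. In the sketch below, `encrypt_chunk` is a hypothetical placeholder (a one-byte XOR, not AES-CCM) and `send` is any callable supplied by the caller; a small bounded queue hands chunks between the stages. The paper's derivation of the optimal number of stages is not reproduced.

```python
import queue
import threading

def encrypt_chunk(chunk, key=0x5A):
    """Placeholder 'encryption' (XOR with one byte); NOT AES-CCM."""
    return bytes(b ^ key for b in chunk)

def pipelined_encrypt_and_send(data, chunk_size, send):
    """Stage 1 (calling thread) encrypts; stage 2 (sender thread) transmits."""
    q = queue.Queue(maxsize=2)  # small buffer between the two stages
    done = object()             # sentinel marking the end of the stream

    def sender():
        while True:
            item = q.get()
            if item is done:
                break
            send(item)          # runs while the next chunk is being encrypted

    t = threading.Thread(target=sender)
    t.start()
    for off in range(0, len(data), chunk_size):
        q.put(encrypt_chunk(data[off:off + chunk_size]))
    q.put(done)
    t.join()

# Usage: collect "sent" chunks in a list instead of writing to a socket.
sent = []
pipelined_encrypt_and_send(b"multimedia payload" * 4, 16, sent.append)
```

In steady state such a pipeline processes one chunk per max(encrypt time, send time) rather than per their sum, which is the gain the overlap targets; the chunk order is preserved because the queue is FIFO and there is a single sender.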

