Search | Korea Science

Exploiting Parallelism in the Block Encryption Algorithms RC6 and Rijndael (블록 암호화 알고리즘 RC6 및 Rijndael에서의 병렬성 활용)

정용화;정교일;손승원
- Journal of the Korea Institute of Information Security & Cryptology / Vol. 11, No. 2 / pp.3-12 / 2001
Currently, the superscalar architecture dominates today's microprocessor marketplace. As more transistors are integrated onto larger dies, however, the on-chip multiprocessor is regarded as a promising alternative to the superscalar microprocessor. This paper examines the behavior of the next-generation block encryption algorithms RC6 and Rijndael on an on-chip multiprocessing microprocessor. Simulation results from a program-driven simulator show that the on-chip multiprocessor can exploit thread-level parallelism effectively and overcome the limitation of instruction-level parallelism in the next-generation block encryption algorithms.
https://doi.org/10.13089/JKIISC.2001.11.2.3

Comparison of Parallel Preconditioners for Solving Large Sparse Linear Systems on a Massively Parallel Machine (대형이산 행렬 시스템의 초대형병렬컴퓨터에서의 해법을 위한 병렬준비 행렬의 비교)

Ma, Sang-Baek
- The Transactions of the Korea Information Processing Society / Vol. 2, No. 4 / pp.535-542 / 1995
In this paper we present two preconditioners for solving large sparse linear systems arising from elliptic partial differential equations on massively parallel machines such as the CM-5. Most massively parallel machines rely heavily on message passing for interprocessor communication, but under current manufacturing standards the cost of communication is very high compared with that of floating-point arithmetic. We therefore need an algorithm that minimizes the amount of interprocessor communication on massively parallel machines. Through experiments on the CM-5, we show that the Block SOR (Successive Over-Relaxation) method coupled with the multi-coloring technique is one such preconditioner. We also implemented the ADI (Alternating Direction Implicit) method on the CM-5, which has conventionally been one of the most powerful parallel preconditioners. Our experiments show that the Block SOR method coupled with the multi-coloring technique can yield a speedup with 50% efficiency as the number of processors ranges from 16 to 512, for a matrix of dimension 512x512. The ADI method, on the other hand, performs very poorly.
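The multi-coloring idea in this abstract can be illustrated with a minimal sketch. The code below is not the author's Block SOR preconditioner: it is a point-wise, two-color (red-black) SOR iteration for the 2D Laplace equation on a small grid, written in plain Python under the assumption of a 5-point stencil and Dirichlet boundary values. The names red_black_sor and make_grid, the grid size, and the relaxation factor are hypothetical choices for illustration only.

```python
def red_black_sor(u, omega=1.5, sweeps=200):
    """Point-wise red-black SOR for the 2D Laplace equation (5-point stencil).

    u is a square list-of-lists whose border holds fixed boundary values.
    """
    n = len(u)
    for _ in range(sweeps):
        for color in (0, 1):
            # Every point of one color depends only on points of the other
            # color, so all updates within one color are independent and
            # could be distributed across processors without conflicts.
            for i in range(1, n - 1):
                for j in range(1, n - 1):
                    if (i + j) % 2 == color:
                        avg = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                      + u[i][j - 1] + u[i][j + 1])
                        u[i][j] = (1 - omega) * u[i][j] + omega * avg
    return u


def make_grid(n):
    """Build an n x n grid whose boundary follows u(x, y) = x^2 - y^2."""
    h = 1.0 / (n - 1)

    def exact(i, j):
        return (i * h) ** 2 - (j * h) ** 2

    u = [[exact(i, j) if i in (0, n - 1) or j in (0, n - 1) else 0.0
          for j in range(n)] for i in range(n)]
    return u, exact


if __name__ == "__main__":
    grid, exact = make_grid(9)
    red_black_sor(grid)
    err = max(abs(grid[i][j] - exact(i, j))
              for i in range(9) for j in range(9))
    print("max error:", err)
```

The test function x^2 - y^2 is harmonic and is reproduced exactly by the 5-point stencil, so the iteration should converge to it up to rounding error. In a parallel setting, only the boundary between neighboring subdomains has to be exchanged between the two color phases, which is the reason coloring helps reduce communication.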

Design of Multiprocess Models for Parallel Protocol Implementation (병렬 프로토콜 구현을 위한 다중 프로세스 모델의 설계)

Choi, Sun-Wan;Chung, Kwang-Sue
- The Transactions of the Korea Information Processing Society / Vol. 4, No. 10 / pp.2544-2552 / 1997
This paper presents three multiprocess models for parallel protocol implementation: (1) the channel communication model, (2) the fork-join model, and (3) the event polling model. The parallelism of each model is specified in a parallel programming language, the Par.C System. To measure the performance of the multiprocess models, we implemented the Internet Protocol (IP) of the Internet Protocol Suite (IPS) for each model in this parallel language on the Transputer. After decomposing the IP functions into two parts, the sending side and the receiving side, the parallelism in both sides is exploited in the form of Multiple Instruction Single Data (MISD). The three models are evaluated and compared at the sending side and the receiving side on the basis of various run-time overheads: event sending via channels in the channel communication model, process creation in the fork-join model, and context switching in the event polling model. At the sending side, the event polling model has lower processing delays, by about 77% and 9%, than the channel communication model and the fork-join model, respectively. At the receiving side, the fork-join model has lower processing delays, by about 55% and 107%, than the channel communication model and the event polling model, respectively.

A Dynamic Co-scheduling Scheme for MPI-based Parallel Programs on Linux Clusters (리눅스 클러스터에서 MPI 기반 병렬 프로그램의 동적 동시 스케줄링 기법)

Kim, Hyuk;Rhee, Yun-Seok
- Journal of the Korea Society of Computer and Information / Vol. 13, No. 1 / pp.29-35 / 2008
For efficient message passing in parallel programs, the two communicating processes, which run on different nodes, must be scheduled at the same time; this is called 'co-scheduling'. However, each node of a cluster system is built on a general-purpose multitasking OS, which autonomously manages its local processes. It is therefore not easy to co-schedule two (or more) processes in such a computing environment. Our work proposes a co-scheduling scheme for MPI-based parallel programs that exploits message exchange information between the two parties. We implement the scheme on a Linux cluster, which requires slight kernel hacking and MPI library modification. Experiments with the NPB parallel suite show that our scheme reduces execution time by 33-56% compared with the typical scheduling case, with especially better performance in more communication-bound applications.

Smartphone Real Time Streaming Service using Parallel TCP Transmission (병렬 TCP 통신을 이용한 스마트폰 실시간 스트리밍 서비스)

Kim, Jang-Young
- Journal of the Korea Institute of Information and Communication Engineering / Vol. 20, No. 5 / pp.937-941 / 2016
This paper proposes an efficient multiple-TCP mechanism using Android smartphones for real-time transmission of remote-control video streams over Wi-Fi. The wireless video stream transmission mechanism can be applied in various areas such as real-time server stream transmission, movable drones, disaster robotics, and real-time security monitoring systems. Moreover, data must be transmitted in a timely fashion in situations such as medical emergencies, security surveillance, and disaster prevention, and our parallel TCP transmission system can play an important role in these areas. We therefore designed and implemented a parallel TCP transmission (parallel stream) for efficient real-time video streaming services, and we evaluated the proposed mechanism under various environments with performance analysis.
https://doi.org/10.6109/jkiice.2016.20.5.937

Design of Parallel Processing of Lane Detection System Based on Multi-core Processor (멀티코어를 이용한 차선 검출 병렬화 시스템 설계)

Lee, Hyo-Chan;Moon, Dai-Tchul;Park, In-hag;Heo, Kang
- Journal of the Korea Institute of Information and Communication Engineering / Vol. 20, No. 9 / pp.1778-1784 / 2016
We improved performance by parallelizing lane detection algorithms. Lane detection, as an intelligent driver-assistance system, sounds an alarm or corrects the steering in response to lane departure. Four algorithms are applied in the following order: Gaussian filtering to remove interference, gray conversion to simplify the images, Sobel edge detection to find the lane regions, and the Hough transform to detect straight lines. Among parallelization methods, data-level parallelism is easy to design but still suffers from a bottleneck. High-speed data-level parallelism is suggested to reduce this bottleneck, and it resulted in a noticeable performance improvement. When our parallel algorithm is applied to actual road video from a black box (dashboard camera), the single-core measurement is approximately 30 frames/sec. With octa-core parallelism, the data-level performance is approximately 100 frames/sec and the highest performance comes close to 150 frames/sec.
https://doi.org/10.6109/jkiice.2016.20.9.1778
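The frame rates quoted in this abstract can be turned into speedup and parallel efficiency figures with a short calculation. The sketch below uses only the approximate values stated in the abstract (30, 100, and 150 frames/sec, eight cores); the function name speedup_and_efficiency is a hypothetical helper, and the results are only as precise as the rounded figures they come from.

```python
def speedup_and_efficiency(serial_fps, parallel_fps, cores):
    """Return (speedup, efficiency) for a parallel run versus a serial run."""
    speedup = parallel_fps / serial_fps
    efficiency = speedup / cores  # fraction of ideal linear scaling
    return speedup, efficiency


if __name__ == "__main__":
    # Approximate figures from the abstract: single core vs. octa-core.
    for label, fps in (("data-level", 100.0), ("highest reported", 150.0)):
        s, e = speedup_and_efficiency(30.0, fps, 8)
        print(f"{label}: speedup {s:.2f}x, efficiency {e:.1%}")
```

On these numbers the octa-core speedup is roughly 3.3x for the data-level variant and 5x for the best case, both well below the ideal 8x, which is consistent with the bottleneck the abstract mentions.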

GPGPU Acceleration of SAT Algorithm with Propagation Routine Parallelization (전달 루틴의 병렬화를 통한 SAT 알고리즘의 GPGPU 가속화)

Kang, Hyeong-Ju
- Journal of the Korea Institute of Information and Communication Engineering / Vol. 20, No. 10 / pp.1919-1926 / 2016
Because of its enormous processing ability, the General-Purpose Graphics Processing Unit (GPGPU) has been applied to many fields, including electronic design automation. The SAT algorithm is one of the core algorithms in many electronic design automation tools. There have been some efforts to apply GPGPU to the SAT algorithm, but its characteristics make it difficult to parallelize. In this paper, I applied GPGPU to the SAT algorithm by parallelizing the propagation routine, which is relatively well suited to parallel processing. Based on the similarity of the propagation routine to sparse matrix multiplication, the data structure for the SAT problem is constructed and the parallel propagation routine is described. To prevent data loss between parallel threads, atomic operations are used. Experimental results for some benchmark SAT problems show that the proposed algorithm is superior to the previous GPGPU-based SAT solver.
https://doi.org/10.6109/jkiice.2016.20.10.1919
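To make the analogy with sparse matrices concrete, the following is a minimal sequential sketch of unit propagation over clauses stored in a compressed, row-offset layout. It is an illustration under stated assumptions, not the paper's data structure or GPU kernel: the layout (offsets and lits arrays), the function name propagate, and the one-thread-per-clause mapping mentioned in the comments are all hypothetical.

```python
def propagate(offsets, lits, assign):
    """Unit propagation over clauses stored like rows of a sparse matrix.

    offsets[c]..offsets[c+1] delimits the literals of clause c inside lits.
    A literal is a non-zero int: +v means variable v is true, -v means false.
    assign maps variable -> bool and is extended in place.
    Returns False if some clause is falsified (conflict), True otherwise.
    """
    changed = True
    while changed:
        changed = False
        # In a GPU version each clause could be handled by its own thread
        # (assumption); here the clauses are simply visited in order.
        for c in range(len(offsets) - 1):
            satisfied = False
            unassigned = []
            for lit in lits[offsets[c]:offsets[c + 1]]:
                var, wanted = abs(lit), lit > 0
                if var not in assign:
                    unassigned.append(lit)
                elif assign[var] == wanted:
                    satisfied = True
                    break
            if satisfied:
                continue
            if not unassigned:
                return False  # every literal is false: conflict
            if len(unassigned) == 1:
                lit = unassigned[0]
                # Several threads could try to write the same variable at
                # once; this is where a parallel version needs atomic ops.
                assign[abs(lit)] = lit > 0
                changed = True
    return True


if __name__ == "__main__":
    # Clauses: (x1), (not x1 or x2), (not x2 or x3)
    offsets = [0, 1, 3, 5]
    lits = [1, -1, 2, -2, 3]
    assignment = {}
    print(propagate(offsets, lits, assignment), assignment)
```

Scanning each clause's literal range against the current assignment resembles multiplying a sparse row by a vector, which is one plausible reading of the similarity the abstract refers to.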

A Performance Evaluation of Parallel Color Conversion based on the Thread Number on Multi-core Systems (멀티코어 시스템에서 쓰레드 수에 따른 병렬 색변환 성능 검증)

Kim, Cheong Ghil
- Journal of Satellite, Information and Communications / Vol. 9, No. 4 / pp.73-76 / 2014
With the increasing popularity of multi-core processors, they have been adopted even in embedded systems. Under these circumstances, many multimedia applications can be parallelized on multi-core platforms because they usually require heavy computation and extensive memory access. This paper proposes an efficient thread-level parallel implementation of color space conversion on multi-core CPUs. Thread-level parallelism has become a very useful parallel processing paradigm, especially on shared-memory computing systems. In this work, it is exploited by allocating different input pixels to each thread for concurrent loop execution. For the performance evaluation, this paper measures the performance improvement of color conversion on multi-core processors by comparing the processing speed of the serial implementation with that of the parallel ones. The results show that the thread-level parallel implementations achieve similar overall ratios of performance improvement regardless of the multi-core processor used.
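The pixel-partitioning scheme described here can be sketched as follows. The abstract does not say which color spaces are involved, so this example assumes an RGB to YCbCr conversion with the full-range JFIF (BT.601) coefficients; the function names and the chunking policy are hypothetical. Note that in CPython the global interpreter lock prevents real speedup for this pure-Python loop, so the sketch only shows how the pixels are divided among threads, not the performance the paper measures.

```python
from concurrent.futures import ThreadPoolExecutor


def rgb_to_ycbcr(pixel):
    """Convert one 8-bit RGB pixel to YCbCr (full-range JFIF coefficients)."""
    r, g, b = pixel
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return (round(y), round(cb), round(cr))


def convert_serial(pixels):
    """Reference implementation: convert every pixel in one loop."""
    return [rgb_to_ycbcr(p) for p in pixels]


def convert_parallel(pixels, num_threads=4):
    """Split the pixel list into contiguous chunks, one per worker thread."""
    chunk = max(1, -(-len(pixels) // num_threads))  # ceiling division
    chunks = [pixels[i:i + chunk] for i in range(0, len(pixels), chunk)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        parts = list(pool.map(convert_serial, chunks))  # order is preserved
    return [p for part in parts for p in part]


if __name__ == "__main__":
    image = [(255, 255, 255), (0, 0, 0), (255, 0, 0), (12, 200, 99)] * 4
    print(convert_parallel(image, num_threads=3) == convert_serial(image))
```

Because each output pixel depends only on its own input pixel, the chunks need no synchronization beyond collecting the results in order.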

Performance Improvement of Prediction-Based Parallel Gate-Level Timing Simulation Using Prediction Accuracy Enhancement Strategy (예측정확도 향상 전략을 통한 예측기반 병렬 게이트수준 타이밍 시뮬레이션의 성능 개선)

Yang, Seiyang
- KIPS Transactions on Computer and Communication Systems / Vol. 5, No. 12 / pp.439-446 / 2016
In this paper, an efficient prediction accuracy enhancement strategy is proposed for improving the performance of prediction-based parallel event-driven gate-level timing simulation. The proposed strategy adopts static double prediction and dynamic prediction for the input and output values of local simulations. Double prediction uses a second set of static prediction data for a secondary prediction once the first prediction fails, and dynamic prediction tries to use the ongoing simulation results accumulated dynamically during the actual parallel simulation run as prediction data. As a result, the communication and synchronization overheads, which are the main bottleneck of parallel simulation, are reduced as much as possible. With the two proposed prediction enhancement techniques, we observed about a 5x simulation performance improvement over commercial parallel multi-core simulation for six test designs.
https://doi.org/10.3745/KTCCS.2016.5.12.439

Implementation of Parallel Computer Generated Hologram Using Multi-GPGPU (다중 GPGPU를 이용한 컴퓨터 생성 홀로그램의 병렬화 구현)

Seo, Young-Ho;Lee, Yoon-Hyuk;Kim, Dong-Wook
- Journal of the Korea Institute of Information and Communication Engineering / Vol. 18, No. 5 / pp.1177-1186 / 2014
A computer-generated hologram (CGH) mathematically models optical phenomena with a digital computer. Because it requires a huge amount of computational power, a fast, high-performance technique is needed. In this paper, we propose two parallelizations of CGH calculation. The first parallelizes the CGH algorithm within a single GPU (graphics processing unit), and the second parallelizes it across multiple GPUs. The proposed algorithm was implemented on a GTX780 Ti GPU. It calculates a $1{,}024 \times 1{,}024$ hologram with 10K object points in about 24 ms.
https://doi.org/10.6109/jkiice.2014.18.5.1177

Search results: 1,127 items (processing time: 0.025 seconds)
