• Title/Abstract/Keywords: parallel communication (병렬 통신)

Comparison of Parallel Preconditioners for Solving Large Sparse Linear Systems on a Massively Parallel Machine (대형이산 행렬 시스템의 초대형병렬컴퓨터에서의 해법을 위한 병렬준비 행렬의 비교)

  • Ma, Sang-Baek
    • The Transactions of the Korea Information Processing Society / v.2 no.4 / pp.535-542 / 1995
  • In this paper we present two preconditioners for solving large sparse linear systems arising from elliptic partial differential equations on massively parallel machines such as the CM-5. Most massively parallel machines rely heavily on message passing for interprocessor communication, but under current manufacturing standards communication is very expensive compared with floating-point arithmetic. We therefore need algorithms that minimize interprocessor communication on massively parallel machines. Through experiments on the CM-5, we show that the Block SOR (Successive Over-Relaxation) method combined with the multi-coloring technique is one such preconditioner. We also implemented on the CM-5 the ADI (Alternating Direction Implicit) method, conventionally regarded as one of the most powerful parallel preconditioners. Our experiments show that Block SOR with multi-coloring achieves a speedup at 50% efficiency for 16 to 512 processors on a matrix of dimension 512×512. The ADI method, on the other hand, performs very poorly.

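The multi-coloring idea behind the Block SOR preconditioner can be illustrated with the simplest two-color (red-black) ordering. The sketch below is a point red-black SOR solver for the model problem -Δu = f on the unit square with zero boundary values. It is not the paper's block variant or its CM-5 code; the function name, the grid layout, and the textbook optimal relaxation factor ω = 2/(1 + sin πh) are assumptions for illustration. Because every red point's 5-point stencil touches only black neighbours (and vice versa), all updates of one color are independent and could be spread across processors, with communication needed only between the two half-sweeps.

```python
import math


def red_black_sor(f, n, omega=None, tol=1e-12, max_iter=10_000):
    """Solve -Laplace(u) = f on the unit square with u = 0 on the boundary.

    Uses the 5-point stencil on an (n + 2) x (n + 2) grid (n interior points
    per side) and SOR with red-black (two-color) ordering. Returns (u, sweeps).
    """
    h = 1.0 / (n + 1)
    if omega is None:
        # Textbook optimal SOR factor for the 5-point model Poisson problem.
        omega = 2.0 / (1.0 + math.sin(math.pi * h))
    u = [[0.0] * (n + 2) for _ in range(n + 2)]
    rhs = [[h * h * f(i * h, j * h) for j in range(n + 2)] for i in range(n + 2)]
    for sweep in range(1, max_iter + 1):
        max_delta = 0.0
        for color in (0, 1):  # 0 = red points, 1 = black points
            # A point of one color only reads neighbours of the other color,
            # so every update in this half-sweep is independent of the others.
            for i in range(1, n + 1):
                for j in range(1, n + 1):
                    if (i + j) % 2 != color:
                        continue
                    gauss_seidel = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                           + u[i][j - 1] + u[i][j + 1] + rhs[i][j])
                    delta = omega * (gauss_seidel - u[i][j])
                    u[i][j] += delta
                    max_delta = max(max_delta, abs(delta))
        if max_delta < tol:
            return u, sweep
    return u, max_iter
```

With f = 2(x(1-x) + y(1-y)), the discrete solution coincides with u = x(1-x)y(1-y) at the grid points, which makes the result easy to check.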

Design of Multiprocess Models for Parallel Protocol Implementation (병렬 프로토콜 구현을 위한 다중 프로세스 모델의 설계)

  • Choi, Sun-Wan;Chung, Kwang-Sue
    • The Transactions of the Korea Information Processing Society / v.4 no.10 / pp.2544-2552 / 1997
  • This paper presents three multiprocess models for parallel protocol implementation: (1) the channel communication model, (2) the fork-join model, and (3) the event polling model. Parallelism in each model is specified in the parallel programming language Par. C System. To measure the performance of the models, we implemented the Internet Protocol (IP) of the Internet Protocol Suite (IPS) for each model in this parallel language on the Transputer. After decomposing the IP functions into a sending side and a receiving side, the parallelism on both sides is exploited in the form of Multiple Instruction Single Data (MISD). The three models are evaluated and compared at the sending and receiving sides on the basis of their run-time overheads: sending events through channels in the channel communication model, creating processes in the fork-join model, and context switching in the event polling model. At the sending side, the event polling model has lower processing delay than the channel communication model and the fork-join model, by about 77% and 9%, respectively. At the receiving side, the fork-join model has lower processing delay than the channel communication model and the event polling model, by about 55% and 107%, respectively.

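The fork-join model can be sketched in a few lines. The example below uses a hypothetical receiving-side decomposition, not the one from the paper: one worker verifies the IPv4 header checksum (the RFC 1071 one's-complement sum, which comes out as 0 for an intact header) while another parses header fields, and the parent joins both before continuing. Python threads stand in for Par. C processes; under CPython's global interpreter lock they show the fork-join structure rather than a real speedup. All function names are illustrative.

```python
import threading


def internet_checksum(data):
    """RFC 1071 16-bit one's-complement checksum over big-endian 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for k in range(0, len(data), 2):
        total += (data[k] << 8) | data[k + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def parse_ipv4_header(hdr):
    """Extract a few fixed-position fields from a 20-byte IPv4 header."""
    return {
        "version": hdr[0] >> 4,
        "ihl": hdr[0] & 0x0F,
        "total_length": int.from_bytes(hdr[2:4], "big"),
        "ttl": hdr[8],
        "protocol": hdr[9],
        "src": ".".join(str(b) for b in hdr[12:16]),
        "dst": ".".join(str(b) for b in hdr[16:20]),
    }


def receive_fork_join(hdr):
    """Fork one worker per independent sub-task, then join before continuing."""
    results = {}

    def verify():
        # Summing a header that includes its own checksum field gives 0 if intact.
        results["valid"] = internet_checksum(hdr) == 0

    def parse():
        results["fields"] = parse_ipv4_header(hdr)

    workers = [threading.Thread(target=verify), threading.Thread(target=parse)]
    for w in workers:  # fork
        w.start()
    for w in workers:  # join: wait for every sub-task to finish
        w.join()
    return results["valid"], results["fields"]
```

The channel communication model would instead keep long-lived workers connected by channels (for example `queue.Queue`), trading process-creation cost for per-message channel cost, which is the overhead trade-off the abstract measures.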

A Dynamic Co-scheduling Scheme for MPI-based Parallel Programs on Linux Clusters (리눅스 클러스터에서 MPI 기반 병렬 프로그램의 동적 동시 스케줄링 기법)

  • Kim, Hyuk;Rhee, Yun-Seok
    • Journal of the Korea Society of Computer and Information / v.13 no.1 / pp.29-35 / 2008
  • For efficient message passing in parallel programs, the two communicating processes, which run on different nodes, must be scheduled at the same time; this is called 'co-scheduling'. However, each node of a cluster system runs a general-purpose multitasking OS that autonomously manages its local processes, so co-scheduling two (or more) processes in such an environment is not easy. We propose a co-scheduling scheme for MPI-based parallel programs that exploits information about the messages exchanged between the two parties. We implemented the scheme on a Linux cluster, which required minor kernel modifications and changes to the MPI library. Experiments with the NPB (NAS Parallel Benchmarks) suite show that our scheme reduces execution time by 33-56% compared with conventional scheduling, with especially good performance for more communication-bound applications.


Smartphone Real Time Streaming Service using Parallel TCP Transmission (병렬 TCP 통신을 이용한 스마트폰 실시간 스트리밍 서비스)

  • Kim, Jang-Young
    • Journal of the Korea Institute of Information and Communication Engineering / v.20 no.5 / pp.937-941 / 2016
  • This paper proposes an efficient multiple-TCP mechanism on Android smartphones for real-time Wi-Fi video stream transmission over the network for remote control. Such wireless video stream transmission can be applied in various areas, including real-time server stream transmission, mobile drones, disaster robotics, and real-time security monitoring systems. Timely data transmission is also urgently needed in situations such as medical emergencies, security surveillance, and disaster prevention, and our parallel TCP transmission system can play an important role in these areas. We therefore designed and implemented parallel TCP transmission (parallel streams) for efficient real-time video streaming services. Finally, we evaluated the proposed mechanism under various environments and analyzed its performance.
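Parallel TCP transmission is commonly built on striping. In the sketch below, which illustrates the general technique and is not the paper's protocol, a video frame is cut into fixed-size segments that are dealt round-robin over several streams, each segment tagged with a global sequence number. TCP preserves order within each stream but not across streams, so the receiver merges whatever arrives by sequence number. The `interleave` helper simulates an arbitrary arrival order without opening real sockets; all names and the segment size are hypothetical.

```python
import random


def stripe(data, n_streams, seg_size):
    """Cut data into seg_size-byte segments and deal them round-robin over streams.

    Each segment is tagged with its global sequence number.
    """
    streams = [[] for _ in range(n_streams)]
    for seq, off in enumerate(range(0, len(data), seg_size)):
        streams[seq % n_streams].append((seq, data[off:off + seg_size]))
    return streams


def interleave(streams, rng):
    """Simulate arrival: order is kept within each stream but not across streams."""
    queues = [list(s) for s in streams]
    arrived = []
    while any(queues):
        queue = rng.choice([q for q in queues if q])
        arrived.append(queue.pop(0))
    return arrived


def reassemble(segments):
    """Restore the original byte order from segments received on all streams."""
    return b"".join(chunk for _, chunk in sorted(segments))
```

For example, `reassemble(interleave(stripe(frame, 4, 1000), random.Random(7)))` returns `frame` unchanged, whatever interleaving the random generator picks.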

Design of Parallel Processing of Lane Detection System Based on Multi-core Processor (멀티코어를 이용한 차선 검출 병렬화 시스템 설계)

  • Lee, Hyo-Chan;Moon, Dai-Tchul;Park, In-hag;Heo, Kang
    • Journal of the Korea Institute of Information and Communication Engineering / v.20 no.9 / pp.1778-1784 / 2016
  • We improved the performance of lane detection by parallelizing its algorithms. Lane detection, as an intelligent driver-assistance system, sounds an alarm or corrects the steering in response to lane departure. Four algorithms are applied in the following order: Gaussian filtering to remove noise, gray conversion to simplify the image, Sobel edge detection to find lane regions, and the Hough transform to detect straight lines. Among the parallelization methods, data-level parallelism is easy to design but still suffers from a bottleneck. We propose high-speed data-level parallelism to reduce this bottleneck, which yields a noticeable performance improvement. When our parallel algorithm was applied to actual road video from a black-box (dashcam), the single-core version processed approximately 30 frames/s. With octa-core parallelism, data-level parallelism reached approximately 100 frames/s, and the best performance approached 150 frames/s.
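Data-level parallelism of this kind can be sketched by splitting each frame into horizontal strips and running the Sobel stage on every strip concurrently. The sketch assumes a grayscale image stored as a list of rows, uses the |Gx| + |Gy| approximation of the gradient magnitude, and leaves border pixels at 0; these choices and the function names are illustrative, not taken from the paper. Because workers only read the shared input image, each one can access the rows just above and below its strip without copying halo rows. Python threads keep the sketch short; in CPython a process pool or native code would be needed for an actual speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def sobel_rows(img, r0, r1):
    """Sobel magnitude (|Gx| + |Gy|) for rows r0..r1-1; border pixels stay 0."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(r0, r1):
        row = [0] * w
        if 0 < y < h - 1:
            for x in range(1, w - 1):
                gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                      - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
                gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                      - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
                row[x] = abs(gx) + abs(gy)
        out.append(row)
    return out


def sobel_parallel(img, workers=8):
    """Split the frame into `workers` horizontal strips and process them concurrently."""
    h = len(img)
    bounds = [(k * h // workers, (k + 1) * h // workers) for k in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        strips = list(pool.map(lambda b: sobel_rows(img, *b), bounds))
    # Strips come back in submission order, so concatenation restores the frame.
    return [row for strip in strips for row in strip]
```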

GPGPU Acceleration of SAT Algorithm with Propagation Routine Parallelization (전달 루틴의 병렬화를 통한 SAT 알고리즘의 GPGPU 가속화)

  • Kang, Hyeong-Ju
    • Journal of the Korea Institute of Information and Communication Engineering / v.20 no.10 / pp.1919-1926 / 2016
  • Because of its enormous processing power, the General-Purpose Graphics Processing Unit (GPGPU) has been applied in many fields, including electronic design automation. The SAT algorithm is one of the core algorithms in many electronic design automation tools. There have been some efforts to apply GPGPUs to the SAT algorithm, but its characteristics make it difficult to parallelize. In this paper, I applied GPGPU to the SAT algorithm by parallelizing the propagation routine, which is relatively well suited to parallel processing. Based on the similarity between the propagation routine and sparse matrix multiplication, a data structure for the SAT problem is constructed and a parallel propagation routine is described. Atomic operations are used to prevent data loss between parallel threads. Experimental results on several benchmark SAT problems show that the proposed algorithm outperforms the previous GPGPU-based SAT solver.
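The propagation routine can be modeled in plain Python. The sketch below stores clauses in a CSR-like layout (a clause pointer array plus a flat literal array, in the spirit of sparse-matrix storage) and performs unit propagation in rounds: within a round every clause is examined independently, as a GPU would with one thread per clause, and implied assignments are merged afterwards. The layout, the round structure, and the names are assumptions for illustration rather than the paper's data structure; a comment marks where a GPU version would need an atomic operation.

```python
def propagate(clause_ptr, lits, assign):
    """Round-based unit propagation over clauses stored in a CSR-like layout.

    Clause c owns lits[clause_ptr[c]:clause_ptr[c + 1]]; literals use DIMACS
    signs (v means x_v, -v means NOT x_v). assign is indexed by variable
    (index 0 unused) and holds True, False, or None. Returns False on conflict.
    """
    while True:
        implied = {}
        for c in range(len(clause_ptr) - 1):  # on a GPU: one thread per clause
            unassigned, satisfied = [], False
            for lit in lits[clause_ptr[c]:clause_ptr[c + 1]]:
                value = assign[abs(lit)]
                if value is None:
                    unassigned.append(lit)
                elif value == (lit > 0):
                    satisfied = True
                    break
            if satisfied:
                continue
            if not unassigned:
                return False  # every literal is false: conflict
            if len(unassigned) == 1:
                var, value = abs(unassigned[0]), unassigned[0] > 0
                # Several clause threads may imply the same variable at once;
                # a GPU version would perform this check-and-write atomically.
                if implied.setdefault(var, value) != value:
                    return False  # two clauses imply opposite values
        if not implied:
            return True  # fixpoint reached: no clause is unit any more
        for var, value in implied.items():
            assign[var] = value
```

For the clauses (x1), (¬x1 ∨ x2), (¬x2 ∨ x3), encoded as `clause_ptr = [0, 1, 3, 5]` and `lits = [1, -1, 2, -2, 3]`, three rounds assign x1, x2, and x3 to true.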

A Performance Evaluation of Parallel Color Conversion based on the Thread Number on Multi-core Systems (멀티코어 시스템에서 쓰레드 수에 따른 병렬 색변환 성능 검증)

  • Kim, Cheong Ghil
    • Journal of Satellite, Information and Communications / v.9 no.4 / pp.73-76 / 2014
  • With the increasing popularity of multi-core processors, they have been adopted even in embedded systems. Many multimedia applications can therefore be parallelized on multi-core platforms, because they usually require heavy computation and extensive memory access. This paper proposes an efficient thread-level parallel implementation of color space conversion on multi-core CPUs. Thread-level parallelism has become a very useful parallel processing paradigm, especially on shared-memory computing systems. In this work, it is exploited by allocating different input pixels to each thread for concurrent loop execution. For performance evaluation, this paper compares the processing speed of the serial implementation of color conversion with that of the parallel implementations on multi-core processors. The results show that the thread-level parallel implementations achieve broadly similar performance-improvement ratios across different multi-core processors.
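Allocating different pixels to each thread can be sketched as follows. The abstract does not name the target color space, so the sketch assumes RGB to YCbCr with the full-range JFIF (BT.601-based) coefficients; the function names and the contiguous split of pixel indices are also illustrative. Each thread writes only its own range of output slots, so no locking is needed on the shared output list. As with any CPU-bound pure-Python code, CPython threads here illustrate the work division rather than deliver a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def rgb_to_ycbcr(pixel):
    """Full-range (JFIF-style) BT.601 RGB -> YCbCr for one 8-bit pixel."""
    r, g, b = pixel
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return tuple(min(255, max(0, round(v))) for v in (y, cb, cr))


def convert_parallel(pixels, n_threads):
    """Give each thread a disjoint, contiguous range of pixel indices."""
    n = len(pixels)
    out = [None] * n

    def work(start, stop):
        for k in range(start, stop):  # only this thread writes out[start:stop]
            out[k] = rgb_to_ycbcr(pixels[k])

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = [pool.submit(work, t * n // n_threads, (t + 1) * n // n_threads)
                   for t in range(n_threads)]
        for fut in futures:
            fut.result()  # re-raise any exception from a worker
    return out
```

Because every thread count produces the same per-pixel arithmetic, the parallel result matches the serial one exactly, which is a convenient correctness check before timing.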

Performance Improvement of Prediction-Based Parallel Gate-Level Timing Simulation Using Prediction Accuracy Enhancement Strategy (예측정확도 향상 전략을 통한 예측기반 병렬 게이트수준 타이밍 시뮬레이션의 성능 개선)

  • Yang, Seiyang
    • KIPS Transactions on Computer and Communication Systems / v.5 no.12 / pp.439-446 / 2016
  • In this paper, an efficient prediction accuracy enhancement strategy is proposed to improve the performance of prediction-based parallel event-driven gate-level timing simulation. The strategy adopts static double prediction and dynamic prediction for the input and output values of local simulations. Double prediction uses a second set of static prediction data once the first prediction fails, and dynamic prediction uses, as prediction data, the simulation results accumulated dynamically during the actual parallel simulation run. As a result, communication and synchronization overhead, the main bottlenecks of parallel simulation, are reduced as much as possible. With the two proposed prediction enhancement techniques, we observed about a 5x simulation performance improvement over commercial parallel multi-core simulation for six test designs.

Implementation of Parallel Computer Generated Hologram Using Multi-GPGPU (다중 GPGPU를 이용한 컴퓨터 생성 홀로그램의 병렬화 구현)

  • Seo, Young-Ho;Lee, Yoon-Hyuk;Kim, Dong-Wook
    • Journal of the Korea Institute of Information and Communication Engineering / v.18 no.5 / pp.1177-1186 / 2014
  • A computer-generated hologram (CGH) is obtained by mathematically modeling an optical phenomenon on a digital computer. Because this requires a huge amount of computation, fast, high-performance techniques are needed. In this paper, we propose two parallelizations of the CGH calculation: the first parallelizes the CGH algorithm within a GPU (graphics processing unit), and the second parallelizes it across multiple GPUs. The proposed algorithm was implemented on a GTX780 Ti GPU and computes a 1,024×1,024 hologram with 10K object points in about 24 ms.
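A common way to compute a CGH is the point-source method, in which each hologram pixel (ξ, η) sums the contributions of all object points, I(ξ, η) = Σ_j A_j cos(k·r_j), with k = 2π/λ and r_j the distance from object point j to the pixel. The abstract does not state which CGH formula the paper uses, so this formula, the wavelength, the pixel pitch, and the helper names below are assumptions. Every pixel is independent, which is what suits the computation to one GPU thread per pixel; the multi-GPU level is modeled by giving each device a band of rows, with threads standing in for GPUs.

```python
import math
from concurrent.futures import ThreadPoolExecutor

WAVELENGTH = 532e-9  # assumed laser wavelength in metres (hypothetical)
PITCH = 8e-6         # assumed hologram pixel pitch in metres (hypothetical)


def cgh_rows(points, width, row_start, row_stop):
    """Point-source CGH for hologram rows row_start..row_stop-1.

    points: (x, y, z, amplitude) tuples in metres; pixel (u, v) sits at
    (u * PITCH, v * PITCH, 0) in the hologram plane.
    """
    k = 2 * math.pi / WAVELENGTH
    rows = []
    for v in range(row_start, row_stop):
        eta = v * PITCH
        row = []
        for u in range(width):  # on a GPU: one thread per pixel
            xi = u * PITCH
            total = 0.0
            for x, y, z, amp in points:
                r = math.sqrt((xi - x) ** 2 + (eta - y) ** 2 + z * z)
                total += amp * math.cos(k * r)
            row.append(total)
        rows.append(row)
    return rows


def cgh_multi_device(points, width, height, n_devices):
    """Split the hologram into bands of rows, one band per (simulated) GPU."""
    bands = [(d * height // n_devices, (d + 1) * height // n_devices)
             for d in range(n_devices)]
    with ThreadPoolExecutor(max_workers=n_devices) as pool:
        parts = list(pool.map(lambda band: cgh_rows(points, width, *band), bands))
    return [row for part in parts for row in part]
```

Splitting by rows needs no communication between devices except gathering the bands at the end, since every device reads the same object-point list.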

Implementation of Optical Parallel Adder using Polarization Coding (실시간 편광부호화에 의한 광병렬 가산기 구현)

  • 조웅호;배장근;노덕수;김수중
    • The Journal of Korean Institute of Communications and Information Sciences / v.17 no.12 / pp.1484-1493 / 1992
  • In this paper, we propose polarization coding of optical logic gates using filters and LCTVs (liquid crystal televisions), and present a real-time optical parallel adder system that reduces carry propagation delay. We fabricated a polarization filter for the polarization coding of a cell, and an electrical system in place of the optical flip-flop that an optical parallel adder requires. We used optical fiber to serve as the decoding mask and the interconnections of the optical parallel adder. The experimental results show that the polarization coding of a cell can represent 16 optical logic functions and that the implemented optical parallel adder operates in real time.

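The carry propagation delay that a parallel adder targets can be illustrated with an electronic analogue. The sketch below is a Kogge-Stone parallel-prefix adder, a standard technique swapped in for illustration and not the paper's optical design: it computes per-bit generate and propagate signals, then combines them in ceil(log2 n) stages in which every bit position updates at once, instead of the n sequential steps of a ripple-carry adder. The function name and the 16-bit default width are illustrative.

```python
def kogge_stone_add(a, b, n=16):
    """Add two n-bit unsigned integers (result mod 2**n) with a Kogge-Stone
    parallel-prefix carry network. Returns (sum, number_of_prefix_stages)."""
    A = [(a >> i) & 1 for i in range(n)]
    B = [(b >> i) & 1 for i in range(n)]
    g = [x & y for x, y in zip(A, B)]  # bit i generates a carry
    p = [x ^ y for x, y in zip(A, B)]  # bit i propagates an incoming carry
    G, P = g[:], p[:]
    dist, stages = 1, 0
    while dist < n:
        # Both comprehensions read only the previous stage's G and P, so each
        # stage models one step in which every bit position updates at once.
        G, P = ([G[i] | (P[i] & G[i - dist]) if i >= dist else G[i] for i in range(n)],
                [P[i] & P[i - dist] if i >= dist else P[i] for i in range(n)])
        dist *= 2
        stages += 1
    # G[i] now says whether bits 0..i produce a carry, i.e. the carry into bit i + 1.
    carries = [0] + G[:-1]
    total = sum((p[i] ^ carries[i]) << i for i in range(n))
    return total, stages
```

For 16-bit operands the carry network needs 4 stages, compared with up to 16 sequential carry steps in a ripple-carry adder.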