• Title/Summary/Keyword: Multiple Processor System

Search Result 166, Processing Time 0.03 seconds

Reliability Assessment of Low-Power Processor Packages for Supercomputers (슈퍼컴퓨터에 사용되는 저전력 프로세서 패키지의 신뢰성 평가)

  • Park, Ju-Young;Kwon, Daeil;Nam, Dukyun
    • Journal of the Microelectronics and Packaging Society
    • /
    • v.23 no.2
    • /
    • pp.37-42
    • /
    • 2016
  • While datacenter operation cost increases with electricity price rise, many researchers study low-power processor based supercomputers to reduce power consumption of datacenters. Reliability of low-power processors for supercomputers can be of concern since the reliability of many low-power processors are assessed based on mobile use conditions. This paper assessed the reliability of low-power processor packages based on supercomputer use conditions. Temperature cycling was determined as a critical failure cause of low-power processor packages through literature surveys and failure mode, effect and criticality analysis. The package temperature was measured at multiple processor load conditions to examine the relationship between processor load and package temperature. A physics-of-failure reliability model associated with temperature cycling predicted the expected lifetime of low-power processors to be less than 3 years. Recommendations to improve the lifetime of low-power processors were presented based on the experimental results.

The Design and implementation of parallel processing system using the $Nios^{(R)}$ II embedded processor ($Nios^{(R)}$ II 임베디드 프로세서를 사용한 병렬처리 시스템의 설계 및 구현)

  • Lee, Si-Hyun
    • Journal of the Korea Society of Computer and Information
    • /
    • v.14 no.11
    • /
    • pp.97-103
    • /
    • 2009
  • In this thesis, we discuss the implementation of parallel processing system which is able to get a high degree of efficiency(size, cost, performance and flexibility) by using $Nios^{(R)}$ II(32bit RISC(Reduced Instruction Set Computer) processor) embedded processor in DE2-$70^{(R)}$ reference board. The designed Parallel processing system is master-slave, shared memory and MIMD(Mu1tiple Instruction-Multiple Data stream) architecture with 4-processor. For performance test of system, N-point FFT is used. The result is represented speed-up as follow; in the case of using 2-processor(core), speed-up is shown as average 1.8 times as 1-processor's. When 4-processor, the speed-up is shown as average 2.4 times as it's.

Fast and Accurate Performance Estimation of Bus Matrix for Multi-Processor System-on-Chip (MPSoC) (멀티 프로세서 시스템-온-칩(MPSoC)을 위한 버스 매트릭스 구조의 빠르고 정확한 성능 예측 기법)

  • Kim, Sung-Chan;Ha, Soon-Hoi
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.35 no.11
    • /
    • pp.527-539
    • /
    • 2008
  • This paper presents a performance estimation technique based on queuing analysis for on-chip bus matrix architectures of Multi-Processor System-on-Chips(MPSoCs). Previous works relying on time-consuming simulation are not able to explore the vast design space to cope with increasing time-to-market pressure. The proposed technique gives accurate estimation results while achieving faster estimation time than cycle -accurate simulation by order of magnitude. We consider the followings for the modeling of practical memory subsystem: (1) the service time with the general distribution instead of the exponential distribution and (2) multiple-outstanding transactions to achieve high performance. The experimental results show that the proposed analysis technique has the accuracy of 94% on average and much shorter runtime ($10^5$ times faster at least) compared to simulation for the various examples: the synthetic traces and real-time application, 4-channel DVR.

SAMBA Type MPSoC Bus Architecture Optimization under Performance Constraints (성능 제약 조건 하에서의 SAMBA 형 MPSoC 버스 구조 최적화)

  • Kim, Hong-Yeom;Jung, Sung-Chul;Shin, Hyun-Chul
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.47 no.1
    • /
    • pp.94-101
    • /
    • 2010
  • Optimization of interconnects among processors and memories becomes important as multiple processors and memories can be integrated on a Multi-Processor System-on-Chip (MPSoC). Since the optimal interconnection architecture is usually dependent on the applications, systematic design methodology for various data transfer requirements is necessary. In this paper, we focus on bus interconnection for MPSoC applications which use 4 ~ 16 processors. We propose a new systematic bus design methodology under performance constraints using Single Arbitration Multiple Bus Accesses (SAMBA) style bus architectures. Optimized bus architecture is found to satisfy performance constraints for a single or multiple applications. When compared to the unoptimized architecture, our method can reduce the bus switch logic circuits significantly (by more than 50% sometimes). Furthermore, low cost bus architectures can be found to satisfy the performance constraints for multiple applications.

Performance on Channel Multiple Access for the OBP Satellite (OBP(On-Board Processing)위성의 채널 다중 접속 방식에 대한 성능 연구)

  • 이정렬;김덕년
    • Proceedings of the Korea Institute of Convergence Signal Processing
    • /
    • 2000.12a
    • /
    • pp.5-8
    • /
    • 2000
  • In this paper, The new scheme for Demand Assigned Multiple Access(DAMA) system is proposed. To focus on the onboard processor's specific monitoring of output port of the downlink, we found the Throughput and Blocking probability by the request traffic(λ) and channel's departure probability(Ρ).

  • PDF

Implementation of Lattice Reduction-aided Detector using GPU on SDR System (SDR 시스템에서 GPU를 사용한 Lattice Reduction-aided 검출기 구현)

  • Kim, Tae Hyun;Leem, Hyun Seok;Choi, Seung Won
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.7 no.3
    • /
    • pp.55-61
    • /
    • 2011
  • This paper presents an implementation of Lattice Reduction (LR)-aided detector for Multiple-Input Multiple-Output (MIMO) system using Graphics Processing Unit (GPU). GPU is a parallel processor which has a number of Arithmetic Logic Units (ALUs), thus, it can minimize the operation time of LR algorithm through the parallelization using multiple threads in the GPU. Through the implemented LR-aided detector, we verify that the LR-aided detector operates a lot faster than Maximum Likelihood (ML) detector. The implemented LR-aided detector has been applied to WiMAX system to show the feasibility of its real-time processing. In addition, we demonstrate that the processing time can be reduced at the cost of 3dB SNR loss by limiting the repeating loop in Lenstra-Lenstra-Lovasz (LLL) algorithm which is frequently used in LR-aided detector.

An Efficient Processor Synchronization Scheme on Shared Memory Multiprocessor (공유메모리 다중처리기에서 효율적인 프로세서 동기화 기법)

  • 윤석한;원철호;김덕진
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.32B no.5
    • /
    • pp.683-692
    • /
    • 1995
  • Many kinds of large scale multiprocessing and parallel-processing systems have recently been developed. The contention on the shared data caused by multiple processors may degrade system performance. So, processor synchronization has become one of the important issues in these systems. To solve the synchornization issues, a lot of software and hardware schemes based on spin lock have been proposed. Although software schemes are easy to implement, hardware schemes are preferred in many systems to gain optimized performance. This paper proposes an efficient processor synchronization scheme, called QCX,and describes its design considerations, hardware, algorithm, protocol. Also, in this paper, the performance of QCX has been evaluated with QOLB[5] and LBP[7] using a simulation. The simulation, with varying the number of processor and the contention on shared variables, measured the average execution times of a workload. The simulation results show that the performances of QCX is best when practicability is considered. QCX is more efficient than QOLB and LBP in two aspects. First, the hardware of QCX is more simple and cost-effective because the cache structure need not be changed. Secondly, QCX is more general because it uses a generic atomic instruction.

  • PDF

Acceleration for Removing Sea-fog using Graphic Processors and Parallel Processing (그래픽 프로세서를 이용한 병렬연산 기반 해무 제거 고속화)

  • Kim, Young-doo;Kwak, Jae-min;Seo, Young-ho;Choi, Hyun-jun
    • Journal of Advanced Navigation Technology
    • /
    • v.21 no.5
    • /
    • pp.485-490
    • /
    • 2017
  • In this paper, we propose a technique for high speed removal of sea-fog using a graphic processor. This technique uses a host processor(CPU) and several graphics processors(GPU) capable of parallel processing to remove sea-fog from the input image. In the process of removing sea-fog, the dark channel extraction, the maximum brightness channel extraction, and the calculation of the transmission are performed by the host processor, and the process of refining the transmission by applying the bidirectional filter is performed in parallel through the graphic processor. To verify the proposed parallel processing method, three NVIDIA GTX 1070 GPUs were used to construct the verification environment. As a result, it takes about 140ms when implemented with one graphics processor, and 26ms when implemented using OpenMP and multiple GPGPUs. The proposed a parallel processing algorithm based on the graphics processor unit can be used for safe navigation, port control and monitoring system.

Hardware Design and Implementation of a Parallel Processor for High-Performance Multimedia Processing (고성능 멀티미디어 처리용 병렬프로세서 하드웨어 설계 및 구현)

  • Kim, Yong-Min;Hwang, Chul-Hee;Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.16 no.5
    • /
    • pp.1-11
    • /
    • 2011
  • As the use of mobile multimedia devices is increasing in the recent year, the needs for high-performance multimedia processors are increasing. In this regard, we propose a SIMD (Single Instruction Multiple Data) based parallel processor that supports high-performance multimedia applications with low energy consumption. The proposed parallel processor consists of 16 processing elements (PEs) and operates on a 3-stage pipelining. Experimental results indicated that the proposed parallel processor outperforms conventional parallel processors in terms of performance. In addition, our proposed parallel processor outperforms commercial high-performance TI C6416 DSP in terms of performance (1.4-31.4x better) and energy efficiency (5.9-8.1x better) with same 130nm technology and 720 clock frequency. The proposed parallel processor was developed with verilog HDL and verified with a FPGA prototype system.