• Title/Summary/Keyword: multi-processor system-on-chip

Search Result 41, Processing Time 0.021 seconds

Design of a Dingle-chip Multiprocessor with On-chip Learning for Large Scale Neural Network Simulation (대규모 신경망 시뮬레이션을 위한 칩상 학습가능한 단일칩 다중 프로세서의 구현)

  • 김종문;송윤선;김명원
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.33B no.2
    • /
    • pp.149-158
    • /
    • 1996
  • In this paper we describe designing and implementing a digital neural chip and a parallel neural machine for simulating large scale neural netsorks. The chip is a single-chip multiprocessor which has four digiral neural processors (DNP-II) of the same architecture. Each DNP-II has program memory and data memory, and the chip operates in MIMD (multi-instruction, multi-data) parallel processor. The DNP-II has the instruction set tailored to neural computation. Which can be sed to effectively simulate various neural network models including on-chip learning. The DNP-II facilitates four-way data-driven communication supporting the extensibility of parallel systems. The parallel neural machine consists of a host computer, processor boards, a buffer board and an interface board. Each processor board consists of 8*8 array of DNP-II(equivalently 2*2 neural chips). Each processor board acn be built including linear array, 2-D mesh and 2-D torus. This flexibility supports efficiency of mapping from neural network models into parallel strucgure. The neural system accomplishes the performance of maximum 40 GCPS(giga connection per second) with 16 processor boards.

  • PDF

On-Chip Multiprocessor with Simultaneous Multithreading

  • Park, Kyoung;Choi, Sung-Hoon;Chung, Yong-Wha;Hahn, Woo-Jong;Yoon, Suk-Han
    • ETRI Journal
    • /
    • v.22 no.4
    • /
    • pp.13-24
    • /
    • 2000
  • As more transistors are integrated onto bigger die, an on-chip multiprocessor will become a promising alternative to the superscalar microprocessor that dominates today's microprocessor marketplace. This paper describes key parts of a new on-chip multiprocessor, called Raptor, which is composed of four 2-way superscalar processor cores and one graphic co-processor. To obtain performance characteristics of Raptor, a program-driven simulator and its programming environment were developed. The simulation results showed that Raptor can exploit thread level parallelism effectively and offer a promising architecture for future on-chip multi-processor designs.

  • PDF

The Design of Hardware MPI Units for MPSoC (MPSoC를 위한 저비용 하드웨어 MPI 유닛 설계)

  • Jeong, Ha-Young;Chung, Won-Young;Lee, Yong-Surk
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.36 no.1B
    • /
    • pp.86-92
    • /
    • 2011
  • In this paper, we propose a novel hardware MPI(Message Passing Interface) unit which supports message passing in multiprocessor system which use distributed memory architecture. MPI Hardware unit processes data synchronization, transmission and completion, and it supports processor non-blocking operation so it reduces overhead according to synchronization. Additionally, MPI hardware unit combines ready entry, request entry, reserve entry which save and manage the synchronized messages and performs the multiple outstanding issue and out of order completion. According to BFM(Bus Functional Model) simulation result, the performance is increased by 25% on many to many communication. After we designed MPI unit using HDL, with synopsys design compiler we synthesized, and for synthesis library we used MagnaChip $0.18{\mu}m$. And then we making prototype chip. The proposed message transmission interface hardware shows high performance for its increase in size. Thus, as we consider low-cost design and scalability, MPI hardware unit is useful in increasing overall performance of embedded MPSoC(Multi-Processor System-on-Chip).

Fast and Accurate Performance Estimation of Bus Matrix for Multi-Processor System-on-Chip (MPSoC) (멀티 프로세서 시스템-온-칩(MPSoC)을 위한 버스 매트릭스 구조의 빠르고 정확한 성능 예측 기법)

  • Kim, Sung-Chan;Ha, Soon-Hoi
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.35 no.11
    • /
    • pp.527-539
    • /
    • 2008
  • This paper presents a performance estimation technique based on queuing analysis for on-chip bus matrix architectures of Multi-Processor System-on-Chips(MPSoCs). Previous works relying on time-consuming simulation are not able to explore the vast design space to cope with increasing time-to-market pressure. The proposed technique gives accurate estimation results while achieving faster estimation time than cycle -accurate simulation by order of magnitude. We consider the followings for the modeling of practical memory subsystem: (1) the service time with the general distribution instead of the exponential distribution and (2) multiple-outstanding transactions to achieve high performance. The experimental results show that the proposed analysis technique has the accuracy of 94% on average and much shorter runtime ($10^5$ times faster at least) compared to simulation for the various examples: the synthetic traces and real-time application, 4-channel DVR.

Performance Analysis of Shared Buffer Router Architecture for Low Power Applications

  • Deivakani, M.;Shanthi, D.
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.16 no.6
    • /
    • pp.736-744
    • /
    • 2016
  • Network on chip (NoC) is an emerging technology in the field of multi core interconnection architecture. The routers plays an essential components of Network on chip and responsible for packet delivery by selecting shortest path between source and destination. State-of-the-art NoC designs used routing table to find the shortest path and supports four ports for packet transfer, which consume high power consumption and degrades the system performance. In this paper, the multi port multi core router architecture is proposed to reduce the power consumption and increasing the throughput of the system. The shared buffer is employed between the multi ports of the router architecture. The performance of the proposed router is analyzed in terms of power and current consumption with conventional methods. The proposed system uses Modelsim software for simulation purposes and Xilinx Project Navigator for synthesis purposes. The proposed architecture consumes 31 mW on CPLD XC2C64A processor.

A Study on the Verification Platform Architecture for MPSoC (MPSoC 검증 플랫폼 구조에 관한 연구)

  • Song, Tae-Hoon;Song, Moon-Vin;Oh, Chae-Gon;Chung, Yun-Mo
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.44 no.8
    • /
    • pp.74-79
    • /
    • 2007
  • In general, the high cost, long time, and complex steps are required in the design and implementation of MPSoC(Multi-Processor System on a Chip), therefore a platform is used to test the functionality and performance of IPs(Intellectual Properties). In this paper, we study a platform architecture to verify IPs based on Interconnect Network among processors, and show that the MPSoC platform gives better performance than a single processor for an application program.

SoC Network Architecture for Efficient Multi-Channel On-Chip-Bus (효율적인 다중 채널 On-Chip-Bus를 위한 SoC Network Architecture)

  • Lee Sanghun;Lee Chanho;Lee Hyuk-Jae
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.42 no.2 s.332
    • /
    • pp.65-72
    • /
    • 2005
  • We can integrate more IP blocks on a silicon die as the development of fabrication technologies and EDA tools. Consequently, we can design complicated SoC architecture including multi-processors. However, most of existing SoC buses have bottleneck in on-chip communication because of shared bus architectures, which result in the performance degradation of systems. In most cases, the performance of a multi-processor system is determined by efficient on-chip communication and the well-balanced distribution of computation rather than the performance of the processors. We propose an efficient SoC Network Architecture(SNA) using crossbar routers which provide a solution to ensure enough communication bandwidth. The SNA can significantly reduce the bottleneck of on-chip communication by providing multi-channels for multi-masters. According to the proposed architecture, we design a model system for the SNA. The proposed architecture has a better efficiency by $40\%$ than the AMBA AHB according to a simulation result.

A Programmable Multi-Format Video Decoder (프로그래머블 멀티 포맷 비디오 디코더)

  • Kim, Jaehyun;Park, Goo-man
    • Journal of Broadcast Engineering
    • /
    • v.20 no.6
    • /
    • pp.963-966
    • /
    • 2015
  • This paper introduces a programmable multi-format video decoder(MFD) to support HEVC(High Efficiency Video Coding) standard and for other video coding standards. The goal of the proposed MFD is the high-end FHD(Full High Definition) video decoder needed for a DTV(Digital Tele-Vision) SoC(System on Chip). The proposed platform consists of a hybrid architecture that is comprised of reconfigurable processors and flexible hardware accelerators to support the massive computational load and various kinds of video coding standards. The experimental results show that the proposed architecture is operating at a 300MHz clock that is capable of decoding HEVC bit-stream of FHD 30 frames per second.

A Task Scheduling Strategy in a Multi-core Processor for Visual Object Tracking Systems (시각물체 추적 시스템을 위한 멀티코어 프로세서 기반 태스크 스케줄링 방법)

  • Lee, Minchae;Jang, Chulhoon;Sunwoo, Myoungho
    • Transactions of the Korean Society of Automotive Engineers
    • /
    • v.24 no.2
    • /
    • pp.127-136
    • /
    • 2016
  • The camera based object detection systems should satisfy the recognition performance as well as real-time constraints. Particularly, in safety-critical systems such as Autonomous Emergency Braking (AEB), the real-time constraints significantly affects the system performance. Recently, multi-core processors and system-on-chip technologies are widely used to accelerate the object detection algorithm by distributing computational loads. However, due to the advanced hardware, the complexity of system architecture is increased even though additional hardwares improve the real-time performance. The increased complexity also cause difficulty in migration of existing algorithms and development of new algorithms. In this paper, to improve real-time performance and design complexity, a task scheduling strategy is proposed for visual object tracking systems. The real-time performance of the vision algorithm is increased by applying pipelining to task scheduling in a multi-core processor. Finally, the proposed task scheduling algorithm is applied to crosswalk detection and tracking system to prove the effectiveness of the proposed strategy.

A Design of Multi-channel Speech Pickup Embedded System for Hands-free Comuunication (핸즈프리 통신을 위한 다중채널 음성픽업 임베디드 시스템 설계)

  • Ju, Hyng-Jun;Park, Chan-Sub;Jeon, Jae-Kuk;Kim, Ki-Man
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.11 no.2
    • /
    • pp.366-373
    • /
    • 2007
  • In this paper we propose a multi-channel speech pickup system for calling quality enhancement of hands-free communication using ALTERA Nios-II processor. Multi-channel speech pickup system uses Delay-and-Sum beamformer with zero-padding interpolator. This paper implements speech pickup system using the Nios-II processor with real-time I/O data processing speed. The proposes speech pickup embedded system shows a good agreement with those of computer simulation(MATLAB) and conventional DSP processor(TMS320C6711) result. The proposed method is effective more than previous methods in cost and design processing time. As a result, LE(Logic Element) of hardware used 3,649/5,980(61%) on a chip.