• Title/Summary/Keyword: network Processor

A class-based rate limiting method applicable to the network processor (네트워크 프로세서에 적용 가능한 클래스 별 레이트 제한 기법)

  • Noh, Jin-Taek;Lee, Jin-Sun;Choi, Kyung-Hee;Jung, Gi-Hyun;Yim, Kang-Bin
    • The KIPS Transactions:PartC / v.12C no.5 s.101 / pp.725-732 / 2005
  • This paper proposes a methodology for implementing rate limiting and class-based bandwidth management on a gigabit-class network processor. Both techniques are used on general-purpose or ASIC-based systems to manage network bandwidth efficiently. The implementation and experiments use Intel's IXP1200 network processor. The results show the accuracy of the limited bandwidth and the settling time of the estimator under a dynamic traffic rate, and indicate that the proposed method and its implementation work as expected.

FLUID MODEL SOLUTION OF FEEDFORWARD NETWORK OF OVERLOADED MULTICLASS PROCESSOR SHARING QUEUES

  • AMAL EZZIDANI;ABDELGHANI BEN TAHAR;MOHAMED HANINI
    • Journal of applied mathematics & informatics / v.42 no.2 / pp.291-303 / 2024
  • In this paper, we consider a feedforward network of overloaded multiclass processor sharing queues and we give a fluid model solution under the condition that the system is initially empty. The main theorem of the paper provides sufficient conditions for a fluid model solution to be linear with time. The results are illustrated through examples.

AB9: A neural processor for inference acceleration

  • Cho, Yong Cheol Peter;Chung, Jaehoon;Yang, Jeongmin;Lyuh, Chun-Gi;Kim, HyunMi;Kim, Chan;Ham, Je-seok;Choi, Minseok;Shin, Kyoungseon;Han, Jinho;Kwon, Youngsu
    • ETRI Journal / v.42 no.4 / pp.491-504 / 2020
  • We present AB9, a neural processor for inference acceleration. AB9 consists of a systolic tensor core (STC) neural network accelerator designed to accelerate artificial intelligence applications by exploiting the data reuse and parallelism characteristics inherent in neural networks while providing fast access to large on-chip memory. Complementing the hardware is an intuitive and user-friendly development environment that includes a simulator and an implementation flow that provides a high degree of programmability with a short development time. Along with a 40-TFLOP STC that includes 32k arithmetic units and over 36 MB of on-chip SRAM, our baseline implementation of AB9 consists of a 1-GHz quad-core setup with various other industry-standard peripheral intellectual properties. The acceleration performance and power efficiency were evaluated using YOLOv2, and the results show that AB9 has superior performance and power efficiency to that of a general-purpose graphics processing unit implementation. AB9 has been taped out in the TSMC 28-nm process with a chip size of 17 × 23 mm². Delivery is expected later this year.

Dynamic Scheduling of Network Processes for Multi-Core Systems (멀티 코어 시스템에서 통신 프로세스의 동적 스케줄링)

  • Jang, Hye-Churn;Jin, Hyun-Wook;Kim, Hag-Young
    • Journal of KIISE:Computing Practices and Letters / v.15 no.12 / pp.968-972 / 2009
  • Multi-core processors are widely used in high-end systems. With significant advances in processor architecture, the network bandwidth required by these systems is increasing drastically. It is therefore highly desirable to manage multiple cores efficiently so as to achieve high network bandwidth with minimum resource requirements. Modern operating systems, however, still leave significant room for design and optimization to improve network performance on multi-core systems. In this paper, we suggest a novel scheduling scheme for networking processes, which decides the best processor affinity of each networking process based on the processor cache layout, communication intensiveness, and processor loads. The experimental results show that the scheduling scheme, implemented in the Linux kernel, can improve the network bandwidth and the effectiveness of processor utilization by 20% and 59%, respectively.

Effect of Hot Spot to Performance of Interconnection Network (Hot Spot 이 Interconnection Network 의 성능에 미치는 영향)

  • Kim, Seong-Jong;Keem, Tae-Hyeong;Lee, Young-No;Shin, In-Chul
    • Proceedings of the KIEE Conference / 1988.07a / pp.655-658 / 1988
  • An interconnection network provides communication among functional modules. The interconnection networks considered here are Generalized Cube networks. Two situations are examined: one in which each memory module is equally likely to be addressed by a processor, and one in which a processor has a favorite memory. Through performance evaluation by simulation, this paper proposes conditions for effective operation of the interconnection network.

A hardware implementation of neural network with modified HANNIBAL architecture (수정된 하니발 구조를 이용한 신경회로망의 하드웨어 구현)

  • 이범엽;정덕진
    • The Transactions of the Korean Institute of Electrical Engineers / v.45 no.3 / pp.444-450 / 1996
  • This paper describes a digital hardware architecture for an artificial neural network with learning capability. It is a modification of the hardware architecture known as HANNIBAL (Hardware Architecture for Neural Networks Implementing Back propagation Algorithm Learning). To implement efficient neural network hardware, we analyzed various types of multiplier, the major functional block of a neuro-processor cell. Based on this analysis, we designed an efficient digital neural network hardware using a serial/parallel multiplier and tested its operation. We also analyzed the hardware efficiency with logic-level simulation.

Multicore Flow Processor with Wire-Speed Flow Admission Control

  • Doo, Kyeong-Hwan;Yoon, Bin-Yeong;Lee, Bhum-Cheol;Lee, Soon-Seok;Han, Man Soo;Kim, Whan-Woo
    • ETRI Journal / v.34 no.6 / pp.827-837 / 2012
  • We propose a flow admission control (FAC) for setting up a wire-speed connection for new flows based on their negotiated bandwidth. It also terminates a flow that does not have a packet transmitted within a certain period determined by the users. The FAC can be used to provide a reliable transmission of user datagram and transmission control protocol applications. If the period of flows can be set to a short time period, we can monitor active flows that carry a packet over networks during the flow period. Such powerful flow management can also be applied to security systems to detect a denial-of-service attack. We implement a network processor called a flow management network processor (FMNP), which is the second generation of the device that supports FAC. It has forty reduced instruction set computer core processors optimized for packet processing. It is fabricated in 65-nm CMOS technology and has a 40-Gbps process performance. We prove that a flow router equipped with an FMNP is better than legacy systems in terms of throughput and packet loss.

Design of a Single-chip Multiprocessor with On-chip Learning for Large Scale Neural Network Simulation (대규모 신경망 시뮬레이션을 위한 칩상 학습가능한 단일칩 다중 프로세서의 구현)

  • 김종문;송윤선;김명원
    • Journal of the Korean Institute of Telematics and Electronics B / v.33B no.2 / pp.149-158 / 1996
  • In this paper we describe the design and implementation of a digital neural chip and a parallel neural machine for simulating large-scale neural networks. The chip is a single-chip multiprocessor that has four digital neural processors (DNP-II) of the same architecture. Each DNP-II has program memory and data memory, and the chip operates as a MIMD (multi-instruction, multi-data) parallel processor. The DNP-II has an instruction set tailored to neural computation, which can be used to effectively simulate various neural network models, including on-chip learning. The DNP-II facilitates four-way data-driven communication, supporting the extensibility of parallel systems. The parallel neural machine consists of a host computer, processor boards, a buffer board, and an interface board. Each processor board consists of an 8*8 array of DNP-II (equivalently 2*2 neural chips). Each processor board can be built in configurations including a linear array, 2-D mesh, and 2-D torus. This flexibility supports efficient mapping of neural network models onto the parallel structure. The neural system achieves a maximum performance of 40 GCPS (giga connections per second) with 16 processor boards.

40-TFLOPS artificial intelligence processor with function-safe programmable many-cores for ISO26262 ASIL-D

  • Han, Jinho;Choi, Minseok;Kwon, Youngsu
    • ETRI Journal / v.42 no.4 / pp.468-479 / 2020
  • The proposed AI processor architecture has high throughput for accelerating the neural network and reduces the external memory bandwidth required for processing the neural network. To achieve high throughput, the proposed super thread core (STC) includes 128 × 128 nano cores operating at a clock frequency of 1.2 GHz. A function-safe architecture is proposed for fault-tolerant systems such as the electronics systems of autonomous cars. The general-purpose processor (GPP) core is integrated with the STC to control the STC and process the AI algorithm. It has a self-recovering cache and a dynamic lockstep function. The function-safe design is shown to meet ASIL D among the fault-tolerance levels of the ISO26262 standard. The entire AI processor is fabricated in a 28-nm CMOS process as a prototype chip. Its peak computing performance is 40 TFLOPS at 1.2 GHz with a supply voltage of 1.1 V. The measured energy efficiency is 1.3 TOPS/W. A GPP for control with a function-safe design can achieve ISO26262 ASIL-D with a single-point fault-tolerance rate of 99.64%.

JOB Scheduling for process Control in Hierarchical Computer Network (계층구조 Computer Network에서 공정제어를 위한 JOB Scheduling)

  • Park, Yil
    • The Journal of Korean Institute of Communications and Information Sciences / v.5 no.1 / pp.83-87 / 1980
  • A distributed processing job in a hierarchical computer network, which periodically supervises and controls the complex relations among variables in order to raise fault tolerance, can be characterized by its period and its execution time. All jobs may be composed of subsets related by a tree structure. For the job set of a processor, this paper derives a job scheduling algorithm that leaves less idle time between periods than FCFS.
