• Title/Summary/Keyword: High-performance processor

An Efficient Central Queue Management Algorithm for High-speed Parallel Packet Filtering (고속 병렬 패킷 여과를 위한 효율적인 단일버퍼 관리 방안)

  • 임강빈;박준구;최경희;정기현
    • Journal of the Institute of Electronics Engineers of Korea TC
    • /
    • v.41 no.7
    • /
    • pp.63-73
    • /
    • 2004
  • This paper proposes an efficient centralized single-buffer management algorithm that arbitrates access contention among processors in a multi-processor system for high-speed packet filtering, and shows that the algorithm provides reasonable performance by implementing it on a real multi-processor system. The multi-processor system for parallel packet filtering is modeled on a network processor and distributes the packet filtering rules across the processors to speed up filtering. To investigate the performance of the proposed algorithm, we varied the number of processors and the processing time of the filtering rules and measured the packet transfer rates.

Optimized Implementation of Scalable Multi-Precision Multiplication Method on RISC-V Processor for High-Speed Computation of Post-Quantum Cryptography (차세대 공개키 암호 고속 연산을 위한 RISC-V 프로세서 상에서의 확장 가능한 최적 곱셈 구현 기법)

  • Seo, Hwa-jeong;Kwon, Hyeok-dong;Jang, Kyoung-bae;Kim, Hyunjun
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.31 no.3
    • /
    • pp.473-480
    • /
    • 2021
  • To achieve a high-speed implementation of post-quantum cryptography, primitive operations should be tailored to the architecture of the target processor. In this paper, we present an optimized implementation of multiplication on the RISC-V processor for post-quantum cryptography. In particular, the column-wise multiplication algorithm is optimized with the primitive instructions of the RISC-V processor, which improves the performance of 256-bit and 512-bit multiplication by 19% and 8%, respectively, over previous works. Lastly, we suggest an instruction extension for high-speed multiplication on the RISC-V processor.
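
The column-wise (product-scanning) multiplication named in this abstract can be sketched as follows. This is a generic textbook version, not the paper's optimized routine: the 32-bit limb size and the helper names `to_limbs`, `from_limbs`, and `column_wise_multiply` are assumptions made for illustration. On a 32-bit RISC-V core with the M extension, the low and high halves of each partial product would come from the `mul` and `mulhu` instructions; here a Python integer stands in for the multi-word accumulator.

```python
WORD_BITS = 32                      # assumed limb size (illustrative)
WORD_MASK = (1 << WORD_BITS) - 1


def to_limbs(value, n_limbs):
    """Split a non-negative integer into little-endian 32-bit limbs."""
    return [(value >> (WORD_BITS * i)) & WORD_MASK for i in range(n_limbs)]


def from_limbs(limbs):
    """Recombine little-endian limbs into one integer."""
    return sum(limb << (WORD_BITS * i) for i, limb in enumerate(limbs))


def column_wise_multiply(a, b):
    """Multiply two equal-length limb arrays one result column at a time.

    All partial products a[i] * b[j] with i + j == col are accumulated
    before a single result word is written, so each output limb is
    stored exactly once.
    """
    n = len(a)
    if len(b) != n:
        raise ValueError("operands must have the same number of limbs")
    result = [0] * (2 * n)
    acc = 0  # stands in for a multi-register accumulator
    for col in range(2 * n - 1):
        lo = max(0, col - n + 1)
        hi = min(col, n - 1)
        for i in range(lo, hi + 1):
            acc += a[i] * b[col - i]
        result[col] = acc & WORD_MASK   # emit the finished column
        acc >>= WORD_BITS               # carry into the next column
    result[2 * n - 1] = acc             # final carry fits in one word
    return result


# Usage: a 256-bit by 256-bit product uses 8 limbs per operand.
x = (1 << 255) + 12345
y = (1 << 200) + 67890
product = from_limbs(column_wise_multiply(to_limbs(x, 8), to_limbs(y, 8)))
```

Accumulating a whole column before storing is what reduces memory writes compared with row-wise (operand-scanning) multiplication, at the cost of a wider accumulator.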

Implementation of Multi-Core Processor for Beamforming Algorithm of Mobile Ultrasound Image Signals (모바일 초음파 영상신호의 빔포밍 알고리즘을 위한 멀티코어 프로세서 구현)

  • Choi, Byong-Kook;Kim, Jong-Myon
    • The KIPS Transactions:PartA
    • /
    • v.18A no.2
    • /
    • pp.45-52
    • /
    • 2011
  • In the past, a patient went to the room where an ultrasound diagnostic imaging device was installed and was examined there by a doctor. Today, a doctor can instead take a handheld ultrasound device to the patient's room. However, such devices have been implemented with only fundamental functions and cannot meet the high performance required by the ultrasound beam focusing algorithm, which determines the quality of the ultrasound image. In addition, a mobile ultrasound device must consume little energy. To satisfy these requirements, this paper proposes a high-performance, low-power multi-core processor based on single instruction, multiple data (SIMD) execution that supports a representative beamforming algorithm among the several focusing methods for mobile ultrasound image signals. The proposed SIMD multi-core processor, which consists of 16 processing elements (PEs), meets the performance required by the beamforming algorithm by exploiting the considerable data-level parallelism inherent in ultrasound echo image data. Experimental results showed that the proposed multi-core processor outperforms a commercial high-performance processor, TI DSP C6416, in terms of execution time (15.8 times better), energy efficiency (6.9 times better), and area efficiency (10 times better).

Enhancement of Response Time of Real-Time Tasks with Variable Execution Times by Using Shared Bandwidth (가변 실행시간의 실시간 태스크들에 대하여 공유대역폭을 활용한 응답시간의 개선)

  • Kim, Yong-Seok
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.46 no.3
    • /
    • pp.77-85
    • /
    • 2009
  • Execution times of tasks can vary depending on input data. If we choose a high-performance processor to satisfy the worst-case execution times, the hardware cost becomes high and the energy consumption also becomes large. To use a lower-performance processor, we have to maximize utilization of processor capacity while ensuring that overrunning tasks cannot affect the deadlines of other tasks. For such systems, this paper presents SBP (Shared Bandwidth Partitioning), in which a portion of processor bandwidth is reserved and shared among all tasks. If a task needs more processor capacity, it can use a portion of the shared bandwidth. Simulation results show that SBP performs better than previous algorithms: it reduces the deadline miss ratio, which is related to scheduling quality, and the number of context switches, which is related to system overhead.

On Designing 4-way Superscalar Digital Signal Processor Core (4-way 수퍼 스칼라 디지털 시그널 프로세서 코어 설계)

  • 김준석;유선국;박성욱;정남훈;고우석;이근섭;윤대희
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.23 no.6
    • /
    • pp.1409-1418
    • /
    • 1998
  • Recent audio CODEC (coding/decoding) algorithms combine several coding techniques and can be divided into DSP tasks, controller tasks, and mixed tasks. Traditional DSP processors have been designed for fast processing of DSP tasks only, not for controller and mixed tasks. This paper presents a new architecture that achieves high throughput on both controller and mixed tasks of such algorithms while maintaining high performance for DSP tasks. The proposed processor, YSP-3, operates four functional units (a multiplier, two ALUs, and a load/store unit) in parallel via a 4-issue superscalar instruction structure. YSP-3 was evaluated by implementing several DSP algorithms and part of the AC-3 decoding algorithm.

The Design of FFT Processor for Real-time Power Quality Analysis System (실시간 전력품질분석시스템을 위한 FFT 프로세서의 설계)

  • Lee, Jeong-Bok;Park, Hae-Won;Kang, Min-Sao;Jean, Hee-Jong
    • Proceedings of the KIEE Conference
    • /
    • 2002.07b
    • /
    • pp.1071-1074
    • /
    • 2002
  • In this paper, a power quality analysis system is proposed for cases where voltage or current waveforms are nonsinusoidal. The proposed system relies on the FFT algorithm to compute real and reactive power. Its advantage is that harmonic analysis is carried out on one period of the input signal. The system is based on an FFT processor designed in VHDL (Very High Speed Integrated Circuit Hardware Description Language). In the design of the FFT processor, radix-$2^2$ is adopted to reduce the number of complex multipliers needed for twiddle factors. The complex multiplier is implemented using only shifters and adders. Therefore, the system achieves both high hardware efficiency and high performance.
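
As an illustration of computing real and reactive power from the spectrum of one signal period, the sketch below sums the complex power of each harmonic. The abstract does not say which reactive power definition the system uses, so the per-harmonic sum here is an assumption, as are the function names `dft` and `harmonic_power`. A plain DFT replaces the radix-$2^2$ FFT hardware for brevity.

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform of one period of samples."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(samples))
        for k in range(n)
    ]


def harmonic_power(voltage, current):
    """Return (P, Q) summed over harmonics from one sampled period.

    For harmonic k the RMS phasors are sqrt(2) * X[k] / n, so the
    complex power is S_k = 2 * V[k] * conj(I[k]) / n**2. P is the sum
    of the real parts and Q the sum of the imaginary parts (assumed
    definition). The Nyquist bin is ignored, assuming band-limited input.
    """
    n = len(voltage)
    if len(current) != n:
        raise ValueError("voltage and current must have the same length")
    v_spec = dft(voltage)
    i_spec = dft(current)
    p = (v_spec[0] * i_spec[0].conjugate()).real / n ** 2  # DC term
    q = 0.0
    for k in range(1, (n + 1) // 2):
        s_k = 2 * v_spec[k] * i_spec[k].conjugate() / n ** 2
        p += s_k.real
        q += s_k.imag
    return p, q


# Usage: a distorted voltage and a fundamental current lagging by 60 degrees.
N = 64
v = [10 * math.cos(2 * math.pi * t / N) + 3 * math.cos(6 * math.pi * t / N)
     for t in range(N)]
i = [2 * math.cos(2 * math.pi * t / N - math.pi / 3) for t in range(N)]
p, q = harmonic_power(v, i)
```

With this sign convention a lagging current gives positive Q; the third harmonic in the voltage contributes nothing because the current has no matching component.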

FFT Array Processor System with Easily Adjustable Computation speed and Hardware Complexity (계산속도와 하드웨어 양이 조절 용이한 FFT Array Processor 시스템)

  • Jae Hee Yoo
    • Journal of the Korean Institute of Telematics and Electronics A
    • /
    • v.30A no.3
    • /
    • pp.114-129
    • /
    • 1993
  • An FFT array processor algorithm and architecture is presented that can use the minimum required number of simple, duplicate multiplier-adder processing elements for a given computation speed. It is based on the p-fold symmetry in the radix-p constant-geometry FFT butterfly stage with shuffled inputs and normally ordered outputs. A methodology for implementing a high-performance, high-radix FFT in VLSI by constructing a high-radix processing element from duplicates of a simple lower-radix processing element is also discussed. Performance and the trade-off between computation speed and hardware complexity are evaluated and compared. Based on the presented architecture, a radix-2, 8-point FFT processing element chip has been designed, and its structure and results are discussed.
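
The constant-geometry structure mentioned in this abstract can be illustrated for radix 2. The sketch below is a textbook constant-geometry decimation-in-time FFT, not the paper's array processor: the function names are assumptions, and it only shows the property the abstract relies on, namely that every stage reads and writes with the same index pattern, taking bit-reversed (shuffled) inputs and producing naturally ordered outputs.

```python
import cmath
import math


def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def constant_geometry_fft(samples):
    """Radix-2 FFT in which every stage uses the same wiring.

    Each stage reads the pair (2i, 2i + 1) and writes to (i, i + n/2),
    so only the twiddle factors change from stage to stage.
    """
    n = len(samples)
    stages = n.bit_length() - 1
    if n < 2 or (1 << stages) != n:
        raise ValueError("length must be a power of two, at least 2")
    # Shuffled (bit-reversed) input order.
    data = [complex(samples[bit_reverse(p, stages)]) for p in range(n)]
    half = n // 2
    for s in range(stages):
        shift = stages - 1 - s
        out = [0j] * n
        for i in range(half):
            # Twiddle exponent depends only on the top s bits of i.
            exponent = (i >> shift) << shift
            w = cmath.exp(-2j * math.pi * exponent / n)
            a = data[2 * i]
            b = w * data[2 * i + 1]
            out[i] = a + b
            out[i + half] = a - b
        data = out
    return data  # natural (normal) output order


# Usage: an 8-point transform, matching the chip size in the abstract.
spectrum = constant_geometry_fft([1, 2, 3, 4, 0, 0, 0, 0])
```

Because the data routing between stages never changes, one butterfly stage can be reused for all stages, which is what makes it straightforward to trade the number of processing elements against computation speed.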

A Study on Power Dissipation of The Multicore Processor (멀티코어 프로세서의 전력 소비에 대한 연구)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.17 no.2
    • /
    • pp.251-256
    • /
    • 2017
  • Recently, multicore processor systems have been widely adopted not only in general-purpose computers but also in embedded systems and mobile devices in order to improve performance. Since power dissipation is a significant issue for multicore processor systems, it must be estimated accurately in the early design stage. In this paper, a fast power analysis tool for a high-performance multicore processor, based on a trace-driven simulator, has been developed. To this end, the power dissipation of each hardware unit in each core is summed. Using SPEC 2000 benchmarks as input, trace-driven simulation was performed to estimate the average power dissipation per instruction.

An I/O Bus-Based Dual Active Fault Tolerant Architecture for Good System Performance

  • Kwak, Seung-Uk;Kim, Jeong-Il;Jeong, Keun-Won;Park, Kyong-Bae;Kang, Kyong-In;Kim, Hyen-Uk;Lee, Kwang-Bae
    • Proceedings of the Korean Institute of Electrical and Electronic Material Engineers Conference
    • /
    • 1998.06a
    • /
    • pp.515-520
    • /
    • 1998
  • In this paper, we propose a new fault-tolerant architecture for high-availability systems in which, for module-internal operations, both processor modules perform the same tasks at the same time independently of each other, while for module-external operations both processor modules act actively. That is, synchronization between the dual processor modules, other than clock synchronization, is requested only when module-external operations are executed. The architecture can not only improve system availability by reducing system reintegration time but also reduce the performance degradation caused by frequent synchronization between the dual processor modules. The clock unit consists of a clock generator and a clock synchronization circuit, and supplies a stable clock signal even when the clock unit of either processor module malfunctions or the clock signal varies rapidly. The architecture achieves system availability and data credibility through its symmetrical design.

Performance Evaluation of Secure Embedded Processor using FEC-Based Instruction-Level Correlation Technique (오류정정 부호 기반 명령어 연관성 기법을 적용한 임베디드 보안 프로세서의 성능평가)

  • Lee, Seung-Wook;Kwon, Soon-Gyu;Kim, Jong-Tae
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.34 no.5B
    • /
    • pp.526-531
    • /
    • 2009
  • In this paper, we propose a novel technique, ILCT (Instruction-Level Correlation Technique), which can detect instructions tampered with by software or hardware attacks before they are executed. In conventional work, secure processor architectures that apply cipher techniques can suffer serious performance degradation because of the high computational complexity of the cipher process and the low processing speed of cipher modules. In contrast, a secure processor architecture applying ILCT with FEC does not incur excessive performance loss from the computational complexity or speed of the tampering detection modules. According to experimental results, total memory overhead, including parity, increases by 26.62% on average, and secure programs incur an average CPI degradation of $1.20\%{\sim}1.97\%$.