• Title/Summary/Keyword: Processor Core

Search Result 397, Processing Time 0.028 seconds

Performance Evaluation and Analysis for Discrete Wavelet Transform on Many-Core Processors (매니코어 프로세서 상에서 이산 웨이블릿 변환을 위한 성능 평가 및 분석)

  • Park, Yong-Hun;Kim, Jong-Myon
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.7 no.5
    • /
    • pp.277-284
    • /
    • 2012
  • To meet the usage of discrete wavelet transform (DWT) on potable devices, this paper implements 2-level DWT using a reference many-core processor architecture and determine the optimal many-core processor. To explore the optimal many-core processor, we evaluate the impacts of a data-per-processing element ratio that is defined as the amount of data mapped directly to each processing element (PE) on system performance, energy efficiency, and area efficiency, respectively. This paper utilized five PE configurations (PEs=16, 64, 256, 1,024, and 4,096) that were implemented in 130nm CMOS technology with a 720MHz clock frequency. Experimental results indicated that maximum energy and area efficiencies were achieved at PEs=1,024. However, the system area must be limited 140mm2 and the power should not exceed 3 watts in order to implement 2-level DWT on portable devices. When we consider these restrictions, the most reasonable energy and area efficiencies were achieved at PEs=256.

Analysis on the Performance and Temperature of the 3D Quad-core Processor according to Cache Organization (캐쉬 구성에 따른 3차원 쿼드코어 프로세서의 성능 및 온도 분석)

  • Son, Dong-Oh;Ahn, Jin-Woo;Choi, Hong-Jun;Kim, Jong-Myon;Kim, Cheol-Hong
    • Journal of the Korea Society of Computer and Information
    • /
    • v.17 no.6
    • /
    • pp.1-11
    • /
    • 2012
  • As the process technology scales down, multi-core processors cause serious problems such as increased interconnection delay, high power consumption and thermal problems. To solve the problems in 2D multi-core processors, researchers have focused on the 3D multi-core processor architecture. Compared to the 2D multi-core processor, the 3D multi-core processor decreases interconnection delay by reducing wire length significantly, since each core on different layers is connected using vertical through-silicon via(TSV). However, the power density in the 3D multi-core processor is increased dramatically compared to that in the 2D multi-core processor, because multiple cores are stacked vertically. Unfortunately, increased power density causes thermal problems, resulting in high cooling cost, negative impact on the reliability. Therefore, temperature should be considered together with performance in designing 3D multi-core processors. In this work, we analyze the temperature of the cache in quad-core processors varying cache organization. Then, we propose the low-temperature cache organization to overcome the thermal problems. Our evaluation shows that peak temperature of the instruction cache is lower than threshold. The peak temperature of the data cache is higher than threshold when the cache is composed of many ways. According to the results, our proposed cache organization not only efficiently reduces the peak temperature but also reduces the performance degradation for 3D quad-core processors.

A Performance Study of Asymmetric Multi-core Digital Signal Processor Architectures (비대칭적 멀티코어 디지털 신호처리 프로세서의 성능 연구)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.15 no.5
    • /
    • pp.219-224
    • /
    • 2015
  • Recently, the multi-core processor architecture is widely used in the digital signal processors for enhancing its performance. Multi-core processors are classified either as symmetric or asymmetric. Asymmetric multi-core processors are known to have higher performance and more efficient than symmetric multi-core processors. In order to study the performance enhancement of asymmetric multi-core digital signal processors over the symmetric ones, the trace-driven simulation has been executed for various asymmetric quad-core, octa-core and hexadeca-core digital signal processors and compared with the symmetric ones of similar hardware budget using UTDSP benchmarks as input.

A Performance Study of Multi-Core Processors with Perceptrons (퍼셉트론을 이용하는 멀티코어 프로세서의 성능 연구)

  • Lee, Jongbok
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.63 no.12
    • /
    • pp.1704-1709
    • /
    • 2014
  • In order to increase the performance of multi-core system processor architectures, the multi-thread branch predictor which speculatively fetches and allocates threads to each core should be highly accurate. In this paper, the perceptron based multi-thread branch predictor is proposed for the multi-core processor architectures. Using SPEC 2000 benchmarks as input, the trace-driven simulation has been performed for the 2 to 16-core architectures employing perceptron multi-thread branch predictor extensively. Its performance is compared with the architecture which utilizes the two-level adaptive multi-thread branch predictor.

A Study of Trace-driven Simulation for Multi-core Processor Architectures (멀티코어 프로세서의 명령어 자취형 모의실험에 대한 연구)

  • Lee, Jong-Bok
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.12 no.3
    • /
    • pp.9-13
    • /
    • 2012
  • In order to overcome the complexity and power problems of superscalar processors, the multi-core architecture has been prevalent recently. Although the execution-driven simulation is wide spread, the trace-driven simulation has speed advantages over the execution-driven simulation. We present a methodology to simulate multi-core architecture using trace-driven simulator. Using SPEC 2000 benchmarks as input, the trace-driven simulation has been performed for the cores ranging from 2 to 16 extensively. As a result, the 16-core processor resulted in 4.1 IPC and 13.3 times speed up over single-core processor on the average.

Design of 32 bit Parallel Processor Core for High Energy Efficiency using Instruction-Levels Dynamic Voltage Scaling Technique

  • Yang, Yil-Suk;Roh, Tae-Moon;Yeo, Soon-Il;Kwon, Woo-H.;Kim, Jong-Dae
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.9 no.1
    • /
    • pp.1-7
    • /
    • 2009
  • This paper describes design of high energy efficiency 32 bit parallel processor core using instruction-levels data gating and dynamic voltage scaling (DVS) techniques. We present instruction-levels data gating technique. We can control activation and switching activity of the function units in the proposed data technique. We present instruction-levels DVS technique without using DC-DC converter and voltage scheduler controlled by the operation system. We can control powers of the function units in the proposed DVS technique. The proposed instruction-levels DVS technique has the simple architecture than complicated DVS which is DC-DC converter and voltage scheduler controlled by the operation system and a hardware implementation is very easy. But, the energy efficiency of the proposed instruction-levels DVS technique having dual-power supply is similar to the complicated DVS which is DC-DC converter and voltage scheduler controlled by the operation system. We simulate the circuit simulation for running test program using Spectra. We selected reduced power supply to 0.667 times of the supplied power supply. The energy efficiency of the proposed 32 bit parallel processor core using instruction-levels data gating and DVS techniques can improve about 88.4% than that of the 32 bit parallel processor core without using those. The designed high energy efficiency 32 bit parallel processor core can utilize as the coprocessor processing massive data at high speed.

40-TFLOPS artificial intelligence processor with function-safe programmable many-cores for ISO26262 ASIL-D

  • Han, Jinho;Choi, Minseok;Kwon, Youngsu
    • ETRI Journal
    • /
    • v.42 no.4
    • /
    • pp.468-479
    • /
    • 2020
  • The proposed AI processor architecture has high throughput for accelerating the neural network and reduces the external memory bandwidth required for processing the neural network. For achieving high throughput, the proposed super thread core (STC) includes 128 × 128 nano cores operating at the clock frequency of 1.2 GHz. The function-safe architecture is proposed for a fault-tolerance system such as an electronics system for autonomous cars. The general-purpose processor (GPP) core is integrated with STC for controlling the STC and processing the AI algorithm. It has a self-recovering cache and dynamic lockstep function. The function-safe design has proved the fault performance has ASIL D of ISO26262 standard fault tolerance levels. Therefore, the entire AI processor is fabricated via the 28-nm CMOS process as a prototype chip. Its peak computing performance is 40 TFLOPS at 1.2 GHz with the supply voltage of 1.1 V. The measured energy efficiency is 1.3 TOPS/W. A GPP for control with a function-safe design can have ISO26262 ASIL-D with the single-point fault-tolerance rate of 99.64%.

Multi-Core Processor for Real-Time Sound Synthesis of Gayageum (가야금의 실시간 음 합성을 위한 멀티코어 프로세서 구현)

  • Choi, Ji-Won;Cho, Sang-Jin;Kim, Cheol-Hong;Kim, Jong-Myon;Chong, Ui-Pil
    • The KIPS Transactions:PartA
    • /
    • v.18A no.1
    • /
    • pp.1-10
    • /
    • 2011
  • Physical modeling has been widely used for sound synthesis since it synthesizes high quality sound which is similar to real-sound for musical instruments. However, physical modeling requires a lot of parameters to synthesize a large number of sounds simultaneously for the musical instrument, preventing its real-time processing. To solve this problem, this paper proposes a single instruction, multiple data (SIMD) based multi-core processor that supports real-time processing of sound synthesis of gayageum which is a representative Korean traditional musical instrument. The proposed SIMD-base multi-core processor consists of 12 processing elements (PE) to control 12 strings of gayageum in which each PE supports modeling of the corresponding string. The proposed SIMD-based multi-core processor can generate synthesized sounds of 12 strings simultaneously after receiving excitation signals and parameters of each string as an input. Experimental results using a sampling reate 44.1 kHz and 16 bits quantization show that synthesis sound using the proposed multi-core processor was very similar to the original sound. In addition, the proposed multi-core processor outperforms commercial processors(TI's TMS320C6416, ARM926EJ-S, ARM1020E) in terms of execution time ($5.6{\sim}11.4{\times}$ better) and energy efficiency (about $553{\sim}1,424{\times}$ better).

Design and Implementation of a Linux-based Message Processor to Minimize the Response-time Delay of Non-real-time Messages in Multi-core Environments (멀티코어 환경에서 비실시간 메시지의 응답시간 지연을 최소화하는 리눅스 기반 메시지 처리기의 설계 및 구현)

  • Wang, Sangho;Park, Younghun;Park, Sungyong;Kim, Seungchun;Kim, Cheolhoe;Kim, Sangjun;Jin, Cheol
    • Journal of KIISE
    • /
    • v.44 no.2
    • /
    • pp.115-123
    • /
    • 2017
  • A message processor is server software that receives non-realtime messages as well as realtime messages from clients that need to be processed within a deadline. With the recent advances of micro-processor technologies and Linux, the message processor is often implemented in Linux-based multi-core servers and it is important to use cores efficiently to maximize the performance of system in multi-core environments. Numerous research efforts on a real-time scheduler for the efficient utilization of the multi-core environments have been conducted. Typically, though, they have been conducted theoretically or via simulation, making a subsequent real-system application difficult. Moreover, many Linux-based real-time schedulers can only be used in a specific Linux version, or the Linux source code needs to be modified. This paper presents the design of a Linux-based message processor for multi-core environments that maps the threads to the cores at user level. The message processor is implemented through a modification of the traditional RM algorithm that consolidates the real-time messages into certain cores using a first-fit-based bin-packing algorithm; this minimizes the response-time delay of the non-real-time messages, while guaranteeing the violation rate of the real-time messages. To compare the performances, the message processor was implemented using the two multi-core-scheduling algorithms GSN-EDF and P-FP, which are provided by the LITMUS framework. The benchmarking results show that the response-time delay of non-real-time messages in the proposed system was improved up to a maximum of 17% to 18%.

Performance Study of Asymmetric Multicore Processor Architectures (비대칭적 멀티코어 프로세서의 성능 연구)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.14 no.3
    • /
    • pp.163-169
    • /
    • 2014
  • Recently, the importance of multicore processor system is growing rapidly. Multicore processors are classified either as symmetric or asymmetric. Asymmetric multicore processors consist of a high performance complex core and number of low performance simple cores, and are known to be more efficient than symmetric multicore processors. Therefore, performance impact on various configurations of asymmetric multi-core processor needs to be studied. Using SPEC 2000 benchmarks as input, the trace-driven simulation has been performed for different asymmetric quad-core and octa-core processors and compared to the corresponding symmetric ones.