• Title/Summary/Keyword: Processor Core

Search Result 397, Processing Time 0.034 seconds

Analysis on the Thermal Efficiency of Branch Prediction Techniques in 3D Multicore Processors (3차원 구조 멀티코어 프로세서의 분기 예측 기법에 관한 온도 효율성 분석)

  • Ahn, Jin-Woo;Choi, Hong-Jun;Kim, Jong-Myon;Kim, Cheol-Hong
    • The KIPS Transactions:PartA
    • /
    • v.19A no.2
    • /
    • pp.77-84
    • /
    • 2012
  • Speculative execution for improving instruction-level parallelism is widely used in high-performance processors. In the speculative execution technique, the most important factor is the accuracy of branch predictor. Unfortunately, complex branch predictors for improving the accuracy can cause serious thermal problems in 3D multicore processors. Thermal problems have negative impact on the processor performance. This paper analyzes two methods to solve the thermal problems in the branch predictor of 3D multi-core processors. First method is dynamic thermal management which turns off the execution of the branch predictor when the temperature of the branch predictor exceeds the threshold. Second method is thermal-aware branch predictor placement policy by considering each layer's temperature in 3D multi-core processors. According to our evaluation, the branch predictor placement policy shows that average temperature is $87.69^{\circ}C$, and average maximum temperature gradient is $11.17^{\circ}C$. And, dynamic thermal management shows that average temperature is $89.64^{\circ}C$ and average maximum temperature gradient is $17.62^{\circ}C$. Proposed branch predictor placement policy has superior thermal efficiency than the dynamic thermal management. In the perspective of performance, the proposed branch predictor placement policy degrades the performance by 3.61%, while the dynamic thermal management degrades the performance by 27.66%.

Implementation of SDR-based LTE-A PDSCH Decoder for Supporting Multi-Antenna Using Multi-Core DSP (멀티코어 DSP를 이용한 다중 안테나를 지원하는 SDR 기반 LTE-A PDSCH 디코더 구현)

  • Na, Yong;Ahn, Heungseop;Choi, Seungwon
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.15 no.4
    • /
    • pp.85-92
    • /
    • 2019
  • This paper presents a SDR-based Long Term Evolution Advanced (LTE-A) Physical Downlink Shared Channel (PDSCH) decoder using a multicore Digital Signal Processor (DSP). For decoder implementation, multicore DSP TMS320C6670 is used, which provides various hardware accelerators such as turbo decoder, fast Fourier transformer and Bit Rate Coprocessors. The TMS320C6670 is a DSP specialized in implementing base station platforms and is not an optimized platform for implementing mobile terminal platform. Accordingly, in this paper, the hardware accelerator was changed to the terminal implementation to implement the LTE-A PDSCH decoder supporting the multi-antenna and the functions not provided by the hardware accelerator were implemented through core programming. Also pipeline using multicore was implemented to meet the transmission time interval. To confirm the feasibility of the proposed implementation, we verified the real-time decoding capability of the PDSCH decoder implemented using the LTE-A Reference Measurement Channel (RMC) waveform about transmission mode 2 and 3.

Implementation of Encryption Module for Securing Contents in System-On-Chip (콘텐츠 보호를 위한 시스템온칩 상에서 암호 모듈의 구현)

  • Park, Jin;Kim, Young-Geun;Kim, Young-Chul;Park, Ju-Hyun
    • The Journal of the Korea Contents Association
    • /
    • v.6 no.11
    • /
    • pp.225-234
    • /
    • 2006
  • In this paper, we design a combined security processor, ECC, MD-5, and AES, as a SIP for cryptography of securing contents. Each SIP is modeled and designed in VHDL and implemented as a reusable macro through logic synthesis, simulation and FPGA verification. To communicate with an ARM9 core, we design a BFM(Bus Functional Model) according to AMBA AHB specification. The combined security SIP for a platform-based SoC is implemented by integrating ECC, AES and MD-5 using the design kit including the ARM9 RISC core, one million-gate FPGA. Finally, it is fabricated into a MPW chip using Magna chip $0.25{\mu}m(4.7mm{\times}4.7mm$) CMOS technology.

  • PDF

Real-Time Object Segmentation in Image Sequences (연속 영상 기반 실시간 객체 분할)

  • Kang, Eui-Seon;Yoo, Seung-Hun
    • The KIPS Transactions:PartB
    • /
    • v.18B no.4
    • /
    • pp.173-180
    • /
    • 2011
  • This paper shows an approach for real-time object segmentation on GPU (Graphics Processing Unit) using CUDA (Compute Unified Device Architecture). Recently, many applications that is monitoring system, motion analysis, object tracking or etc require real-time processing. It is not suitable for object segmentation to procedure real-time in CPU. NVIDIA provide CUDA platform for Parallel Processing for General Computation to upgrade limit of Hardware Graphic. In this paper, we use adaptive Gaussian Mixture Background Modeling in the step of object extraction and CCL(Connected Component Labeling) for classification. The speed of GPU and CPU is compared and evaluated with implementation in Core2 Quad processor with 2.4GHz.The GPU version achieved a speedup of 3x-4x over the CPU version.

Design of Electronic Control Unit for Parking Assist System (주차 보조 시스템을 위한 ECU 설계)

  • Choi, Jin-Hyuk;Lee, Seongsoo
    • Journal of IKEEE
    • /
    • v.24 no.4
    • /
    • pp.1172-1175
    • /
    • 2020
  • Automotive ECU integrates CPU core, IVN controller, memory interface, sensor interface, I/O interface, and so on. Current automotive ECUs are often developed with proprietary processor architectures. However, demends for standard processors such as ARM and RISC-V increase rapidly for saftware compatibility in autonomous vehicles and connected cars. In this paper, an automotive ECU is designed for parking assist system based on RISC-V with open instruction set architecture. It includes 32b RISC-V CPU core, IVN controllers such as CAN and LIN, memory interfaces such as ROM and SRAM, and I/O interfaces such as SPI, UART, and I2C. Fabricated in 65nm CMOS technology, its operating frequency, area, and gate count are 50MHz, 0.37㎟, and 55,310 gates, respectively.

The Design of a Structure of Network Co-processor for SDR(Software Defined Radio) (SDR(Software Defined Radio)에 적합한 네트워크 코프로세서 구조의 설계)

  • Kim, Hyun-Pil;Jeong, Ha-Young;Ham, Dong-Hyeon;Lee, Yong-Surk
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.32 no.2A
    • /
    • pp.188-194
    • /
    • 2007
  • In order to become ubiquitous world, the compatibility of wireless machines has become the significant characteristic of a communication terminal. Thus, SDR is the most necessary technology and standard. However, among the environment which has different communication protocol, it's difficult to make a terminal with only hardware using ASIC or SoC. This paper suggests the processor that can accelerate several communication protocol. It can be connected with main-processor, and it is specialized PHY layer of network The C-program that is modeled with the wireless protocol IEEE802.11a and IEEE802.11b which are based on widely used modulation way; OFDM and CDM is compiled with ARM cross compiler and done simulation and profiling with Simplescalar-Arm version. The result of profiling, most operations were Viterbi operations and complex floating point operations. According to this result we suggested a co-processor which can accelerate Viterbi operations and complex floating point operations and added instructions. These instructions are simulated with Simplescalar-Arm version. The result of this simulation, comparing with computing only one ARM core, the operations of Viterbi improved as fast as 4.5 times. And the operations of complex floating point improved as fast as twice. The operations of IEEE802.11a are 3 times faster, and the operations of IEEE802.11b are 1.5 times faster.

A small-area implementation of public-key cryptographic processor for 224-bit elliptic curves over prime field (224-비트 소수체 타원곡선을 지원하는 공개키 암호 프로세서의 저면적 구현)

  • Park, Byung-Gwan;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.21 no.6
    • /
    • pp.1083-1091
    • /
    • 2017
  • This paper describes a design of cryptographic processor supporting 224-bit elliptic curves over prime field defined by NIST. Scalar point multiplication that is a core arithmetic function in elliptic curve cryptography(ECC) was implemented by adopting the modified Montgomery ladder algorithm. In order to eliminate division operations that have high computational complexity, projective coordinate was used to implement point addition and point doubling operations, which uses addition, subtraction, multiplication and squaring operations over GF(p). The final result of the scalar point multiplication is converted to affine coordinate and the inverse operation is implemented using Fermat's little theorem. The ECC processor was verified by FPGA implementation using Virtex5 device. The ECC processor synthesized using a 0.18 um CMOS cell library occupies 2.7-Kbit RAM and 27,739 gate equivalents (GEs), and the estimated maximum clock frequency is 71 MHz. One scalar point multiplication takes 1,326,985 clock cycles resulting in the computation time of 18.7 msec at the maximum clock frequency.

Low-Gate-Count 32-Bit 2/3-Stage Pipelined Processor Design (소면적 32-bit 2/3단 파이프라인 프로세서 설계)

  • Lee, Kwang-Min;Park, Sungkyung
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.53 no.4
    • /
    • pp.59-67
    • /
    • 2016
  • With the enhancement of built-in communication capabilities in various meters and wearable devices, which implies Internet of things (IoT), the demand of small-area embedded processors has increased. In this paper, we introduce a small-area 32-bit pipelined processor, Juno, which is available in the field of IoT. Juno is an EISC (Extendable Instruction Set Computer) machine and has a 2/3-stage pipeline structure to reduce the data dependency of the pipeline. It has a simple pipeline controller which only controls the program counter (PC) and two pipeline registers. It offers $32{\times}32=64$ multiplication, 64/32=32 division, $32{\times}32+64=64$ MAC (multiply and accumulate) operations together with 32*32=64 Galois field multiplication operation for encryption processing in wireless communications. It provides selective inclusion of these algebraic logic blocks if necessary in order to reduce the area of the overall processor. In this case, the gate count of our integer core amounts to 12k~22k and has a performance of 0.57 DMIPS/MHz and 1.024 Coremark/MHz.

Design of Reconfigurable Coprocessor for Multimedia Mobile Terminal (멀티미디어 무선 단말기를 위한 재구성 가능한 코프로세서의 설계)

  • Kim, Nam-Sub;Lee, Sang-Hun;Kum, Min-Ha;Kim, Jin-Sang;Cho, Won-Kyung
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.44 no.4
    • /
    • pp.63-72
    • /
    • 2007
  • In this paper, we propose a novel reconfigurable coprocessor for multimedia mobile terminals. Because most of multimedia operations require fast operations of large amount of data in the limited clock frequency, it is necessary to enhance the performance of the embedded processor that is widely used in current multimedia mobile terminals. Therefore, we proposed and have designed the coprocessor which had the ability of fast operations of multimedia data. The proposed coprocessor was not only reconfigurable, but also flexible and expandable. The proposed coprocessor has been designed by using VHDL and compared with previous reconfigurable coprocessors and a commercial embedded processor in architecture and speed. As a result of the architectural comparison, the proposed coprocessor had better structure in terms of hardware size and flexibility. Also, the simulation results of DCT application showed that the proposed coprocessor was 26 times faster than a commercial ARM processor and 11 times faster than the ARM processor with fast DCT core.

Scalable RSA public-key cryptography processor based on CIOS Montgomery modular multiplication Algorithm (CIOS 몽고메리 모듈러 곱셈 알고리즘 기반 Scalable RSA 공개키 암호 프로세서)

  • Cho, Wook-Lae;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.22 no.1
    • /
    • pp.100-108
    • /
    • 2018
  • This paper describes a design of scalable RSA public-key cryptography processor supporting four key lengths of 512/1,024/2,048/3,072 bits. The modular multiplier that is a core arithmetic block for RSA crypto-system was designed with 32-bit datapath, which is based on the CIOS (Coarsely Integrated Operand Scanning) Montgomery modular multiplication algorithm. The modular exponentiation was implemented by using L-R binary exponentiation algorithm. The scalable RSA crypto-processor was verified by FPGA implementation using Virtex-5 device, and it takes 456,051/3,496347/26,011,947/88,112,770 clock cycles for RSA computation for the key lengths of 512/1,024/2,048/3,072 bits. The RSA crypto-processor synthesized with a $0.18{\mu}m$ CMOS cell library occupies 10,672 gate equivalent (GE) and a memory bank of $6{\times}3,072$ bits. The estimated maximum clock frequency is 147 MHz, and the RSA decryption takes 3.1/23.8/177/599.4 msec for key lengths of 512/1,024/2,048/3,072 bits.