• Title/Summary/Keyword: Processor Core

Search Result 396, Processing Time 0.026 seconds

A Study of Distribute Computing Performance Using a Convergence of Xeon-Phi Processor and Quantum ESPRESSO (퀀텀 에스프레소와 제온 파이 프로세서의 융합을 이용한 분산컴퓨팅 성능에 대한 연구)

  • Park, Young-Soo;Park, Koo-Rack;Kim, Dong-Hyun
    • Journal of the Korea Convergence Society
    • /
    • v.7 no.5
    • /
    • pp.15-21
    • /
    • 2016
  • Recently the degree of integration of processor and developed rapidly. However, clock speed is not increased, a situation that increases the number of cores in the processor. In this paper, we analyze the performance of a typical Intel Xeon Phi of many core process used for the current operation accelerate. Utilizing the Quantum ESPRESSO, which was calculated using the FFTW library. By varying the number of ranks in MPI when running the benchmarks the performance Xeon Phi. The result shows a good performance in the handling of four job on one physical core. However, four or more to expand the number of MPI Rank is degraded. Through this convergence it was found to improve the performance of Quantum ESPRESSO. It is possible to check the hardware characteristics of the Xeon Phi.

A Public-key Cryptography Processor supporting P-224 ECC and 2048-bit RSA (P-224 ECC와 2048-비트 RSA를 지원하는 공개키 암호 프로세서)

  • Sung, Byung-Yoon;Lee, Sang-Hyun;Shin, Kyung-Wook
    • Journal of IKEEE
    • /
    • v.22 no.3
    • /
    • pp.522-531
    • /
    • 2018
  • A public-key cryptography processor EC-RSA was designed, which integrates a 224-bit prime field elliptic curve cryptography (ECC) defined in the FIPS 186-2 as well as RSA with 2048-bit key length into a single hardware structure. A finite field arithmetic core used in both scalar multiplication for ECC and exponentiation for RSA was designed with 32-bit data-path. A lightweight implementation was achieved by an efficient hardware sharing of the finite field arithmetic core and internal memory for ECC and RSA operations. The EC-RSA processor was verified by FPGA implementation. It occupied 11,779 gate equivalents (GEs) and 14 kbit RAM synthesized with a 180-nm CMOS cell library and the estimated maximum clock frequency was 133 MHz. It takes 867,746 clock cycles for ECC scalar multiplication resulting in the estimated throughput of 34.3 kbps, and takes 26,149,013 clock cycles for RSA decryption resulting in the estimated throughput of 10.4 kbps.

Using Cache Access History for Reducing False Conflicts in Signature-Based Eager Hardware Transactional Memory (시그니처 기반 이거 하드웨어 트랜잭셔널 메모리에서의 캐시 접근 이력을 이용한 거짓 충돌 감소)

  • Kang, Jinku;Lee, Inhwan
    • Journal of KIISE
    • /
    • v.42 no.4
    • /
    • pp.442-450
    • /
    • 2015
  • This paper proposes a method for reducing false conflicts in signature-based eager hardware transactional memory (HTM). The method tracks the information on all cache blocks that are accessed by a transaction. If the information provides evidence that there are no conflicts for a given transactional request from another core, the method prevents the occurrence of a false conflict by forcing the HTM to ignore the decision based on the signature. The method is very effective in reducing false conflicts and the associated unnecessary transaction stalls and aborts, and can be used to improve the performance of the multicore processor that implements the signature-based eager HTM. When running the STAMP benchmark on a 16-core processor that implements the LogTM-SE, the increased speed (decrease in execution time) achieved with the use of the method is 20.6% on average.

Performance Analysis of Shared Buffer Router Architecture for Low Power Applications

  • Deivakani, M.;Shanthi, D.
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.16 no.6
    • /
    • pp.736-744
    • /
    • 2016
  • Network on chip (NoC) is an emerging technology in the field of multi core interconnection architecture. The routers plays an essential components of Network on chip and responsible for packet delivery by selecting shortest path between source and destination. State-of-the-art NoC designs used routing table to find the shortest path and supports four ports for packet transfer, which consume high power consumption and degrades the system performance. In this paper, the multi port multi core router architecture is proposed to reduce the power consumption and increasing the throughput of the system. The shared buffer is employed between the multi ports of the router architecture. The performance of the proposed router is analyzed in terms of power and current consumption with conventional methods. The proposed system uses Modelsim software for simulation purposes and Xilinx Project Navigator for synthesis purposes. The proposed architecture consumes 31 mW on CPLD XC2C64A processor.

A Low Power Design of H.264 Codec Based on Hardware and Software Co-design

  • Park, Seong-Mo;Lee, Suk-Ho;Shin, Kyoung-Seon;Lee, Jae-Jin;Chung, Moo-Kyoung;Lee, Jun-Young;Eum, Nak-Woong
    • Information and Communications Magazine
    • /
    • v.25 no.12
    • /
    • pp.10-18
    • /
    • 2008
  • In this paper, we present a low-power design of H.264 codec based on dedicated hardware and software solution on EMP(ETRI Multi-core platform). The dedicated hardware scheme has reducing computation using motion estimation skip and reducing memory access for motion estimation. The design reduces data transfer load to 66% compared to conventional method. The gate count of H.264 encoder and the performance is about 455k and 43Mhz@30fps with D1(720x480) for H.264 encoder. The software solution is with ASIP(Application Specific Instruction Processor) that it is SIMD(Single Instruction Multiple Data), Dual Issue VLIW(Very Long Instruction Word) core, specified register file for SIMD, internal memory and data memory access for memory controller, 6 step pipeline, and 32 bits bus width. Performance and gate count is 400MHz@30fps with CIF(Common Intermediated format) and about 100k per core for H.264 decoder.

Design of Software and Hardware Modules for a TCP/IP Offload Engine with Separated Transmission and Reception Paths (송수신 분리형 TCP/IP Offload Engine을 위한 소프트웨어 및 하드웨어 모듈의 설계)

  • Jang Hank-Kok;Chung Sang-Hwa;Choi Young-In
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.33 no.9
    • /
    • pp.691-698
    • /
    • 2006
  • TCP/IP Offload Engine (TOE) is a technology that processes TCP/IP on a network adapter instead of a host CPU to reduce protocol processing overhead from the host CPU. There have been some approaches to implementing TOE: software TOE based on an embedded processor; hardware TOE based on ASIC implementation; and hybrid TOE in which software and hardware functions are combined. In this paper, we designed software modules and hardware modules for a hybrid TOE on an FPGA that had two processor cores. Software modules are based on the embedded Linux. Hardware modules are for data transmission (TX) and reception (RX). One core controls the TX path and the other controls the RX path of the Linux. This TX/RX path separation mechanism can reduce task switching overheads between processes and overcome poor performance of single embedded processor. Hardware modules deal with creating headers for outgoing packets, processing headers of incoming packets, and fetching or storing data from or to the host memory by DMA. These can make it possible to improve the performance of data transmission and reception. We proved performance of the TOE with separated transmission and reception paths by performing experiments with a TOE network adapter that was equipped with the FPGA having processor cores.

A Design of AES-based WiBro Security Processor (AES 기반 와이브로 보안 프로세서 설계)

  • Kim, Jong-Hwan;Shin, Kyung-Wook
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.44 no.7 s.361
    • /
    • pp.71-80
    • /
    • 2007
  • This paper describes an efficient hardware design of WiBro security processor (WBSec) supporting for the security sub-layer of WiBro wireless internet system. The WBSec processor, which is based on AES (Advanced Encryption Standard) block cipher algorithm, performs data oncryption/decryption, authentication/integrity, and key encryption/decryption for packet data protection of wireless network. It carries out the modes of ECB, CTR, CBC, CCM and key wrap/unwrap with two AES cores working in parallel. In order to achieve an area-efficient implementation, two design techniques are considered; First, round transformation block within AES core is designed using a shared structure for encryption/decryption. Secondly, SubByte/InvSubByte blocks that require the largest hardware in AES core are implemented using field transformation technique. It results that the gate count of WBSec is reduced by about 25% compared with conventional LUT (Look-Up Table)-based design. The WBSec processor designed in Verilog-HDL has about 22,350 gates, and the estimated throughput is about 16-Mbps at key wrap mode and maximum 213-Mbps at CCM mode, thus it can be used for hardware design of WiBro security system.

Design Space Exploration of Embedded Many-Core Processors for Real-Time Fire Feature Extraction (실시간 화재 특징 추출을 위한 임베디드 매니코어 프로세서의 디자인 공간 탐색)

  • Suh, Jun-Sang;Kang, Myeongsu;Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.18 no.10
    • /
    • pp.1-12
    • /
    • 2013
  • This paper explores design space of many-core processors for a fire feature extraction algorithm. This paper evaluates the impact of varying the number of cores and memory sizes for the many-core processor and identifies an optimal many-core processor in terms of performance, energy efficiency, and area efficiency. In this study, we utilized 90 samples with dimensions of $256{\times}256$ (60 samples containing fire and 30 samples containing non-fire) for experiments. Experimental results using six different many-core architectures (PEs=16, 64, 256, 1,024, 4,096, and 16,384) and the feature extraction algorithm of fire indicate that the highest area efficiency and energy efficiency are achieved at PEs=1,024 and 4,096, respectively, for all fire/non-fire containing movies. In addition, all the six many-core processors satisfy the real-time requirement of 30 frames-per-second (30 fps) for the algorithm.

Design and Verification of the Class-based Architecture Description Language (클래스-기반 아키텍처 기술 언어의 설계 및 검증)

  • Ko, Kwang-Man
    • Journal of Korea Multimedia Society
    • /
    • v.13 no.7
    • /
    • pp.1076-1087
    • /
    • 2010
  • Together with a new advent of embedded processor developed to support specific application area and it evolution, a new research of software development to support the embedded processor and its commercial challenge has been revitalized. Retargetability is typically achieved by providing target machine information, ADL, as input. The ADLs are used to specify processor and memory architectures and generate software toolkit including compiler, simulator, assembler, profiler, and debugger. The EXPRESSION ADL follows a mixed level approach-it can capture both the structure and behavior supporting a natural specification of the programmable architectures consisting of processor cores, coprocessors, and memories. And it was originally designed to capture processor/memory architectures and generate software toolkit to enable compiler-in-the-loop exploration of SoC architecture. In this paper, we designed the class-based ADL based on the EXPRESSION ADL to promote the write-ability, extensibility and verified the validation of grammar. For this works, we defined 6 core classes and generated the EXPRESSION's compiler and simulator through the MIPS R4000 description.

Design of RISC-based Transmission Wrapper Processor IP for TCP/IP Protocol Stack (TCP/IP프로토콜 스택을 위한 RISC 기반 송신 래퍼 프로세서 IP 설계)

  • 최병윤;장종욱
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.8 no.6
    • /
    • pp.1166-1174
    • /
    • 2004
  • In this paper, a design of RISC-based transmission wrapper processor for TCP/IP protocol stack is described. The processor consists of input and output buffer memory with dual bank structure, 32-bit RISC microprocessor core, DMA unit with on-the-fly checksum capability, and memory module. To handle the various modes of TCP/IP protocol, hardware-software codesign approach based on RISC processor is used rather than the conventional state machine design. To eliminate large delay time due to sequential executions of data transfer and checksum operation, DMA module which can execute the checksum operation along with data transfer operation is adopted. The designed processor exclusive of variable-size input/output buffer consists of about 23,700 gates and its maximum operating frequency is about 167MHz under 0.35${\mu}m$ CMOS technology.