• 제목/요약/키워드: Processor Architecture

검색결과 735건 처리시간 0.024초

Easily Adaptable On-Chip Debug Architecture for Multicore Processors

  • Xu, Jing-Zhe;Park, Hyeongbae;Jung, Seungpyo;Park, Ju Sung
    • ETRI Journal
    • /
    • 제35권2호
    • /
    • pp.301-310
    • /
    • 2013
  • Nowadays, the multicore processor is watched with interest by people all over the world. As the design technology of system on chip has developed, observing and controlling the processor core's internal state has not been easy. Therefore, multicore processor debugging is very difficult and time-consuming. Thus, we need a reliable and efficient debugger to find the bugs. In this paper, we propose an on-chip debug architecture for multicore processors that is easily adaptable and flexible. It is based on the JTAG standard and supports monitoring mode debugging, which is different from run-stop mode debugging. Compared with the debug architecture that supports the run-stop mode debugging, the proposed architecture is easily applied to a debugger and has the advantage of having a desirable gate count and execution cycle. To verify the on-chip debug architecture, it is applied to the debugger of the prototype multicore processor and is tested by interconnecting it with a software debugger based on GDB and configured for the target processor.

On-Chip Debug Architecture for Multicore Processor

  • Park, Hyeong-Bae;Xu, Jing-Zhe;Kim, Kil-Hyun;Park, Ju-Sung
    • ETRI Journal
    • /
    • 제34권1호
    • /
    • pp.44-54
    • /
    • 2012
  • Because of the intrinsic lack of internal-system observability and controllability in highly integrated multicore processors, very restricted access is allowed for the debugging of erroneous chip behavior. Therefore, the building of an efficient debug function is an important consideration in the design of multicore processors. In this paper, we propose a flexible on-chip debug architecture that embeds a special logic supporting the debug functionality in the multicore processor. It is designed to support run-stop-type debug functions that can halt and control the execution of the multicore processor at breakpoint events and inspect the possible causes of any errors. The debug architecture consists of the following three functional components: the core debug support block, the multicore debug support block, and the debug interface and control block. By embedding this debug infrastructure, the embedded processor cores within the multicore processor can be debugged simultaneously as well as independently. The debug control is performed by employing a JTAG-based scanning operation. We apply this on-chip debug architecture to build a debugger for a prototype multicore processor and demonstrate the validity and scalability of our approach.

멀티코어 순차 수퍼스칼라 프로세서의 성능 연구 (Performance Study of Multi-core In-Order Superscalar Processor Architecture)

  • 이종복
    • 한국인터넷방송통신학회논문지
    • /
    • 제12권5호
    • /
    • pp.123-128
    • /
    • 2012
  • 최근에 이르러 디지털 시스템의 성능을 극대화하기 위하여, 멀티코어 프로세서가 상용화 되어 널리 이용되고 있다. 이러한 멀티코어 프로세서를 구성하는 단위 코어의 성능을 높이면, 적은 개수의 코어를 가지고 시스템의 성능을 크게 향상시킬 수가 있다. 본 논문에서는 순차실행 방식의 수퍼스칼라를 단위 코어로 하는 멀티코어 프로세서 아키텍쳐를 제안하였다. 그리고, 윈도우 크기가 4에서 16이고 2-코어에서 16-코어로 구성되는 멀티코어 수퍼스칼라 프로세서에 대하여, SPEC 2000 벤치마크를 입력으로 하는 광범위한 모의실험을 수행하였다. 모의실험 결과, 윈도우의 크기가 16일 때 16-코어 수퍼스칼라 프로세서는 1-코어 수퍼스칼라 프로세서보다 8.4배의 성능 향상을 가져왔다. 또한, 같은 코어 개수를 가진 멀티 코어 수퍼스칼라 프로세서의 성능이 멀티코어 RISC 프로세서의 성능의 2 배를 기록하였다.

멀티미디어 프로세서 아키텍쳐에 관한 연구 (A Study on Multimedia Processor Architecture)

  • 박춘명;이택근
    • 대한전자공학회:학술대회논문집
    • /
    • 대한전자공학회 2005년도 추계종합학술대회
    • /
    • pp.1177-1180
    • /
    • 2005
  • This paper present a method of constructing the multimedia processor architecture. The proposed multimedia processor architecture be able to handle each text, sound, and video in one chip. Also it have interactive function that is a characteristics of multimedia. Specially, the proposed multimedia processor be able to addressing nodes in memory map without software, and it is completely reconfigurable depend on data. Also it as able to process time and space common that have synchronous/asynchronous and it is able to protect continuous and dynamic media bus collision, and local and overall common memory structure. The proposed multimedia processor architecture apply to virtual reality and mixed reality.

  • PDF

유비쿼터스 센서 노드를 위한 저전력 프로세서의 개발 (Design of Ultra Low Power Processor for Ubiquitous Sensor Node)

  • 신치훈;오명훈;박경;김성운
    • 대한전기학회:학술대회논문집
    • /
    • 대한전기학회 2006년도 심포지엄 논문집 정보 및 제어부문
    • /
    • pp.165-167
    • /
    • 2006
  • In this paper we present a new-generation sensor network processor which is not optimized in circuit level, but in system architecture level. The new design build on a conventional processor architecture, improving the design by focusing on application oriented specification, ISA, and micro-architectural optimization that reduce overall design size and advance energy-per-instruction. The design employs harvard architecture, 8-bit data paths, and an compact 19 bit wide RISC ISA. The design also features a unique interrupt handler which offloads periodical monitoring jobs from the main part of CPU. Our most efficient design is capable of running at 300 KHz (0.3 MIPS) while consuming only about few pJ/instruction.

  • PDF

Application에 최적의 ASIP 설계를 위한 효율적인 Architecture Exploration 방법 (An Efficient Architecture Exploration Method for Optimal ASIP Design)

  • 이성래;황선영
    • 한국통신학회논문지
    • /
    • 제32권9C호
    • /
    • pp.913-921
    • /
    • 2007
  • 프로세서에 따라 수행 가능한 코드를 생성하는 retargetable 컴파일러와 성능 프로파일러는 어플리케이션에 최적화된 프로세서 디자인에 있어 필수적이다. 본 논문은 ADL (Architecture Description Language)에 기반한 architecture exploration 방법을 제시한다. 어플리케이션 프로그램에서 얻어낸 정보로부터 인스트럭션 합성과 프로세서 구조를 최적화 하였다. 어플리케이션에서 많이 사용되는 연산과 레지스터 사용에 대한 정보는 프로세서 최적화를 위해 사용되었다. 시스템의 효용성을 보이기 위해 JPEG 인코더에 대한 architecture exploration을 수행하였다. 제안된 방법을 사용해 설계된 ASIP은 초기 프로세서에 비해 약 1.97배의 성능을 가지는 것으로 측정되었다.

Homogeneous Transformation Matrix의 곱셈을 위한 병렬구조 프로세서의 설계 (A Parallel-Architecture Processor Design for the Fast Multiplication of Homogeneous Transformation Matrices)

  • 권두올;정태상
    • 대한전기학회논문지:시스템및제어부문D
    • /
    • 제54권12호
    • /
    • pp.723-731
    • /
    • 2005
  • The $4{\times}4$ homogeneous transformation matrix is a compact representation of orientation and position of an object in robotics and computer graphics. A coordinate transformation is accomplished through the successive multiplications of homogeneous matrices, each of which represents the orientation and position of each corresponding link. Thus, for real time control applications in robotics or animation in computer graphics, the fast multiplication of homogeneous matrices is quite demanding. In this paper, a parallel-architecture vector processor is designed for this purpose. The processor has several key features. For the accuracy of computation for real application, the operands of the processors are floating point numbers based on the IEEE Standard 754. For the parallelism and reduction of hardware redundancy, the processor takes column vectors of homogeneous matrices as multiplication unit. To further improve the throughput, the processor structure and its control is based on a pipe-lined structure. Since the designed processor can be used as a special purpose coprocessor in robotics and computer graphics, additionally to special matrix/matrix or matrix/vector multiplication, several other useful instructions for various transformation algorithms are included for wide application of the new design. The suggested instruction set will serve as standard in future processor design for Robotics and Computer Graphics. The design is verified using FPGA implementation. Also a comparative performance improvement of the proposed design is studied compared to a uni-processor approach for possibilities of its real time application.

RISC-V 아키텍처 기반 6단계 파이프라인 RV32I프로세서의 설계 및 구현 (Design and Implementation of a Six-Stage Pipeline RV32I Processor Based on RISC-V Architecture)

  • 민경진;최서진;황유빈;김선희
    • 반도체디스플레이기술학회지
    • /
    • 제23권2호
    • /
    • pp.76-81
    • /
    • 2024
  • UC Berkeley developed RISC-V, which is an open-source Instruction Set Architecture. This paper proposes a 32-bit 6-stage pipeline architecture based on the RV32I RSIC-V. The performance of the proposed 6-stage pipeline architecture is compared with the existing 32-bit 5-stage pipeline architecture also based on the RV32I processor ISA to determine the impact of the number of pipeline stages on performance. The RISC-V processor is designed in Verilog-HDL and implemented using Quartus Prime 20.1. To compare performance the Dhrystone benchmark is used. Subsequently, peripherals such as GPIO, TIMER, and UART are connected to verify operation through an FPGA. The maximum clock frequency for the 5-stage pipeline processor is 42.02 MHz, while for the 6-stage pipeline processor, it was 49.9MHz, representing an 18.75% increase.

  • PDF

휴대 단말기용 3D Graphics Lighting Processor 설계 (A Design of 3D Graphics Lighting Processor for Mobile Applications)

  • 양준석;김기철
    • 대한전자공학회:학술대회논문집
    • /
    • 대한전자공학회 2005년도 추계종합학술대회
    • /
    • pp.837-840
    • /
    • 2005
  • This paper presents 3D graphics lighting processor based on vector processing using pipeline chaining. The lighting process of 3D graphics rendering contains many arithmetic operations and its complexity is very high. For high throughput, proposed processor uses pipelined functional units. To implement fully pipelined architecture, we have to use many functional units. Hence, the number of functional units is restricted. However, with the restricted number of pipelined functional units, the utilization of the units is reduced and a resource reservation problem is caused. To resolve these problems, the proposed architecture uses vector processing using pipeline chaining. Due to its pipeline chaining based architecture, it can perform 4.09M vertices per 1 second with 100MHz frequency. The proposed 3D graphics lighting processor is compatible with OpenGL ES API and the design is implemented and verified on FPGA.

  • PDF

SPEC 벤치마크 프로그램에 대한 매니코어 프로세서의 성능 연구 (A Performance Study on Many-core Processor Architectures with SPEC Benchmark Programs)

  • 이종복
    • 전기학회논문지
    • /
    • 제62권2호
    • /
    • pp.252-256
    • /
    • 2013
  • In order to overcome the complexity and performance limit problems of superscalar processors, the multi-core architecture has been prevalent recently. Usually, the number of cores mostly used for the multi-core processor architecture ranges from 2 to 16. However in the near future, more than 32-cores are likely to be utilized, which is called as many-core processor architecture. Using SPEC 2000 benchmarks as input, the trace-driven simulation has been performed for the 32 to 1024 many-core architectures extensively. For 1024-cores, the average performance scores 15.7 IPC, but the performance increase rate is saturated.