Search | Korea Science

Code Size Reduction and Execution performance Improvement with Instruction Set Architecture Design based on Non-homogeneous Register Partition (코드감소와 성능향상을 위한 이질 레지스터 분할 및 명령어 구조 설계)

Kwon, Young-Jun;Lee, Hyuk-Jae
- The Transactions of the Korean Institute of Electrical Engineers A
- /
- v.48 no.12
- /
- pp.1575-1579
- /
- 1999
Embedded processors often accommodate two instruction sets, a standard instruction set and a compressed instruction set. With the compressed instruction set, code size can be reduced while instruction count (and consequently execution time) can be increased. To achieve code size reduction without significant increase of execution time, this paper proposes a new compressed instruction set architecture, called TOE (Two Operations Execution). The proposed instruction set format includes the parallel bit that indicates an instruction can be executed simultaneously with the next instruction. To add the parallel bit, TOE instruction format reduces the destination register field. The reduction of the register field limits the number of registers that are accessible by an instruction. To overcome the limited accessibility of registers, TOE adapts non-homogeneous register partition in which registers are divided into multiple subsets, each of which are accessed by different groups of instructions. With non-homogeneous registers, each instruction can access only a limited number of registers, but an entire program can access all available registers. With efficient non-homogeneous register allocator, all registers can be used in a balanced manner. As a result, the increase of code size due to register spills is negligible. Experimental results show that more than 30% of TOE instructions can be executed in parallel without significant increase of code size when compared to existing Thumb instruction set.
PDF

Parallel Branch Instruction Extension for Thumb-2 Instruction Set Architecture (Thumb-2 명령어 집합 구조의 병렬 분기 명령어 확장)

Kim, Dae-Hwan
- Journal of the Korea Society of Computer and Information
- /
- v.18 no.7
- /
- pp.1-10
- /
- 2013
In this paper, the parallel branch instruction is proposed which executes a branch instruction and the frequently used instruction simultaneously to improve the performance of Thumb-2 instruction set architecture. In the proposed approach, new 32-bit parallel branch instructions are introduced which combine 16-bit branch instruction with each of the frequently used 16-bit LOAD, ADD, MOV, STORE, and SUB instructions, respectively. To provide the encoding space of the new instructions, the register field in less frequently executed instructions is reduced, and the new instructions are encoded by using the saved bits. Experiments show that the proposed approach improves performance by an average of 8.0% when compared to the conventional approach.
https://doi.org/10.9708/jksci.2013.18.7.001 인용 PDF KSCI

Design of A On-Chip Caches for RISC Processors (RISC 프로세서 On-Chip Cache의 설계)

홍인식;임인칠
- Journal of the Korean Institute of Telematics and Electronics
- /
- v.27 no.8
- /
- pp.1201-1210
- /
- 1990
This paper proposes on-chip instruction and data cache memories on RISC reduced instruction set computer) architecture which supports fast instruction fetch and data read/write, and enables RISC processor under research to obtain high performance. In the execution of HLL(high level language) programs, heavily used local scalar variables are stored in large register file, but arrays, structures, and global scalar variables are difficult for compiler to allocate registers. These problems can be solved by on-chip Instruction/Data cache. And each cycle of instruction fetch, pad delay causes the lowering of the processors's performance. Cache memories are designed in CMOS technology and SRAM(static-RAM), that saves layout area and power dissipation, is used for instruction and data storage. To speed up and support RISC processor's piplined architecture efficiently, hardwired logic technology is used overall circuits i cache blocks. The schematic capture and timing simulation of proposed cache memorises are performed on Apollo DN4000 workstation using Mentor Graphics CAD tools.
PDF

A 32-bit Microprocessor with enhanced digital signal process functionality (디지털 신호처리 기능을 강화한 32비트 마이크로프로세서)

Moon, Sang-ook
- Proceedings of the Korean Institute of Information and Commucation Sciences Conference
- /
- v.9 no.2
- /
- pp.820-822
- /
- 2005
We have designed a 32-bit microprocessor with fixed point digital signal processing functionality. This processor, combines both general-purpose microprocessor and digital signal processor functionality using the reduced instruction set computer design principles. It has functional units for arithmetic operation, digital signal processing and memory access. They operate in parallel in order to remove stall cycles after DSP or load/store instructions, which usually need one or more issue latency cycles in addition to the first issue cycle. High performance was achieved with these parallel functional units while adopting a sophisticated five-stage pipeline stucture.
PDF

Benchmarking Korean Block Ciphers on 32-Bit RISC-V Processor (32-bit RISC-V 프로세서에서 국산 블록 암호 성능 밴치마킹)

Kwak, YuJin;Kim, YoungBeom;Seo, Seog Chung
- Journal of the Korea Institute of Information Security & Cryptology
- /
- v.31 no.3
- /
- pp.331-340
- /
- 2021
As the communication industry develops, the development of SoC (System on Chip) is increasing. Accordingly, the paradigm of technology design of industries and companies is changing. In the existing process, companies purchased micro-architecture, but now they purchase ISA (Instruction Set Architecture), and companies design the architecture themselves. RISC-V is an open instruction set based on a reduced instruction set computer. RISC-V is equipped with ISA, which can be expanded through modularization, and an expanded version of ISA is currently being developed through the support of global companies. In this paper, we present benchmarking frameworks ARIA, LEA, and PIPO of Korean block ciphers in RISC-V. We propose implementation methods and discuss performance by utilizing the basic instruction set and features of RISC-V.
https://doi.org/10.13089/JKIISC.2021.31.3.331 인용 PDF KSCI HTML

The Design of A Program Counter Unit for RISC Processors (RISC 프로세서의 프로그램 카운터 부(PCU)의 설계)

홍인식;임인칠
- Journal of the Korean Institute of Telematics and Electronics
- /
- v.27 no.7
- /
- pp.1015-1024
- /
- 1990
This paper proposes a program counter unit(PCU) on the pipelined architecture of RISC (Reduced Instruction Set Computer) type high performance processors, PCU is used for supplying instruction addresses to memory units(Instruction Cache) efficiently. A RISC processor's PCU has to compute the instruction address within required intervals continnously. So, using the method of self-generated incrementor, is more efficient than the conventional one's using ALU or private adder. The proposed PCU is designed to have the fast +4(Byte Address) operation incrementor that has no carry propagation delay. Design specifications are taken by analyzing the whole data path operation of target processor's default and exceptional mode instructions. CMOS and wired logic circuit technologic are used in PCU for the fast operation which has small layout area and power dissipation. The schematic capture and logic, timing simulation of proposed PCU are performed on Apollo W/S using Mentor Graphics CAD tooks.
PDF

Selecting a Synthesizable RISC-V Processor Core for Low-cost Hardware Devices

Gookyi, Dennis Agyemanh Nana;Ryoo, Kwangki
- Journal of Information Processing Systems
- /
- v.15 no.6
- /
- pp.1406-1421
- /
- 2019
The Internet-of-Things (IoT) has been deployed in almost every facet of our day to day activities. This is made possible because sensing and data collection devices have been given computing and communication capabilities. The devices implement System-on-Chips (SoCs) that incorporate a lot of functionalities, yet they are severely constrained in terms of memory capacitance, hardware area, and power consumption. With the increase in the functionalities of sensing devices, there is a need for low-cost synthesizable processors to handle control, interfacing, and error processing. The first step in selecting a synthesizable processor core for low-cost devices is to examine the hardware resource utilization to make sure that it fulfills the requirements of the device. This paper gives an analysis of the hardware resource usage of ten synthesizable processors that implement the Reduced Instruction Set Computer Five (RISC-V) Instruction Set Architecture (ISA). All the ten processors are synthesized using Vivado v2018.02. The maximum frequency, area, and power reports are extracted and a comparison is made to determine which processor is ideal for low-cost hardware devices.
https://doi.org/10.3745/JIPS.03.0129 인용 PDF KSCI

Design of Architecture of Programmable Stack-based Video Processor with VHDL (VHDL을 이용한 프로그램 가능한 스택 기반 영상 프로세서 구조 설계)

박주현;김영민
- Journal of the Korean Institute of Telematics and Electronics C
- /
- v.36C no.4
- /
- pp.31-43
- /
- 1999
The main goal of this paper is to design a high performance SVP(Stack based Video Processor) for network applications. The SVP is a comprehensive scheme; 'better' in the sense that it is an optimal selection of previously proposed enhancements of a stack machine and a video processor. This can process effectively object-based video data using a S-RISC(Stack-based Reduced Instruction Set Computer) with a semi -general-purpose architecture having a stack buffer for OOP(Object-Oriented Programming) with many small procedures at running programs. And it includes a vector processor that can improve the MPEG coding speed. The vector processor in the SVP can execute advanced mode motion compensation, motion prediction by half pixel and SA-DCT(Shape Adaptive-Discrete Cosine Transform) of MPEG-4. Absolutors and halfers in the vector processor make this architecture extensive to a encoder. We also designed a VLSI stack-oriented video processor using the proposed architecture of stack-oriented video decoding. It was designed with O.5$\mu\textrm{m}$ 3LM standard-cell technology, and has 110K logic gates and 12 Kbits SRAM internal buffer. The operating frequency is 50MHz. This executes algorithms of video decoding for QCIF 15fps(frame per second), maximum rate of VLBV(Very Low Bitrate Video) in MPEG-4.
PDF

Multicore Flow Processor with Wire-Speed Flow Admission Control

Doo, Kyeong-Hwan;Yoon, Bin-Yeong;Lee, Bhum-Cheol;Lee, Soon-Seok;Han, Man Soo;Kim, Whan-Woo
- ETRI Journal
- /
- v.34 no.6
- /
- pp.827-837
- /
- 2012
We propose a flow admission control (FAC) for setting up a wire-speed connection for new flows based on their negotiated bandwidth. It also terminates a flow that does not have a packet transmitted within a certain period determined by the users. The FAC can be used to provide a reliable transmission of user datagram and transmission control protocol applications. If the period of flows can be set to a short time period, we can monitor active flows that carry a packet over networks during the flow period. Such powerful flow management can also be applied to security systems to detect a denial-of-service attack. We implement a network processor called a flow management network processor (FMNP), which is the second generation of the device that supports FAC. It has forty reduced instruction set computer core processors optimized for packet processing. It is fabricated in 65-nm CMOS technology and has a 40-Gbps process performance. We prove that a flow router equipped with an FMNP is better than legacy systems in terms of throughput and packet loss.
https://doi.org/10.4218/etrij.12.1812.0046 인용 PDF KSCI

低電力 MCU core의 設計에 對해

An, Hyeong-Geun;Jeong, Bong-Yeong;No, Hyeong-Rae
- The Magazine of the IEIE
- /
- v.25 no.5
- /
- pp.31-41
- /
- 1998
With the advent of portable electronic systems, power consumption has recently become a major issue in circuit and system design. Furthermore, the sophisticated fabrication technology makes it possible to embed more functions and features in a VLSI chip, consequently calling for both higher performance and lower power to deal with the ever growing complexity of system algorithms than in the past. VLSI designers should cope with two conflicting constraints, high performance and low power, offering an optimum trade off of these constraints to meet requirements of system. Historically, VLSI designers have focused on performance improvement, and power dissipation was not a design criteria but an afterthought. This design paradigm should be changed, as power is emerging as the most critical design constraint. In VLSI design, low power design can be accomplished through many ways, for instance, process, circuit/logic design, architectural design, and etc.. In this paper, a few low power design examples, which have been used in 8 bit micro-controller core, and can be used also in 4/16/32 bit micro-controller cores, are presented in the areas of circuit, logic and architectural design. We first propose a low power guidelines for micro-controller design in SAMSUNG, and more detailed design examples are followed applying 4 specific design guidelines. The 1st example shows the power reduction through reduction of number of state clocks per instruction. The 2nd example realized the power reduction by applying RISC(Reduced Instruction Set Computer) concept. The 3rd example is to optimize the algorithm for ALU(Arithmetic Logic Unit) to lower the power consumption, Lastly, circuit cells designed for low power are described.
PDF

Search Result 22, Processing Time 0.021 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)