• Title/Summary/Keyword: Instruction code

Search Result 145, Processing Time 0.021 seconds

Code Size Reduction and Execution performance Improvement with Instruction Set Architecture Design based on Non-homogeneous Register Partition (코드감소와 성능향상을 위한 이질 레지스터 분할 및 명령어 구조 설계)

  • Kwon, Young-Jun;Lee, Hyuk-Jae
    • The Transactions of the Korean Institute of Electrical Engineers A
    • /
    • v.48 no.12
    • /
    • pp.1575-1579
    • /
    • 1999
  • Embedded processors often accommodate two instruction sets, a standard instruction set and a compressed instruction set. With the compressed instruction set, code size can be reduced while instruction count (and consequently execution time) can be increased. To achieve code size reduction without significant increase of execution time, this paper proposes a new compressed instruction set architecture, called TOE (Two Operations Execution). The proposed instruction set format includes the parallel bit that indicates an instruction can be executed simultaneously with the next instruction. To add the parallel bit, TOE instruction format reduces the destination register field. The reduction of the register field limits the number of registers that are accessible by an instruction. To overcome the limited accessibility of registers, TOE adapts non-homogeneous register partition in which registers are divided into multiple subsets, each of which are accessed by different groups of instructions. With non-homogeneous registers, each instruction can access only a limited number of registers, but an entire program can access all available registers. With efficient non-homogeneous register allocator, all registers can be used in a balanced manner. As a result, the increase of code size due to register spills is negligible. Experimental results show that more than 30% of TOE instructions can be executed in parallel without significant increase of code size when compared to existing Thumb instruction set.

  • PDF

Cloudification of On-Chip Flash Memory for Reconfigurable IoTs using Connected-Instruction Execution (연결기반 명령어 실행을 이용한 재구성 가능한 IoT를 위한 온칩 플래쉬 메모리의 클라우드화)

  • Lee, Dongkyu;Cho, Jeonghun;Park, Daejin
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.14 no.2
    • /
    • pp.103-111
    • /
    • 2019
  • The IoT-driven large-scaled systems consist of connected things with on-chip executable embedded software. These light-weighted embedded things have limited hardware space, especially small size of on-chip flash memory. In addition, on-chip embedded software in flash memory is not easy to update in runtime to equip with latest services in IoT-driven applications. It is becoming important to develop light-weighted IoT devices with various software in the limited on-chip flash memory. The remote instruction execution in cloud via IoT connectivity enables to provide high performance software execution with unlimited software instruction in cloud and low-power streaming of instruction execution in IoT edge devices. In this paper, we propose a Cloud-IoT asymmetric structure for providing high performance instruction execution in cloud, still low power code executable thing in light-weighted IoT edge environment using remote instruction execution. We propose a simulated approach to determine efficient partitioning of software runtime in cloud and IoT edge. We evaluated the instruction cloudification using remote instruction by determining the execution time by the proposed structure. The cloud-connected instruction set simulator is newly introduced to emulate the behavior of the processor. Experimental results of the cloud-IoT connected software execution using remote instruction showed the feasibility of cloudification of on-chip code flash memory. The simulation environment for cloud-connected code execution successfully emulates architectural operations of on-chip flash memory in cloud so that the various software services in IoT can be accelerated and performed in low-power by cloudification of remote instruction execution. The execution time of the program is reduced by 50% and the memory space is reduced by 24% when the cloud-connected code execution is used.

Low-Power IoT Microcontroller Code Memory Interface using Binary Code Inversion Technique Based on Hot-Spot Access Region Detection (핫스팟 접근영역 인식에 기반한 바이너리 코드 역전 기법을 사용한 저전력 IoT MCU 코드 메모리 인터페이스 구조 연구)

  • Park, Daejin
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.11 no.2
    • /
    • pp.97-105
    • /
    • 2016
  • Microcontrollers (MCUs) for endpoint smart sensor devices of internet-of-thing (IoT) are being implemented as system-on-chip (SoC) with on-chip instruction flash memory, in which user firmware is embedded. MCUs directly fetch binary code-based instructions through bit-line sense amplifier (S/A) integrated with on-chip flash memory. The S/A compares bit cell current with reference current to identify which data are programmed. The S/A in reading '0' (erased) cell data consumes a large sink current, which is greater than off-current for '1' (programmed) cell data. The main motivation of our approach is to reduce the number of accesses of erased cells by binary code level transformation. This paper proposes a built-in write/read path architecture using binary code inversion method based on hot-spot region detection of instruction code access to reduce sensing current in S/A. From the profiling result of instruction access patterns, hot-spot region of an original compiled binary code is conditionally inverted with the proposed bit-inversion techniques. The de-inversion hardware only consumes small logic current instead of analog sink current in S/A and it is integrated with the conventional S/A to restore original binary instructions. The proposed techniques are applied to the fully-custom designed MCU with ARM Cortex-M0$^{TM}$ using 0.18um Magnachip Flash-embedded CMOS process and the benefits in terms of power consumption reduction are evaluated for Dhrystone$^{TM}$ benchmark. The profiling environment of instruction code executions is implemented by extending commercial ARM KEIL$^{TM}$ MDK (MCU Development Kit) with our custom-designed access analyzer.

A study on the Data Anti-Collision of RFID system (무선인식시스템의 데이터 충돌방지에 관한 연구)

  • 강민수;신석균;백선기;박면규;곽칠성;이기서
    • Proceedings of the KSR Conference
    • /
    • 2002.10a
    • /
    • pp.471-477
    • /
    • 2002
  • In this paper, it is proposed instruction code satisfied algorithm that is able to prevent data collision when transponder access in the area of recognition system that is operated to single channel. Differ from absolute collision is used to in the time domain procedure, instruction code satisfied algorithm, transmits data which don't generate collision and must satisfy instruction code. So, to prevent data collision, transponder is of great if it read the entire instruction code. Consequently, it is applied to algorithm that made the system of wireless recognition13.56 Mhz. As a result, for the time of one bit data transmission had 14${\mu}$s difference, it is proved the prevention of data loss in experiment.

  • PDF

Register Pressure Aware Code Selection Algorithm for Multi-Output Instructions (Register Pressure를 고려한 다중 출력 명령어를 위한 개선된 코드 생성 방법)

  • Youn, Jong-Hee M.;Paek, Yun-Heung;Ko, Kwang-Man
    • The KIPS Transactions:PartA
    • /
    • v.19A no.1
    • /
    • pp.45-50
    • /
    • 2012
  • The demand for faster execution time and lower energy consumption has compelled architects of embedded processors to customize it to the needs of their target applications. These processors consequently provide a rich set of specialized instructions in order to enable programmers to access these features. Such an instruction is typically a $multi$-$output$ $instruction$ (MOI), which outputs multiple results parallely in order to exploit inherent underlying hardware parallelism. Earlier study has exhibited that MOIs help to enhance performance in aspect of instruction counts and code size. However the earlier algorithm does not consider the register pressure. So, some selected MOIs introduce register spill/reload code that increases the code size and instruction count. To attack this problem, we introduce a novel iterated instruction selection algorithm based on the register pressure of each selected MOIs. The experimental results show the suggested algorithm achieves 3% code-size reduction and 2.7% speed-up on average.

Resuable Design of 32-Bit RISC Processor for System On-A Chip (SOC 설계를 위한 저전력 32-비트 RISC 프로세서의 재사용 가능한 설계)

  • 이세환;곽승호;양훈모;이문기
    • Proceedings of the IEEK Conference
    • /
    • 2001.06b
    • /
    • pp.105-108
    • /
    • 2001
  • 4 32-bit RISC core is designed for embedded application and DSP. This processor offers low power consumption by fully static operation and compact code size by efficient instruction set. Processor performance is improved by wing conditional instruction execution, block data transfer instruction, multiplication instruction, bunked register file structure. To support compact code size of embedded application, It is capable cf executing both 16-bit instructions and 32-bit instruction through mixed mode instruction conversion Furthermore, for fast MAC operation for DSP applications, the processor has a dedicated hardware multiplier, which can complete a 32-bit by 32-bit integer multiplication within seven clock cycles. These result in high instruction throughput and real-time interrupt response. This chip is implemented with 0.35${\mu}{\textrm}{m}$, 4- metal CMOS technology and consists of about 50K gate equivalents.

  • PDF

Study on LLVM application in Parallel Computing System (병렬 컴퓨팅 시스템에서 LLVM 응용 연구)

  • Cho, Jungseok;Cho, Doosan;Kim, Yongyeon
    • The Journal of the Convergence on Culture Technology
    • /
    • v.5 no.1
    • /
    • pp.395-399
    • /
    • 2019
  • In order to support various parallel computing systems, it is necessary to extend LLVM IR to more efficiently support vector / matrix and to design LLVM IR to machine code as a new algorithm. As shown in the IR example, RISC instruction generation is naturally generated because the RISC instruction is basically composed of the RISC instruction, and the vector instruction is also not supported. There is a need for new IR structures, command generation algorithms and related extensions to support vector / matrix more robustly. To do this, it is important to map each instruction in the LLVM IR to the appropriate instruction in the target architecture (vector / matrix) (instruction selection algorithm). It is necessary to understand the meaning of LLVM IR command, to compare the meaning of each instruction of the target architecture with syntax, and to select the instruction that matches the pattern to make mapping efficient.

A Design of an Embedded Microprocessor with Variable Length Instruction Mode (가변길이 명령어 모드를 갖는 Embedded Microprocessor의 설계)

  • 박기현;오민석;이광엽;한진호;김영수;배영환;조한진
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.41 no.4
    • /
    • pp.83-90
    • /
    • 2004
  • In this paper, we proposed a new instruction set(X32Y ISA) with 3 different types of instruction mode. The proposed instruction set organizes 32-bit, 24-bit, 16-bit instruction in order to solves a problem of memory size limitation in an embedded microprocessor. We designed a 32-bit 5 stage pipeline RISC microprocessor based on the X32V ISA. To verify the proposed the X32V ISA and a microprocessor, we estimated a program code size of multimedia application programs using a X32V simulator. In result, we verified that the Light mode and the Ultra Light mode obtains 8%, 27% reduction of a program code size through comparison with the Default mode. The proposed microprocessor was verified all X32V instructions execution at Xilinx FPGA with 33MHz operating frequency,

The Compressed Instruction Set Architecture for the OpenRISC Processor (OpenRISC 프로세서를 위한 압축 명령어 집합 구조)

  • Kim, Dae-Hwan
    • Journal of the Korea Society of Computer and Information
    • /
    • v.17 no.10
    • /
    • pp.11-23
    • /
    • 2012
  • To achieve efficient code size reduction, this paper proposes a new compressed instruction set architecture for the OpenRISC architecture. The new instructions and their corresponding formats are designed by the profiling information of the existing instruction usage. New 16-bit instructions and 32-bit instructions are proposed to compressed the existing 32-bit instructions and instruction sequences, respectively. The proposed instructions can be classified into three types. The first is the new 16-bit instructions for the frequent normal 32-bit instructions such as add, load, store, branch, and jump instructions. The second type is the new 32-bit instructions for the consecutive two load instructions, two store instructions, and 32-bit data mov instructions. Finally, two new 32-bit instructions are proposed to compress function prolog and epilog code, respectively. OpenRISC hardware decoder is extended to support the new instructions. Experiments show that the efficiency of code size reduction improves by an average of 30.4% when compared to the OR1200 instruction set architecture without loss of execution performance.

A Study on software performance acceleration for improving real time constraint of a VLIW type Drone FCC (VLIW (Very Long Instruction Word) 형식 드론 FCC(Flight Control Computer)의 실시간성 개선을 위한 소프트웨어 성능 가속화 연구)

  • Cho, Doo-San
    • Journal of the Korean Society of Industry Convergence
    • /
    • v.20 no.1
    • /
    • pp.1-7
    • /
    • 2017
  • Most conventional processors execute program instructions in a sequential manner. On the other hand, VLIW processor can execute multiple instructions at the same time. It exploits instruction level parallelism to improve system performance. To that end, program code should be rearranged to VLIW instruction format by a compiler. The compiler determine an optimal execution order of instructions of a program code. This instruction ordering is also called instruction scheduling. The scheduling is an algorithm that decides the execution order for instruction codes in loop parts of a program so that the instruction level parallelism can be maximized. In this research, we apply an existing scheduling algorithm to a VLIW FCC and describe analysis results to further improve its performance. And, we present a solution to solve some limitation of the existing scheduling technique. By using our solution, FCC's performance can be improved upto 32% compared to the existing scheduling only setting.