• Title/Summary/Keyword: Compiler based technique

Search Result 28, Processing Time 0.032 seconds

DEX2C: Translation of Dalvik Bytecodes into C Code and its Interface in a Dalvik VM

  • Kim, Minseong;Han, Youngsun;Cho, Myeongjin;Park, Chanhyun;Kim, Seon Wook
    • IEIE Transactions on Smart Processing and Computing
    • /
    • v.4 no.3
    • /
    • pp.169-172
    • /
    • 2015
  • Dalvik is a virtual machine (VM) that is designed to run Java-based Android applications. A trace-based just-in-time (JIT) compilation technique is currently employed to improve performance of the Dalvik VM. However, due to runtime compilation overhead, the trace-based JIT compiler provides only a few simple optimizations. Moreover, because each trace contains only a few instructions, the trace-based JIT compiler inherently exploits fewer optimization and parallelization opportunities than a method-based JIT compiler that compiles method-by-method. So we propose a new method-based JIT compiler, named DEX2C, in order to improve performance by finding more opportunities for both optimization and parallelization in Android applications. We employ C code as an intermediate product in order to find more optimization opportunities by using the GNU C Compiler (GCC), and we will detect parallelism by using the Intel C/C++ parallel compiler and the AESOP compiler in our future work. In this paper, we introduce our DEX2C compiler, which dynamically translates Dalvik bytecodes (DEX) into C code with method granularity. We also describe a new method-based JIT interface in the Dalvik VM for the DEX2C compiler. Our experiment results show that our compiler and its interface achieve significant performance improvement by up to 15.2 times and 3.7 times on average, in Element Benchmark, and up to 2.8 times for FFT in Smartbench.

Compiler triggered C level error check (컴파일러에 의한 C레벨 에러 체크)

  • Zheng, Zhiwen;Youn, Jong-Hee M.;Lee, Jong-Won;Paek, Yun-Heung
    • The KIPS Transactions:PartA
    • /
    • v.18A no.3
    • /
    • pp.109-114
    • /
    • 2011
  • We describe a technique for automatically proving compiler optimizations sound, meaning that their transformations are always semantics-preserving. As is well known, IR (Intermediate Representation) optimization is an important step in a compiler backend. But unfortunately, it is difficult to detect and debug the IR optimization errors for compiler developers. So, we introduce a C level error check system for detecting the correctness of these IR transformation techniques. In our system, we first create an IR-to-C converter to translate IR to C code before and after each compiler optimization phase, respectively, since our technique is based on the Memory Comparison-based Clone(MeCC) detector which is a tool of detecting semantic equivalency in C level. MeCC accepts only C codes as its input and it uses a path-sensitive semantic-based static analyzer to estimate the memory states at exit point of each procedure, and compares memory states to determine whether the procedures are equal or not. But MeCC cannot guarantee two semantic-equivalency codes always have 100% similarity or two codes with different semantics does not get the result of 100% similarity. To increase the reliability of the results, we describe a technique which comprises how to generate C codes in IR-to-C transformation phase and how to send the optimization information to MeCC to avoid the occurrence of these unexpected problems. Our methodology is illustrated by three familiar optimizations, dead code elimination, instruction scheduling and common sub-expression elimination and our experimental results show that the C level error check system is highly reliable.

A Rule-based Optimal Placement of Scaling Shifts in Floating-point to Fixed-point Conversion for a Fixed-point Processor

  • Park, Sang-Hyun;Cho, Doo-San;Kim, Tae-Song;Paek, Yun-Heung
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.6 no.4
    • /
    • pp.234-239
    • /
    • 2006
  • In the past decade, several tools have been developed to automate the floating-point to fixed-point conversion for DSP systems. In the conversion process, a number of scaling shifts are introduced, and they inevitably alter the original code sequence. Recently, we have observed that a compiler can often be adversely affected by this alteration, and consequently fails to generate efficient machine code for its target processor. In this paper, we present an optimization technique that safely migrates scaling shifts to other places within the code so that the compiler can produce better-quality code. We consider our technique to be safe in that it does not introduce new overflows, yet preserving the original SQNR. The experiments on a commercial fixed-point DSP processor exhibit that our technique is effective enough to achieve tangible improvement on code size and speed for a set of benchmarks.

Generalized Command Mode Finite Element Method Toolbox in CEMTool

  • Ahn, Choon-Ki;Kwon, Wook-Hyun
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 2003.10a
    • /
    • pp.1349-1353
    • /
    • 2003
  • CEMTool is a command style design and analyzing package for scientific and technological algorithm and a matrix based computation language. In this paper, we present a compiler based approach to the implementation of the command mode generalized PDE solver in CEMTool. In contrast to the existing MATLAB PDE Toolbox, our proposed FEM package can deal with the combination of the reserved words such as "laplace" and "convect". Also, we can assign the border lines and the boundary conditions in a very easy way. With the introduction of the lexical analyzer and the parser, our FEM toolbox can handle the general boundary condition and the various PDEs represented by the combination of equations. That is why we need not classify PDE as elliptic, hyperbolic, parabolic equations. Consequently, with our new FEM toolbox, we can overcome some disadvantages of the existing MATLAB PDE Toolbox.

  • PDF

Efficient Use of On-chip Memory through Profile-Driven Array Reorganization

  • Cho, Doosan;Youn, Jonghee
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.6 no.6
    • /
    • pp.345-359
    • /
    • 2011
  • In high performance embedded systems, the use of multiple on-chip memories is an essential architectural feature for exploiting inherent parallelism in multimedia applications. This feature allows multiple data accesses to be executed in parallel. However, it remains difficult to effectively exploit of multiple on-chip memories. The successful use of this architecture strongly depends on how to efficiently detect and exploit memory parallelism in target applications. In this paper, we propose a technique based on a linear array access descriptor [1], which is generated from profiled data, to detect and exploit memory parallelism. The proposed technique tackles an array reorganization problem to maximize memory parallelism in multimedia applications. We present preliminary experiments applying the proposed technique onto a representative coarse grained reconfigurable array processor (CGRA) with multimedia kernel codes. Our experimental results demonstrate that our technique optimizes data placement by putting independent data on separate storage. The results exhibit 9.8% higher performance on average compared to the existing method.

A Technique to Apply Inlining for Code Obfuscation based on Genetic Algorithm (유전 알고리즘에 기반한 코드 난독화를 위한 인라인 적용 기법)

  • Kim, Jung-Il;Lee, Eun-Joo
    • Journal of Information Technology Services
    • /
    • v.10 no.3
    • /
    • pp.167-177
    • /
    • 2011
  • Code obfuscation is a technique that protects the abstract data contained in a program from malicious reverse engineering and various obfuscation methods have been proposed for obfuscating intention. As the abstract data of control flow about programs is important to clearly understand whole program, many control flow obfuscation transformations have been introduced. Generally, inlining is a compiler optimization which improves the performance of programs by reducing the overhead of calling invocation. In code obfuscation, inlining is used to protect the abstract data of control flow. In this paper, we define new control flow complexity metric based on entropy theory and N-Scope metric, and then apply genetic algorithm to obtain optimal inlining results, based on the defined metric.

Ensuring Securityllable Real-Time Systems by Static Program Analysis (원격 실시간 제어 시스템을 위한 정적 프로그램 분석에 의한 보안 기법)

  • Lim Sung-Soo;Lee Kihwal
    • Journal of the Korea Society of Computer and Information
    • /
    • v.10 no.3 s.35
    • /
    • pp.75-88
    • /
    • 2005
  • This paper proposes a method to ensure security attacks caused by insertion of malicious codes in a real-time control system that can be accessed through networks. The proposed technique is for dynamically upgradable real-time software through networks and based on a static program analysis technique to detect the malicious uses of memory access statements. Validation results are shown using a remotely upgradable real-time control system equipped with a modified compiler where the proposed security technique is applied.

  • PDF

Compiler Optimization for Parallelism and Locality Improvement (병렬성 및 지역성 증진을 위한 컴파일러 최적화)

  • Jim, Jin-Mi;Byeon, Seok-U;Pyo, Chang-U;Lee, Man-Ho
    • The Transactions of the Korea Information Processing Society
    • /
    • v.6 no.2
    • /
    • pp.307-314
    • /
    • 1999
  • In this paper, we study on the transformation technique of sequential programs for the purpose of 'exploiting parallelism' and 'improving locality'. Based on the analysis of loop procedures of sequential programs with the factor of dependency and locality, two transformation techniques of loop distribution and loop fusion are applied to them. Transformed programs can be easily expressed as a parallel program wit thread notation, having coarse-grain parallelism and improved locality. This means that those transformations can be useful tools for optimizing and automatic-parallelizing compiler construction. Application of those techniques to SPEC95 on a solaris machine with four SPARC processors show an improvement of execution time.

  • PDF

A Design of Two-Dimensional Wavelet Transformer Using SDRAM (SDRAM을 이용한 이차원 웨이블렛 변환기의 설계)

  • 이선영;홍석일;조경순
    • Proceedings of the IEEK Conference
    • /
    • 1999.11a
    • /
    • pp.351-355
    • /
    • 1999
  • The amount of data stored, processed and transmitted in the multi-media systems has been growing very fast, especially for the image data. For example, it takes 0.75Mbytes to store 512 12 pixels of 24-bit color image. A video signal with 30 frames per second will require 22.5Mbytes of storage space. To solve this problem, we need a good image compression technique. Recently, many researches on the image compression technique based on the wavelet transform are being pursued to overcome the problems of traditional JPEG. This paper describes the architecture and design of two-dimensional wavelet transform circuit. To keep the sire of the circuit small, we tried to minimize the internal storage space by using external SDRAM. This circuit was designed in Verilog-HDL, synthesized using Design Compiler and verified using Verilog-XL.

  • PDF

Performance Improvement of ASIP Assembly Simulator Using Compiled Simulation Technique (컴파일방식 시뮬레이션 기법을 이용한 ASIP 어셈블리 시뮬레이터의 성능 향상)

  • 김호영;김탁곤
    • Journal of the Korea Society for Simulation
    • /
    • v.12 no.2
    • /
    • pp.45-53
    • /
    • 2003
  • This paper presents a retargetable compiled assembly simulation technique for fast ASIP(application specific instruction processor) simulation. Development of ASIP which satisfies design requirements in various fields of applications such as telecommunication, wireless network, etc. needs formal design methodology and high-performance relevant software environments such as compiler and simulator In this paper, we employ the architecture description language(ADL) named ${HiXR}^2$ to automatically synthesize an instruction-level compiled assembly simulator. A compiled simulation has benefit of time efficiency to interpretive one because it performs instruction fetching and decoding at compile time. Especially, in case of assembly simulation, instruction decoding is usually a time-consuming job(string operation), so the compiled simulation of assembly simulation is more efficient than that of binary simulation. Performance improvement of the compiled assembly simulation based on ${HiXR}^2$ is exemplified with an ARM9 architecture and a CalmRISC32 architecture. As a result, the compiled simulation is about 150 times faster than interpretive one.

  • PDF