• Title/Abstract/Keywords: Compiler based technique


A Systematic Generation of Register-Reuse Chains (레지스터 재활용 사슬의 체계적 생성)

  • Lee, Hyuk-Jae
    • The Transactions of the Korean Institute of Electrical Engineers A / v.48 no.12 / pp.1564-1574 / 1999
  • In order to improve the efficiency of optimizing compilers, the integration of register allocation and instruction scheduling has been extensively studied. One promising integration technique is register allocation based on register-reuse chains. However, the generation of register-reuse chains in the previous approach was not completely systematic and consequently created unnecessary dependencies that restrict instruction scheduling. This paper proposes a new register allocation technique based on a systematic generation of register-reuse chains. The first phase of the proposed technique generates register-reuse chains that are optimal in the sense that no additional dependencies are created; thus, register allocation can be performed without restricting instruction scheduling. When the optimal register-reuse chains require more registers than are available, the second phase reduces the number of required registers by merging chains. Chain merging always generates additional dependencies and consequently constrains the execution order of instructions, so a heuristic is developed for the second phase to reduce the dependencies introduced by merging. For a matrix multiplication program, the number of registers resulting from the first phase is small enough to fit into the available registers for most basic blocks. In addition, it is shown that the restriction on instruction scheduling is reduced by the proposed merging heuristic of the second phase.
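
To make the chain-merging idea concrete, here is a minimal Python sketch of a second phase, assuming a chain is simply an ordered list of (definition, last-use) positions that already share one register; the cost function, the greedy pairing, and the example data are illustrative assumptions, not the paper's algorithm.

```python
# Illustrative sketch (not the paper's algorithm): greedily merge
# register-reuse chains when they outnumber the available registers.
# Merging chain B after chain A forces B's values to be defined after
# A's last use, so the "cost" approximates the scheduling restriction.
from itertools import combinations

def merge_cost(a, b):
    """Approximate restriction added by placing chain b after chain a."""
    last_use_a = a[-1][1]
    first_def_b = b[0][0]
    return max(0, last_use_a - first_def_b)   # 0 => no reordering forced

def merge_chains(chains, num_registers):
    """Greedily merge chains until they fit into num_registers."""
    chains = [list(c) for c in chains]
    while len(chains) > num_registers:
        # pick the pair whose cheaper merge direction adds the least restriction
        i, j = min(
            combinations(range(len(chains)), 2),
            key=lambda p: min(merge_cost(chains[p[0]], chains[p[1]]),
                              merge_cost(chains[p[1]], chains[p[0]])),
        )
        a, b = chains[i], chains[j]
        merged = a + b if merge_cost(a, b) <= merge_cost(b, a) else b + a
        chains = [c for k, c in enumerate(chains) if k not in (i, j)]
        chains.append(merged)
    return chains

# Example: four one-element chains of (def, last_use) positions, three registers.
print(merge_chains([[(0, 4)], [(2, 6)], [(5, 9)], [(8, 12)]], 3))
```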


RDF Query Optimization Technique based on Program Analysis (프로그램 분석을 통한 RDF 질의 최적화 기법)

  • Choi, Nak-Min; Cho, Eun-Sun
    • Journal of the Institute of Electronics Engineers of Korea CI / v.47 no.4 / pp.54-62 / 2010
  • Semantic Web programming is still an immature area: it is based on API calls, and provides neither high productivity at development time nor sufficient efficiency at runtime. To overcome this limitation, some efforts have been devoted to dedicated programming languages for the Semantic Web. In this paper, we introduce a caching technique to enhance the runtime efficiency of RDF (Resource Description Framework) processing programs with SPARQL queries. We use static program analysis on those programs to determine what to cache, so as to decrease the cache miss ratio. Our method is implemented for programs in the 'Jey' language, one of the programming languages devised for RDF data processing.
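
As a rough illustration of the idea (not the 'Jey' implementation), the sketch below caches SPARQL query results, where the set of cacheable queries is assumed to be supplied by a static analysis pass; the query string, the fake engine, and the selection policy are hypothetical.

```python
# Illustrative sketch: a query-result cache whose contents are chosen ahead
# of time by a (hypothetical) static analysis pass that reports which SPARQL
# query strings are re-executed often enough to be worth caching.
class QueryCache:
    def __init__(self, cacheable_queries):
        self.cacheable = set(cacheable_queries)  # decided by static analysis
        self.store = {}
        self.hits = self.misses = 0

    def run(self, query, execute):
        """Run a SPARQL query, reusing a cached result when allowed."""
        if query in self.store:
            self.hits += 1
            return self.store[query]
        self.misses += 1
        result = execute(query)          # call into the real RDF engine
        if query in self.cacheable:      # only cache what analysis selected
            self.store[query] = result
        return result

def fake_engine(query):
    """Stand-in for a real SPARQL endpoint."""
    return [("s1",), ("s2",)]

cache = QueryCache({"SELECT ?s WHERE { ?s a ?type }"})
for _ in range(3):
    cache.run("SELECT ?s WHERE { ?s a ?type }", fake_engine)
print(cache.hits, cache.misses)   # 2 hits, 1 miss
```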

A Bit-level ACSU of High Speed Viterbi Decoder

  • Kim, Min-Woo; Cho, Jun-Dong
    • JSTS: Journal of Semiconductor Technology and Science / v.6 no.4 / pp.240-245 / 2006
  • A Viterbi decoder is composed of a BMU (Branch Metric Unit), an ACSU (Add-Compare-Select Unit), and an SMU (Survivor path Memory Unit). For high-speed Viterbi decoders, the ACSU is the main bottleneck because of its compare-select and feedback operations, and many studies have been carried out to address this problem; the M-step look-ahead technique and the Minimized method are typical high-speed algorithms. In this paper, we designed a bit-level ACSU (K=3, R=1/2, 4-bit soft decision) based on these algorithms, and switched the matrix product order in the backward direction of the Minimized method so that a Code-Optimized-Array can be applied to reduce area complexity. For experimentation, we synthesized our design with Synopsys Design Compiler using a TSMC 0.18 um library, and verified the timing with Cadence Verilog-XL.
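
As a behavioral illustration only (a Python model, not the paper's bit-level hardware), the sketch below shows the add-compare-select recursion whose feedback loop makes the ACSU the throughput bottleneck; the toy two-state trellis and metric values are made up.

```python
# Illustrative software model of one ACS step: for each trellis state the new
# path metric is the minimum over predecessors of (old metric + branch metric),
# and the winning predecessor is recorded for the survivor-path memory.
def acs_step(path_metrics, predecessors, branch_metrics):
    """path_metrics:   dict state -> accumulated metric
    predecessors:   dict state -> list of predecessor states
    branch_metrics: dict (prev_state, state) -> transition metric"""
    new_metrics, survivors = {}, {}
    for state, preds in predecessors.items():
        candidates = [(path_metrics[p] + branch_metrics[(p, state)], p)
                      for p in preds]                    # add
        best_metric, best_pred = min(candidates)         # compare
        new_metrics[state] = best_metric                 # select
        survivors[state] = best_pred
    return new_metrics, survivors

# Toy 2-state trellis (the decoder in the paper uses K=3, i.e. 4 states).
pm = {0: 0, 1: 3}
preds = {0: [0, 1], 1: [0, 1]}
bm = {(0, 0): 1, (1, 0): 2, (0, 1): 4, (1, 1): 0}
print(acs_step(pm, preds, bm))
```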

An Ultra-High Speed 1.7ns Access 1Mb CMOS SRAM macro

  • T.J. Song; E.K. Lim; J.J. Lim; Lee, Y.K.; Kim, M.G.
    • Proceedings of the IEEK Conference / 2002.07c / pp.1559-1562 / 2002
  • This paper describes a 0.13 um ultra-high-speed 1Mb CMOS SRAM macro with a 1.7ns access time. It achieves ultra-high-speed operation using two novel approaches. First, it uses a process-insensitive sense amplifier (Double-Equalized Sense Amplifier), which improves the voltage offset by about 10 percent. Second, it uses a new replica-based sense amplifier driver, which improves the bit-line evaluation time by about 10 percent compared to the conventional technique. Various memory macros, with word-bit sizes from 64kb to 1Mb and including repairable redundancy circuits, can be generated automatically by a compiler.


Low Power Design of Filter Based Face Detection Hardware (필터방식 얼굴검출 하드웨어의 저전력 설계)

  • Kim, Yoon-Gu; Jeong, Yong-Jin
    • Journal of the Institute of Electronics Engineers of Korea SD / v.45 no.6 / pp.89-95 / 2008
  • In this paper, we designed low-power face detection hardware and analyzed its power consumption. The face detection hardware was fabricated using Samsung 0.18 um CMOS technology and can detect multiple face locations in a 2-D image. The hardware is composed of 6 functional modules and 11 internal memories. We introduced two operating modes (SLEEP and ACTIVE) to save power, and a clock gating technique was applied at two levels: modules and registers. In addition, we divided the internal memory into several pieces to reduce the energy consumed when accessing memories, and fully utilized the low-power design options provided in Synopsys Design Compiler. As a result, we obtained a 68% power reduction in ACTIVE mode compared to the original design, in which none of the above low-power techniques were used.

AIT: A method for operating system kernel function call graph generation with a virtualization technique

  • Jiao, Longlong; Luo, Senlin; Liu, Wangtong; Pan, Limin
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.5 / pp.2084-2100 / 2020
  • Operating system (OS) kernel function call graphs have been widely used in OS analysis and defense. However, most existing methods and tools for generating function call graphs are designed for application programs and cannot be used to generate OS kernel function call graphs. This paper proposes a virtualization-based call graph generation method called Acquire in Trap (AIT). When target kernel functions are called, AIT dynamically initiates a system trap with the help of a virtualization technique. It then analyzes and records the calling relationships during trap handling by traversing the kernel stacks and the code space. Our experimental results show that the proposed method is feasible for both Linux and Windows OSs, including 32-bit and 64-bit versions, with high recall and precision rates. AIT is independent of the source code, compiler, and OS kernel architecture, and is a universal method for generating OS kernel function call graphs.
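
A minimal sketch of the general idea, under the assumption that the trap handler yields, for each trap, the list of function names currently on the kernel stack (outermost caller first); the snapshot data and function names below are hypothetical, and this is not AIT's actual implementation.

```python
# Illustrative sketch: turning stack snapshots captured at trap time into a
# call graph. Consecutive frames of each snapshot give caller -> callee edges,
# which are accumulated across all observed traps.
from collections import defaultdict

def build_call_graph(stack_snapshots):
    graph = defaultdict(set)            # caller -> set of callees
    for stack in stack_snapshots:
        for caller, callee in zip(stack, stack[1:]):
            graph[caller].add(callee)
    return graph

# Hypothetical snapshots that a hypervisor-level trap handler might record.
snapshots = [
    ["sys_open", "do_sys_open", "getname"],
    ["sys_open", "do_sys_open", "do_filp_open"],
]
for caller, callees in build_call_graph(snapshots).items():
    print(caller, "->", sorted(callees))
```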

Detecting Security Vulnerabilities in TypeScript Code with Static Taint Analysis (정적 오염 분석을 활용한 타입스크립트 코드의 보안 취약점 탐지)

  • Moon, Taegeun; Kim, Hyoungshick
    • Journal of the Korea Institute of Information Security & Cryptology / v.31 no.2 / pp.263-277 / 2021
  • Taint analysis techniques are widely used to detect web vulnerabilities originating from unverified user input data, such as Cross-Site Scripting (XSS) and SQL Injection, in web applications written in JavaScript. To detect such vulnerabilities, it is necessary to trace the variables affected by user-submitted inputs. However, because of the dynamic nature of JavaScript, identifying those variables without running the web application code has been a challenging issue. Therefore, most existing taint analysis tools are based on dynamic taint analysis, which incurs the overhead of running the target application. In this paper, we propose a novel static taint analysis technique that uses symbol information obtained from the TypeScript (a superset of JavaScript) compiler to accurately track data flow and detect security vulnerabilities in TypeScript code. Our technique allows developers to annotate variables that can contain unverified user input data, and uses this annotation information to trace the variables and data affected by user input. Since the proposed technique can be seamlessly incorporated into the TypeScript compiler, developers can find vulnerabilities during the development process, unlike with existing analysis tools that run as separate programs. To show the feasibility of the proposed method, we implemented a prototype and evaluated its performance on 8 web applications with known security vulnerabilities. Our prototype implementation detected all of the known security vulnerabilities correctly.
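
To illustrate the general shape of such an analysis (in Python over a toy intermediate form rather than on the TypeScript compiler's symbol information, so purely a conceptual sketch), the code below propagates taint from developer-annotated sources through assignments and flags tainted arguments at sinks; the toy IR, the source annotation, and the sink set are assumptions.

```python
# Illustrative sketch of forward static taint propagation. Variables the
# developer annotated as holding user input start tainted; taint flows through
# assignments, and a warning is raised when a tainted variable reaches a
# sensitive sink such as a database query call.
def propagate_taint(statements, annotated_sources, sinks):
    tainted = set(annotated_sources)
    warnings = []
    for lineno, (kind, target, operands) in enumerate(statements, 1):
        if kind == "assign" and any(op in tainted for op in operands):
            tainted.add(target)                      # taint flows to target
        elif kind == "call" and target in sinks:
            bad = [op for op in operands if op in tainted]
            if bad:
                warnings.append((lineno, target, bad))
    return warnings

# Toy program: user input flows into an SQL query without sanitization.
stmts = [
    ("assign", "name", ["req_body"]),        # name = req.body
    ("assign", "query", ["name"]),           # query = "..." + name
    ("call",   "db.run", ["query"]),         # db.run(query)  <- sink
]
print(propagate_taint(stmts, annotated_sources={"req_body"},
                      sinks={"db.run"}))
```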

PartitionTuner: An operator scheduler for deep-learning compilers supporting multiple heterogeneous processing units

  • Misun Yu; Yongin Kwon; Jemin Lee; Jeman Park; Junmo Park; Taeho Kim
    • ETRI Journal / v.45 no.2 / pp.318-328 / 2023
  • Recently, embedded systems such as mobile platforms have multiple processing units that can operate in parallel, such as central processing units (CPUs) and neural processing units (NPUs). We can use deep-learning compilers to generate machine code optimized for these embedded systems from a deep neural network (DNN). However, the deep-learning compilers proposed so far generate code that sequentially executes DNN operators on a single processing unit, or parallel code for graphics processing units (GPUs). In this study, we propose PartitionTuner, an operator scheduler for deep-learning compilers that supports multiple heterogeneous PUs, including CPUs and NPUs. PartitionTuner can generate an operator-scheduling plan that uses all available PUs simultaneously to minimize the overall DNN inference time. Operator scheduling is based on an analysis of the DNN architecture and on the performance profiles of individual and grouped operators measured on the heterogeneous processing units. In experiments with seven DNNs, PartitionTuner generated scheduling plans that perform 5.03% better than a static type-based operator-scheduling technique for SqueezeNet. In addition, PartitionTuner outperforms recent profiling-based operator-scheduling techniques for ResNet50, ResNet18, and SqueezeNet by 7.18%, 5.36%, and 2.73%, respectively.
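
As a simplified illustration (not PartitionTuner's algorithm, and ignoring inter-PU data-transfer costs and operator dependencies), the sketch below greedily assigns each operator to the processing unit with the earliest finish time according to profiled latencies; the profile numbers are hypothetical.

```python
# Illustrative sketch: profile-based greedy assignment of DNN operators to
# heterogeneous processing units, visiting operators in topological order and
# choosing the PU that finishes each operator earliest.
def greedy_schedule(operators, profiles):
    """operators: list of operator names in topological order.
    profiles: dict op -> dict pu -> measured latency on that PU."""
    ready_time = {pu: 0.0 for pu in next(iter(profiles.values()))}
    plan = {}
    for op in operators:
        # pick the PU with the earliest finish time for this operator
        pu = min(ready_time, key=lambda p: ready_time[p] + profiles[op][p])
        ready_time[pu] += profiles[op][pu]
        plan[op] = pu
    return plan, max(ready_time.values())

# Hypothetical per-operator latencies (ms) for a CPU and an NPU.
profiles = {
    "conv1": {"cpu": 4.0, "npu": 1.0},
    "conv2": {"cpu": 4.0, "npu": 1.2},
    "softmax": {"cpu": 0.3, "npu": 2.0},
}
print(greedy_schedule(["conv1", "conv2", "softmax"], profiles))
```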

Lightweight Loop Invariant Code Motion for Java Just-In-Time Compiler on Itanium (Itanium상의 자바 적시 컴파일러를 위한 가벼운 루프 불변 코드 이동)

  • Yu Jun-Min; Choi Hyung-Kyu; Moon Soo-Mook
    • Journal of KIISE: Software and Applications / v.32 no.3 / pp.215-226 / 2005
  • Loop-invariant code motion (LICM) optimization involves relatively heavy code analyses and is thus not readily applicable to Java Just-In-Time (JIT) compilation, where the JIT compilation time is part of the whole running time. 'Classical' LICM optimization first analyzes the code and constructs both def-use chains and use-def chains, which are then used for performing code motions. This paper proposes a lightweight LICM algorithm that requires only the def-use chains of loop-invariant code (without use-def chains), exploiting the fact that the Java virtual machine is based on a stack machine and hence generates code with simpler patterns. We also propose two techniques that allow more code motion than classical LICM techniques. First, unlike previous JIT techniques that apply LICM only to single-path loops for simplicity, we apply LICM safely to partially redundant code in multi-path loops (natural loops). Second, we move loop-invariant, partially redundant null-pointer-check code via the predication support in Itanium. The proposed techniques were implemented in a JIT compiler for the Itanium processor on Intel's ORP (Open Runtime Platform) Java virtual machine. On the SPECjvm98 benchmarks, the proposed technique increases the JIT compilation overhead by a geometric mean of 1.3%, yet it improves the total running time by a geometric mean of 2.2%.
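
A minimal sketch of the invariance test at the heart of LICM, assuming a toy three-address form rather than the paper's stack-machine bytecode; the safety conditions for multi-path loops and null-pointer checks that the paper handles are omitted.

```python
# Illustrative sketch: an instruction inside a loop is invariant if every
# operand is a constant, defined only outside the loop, or produced by another
# instruction already marked invariant. Such instructions are candidates for
# hoisting into a loop preheader.
def find_loop_invariants(loop_instrs, defined_outside):
    """loop_instrs: list of (dest, op, operands); operands are variable names
    (str) or int constants. Returns destinations that may be hoisted."""
    invariant = set()
    changed = True
    while changed:                      # iterate to a fixed point
        changed = False
        for dest, _op, operands in loop_instrs:
            if dest in invariant:
                continue
            if all(isinstance(a, int) or a in defined_outside or a in invariant
                   for a in operands):
                invariant.add(dest)
                changed = True
    return invariant

# Toy loop body: t1 = x * 4 is invariant, t2 = t1 + i is not (i varies).
body = [("t1", "mul", ["x", 4]), ("t2", "add", ["t1", "i"])]
print(find_loop_invariants(body, defined_outside={"x"}))   # {'t1'}
```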

Detection of an Open-Source Software Module based on Function-level Features (함수 수준 특징정보 기반의 오픈소스 소프트웨어 모듈 탐지)

  • Kim, Dongjin; Cho, Seong-je
    • Journal of KIISE / v.42 no.6 / pp.713-722 / 2015
  • As open-source software (OSS) becomes more widely used, many users breach the terms of OSS license agreements or reuse vulnerable OSS modules. Therefore, a technique is needed for investigating whether a binary program includes a particular OSS module. In this paper, we propose an efficient technique to detect a particular OSS module in an executable program using its function-level features. Conventional methods are inappropriate for determining whether a module is contained in a specific program because they usually measure the similarity between whole programs. Our technique determines whether an executable program contains a certain OSS module by extracting features such as function-level instructions, the control flow graph, and the structural attributes of each function from both the program and the module, and then comparing the similarity of these features. To demonstrate the efficiency of the proposed technique, we evaluate it in terms of feature size, detection accuracy, execution overhead, and resilience to compiler optimizations.
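
To give a flavor of function-level matching (the concrete features, similarity metric, and thresholds below are assumptions, not the paper's), the sketch compares mnemonic-histogram-plus-CFG feature vectors by cosine similarity and reports the module as contained when enough of its functions find a close match.

```python
# Illustrative sketch: each function is summarized by an instruction-mnemonic
# histogram plus coarse CFG attributes; functions are matched by cosine
# similarity, and the module is reported as contained if enough of its
# functions have a sufficiently similar counterpart in the binary.
import math

def cosine(a, b):
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in keys)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def module_contained(module_funcs, binary_funcs, sim_thr=0.9, cover_thr=0.8):
    matched = sum(
        1 for feat in module_funcs
        if any(cosine(feat, cand) >= sim_thr for cand in binary_funcs)
    )
    return matched / len(module_funcs) >= cover_thr

# Hypothetical feature vectors: mnemonic counts plus CFG block/edge counts.
libfoo = [{"mov": 12, "call": 3, "jmp": 2, "cfg_blocks": 5, "cfg_edges": 6}]
target = [{"mov": 12, "call": 3, "jmp": 2, "cfg_blocks": 5, "cfg_edges": 6},
          {"mov": 40, "call": 9, "jmp": 1, "cfg_blocks": 20, "cfg_edges": 27}]
print(module_contained(libfoo, target))   # True
```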