• Title/Summary/Keyword: Execution-Driven Simulation

A Study of Trace-driven Simulation for Multi-core Processor Architectures (멀티코어 프로세서의 명령어 자취형 모의실험에 대한 연구)

  • Lee, Jong-Bok
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.12 no.3
    • /
    • pp.9-13
    • /
    • 2012
  • In order to overcome the complexity and power problems of superscalar processors, the multi-core architecture has recently become prevalent. Although execution-driven simulation is widespread, trace-driven simulation has a speed advantage over it. We present a methodology for simulating multi-core architectures using a trace-driven simulator. Using SPEC 2000 benchmarks as input, trace-driven simulation has been performed extensively for core counts ranging from 2 to 16. As a result, the 16-core processor achieved an IPC of 4.1 and a 13.3-times speedup over the single-core processor on average.
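
A minimal sketch of the general trace-driven idea described above (not the paper's simulator): pre-recorded per-instruction latencies are replayed onto a fixed number of cores and an aggregate IPC is reported. The one-latency-per-line trace format and the round-robin core assignment are assumptions made for this illustration.

    // Toy trace-driven multi-core replay (illustrative sketch only).
    #include <algorithm>
    #include <fstream>
    #include <iostream>
    #include <string>
    #include <vector>

    int main(int argc, char* argv[]) {
        if (argc < 3) { std::cerr << "usage: sim <trace-file> <num-cores>\n"; return 1; }
        std::ifstream trace(argv[1]);
        int cores = std::stoi(argv[2]);              // assumed >= 1
        std::vector<long> busy(cores, 0);            // per-core cycle counter
        long instructions = 0;
        std::string line;
        for (int c = 0; std::getline(trace, line); c = (c + 1) % cores) {
            busy[c] += std::stol(line);              // replay the pre-recorded latency on core c
            ++instructions;
        }
        long cycles = *std::max_element(busy.begin(), busy.end());
        std::cout << "IPC = " << (cycles ? double(instructions) / cycles : 0) << "\n";
    }

Because the trace is recorded once and replayed many times, no functional execution happens inside the timing loop, which is where the speed advantage over execution-driven simulation comes from.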

A Practical Approach to Incremental Event-driven HDL Simulation (인크리멘탈 이벤트 - 구동 HDL 시뮬레이션에의 실제적 접근법)

  • Yang, Seiyang;Shim, Kyuho
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.3 no.3
    • /
    • pp.73-80
    • /
    • 2014
  • In this paper, we propose an incremental simulation method for event-driven HDL simulation to reduce simulation execution time. In practice, simulation is repeated across a series of design changes. Incremental simulation is an efficient method that shortens the execution time of each subsequent simulation by reusing the results of the previous one. We have observed the effectiveness of the proposed approach through experiments with multiple real designs.
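
The core mechanism can be pictured as an ordinary event-driven kernel plus one extra rule: activity that falls before the first design change is satisfied from the previous run instead of being recomputed. The sketch below is a loose illustration under that assumption, not the authors' method; the event times and the single change point are invented.

    // Event-driven kernel with a crude incremental-restart rule (illustration only).
    #include <cstdint>
    #include <functional>
    #include <iostream>
    #include <queue>
    #include <vector>

    struct Event {
        uint64_t time;
        std::function<void()> action;
        bool operator>(const Event& e) const { return time > e.time; }
    };

    int main() {
        std::priority_queue<Event, std::vector<Event>, std::greater<Event>> q;
        const uint64_t change_time = 100;  // first point where the edited design diverges
        q.push({50,  [] { /* original evaluation, skipped on the incremental run */ }});
        q.push({150, [] { std::cout << "t=150: re-evaluated after the change\n"; }});
        while (!q.empty()) {
            Event e = q.top(); q.pop();
            if (e.time < change_time) {
                std::cout << "t=" << e.time << ": reused from previous run\n";  // no recompute
                continue;
            }
            e.action();  // recompute only activity at or after the change
        }
    }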

An Efficient Scheduling Method Taking into Account Resource Usage Patterns on Desktop Grids (데스크탑 그리드에서 자원 사용 경향성을 고려한 효율적인 스케줄링 기법)

  • Hyun Ju-Ho;Lee Sung-Gu;Kim Sang-Cheol;Lee Min-Gu
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.33 no.7
    • /
    • pp.429-439
    • /
    • 2006
  • A desktop grid, a computing grid composed of the idle computing resources in a large network of desktop computers, is a promising platform for compute-intensive distributed computing applications. However, due to the unreliability and unpredictability of its computing resources, effective scheduling of parallel computing applications on such a platform is a difficult problem. This paper proposes a new scheduling method aimed at reducing the total execution time of a parallel application on a desktop grid. The proposed method is based on utilizing the execution-behavior histories of individual computing nodes in the scheduling algorithm. In order to test the feasibility of this idea, execution trace data were collected from a set of 40 desktop workstations over a period of seven weeks. Then, based on these data, the execution of several representative parallel applications was simulated using trace-driven simulation. The simulation results showed that the proposed method improves the execution time of the target applications significantly when compared to previous desktop grid scheduling methods. In addition, there were fewer instances of application suspension and failure.
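
As a toy version of the history-based idea (the paper's algorithm uses much richer usage-pattern data), the sketch below ranks nodes by an availability fraction derived from past traces and hands tasks to the most dependable nodes first. The node names and availability numbers are invented.

    // History-based node ranking for a desktop grid (illustrative sketch).
    #include <algorithm>
    #include <iostream>
    #include <string>
    #include <vector>

    struct Node {
        std::string name;
        double availability;  // fraction of the trace window the machine was idle/usable
    };

    int main() {
        std::vector<Node> nodes = {{"ws01", 0.92}, {"ws02", 0.41}, {"ws03", 0.77}};
        std::sort(nodes.begin(), nodes.end(),
                  [](const Node& a, const Node& b) { return a.availability > b.availability; });
        const int tasks = 2;  // place subtasks on the historically most reliable machines
        for (int i = 0; i < tasks && i < (int)nodes.size(); ++i)
            std::cout << "task " << i << " -> " << nodes[i].name << "\n";
    }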

Cloudification of On-Chip Flash Memory for Reconfigurable IoTs using Connected-Instruction Execution (연결기반 명령어 실행을 이용한 재구성 가능한 IoT를 위한 온칩 플래쉬 메모리의 클라우드화)

  • Lee, Dongkyu;Cho, Jeonghun;Park, Daejin
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.14 no.2
    • /
    • pp.103-111
    • /
    • 2019
  • IoT-driven large-scale systems consist of connected things running on-chip executable embedded software. These lightweight embedded things have limited hardware resources, in particular a small on-chip flash memory. In addition, the on-chip embedded software in flash memory is not easy to update at runtime to keep up with the latest IoT services. It is therefore becoming important to develop lightweight IoT devices that support a variety of software within the limited on-chip flash memory. Remote instruction execution in the cloud via IoT connectivity enables high-performance execution of effectively unlimited software in the cloud, with low-power streaming of instruction execution on IoT edge devices. In this paper, we propose an asymmetric cloud-IoT structure that provides high-performance instruction execution in the cloud while keeping code executable at low power on lightweight IoT edge devices through remote instruction execution. We propose a simulation-based approach to determine an efficient partitioning of software between the cloud and the IoT edge, and we evaluate instruction cloudification by measuring execution time under the proposed structure. A cloud-connected instruction set simulator is newly introduced to emulate the behavior of the processor. Experimental results of cloud-IoT connected software execution using remote instructions show the feasibility of cloudifying on-chip code flash memory. The simulation environment for cloud-connected code execution successfully emulates the architectural operation of on-chip flash memory in the cloud, so that various IoT software services can be accelerated and executed at low power through cloudification of remote instruction execution. With cloud-connected code execution, program execution time is reduced by 50% and memory space by 24%.
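
One way to picture the asymmetric split is a fetch path that serves instructions from the small on-chip flash when they are resident and falls back to the cloud otherwise. The sketch below is a hypothetical illustration; the flash contents, addresses, and the fetch() helper are all invented, and a real cloud request would go over the IoT link rather than a print statement.

    // Toy cloud-IoT fetch split (hypothetical illustration).
    #include <cstdint>
    #include <iostream>
    #include <map>

    // The few instruction words that fit in the on-chip flash (invented contents).
    static const std::map<uint32_t, uint32_t> flash = {{0x0000, 0xE1A00000},
                                                       {0x0004, 0xE2811001}};

    uint32_t fetch(uint32_t pc) {
        auto it = flash.find(pc);
        if (it != flash.end()) return it->second;  // resident: execute locally
        std::cout << "0x" << std::hex << pc
                  << ": not in flash, requesting from cloud\n";  // remote execution path
        return 0;                                  // placeholder word from the cloud
    }

    int main() {
        for (uint32_t pc = 0; pc <= 8; pc += 4) fetch(pc);
    }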

A Study On Statistical Simulation for Asymmetric Multi-Core Processor Architectures (비대칭적 멀티코어 프로세서의 통계적 모의실험에 관한 연구)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.16 no.2
    • /
    • pp.157-163
    • /
    • 2016
  • If trace-driven or execution-driven simulation is used for the performance analysis of asymmetric multi-core processors, excessive time and disk space are required. In this paper, statistical simulations are performed for asymmetric multi-core processors with various hardware configurations. For the experiments, SPEC 2000 benchmark programs are used for profiling and synthesis, and the synthesized workloads are supplied as input to the simulation of the asymmetric multi-core processors. As a result, the performance of an asymmetric multi-core processor obtained by statistical simulation is comparable to that of trace-driven simulation, with a tremendous reduction in simulation time.
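
The profile-and-synthesize step can be pictured as: measure an instruction-mix distribution from the benchmark, then sample a much shorter synthetic stream from it. The sketch below shows only that sampling idea; the categories and fractions are invented, and real statistical simulation also models dependences, branch behavior, and cache statistics.

    // Sampling a synthetic instruction stream from profiled fractions (sketch).
    #include <iostream>
    #include <random>
    #include <string>
    #include <vector>

    int main() {
        std::vector<std::string> kinds = {"alu", "load", "store", "branch"};
        std::discrete_distribution<int> mix({0.55, 0.22, 0.10, 0.13});  // profiled mix
        std::mt19937 rng(42);
        for (int i = 0; i < 10; ++i)   // synthetic trace, far shorter than the original
            std::cout << kinds[mix(rng)] << "\n";
    }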

A Study on Demand-Driven Dataflow Computer Architecture based on Packet Communication (Packet Communication에 의한 Demand-Driven Dataflow 컴퓨터 구조에 관한 연구)

  • Rhee, Sang Burm;Ryu, Keun Ho;Park, Kyu Tae
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.23 no.2
    • /
    • pp.225-235
    • /
    • 1986
  • Dataflow computers exhibit a high degree of parallelism that cannot be obtained easily with the conventional von Neumann architecture. Since many instructions are ready for execution simultaneously, concurrency can easily be achieved by the multiple processing elements of a dataflow machine. In this paper, we describe an improved dataflow architecture designed by adding a demand propagation network to the MIT dataflow machine, and we show the improved performance, in terms of execution time and processing-element efficiency, through simulation with a time-acceleration method.
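
Demand-driven evaluation can be shown in a few lines: a node fires only after its result is demanded, and demanding a node first propagates demands to its operands. The sketch below illustrates just that propagation rule, not the MIT machine or the paper's network.

    // Demand propagation over a tiny dataflow graph (illustrative sketch).
    #include <iostream>

    struct Node {
        int value = 0;
        bool computed = false;
        Node *lhs = nullptr, *rhs = nullptr;
        int demand() {                       // demand flows to operands, then the node fires
            if (computed) return value;
            if (lhs && rhs) value = lhs->demand() + rhs->demand();
            computed = true;
            return value;
        }
    };

    int main() {
        Node a{3, true}, b{4, true}, unused{99, true};
        Node sum; sum.lhs = &a; sum.rhs = &b;
        std::cout << sum.demand() << "\n";   // only a and b are touched; `unused` never fires
    }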

Performance Analysis of Multicore Out-of-Order Superscalar Processor with Multiple Basic Block Execution (다중블럭을 실행하는 멀티코어 비순차 수퍼스칼라 프로세서의 성능 분석)

  • Lee, Jong Bok
    • Journal of Korea Multimedia Society
    • /
    • v.16 no.2
    • /
    • pp.198-205
    • /
    • 2013
  • In this paper, the performance of a multicore processor architecture is analyzed whose out-of-order superscalar cores execute multiple basic blocks. Using SPEC 2000 benchmarks as input, trace-driven simulation has been performed extensively for out-of-order superscalar processors with window sizes from 32 to 64, core counts from 1 to 16, and multiple basic block execution degrees from 1 to 4. As a result, the multicore out-of-order superscalar processor executing 4 basic blocks achieves a 22.0% average performance increase over the same architecture with single basic block execution.
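
A back-of-the-envelope view of why fetching several basic blocks per cycle helps: with K predicted blocks per cycle, fetch bandwidth is no longer capped at one branch per cycle. The block sizes below are invented, and the number computed is only an upper bound on fetch throughput, not the simulated IPC.

    // Fetch-bandwidth upper bound for K basic blocks per cycle (sketch).
    #include <iostream>
    #include <vector>

    int main() {
        std::vector<int> block_sizes = {5, 3, 7, 2, 4, 6, 3, 5};  // instructions per block
        for (int K : {1, 4}) {                                    // blocks fetched per cycle
            long cycles = 0, insts = 0;
            for (std::size_t i = 0; i < block_sizes.size(); i += K, ++cycles)
                for (std::size_t j = i; j < i + K && j < block_sizes.size(); ++j)
                    insts += block_sizes[j];
            std::cout << "K=" << K << ": fetch IPC bound = "
                      << double(insts) / cycles << "\n";
        }
    }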

Agent-based Collaborative Simulation Architecture for Distributed Manufacturing Systems (분산 생산 시스템을 위한 에이전트 기반의 협업 시뮬레이션 체계)

  • Cha Yeong Pil;Jeong Mu Yeong
    • Proceedings of the Korean Operations and Management Science Society Conference
    • /
    • 2003.05a
    • /
    • pp.808-813
    • /
    • 2003
  • Maintaining agility and responsiveness in design and manufacturing activities is a key issue for manufacturing companies coping with global competition. Distributed design and control systems are regarded as an efficient solution for agility and responsiveness. However, the distributed nature of a manufacturing system complicates production activities such as design, simulation, scheduling, and execution control. In particular, existing simulation systems have limited external integration capabilities, which makes it difficult to implement complex control mechanisms for distributed manufacturing systems. Moreover, integration and coupling of heterogeneous components and models are commonly required for the simulation of complex distributed systems. In this paper, a collaborative and adaptive simulation architecture is proposed as an open framework for the simulation and analysis of distributed manufacturing enterprises. By incorporating agents, with their distributed characteristics of autonomy, intelligence, and goal-driven behavior, the proposed agent-based simulation architecture can be easily adapted to support agile, distributed manufacturing systems. The architecture supports coordination and cooperation relations and provides communication middleware among the simulation participants.

Enhanced Bitmap Lookup Algorithm for High-Speed Routers (고속 라우터를 위한 향상된 비트맵 룩업 알고리즘)

  • Lee, Kang-woo;Ahn, Jong-suk
    • The KIPS Transactions:PartA
    • /
    • v.11A no.2
    • /
    • pp.129-142
    • /
    • 2004
  • As the Internet gets faster, the demand for high-speed routers capable of forwarding more than gigabits of data per second keeps increasing. In previous research, the Bitmap Trie algorithm was developed to rapidly execute the LPM (longest prefix matching) process, which is well known as a severe performance bottleneck. In this paper, we introduce a novel algorithm that drastically enhances the performance of the Bitmap Trie algorithm by applying three techniques. First, a new table called the Count Table was devised; owing to this table, we successfully eliminated the shift operations that were the main cause of performance degradation in the Bitmap Trie algorithm. Second, memory utilization was improved by removing redundant forwarding information from the Transfer Table. Lastly, the range of prefix lookup was diversified to optimize data accesses. Additionally, the processing delays were classified into three categories according to their causes and measured through execution-driven simulation, which provides higher-quality results than other simulation techniques. We sought to ensure the reliability of the experimental results by comparing them with results collected from a real system. Overall, the Enhanced Bitmap Trie algorithm eliminated 82% of the time spent in the previous algorithm.
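
The flavor of a bitmap-trie lookup, and of replacing per-lookup bit arithmetic with a precomputed count, can be seen in a single node: a bitmap marks which stride values have entries, and the number of set bits below the matching position indexes a compact next-hop array. This is a generic simplification, not the paper's Count Table design; a real count table would precompute what __builtin_popcount (a GCC/Clang builtin) does here.

    // One-node bitmap lookup with popcount indexing (generic sketch).
    #include <cstdint>
    #include <iostream>
    #include <vector>

    struct BitmapNode {
        uint32_t bitmap = 0;         // bit i set => an entry exists for 5-bit stride value i
        std::vector<int> next_hops;  // compact array, one slot per set bit
    };

    int lookup(const BitmapNode& n, unsigned stride) {
        if (!(n.bitmap & (1u << stride))) return -1;       // no matching prefix here
        uint32_t below = n.bitmap & ((1u << stride) - 1);  // entries before this slot
        return n.next_hops[__builtin_popcount(below)];     // set-bit count = compact index
    }

    int main() {
        BitmapNode n;
        n.bitmap = (1u << 3) | (1u << 9);   // entries at stride values 3 and 9
        n.next_hops = {101, 202};
        std::cout << lookup(n, 9) << "\n";  // prints 202
    }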

32 Bit RISC Core modeling using SystemC

  • 최홍미;박성모
    • Proceedings of the IEEK Conference
    • /
    • 2002.06b
    • /
    • pp.325-328
    • /
    • 2002
  • In this paper, we present a SystemC model of a 32-bit RISC core based on the ARM7TDMI architecture. The RISC core was first modeled in C for architecture verification and then refined down to a level that captures concurrent hardware-timing behavior using the SystemC class library. It was described at the timed functional level using a handshake protocol and compiled with a standard C++ compiler. The functional simulation results were verified by comparing the memory contents and execution results with those from the ARMulator of ADS (ARM Developer Suite).
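
For readers unfamiliar with the modeling style, the fragment below shows the shape of a clocked SystemC module (SystemC is a C++ class library, so it compiles with a standard C++ compiler once the library is installed). The single-register behavior is invented; this is a sketch of the style, not the paper's core model.

    // Minimal clocked SystemC module (sketch; requires the SystemC library).
    #include <systemc.h>

    SC_MODULE(TinyCore) {
        sc_in<bool> clk;
        int r0 = 0;                      // stand-in for the register file
        void step() { r0 += 1; }         // stand-in for one decoded instruction per cycle
        SC_CTOR(TinyCore) {
            SC_METHOD(step);
            sensitive << clk.pos();      // fire on each rising clock edge
        }
    };

    int sc_main(int, char*[]) {
        sc_clock clk("clk", 10, SC_NS);
        TinyCore core("core");
        core.clk(clk);
        sc_start(100, SC_NS);            // simulate ten clock cycles
        return 0;
    }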
