• Title/Summary/Keyword: Processor Trace

A Study of Trace-driven Simulation for Multi-core Processor Architectures (멀티코어 프로세서의 명령어 자취형 모의실험에 대한 연구)

  • Lee, Jong-Bok
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.12 no.3 / pp.9-13 / 2012
  • Multi-core architectures have recently become prevalent as a way to overcome the complexity and power problems of superscalar processors. Although execution-driven simulation is widespread, trace-driven simulation is faster. We present a methodology for simulating multi-core architectures with a trace-driven simulator. Using SPEC 2000 benchmarks as input, trace-driven simulation was performed extensively for configurations of 2 to 16 cores. On average, the 16-core processor achieved 4.1 IPC and a 13.3-fold speedup over a single-core processor.
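
As a rough illustration of what trace-driven simulation means in the abstract above, the sketch below replays pre-recorded instruction traces instead of executing a program. It is a toy under stated assumptions, not the simulator used in the paper: `TraceRecord`, its `latency` field, and the single-issue in-order core model are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceRecord:
    """One entry of a recorded instruction trace (hypothetical format)."""
    pc: int        # address of the instruction
    latency: int   # cycles the instruction occupies the core (assumed known)


def simulate_core(trace):
    """Replay one trace on a single-issue, in-order core.

    Returns (instructions retired, cycles consumed).
    """
    cycles = sum(record.latency for record in trace)
    return len(trace), cycles


def simulate_multicore(traces):
    """Replay one trace per core and return the aggregate IPC.

    Cores are assumed to run independently and in parallel, so the
    elapsed time is that of the slowest core.
    """
    results = [simulate_core(trace) for trace in traces]
    total_instructions = sum(count for count, _ in results)
    elapsed_cycles = max(cycles for _, cycles in results)
    return total_instructions / elapsed_cycles


# Two cores, each fed its own recorded trace (made-up values).
traces = [
    [TraceRecord(0x100, 1), TraceRecord(0x104, 3), TraceRecord(0x108, 1)],
    [TraceRecord(0x200, 2), TraceRecord(0x204, 2)],
]
ipc = simulate_multicore(traces)  # 5 instructions / 5 cycles = 1.0
```

Because the trace already records which instructions ran, the replay needs no functional execution, which is where the speed advantage over execution-driven simulation comes from.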

Low Power Trace Cache for Embedded Processor

  • Moon Je-Gil;Jeong Ha-Young;Lee Yong-Surk
    • Proceedings of the IEEK Conference / summer / pp.204-208 / 2004
  • The embedded market will keep expanding as customers seek more wearable and ubiquitous systems. Cellular telephones, PDAs, notebooks, and portable multimedia devices could bring higher microprocessor revenues and more rewarding improvements in performance and functionality. Battery capacity, however, is still increasing only slowly along the roadmap. Until a small, practical fuel cell becomes available, microprocessor developers must come up with power-reduction methods. According to MPR 2003, the instruction and data caches of the ARM920T processor consume 44% of total processor power. The rest is split among the integer core, the memory management units, the bus interface unit, and other essential CPU circuitry. The relationships among CPU, peripherals, and caches may change in the future: a processor running at a higher operating frequency will demand a larger cache RAM and consume more energy. In this paper, we propose an advanced low-power trace cache that caches traces of the dynamic instruction stream and reduces cache access times. We evaluate the performance of the trace cache, estimate its power, and compare it with a conventional cache.
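
The general idea of a trace cache, as opposed to the specific low-power design proposed in the paper, can be sketched as follows. A trace is a sequence of dynamically consecutive instructions, identified by its starting address together with the branch outcomes inside it. The class name, the LRU replacement policy, and the capacity parameter are assumptions made for illustration only.

```python
from collections import OrderedDict


class TraceCache:
    """Toy trace cache keyed by (start address, branch outcomes)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # key -> tuple of instruction addresses
        self.hits = 0
        self.misses = 0

    def lookup(self, start_pc, predicted_outcomes):
        """Return the cached trace, or None if the fetch must fall back
        to the conventional instruction cache."""
        key = (start_pc, tuple(predicted_outcomes))
        trace = self.lines.get(key)
        if trace is None:
            self.misses += 1
            return None
        self.lines.move_to_end(key)  # mark as most recently used
        self.hits += 1
        return trace

    def fill(self, start_pc, outcomes, instructions):
        """Store a trace that was just assembled from the dynamic stream."""
        key = (start_pc, tuple(outcomes))
        self.lines[key] = tuple(instructions)
        self.lines.move_to_end(key)
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used


cache = TraceCache(capacity=2)
first = cache.lookup(0x100, [True, False])   # miss: nothing cached yet
cache.fill(0x100, [True, False], [0x100, 0x104, 0x200, 0x204])
second = cache.lookup(0x100, [True, False])  # hit: whole trace in one access
```

On a hit, instructions that span taken branches are delivered by a single lookup rather than by one instruction-cache access per basic block.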

A Study on Power Dissipation of The Multicore Processor (멀티코어 프로세서의 전력 소비에 대한 연구)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.17 no.2 / pp.251-256 / 2017
  • Multicore processor systems have recently been widely adopted, not only in general-purpose computers but also in embedded systems and mobile devices, to improve performance. Because power dissipation is a significant issue for multicore processor systems, it must be estimated accurately in the early design stage. In this paper, a fast power analysis tool for high-performance multicore processors, based on a trace-driven simulator, has been developed. To this end, the power dissipation of each hardware unit in each core is summed. Using SPEC 2000 benchmarks as input, trace-driven simulation was performed to estimate the average power dissipation per instruction.
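
The summation described in the abstract above can be sketched as below. This is one possible reading, not the paper's tool: the unit names, the wattages, and the choice to report the per-instruction figure as energy (total power times elapsed time, divided by instruction count) are all assumptions.

```python
def core_power_watts(unit_power):
    """Add up the power of every hardware unit in one core."""
    return sum(unit_power.values())


def energy_per_instruction_nj(cores, instructions, cycles, clock_hz):
    """Average energy per instruction, in nanojoules, for a whole run.

    cores        -- list of {unit name: power in watts}, one dict per core
    instructions -- total instructions retired across all cores
    cycles       -- elapsed cycles of the run
    clock_hz     -- clock frequency in hertz
    """
    total_power = sum(core_power_watts(units) for units in cores)
    elapsed_seconds = cycles / clock_hz
    return total_power * elapsed_seconds / instructions * 1e9


# Hypothetical per-unit figures, chosen only to make the arithmetic easy.
core = {"fetch": 0.5, "alu": 1.0, "l1_cache": 0.5}  # 2.0 W per core
cores = [core, core]                                # dual-core: 4.0 W

epi = energy_per_instruction_nj(
    cores, instructions=2_000_000, cycles=1_000_000, clock_hz=1e9
)  # approximately 2 nJ per instruction
```

In a trace-driven setting, the instruction and cycle counts would come from replaying the trace, while the per-unit power figures would come from a separate power model.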

A Performance Study on Many-core Processor Architectures with SPEC Benchmark Programs (SPEC 벤치마크 프로그램에 대한 매니코어 프로세서의 성능 연구)

  • Lee, Jongbok
    • The Transactions of The Korean Institute of Electrical Engineers / v.62 no.2 / pp.252-256 / 2013
  • Multi-core architectures have recently become prevalent as a way to overcome the complexity and performance limits of superscalar processors. Multi-core processor architectures usually use between 2 and 16 cores. In the near future, however, more than 32 cores are likely to be used, which is called a many-core processor architecture. Using SPEC 2000 benchmarks as input, trace-driven simulation was performed extensively for many-core architectures of 32 to 1024 cores. With 1024 cores, the average performance reaches 15.7 IPC, but the rate of performance increase saturates.

A Study on Dynamic Code Analysis Method using 2nd Generation PT(Processor Trace) (2세대 PT(Processor Trace)를 이용한 동적 코드분석 방법 연구)

  • Kim, Hyuncheol
    • Convergence Security Journal / v.19 no.1 / pp.97-101 / 2019
  • If the operating system's core file contains Intel PT data, the debugger can not only check the program state at the time of the crash but also reconstruct the control flow that led to the crash. The execution trace scope can also be extended to the entire system to debug kernel panics and other system hangs. The second-generation PT, the WinIPT library, includes an Intel PT driver with additional code to run per-process and per-core traces through the IOCTL and registry mechanisms provided by Windows 10 (RS5). In other words, PT trace information, to which access was limited in the first-generation PT, can be collected per process and per core in the second-generation PT through the IOCTL and registry mechanisms provided by the operating system. In this paper, we compare and describe methods for collecting, storing, and decoding data packets and for detecting malicious code in a Windows environment using first- and second-generation PT.

Development of a Post-Processor for Three-Dimensional Forging Analysis (3차원 단조해석용 후처리기 개발)

  • 정완진;최석우
    • Transactions of Materials Processing / v.12 no.6 / pp.542-549 / 2003
  • Three-dimensional forging analysis has become an indispensable tool for making the design process more reliable and designs more producible. In this study, a new post-processor was developed to make the investigation of three-dimensional forging analysis results more convenient and accurate. For post-processing of multi-stage forging simulations, an efficient data structure was proposed and implemented using STL. A new file architecture was developed to efficiently handle the large, successive data sets common in three-dimensional forging analysis. Since sectioning and flow tracing play an important role in investigating analysis results, we developed an algorithm suitable for 4-node and 10-node tetrahedra. This flow tracing algorithm can trace and reverse-trace flow through remeshing. The developed program shows good performance and functionality. In particular, large problems can be handled easily thanks to the proposed data structure and file architecture.

The DRAM Effects on The Performance of Multicore Processors (멀티코어 프로세서의 성능에 대한 DRAM의 영향)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.17 no.3 / pp.203-208 / 2017
  • DRAM has recently become very important in multicore processors, which are widely used in computers, laptops, tablet PCs, and mobile devices. Accordingly, both industry and academia have actively studied various types of future DRAM. An accurate DRAM model is therefore required when evaluating multicore processor performance. In this paper, a trace-driven multicore processor simulator that can be coupled with a cycle-accurate DRAM simulator has been developed. Using SPEC 2000 benchmarks as input, the effect of a cycle-accurate DDR3 model on multicore processor performance has been evaluated.

A Performance Study of Multi-core Out-of-Order Superscalar Processor Architecture (멀티코어 비순차 수퍼스칼라 프로세서의 성능 연구)

  • Lee, Jong-Bok
    • The Transactions of The Korean Institute of Electrical Engineers / v.61 no.10 / pp.1502-1507 / 2012
  • Multi-core architectures have recently become prevalent as a way to overcome hardware complexity and power consumption problems. For hardware simplicity, a RISC processor is usually adopted as the unit core. However, if the performance of the unit core is enhanced, the overall performance of the multi-core architecture can be increased further. In this paper, an out-of-order superscalar processor is used as the core of the multi-core architecture. Using SPEC 2000 benchmarks as input, trace-driven simulation was performed extensively for 2 to 16 out-of-order superscalar cores. As a result, the 16-core out-of-order superscalar processor with a window size of 16 achieved a 17.4-fold speedup over the single-core out-of-order superscalar processor and a 50-fold speedup over the single-core RISC processor. Compared at the same number of cores, the multi-core out-of-order superscalar processor was on average 3.2 times faster than the multi-core RISC processor and 1.6 times faster than the multi-core in-order superscalar processor.

A study of extended processor trace decoder structure for malicious code detection (악성코드 검출을 위한 확장된 프로세서 트레이스 디코더 구조 연구)

  • Kang, Seungae;Kim, Youngsoo;Kim, Jonghyun;Kim, Hyuncheol
    • Convergence Security Journal / v.18 no.5_1 / pp.19-24 / 2018
  • General-purpose processors have long provided dedicated hardware and software tracing modules that give developers tools for fixing bugs. A hardware tracer writes its large volume of data to a log that is used for both performance analysis and debugging. Processor Trace (PT) is a new hardware-based tracing feature for Intel CPUs that traces branches executing on the CPU, which allows the control flow of all executed code to be reconstructed with minimal effort. The hardware tracer has been integrated into the operating system, which allows tight integration with its profiling and debugging mechanisms. However, existing PT studies in the Windows environment have focused on decoding only one flow in sequence. In this paper, we propose an extended PT decoder structure that provides basic data for real-time tracing and malicious code detection using the functions provided by PT in the Windows environment.

Heterogeneous Operating Systems Integrated Trace Method for Real-Time Virtualization Environment (다중 코어 기반의 실시간 가상화 시스템을 위한 이종 운영체제 통합 성능 분석 방법에 관한 연구)

  • Kyong, Joohyun;Han, In-Kyu;Lim, Sung-Soo
    • IEMEK Journal of Embedded Systems and Applications / v.10 no.4 / pp.233-239 / 2015
  • This paper describes an integrated trace method for real-time virtualization environments. The method solves the problem that performance traces cannot be analyzed in an integrated way across heterogeneous operating systems, which consist of real-time operating systems and a general-purpose operating system. To solve this problem, we reuse the performance analysis functions of the general-purpose operating system so that the real-time operating systems can be analyzed together with the general-purpose operating system. Furthermore, we have implemented a prototype on an ARM Cortex-A15 dual-core processor. With this integrated trace method, real-time system developers can improve their productivity and the reliability of their results in real-time virtualization environments.