• Title/Summary/Keyword: Benchmarks


The Performance Analysis of Distributed Reorder Buffer Superscalar Processor using Queuing Model (큐잉 모델을 이용한 분산된 리오더 버퍼 수퍼스칼라 프로세서의 성능분석)

  • Baek, Seock-Kyun;Jung, Jin-Ha;Shin, Kwang-Sik;Choi, Sang-Bang
    • Proceedings of the IEEK Conference / 2005.11a / pp.1087-1090 / 2005
  • In all contemporary superscalar processors, the result repositories are implemented as Reorder Buffer (ROB) slots. In such designs, the ROB is a large, multi-ported structure. Several approaches exist for reducing ROB complexity. One technique relies on a distributed implementation that spreads the centralized ROB structure across the function units (FUs), with each distributed component sized to match its FU's workload and given one write port and one read port. We use M/M/1 queuing theory to determine the number of entries in each ROB component, on which processor performance depends. Our schemes are evaluated by simulating the SPEC CPU2000 benchmarks.
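The M/M/1 sizing step can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: it treats each ROB component as an M/M/1 queue with arrival rate λ (instructions dispatched to that FU per cycle) and service rate μ, and picks the smallest capacity k whose steady-state tail probability P(N ≥ k) = ρ^k, with ρ = λ/μ, falls below a chosen overflow target. The function name `mm1_entries`, the 1% target, and the per-FU rates are hypothetical.

```python
import math


def mm1_entries(arrival_rate: float, service_rate: float, overflow_prob: float) -> int:
    """Smallest capacity k with P(N >= k) <= overflow_prob in an M/M/1 queue.

    For M/M/1 in steady state, P(N >= k) = rho**k with rho = arrival / service.
    Using this infinite-queue tail to size a finite buffer is a heuristic.
    """
    rho = arrival_rate / service_rate
    if not 0 < rho < 1:
        raise ValueError("M/M/1 is stable only for 0 < rho < 1")
    if not 0 < overflow_prob < 1:
        raise ValueError("overflow_prob must lie in (0, 1)")
    # rho**k <= eps  <=>  k >= log(eps) / log(rho); both logarithms are negative
    return math.ceil(math.log(overflow_prob) / math.log(rho))


# Hypothetical per-FU dispatch rates (instructions/cycle), one entry released per cycle
fu_rates = {"int_alu": 0.6, "fp_mul": 0.3}
sizes = {fu: mm1_entries(rate, 1.0, 0.01) for fu, rate in fu_rates.items()}
print(sizes)  # {'int_alu': 10, 'fp_mul': 4}
```

A busier FU (higher ρ) needs disproportionately more entries for the same overflow target, which is why per-component sizing can save ports and area compared with one uniformly sized structure.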


Design of Accurate and Efficient Indirect Branch Predictor (정확하고 효율적인 간접 분기 예측기 설계)

  • Paik, Kyoung-Ho;Kim, Eun-Sung
    • Proceedings of the IEEK Conference / 2005.11a / pp.1083-1086 / 2005
  • Modern superscalar processors exploit instruction-level parallelism to achieve high performance through speculative techniques such as branch prediction. Indirect branch target prediction is much harder than predicting direct branch targets or branch directions, because an indirect branch can have dynamically polymorphic targets. We present an accurate and hardware-efficient indirect branch target predictor that reduces the number of tags that must be stored in the Indirect Branch Target Cache without sacrificing prediction accuracy. We implement the proposed scheme on SimpleScalar and demonstrate its efficiency by running the SPEC95 benchmarks.
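The tag-storage trade-off can be illustrated with a generic path-history target cache that keeps only a few tag bits per entry. This is a hypothetical sketch, not the authors' predictor: the class `IndirectTargetCache`, the 10 index bits, the 4 partial-tag bits, and the path-history update that shifts 2 bits per target are all assumptions chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    tag: int     # partial tag: only tag_bits bits are stored
    target: int  # predicted target address


class IndirectTargetCache:
    """Direct-mapped target cache indexed by PC XOR path history (hypothetical sizes)."""

    def __init__(self, index_bits: int = 10, tag_bits: int = 4) -> None:
        self.index_bits = index_bits
        self.tag_bits = tag_bits
        self.path_bits = index_bits + tag_bits
        self.path = 0  # path history register built from recent indirect targets
        self.table: list[Optional[Entry]] = [None] * (1 << index_bits)

    def _index_tag(self, pc: int) -> tuple[int, int]:
        h = (pc >> 2) ^ self.path
        index = h & ((1 << self.index_bits) - 1)
        tag = (h >> self.index_bits) & ((1 << self.tag_bits) - 1)
        return index, tag

    def predict(self, pc: int) -> Optional[int]:
        index, tag = self._index_tag(pc)
        entry = self.table[index]
        # A partial-tag match is accepted; rare aliasing costs accuracy, not correctness
        return entry.target if entry is not None and entry.tag == tag else None

    def update(self, pc: int, actual_target: int) -> None:
        index, tag = self._index_tag(pc)
        self.table[index] = Entry(tag, actual_target)
        # Shift the new target into the path history so polymorphic sites map apart
        mask = (1 << self.path_bits) - 1
        self.path = ((self.path << 2) ^ (actual_target >> 2)) & mask


btc = IndirectTargetCache()
hits = []
for target in [0x104, 0x208] * 20:  # one call site alternating between two targets
    hits.append(btc.predict(0x400) == target)
    btc.update(0x400, target)
print(all(hits[-20:]))  # True
```

A full tag would store every remaining address bit per entry; the partial tag keeps 4 bits and accepts a small aliasing risk in exchange.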


Enhancing GPU Performance by Efficient Hardware-Based and Hybrid L1 Data Cache Bypassing

  • Huangfu, Yijie;Zhang, Wei
    • Journal of Computing Science and Engineering / v.11 no.2 / pp.69-77 / 2017
  • Recent GPUs have adopted cache memory to benefit general-purpose GPU (GPGPU) programs. However, unlike CPU programs, GPGPU programs typically exhibit considerably less temporal and spatial locality. Moreover, the L1 data cache is shared by many threads whose combined data footprint is typically much larger than the L1 cache, which makes intelligent L1 data cache bypassing critical to GPU cache performance. In this paper, we examine GPU cache access behavior and propose a simple hardware-based GPU cache bypassing method that can be applied to GPU applications without recompiling them. We also introduce a hybrid method that combines static profiling information with hardware-based bypassing to further enhance performance. Our experimental results show that hardware-based cache bypassing improves performance for most benchmarks, and that the hybrid method achieves performance comparable to state-of-the-art compiler-based bypassing at considerably lower profiling cost.
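A hardware-only bypass decision can be sketched with a per-load-PC reuse predictor. The policy below is a hypothetical example, not the paper's mechanism: the class `BypassingL1`, the 2-bit counters, the 4-line fully associative L1, and the 128-byte line size are assumptions. The `static_bypass_pcs` argument stands in for the hybrid method's profiling information.

```python
from collections import OrderedDict


class BypassingL1:
    """Tiny fully associative LRU L1 with a per-load-PC reuse predictor.

    Hypothetical policy: a 2-bit counter per load PC is trained when a line that
    PC inserted is evicted (up if it was reused, down if not); a PC whose counter
    reaches 0 bypasses L1 on a miss. PCs in static_bypass_pcs model the hybrid
    variant, where offline profiling marks loads to bypass unconditionally.
    """

    def __init__(self, num_lines=4, line_size=128, static_bypass_pcs=()):
        self.num_lines = num_lines
        self.line_size = line_size
        self.lines = OrderedDict()  # line address -> (inserting PC, reused?)
        self.counters = {}          # load PC -> saturating counter in [0, 3]
        self.static_bypass = set(static_bypass_pcs)
        self.hits = self.misses = self.bypasses = 0

    def access(self, pc, addr):
        line = addr // self.line_size
        if line in self.lines:
            self.hits += 1
            inserting_pc, _ = self.lines[line]
            self.lines[line] = (inserting_pc, True)
            self.lines.move_to_end(line)  # mark as most recently used
            return "hit"
        self.misses += 1
        if pc in self.static_bypass or self.counters.get(pc, 1) == 0:
            self.bypasses += 1  # serve from L2 without allocating in L1
            return "bypass"
        if len(self.lines) >= self.num_lines:
            _, (victim_pc, reused) = self.lines.popitem(last=False)  # evict LRU
            c = self.counters.get(victim_pc, 1)
            self.counters[victim_pc] = min(c + 1, 3) if reused else max(c - 1, 0)
        self.lines[line] = (pc, False)
        return "miss"


l1 = BypassingL1()
stream = [l1.access(0x10, i * 128) for i in range(10)]  # streaming load, no reuse
print(stream.count("bypass"))  # 5
```

One limitation of this simple policy is that a PC whose counter reaches 0 is never retrained, because bypassed accesses do not allocate lines; a practical design would need periodic counter resets or sampled insertions.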

Two-Level Scratchpad Memory Architectures to Achieve Time Predictability and High Performance

  • Liu, Yu;Zhang, Wei
    • Journal of Computing Science and Engineering / v.8 no.4 / pp.215-227 / 2014
  • In modern computer architectures, caches are widely used to narrow the gap between processor speed and memory access time. However, caches are time-unpredictable and can therefore significantly increase the complexity of worst-case execution time (WCET) analysis, which is crucial for real-time systems. This paper proposes a time-predictable two-level scratchpad-based architecture and an integer linear programming (ILP)-based algorithm for statically assigning memory objects, to support real-time computing. Moreover, to exploit the load/store latencies that are statically known in this architecture, we study a Scratch-pad Sensitive Scheduling method to further improve performance. Our experimental results indicate that the two-level scratchpad-based architecture outperforms a similar cache-based architecture in both performance and energy consumption for most of the benchmarks we studied.
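The static assignment problem can be illustrated with a toy cost model: each memory object has a size and an access count, each level has a fixed per-access latency, and the two scratchpads have capacity limits. The paper solves its assignment with ILP; this sketch deliberately swaps in exhaustive search, which finds the same optimum for this objective but only scales to a handful of objects. The latencies, capacities, and object list are hypothetical.

```python
from itertools import product

# Hypothetical per-access latencies in cycles; fixed latencies are what make
# scratchpads time-predictable.
LATENCY = {"L1_SPM": 1, "L2_SPM": 4, "DRAM": 40}


def assign_objects(objects, l1_capacity, l2_capacity):
    """objects: list of (name, size_bytes, access_count).

    Exhaustively tries every placement (3**n of them) and returns
    (total_latency, {name: placement}) for the cheapest one that fits.
    """
    best = None
    for placement in product(LATENCY, repeat=len(objects)):
        used = {"L1_SPM": 0, "L2_SPM": 0, "DRAM": 0}
        for (_, size, _), where in zip(objects, placement):
            used[where] += size
        if used["L1_SPM"] > l1_capacity or used["L2_SPM"] > l2_capacity:
            continue  # violates a scratchpad capacity constraint
        cost = sum(count * LATENCY[where]
                   for (_, _, count), where in zip(objects, placement))
        if best is None or cost < best[0]:
            best = (cost, {name: where for (name, _, _), where in zip(objects, placement)})
    return best


objs = [("a", 256, 1000), ("b", 512, 800), ("c", 1024, 50)]
print(assign_objects(objs, l1_capacity=512, l2_capacity=1024))
# (6200, {'a': 'L1_SPM', 'b': 'L2_SPM', 'c': 'DRAM'})
```

Because every latency in this model is fixed, the resulting cost is also a safe bound on memory stall time, which is the property WCET analysis relies on.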

Human Motion Recognition Based on Spatio-temporal Convolutional Neural Network

  • Hu, Zeyuan;Park, Sange-yun;Lee, Eung-Joo
    • Journal of Korea Multimedia Society / v.23 no.8 / pp.977-985 / 2020
  • To address complex feature extraction and low accuracy in human action recognition, this paper proposes a network structure that combines the batch normalization algorithm with the GoogLeNet network model. Carrying the batch normalization idea from image classification over to action recognition, it improves the algorithm by normalizing the network's input training samples over each mini-batch. For the convolutional network, RGB images serve as the spatial input and stacked optical flows as the temporal input. The spatial and temporal networks are then fused to obtain the final action recognition result. The architecture was trained and evaluated on the standard UCF101 and HMDB51 video action benchmarks, achieving accuracies of 93.42% and 67.82%, respectively. The results show that the improved convolutional neural network significantly improves the recognition rate and has clear advantages in action recognition.
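The two ingredients named above, mini-batch normalization and spatio-temporal fusion, can be sketched in plain Python. Batch normalization follows its standard training-time definition (a single scalar `gamma` and `beta` are used here for brevity, whereas real layers learn one per channel). The abstract does not say how the two streams are fused, so the weighted average of softmax scores in `fuse` is an assumption, as are the example logits.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Training-mode batch normalization: normalize each feature over the mini-batch."""
    n, dims = len(batch), len(batch[0])
    means = [sum(x[j] for x in batch) / n for j in range(dims)]
    variances = [sum((x[j] - means[j]) ** 2 for x in batch) / n for j in range(dims)]
    return [[gamma * (x[j] - means[j]) / math.sqrt(variances[j] + eps) + beta
             for j in range(dims)] for x in batch]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(spatial_logits, temporal_logits, w_spatial=0.5):
    """Late fusion (assumed): weighted average of per-stream class probabilities."""
    ps, pt = softmax(spatial_logits), softmax(temporal_logits)
    fused = [w_spatial * a + (1 - w_spatial) * b for a, b in zip(ps, pt)]
    return max(range(len(fused)), key=fused.__getitem__), fused


normalized = batch_norm([[1.0, 10.0], [3.0, 30.0]])  # each column now has mean 0
predicted, probs = fuse([2.0, 1.0, 0.0], [0.0, 3.0, 0.0])
print(predicted)  # 1
```

Here the RGB stream alone would pick class 0, but the more confident optical-flow stream shifts the fused decision to class 1, which is the kind of complementary evidence late fusion is meant to combine.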

A Study of CPLD Low Power Algorithm using Reduce Glitch Power Consumption (글리치 전력소모 감소를 이용한 CPLD 저전력 알고리즘 연구)

  • Hur, Hwa Ra
    • Journal of Korea Society of Digital Industry and Information Management / v.5 no.3 / pp.69-75 / 2009
  • In this paper, we propose a low-power CPLD algorithm that reduces glitch power consumption. The proposed algorithm generates feasible clusters by partitioning the circuit under the CLB constraints of the CPLD. Glitches are then removed within each feasible cluster by inserting delay buffers, and the same method is applied between feasible clusters. The proposed method was examined using the SIS benchmarks, and its power consumption was compared with that of a CLB-based low-power CPLD technology mapping algorithm for trade-off and with that of a low-power circuit design using a selective glitch removal method. The experimental results show power consumption reductions of 15% and 6% relative to these two previous methods.
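Delay-buffer insertion works by equalizing the arrival times of each gate's inputs, so that the inputs settle together and the gate does not produce spurious transitions. The sketch below assumes a unit-delay model (every gate and every buffer adds one time unit) and ignores the CLB clustering step; the function `balance_paths` and the two-gate example netlist are hypothetical, and the paper's actual buffer-placement rules may differ.

```python
from graphlib import TopologicalSorter


def balance_paths(fanins):
    """fanins: {gate: [input signals]}; primary inputs need no entry.

    Unit-delay model: each gate output arrives one unit after its latest input.
    Returns (arrival, buffers), where buffers[(src, gate)] is the number of
    delay buffers to insert on edge src -> gate so all inputs arrive together.
    """
    arrival, buffers = {}, {}
    for node in TopologicalSorter(fanins).static_order():  # inputs before gates
        inputs = fanins.get(node, [])
        if not inputs:
            arrival[node] = 0  # primary input
            continue
        latest = max(arrival[s] for s in inputs)
        for s in inputs:
            if arrival[s] < latest:
                buffers[(s, node)] = latest - arrival[s]  # pad the early input
        arrival[node] = latest + 1
    return arrival, buffers


# c = AND(a, b); d = OR(c, a): input a reaches d one unit before c does
arrival, buffers = balance_paths({"c": ["a", "b"], "d": ["c", "a"]})
print(buffers)  # {('a', 'd'): 1}
```

Each inserted buffer also consumes some power, which is why a practical method weighs the glitches removed against the buffers added.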

A Survey on Number Sense Performance of Sixth Graders (초등학교 6학년 학생의 수감각 실태 조사)

  • Sun, Chun-Hwa;Jeon, Pyung-Kook
    • The Mathematical Education / v.44 no.4 s.111 / pp.587-602 / 2005
  • The primary purpose of this study was to investigate the number sense performance of sixth graders and the characteristics of each of the five components of number sense they possess. For this purpose, two kinds of studies were conducted: a descriptive study using pencil-and-paper tests (a Basic Test and a Number Sense Test) and a clinical study using interviews. The conclusions drawn from the results of this study are as follows. First, students scored highly on the Basic Test but not equally highly on the Number Sense Test. Second, students hardly used benchmarks and gave little consideration to the reasonableness of computation results. The interviews showed that students' understanding of the meaning of fractions and of greater-than and less-than relations between fractions was weak, and that students tended not to use number sense but instead to apply standard algorithms and compute the numbers in the question without thinking.


A Wide-Window Superscalar Microprocessor Profiling Performance Model Using Multiple Branch Prediction (대형 윈도우에서 다중 분기 예측법을 이용하는 수퍼스칼라 프로세서의 프로화일링 성능 모델)

  • Lee, Jong-Bok
    • The Transactions of The Korean Institute of Electrical Engineers / v.58 no.7 / pp.1443-1449 / 2009
  • This paper presents a profiling model of a wide-window superscalar microprocessor that uses multiple branch prediction. The key idea is to apply a statistical profiling technique to a superscalar microprocessor with a wide instruction window and a multiple branch predictor. The statistical profiling data are used to generate a synthetic instruction trace, and the consecutive multiple branch prediction rates are used to run a trace-driven simulation on the synthesized trace. We describe our design and evaluate it with the SPEC 2000 integer benchmarks. On average, our performance model achieves an accuracy of 8.5%.
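The two steps described above, generating a synthetic trace from profiled statistics and applying consecutive multiple branch prediction rates, can be sketched as follows. The instruction mix, the per-position prediction rates, the fetch width, and the rule that a misprediction ends a fetch group are all hypothetical; the abstract does not give the paper's exact model.

```python
import random


def synthesize_trace(mix, length, seed=0):
    """Draw a synthetic instruction stream from a profiled class mix."""
    rng = random.Random(seed)
    classes, weights = zip(*mix.items())
    return rng.choices(classes, weights=weights, k=length)


def fetch_groups(trace, consec_rates, width, seed=0):
    """Split a trace into fetch groups of at most `width` instructions.

    consec_rates[k] is the probability that the (k+1)-th consecutive branch
    predicted in one cycle is correct; len(consec_rates) is the most branches
    the predictor handles per cycle. A misprediction ends the group.
    """
    rng = random.Random(seed)
    groups, current, branches = [], [], 0
    for inst in trace:
        current.append(inst)
        mispredicted = False
        if inst == "branch":
            mispredicted = rng.random() >= consec_rates[branches]
            branches += 1
        if mispredicted or len(current) == width or branches == len(consec_rates):
            groups.append(current)
            current, branches = [], 0
    if current:
        groups.append(current)
    return groups


mix = {"alu": 0.6, "load": 0.2, "branch": 0.2}  # hypothetical profile
trace = synthesize_trace(mix, 1000, seed=1)
groups = fetch_groups(trace, [0.95, 0.90, 0.85], width=8)
print(len(trace) / len(groups))  # average instructions per fetch group
```

Seeding both generators makes the synthetic run reproducible, so different predictor rate vectors can be compared on an identical trace.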

The plasma polymerized polymer thin films for application to organic thin film transistor (유기박막 트랜지스터로의 응용을 위한 플라즈마 중합 고분자 박막)

  • Lim, Jae-Sung;Shin, Paik-Kyun;Lee, Boong-Joo;You, Do-Hyun;Park, Se-Geun;Lee, El-Hang
    • Proceedings of the KIEE Conference / 2009.07a / pp.1353-1354 / 2009
  • The OTFT devices had an inverted staggered Au/pentacene/ppMMA/ITO structure on a PET substrate. The overall performance of these flexible devices, in terms of operating voltage, field-effect mobility, on/off ratio, and off current, is somewhat worse than that of devices fabricated on glass substrates. The pentacene/ppMMA OTFT benchmarks (mobility, sub-threshold slope, on/off ratio) were comparable to those of solution-cast PMMA but below average compared with other polymer gate dielectrics. However, the threshold and drive voltages were among the lowest reported for a polymer gate dielectric, surpassed only by ultra-thin SAM gate dielectrics.
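OTFT mobility and threshold-voltage benchmarks are commonly extracted from a saturation-regime transfer curve using the square-law model I_D = (W C_i μ / 2L)(V_G - V_T)^2, so that √|I_D| is linear in V_G. The sketch below fits that line by least squares; the abstract does not state the authors' extraction procedure, and the device dimensions, capacitance, and synthetic data in the example are hypothetical.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def saturation_mobility(vg, id_, width, length, c_i):
    """Fit sqrt(|I_D|) vs V_G in saturation (square-law model).

    Units: width/length in cm, c_i in F/cm^2, currents in A,
    giving mobility in cm^2/(V s) and V_T in volts.
    """
    slope, intercept = linear_fit(vg, [math.sqrt(abs(i)) for i in id_])
    mobility = 2 * length * slope ** 2 / (width * c_i)
    v_t = -intercept / slope  # x-intercept of the sqrt(|I_D|) line
    return mobility, v_t


def on_off_ratio(id_):
    magnitudes = [abs(i) for i in id_]
    return max(magnitudes) / min(magnitudes)


# Hypothetical p-type device: W = 1 mm, L = 50 um, C_i = 10 nF/cm^2, mu = 0.5, V_T = -2 V
vg = [-5.0 - step for step in range(16)]
k = 0.1 * 1e-8 * 0.5 / (2 * 0.005)
ids = [-k * (v + 2.0) ** 2 for v in vg]
print(saturation_mobility(vg, ids, width=0.1, length=0.005, c_i=1e-8))  # approximately (0.5, -2.0)
```

Because the extracted mobility scales with 1/C_i, gate dielectrics with higher capacitance reach a given drain current at lower voltages, which is consistent with the low threshold and drive voltages the abstract highlights.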


Analytical investigation of the surface effects on nonlocal vibration behavior of nanosize curved beams

  • Ebrahimi, Farzad;Daman, Mohsen
    • Advances in nano research / v.5 no.1 / pp.35-47 / 2017
  • This paper deals with the free vibration analysis of nanosize rings and arches with consideration of surface effects. The Gurtin-Murdoch model is employed to incorporate surface effect parameters, including surface density, while the small-scale effect is taken into account using Eringen's nonlocal elasticity theory. An analytical Navier solution is presented to solve the governing equations of motion. Comparison between the present results and those available in the literature confirms the accuracy of this method. It is explicitly shown that the vibration characteristics of curved nanosize beams are significantly influenced by surface density effects. Moreover, as the nonlocal parameter increases, the influence of surface density reduces to zero and the natural frequency reaches its classical value. Numerical results are presented to serve as benchmarks for future analyses of nanosize rings and arches.
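For reference, Eringen's nonlocal elasticity, which the abstract uses to capture the small-scale effect, is usually applied in its differential form. The block below states that standard relation together with a one-dimensional version commonly written along the arc-length coordinate s of a curved beam of radius R; how the paper couples it with the Gurtin-Murdoch surface terms is not given in the abstract, so this is background only. Setting the nonlocal parameter μ to zero recovers classical Hooke's law.

```latex
% Eringen's nonlocal constitutive relation (differential form)
\left[1 - (e_0 a)^2 \nabla^2\right] \sigma_{ij} = C_{ijkl}\, \varepsilon_{kl},
\qquad \mu = (e_0 a)^2

% One-dimensional form along the arc length s of a curved beam
\sigma_{ss} - \mu \frac{\partial^2 \sigma_{ss}}{\partial s^2} = E\, \varepsilon_{ss},
\qquad s = R\theta
```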