• Title/Summary/Keyword: superscalar architecture

Search Result 39, Processing Time 0.029 seconds

Superscalar RISC Microprocessor Architecture with enhanced Multimedia Instructions (멀티미디어 명령어를 강화한 수퍼스칼라 RISC 마이크로프로세서 구조)

  • 이용환;문병인;이용석
    • Proceedings of the IEEK Conference
    • /
    • 1999.11a
    • /
    • pp.931-934
    • /
    • 1999
  • For applications in multimedia to which genuine RISC microprocessors are not suitably applicable, a new generation of fast and flexible microprocessors is required. In this paper, as a technique of integrating DSP functionality in a general RISC processor, a RISC that can execute DSP extension instructions is developed to improve the performance of multimedia application execution. This processor can execute DSP instructions in parallel with the execution of ALU instructions for efficient and fast execution. In addition, the execution ability of integer instructions is improved by enhancing the RISC core itself.

  • PDF

A Processor Architecture for 802.11 Wireless LAN Environment (802.11 Wireless LAN 환경에 적합한 프로세서 구조)

  • 전성재;홍인표;이용주;이용석;정진우
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2004.10a
    • /
    • pp.550-552
    • /
    • 2004
  • 최근 휴대폰, PDA, 노트북 등의 모바일 제품의 인기에 따라 모바일에 대한 소비자의 관심이 증대되고 있으며, 대형 네트워크 장비보다 소형의 개인 휴대용의 모바일 제품의 성장세가 두드러지고 있다. 이러한 추세에 따라 무선랜에 대한 관심도 증대되고 있다. 본 논문에서는 기존의 ARM 프로세서를 기반으로 802..11 무선랜 환경에 맞는 네트워크 프로세서 구조에 대한 연구를 수행하였다. 그 결과 전송과 수신이 빈번하게 동시에 일어나는 무선랜 환경에서는 multi-threading을 처리할 수 있는 프로세서가 구조(SMT)가 Superscalar 구조에 비해 높은 성능 향상 폭을 보여주었다

  • PDF

A Study on Processor Monitoring for Integration Test of Flight Control Computer equipped with A Modern Processor (최신 프로세서 탑재 비행제어 컴퓨터의 통합시험을 위한 프로세서 모니터링 연구)

  • Lee, Cheol;Kim, Jae-Cheol;Cho, In-Jae
    • Journal of Institute of Control, Robotics and Systems
    • /
    • v.14 no.10
    • /
    • pp.1081-1087
    • /
    • 2008
  • This paper describes limitations and solutions of the existing processor-monitoring concept for a military supersonics aircraft Flight Control Computer (FLCC) equipped with modern architecture processor to perform the system integration test. Safecritical FLCC integration test, which requires automatic test for thousands of test cases and real-time input/output test condition generation, depends on the processor-monitoring device called Processor Interface (PI). The PI, which relies upon on the FLCC processor's external address and data-bus data, has some limitations due to multi-fetching capability of the modern sophisticated military processors, like C6000's VLIW (Very-Long Instruction Word) architecture and PowerPC's Superscalar architecture. Several techniques for limitations were developed and proper monitoring approach was presented for modem processor-adopted FLCC system integration test.

A Performance Study on Many-core Processor Architectures with SPEC Benchmark Programs (SPEC 벤치마크 프로그램에 대한 매니코어 프로세서의 성능 연구)

  • Lee, Jongbok
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.62 no.2
    • /
    • pp.252-256
    • /
    • 2013
  • In order to overcome the complexity and performance limit problems of superscalar processors, the multi-core architecture has been prevalent recently. Usually, the number of cores mostly used for the multi-core processor architecture ranges from 2 to 16. However in the near future, more than 32-cores are likely to be utilized, which is called as many-core processor architecture. Using SPEC 2000 benchmarks as input, the trace-driven simulation has been performed for the 32 to 1024 many-core architectures extensively. For 1024-cores, the average performance scores 15.7 IPC, but the performance increase rate is saturated.

Exploiting Parallelism in the Block Encryption Algorithms RC6 and Rijndael (블록 암호화 알고리즘 RC6 및 Rijndael에서의 병렬성 활용)

  • 정용화;정교일;손승원
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.11 no.2
    • /
    • pp.3-12
    • /
    • 2001
  • Currently, the superscalar architecture dominates todays microprocessor marketplase. As, more transistors are integrated onto larger die, however, an on-chip multiprocessor is regarded as a promising alternative to the superscalar microprocessor. This paper examines the behavior of the next generation block encryption algorithms RC6 and Rijndael on the on-chip multiprocessing microprocessor. Based on the simulation results by using a program-driven simulator, the on-chip multiprocessor can exploit thread level parallelism effectively and overcome the limitation of instruction level parallelism in the next generation block encryption algorithms.

Design of a high-performance floating-point unit adopting a new divide/square root implementation (새로운 제산/제곱근기를 내장한 고성능 부동 소수점 유닛의 설계)

  • Lee, Tae-Young;Lee, Sung-Youn;Hong, In-Pyo;Lee, Yong-Surk
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.37 no.12
    • /
    • pp.79-90
    • /
    • 2000
  • In this paper, a high-performance floating point unit, which is suitable for high-performance superscalar microprocessors and supports IEEE 754 standard, is designed. Floating-point arithmetic unit (AU) supports all denormalized number processing through hardware, while eliminating the additional delay time due to the denormalized number processing by proposing the proposed gradual underflow prediction (GUP) scheme. Contrary to the existing fixed-radix implementations, floating-point divide/square root unit adopts a new architecture which determines variable length quotient bits per cycle. The new architecture is superior to the SRT implementations in terms of performance and design complexity. Moreover, sophisticated exception prediction scheme enables precise exception to be implemented with ease on various superscalar microprocessors, and removes the stall cycles in division. Designed floating-point AU and divide/square root unit are integrated with and instruction decoder, register file, memory model and multiplier to form a floating-point unit, and its function and performance is verified.

  • PDF

Efficient CHAM-Like Structures on General-Purpose Processors with Changing Order of Operations (연산 순서 변경에 따른 범용 프로세서에서 효율적인 CHAM-like 구조)

  • Myoungsu Shin;Seonkyu Kim;Hanbeom Shin;Insung Kim;Sunyeop Kim;Donggeun Kwon;Deukjo Hong;Jaechul Sung;Seokhie Hong
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.34 no.4
    • /
    • pp.629-639
    • /
    • 2024
  • CHAM is designed with an emphasis on encryption speed, considering that in the ISO/IEC standard block cipher operation mode, encryption functions are used more often than decryption functions. In the superscalar architecture of modern general-purpose processors, different ordering of operations can lead to different processing speeds, even if the computation configuration is the same. In this paper, we analyze the implementation efficiency and security of CHAM-like structures, which rearrange the order of operations in the ARX-based block cipher CHAM, for single-block and parallel implementations in a general-purpose processor environment. The proposed structures are at least 9.3% and at most 56.4%efficient in terms of encryption speed. The security analysis evaluates the resistance of the CHAM-like structures to differential and linear attacks. In terms of security margin, the difference is 3.4% for differential attacks and 6.8%for linear attacks, indicating that the security strength is similar compared to the efficiency difference. These results can be utilized in the design of ARX-based block ciphers.

Simulation of YUV-Aware Instructions for High-Performance, Low-Power Embedded Video Processors (고성능, 저전력 임베디드 비디오 프로세서를 위한 YUV 인식 명령어의 시뮬레이션)

  • Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.13 no.5
    • /
    • pp.252-259
    • /
    • 2007
  • With the rapid development of multimedia applications and wireless communication networks, consumer demand for video-over-wireless capability on mobile computing systems is growing rapidly. In this regard, this paper introduces YUV-aware instructions that enhance the performance and efficiency in the processing of color image and video. Traditional multimedia extensions (e.g., MMX, SSE, VIS, and AltiVec) depend solely on generic subword parallelism whereas the proposed YUV-aware instructions support parallel operations on two-packed 16-bit YUV (6-bit Y, 5-bits U, V) values in a 32-bit datapath architecture, providing greater concurrency and efficiency for color image and video processing. Moreover, the ability to reduce data format size reduces system cost. Experiment results on a representative dynamically scheduled embedded superscalar processor show that YUV-aware instructions achieve an average speedup of 3.9x over the baseline superscalar performance. This is in contrast to MMX (a representative Intel#s multimedia extension), which achieves a speedup of only 2.1x over the same baseline superscalar processor. In addition, YUV-aware instructions outperform MMX instructions in energy reduction (75.8% reduction with YUV-aware instructions, but only 54.8% reduction with MMX instructions over the baseline).

A Prefetch Architecture with Efficient Branch Prediction for a 64-bit 4-way Superscalar Microprocessor (64비트 4-way 수퍼스칼라 마이크로프로세서의 효율적인 분기 예측을 수행하는 프리페치 구조)

  • 문상국;문병인;이용환;이용석
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.25 no.11B
    • /
    • pp.1939-1947
    • /
    • 2000
  • 본 논문에서는 명령어의 효율적인 페치를 위해 분기 타겟 주소 전체를 사용하지 않고 캐쉬 메모리(cache memory) 내의 적은 비트 수로 인덱싱 하여 한 클럭 사이클 안에 최대 4개의 명령어를 다음 파이프라인으로 보내줄 수 있는 방법을 제시한다. 본 프리페치 유닛은 크게 나누어 3개의 영역으로 나눌 수 있는데, 분기에 관련하여 미리 부분적으로 명령어를 디코드 하는 프리디코드(predecode) 블록, 타겟 주소(NTA : Next Target Address) 테이블 영역을 추가시킨 명령어 캐쉬(instruction cache) 블록, 전체 유닛을 제어하고 가상 주소를 관리하는 프리페치(prefetch) 블록으로 나누어진다. 사용된 명령어들은 SPARC(Scalable Processor ARChitecture) V9에 기준 하였고 구현은 Verilog-HDL(Hardwave Description Language)을 사용하여 기능 수준으로 기술되고 검증되었다. 구현된 프리페치 유닛은 명령어 흐름에 분기가 존재하더라도 단일 사이클 안에 4개까지의 명령어들을 정확한 예측 하에 다음 파이프라인으로 보내줄 수 있다. 또한 NTA를 사용한 방법은 같은 수의 레지스터 비트를 사용하였을 때 BTB(Branch Target Buffer)를 사용하는 방법과 비교하여 2배정도 많은 개수의 분기 명령 주소를 저장할 수 있는 장점이 있다.

  • PDF

A Study of Trace-driven Simulation for Multi-core Processor Architectures (멀티코어 프로세서의 명령어 자취형 모의실험에 대한 연구)

  • Lee, Jong-Bok
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.12 no.3
    • /
    • pp.9-13
    • /
    • 2012
  • In order to overcome the complexity and power problems of superscalar processors, the multi-core architecture has been prevalent recently. Although the execution-driven simulation is wide spread, the trace-driven simulation has speed advantages over the execution-driven simulation. We present a methodology to simulate multi-core architecture using trace-driven simulator. Using SPEC 2000 benchmarks as input, the trace-driven simulation has been performed for the cores ranging from 2 to 16 extensively. As a result, the 16-core processor resulted in 4.1 IPC and 13.3 times speed up over single-core processor on the average.