Integrated Search | Korea Science

Statistical Simulation for Superscalar DSP Processors

이종복
- Proceedings of the IEEK Conference / IEEK 2005 Fall Conference / pp.1217-1220 / 2005
In this paper, statistical simulation is applied to a superscalar digital signal processor architecture using DSP kernel and DSP application benchmarks. As a result, the performance of a digital signal processor with several microarchitecture configurations can be estimated with an average relative error of 3.7%.
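The accuracy figure in the abstract above is an average relative error. A minimal sketch of how such a figure can be computed is shown below, assuming performance is expressed as IPC and that the error is averaged over configurations; the function name and the IPC values are hypothetical and are not taken from the paper.

```python
def mean_relative_error(estimated, reference):
    """Average of |estimate - reference| / reference over paired measurements."""
    if len(estimated) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(e - r) / r for e, r in zip(estimated, reference))
    return total / len(reference)


# Hypothetical IPC values for three configurations, NOT results from the paper.
reference_ipc = [2.0, 1.6, 2.5]    # e.g. from a detailed simulator
estimated_ipc = [2.1, 1.52, 2.5]   # e.g. from statistical simulation
error = mean_relative_error(estimated_ipc, reference_ipc)
print(f"average relative error: {error:.1%}")
```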

Performance Analysis of Value Predictor Considering Instruction Issue Width in Superscalar Processor

전병찬;김혁진
- Journal of the Korea Computer Industry Society / Vol. 7, No. 3 / pp.171-178 / 2006
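In a superscalar processor, value prediction predicts the result of an instruction in advance and supplies that value early to the data-dependent instructions that follow, so that they can execute speculatively and performance improves. ILP processors predict values in advance and issue and execute instructions in parallel to improve instruction-level parallelism [4]. This paper evaluates value predictors at instruction issue widths of 4, 8, and 16 by measuring prediction rate, prediction accuracy, and performance improvement. The experiments showed a high performance improvement for the 8-issue configuration.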
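To make the measured quantities concrete, here is a minimal sketch of a last-value predictor driven by a toy trace. The abstract does not define its metrics or name its predictor, so the definitions used here are assumptions: prediction rate is taken as predictions attempted divided by instructions seen, and accuracy as correct predictions divided by predictions attempted. The trace and function name are hypothetical.

```python
def run_last_value_predictor(trace):
    """trace: iterable of (pc, result) pairs.

    Returns (prediction_rate, accuracy) under the assumed definitions above.
    """
    table = {}  # pc -> last result observed for that instruction
    total = attempted = correct = 0
    for pc, result in trace:
        total += 1
        if pc in table:            # a prediction is available for this pc
            attempted += 1
            if table[pc] == result:
                correct += 1
        table[pc] = result         # train the predictor with the real result
    prediction_rate = attempted / total if total else 0.0
    accuracy = correct / attempted if attempted else 0.0
    return prediction_rate, accuracy


# Hypothetical trace of (program counter, produced value) pairs.
trace = [(0x10, 5), (0x10, 5), (0x14, 1), (0x10, 5), (0x14, 2), (0x10, 7)]
rate, acc = run_last_value_predictor(trace)
print(f"prediction rate {rate:.2f}, accuracy {acc:.2f}")
```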

Thread Distribution Method of GP-GPU for Accelerating Parallel Algorithms

이관호;김치용
- Journal of IKEEE / Vol. 21, No. 1 / pp.92-95 / 2017
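This paper proposes a method for improving performance on a small-area GP-GPU. Rather than increasing scheduling complexity excessively, as superscalar designs do, it maximizes performance by increasing the number of simple cores. The structure of the Stream Processors that make up the GP-GPU is simplified. In addition, performance is improved by developing a thread allocation method for the warp scheduler that suits the application. To verify performance, a thread allocation scheme is proposed for deep learning, a branch of neural networks. For the neural network algorithm, performance improvements ranging from 90% relative to an Intel CPU to 98% relative to a 4-core ARM Cortex-A15 were observed.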
https://doi.org/10.7471/ikeee.2017.21.1.92

Design of Embedded Processor Architecture Applicable to Mobile Multimedia

이호석;한진호;배영환;조한진
- Journal of the Institute of Electronics Engineers of Korea SD / Vol. 41, No. 5 / pp.71-80 / 2004
This paper studies the basic architecture of an embedded processor for multimedia applications on mobile platforms. It focuses on a basic processor architecture suited to MPEG4 applications and on an architecture design that takes into account the energy efficiency required on mobile platforms. To choose the processor data path architecture (pipeline, branch prediction, multiple-issue superscalar, number of function units) and a suitable cache hierarchy and configuration, which are the basic implementation elements of a multimedia embedded processor, the paper uses MPEG4, a multimedia application, as the test bench for a processor simulator and simulates a variety of architectures. It then examines from an energy-efficiency standpoint whether each architecture suits mobile platforms and settles on an applicable basic processor architecture. The proposed basic architecture can be applied directly to mobile platforms and can also serve as the base processor architecture for configurable processor design in an automated design environment that targets optimal performance for a specific application.

On-Chip Multiprocessor with Simultaneous Multithreading

Park, Kyoung;Choi, Sung-Hoon;Chung, Yong-Wha;Hahn, Woo-Jong;Yoon, Suk-Han
- ETRI Journal / Vol. 22, No. 4 / pp.13-24 / 2000
As more transistors are integrated onto larger dies, an on-chip multiprocessor will become a promising alternative to the superscalar microprocessor that dominates today's microprocessor marketplace. This paper describes key parts of a new on-chip multiprocessor, called Raptor, which is composed of four 2-way superscalar processor cores and one graphic co-processor. To obtain performance characteristics of Raptor, a program-driven simulator and its programming environment were developed. The simulation results showed that Raptor can exploit thread-level parallelism effectively and offers a promising architecture for future on-chip multiprocessor designs.

Attributed AND-OR Graph for Synthesis of Superscalar Processor Simulator

Jun Kyoung Kim;Tag Gon Kim
- Proceedings of the Korea Society for Simulation Conference / Korea Society for Simulation 2003 Spring Conference Proceedings / pp.73-78 / 2003
This paper proposes a simulator synthesis scheme based on exploring the total design space in an attributed AND-OR graph. The attributed AND-OR graph is a systematic formalism for design space representation that can represent the entire design space through decomposition rules and specialization rules. In addition, attributes attached to the design entities provide flexible modeling. Based on this representation scheme, a pruning algorithm is given that transforms the total design space into a sub-design space satisfying the user requirements. We show the effectiveness of our framework by (i) constructing the design space of a superscalar processor in an attributed AND-OR graph, (ii) pruning it to obtain the ARM9 processor architecture, (iii) modeling the components of the architecture, and (iv) simulating the ARM9 model.
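The idea of a design space built from decomposition (AND) and specialization (OR) rules, then pruned by user requirements, can be sketched as follows. This is only an illustration under stated assumptions: the node names, attributes, and requirements are hypothetical, and the brute-force "enumerate then filter" step is a stand-in for the paper's pruning algorithm, which the abstract does not describe.

```python
from itertools import product

# Hypothetical design space. AND nodes decompose into all of their children;
# OR nodes specialize into exactly one child; leaves carry attributes.
GRAPH = {
    "processor": ("AND", ["fetch", "execute"]),
    "fetch": ("OR", ["fetch_1wide", "fetch_2wide"]),
    "execute": ("OR", ["in_order", "out_of_order"]),
}
ATTRS = {
    "fetch_1wide": {"issue_width": 1},
    "fetch_2wide": {"issue_width": 2},
    "in_order": {"out_of_order": False},
    "out_of_order": {"out_of_order": True},
}


def expand(node):
    """Enumerate every concrete design (a list of leaf names) under node."""
    if node not in GRAPH:
        return [[node]]  # leaf: a single one-element design
    kind, children = GRAPH[node]
    if kind == "OR":
        # Specialization: a design picks exactly one alternative.
        return [design for child in children for design in expand(child)]
    # Decomposition: a design combines one sub-design from every child.
    combos = product(*(expand(child) for child in children))
    return [sum(parts, []) for parts in combos]


def prune(designs, requirements):
    """Keep only designs whose merged leaf attributes meet the requirements."""
    kept = []
    for leaves in designs:
        merged = {}
        for leaf in leaves:
            merged.update(ATTRS[leaf])
        if all(merged.get(key) == value for key, value in requirements.items()):
            kept.append(leaves)
    return kept


all_designs = expand("processor")
selected = prune(all_designs, {"issue_width": 1, "out_of_order": False})
print(len(all_designs), selected)
```

Enumerating the whole space only works for tiny graphs; a practical pruning pass would discard alternatives at the OR nodes before combining them.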

Embedded Multithreading Processor Architecture for Personal Information Devices

정하영;정원영;이용석
- Journal of the Institute of Electronics Engineers of Korea SD / Vol. 47, No. 9 / pp.7-13 / 2010
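This paper proposes a processor architecture suited to personal information devices such as smartphones and tablet PCs. Because developing a high-performance embedded processor requires architectural changes and carries large overhead, industry has concentrated on high-performance embedded processors with high operating frequencies. Among high-performance processor architectures, out-of-order superscalar increases hardware complexity excessively while delivering a comparatively small performance gain, so it is not suitable for embedded applications. A high-performance embedded processor architecture with low hardware complexity is therefore needed. This paper proposes a new SMT (Simultaneous Multi-Threading) architecture with lower complexity than scalar, superscalar, and multiprocessor approaches. Because recent personal information devices run many tasks at the same time, SMT and CMP are well suited to them. Simulation results also indicate that SMT is the most efficient of the processor architectures compared.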

Conditional Branch Optimization in the Compilers for Superscalar Processors

김명호;최완
- The Transactions of the Korea Information Processing Society / Vol. 2, No. 2 / pp.264-276 / 1995
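This paper presents an optimization technique that eliminates conditional branch instructions in compilers for superscalar processors. Branches are eliminated in stages: each branch is first transformed into an arithmetic expression using algebraic rules, the instruction sequences corresponding to that expression are then searched exhaustively with Granlund/Kenner's GSO, and the sequence with the smallest dynamic count when executed on the target processor is selected. In experiments with the SuperSPARC processor and the GNU C compiler, conditional branches in the input program that matched the optimization patterns showed an additional execution-time improvement of 25% or more over the best code sequence generated by the original compiler.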
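The first stage described above, rewriting a branch as an arithmetic expression, can be illustrated with two classic identities. This is a hand-written sketch, not output of GSO and not a sequence taken from the paper; the function names are hypothetical, and Python is used only to show the algebra (a compiler would emit the equivalent machine instructions).

```python
def abs_branchy(x):
    """Reference version with a conditional branch."""
    if x < 0:
        return -x
    return x


def abs_branchless(x):
    """Branch-free absolute value, assuming x fits in a 32-bit signed integer."""
    mask = x >> 31            # arithmetic shift: -1 if x is negative, else 0
    return (x ^ mask) - mask  # negative x: (~x) + 1 == -x; otherwise x


def max_branchless(a, b):
    """Branch-free maximum built from a comparison result used as a mask."""
    mask = -(a < b)                # -1 (all ones) if a < b, else 0
    return a ^ ((a ^ b) & mask)    # selects b when mask is all ones, else a


for value in (-7, 0, 12):
    print(value, abs_branchy(value), abs_branchless(value))
print(max_branchless(3, 9), max_branchless(9, 3))
```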

The Performance Analysis of Distributed Reorder Buffer Superscalar Processor using Queuing Model

백석균;정진하;신광식;최상방
- Proceedings of the IEEK Conference / IEEK 2005 Fall Conference / pp.1087-1090 / 2005
In all contemporary superscalar processors, the result repositories are implemented as Reorder Buffer (ROB) slots. In such designs, the ROB is a large multi-ported structure. There are several approaches to reducing ROB complexity. One technique relies on a distributed implementation that spreads the centralized ROB structure across the function units (FUs). Each distributed component is sized to match the FU workload and has one write port and one read port. We use M/M/1 queuing theory to determine the number of entries in each ROB component, on which processor performance depends. Our schemes are evaluated by simulating the CPU2000 benchmarks.
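A rough sketch of how M/M/1 results can guide buffer sizing follows. It uses the standard M/M/1 formulas: utilization rho = lambda / mu, mean number in system rho / (1 - rho), and P(N > k) = rho^(k+1). The sizing rule (choose the smallest k whose overflow probability is under a target) and the rates are assumptions made for illustration; the abstract does not state the criterion the authors used.

```python
def mm1_metrics(arrival_rate, service_rate):
    """Return (utilization, mean number in system) for a stable M/M/1 queue."""
    if not 0 < arrival_rate < service_rate:
        raise ValueError("need 0 < arrival_rate < service_rate for stability")
    rho = arrival_rate / service_rate
    return rho, rho / (1 - rho)


def entries_for_overflow(arrival_rate, service_rate, max_overflow_prob):
    """Smallest k such that P(N > k) = rho**(k + 1) <= max_overflow_prob.

    Uses the infinite-buffer M/M/1 tail as an approximation of how often a
    k-entry component would be full (an assumed sizing rule).
    """
    if not 0 < max_overflow_prob < 1:
        raise ValueError("max_overflow_prob must be between 0 and 1")
    rho, _ = mm1_metrics(arrival_rate, service_rate)
    k = 0
    while rho ** (k + 1) > max_overflow_prob:
        k += 1
    return k


# Hypothetical rates for one function unit: 0.5 results arrive per cycle and
# 1.0 results can be retired per cycle.
rho, mean_entries = mm1_metrics(0.5, 1.0)
entries = entries_for_overflow(0.5, 1.0, 0.01)
print(rho, mean_entries, entries)
```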

Measurement and Analysis of Power Dissipation of Value Speculation in Superscalar Processors

이상정;이명근;신화정
- Journal of KIISE: Computer Systems and Theory / Vol. 30, No. 12 / pp.724-735 / 2003
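Recent high-performance superscalar processors predict the result values of instructions in advance and speculatively execute dependent instructions in order to overcome the data dependences between instructions that limit instruction-level parallelism (ILP). Speculative execution with value prediction improves performance, but the frequent lookups and updates of the value prediction table consume additional power. This paper measures and analyzes the relationship between the performance gain from value prediction and the additional power consumption. It also uses a confidence counter to throttle value prediction attempts, which controls the degree of speculation, and selectively predicts only useful instructions with high prediction success rates, reducing the additional power consumption while maintaining performance. To verify the proposed scheme, a power model is combined with a cycle-level simulator to build a tool that measures not only the functional-level behavior of the processor but also its total power consumption and its power consumption per cycle.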
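The gating role of a confidence counter can be sketched as below. The counter width, the threshold, the reset-on-misprediction policy, and the use of a last-value predictor are all assumptions chosen for illustration; the abstract does not give the paper's parameters.

```python
class ConfidenceCounter:
    """Saturating counter; a prediction is used only at or above threshold."""

    def __init__(self, bits=2, threshold=2):
        self.max_value = (1 << bits) - 1
        self.threshold = threshold
        self.value = 0

    def confident(self):
        return self.value >= self.threshold

    def update(self, was_correct):
        if was_correct:
            self.value = min(self.value + 1, self.max_value)
        else:
            self.value = 0  # assumed policy: reset on a misprediction


def simulate(results, bits=2, threshold=2):
    """results: successive values produced by one static instruction.

    Returns (predictions_used, correct_predictions_used) for a last-value
    predictor whose predictions are gated by the confidence counter.
    """
    counter = ConfidenceCounter(bits, threshold)
    last = None
    used = correct_used = 0
    for value in results:
        if last is not None:
            hit = last == value
            if counter.confident():   # speculate only when confident
                used += 1
                correct_used += int(hit)
            counter.update(hit)       # train on every outcome
        last = value
    return used, correct_used


# Hypothetical value stream: stable, then one change.
print(simulate([3, 3, 3, 3, 9, 3]))
```

Raising the threshold trades fewer speculative executions (and fewer table-driven mispredictions) against fewer useful predictions, which is the knob the abstract describes.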

58 search results; processing time 0.028 s
