Search | Korea Science

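Performance Analysis and Comparison of Bus Arbitration Protocols for Multiprocessor Computer Systems (다중프로세서 컴퓨터시스템을 위한 버스중재 프로토콜의 성능 분석 및 비교)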

김병량
- Proceedings of the Korea Society for Simulation Conference
- 1992.10a
- pp.2-2
- 1992
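As computers are used in more and more fields and the demand for higher computing power grows, much research is under way to improve computer performance, both by making processors faster and by improving system architecture. Mid-range super-minicomputers with a multiprocessor architecture, in which several CPUs coexist in one system, commonly adopt a bus as their interconnection network. A bus is simple hardware and easy to implement, but because multiple system resources (processors, memory modules, and I/O modules) share it, contention causes delays. Ways to reduce the resulting performance loss include increasing the number of buses and designing an optimal control protocol. This study analyzes and compares the performance of four representative bus arbitration protocols in a multiprocessor system with multiple buses, in order to identify the best protocol. For systems built from such large-scale hardware, analyzing and comparing performance across the main design parameters is an essential step in the design phase. Because building hardware for such analysis is time-consuming and expensive, software simulation is widely used instead. Our research team developed a multiprocessor system simulator in the simulation language SLAM II, built so that the bus arbitration protocol can be changed easily, and compared the performance of each protocol. The protocols compared are the fixed-priority scheme, the FIFO (first-in first-out) scheme, the round-robin scheme, and the rotating-priority scheme. The experiments varied the main system parameters (the numbers of processors, memory modules, and buses) to analyze a range of system configurations. The workload, the inter-memory access request time interval, was modeled as either a fixed value or a probability distribution, as needed. In particular, the study verified that the relative performance of the protocols can differ with the characteristics of the program being executed, and it also compared the protocols with respect to memory locality.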
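The arbitration policies compared above can be sketched as simple grant functions. This is a minimal Python sketch, not the paper's SLAM II simulator: it assumes a single bus, requesters numbered from 0, and a fixed priority in which index 0 always wins. The names `fixed_priority`, `RoundRobinArbiter`, `FifoArbiter`, and `count_grants` are hypothetical. The rotating-priority scheme is not sketched, since the abstract does not say how it differs from round-robin.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


def fixed_priority(requests: List[bool]) -> Optional[int]:
    """Grant the requester with the lowest index (index 0 = highest fixed priority)."""
    for i, req in enumerate(requests):
        if req:
            return i
    return None


class RoundRobinArbiter:
    """Priority passes in circular order: the search starts just after the last winner."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.last = n - 1  # so the first search begins at requester 0

    def grant(self, requests: List[bool]) -> Optional[int]:
        for k in range(1, self.n + 1):
            i = (self.last + k) % self.n
            if requests[i]:
                self.last = i
                return i
        return None


class FifoArbiter:
    """Grant the bus in order of request arrival."""

    def __init__(self) -> None:
        self.queue: Deque[int] = deque()

    def request(self, requester: int) -> None:
        if requester not in self.queue:  # at most one outstanding request each
            self.queue.append(requester)

    def grant(self) -> Optional[int]:
        return self.queue.popleft() if self.queue else None


def count_grants(n: int, cycles: int) -> Tuple[List[int], List[int]]:
    """Every requester asks for the bus every cycle; count wins per policy."""
    all_req = [True] * n
    fixed = [0] * n
    rr = [0] * n
    arbiter = RoundRobinArbiter(n)
    for _ in range(cycles):
        fixed[fixed_priority(all_req)] += 1
        rr[arbiter.grant(all_req)] += 1
    return fixed, rr
```

With every processor requesting on every cycle, `count_grants(4, 8)` returns `([8, 0, 0, 0], [2, 2, 2, 2])`: fixed priority starves the low-priority processors, while round-robin shares the bus evenly.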

Processor Design Technique for Low-Temperature Filter Cache (필터 캐쉬의 저온도 유지를 위한 프로세서 설계 기법)

Choi, Hong-Jun;Yang, Na-Ra;Lee, Jeong-A;Kim, Jong-Myon;Kim, Cheol-Hong
- Journal of the Korea Society of Computer and Information
- v.15 no.1
- pp.1-12
- 2010
Processor performance has improved dramatically in recent years. Unfortunately, as process technology scales down, the energy a processor consumes rises significantly even as its performance continues to improve. Moreover, the increased power density sharply raises the processor's peak temperature, causing serious thermal problems. For this reason, performance, energy consumption, and thermal behavior must be considered together when designing modern processors. This paper proposes three modified filter cache schemes to alleviate the thermal problem in the filter cache, one of the most energy-efficient design techniques for hierarchical memory systems: the Bypass Filter Cache (BFC), the Duplicated Filter Cache (DFC), and the Partitioned Filter Cache (PFC). BFC accesses the L1 cache directly whenever the filter cache temperature exceeds a threshold, which lowers the filter cache temperature. DFC lowers the filter cache temperature by adding another filter cache alongside the existing one. PFC builds the filter cache from two half-size filter caches, lowering its temperature by reducing the access frequency. According to our simulations using Wattch and HotSpot, the proposed partitioned filter cache achieves the lowest peak filter cache temperature, leading to higher processor reliability.
https://doi.org/10.9708/jksci.2010.15.1.001
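The BFC rule (bypass the filter cache while it is above a temperature threshold) can be illustrated with a toy thermal model. This Python sketch is an assumption-laden illustration, not the authors' Wattch/HotSpot setup: the first-order heating and cooling model, all of its constants, and the names `ThermalModel` and `run_bfc` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ThermalModel:
    """Toy first-order thermal model; parameters are placeholders, not Wattch/HotSpot output."""
    ambient: float = 45.0         # degrees C (assumed)
    temp: float = 45.0
    heat_per_access: float = 0.5  # degrees added by one access (assumed)
    retain: float = 0.9           # fraction of excess heat kept per cycle (assumed)

    def step(self, accessed: bool) -> None:
        # Cool toward ambient, then add heat if the structure was accessed.
        self.temp = self.ambient + (self.temp - self.ambient) * self.retain
        if accessed:
            self.temp += self.heat_per_access


def run_bfc(n_cycles: int, threshold: float) -> Tuple[int, int, float]:
    """One access per cycle; send it to L1 while the filter cache is above threshold.

    Returns (accesses served by the filter cache, accesses bypassed to L1,
    peak filter cache temperature).
    """
    fc = ThermalModel()
    to_filter = to_l1 = 0
    peak = fc.temp
    for _ in range(n_cycles):
        use_filter = fc.temp <= threshold
        if use_filter:
            to_filter += 1
        else:
            to_l1 += 1  # bypass: access the L1 cache directly
        fc.step(accessed=use_filter)
        peak = max(peak, fc.temp)
    return to_filter, to_l1, peak
```

With these placeholder constants, a filter cache that is never bypassed heats toward 50 °C, while `run_bfc(1000, 48.0)` holds the peak at or below roughly 48.2 °C at the cost of sending some accesses to the L1 cache.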

Operating Systems Research for the Embedded Multi-core Platforms (임베디드 멀티코어 플랫폼을 위한 운영체제 연구)

Hong, Cheol-Ho;Yoo, Chuck
- Proceedings of the Korean Information Science Society Conference
- 2008.06b
- pp.327-330
- 2008
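As Moore's law has recently begun to break down, multicore processors are increasingly used, and they have become common in embedded environments as well. If operating systems with the AMP or SMP structures originally developed for multiprocessors are applied to such multicore environments, it is difficult to exploit the advantages of multicore. Through an analysis of existing operating system structures, this paper shows that the virtual machine structure is the operating system structure best suited to multicore, and it presents the design of a virtual machine monitor for multicore platforms that can be used in industry.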

A Study on Power Dissipation of The Microprocessor Based on Trace-Driven Simulation (명령어 자취형 모의실험을 기반으로 하는 마이크로프로세서의 전력 소비에 대한 연구)

Lee, Jongbok
- The Journal of the Institute of Internet, Broadcasting and Communication
- v.16 no.5
- pp.191-196
- 2016
Power dissipation has recently become a significant issue not only in embedded systems and mobile devices but also in modern high-end processors. In particular, the widespread use of smartphones and tablet PCs makes low power consumption in microprocessors essential. In this paper, a fast power estimation tool for a high-performance microprocessor has been developed on top of a trace-driven simulator. The processor power model covers complex combinational circuits, array structures, and CAM structures. Using the SPEC 2000 benchmarks as input, trace-driven simulation was performed to estimate the average power dissipation of each program.
https://doi.org/10.7236/JIIBC.2016.16.5.191
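A trace-driven power estimate reduces to counting how often each structure is used and weighting those counts by a per-access energy. The Python sketch below assumes the instruction trace has already been converted into per-cycle lists of active units; the unit names, the energy values in `ENERGY_NJ`, and the function `average_power_watts` are placeholders, not the paper's power model.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical per-access energies in nanojoules, grouped by the three
# structure classes named in the abstract. Real values would come from a
# circuit-level model; these are placeholders.
ENERGY_NJ: Dict[str, float] = {
    "alu": 0.20,          # complex combinational logic
    "regfile": 0.10,      # array structure
    "icache": 0.30,       # array structure
    "dcache": 0.35,       # array structure
    "issue_queue": 0.15,  # CAM structure (wakeup/select)
}


def average_power_watts(trace: Iterable[List[str]], freq_hz: float) -> float:
    """Each trace entry lists the units active in one clock cycle."""
    counts: Counter = Counter()
    cycles = 0
    for active_units in trace:
        counts.update(active_units)
        cycles += 1
    if cycles == 0:
        return 0.0
    energy_j = sum(ENERGY_NJ[unit] * n for unit, n in counts.items()) * 1e-9
    seconds = cycles / freq_hz
    return energy_j / seconds
```

For example, the three-cycle trace `[["alu", "regfile"], ["icache"], []]` at 1 GHz consumes 0.6 nJ in 3 ns, an average of about 0.2 W.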

TDX-10 Call Processing Software Architecture (TDX-10 호처리 소프트웨어 구조)

An, Ji-Hwan;Jeong, Dong-Su
- ETRI Journal
- v.14 no.4
- pp.56-66
- 1992
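The TDX-10 exchange, which serves large numbers of subscribers and trunks, must handle every subscriber's call requests in real time. To handle these requests efficiently, the call processing software was modularized into a distributed structure: physically distributed across multiple processors and logically distributed across multiple processes. The processors comprise several processors for function sharing and load distribution, and the processes comprise permanent 'module processes' and 'call processes', which are created on a subscriber line or trunk during a call to control that call. This modular structure, based on function sharing and load distribution, makes functions easy to design and implement and makes it easy to add new functions or improve existing ones. This paper describes a call processing software structure and modularization that increase program productivity and make functions easy to extend and improve.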
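The split between permanent module processes and per-call call processes can be sketched as follows. This Python sketch is a hypothetical illustration, not TDX-10 code: the call states and events, the least-loaded placement rule standing in for load distribution, and the names `CallProcess`, `ModuleProcess`, `new_call`, and `release` are all assumptions.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple

# Hypothetical call states and events, for illustration only.
TRANSITIONS = {
    ("idle", "offhook"): "dialing",
    ("dialing", "digits"): "ringing",
    ("ringing", "answer"): "talking",
    ("talking", "onhook"): "released",
}


@dataclass
class CallProcess:
    """Created per call on a subscriber line or trunk; controls only that call."""
    call_id: int
    line: str
    state: str = "idle"

    def handle(self, event: str) -> str:
        # Unknown (state, event) pairs leave the state unchanged.
        self.state = TRANSITIONS.get((self.state, event), self.state)
        return self.state


@dataclass
class ModuleProcess:
    """Permanent process: creates call processes and spreads them over processors."""
    processors: List[str]
    load: Dict[str, int] = field(default_factory=dict)
    ids: Iterator[int] = field(default_factory=itertools.count)

    def new_call(self, line: str) -> Tuple[str, CallProcess]:
        # Load distribution: place the new call on the least-loaded processor.
        target = min(self.processors, key=lambda p: self.load.get(p, 0))
        self.load[target] = self.load.get(target, 0) + 1
        return target, CallProcess(next(self.ids), line)

    def release(self, processor: str) -> None:
        # Called when a call process terminates.
        self.load[processor] -= 1
```

Keeping the per-call state inside a short-lived call process means the permanent module process only tracks counts, so adding a new call feature touches the call-process logic rather than the module structure.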

Performance Analysis of Caching Instructions on SVLIW Processor and VLIW Processor (SVLIW 프로세서와 VLIW 프로세서의 명령어 캐싱에 따른 성능 분석)

Ji, Sung-Hyun;Park, No-Kwang;Kim, Suk-Il
- Journal of IKEEE
- v.1 no.1 s.1
- pp.101-110
- 1997
SVLIW processor architectures can resolve resource conflicts and data dependencies between instructions while scheduling VLIW instructions at run time. As a result, NOP-filled long instruction words can be removed from the object code generated for the processor, so an SVLIW processor should suffer fewer cache misses than a VLIW processor with the same cache size. Fewer cache misses mean fewer memory accesses, which shortens the total number of cycles needed to complete an application compared with the VLIW processor. This advantage ultimately compensates for the SVLIW processor's longer instruction pipeline relative to the VLIW processor. In this paper, we formulate and compare execution-cycle models for the two architectures. Simulation results show that the longer the memory access time on a cache miss, the greater the SVLIW processor's advantage in total execution cycles over the VLIW processor.
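The trade-off described above (fewer instruction cache misses versus a longer pipeline) can be made concrete with a first-order cycle count. This Python sketch is not the paper's formulation: the three-term model, the class name `CycleModel`, and all the numbers are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class CycleModel:
    """Hypothetical first-order execution-cycle model (not the paper's formulas)."""
    issued_words: int    # instruction words executed
    branches: int        # branches that flush the pipeline
    branch_penalty: int  # cycles lost per flush; grows with pipeline depth
    icache_misses: int   # instruction cache misses

    def total_cycles(self, miss_penalty: int) -> int:
        return (self.issued_words
                + self.branches * self.branch_penalty
                + self.icache_misses * miss_penalty)


# Illustrative numbers only: the SVLIW code carries no NOP words, so it misses
# less often, but its deeper run-time-scheduling pipeline costs more per flush.
vliw = CycleModel(issued_words=10_000, branches=1_000, branch_penalty=2, icache_misses=1_000)
svliw = CycleModel(issued_words=10_000, branches=1_000, branch_penalty=3, icache_misses=600)

for penalty in (1, 5, 10, 20):
    v, s = vliw.total_cycles(penalty), svliw.total_cycles(penalty)
    print(f"miss penalty {penalty:2d}: VLIW {v:6d}  SVLIW {s:6d}")
```

With these numbers the VLIW processor wins at a 1-cycle miss penalty (13,000 vs 13,600 cycles), the two break even at 2.5 cycles, and the SVLIW processor pulls ahead beyond that (22,000 vs 19,000 cycles at a 10-cycle penalty).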

Design of an Optimal RSA Crypto-processor for Embedded Systems (내장형 시스템을 위한 최적화된 RSA 암호화 프로세서 설계)

허석원;김문경;이용석
- The Journal of Korean Institute of Communications and Information Sciences
- v.29 no.4A
- pp.447-457
- 2004
This paper proposes an RSA crypto-processor for embedded systems. Its architecture is based on the Big Montgomery algorithm and supports a configurable bit size. The RSA crypto-processor consists of an RSA control signal generator and an optimized Big Montgomery processor (adder and multiplier). We evaluated several adder and multiplier algorithms and, after comparing the results, selected the arithmetic units best suited to connection with an ARM core processor. The RSA crypto-processor was implemented in Verilog HDL using a top-down methodology and verified with C and Cadence Verilog-XL. The verified models were synthesized to a Hynix 0.25 μm CMOS standard cell library using Synopsys Design Compiler. The RSA crypto-processor can operate at 51 MHz under worst-case conditions of 2.7 V and 100 °C, and it has approximately 36,639 gates.
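Montgomery multiplication replaces the division in modular reduction with additions and shifts, which is why it suits hardware RSA. The Python sketch below uses the textbook bit-serial (radix-2) Montgomery product and left-to-right square-and-multiply exponentiation; whether the paper's Big Montgomery datapath follows this exact recurrence is an assumption, and the function names `mont_mul` and `mont_exp` are hypothetical.

```python
def mont_mul(a: int, b: int, n: int, k: int) -> int:
    """Radix-2 Montgomery product a * b * 2^(-k) mod n, for odd n < 2^k and a, b < n."""
    t = 0
    for i in range(k):
        t += ((a >> i) & 1) * b  # add b if bit i of a is set
        if t & 1:                # make t even so the halving is exact
            t += n
        t >>= 1                  # divide by 2 (stays congruent mod n)
    return t - n if t >= n else t  # t < 2n, so one subtraction suffices


def mont_exp(base: int, exp: int, n: int) -> int:
    """base^exp mod n for odd n, computed entirely in the Montgomery domain."""
    k = n.bit_length()                # R = 2^k > n
    r2 = pow(2, 2 * k, n)             # R^2 mod n, used to enter Montgomery form
    x = mont_mul(base % n, r2, n, k)  # base * R mod n
    acc = mont_mul(1, r2, n, k)       # 1 * R mod n
    for bit in bin(exp)[2:]:          # left-to-right square-and-multiply
        acc = mont_mul(acc, acc, n, k)
        if bit == "1":
            acc = mont_mul(acc, x, n, k)
    return mont_mul(acc, 1, n, k)     # leave Montgomery form
```

With the classic toy key n = 3233, e = 17, d = 2753, `mont_exp(65, 17, 3233)` gives 2790 and `mont_exp(2790, 2753, 3233)` recovers 65.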

A Study on Highly Performance Multimedia Processor Architecture (고효율 멀티미디어 프로세서 아키텍쳐에 관한 연구)

박춘명
- Proceedings of the Korea Multimedia Society Conference
- 2001.06a
- pp.12-15
- 2001
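This paper discusses a high-efficiency multimedia processor architecture. The proposed architecture addresses a shortcoming of existing multimedia processors by processing various media, such as text, sound, and video, within a single chip, and it also supports interactive processing, a characteristic of multimedia. In particular, because it targets a network based on a complete graph, it enables memory-mapped node addressing without software, is fully reconfigurable according to the data type, and supports both synchronous and asynchronous time-shared and space-shared processing. It also prevents bus conflicts among continuous and dynamic media data, as well as bus conflicts arising from local and global shared-memory structures, and it is expected to be applicable to virtual reality and mixed reality.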
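Memory-mapped node addressing in a fully connected (complete-graph) network can be pictured as pure address decoding: the node number comes straight from the upper address bits, and because every node has a direct link to every other node, no routing software or table is needed. This Python sketch is illustrative only; the 32-bit address, the 4-bit node field, and the names `decode` and `route` are assumptions, not details from the paper.

```python
from typing import List, Tuple

# Hypothetical address layout: the top NODE_BITS of a 32-bit address select
# the node; the remaining bits are an offset into that node's local memory.
ADDR_BITS = 32
NODE_BITS = 4
OFFSET_BITS = ADDR_BITS - NODE_BITS


def decode(addr: int) -> Tuple[int, int]:
    """Split a global address into (node, local offset) using shifts and masks only."""
    node = addr >> OFFSET_BITS
    offset = addr & ((1 << OFFSET_BITS) - 1)
    return node, offset


def route(src: int, addr: int) -> List[int]:
    """In a complete graph every pair of nodes is directly linked,
    so a remote access is always a single hop."""
    dst, _ = decode(addr)
    return [src] if dst == src else [src, dst]
```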

Design of an AE32000-compatible 32-bit EISC Microprocessor (AE32000 호환 32-비트 EISC 마이크로프로세서 설계)

곽기영;박진국;이두영;이범근;정연모
- Proceedings of the Korean Information Science Society Conference
- 2002.10c
- pp.700-702
- 2002
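This paper describes the implementation of a 32-bit EISC (Extendable Instruction Set Computer) core with a fixed 16-bit instruction format. Because the EISC architecture uses an extended operand format with high code density, it reduces memory size, which makes it possible to build ASIC processors for low-power systems and compact embedded systems. The processor was designed to be compatible with the AE32000 instruction set and uses a five-stage pipeline to improve performance. It also uses a BTB (Branch Target Buffer) to reduce branch delay and keep the CPI (cycles per instruction) low.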
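A branch target buffer lets the fetch stage redirect to a branch target before the branch resolves, which is how it hides branch delay. This Python sketch shows a generic direct-mapped BTB, not the paper's design; the 16-entry size, the 2-byte instruction alignment (chosen to match the 16-bit instruction format), and the class name `BranchTargetBuffer` are assumptions.

```python
from typing import List, Optional, Tuple


class BranchTargetBuffer:
    """Direct-mapped BTB sketch: PC-indexed entries holding (tag, target)."""

    def __init__(self, entries: int = 16) -> None:
        assert entries & (entries - 1) == 0, "entries must be a power of two"
        self.entries = entries
        self.tags: List[Optional[int]] = [None] * entries
        self.targets: List[int] = [0] * entries

    def _index_tag(self, pc: int) -> Tuple[int, int]:
        word = pc >> 1  # 16-bit instructions: drop the halfword offset bit
        return word & (self.entries - 1), word // self.entries

    def predict(self, pc: int) -> int:
        """Next fetch PC: the stored target on a hit, else the fall-through PC."""
        idx, tag = self._index_tag(pc)
        if self.tags[idx] == tag:
            return self.targets[idx]
        return pc + 2

    def update(self, pc: int, taken: bool, target: int) -> None:
        """On branch resolution: remember taken branches, drop not-taken ones."""
        idx, tag = self._index_tag(pc)
        if taken:
            self.tags[idx], self.targets[idx] = tag, target
        elif self.tags[idx] == tag:
            self.tags[idx] = None
```

Storing the full tag avoids redirecting fetch for a different branch that merely shares the same index, at the cost of a little extra storage per entry.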

Design Space Exploration of Many-Core Processors for Mobile Ultrasound Image Signal Processing (모바일 초음파 영상신호처리를 위한 매니코어 프로세서 디자인 공간 탐색)

Choi, Byong-Kook;Kim, Jong-Myon
- Proceedings of the Korea Information Processing Society Conference
- 2011.04a
- pp.183-186
- 2011
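This paper introduces a design space exploration method for a many-core processor that meets the high-performance and low-power requirements of the beamforming algorithm for mobile ultrasound image signals. For the design space exploration, we varied the number of ultrasound image-signal data items assigned to each processing element (PE) of the many-core processor, measured execution time, energy efficiency, and system area efficiency, and selected the optimal many-core processor configuration based on the results.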
https://doi.org/10.3745/PKIPS.y2011m04a.183
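A design space exploration of this kind sweeps the data-per-PE parameter and scores each resulting configuration. The Python sketch below substitutes a simple analytical cost model for the paper's measurements; every constant, the metric definitions (area efficiency taken here as 1 / (time × area)), and the names `evaluate`, `explore`, and `best_by_area_efficiency` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Placeholder model constants: illustrative only, not measured values.
FREQ_HZ = 500e6
CYCLES_PER_SAMPLE = 40         # beamforming work per sample on one PE
COMM_CYCLES = 200              # fixed per-frame overhead
PE_POWER_W = 0.01              # power of one active PE
PE_LOGIC_AREA_MM2 = 0.05       # datapath area of one PE
MEM_AREA_PER_SAMPLE_MM2 = 1e-4


@dataclass
class DesignPoint:
    data_per_pe: int
    num_pes: int
    exec_time_s: float
    energy_j: float
    area_mm2: float


def evaluate(total_samples: int, data_per_pe: int) -> DesignPoint:
    num_pes = -(-total_samples // data_per_pe)  # ceiling division
    exec_time = (CYCLES_PER_SAMPLE * data_per_pe + COMM_CYCLES) / FREQ_HZ
    energy = num_pes * PE_POWER_W * exec_time
    area = num_pes * PE_LOGIC_AREA_MM2 + total_samples * MEM_AREA_PER_SAMPLE_MM2
    return DesignPoint(data_per_pe, num_pes, exec_time, energy, area)


def explore(total_samples: int, candidates: Iterable[int]) -> List[DesignPoint]:
    return [evaluate(total_samples, d) for d in candidates]


def best_by_area_efficiency(points: List[DesignPoint]) -> DesignPoint:
    # Throughput per unit area: 1 / (execution time * area).
    return max(points, key=lambda p: 1.0 / (p.exec_time_s * p.area_mm2))
```

With these placeholder constants, `best_by_area_efficiency(explore(4096, [1, 4, 16, 64, 256]))` selects 64 samples per PE: fewer, busier PEs save area until the longer run time outweighs the savings.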
