Search | Korea Science

Enhancing the Performance of Multiple Parallel Applications using Heterogeneous Memory on Intel's Next-Generation Many-core Processor

Rho, Seungwoo;Kim, Seoyoung;Nam, Dukyun;Park, Geunchul;Kim, Jik-Soo
- Journal of KIISE, v.44 no.9, pp.878-886, 2017
This paper discusses performance bottlenecks that can occur when high-performance computing MPI applications run on Intel's next-generation many-core processor, Knights Landing (KNL), and presents effective resource allocation techniques to address them. In addition to the existing accelerator form built around a many-core processor, KNL is offered as a self-booting host processor, and it was released with a new type of on-package memory that offers higher bandwidth than the existing DDR4-based memory. By studying a resource allocation method optimized for this new many-core architecture, we empirically verified improvements in the execution performance of multiple MPI applications and in overall system utilization.
https://doi.org/10.5626/JOK.2017.44.9.878

NTL Analysis of Computer Systems using Hierarchical Model

Yoo, Ki Yoon;Ro, Cheul Woo
- Proceedings of the Korea Contents Association Conference, 2013.05a, pp.261-262, 2013
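In this paper, we use SRN (Stochastic Reward Nets), an extension of Petri nets, to develop a hierarchical model of a multiprocessor computer system. The upper-level model is a structure-state model that represents system failure and repair, and the lower-level model represents the system's processing (arrival, queueing, and service) in a given structure state. As a measure of computer availability, we compute the NTL (Normalized Throughput Loss).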
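The two-level composition described above can be illustrated with a small numerical sketch. It does not use SRN; as an assumption it replaces both levels with textbook closed forms: the upper level is a birth-death availability chain (per-processor failure rate `gamma`, one repair facility with rate `tau`), and the lower level is an M/M/c/K queue (arrival rate `lam`, per-processor service rate `mu`, `K` places, with `K >= n`). NTL is taken here as (T_n - E[T]) / T_n, where T_n is the throughput with all n processors up; the function names and parameters are hypothetical.

```python
def availability_probs(n, gamma, tau):
    """Upper-level model: steady-state probability that i of n processors are up.

    Birth-death chain: from state i, a failure occurs at rate i * gamma;
    a single repair facility restores one processor at rate tau.
    """
    weights = {n: 1.0}
    for i in range(n, 0, -1):
        # cut balance between states i and i-1: p[i] * i*gamma = p[i-1] * tau
        weights[i - 1] = weights[i] * i * gamma / tau
    total = sum(weights.values())
    return {i: w / total for i, w in weights.items()}


def mmck_throughput(c, K, lam, mu):
    """Lower-level model: throughput of an M/M/c/K queue (c servers, K places)."""
    if c == 0:
        return 0.0  # no processor is up, so nothing is served
    p = [1.0]
    for k in range(1, K + 1):
        p.append(p[-1] * lam / (min(k, c) * mu))
    blocking = p[K] / sum(p)
    return lam * (1.0 - blocking)  # accepted arrivals equal completed jobs


def ntl(n, gamma, tau, K, lam, mu):
    """Normalized throughput loss: relative throughput lost to failures."""
    pi = availability_probs(n, gamma, tau)
    full = mmck_throughput(n, K, lam, mu)
    expected = sum(pi[i] * mmck_throughput(i, K, lam, mu) for i in pi)
    return (full - expected) / full


print(ntl(n=4, gamma=0.001, tau=0.1, K=10, lam=3.0, mu=1.0))
```

The point of the hierarchy is visible in `ntl`: each structure state i from the upper level is weighted by its probability, and the lower level supplies the throughput reward for that state.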

Design and Implementation of Bootstrap Loader on Multi Core Operating System

DongHwi Kim;YeonTaek Park;HaeRam Jung;TaeHoon Bang;YongWan Ju;JunDong Lee
- Proceedings of the Korean Society of Computer Information Conference, 2023.01a, pp.1-4, 2023
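An operating system is system software that controls the user's hardware and system resources and provides general services to programs. In addition to managing the system hardware, it provides a hardware abstraction platform and common system services for running application software. Recently, thanks to advances in virtualization technology, operating systems also run on virtual machines (hypervisors) rather than on physical hardware. In this study, we designed and implemented a bootloader as part of a project to develop a small-scale operating system targeting multi-core processors. The bootloader plays the most important role in setting up an environment in which the operating system can run after the computer is first powered on, and if used well it can be applied in various fields such as embedded systems and IoT.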

A Parallel Emulation Scheme for Data-Flow Architecture on Loosely Coupled Multiprocessor Systems

이용두;채수환
- The Journal of Korean Institute of Communications and Information Sciences, v.18 no.12, pp.1902-1918, 1993
Parallel architectures based on the von Neumann computation model are limited as massively parallel architectures because of inherent drawbacks in their architectural features. The data-flow model of computation offers high programmability from the software perspective and high scalability from the hardware perspective. However, practical programming and experimentation on data-flow architectures are hardly possible because practical data-flow machines are not available. In this paper, we present a programming environment for performing data-flow computation on conventional parallel machines in general, and on loosely coupled multiprocessor systems in particular. We build an emulator for a tagged-token data-flow architecture on the iPSC/2 hypercube, a loosely coupled multiprocessor system. The emulator is a thin software layer executing on the iPSC/2 system, so that from the programmer's viewpoint the iPSC/2 behaves as a data-flow architecture. We implement various numerical and non-numerical algorithms in a data-flow assembly language and compare their performance with that of conventional C versions. The results verify the effectiveness of this emulator-based programming environment for experimenting with data-flow computation on a conventional parallel machine.
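The firing rule at the heart of a tagged-token data-flow machine can be sketched in a few lines. This is a single-process Python illustration, not the iPSC/2 emulator from the paper: the program graph `PROGRAM`, the token layout `(tag, instruction, port, value)`, and the `"out"` sink are all hypothetical. An instruction fires only when every operand carrying the same tag has reached the matching store, so independent iterations (different tags) can flow through the same graph without interfering.

```python
from collections import deque
import operator

# Hypothetical data-flow graph computing (a + b) * c.
# Each entry: instruction id -> (function, arity, destinations),
# where a destination is (instruction id, input port); "out" collects results.
PROGRAM = {
    "add": (operator.add, 2, [("mul", 0)]),
    "mul": (operator.mul, 2, [("out", 0)]),
}


def run(program, initial_tokens):
    queue = deque(initial_tokens)  # tokens: (tag, instruction, port, value)
    waiting = {}                   # matching store: (tag, instruction) -> {port: value}
    results = {}
    while queue:
        tag, inst, port, value = queue.popleft()
        if inst == "out":
            results[tag] = value
            continue
        fn, arity, dests = program[inst]
        slot = waiting.setdefault((tag, inst), {})
        slot[port] = value
        if len(slot) == arity:  # all operands with this tag have arrived: fire
            del waiting[(tag, inst)]
            result = fn(*(slot[p] for p in range(arity)))
            for d_inst, d_port in dests:
                queue.append((tag, d_inst, d_port, result))  # result keeps the tag
    return results


# Two independent activations, distinguished only by their tags (0 and 1).
TOKENS = [
    (0, "add", 0, 1), (0, "add", 1, 2), (0, "mul", 1, 10),
    (1, "add", 0, 3), (1, "add", 1, 4), (1, "mul", 1, 100),
]
print(run(PROGRAM, TOKENS))
```

In an emulator on a message-passing machine, the queue would be distributed across nodes and tokens would travel as messages; the matching logic per node stays essentially the same.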

Design and Implementation of Asynchronous Memory for Pipelined Bus

Hahn, Woo-Jong;Kim, Soo-Won
- Journal of the Korean Institute of Telematics and Electronics B, v.31B no.11, pp.45-52, 1994
Low-cost, high-performance microprocessors have recently led to the construction of medium-scale shared-memory multiprocessor systems built around a shared bus. Such systems are heavily influenced by the structure of their memory systems, and the memory system becomes an increasingly important design factor as microprocessors get faster. Even though local cache memories are common in such systems, the latency of accesses to shared memory limits throughput and scalability, and there has been much research on memory structures for multiprocessor systems. In this paper, we propose an asynchronous memory architecture that uses the bandwidth of the system bus effectively and provides flexibility of implementation. The effect of the proposed architecture is shown by simulation. As the model of the shared bus, we chose HiPi+Bus, which was designed by ETRI to meet the requirements of the High-Speed Midrange Computer System. The simulation was written in the Verilog hardware description language, and it shows that the proposed asynchronous memory architecture keeps system bus utilization low enough to provide better throughput and scalability. The implementation trade-offs are also described. The asynchronous memory was implemented and tested in a prototype testing environment using a test program, and this intensive testing validated the operation of the proposed architecture.

Design and Simulation of Interconnection Network Based on Topological Combination

장창수;최창훈
- The Journal of Korean Institute of Communications and Information Sciences, v.29 no.6B, pp.563-574, 2004
In this paper, we propose a new class of MIN (Multistage Interconnection Network), called Combine MIN, which combines a static network topology with a dynamic network topology. Combine MIN provides multiple paths at a hardware cost lower than that of a MIN with the unique-path property. It can be constructed to suit localized communication by providing shortcut paths and multiple paths inside processor-memory clusters that communicate frequently. According to analysis and simulation results, Combine MIN outperforms MINs of the same network size under highly localized communication. Therefore, Combine MIN is an attractive interconnection network for parallel applications with localized communication patterns in shared-memory multiprocessor systems.
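For context on the "unique path property" that Combine MIN is compared against, the sketch below routes a message through an Omega network, a classic unique-path MIN, using destination-tag routing. It illustrates the baseline only, not Combine MIN itself, and the function name is hypothetical. Each of the k stages applies a perfect shuffle to the line address and then a 2x2 switch sets the low bit to the next destination bit (most significant first), so exactly one path exists for each source/destination pair.

```python
def omega_route(src, dst, k):
    """Destination-tag routing through a k-stage Omega network (N = 2**k ports).

    Returns the list of (stage, switch index, chosen output) hops and the
    output line reached after the last stage.
    """
    n = 1 << k
    line = src
    hops = []
    for stage in range(k):
        # perfect shuffle: rotate the k-bit line address left by one bit
        line = ((line << 1) | (line >> (k - 1))) & (n - 1)
        bit = (dst >> (k - 1 - stage)) & 1   # destination bits, MSB first
        hops.append((stage, line >> 1, bit))  # 2x2 switch index and its output
        line = (line & ~1) | bit              # the switch sets the low bit
    return hops, line


hops, out = omega_route(src=5, dst=2, k=3)
print(hops, out)
```

Because every routing decision is forced by the destination address, two messages that need the same switch output block each other; providing shortcut and alternative paths is what a multi-path design adds on top of this.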

Design of Beam Steering n-phase OPTO-ULSI Processor for IIPS

Lee, Chang-Ki;Lim, Hyung-Kyu
- The Journal of the Korea institute of electronic communication sciences, v.3 no.3, pp.158-164, 2008
This project investigates an optimum phase design for implementing a 256-phase Opto-ULSI processor for multi-function-capable optical networks. An 8-phase processor is already under construction and will provide the initial basis for experimentation and characterization. The challenges are to compensate for the non-linearity of the liquid crystal, to find an optimum phase, and to implement a larger-scale Opto-ULSI processor. This research centers on the initial development of an 8-phase Beam Steering (BS) Opto-ULSI processor (OUP) for an integrated intelligent photonic system (IIPS), while investigating optimal phase characteristics and developing compensation for the non-linearity of the liquid crystal.
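Why more phase levels help can be quantified with the standard scalar result for an ideal N-level quantized blazed grating: the first-order diffraction efficiency is sinc²(1/N) = (sin(π/N) / (π/N))². The Python sketch below assumes perfectly linear, lossless phase steps, which is exactly the idealization that the liquid-crystal non-linearity compensation in this work is meant to approach; the function names and the sawtooth level assignment are illustrative assumptions, not the paper's design.

```python
import math


def first_order_efficiency(levels):
    """Ideal first-order efficiency of an N-level quantized blazed grating."""
    x = math.pi / levels
    return (math.sin(x) / x) ** 2


def steering_levels(num_pixels, period, levels):
    """Phase level (0 .. levels-1) per pixel for a sawtooth steering profile.

    The ideal phase 2*pi*(j mod period)/period is quantized down to one of
    `levels` equally spaced steps; integer arithmetic avoids rounding drift.
    """
    return [(levels * (j % period)) // period for j in range(num_pixels)]


for n in (2, 8, 256):
    print(n, round(first_order_efficiency(n), 5))
print(steering_levels(num_pixels=8, period=4, levels=8))
```

Under these assumptions, going from 8 to 256 phase levels raises the ideal efficiency from roughly 95% to above 99.99%, which is one way to read the motivation for a larger-scale processor.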

Low Complexity Channel Preprocessor for Multiple Antenna Communication Systems

Hwang, You-Sun;Jang, Soo-Hyun;Han, Chul-Hee;Choi, Sung-Nam;Jung, Yun-Ho
- Journal of Advanced Navigation Technology, v.15 no.2, pp.213-220, 2011
In this paper, we propose a channel preprocessor with an area-efficient architecture for a MIMO symbol detector supporting four transmit and four receive antennas. The proposed channel preprocessor shrinks the channel dimension to reduce the hardware complexity of the MIMO symbol detector. It is implemented with very low complexity by using QR decomposition (QRD) and the logarithmic number system (LNS): applying QRD and LNS to the nulling-matrix calculation block decreases the number of matrix multiplications and matrix divisions, which significantly reduces the complexity of the preprocessor. The proposed channel preprocessor was designed in a hardware description language (HDL) and synthesized to gate-level circuits using a 0.13 µm CMOS standard cell library. Compared with the conventional architecture, it reduces the number of logic gates by 20.2%.
https://doi.org/10.12673/jant.2011.15.2.213
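The reason LNS cuts the cost of matrix multiplications and divisions is that, once a value is stored as a sign plus the logarithm of its magnitude, multiplying becomes adding logarithms and dividing becomes subtracting them. The Python sketch below shows only that principle in floating point; a hardware LNS would use fixed-point logarithms and a separate zero flag, and the class name `LNS` is a hypothetical illustration rather than the paper's number format.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LNS:
    sign: int        # 0 for positive, 1 for negative
    log2_mag: float  # log2(|x|); zero is not representable in this sketch

    @staticmethod
    def from_float(x: float) -> "LNS":
        if x == 0:
            raise ValueError("zero needs a special flag in a real LNS")
        return LNS(0 if x > 0 else 1, math.log2(abs(x)))

    def to_float(self) -> float:
        mag = 2.0 ** self.log2_mag
        return -mag if self.sign else mag

    def __mul__(self, other: "LNS") -> "LNS":
        # multiplication reduces to one addition (plus an XOR for the sign)
        return LNS(self.sign ^ other.sign, self.log2_mag + other.log2_mag)

    def __truediv__(self, other: "LNS") -> "LNS":
        # division reduces to one subtraction (plus an XOR for the sign)
        return LNS(self.sign ^ other.sign, self.log2_mag - other.log2_mag)


a, b = LNS.from_float(3.0), LNS.from_float(-4.0)
print((a * b).to_float(), (LNS.from_float(12.0) / b).to_float())
```

The trade-off is that addition and subtraction become the expensive operations in LNS (they typically need table lookups), so the format pays off in blocks dominated by multiplications and divisions, such as the nulling-matrix calculation mentioned above.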

A Ring-Based Multiprocessor System using a New Snooping Protocol

Jeong, Seong-U;Kim, Hyeong-Ho;Jang, Seong-Tae;Jeon, Ju-Sik
- Journal of KIISE: Computer Systems and Theory, v.26 no.3, pp.313-323, 1999
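Bus-based systems currently dominate the computer market. Because processor speeds are increasing very rapidly, the bus becomes a bottleneck, and bus speed is limited by the imperfections of its transmission lines. System researchers are therefore trying to replace the bus with high-speed unidirectional point-to-point links. In this paper, we propose a new ring-based system (PANDA) and present a snooping cache coherence protocol suited to it. The proposed system also has the advantage that it can be implemented easily by modifying the network interface of systems that adopt the SCI cache coherence protocol. Using probabilistic modeling and a program-driven simulator, we analyzed the proposed system, a system using a full-map directory protocol, and a slotted ring system using a snooping protocol (Express Ring). The results show that the proposed system performs comparably to the full-map directory system, which requires additional hardware, and achieves up to 29% higher performance than the slotted ring system.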

The architecture of a multiprocessor based programmable controller with emphasis on its system bus

김종일;권욱현
- Institute of Control, Robotics and Systems: Conference Proceedings, 1988.10a, pp.407-413, 1988
This paper presents the architecture of a multiprocessor-based programmable controller (MBPC), which consists of a host processor, processing elements, and input/output processors. Several problems in implementing such an architecture are also described; to resolve them, we propose INFOBUS, a system bus for the MBPC. The performance of INFOBUS and the MBPC is analyzed using both analytic models and simulations, and the analysis results are presented and validated. With 50% BTIs (Block Type Instructions) and 4 processors, and under some reasonable assumptions, the scanning time is shown to be 0.194 ms/Kstep.
