Integrated Search | Korea Science

Analysis on the Performance Impact of Partitioned LLC for Heterogeneous Multicore Processors

문민구;김철홍
- 한국차세대컴퓨팅학회논문지
- Vol. 15, No. 2
- pp. 39-49
- 2019
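Various architectural design techniques have been proposed to improve computing performance, and among them CPU-GPU fused heterogeneous multicore processors have attracted considerable attention. Because a CPU-GPU fused heterogeneous multicore processor integrates the CPU and GPU on a single chip, the CPU and GPU generally share the Last Level Cache (LLC). Sharing the LLC has a drawback: when severe cache contention occurs between the CPU and GPU cores, the utilization of each core degrades. To address this contention, this paper partitions a single LLC into separate regions for the CPU and the GPU and analyzes how changing the size of each region affects overall system performance. According to the simulation results, CPU performance improves by up to 21% as the LLC capacity available to the CPU grows, whereas GPU performance shows no significant change even when its LLC capacity grows. In other words, the GPU loses less performance than the CPU when its LLC size is reduced. Because the performance loss from shrinking the GPU's LLC is much smaller than the performance gain from enlarging the CPU's LLC, partitioning the LLC between the cores on the basis of these results is expected to improve the overall performance of the heterogeneous multicore processor. Furthermore, if this analysis is used to develop memory management techniques that maximize the performance of each core, the performance of heterogeneous multicore processors could be improved substantially.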

Performance Analysis of Multicore Out-of-Order Superscalar Processor with Multiple Basic Block Execution

이종복
- 한국멀티미디어학회논문지
- Vol. 16, No. 2
- pp. 198-205
- 2013
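This paper analyzes the performance of a multicore out-of-order superscalar processor architecture that uses multiple basic block execution. To this end, extensive simulations were carried out with the SPEC 2000 benchmarks as input, on multicore out-of-order superscalar processors with window sizes of 32 and 64 that execute one to four basic blocks, scaling from 1 core to 16 cores. The simulation results show that a multicore out-of-order superscalar processor executing four basic blocks achieves an average performance improvement of 22.0% over single-block execution with the same configuration.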
https://doi.org/10.9717/kmms.2013.16.2.198

A Multithreaded Implementation of HEVC Intra Prediction Algorithm for a Photovoltaic Monitoring System

Choi, Yung-Ho;Ahn, Hyung-Keun
- Transactions on Electrical and Electronic Materials
- Vol. 13, No. 5
- pp. 256-261
- 2012
Recently, many photovoltaic systems (PV systems), including solar parks and PV farms, have been built to prepare for the post-fossil-fuel era. To investigate the degradation process of PV systems and thus operate them efficiently, there is a need to monitor PV systems visually in the infrared range over the Internet. For efficient visual monitoring, this paper explores a multithreaded implementation of the recently developed HEVC standard, whose compression efficiency is almost two times higher than that of H.264. For an efficient parallel implementation on a mesh-based 64-core system, this work takes into account various design choices that can solve potential problems of a 64-core system built on a two-dimensional interconnect. These problems may not occur in a small-scale multicore system based on a simple bus network. Through extensive evaluation, this paper shows that, for an efficient multithreaded implementation of HEVC intra prediction in a mesh-based multicore system, much effort needs to be made to optimize communications among processing cores. Thus, this work provides three design choices regarding communications: main thread core location, cache home policy, and maximum coding unit size. These design choices are shown to improve the overall parallel performance of the HEVC intra prediction algorithm by up to 42%, achieving a 7 times higher speed-up.
https://doi.org/10.4313/TEEM.2012.13.5.256

Performance Analysis of Multicore Processor Architectures Based On Cache Size Effects

이종복
- 한국인터넷방송통신학회논문지
- Vol. 12, No. 6
- pp. 175-180
- 2012
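Recently, multicore processors have been commercialized and widely used in all kinds of computer systems in order to overcome the hardware complexity and performance limits of superscalar processors. The organization and capacity of the instruction cache and data cache have a large effect on multicore processor performance. To analyze how cache organization and capacity affect multicore processor performance, this paper runs simulations with the SPEC 2000 benchmarks as input on multicore processors with 2 to 16 cores, configured with a variety of cache organizations and capacities. The simulation results show that hardware cost-effectiveness is highest when the instruction and data caches are 2-way set associative and each has a capacity of 64 KB.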
https://doi.org/10.7236/JIWIT.2012.12.6.175

The Performance Study of a Virtualized Multicore Web System

Lu, Chien-Te;Yeh, C.S. Eugene;Wang, Yung-Chung;Yang, Chu-Sing
- KSII Transactions on Internet and Information Systems (TIIS)
- Vol. 10, No. 11
- pp. 5419-5436
- 2016
Enhancing the performance of computing systems has been an important topic since the invention of computers. The leading-edge technologies of multicore and virtualization dramatically influence the development of current IT systems. We study the performance attributes of response time (RT), throughput, efficiency, and scalability of a virtualized Web system running on a multicore server. We build virtual machines (VMs) for a Web application and use distributed stress tests to measure RTs and throughputs under varied combinations of virtual cores (VCs) and VM instances. Their gains, efficiencies, and scalabilities are also computed and compared. Our experimental and analytic results indicate: 1) A system can perform and scale much better by adopting multiple single-VC VMs than by adopting a single multiple-VC VM. 2) The system capacity gain is proportional to the number of VM instances run, but not proportional to the number of VCs allocated in a VM. 3) A system with more VMs or VCs has higher physical CPU utilization, but lower vCPU utilization. 4) The maximum throughput gain is less than the VM or VC gain. 5) Per-core computing efficiency does not correlate with the quality of VCs or VMs employed. The outcomes can provide valuable guidelines for selecting instance types provided by public Cloud providers and for planning load balancing for Web systems.
https://doi.org/10.3837/tiis.2016.11.012

MC-MIPOG: A Parallel t-Way Test Generation Strategy for Multicore Systems

Younis, Mohammed I.;Zamli, Kamal Z.
- ETRI Journal
- Vol. 32, No. 1
- pp. 73-83
- 2010
Combinatorial testing has been an active research area in recent years. One challenge in this area is dealing with the combinatorial explosion problem, which typically requires a very expensive computational process to find a good test set that covers all the combinations for a given interaction strength (t). Parallelization can be an effective approach to manage this computational cost, that is, by taking advantage of the recent advancement of multicore architectures. In line with such alluring prospects, this paper presents a new deterministic strategy, called multicore modified input parameter order (MC-MIPOG) based on an earlier strategy, input parameter order generalized (IPOG). Unlike its predecessor strategy, MC-MIPOG adopts a novel approach by removing control and data dependency to permit the harnessing of multicore systems. Experiments are undertaken to demonstrate speedup gain and to compare the proposed strategy with other strategies, including IPOG. The overall results demonstrate that MC-MIPOG outperforms most existing strategies (IPOG, IPOF, IPOF2, IPOG-D, ITCH, TConfig, Jenny, and TVG) in terms of test size within acceptable execution time. Unlike most strategies, MC-MIPOG is also capable of supporting high interaction strengths of t > 6.
https://doi.org/10.4218/etrij.10.0109.0266
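
To make the combinatorial explosion mentioned in this abstract concrete, the following is a minimal Python sketch that enumerates every t-way interaction a test set must cover and reports which ones a given test set misses. It is not MC-MIPOG or IPOG; the function names `required_tuples` and `uncovered` and the three-parameter example are assumptions made purely for illustration.

```python
from itertools import combinations, product


def required_tuples(domains, t):
    """Yield every t-way interaction as ((param_idx, ...), (value, ...)).

    `domains` is a hypothetical list of value lists, one per input parameter.
    """
    for idxs in combinations(range(len(domains)), t):
        for vals in product(*(domains[i] for i in idxs)):
            yield idxs, vals


def uncovered(tests, domains, t):
    """Return the t-way interactions that no test in `tests` exercises."""
    covered = {
        (idxs, tuple(test[i] for i in idxs))
        for test in tests
        for idxs in combinations(range(len(domains)), t)
    }
    return [tup for tup in required_tuples(domains, t) if tup not in covered]


# Three binary parameters, interaction strength t = 2 (pairwise).
domains = [[0, 1], [0, 1], [0, 1]]
tests = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
print(len(list(required_tuples(domains, 2))))  # 12 pairwise interactions
print(uncovered(tests, domains, 2))            # [] : four tests cover all pairs
```

Here four tests cover all pairs where exhaustive testing would need eight. The number of required tuples grows quickly with the number of parameters and with t, which is the cost that a parallel generation strategy tries to spread across cores.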

An Efficient Load Balancing Technique in a Multicore Mobile System

조중석;조두산
- 정보처리학회논문지:컴퓨터 및 통신 시스템
- Vol. 4, No. 5
- pp. 153-160
- 2015
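The efficiency of a multicore system depends on how effectively the scheduler distributes tasks among the cores. On a heterogeneous multicore platform, an application's execution time is determined by the core on which it runs. In other words, the efficiency of task allocation is one of the key factors that determine the performance of a multicore system. This study proposes a load balancing technique that analyzes the execution time of each task through profiling and makes use of the result. The profiling results provide the basic information needed to predict the task allocation that offers the best performance. Using this information, the proposed technique achieves a performance gain of about 26%.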
https://doi.org/10.3745/KTCCS.2015.4.5.153
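
As a rough illustration of how profiled execution times can drive task allocation on heterogeneous cores, the sketch below applies a simple greedy heuristic: each task goes to the core where it would finish earliest. This is not the technique proposed in the paper, which the abstract does not describe in detail; the function `assign_tasks`, the task names, and the timing values are all hypothetical.

```python
def assign_tasks(profile, num_cores):
    """Greedy earliest-finish assignment using profiled per-core times.

    `profile[task]` is a hypothetical list of profiled execution times,
    one entry per core. Returns (assignment, makespan).
    """
    load = [0.0] * num_cores
    assignment = {}
    # Place the heaviest tasks first, ranked by their best-case time.
    for task in sorted(profile, key=lambda t: min(profile[t]), reverse=True):
        # Pick the core on which this task would complete earliest.
        core = min(range(num_cores), key=lambda c: load[c] + profile[task][c])
        load[core] += profile[task][core]
        assignment[task] = core
    return assignment, max(load)


# Made-up profile: core 0 is faster for "decode", core 1 for "ui".
profile = {"decode": [4.0, 8.0], "filter": [3.0, 3.0], "ui": [2.0, 1.0]}
assignment, makespan = assign_tasks(profile, num_cores=2)
print(assignment)  # {'decode': 0, 'filter': 1, 'ui': 1}
print(makespan)    # 4.0
```

The heuristic is deliberately simple; it only shows why per-core profiling data matters when the same task runs at different speeds on different cores.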

Fault-tolerant Scheduling of Real-time Parallel Tasks with Energy Efficiency on Multicore Processors

이관우
- 정보처리학회논문지:컴퓨터 및 통신 시스템
- Vol. 3, No. 6
- pp. 173-178
- 2014
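The proposed scheduling technique uses parallel processing to improve the energy efficiency of multicore processors while satisfying the deadline constraints and fault-tolerance constraints of real-time tasks. Because finding the minimum-energy schedule is an NP-hard problem, the proposed technique finds, in polynomial time, a schedule whose energy consumption is close to the minimum. Compared with a related state-of-the-art technique, the proposed technique consumed markedly less energy at low as well as high parallel processing speeds, reducing energy consumption by up to 86%.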
https://doi.org/10.3745/KTCCS.2014.3.6.173

Concentric Core Fiber Design for Optical Fiber Communication

Nadeem, Iram;Choi, Dong-You
- Journal of information and communication convergence engineering
- Vol. 14, No. 3
- pp. 163-170
- 2016
Because of rapid technological advancements, increased data rate support has become the key criterion for future communication medium selection. Multimode optical fibers and multicore optical fibers are well matched to high data rate throughput requirements because of their tendency to support multiple modes through one core at a time, which results in higher data rates. Using the numerical mode solver OptiFiber, we have designed a concentric core fiber by investigating certain design parameters, namely core diameter (µm), wavelength (nm), and refractive index profile, and as a result, the number of channels, material losses, bending losses, polarization mode dispersion, and the effective nonlinear refractive index have been determined. Space division multiplexing is a promising future technology that uses few-mode fibers in parallel to form a multicore fiber. The experimental tests are conducted using the standard second window wavelength of 1,550 nm and simulated results are presented.
https://doi.org/10.6109/jicce.2016.14.3.163

Time-Predictable Java Dynamic Compilation on Multicore Processors

Sun, Yu;Zhang, Wei
- Journal of Computing Science and Engineering
- Vol. 6, No. 1
- pp. 26-38
- 2012
Java has been increasingly used in programming for real-time systems. However, some of Java's features, such as automatic memory management and dynamic compilation, are harmful to time predictability. If these problems are not solved properly, they can fundamentally limit the usage of Java for real-time systems, especially for hard real-time systems that require very high time predictability. In this paper, we propose to exploit multicore computing in order to reduce the timing unpredictability that is caused by dynamic compilation and adaptive optimization. Our goal is to retain high performance comparable to that of traditional dynamic compilation while, at the same time, obtaining better time predictability for the Java virtual machine (JVM). We have studied pre-compilation techniques to utilize another core more efficiently, a preoptimization on another core (PoAC) scheme to replace the adaptive optimization system (AOS) in Jikes JVM, and counter based optimization (CBO). Our evaluation reveals that the proposed approaches are able to attain high performance while greatly reducing the variation of the execution time for Java applications.
https://doi.org/10.5626/JCSE.2012.6.1.26
