• Title/Summary/Keyword: 프로세서 할당 (Processor Allocation)


L-shaped Submesh Allocation Scheme for Mesh-Connected Multicomputers (메쉬 멀티컴퓨터에서 L-모양 서브메쉬 할당기법)

  • 서경희;김성천
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.30 no.1
    • /
    • pp.1-11
    • /
    • 2003
  • Fragmentation is the main performance bottleneck of large, multi-user multicomputer systems. This paper presents an L-Shaped Submesh Allocation (LSSA) strategy, which lifts the restriction that the allocated processors must form a rectangle in order to address the fragmentation problem. LSSA can manipulate the shape of the requested submesh to fit it into the fragmented mesh system. Thus, LSSA accommodates incoming jobs faster than other strategies and reduces job response time. Extensive simulations show that LSSA performs more efficiently than other strategies in terms of external fragmentation, job response time, and system utilization.
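
The abstract does not give the placement procedure, so the following is only a minimal sketch of the general idea, assuming a 2D boolean free-map of the mesh and an L-shape built from two edge-adjacent free rectangles of the same width; the function names and search order are illustrative, not the paper's LSSA algorithm.

```python
# Minimal sketch under the assumptions above, not the paper's exact LSSA scheme.
import itertools

def rect_free(free, r, c, h, w):
    """True if the h x w rectangle with top-left corner (r, c) fits in the
    mesh and every node inside it is free."""
    rows, cols = len(free), len(free[0])
    if r < 0 or c < 0 or r + h > rows or c + w > cols:
        return False
    return all(free[i][j] for i in range(r, r + h) for j in range(c, c + w))

def find_rect(free, h, w):
    """Top-left corner of a free h x w submesh, or None."""
    for r, c in itertools.product(range(len(free)), range(len(free[0]))):
        if rect_free(free, r, c, h, w):
            return r, c
    return None

def find_l_shape(free, h, w):
    """Fall back to two stacked free blocks of width w whose heights sum to h;
    shifting the lower block horizontally yields an L-shaped region with the
    same total area as the requested h x w submesh."""
    for h1 in range(1, h):
        for r, c in itertools.product(range(len(free)), range(len(free[0]))):
            if not rect_free(free, r, c, h1, w):
                continue
            for dc in range(-(w - 1), w):
                if rect_free(free, r + h1, c + dc, h - h1, w):
                    return (r, c, h1, w), (r + h1, c + dc, h - h1, w)
    return None

def allocate(free, h, w):
    """Try a rectangular submesh first, then an L-shaped one."""
    return find_rect(free, h, w) or find_l_shape(free, h, w)
```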

Determination of a Grain Size for Reducing Cache Miss Rate of Direct-Mapped Caches (직접 사상 캐쉬의 캐쉬 실패율을 감소시키기 위한 성김도 정책)

  • Jung, In-Bum;Kong, Ki-Sok;Lee, Joon-Won
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.27 no.7
    • /
    • pp.665-674
    • /
    • 2000
  • In data parallel programs with high cache locality, the choice of grain size affects cache performance. Even when the chosen grain sizes provide fair load balance among processors, grain sizes that ignore the underlying caching effects cause address interference between grains allocated to the same processor. This interference has a negative impact on cache locality, since it results in cache conflict misses. To address this problem, we propose a best grain size derived from the cache size and the number of processors, based on the characteristics of direct-mapped caches. Since the proposed method does not map the grains to the same location in the cache, cache conflict misses are reduced. Simulation results show that the proposed best grain size substantially improves the performance of the tested data parallel programs by reducing cache misses on direct-mapped caches.
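
As a quick illustration of the address interference the abstract refers to (not the paper's grain-size formula): in a direct-mapped cache the line index is determined by the address modulo the cache size, so two grains whose start addresses differ by a multiple of the cache size collide on the same lines. The cache and line sizes below are assumed values.

```python
# Illustrative only: why grain placement matters in a direct-mapped cache.
LINE_SIZE = 64                      # assumed cache line size (bytes)
CACHE_SIZE = 16 * 1024              # assumed 16 KB direct-mapped cache
NUM_LINES = CACHE_SIZE // LINE_SIZE

def cache_index(addr):
    """Line index an address maps to in a direct-mapped cache."""
    return (addr // LINE_SIZE) % NUM_LINES

grain = 4 * 1024                    # a 4 KB grain of the data array
a, b = 0, 4 * CACHE_SIZE            # two grains given to the same processor
print(cache_index(a) == cache_index(b))          # True  -> conflict misses
print(cache_index(a) == cache_index(a + grain))  # False -> no interference
```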


Energy-Aware Task Scheduling for Multiprocessors using Dynamic Voltage Scaling and Power Shutdown (멀티프로세서상의 에너지 소모를 고려한 동적 전압 스케일링 및 전력 셧다운을 이용한 태스크 스케줄링)

  • Kim, Hyun-Jin;Hong, Hye-Jeong;Kim, Hong-Sik;Kang, Sung-Ho
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.7
    • /
    • pp.22-28
    • /
    • 2009
  • As multiprocessors are widely adopted in embedded systems, the energy consumed by task computation should be minimized using the low-power techniques the multiprocessors support. This paper proposes an energy-aware task scheduling algorithm that adopts both dynamic voltage scaling and power shutdown in multiprocessor environments. Considering the timing and energy overhead of power shutdown, the proposed algorithm performs iterative task assignment and task ordering for multiprocessor systems. Iterative priority-based task scheduling is adopted to obtain the best solution with the minimum total energy consumption. Total energy consumption is calculated using a linear programming model and the threshold time of power shutdown. By analyzing experimental results for standard task graphs based on real applications, the resource and timing limitations under which energy savings are maximized were identified. The experimental results show that the proposed energy-aware task scheduling provides meaningful improvements over existing priority-based task scheduling approaches.
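
The abstract mentions a threshold time for power shutdown; the sketch below shows the usual break-even reasoning behind such a threshold under an assumed simple energy model. The constants and function names are illustrative, not the paper's linear-programming formulation.

```python
# Hedged sketch: shut a core down only if the idle gap beats the break-even time.
def shutdown_threshold(e_overhead_j, p_idle_w, t_wakeup_s):
    """Minimum idle time for which power shutdown saves energy.
    e_overhead_j : energy spent entering and leaving the off state (J)
    p_idle_w     : power the core burns if it simply stays idle (W)
    t_wakeup_s   : wake-up latency that must also fit inside the idle gap (s)"""
    return max(e_overhead_j / p_idle_w, t_wakeup_s)

def idle_energy(idle_s, e_overhead_j, p_idle_w):
    """Energy of one idle interval under the better of the two choices."""
    return min(p_idle_w * idle_s,      # stay idle at the current voltage
               e_overhead_j)           # shut down (off-state power assumed ~0)

# Example: 0.3 J shutdown overhead, 0.5 W idle power, 50 ms wake-up latency
print(shutdown_threshold(0.3, 0.5, 0.05))   # 0.6 s threshold
print(idle_energy(1.0, 0.3, 0.5))           # shutting down wins for a 1 s gap
```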

Mileage-based Asymmetric Multi-core Scheduling for Mobile Devices (모바일 디바이스를 위한 마일리지 기반 비대칭 멀티코어 스케줄링)

  • Lee, Se Won;Lee, Byoung-Hoon;Lim, Sung-Hwa
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.26 no.5
    • /
    • pp.11-19
    • /
    • 2021
  • In this paper, we propose an asymmetric multi-core processor scheduling scheme based on the mileage of each core. We consider a big-LITTLE multi-core processor structure, which consists of low-power LITTLE cores with moderate performance and high-power big cores with high performance. When a task needs to be processed, the scheduler decides which core type (big or LITTLE) should handle the task, finds the core with the shortest mileage among the unoccupied cores of that type, and assigns the task to that core. We developed a mileage-based balancing algorithm for asymmetric multi-core assignment and showed that the proposed scheduling scheme is more cost-effective than the traditional scheme from a management perspective. Simulations were also conducted to evaluate the performance of the proposed algorithm.
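
A minimal sketch of the selection rule as it is described above, assuming a simple numeric mileage counter per core and an illustrative demand threshold for the big/LITTLE decision; it is not the paper's exact policy.

```python
from dataclasses import dataclass

@dataclass
class Core:
    kind: str                   # "big" or "LITTLE"
    mileage: float = 0.0        # accumulated busy time so far
    busy: bool = False

def assign(cores, demand, big_threshold=1.0):
    """Pick the core type from the task's demand, then the unoccupied core of
    that type with the shortest mileage, and charge it the expected busy time."""
    kind = "big" if demand > big_threshold else "LITTLE"
    candidates = [c for c in cores if c.kind == kind and not c.busy]
    if not candidates:
        return None                         # a real scheduler would queue the task
    core = min(candidates, key=lambda c: c.mileage)
    core.busy = True
    core.mileage += demand
    return core

cores = [Core("big", 5.0), Core("big", 2.0), Core("LITTLE"), Core("LITTLE")]
print(assign(cores, demand=2.0).mileage)    # the big core that had mileage 2.0 is chosen
```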

Analysis of GPU Performance and Memory Efficiency according to Task Processing Units (작업 처리 단위 변화에 따른 GPU 성능과 메모리 접근 시간의 관계 분석)

  • Son, Dong Oh;Sim, Gyu Yeon;Kim, Cheol Hong
    • Smart Media Journal
    • /
    • v.4 no.4
    • /
    • pp.56-63
    • /
    • 2015
  • Modern GPUs can execute massively parallel computations by exploiting many GPU cores. The GPGPU architecture, one approach to exploiting the outstanding computational resources of GPUs, executes general-purpose applications as well as graphics applications effectively. In this paper, we investigate the impact of the number of CTAs (Cooperative Thread Arrays) per SM (Streaming Multiprocessor) on performance and memory efficiency, since analyzing this relationship provides insight for researchers working to improve GPU performance. Our simulation results show that most benchmarks improve in performance as the number of CTAs per SM increases. Some benchmarks, however, show no improvement, either because the kernel generates only a few CTAs or because not enough CTAs can execute simultaneously. To classify the performance behavior more precisely, we also analyze the relationship between performance and memory stalls, DRAM stalls due to interconnect congestion, and pipeline stalls at the memory stage. We expect these analysis results to help studies that aim to improve parallelism and memory efficiency in GPGPU architectures.
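
The quantity the paper varies, CTAs resident on one SM, is bounded by the SM's thread, register and shared-memory budgets; the sketch below computes that bound under assumed, illustrative hardware limits and is not taken from the paper or any specific GPU.

```python
def ctas_per_sm(threads_per_cta, regs_per_thread, smem_per_cta,
                max_threads=1536, max_regs=32768,
                max_smem=48 * 1024, max_ctas=8):
    """How many CTAs of one kernel can be resident on an SM at the same time,
    taking the tightest of the thread, register, shared-memory and hard CTA
    limits (all limits here are assumed example values)."""
    by_threads = max_threads // threads_per_cta
    by_regs = max_regs // (regs_per_thread * threads_per_cta)
    by_smem = max_smem // smem_per_cta if smem_per_cta else max_ctas
    return max(0, min(max_ctas, by_threads, by_regs, by_smem))

# Example: 256-thread CTAs, 20 registers per thread, 4 KB shared memory per CTA
print(ctas_per_sm(256, 20, 4 * 1024))    # 6 resident CTAs on this assumed SM
```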

A New Register Allocation Technique for Performance Enhancement of Embedded Software (내장형 소프트웨어의 성능 향상을 위한 새로운 레지스터 할당 기법)

  • Jong-Yeol, Lee
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.41 no.10
    • /
    • pp.85-94
    • /
    • 2004
  • In this paper, a register allocation technique that translates memory accesses into register accesses is presented to enhance embedded software performance. In the proposed method, the source code is profiled to generate a memory trace. From the profiling results, target functions with high dynamic call counts are selected, and the proposed register allocation technique is applied only to those functions to save compilation time. The memory trace of the target functions is searched for memory accesses that reduce the cycle count when replaced by register accesses, and these are translated into register accesses by modifying the intermediate code and allocating promotion registers. Experiments in which performance is measured in terms of cycle count on MediaBench and DSPstone benchmark programs show that the proposed method improves performance by 14% and 18% on average for ARM and MCORE, respectively.
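
The selection step described above, deciding which memory accesses are worth promoting to registers, can be sketched as ranking profiled locations by the cycles saved; the cost numbers and names below are illustrative assumptions, not the paper's cost model.

```python
from collections import Counter

def pick_promotions(trace, free_regs, mem_cycles=3, reg_cycles=1):
    """trace: one address per dynamic memory access from the profile.
    Returns the addresses whose promotion to a register saves the most cycles,
    limited by the number of registers available for promotion."""
    counts = Counter(trace)
    saving = {addr: n * (mem_cycles - reg_cycles) for addr, n in counts.items()}
    return sorted(saving, key=saving.get, reverse=True)[:free_regs]

trace = [0x100, 0x104, 0x100, 0x100, 0x200, 0x104, 0x100]
print([hex(a) for a in pick_promotions(trace, free_regs=2)])   # ['0x100', '0x104']
```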

Parallel Computing Strategies for High-Speed Impact into Ceramic/Metal Plates (세라믹/금속판재의 고속충돌 파괴 유한요소 병렬 해석기법)

  • Moon, Ji-Joong;Kim, Seung-Jo;Lee, Min-Hyung
    • Journal of the Computational Structural Engineering Institute of Korea
    • /
    • v.22 no.6
    • /
    • pp.527-532
    • /
    • 2009
  • In this paper, simulations of high-speed impact into ceramic and/or metal materials are discussed. To model the discrete nature of fracture and damage in brittle materials, we implemented a cohesive-law fracture model with a node separation algorithm for tensile failure and a Mohr-Coulomb model for compressive loading. The drawback of this scheme is that it requires heavy computation time, because new nodes are generated continuously whenever a new crack surface is created. To reduce the amount of calculation, parallelization with the MPI library has been implemented. For high-speed impact problems, the mesh configuration and contact calculation change continuously as the time steps advance, which unbalances the computational load across processors. A dynamic load balancing technique that re-allocates the load at run time is used to achieve good parallel performance. Several impact problems have been simulated, and the parallel performance and accuracy of the solutions are discussed.
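
The dynamic load balancing step can be sketched as a periodic decision that compares per-processor work and computes how much to migrate from the most loaded to the least loaded ranks; the actual MPI element exchange is omitted, and the tolerance and data layout are illustrative assumptions rather than the paper's scheme.

```python
def rebalance(work, tol=0.05):
    """work: measured workload (e.g. element time) per MPI rank.
    Returns (src, dst, amount) migrations that bring every rank within
    (1 + tol) of the average load."""
    avg = sum(work) / len(work)
    w, moves = list(work), []
    while max(w) > avg * (1 + tol):
        src, dst = w.index(max(w)), w.index(min(w))
        amount = min(w[src] - avg, avg - w[dst])
        w[src] -= amount
        w[dst] += amount
        moves.append((src, dst, amount))
    return moves

# Example: rank 0 has accumulated extra crack-surface elements
print(rebalance([120.0, 80.0, 95.0, 105.0]))   # [(0, 1, 20.0)]
```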

Allocation Priority Scheme for Multiprocessor Systems (다중프로세서 시스템에 적합한 우선순위 할당 결정기법에 관한 연구)

  • Park Yeong-Seon;Kim Hwa-Su
    • Journal of the military operations research society of Korea
    • /
    • v.17 no.2
    • /
    • pp.113-122
    • /
    • 1991
  • This paper presents the Allocation Priority Scheme (APS) for multiprocessor systems. The objective of APS is to reduce the time complexity of a Physical Mapping Scheme (PMS). The PMS allocates the nodes of a Data Dependency Graph (DDG) to the multiprocessors efficiently and effectively. The APS assigns a priority to each node (vertex) in the DDG. In other words, the goal of the APS is to find a request-to-resource mapping such that the total cost (time complexity) is minimized. The special case in which all requests have equal priorities and all resources have equal precedences, and comparisons between our APS and other schemes, are discussed in the paper. The APS provides heuristic rules based on the maximum height (MH), the number of child nodes ($N_c$), the number of parent nodes ($N_f$), and the computation time ($T_c$). The estimation method for the computation time is also given in the paper.
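
A hedged sketch of how the four quantities named above (MH, $N_c$, $N_f$, $T_c$) can be computed for each DDG node and used as a priority key; the lexicographic combination used here is an illustrative choice, not the paper's exact rule.

```python
def max_height(dag, node, memo):
    """Longest path (in edges) from `node` down to any sink of the DDG (MH)."""
    if node not in memo:
        kids = dag[node]
        memo[node] = 0 if not kids else 1 + max(max_height(dag, k, memo) for k in kids)
    return memo[node]

def priorities(dag, t_comp):
    """Priority tuple (MH, Nc, Nf, Tc) for every node of the DDG."""
    n_f = {n: 0 for n in dag}
    for kids in dag.values():
        for k in kids:
            n_f[k] += 1
    memo = {}
    return {n: (max_height(dag, n, memo), len(dag[n]), n_f[n], t_comp[n]) for n in dag}

dag = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
t_comp = {"a": 2, "b": 1, "c": 3, "d": 1}
prio = priorities(dag, t_comp)
print(sorted(dag, key=prio.get, reverse=True))   # highest-priority first: a, c, b, d
```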


A Parallel Speech Recognition System based on Hidden Markov Model (은닉 마코프 모델 기반 병렬음성인식 시스템)

  • Jeong, Sang-Hwa;Park, Min-Uk
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.27 no.12
    • /
    • pp.951-959
    • /
    • 2000
  • The parallel speech recognition model in this paper consists of a parallel phoneme recognition module based on continuous hidden Markov models (HMMs) and a parallel sentence recognition module based on a hierarchical knowledge base. The parallel phoneme recognition module distributes several thousand HMMs across the parallel processors and then handles the output probability computation and the Viterbi algorithm for the HMMs assigned to each processor. The knowledge-base-driven parallel sentence recognition module works on the phoneme strings supplied by the phoneme module. The proposed parallel speech recognition algorithm was implemented on distributed-memory MIMD multi-transputer systems and on a Parsytec CC. Experimental results show an improvement in execution time from the parallel phoneme recognition module and an improvement in recognition rate from the parallel sentence recognition module, confirming the feasibility of a real-time implementation of the parallel speech recognition system.
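
The per-HMM work that the phoneme module distributes, output probability evaluation plus the Viterbi pass, can be sketched sequentially as below; the log-domain formulation and the flat list layout are assumptions for illustration, not the paper's implementation.

```python
def viterbi(log_pi, log_a, log_b):
    """log_pi[j]: log initial prob of state j; log_a[i][j]: log transition prob;
    log_b[t][j]: log output prob of the t-th observation in state j.
    Returns the best log-likelihood and the best state path."""
    T, N = len(log_b), len(log_pi)
    delta = [log_pi[j] + log_b[0][j] for j in range(N)]
    back = []
    for t in range(1, T):
        prev, delta, psi = delta, [], []
        for j in range(N):
            best_i = max(range(N), key=lambda i: prev[i] + log_a[i][j])
            delta.append(prev[best_i] + log_a[best_i][j] + log_b[t][j])
            psi.append(best_i)
        back.append(psi)
    path = [max(range(N), key=lambda j: delta[j])]
    for psi in reversed(back):          # follow the back-pointers to recover the path
        path.append(psi[path[-1]])
    return max(delta), path[::-1]
```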


A Parallel Task Oriented Memory Manager for Dynamic Objects (동적 객체에 대한 병렬 타스크 중심의 메모리 관리기)

  • Kim, Eun-Jeong;Bae, Jong-Min
    • The Transactions of the Korea Information Processing Society
    • /
    • v.4 no.5
    • /
    • pp.1391-1400
    • /
    • 1997
  • When a language that creates many dynamic objects runs on a shared-memory multiprocessor, the memory management algorithm for dynamic objects has a large impact on program execution speed. This paper proposes a new memory management algorithm that can improve program performance in such an environment. To this end, heap regions are allocated and reclaimed on a per-parallel-task basis. In addition, dynamic objects are classified into objects shared between parallel tasks (shared data) and non-shared objects (non-shared data), and the heap is divided into a common area and private areas. This leaves parallel tasks free to be scheduled dynamically, improves locality, and, through reuse of memory in the private areas, reduces the number of garbage collector runs.
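
A minimal sketch of the split described above: non-shared objects come from a per-task private region that can be reused without synchronization, while shared objects come from a common region behind a lock. The bump-pointer regions, sizes and class names are illustrative assumptions, not the paper's manager.

```python
import threading

class Region:
    """A bump-pointer region; a real manager would grow it or trigger collection."""
    def __init__(self, size):
        self.size, self.top = size, 0
    def alloc(self, n):
        if self.top + n > self.size:
            raise MemoryError("region exhausted")
        addr, self.top = self.top, self.top + n
        return addr
    def reset(self):
        self.top = 0                     # bulk reuse of a private region

class TaskHeaps:
    def __init__(self, num_tasks, private_size=1 << 16, shared_size=1 << 20):
        self.private = [Region(private_size) for _ in range(num_tasks)]
        self.shared = Region(shared_size)
        self.lock = threading.Lock()     # only shared-data allocations synchronize
    def alloc(self, task_id, n, shared=False):
        if shared:
            with self.lock:
                return "shared", self.shared.alloc(n)
        return f"task{task_id}", self.private[task_id].alloc(n)

heaps = TaskHeaps(num_tasks=4)
print(heaps.alloc(0, 128), heaps.alloc(1, 64, shared=True))
```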
