• Title/Summary/Keyword: 프로세서 할당

Search Result 141, Processing Time 0.028 seconds

Developing Integrated Model of Eclipse Plugins for Software Process Implementation of Small Organizations (소규모 조직의 소프트웨어 프로세서 구현을 위한 이클립스 플러그인의 통합 모델 개발)

  • Sung Ryong Do;Hyuk Soo Han;Sang Eun Lee;Hyuk Jae Lee;Moon Sik Bae
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2008.11a
    • /
    • pp.578-581
    • /
    • 2008
  • 소프트웨어 프로세스는 소프트웨어와 이에 관련된 산출물을 개발, 유지하기 위해 사용하는 활동, 방법, 절차의 집합이라고 할 수 있다. 프로세스를 기반으로 작업하는 조직은 필요한 프로세스들을 파악하고, 각 프로세스들을 구현하기 위해, 담당자를 할당하고, 수행 활동을 정의한 후, 이를 기반으로 작업을 수행한다. 이 때 보다 효과적으로 작업하기 위해 적절한 도구들을 활용하기도 한다. 소프트웨어 개발에서 도구의 활용은 이미 그 효과가 검증되었고, 많은 상업용 제품들이 개발되어 현장에서 사용되고 있다. 이러한 도구들 중에는 독자적으로 하나의 프로세스를 지원하는 독립형(Standard Alone) 도구들과 여러 프로세스를 지원하는 통합형 도구들이 있다. 통합형 도구들은 여러 프로세스를 연결하고 통합 관리하기 때문에 효과가 크지만, 주로 가격이 비싼 상업용 제품들이고, 대규모 프로젝트에 적합한 복잡한 기능이 많아 소규모 조직이 채택하기에는 어려운 경향이 있다. 독립형 도구들은 통합형 도구보다 상대적으로 기능이 복잡하지 않고, 공개 소프트웨어로도 제공되고 있기 때문에 소규모 조직들도 사용해 왔지만 통합형 도구와 같은 효과를 내기는 쉽지 않았다. 본 논문에서는 이클립스 플랫폼 기반에 독립형으로 존재하는 플러그인들을 통합하여, 여러 프로세스를 지원하는 이클립스 플러그인 모델을 개발하고, 그 효과를 살펴보았다.

A Massively Parallel Algorithm for Fuzzy Vector Quantization (퍼지 벡터 양자화를 위한 대규모 병렬 알고리즘)

  • Huynh, Luong Van;Kim, Cheol-Hong;Kim, Jong-Myon
    • The KIPS Transactions:PartA
    • /
    • v.16A no.6
    • /
    • pp.411-418
    • /
    • 2009
  • Vector quantization algorithm based on fuzzy clustering has been widely used in the field of data compression since the use of fuzzy clustering analysis in the early stages of a vector quantization process can make this process less sensitive to its initialization. However, the process of fuzzy clustering is computationally very intensive because of its complex framework for the quantitative formulation of the uncertainty involved in the training vector space. To overcome the computational burden of the process, this paper introduces an array architecture for the implementation of fuzzy vector quantization (FVQ). The arrayarchitecture, which consists of 4,096 processing elements (PEs), provides a computationally efficient solution by employing an effective vector assignment strategy during the clustering process. Experimental results indicatethat the proposed parallel implementation providessignificantly greater performance and efficiency than appropriately scaled alternative array systems. In addition, the proposed parallel implementation provides 1000x greater performance and 100x higher energy efficiency than other implementations using today's ARMand TI DSP processors in the same 130nm technology. These results demonstrate that the proposed parallel implementation shows the potential for improved performance and energy efficiency.

A Novel Task Scheduling Algorithm Based on Critical Nodes for Distributed Heterogeneous Computing System (분산 이기종 컴퓨팅 시스템에서 임계노드를 고려한 태스크 스케줄링 알고리즘)

  • Kim, Hojoong;Song, Inseong;Jeong, Yong Su;Choi, SangBang
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.52 no.3
    • /
    • pp.116-126
    • /
    • 2015
  • In a distributed heterogeneous computing system, the performance of a parallel application greatly depends on its task scheduling algorithm. Therefore, in order to improve the performance, it is essential to consider some factors that can have effect on the performance of the parallel application in a given environment. One of the most important factors that affects the total execution time is a critical path. In this paper, we propose the CLTS algorithm for a task scheduling. The CLTS sets the priorities of all nodes to improve overall performance by applying leveling method to improve parallelism of task execution and by reducing the delay caused by waiting for execution of critical nodes in priority phase. After that, it conditionally uses insertion based policy or duplication based policy in processor allocation phase to reduce total schedule time. To evaluate the performance of the CLTS, we compared the CLTS with the DCPD and the HCPFD in our simulation. The results of the simulations show that the CLTS is better than the HCPFD by 7.29% and the DCPD by 8.93%. with respect to the average SLR, and also better than the HCPFD by 9.21% and the DCPD by 7.66% with respect to the average speedup.

A Novel Cooperative Warp and Thread Block Scheduling Technique for Improving the GPGPU Resource Utilization (GPGPU 자원 활용 개선을 위한 블록 지연시간 기반 워프 스케줄링 기법)

  • Thuan, Do Cong;Choi, Yong;Kim, Jong Myon;Kim, Cheol Hong
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.6 no.5
    • /
    • pp.219-230
    • /
    • 2017
  • General-Purpose Graphics Processing Units (GPGPUs) build massively parallel architecture and apply multithreading technology to explore parallelism. By using programming models like CUDA, and OpenCL, GPGPUs are becoming the best in exploiting plentiful thread-level parallelism caused by parallel applications. Unfortunately, modern GPGPU cannot efficiently utilize its available hardware resources for numerous general-purpose applications. One of the primary reasons is the inefficiency of existing warp/thread block schedulers in hiding long latency instructions, resulting in lost opportunity to improve the performance. This paper studies the effects of hardware thread scheduling policy on GPGPU performance. We propose a novel warp scheduling policy that can alleviate the drawbacks of the traditional round-robin policy. The proposed warp scheduler first classifies the warps of a thread block into two groups, warps with long latency and warps with short latency and then schedules the warps with long latency before the warps with short latency. Furthermore, to support the proposed warp scheduler, we also propose a supplemental technique that can dynamically reduce the number of streaming multiprocessors to which will be assigned thread blocks when encountering a high contention degree at the memory and interconnection network. Based on our experiments on a 15-streaming multiprocessor GPGPU platform, the proposed warp scheduling policy provides an average IPC improvement of 7.5% over the baseline round-robin warp scheduling policy. This paper also shows that the GPGPU performance can be improved by approximately 8.9% on average when the two proposed techniques are combined.

MNFS: Design of Mobile Multimedia File System based on NAND FLASH Memory (MNFS : NAND 플래시메모리를 기반으로 하는 모바일 멀티미디어 파일시스템의 설계)

  • Kim, Hyo-Jin;Won, You-Jip;Kim, Yo-Hwan
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.35 no.11
    • /
    • pp.497-508
    • /
    • 2008
  • Mobile Multimedia File System, MNFS, is a file system which extensively exploits NAND FLASH Memory, Since general Flash file systems does not precisely meet the criteria of mobile devices such as MP3 Player, PMP, Digital Camcorder, MNFS is designed to guarantee the optimal performance of FLASH Memory file system. Among many features MNFS provides, there are three distinguishable characteristics. MNFS guarantees, first, constant response time in sequential write requests of the file system, second, fast file system mounting time, and lastly least memory footprint. MNFS implements four schemes to provide such features, Hybrid mapping scheme to map file system metadata and user data, manipulation of user data allocation to fit allocation unit of file data into allocation unit of NAND FLASH Memory, iBAT (in core only Block Allocation Table) to minimize the metadata, and bottom-up representation of directory. Prototype implementation of MNFS was tested and measured its performance on ARM9 processor and 1Gbit NAND FLASH Memory environment. Its performance was compared with YAFFS, NAND FLASH File system, and FAT file system which use FTL. This enables to observe constant request time for sequential write request. It shows 30 times faster mounting time to YAFFS, and reduces 95% of HEAP memory consumption compared to YAFFS.

Scheduling Algorithm using DAG Leveling in Optical Grid Environment (옵티컬 그리드 환경에서 DAG 계층화를 통한 스케줄링 알고리즘)

  • Yoon, Wan-Oh;Lim, Hyun-Soo;Song, In-Seong;Kim, Ji-Won;Choi, Sang-Bang
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.47 no.4
    • /
    • pp.71-81
    • /
    • 2010
  • In grid system, Task scheduling based on list scheduling models has showed low complexity and high efficiency in fully connected processor set environment. However, earlier schemes did not consider sufficiently the communication cost among tasks and the composition process of lightpath for communication in optical gird environment. In this thesis, we propose LSOG (Leveling Selection in Optical Grid) which sets task priority after forming a hierarchical directed acyclic graph (DAG) that is optimized in optical grid environment. To determine priorities of task assignment in the same level, proposed algorithm executes the task with biggest communication cost between itself and its predecessor. Then, it considers the shortest route for communication between tasks. This process improves communication cost in scheduling process through optimizing link resource usage in optical grid environment. We compared LSOG algorithm with conventional ELSA (Extended List Scheduling Algorithm) and SCP (Scheduled Critical Path) algorithm. We could see the enhancement in overall scheduling performance through increment in CCR value and smoothing network environment.

An Energy-Delay Efficient System with Adaptive Victim Caches (선택적 희생 캐쉬를 이용한 저전력 고성능 시스템 설계 방안)

  • Kim Cheol Hong;Shim Sunghoon;Jhon Chu Shik;Jhang Seong Tae
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.32 no.11_12
    • /
    • pp.663-674
    • /
    • 2005
  • We propose a system aimed at achieving high energy-delay efficiency by using adaptive victim caches. Particularly, we investigate methods to improve the hit rates in the first level of memory hierarchy, which reduces the number of accesses to mort power consuming memory structures such as L2 cache. Victim cache is a memory element for reducing conflict misses in a direct-mapped L1 cache. We present two techniques to fill the victim cache with the blocks that have higher probability to be re-reqeusted by processor. Hit-based victim cache ks tilled with the blocks which were referenced frequently by processor. Replacement-based victim cache is filled with the blocks which were evicted from the sets where block replacements had happened frequently According to our simulations, replacement-based victim cache scheme outperforms the conventional victim cache scheme about $2\%$ on average and refutes the power consumption by up to $8\%$.

Run-time Memory Optimization Algorithm for the DDMB Architecture (DDMB 구조에서의 런타임 메모리 최적화 알고리즘)

  • Cho, Jeong-Hun;Paek, Yun-Heung;Kwon, Soo-Hyun
    • The KIPS Transactions:PartA
    • /
    • v.13A no.5 s.102
    • /
    • pp.413-420
    • /
    • 2006
  • Most vendors of digital signal processors (DSPs) support a Harvard architecture, which has two or more memory buses, one for program and one or more for data and allow the processor to access multiple words of data from memory in a single instruction cycle. We already addressed how to efficiently assign data to multi-memory banks in our previous work. This paper reports on our recent attempt to optimize run-time memory. The run-time environment for dual data memory banks (DBMBs) requires two run-time stacks to control activation records located in two memory banks corresponding to calling procedures. However, activation records of two memory banks for a procedure are able to have different size. As a consequence, dual run-time stacks can be unbalanced whenever a procedure is called. This unbalance between two memory banks causes that usage of one memory bank can exceed the extent of on-chip memory area although there is free area in the other memory bank. We attempt balancing dual run-time slacks to enhance efficiently utilization of on-chip memory in this paper. The experimental results have revealed that although our algorithm is relatively quite simple, it still can utilize run-time memories efficiently; thus enabling our compiler to run extremely fast, yet minimizing the usage of un-time memory in the target code.

A Study on the Efficient Task Scheduling by the Reconstructed Task Graph (태스크 그래프의 재구성에 의한 효율적 태스크 스케줄링에 관한 연구)

  • Byun, Seung-Hwan;Yoo, Kwan-Jong
    • The Transactions of the Korea Information Processing Society
    • /
    • v.4 no.9
    • /
    • pp.2235-2246
    • /
    • 1997
  • This paper presents an effective heuristic task scheduling algorithm for multiprocessor systems. To execute task scheduling effectively which is defined as an allocation of m's tasks onto n's processors(m > n), several problems almost at NP-hard should be cleaned up. The purpose of the task scheduling obtains the minimum execution time by mapping the tasks on a system topology or reduces the total execution time to give a minimum system topology. In order to solve this problem, in this paper, the task scheduling is done by redefining a task graph to a reconstructed task graph (RTG). An RTG is obtained by merging or copying nodes to equal the number of nodes on each level of the task graph to the number of processors of the system topology and then directly scheduled to the system topology. This method obtains a fast scheduling time and a simple scheduling method, and near-optimal execution time without executing steps such as the refinement step and the duplication step after the task scheduling.

  • PDF

Code Size Reduction Through Efficient use of Multiple Load/store Instructions (복수의 메모리 접근 명령어의 효율적인 이용을 통한 코드 크기의 감소)

  • Ahn Minwook;Cho Doosan;Paek Yunheung;Cho Jeonghun
    • Journal of KIISE:Software and Applications
    • /
    • v.32 no.8
    • /
    • pp.819-833
    • /
    • 2005
  • Code size reduction is ever becoming more important for compilers targeting embedded processors because these processors are often severely limited by storage constraints and thus the reduced code size can have a positively significant Impact on their performance. Various code size reduction techniques have different motivations and a variety of application contexts utilizing special hardware features of their target processors. In this work, we propose a novel technique that fully utilizes a set of hardware instructions, called the multiple load/store (MLS), that are specially featured for reducing code size by minimizing the number of memory operations in the code. To take advantage of this feature, many microprocessors support the MLS instructions, whereas no existing compilers fully exploit the potential benefit of these instructions but only use them for some limited cases. This is mainly because optimizing memory accesses with MLS instructions for general cases is an NP-hard problem that necessitates complex assignments of registers and memory off-sets for variables in a stack frame. Our technique uses a couple of heuristics to efficiently handle this problem in a polynomial time bound.