• Title/Summary/Keyword: parallel computer processing


Accelerating Gaussian Hole-Filling Algorithm using GPU (GPU를 이용한 Gaussian Hole-Filling Algorithm 가속)

  • Park, Jun-Ho;Han, Tack-Don
    • Proceedings of the Korean Society of Computer Information Conference / 2012.07a / pp.79-82 / 2012
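  • As interest in 3D multimedia services grows, related research is being actively discussed. The conventional way to generate stereoscopic video is to place two cameras a fixed distance apart, shoot the subject, and generate the corresponding left and right views. However, this places a burden on video bandwidth. To address the problem, much research has been done on DIBR (Depth Image Based Rendering) algorithms, which use depth information and a single image. Among them, the hole-filling method based on a Gaussian depth map gives the most natural results in DIBR, but it is markedly slower than other DIBR algorithms. In this paper, to speed up image generation, we propose a GPU-based parallel processing structure for the Gaussian hole-filling algorithm and present the DIBR generation process that uses it.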


A Dynamic Work Manager for Heterogeneous Cluster Systems (DWM: 이기종 클러스터 시스템의 동적 자원 관리자)

  • Park, Jong-Hyun;Kim, Jun-Seong
    • Journal of the Institute of Electronics Engineers of Korea CI / v.46 no.6 / pp.56-62 / 2009
  • Inexpensive high-performance computers, combined with high-speed networks and machine-independent communication libraries, have made cluster computing a viable option for parallel applications. In a heterogeneous cluster, efficient resource management is critical because the computing power of each individual node is a significant factor in parallel performance. This paper presents a dynamic task manager called DWM (dynamic work manager), which lets a heterogeneous cluster fully utilize the differing computing power of its individual nodes. We measure the performance of DWM in a heterogeneous cluster environment with several kernel-level benchmark programs and quantitatively evaluate their programming complexity. The experiments show that DWM provides competitive performance with a notable reduction in programming effort.
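
To illustrate why dynamic work distribution matters on nodes of unequal speed, the following is a minimal simulation sketch, not DWM itself. The abstract does not describe DWM's interface or scheduling rule, so the function names, the round-robin baseline, and the "next task goes to the first idle worker" rule are all assumptions made for illustration.

```python
import heapq


def static_makespan(task_costs, speeds):
    """Assign tasks round-robin, ignoring worker speed (assumed baseline)."""
    finish = [0.0] * len(speeds)
    for i, cost in enumerate(task_costs):
        w = i % len(speeds)
        finish[w] += cost / speeds[w]
    return max(finish)


def dynamic_makespan(task_costs, speeds):
    """Hand each task to whichever worker becomes idle first (assumed rule)."""
    # Min-heap of (time at which the worker becomes idle, worker index).
    idle = [(0.0, w) for w in range(len(speeds))]
    heapq.heapify(idle)
    makespan = 0.0
    for cost in task_costs:
        idle_at, w = heapq.heappop(idle)
        done = idle_at + cost / speeds[w]
        makespan = max(makespan, done)
        heapq.heappush(idle, (done, w))
    return makespan


if __name__ == "__main__":
    tasks = [1.0] * 12          # twelve equal-cost tasks (hypothetical workload)
    speeds = [1.0, 2.0, 3.0]    # relative node speeds (hypothetical cluster)
    print("static :", static_makespan(tasks, speeds))
    print("dynamic:", dynamic_makespan(tasks, speeds))
```

In this toy setting the static split is limited by the slowest node, while the dynamic rule lets faster nodes pick up more tasks.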

A Processor Allocation Policy using Program Characteristics on Shared Bus (공유 버스상에서 프로그램 특성을 사용한 프로세서 할당 정책)

  • Jeong, In-Beom;Lee, Jun-Won
    • Journal of KIISE:Computer Systems and Theory / v.26 no.9 / pp.1073-1082 / 1999
  • In this paper, we propose an adaptive processor allocation policy that makes effective use of the processors in a system. To increase a program's parallelism, the number of processors used for parallel computing is usually increased. However, adding processors changes the grain size of the parallel program, which in turn affects cache performance. In particular, on a bandwidth-limited shared bus, increasing the number of processors greatly increases contention for bus access, so the time spent waiting for the bus becomes a major factor offsetting the gain in computing power. The proposed policy collects information about the distribution of processors waiting on the shared bus over an arbitrary period during program execution, and changes the number of processors accordingly while the program runs. Our simulation results show that the policy finds the optimum feasible number of processors based on the bus traffic characteristics of each program. Although it performs somewhat worse than the best-performing fixed number of processors, it raises processor utilization and thus contributes to effective use of the system.
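
The feedback idea in this abstract (observe bus waiting over a period, then change the processor count) can be sketched as follows. This is only an illustrative sketch: the abstract does not give the actual decision rule, so the function name `adjust_processors`, the thresholds `high` and `low`, the one-processor step, and the sample format are all hypothetical.

```python
def adjust_processors(current, waiting_samples, max_processors,
                      high=0.25, low=0.05):
    """Return a new processor count after one observation period.

    waiting_samples holds, for each sampling instant in the period, the
    number of processors observed waiting for the shared bus (a
    hypothetical measurement format). high and low are assumed thresholds
    on the average fraction of processors that are waiting.
    """
    if not waiting_samples:
        return current  # nothing observed, keep the allocation

    # Average fraction of the allocated processors stalled on the bus.
    waiting_fraction = sum(waiting_samples) / (len(waiting_samples) * current)

    if waiting_fraction > high and current > 1:
        return current - 1  # heavy contention: release one processor
    if waiting_fraction < low and current < max_processors:
        return current + 1  # bus mostly free: try one more processor
    return current


if __name__ == "__main__":
    # Hypothetical period: 8 processors, about half of them waiting.
    print(adjust_processors(8, [4, 5, 3, 4], max_processors=16))
    # Hypothetical period: 4 processors, nobody waiting.
    print(adjust_processors(4, [0, 0, 0, 0], max_processors=16))
```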

GPU-based Parallel Ant Colony System for Traveling Salesman Problem

  • Rhee, Yunseok
    • Journal of the Korea Society of Computer and Information / v.27 no.2 / pp.1-8 / 2022
  • In this paper, we design and implement a GPU-based parallel algorithm that solves the traveling salesman problem (TSP) effectively with an ant colony system. The iterative process of generating hundreds or thousands of tours simultaneously exploits the GPU's task-level parallelism, and the update of the pheromone trail data exploits data parallelism with 32x32 thread blocks. In particular, simultaneous memory access by multiple threads enables coalesced accesses to contiguous memory addresses and concurrent accesses to shared memory. The experiments used TSPLIB instances with 127 to 1002 cities and compared the sequential and parallel algorithms on a system with an Intel Core i9-9900K CPU and an Nvidia Titan RTX GPU. GPU parallelization yielded a speedup of about 10.13 to 11.37 times.
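
To show why a pheromone update is a natural fit for data parallelism, here is a sequential Python sketch of a generic update: every matrix entry evaporates independently of the others (the element-wise part that could map onto a 2D grid of GPU threads), and the edges of the best tour then receive a deposit. This is not the paper's kernel. The evaporation rate `rho`, the deposit `rho / length`, the symmetric-TSP assumption, and the function names are illustrative choices, since the abstract does not state the exact rule used.

```python
def tour_length(tour, dist):
    """Length of a closed tour over the distance matrix dist."""
    n = len(tour)
    return sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))


def update_pheromone(tau, best_tour, dist, rho=0.1):
    """Evaporate every entry of tau, then reinforce the best tour's edges.

    rho is an assumed evaporation rate; the deposit rho / length is one
    common choice, used here for illustration only.
    """
    n = len(tau)
    # Evaporation: each (i, j) entry is updated independently, so on a GPU
    # one thread per entry could do this work without synchronization.
    for i in range(n):
        for j in range(n):
            tau[i][j] *= (1.0 - rho)

    # Deposit on the edges of the best tour (symmetric TSP assumed).
    deposit = rho / tour_length(best_tour, dist)
    m = len(best_tour)
    for k in range(m):
        i, j = best_tour[k], best_tour[(k + 1) % m]
        tau[i][j] += deposit
        tau[j][i] += deposit
    return tau


if __name__ == "__main__":
    # Hypothetical 4-city instance laid out on a unit square.
    dist = [[0, 1, 2, 1],
            [1, 0, 1, 2],
            [2, 1, 0, 1],
            [1, 2, 1, 0]]
    tau = [[1.0] * 4 for _ in range(4)]
    update_pheromone(tau, [0, 1, 2, 3], dist)
    for row in tau:
        print(row)
```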

Development of a Decoder ASSP for the Korea Broadcast Programming System (한국형 방송 프로그램 시스템 디코더 ASSP의 개발)

  • Jo, Gyeong-Yeon (Assistant Professor, Department of Computer Engineering, Pukyong University)
    • The Transactions of the Korea Information Processing Society / v.3 no.5 / pp.1229-1239 / 1996
  • The growth of supplementary information broadcasting on TV demands a graphic overlay processor. This paper describes the design, implementation, and testing of a graphic overlay processor called the KBPS decoder ASSP (Application Specific Standard Product), which complies with the Korea Broadcast Programming System. The KBPS decoder ASSP consists of an embedded 8-bit Z80 microprocessor, graphic overlay controller, KBPS schedule decoder, memory controller, priority interrupt controller, MIDI controller, infrared remote control receiver, async serial communication controller, timer, bus controller, universal parallel input-output port, and serial-parallel interface. The ASSP is implemented in a 0.8 micron CMOS Sea of Gates process with about 31,500 gates, and it runs at 14.318 MHz.


A Parallel Implementation of Multiple Non-overlapping Cameras for Robot Pose Estimation

  • Ragab, Mohammad Ehab;Elkabbany, Ghada Farouk
    • KSII Transactions on Internet and Information Systems (TIIS) / v.8 no.11 / pp.4103-4117 / 2014
  • Image processing and computer vision algorithms are attracting growing interest in a variety of application areas such as robotics and man-machine interaction. Vision allows the development of flexible, intelligent, and less intrusive approaches than most other sensor systems. In this work, we determine the location and orientation of a mobile robot, which is crucial for performing its tasks. To operate in real time, the various vision routines need to be sped up. Therefore, we present and evaluate a method for introducing parallelism into the multiple non-overlapping camera pose estimation algorithm proposed in [1]. In this algorithm the problem has been solved in real time using multiple non-overlapping cameras and the Extended Kalman Filter (EKF). Four cameras arranged in two back-to-back pairs are mounted on the platform of a moving robot. An important benefit of using multiple cameras for robot pose estimation is the capability of resolving vision uncertainties such as the bas-relief ambiguity. The proposed method is based on algorithmic skeletons for low, medium, and high levels of parallelization. The analysis shows that the use of a multiprocessor system enhances the system performance by about 87%. In addition, the proposed design is scalable, which is necessary in this application, where the number of features changes repeatedly.

A Load Balancing Technique Combined with Mean-Field Annealing and Genetic Algorithms (평균장 어닐링과 유전자 알고리즘을 결합한 부하균형기법)

  • Hong Chul-Eui;Park Kyeong-Mo
    • Journal of KIISE:Computer Systems and Theory / v.33 no.8 / pp.486-494 / 2006
  • In this paper, we introduce a new solution for the load balancing problem, an important issue in parallel processing. Our heuristic load balancing technique, called MGA, effectively combines the benefits of both mean-field annealing (MFA) and genetic algorithms (GA). We compare the proposed MGA algorithm with other mapping algorithms (MFA, GA-1, and GA-2). A multiprocessor mapping simulation has been developed to measure the performance improvement ratio of these algorithms. Our experimental results show that the new technique, a composition of heuristic mapping methods, improves solution quality over the conventional ones, at the cost of a longer run time.

Global Internet Computing Environment based on Java (자바를 기반으로 한 글로벌 인터넷 컴퓨팅 환경)

  • Kim, Hui-Cheol;Sin, Pil-Seop;Park, Yeong-Jin;Lee, Yong-Du
    • The Transactions of the Korea Information Processing Society / v.6 no.9 / pp.2320-2331 / 1999
  • To utilize a collection of idle computers on the Internet as a parallel computing platform, we propose a new scheme called GICE (Global Internet Computing Environment). GICE aims for high programmability, efficient support for heterogeneous computing resources, system scalability, and high performance. The programming model of GICE is based on a single address space. GICE features a Java-based programming environment, a dynamic resource management scheme, and efficient parallel task scheduling and execution mechanisms. Based on a prototype implementation of GICE, we address the concept, feasibility, complexity, and performance of Internet computing.


Multiview Stereo Matching on Mobile Devices Using Parallel Processing on Embedded GPU (임베디드 GPU에서의 병렬처리를 이용한 모바일 기기에서의 다중뷰 스테레오 정합)

  • Jeon, Yun Bae;Park, In Kyu
    • Journal of Broadcast Engineering / v.24 no.6 / pp.1064-1071 / 2019
  • Multiview stereo matching algorithms are used to reconstruct a 3D shape from a set of 2D images. Conventional multiview stereo algorithms have been implemented on high-performance hardware because each step involves a large number of calculations. However, as the performance of mobile graphics processors has increased rapidly in recent years, complex computer vision algorithms can now be implemented on mobile devices such as smartphones and embedded boards. In this paper, we parallelize a multiview stereo algorithm using OpenCL on a mobile GPU and present various optimization techniques for embedded hardware with limited resources.

FUZZY HYPERCUBES: A New Inference Machines

  • Kang, Hoon
    • Journal of the Korean Institute of Intelligent Systems / v.2 no.2 / pp.34-41 / 1992
  • A robust and reliable learning and reasoning mechanism based on fuzzy set theory and fuzzy associative memories is addressed. The mechanism stores an initial knowledge base a priori via approximate learning and utilizes this information for decision-making systems via fuzzy inferencing. We call this fuzzy computer architecture a 'fuzzy hypercube'; it processes all the rules in parallel in one clock period. Fuzzy hypercubes can be applied to the control of a class of complex and highly nonlinear systems that suffer from vagueness and uncertainty. Moreover, evidential aspects of a fuzzy hypercube are treated to assess the degree of certainty or reliability together with parameter sensitivity.
