• Title/Summary/Keyword: parallel computer processing

Design of RMESH Parallel Algorithms for Median Filters (Median 필터를 위한 RMESH 병렬 알고리즘의 설계)

  • Jeon, Byeong-Moon;Jeong, Chang-Sung
    • The Transactions of the Korea Information Processing Society / v.5 no.11 / pp.2845-2854 / 1998
  • The median filter can be implemented in the binary domain using threshold decomposition, the stacking property, and linear separability. In this paper, we develop one-dimensional and two-dimensional parallel algorithms for the median filter on a reconfigurable mesh with buses (RMESH), which is suitable for VLSI implementation. We evaluate their performance by comparing the time complexities of the RMESH algorithms with those of algorithms on a mesh-connected computer. When the length of an M-valued 1-D signal is N and w is the window width, the RMESH algorithm runs in $O(Mw)$ time and the mesh algorithm runs in $O(Mw^2)$ time. In addition, when the size of an M-valued 2-D image is $N{\times}N$ and the window size is $w{\times}w$, our algorithm on an $N{\times}N$ RMESH runs in $O(Mw)$ time, a significant improvement over the $O(Mw^2)$ complexity on an $N{\times}N$ mesh.
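
The threshold-decomposition idea mentioned in this abstract can be sketched sequentially as follows. This is a minimal illustration, not the RMESH algorithm from the paper: the function names, the edge-replication padding, and the assumption that w is odd are all choices made here. Each of the M-1 binary passes is independent of the others, which is what makes the approach attractive for parallel hardware.

```python
def median_filter_direct(signal, w):
    """Reference sliding-window median (w odd, edges replicated)."""
    half = w // 2
    padded = [signal[0]] * half + list(signal) + [signal[-1]] * half
    return [sorted(padded[i:i + w])[half] for i in range(len(signal))]


def median_filter_threshold(signal, w, M):
    """Median filter via threshold decomposition for values in 0..M-1.

    Assumes w is odd. For each threshold t the signal is reduced to a
    binary signal, the binary median (a majority vote) is taken, and the
    binary outputs are summed (the stacking property).
    """
    half = w // 2
    padded = [signal[0]] * half + list(signal) + [signal[-1]] * half
    out = [0] * len(signal)
    for t in range(1, M):
        binary = [1 if x >= t else 0 for x in padded]
        for i in range(len(signal)):
            # Binary median of the window: 1 if more than half are 1.
            if sum(binary[i:i + w]) > half:
                out[i] += 1
    return out
```

For a 4-valued signal such as `[0, 3, 1, 2, 3, 0, 1]` with `w = 3`, both functions return the same filtered sequence.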

Digital Image based Real-time Sea Fog Removal Technique using GPU (GPU를 이용한 영상기반 고속 해무제거 기술)

  • Choi, Woon-sik;Lee, Yoon-hyuk;Seo, Young-ho;Choi, Hyun-jun
    • Journal of the Korea Institute of Information and Communication Engineering / v.20 no.12 / pp.2355-2362 / 2016
  • Sea fog removal is an important issue in both computer vision and image processing. Sea fog or haze removal is widely used in many fields, such as automatic control systems, CCTV, and image recognition. Color image dehazing techniques have been extensively studied, and the dark channel prior (DCP) technique in particular has been widely used. This paper proposes a fast and efficient GPU-based method that uses an image prior, the dark channel prior, to remove sea fog from a single digital image. We implement a basic parallel program and then optimize it to obtain a speedup of more than 250 times. While parallelizing and optimizing the algorithm, we improve some parts of the original serial program or the basic parallel program according to the characteristics of the individual steps. The proposed GPU algorithm and implementation results may be used to advantage as pre-processing in many systems, such as safe ship navigation, topographical surveying, and intelligent vehicles.
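
The dark channel at the core of DCP is the per-pixel minimum over the color channels, followed by a minimum over a local patch. The sketch below is a plain sequential version on nested lists, written only to show the computation; it is not the paper's GPU implementation, and the function name, the `patch` parameter, and the border handling (clipping the patch at the image edge) are assumptions made here. Every output pixel depends only on its own neighborhood, so each one could be assigned to a separate GPU thread.

```python
def dark_channel(image, patch=3):
    """Dark channel of an RGB image given as rows of (r, g, b) tuples.

    Step 1: per-pixel minimum over the three color channels.
    Step 2: minimum of that map over a patch x patch window,
            clipped at the image border.
    """
    h, w = len(image), len(image[0])
    r = patch // 2
    min_rgb = [[min(px) for px in row] for row in image]
    dark = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dark[y][x] = min(
                min_rgb[yy][xx]
                for yy in range(max(0, y - r), min(h, y + r + 1))
                for xx in range(max(0, x - r), min(w, x + r + 1))
            )
    return dark
```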

The PALM system : Architecture and Network Performance (PALM시스템의 구조와 네트웍 성능)

  • Kim, Suk-Il
    • The Transactions of the Korea Information Processing Society / v.1 no.1 / pp.105-113 / 1994
  • This paper introduces the Parallel Advanced Loosely coupled Multiprocessor (PALM) architecture, which is based on HCH(m,p), where m is the number of links per communication processor (CP) and p is the number of application processors (APs) connected to the CP. Communication links between a pair of CPs, or between a CP and an AP, are made of dual-port RAMs, which provide fast and reliable word-parallel communication between processors. Among the wide spectrum of HCH networks, HCH(m,2) is known to be a cost-optimal topology: it has the largest number of APs while retaining the minimal number of CPs and communication links. We also implement a testbed based on HCH(2,2). The experimental results show that the small communication/computation ratio of the PALM system could enable fine-grain parallelism on message-passing MIMD systems.

GLSL based Additional Learning Nearest Neighbor Algorithm suitable for Locating Unpaved Road (추가 학습이 빈번히 필요한 비포장도로에서 주행로 탐색에 적합한 GLSL 기반 ALNN Algorithm)

  • Ku, Bon Woo;Kim, Jun kyum;Rhee, Eun Joo
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.12 no.1 / pp.29-36 / 2019
  • The driving environment of unmanned autonomous vehicles in national defense includes not only paved roads but also unpaved roads with rough and unexpected changes. Because these vehicles monitor and reconnoiter rugged or remote areas and defend their own positions, they frequently encounter varied and unpredictable road environments and therefore need additional learning to drive in them. We propose the Additional Learning Nearest Neighbor (ALNN) algorithm, a modification of Approximate Nearest Neighbor that allows quick learning while avoiding the 'forgetting' problem. In addition, since the execution speed of the ALNN algorithm decreases as learning data accumulates, we also propose a solution to this problem using GPU parallel processing based on the OpenGL Shading Language (GLSL). The GPU-based ALNN algorithm can be used in national defense and similar fields that require frequent and quick additional learning in real time without affecting the existing learning data.
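
The trade-off described in this abstract can be illustrated with a plain nearest-neighbor classifier. This is a brute-force, exact-search sketch, not the ALNN algorithm or its GLSL implementation; the class name, method names, and labels are hypothetical. Learning only appends a sample, so earlier samples are never altered (no forgetting), while every query scans all stored samples, so query time grows with the accumulated data. The per-sample distance computations are independent of each other, which is the part a GPU could evaluate in parallel.

```python
class IncrementalNearestNeighbor:
    """Brute-force 1-NN classifier that supports additional learning."""

    def __init__(self):
        self.samples = []  # list of (feature tuple, label)

    def learn(self, features, label):
        # Additional learning is a constant-time append; existing
        # samples are left untouched.
        self.samples.append((tuple(features), label))

    def predict(self, features):
        if not self.samples:
            raise ValueError("no samples learned yet")

        def squared_distance(sample):
            return sum((a - b) ** 2 for a, b in zip(sample[0], features))

        # Linear scan over all stored samples: cost grows with the data.
        return min(self.samples, key=squared_distance)[1]
```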

A Study on Distributed Processing of Big Data and User Authentication for Human-friendly Robot Service on Smartphone (인간 친화적 로봇 서비스를 위한 대용량 분산 처리 기술 및 사용자 인증에 관한 연구)

  • Choi, Okkyung;Jung, Wooyeol;Lee, Bong Gyou;Moon, Seungbin
    • Journal of Internet Computing and Services / v.15 no.1 / pp.55-61 / 2014
  • Various human-friendly robot services have been developed, and mobile cloud computing, a real-time computing service that allows users to rent the IT resources they want over the internet, has become the new-generation computing paradigm of the information society. Enterprises and nations are actively adopting mobile cloud computing in their business processes and are aware of the need to implement it in their business practice, but it has some weak points, such as authentication services and distributed processing technologies for big data. Sometimes it is difficult to clarify the objective of a cloud computing service. In this study, the vulnerability of authentication services in mobile cloud computing is analyzed and a mobile cloud computing model is constructed for efficient and safe business processes. We also consider how to process and analyze unstructured data in parallel with this model, so that providing customized information for individuals using unstructured data may become possible in the future.

The Design and Implementation of OSF/1 AD3 Based-Microkernel Initialization for SPAX (SPAX를 위한 OSF/1 AD3 기반의 마이크로 커널 초기화 설계 및 구현)

  • Kim, Jeong-Nyeo;Cho, Il-Yeon;Lee, Jae-Kyung;Kim, Hae-Jin
    • The Transactions of the Korea Information Processing Society / v.5 no.5 / pp.1333-1344 / 1998
  • Compared with a traditional monolithic kernel, a microkernel-based operating system is slower. However, a microkernel-based OS suits multi-computer systems because of its benefits in modularity and portability. Each unit and the memory of a processor must be initialized using the boot information so that the multi-computer system OS can run the system's functions. This paper describes the microkernel initialization of OSF/1 AD3 MISIX, which is based on OSF/1 AD3, for SPAX. It introduces the initialization of the microkernel for SPAX, a high-speed parallel processing system, in terms of boot, initialization-related hardware, and memory address space construction. This paper also reports test results based on the test environments. The microkernel was tested on a single-node system with 4 processors.

A Design of 4×4 Block Parallel Interpolation Motion Compensation Architecture for 4K UHD H.264/AVC Decoder (4K UHD급 H.264/AVC 복호화기를 위한 4×4 블록 병렬 보간 움직임보상기 아키텍처 설계)

  • Lee, Kyung-Ho;Kong, Jin-Hyeung
    • Journal of the Institute of Electronics and Information Engineers / v.50 no.5 / pp.102-111 / 2013
  • In this paper, we propose a $4{\times}4$ block parallel interpolation architecture for high-performance H.264/AVC motion compensation in real-time processing of 4K UHD ($3840{\times}2160$) video. To improve throughput, we design $4{\times}4$ block parallel interpolation. To supply the $9{\times}9$ reference data for interpolation, we design a 2D cache buffer that consists of $9{\times}9$ memory arrays. We minimize redundant storage of reference pixels by applying the Search Area Stripe Reuse (SASR) scheme, and implement a high-speed plane interpolator with a 3-stage pipeline (horizontal/vertical 1/2 interpolation, diagonal 1/2 interpolation, 1/4 interpolation). The proposed architecture was simulated with a 0.13um standard cell library. The maximum operating frequency is 150MHz and the gate count is 161K gates. The proposed H.264/AVC motion compensation can support 4K UHD at 72 frames per second when running at 150MHz.
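
The figures quoted in this abstract imply a rough cycle budget per block, which the following back-of-the-envelope sketch computes. It counts luma $4{\times}4$ blocks only and ignores chroma and any pipeline stalls, both of which are simplifying assumptions made here; the function name is hypothetical.

```python
def cycles_per_block(width, height, fps, clock_hz, block=4):
    """Average clock cycles available per block x block luma block."""
    blocks_per_frame = (width // block) * (height // block)
    blocks_per_second = blocks_per_frame * fps
    return clock_hz / blocks_per_second


# 4K UHD at 72 frames per second with a 150 MHz clock.
budget = cycles_per_block(3840, 2160, 72, 150e6)
print(f"{budget:.2f} cycles per 4x4 block")
```

Under these assumptions the budget comes out at roughly four cycles per $4{\times}4$ luma block.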

Design of a Partitionable Single-Stage Shuffle-Exchange Network (분할 가능한 단단계(Single-Stage) Shuffle-Exchange 네트워크의 설계)

  • Lee, Jae-Dong
    • Journal of KIISE:Computer Systems and Theory / v.30 no.3_4 / pp.130-137 / 2003
  • This paper addresses the problem of partitioning the Single-Stage Shuffle-Exchange Network (SSEN). An algorithm, named SSEN_to_PSEN, is devised to transform an SSEN into a Partitionable Shuffle-Exchange Network (PSEN). The proposed algorithm shows that the SSEN can be partitioned into independent sub-networks without additional links for $N \leq 8$. Additional links are needed to partition an SSEN only when $N \geq 16$. The running time of the algorithm SSEN_to_PSEN is $\Theta(N \log N)$. Compared with a hypercube network, the PSEN is less expensive even when some additional links are added. By partitioning, a large PSEN in a massively parallel machine can compute various problems for multiple users simultaneously, thereby improving the processing efficiency of the machine.
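
For readers unfamiliar with the underlying topology, the two link types of a standard single-stage shuffle-exchange network can be sketched as below. This shows only the textbook connection functions for N = 2^k nodes, not the SSEN_to_PSEN algorithm from the paper; the function names are chosen here for illustration.

```python
def shuffle(node, n_bits):
    """Perfect shuffle: rotate the n_bits-bit node address left by one."""
    msb = (node >> (n_bits - 1)) & 1
    return ((node << 1) & ((1 << n_bits) - 1)) | msb


def exchange(node):
    """Exchange: flip the least significant address bit."""
    return node ^ 1


def ssen_links(n_bits):
    """Map each node of an N = 2**n_bits network to its two neighbors."""
    return {
        node: {"shuffle": shuffle(node, n_bits), "exchange": exchange(node)}
        for node in range(1 << n_bits)
    }
```

With `n_bits = 3` (N = 8), node 5 (binary 101) has a shuffle link to node 3 (binary 011) and an exchange link to node 4.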

The optimization of output coupler reflectivity of high repetitive pulsed Nd:YAG laser system adopted 3-mesh parallel sequential charge and discharge method (3단 병렬 충.방전 방식을 적용한 고반복 펄스형 Nd:YAG 레이저 출력거울 반사율의 최적화)

  • 김휘영;홍수열;김동수
    • Journal of the Korea Computer Industry Society / v.2 no.3 / pp.369-376 / 2001
  • Optimizing the resonator and the laser power supply is considered significant for improving the efficiency of a pulsed Nd:YAG laser system. We propose a new 3-mesh parallel sequential charge and discharge circuit as a laser power supply. It is more compact than a conventional power supply, competitive in price, makes it easy to control the laser power density for various material processing tasks, and is equipped with an output coupler of optimum reflectivity. In this study, we found that the maximum laser output was obtained with an output coupler reflectivity of 85% in the 50 W class. In addition, with the new power supply it is possible to charge each capacitor bank with higher energy within the given charging time; that is, each capacitor can be given much more charging time and storage energy. As a result, a higher laser output was obtained than with a conventional power supply.

Embedding Multiple Meshes into a Crossed Cube (다중 메쉬의 교차큐브에 대한 임베딩)

  • Kim, Sook-Yeon
    • Journal of KIISE:Computer Systems and Theory / v.36 no.5 / pp.335-343 / 2009
  • The crossed cube has received great attention because it has equal or superior properties compared to the hypercube, which is widely known as a versatile parallel processing system. It is known that two disjoint copies of a mesh of size $4{\times}2^m$ or four disjoint copies of a mesh of size $8{\times}2^m$ can be embedded into a crossed cube with dilation 1 and expansion 1 [Dong, Yang, Zhao, and Tang, 2008]. However, it was not known whether multiple disjoint copies of a mesh with more than eight rows and columns can be embedded into a crossed cube with dilation 1 and expansion 1. In this paper, we show that $2^{n-1}$ disjoint copies of a mesh of size $2^n{\times}2^m$ can be embedded into a crossed cube with dilation 1 and expansion 1, where $n{\geq}1$ and $m{\geq}3$. Our result is optimal in terms of dilation and expansion, which are important measures of graph embedding. In addition, our result is practically usable for allocating multiple mesh-structured jobs on a parallel computer with a crossed cube structure.
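
As a quick consistency check on the sizes involved (a worked count, not taken from the paper), assume that expansion is measured against all mesh copies together and write $d$ for the dimension of the crossed cube, a symbol introduced here for illustration. Expansion 1 then means the host has exactly as many nodes as the copies combined:

$$2^{n-1} \cdot 2^{n} \cdot 2^{m} = 2^{2n+m-1} = 2^{d} \quad\Rightarrow\quad d = 2n + m - 1.$$

Setting $n=2$ gives two copies of a $4{\times}2^m$ mesh and $n=3$ gives four copies of an $8{\times}2^m$ mesh, which matches the two earlier results cited in the abstract.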