• Title/Summary/Keyword: parallel computer processing

Distributed Parallel Computing Environment for Java (자바를 위한 분산된 병렬 컴퓨팅 환경)

  • 이상윤;김승호
    • Journal of the Institute of Electronics Engineers of Korea CI / v.41 no.6 / pp.23-37 / 2004
  • Because a Java thread is an object that is treated as an independent process within a single execution space in a multiprocessing environment, it can serve as an independent unit of parallel processing. Java's thread and synchronization mechanisms make it easy to write parallel application programs. Consequently, many studies have applied Java's support for parallel processing to distributed computing environments. In this paper, we introduce a system environment that supports parallel execution of the threads contained in legacy Java programs. The system, named TORB (Transparent Object Request Broker), enables parallel execution of a legacy Java program after a simple conversion process, because it supports programming transparency. TORB is an extended version of a distributed programming tool previously published by our research team, which offered only a typical distributed processing feature: executing a specified function on a specified computer.

A Multi-Scale Parallel Convolutional Neural Network Based Intelligent Human Identification Using Face Information

  • Li, Chen;Liang, Mengti;Song, Wei;Xiao, Ke
    • Journal of Information Processing Systems / v.14 no.6 / pp.1494-1507 / 2018
  • Intelligent human identification using face information has been a research hotspot in applications ranging from the Internet of Things (IoT), intelligent self-service banking, and intelligent surveillance to public safety and intelligent access control. Since 2D face images are usually captured from a long distance in an unconstrained environment, fully exploiting this advantage and making human recognition suitable for a wider range of intelligent applications with higher security and convenience requires overcoming several key difficulties: gray-scale changes caused by illumination variance; occlusion caused by glasses, hair, or scarves; and self-occlusion and deformation caused by pose or expression variation. Many solutions have been proposed to overcome these difficulties. However, most of them improve recognition performance under only one influencing factor, which still falls short of real face recognition scenarios. In this paper, we propose a multi-scale parallel convolutional neural network architecture to extract deep, robust facial features with high discriminative ability. Extensive experiments are conducted on the CMU-PIE, extended FERET, and AR databases. The experimental results show that the proposed algorithm exhibits excellent discriminative ability compared with other existing algorithms.

A Study of Printed Score Recognition and its Parallel Algorithm (인쇄 악보의 인식과 병렬 알고리즘에 관한 연구)

  • 황영길;김성천
    • The Journal of Korean Institute of Communications and Information Sciences / v.19 no.5 / pp.959-970 / 1994
  • In this thesis, a printed score is read with a handheld scanner, and the recognition process is executed in parallel on a mesh-connected computer. The scanned input is classified into patterns and recognized on the basis of knowledge. The proposed algorithm minimizes preprocessing steps and uses simple operations. Score symbols can be recognized irrespective of their size, but their diversity makes it difficult to recognize all of them, so the program recognizes only the essential and frequently used symbols. The recognized result is converted into the standard MIDI file format. Because high-speed image processing is required, a parallel processing system with multiple processors is needed. A digitized two-dimensional image is well suited to processing on an SIMD mesh-connected computer (MCC). We therefore explain this architecture and present a parallel algorithm for an SIMD MCC with n processors that achieves time complexity O(n).

A Low Memory Bandwidth Motion Estimation Core for H.264/AVC Encoder Based on Parallel Current MB Processing (병렬처리 기반의 H.264/AVC 인코더를 위한 저 메모리 대역폭 움직임 예측 코어설계)

  • Kim, Shi-Hye;Choi, Jun-Rim
    • Journal of the Institute of Electronics Engineers of Korea SD / v.48 no.2 / pp.28-34 / 2011
  • In this paper, we present an integer and fractional motion estimation IP for an H.264/AVC encoder based on a hardware-oriented algorithm. In the integer motion engine, the reference block is shared among consecutive current macroblocks processed in parallel, which exploits data reuse and reduces off-chip bandwidth. In the fractional motion engine, instead of a two-step sequential refinement, half-pel and quarter-pel positions are processed in parallel in order to discard unnecessary candidate positions and double the throughput. The H.264/AVC motion estimation chip is fabricated on an MPW (Multi-Project Wafer) chip using the Chartered $0.18\,\mu\mathrm{m}$ standard CMOS 1P5M technology and achieves high throughput, supporting HDTV 720p at 30 fps.
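
As a rough illustration of the integer motion search mentioned in this abstract, the Python sketch below runs an exhaustive block search over a small window. The sum-of-absolute-differences (SAD) cost, the function names, and the parameters are assumptions made for illustration; the abstract does not state the cost function or search pattern used by the hardware core.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the current block at (bx, by)
    and the reference block displaced by (dx, dy). SAD is an assumed cost."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(size)
        for x in range(size)
    )


def full_search(cur, ref, bx, by, size, search_range):
    """Return (cost, dx, dy) for the best integer displacement in the window."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates that fall outside the reference frame.
            if not (0 <= by + dy and by + dy + size <= height):
                continue
            if not (0 <= bx + dx and bx + dx + size <= width):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

Every candidate displacement reads from the same reference window and is evaluated independently of the others, which is the kind of data reuse and parallelism a hardware search engine can exploit.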

Parallel LDPC Decoding on a Heterogeneous Platform using OpenCL

  • Hong, Jung-Hyun;Park, Joo-Yul;Chung, Ki-Seok
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.6 / pp.2648-2668 / 2016
  • Modern mobile devices are equipped with various accelerated processing units to handle computationally intensive applications; therefore, Open Computing Language (OpenCL) has been proposed to fully take advantage of the computational power in heterogeneous systems. This article introduces a parallel software decoder of Low Density Parity Check (LDPC) codes on an embedded heterogeneous platform using an OpenCL framework. The LDPC code is one of the most popular and strongest error correcting codes for mobile communication systems. Each step of LDPC decoding has different parallelization characteristics. In the proposed LDPC decoder, steps suitable for task-level parallelization are executed on the multi-core central processing unit (CPU), and steps suitable for data-level parallelization are processed by the graphics processing unit (GPU). To improve the performance of OpenCL kernels for LDPC decoding operations, explicit thread scheduling, vectorization, and effective data transfer techniques are applied. The proposed LDPC decoder achieves high performance and high power efficiency by using heterogeneous multi-core processors on a unified computing framework.

Fault Tolerant Static Shuffle-Exchange Network (결함 포용 정적 Shuffle-Exchange 네트워크)

  • Choi Hong In
    • Journal of KIISE:Computer Systems and Theory / v.30 no.3_4 / pp.160-167 / 2003
  • A static shuffle-exchange network is not only useful for several parallel applications but also uses less hardware than the popular multi-stage network or hypercube. Even though it has many advantages, it has never been used in any implemented parallel machine. One reason is that there have been no techniques for making the network fault-tolerant. In this paper, multiple fault-tolerant static shuffle-exchange networks are presented. In order to recover from k faulty processing elements, a network needs at least 2k additional processing elements and at most 4k additional shuffle ports for each processing element. By decomposing the k fault-tolerant static shuffle-exchange network into m identical modules, this paper shows that the reliability of the network can be increased.
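
For readers unfamiliar with the topology, the following Python sketch shows the two link types of a plain (non-fault-tolerant) shuffle-exchange network on 2^n nodes. It uses the standard textbook definition; the function names are illustrative, and the fault-tolerant constructions from the paper are not reproduced here.

```python
def shuffle(node, n):
    """Perfect shuffle: rotate the n-bit address of `node` left by one bit."""
    mask = (1 << n) - 1
    return ((node << 1) | (node >> (n - 1))) & mask


def exchange(node):
    """Exchange link: flip the least significant address bit."""
    return node ^ 1


def neighbors(node, n):
    """Return the shuffle and exchange neighbors of `node` in a 2**n-node network."""
    return {"shuffle": shuffle(node, n), "exchange": exchange(node)}
```

With n = 3, node 3 (binary 011) has shuffle neighbor 6 (binary 110) and exchange neighbor 2. Applying `shuffle` n times returns to the starting node.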

Design of Line Scratch Detection and Restoration Algorithm using GPU (GPU를 이용한 선형 스크래치 탐지와 복원 알고리즘의 설계)

  • Lee, Joon-Goo;Shim, She-Yong;You, Byoung-Moon;Hwang, Doo-Sung
    • Journal of the Korea Society of Computer and Information / v.19 no.4 / pp.9-16 / 2014
  • This paper proposes a linear scratch detection and restoration algorithm based on pixel data comparison within a single frame or across consecutive frames. Scratch detection and restoration require a large number of comparison operations and therefore offer a high degree of parallelism. The proposed algorithm is designed to run on a GPU for fast computation. We test the proposed algorithm in sequential and parallel processing on a set of digital videos from the National Archives of Korea. In the experiments, the scratch detection rate of consecutive frames is as fast as about 20% for that of a single frame. The detection and restoration rates of the GPU-based algorithm are similar to those of the CPU-based algorithm, but the parallel implementation achieves a speedup of up to about 50 times.

Parallel Algorithm for Optimal Stack Filters on MCC and CCC (MCC 및 CCC에서의 최적 스택 필터를 위한 병렬 알고리즘)

  • Jeon, Byeong-Mun;Jeong, Chang-Seong
    • Journal of KIISE:Computer Systems and Theory / v.26 no.10 / pp.1185-1193 / 1999
  • An optimal stack filter achieves the maximum noise attenuation under the structural constraints imposed by the requirement of preserving certain signal or image features. The filter also offers high parallelism because of the principles of threshold decomposition and binary processing based on positive Boolean functions (PBFs). In this paper, we develop a one-dimensional parallel algorithm for the optimal stack filter on two parallel computation models, the MCC (Mesh-Connected Computer) and the CCC (Cube-Connected Computer). The running time of the optimal stack filter depends mainly on the binary median operation, and our algorithm realizes this operation through linear separability. Based on this scheme, our parallel algorithm runs in $O(L\sqrt{M})$ time on a $\sqrt{M} \times \sqrt{M}$ MCC and in $O(L \log M)$ time on a CCC with $M$ PEs, where the $M$-valued 1-D signal has length $L$ and the window width is $N$. We also show that the computation time of our parallel algorithm remains constant when the window width $N$ is increased to achieve better noise attenuation.
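
The threshold decomposition idea in this abstract can be sketched sequentially in Python. The example below uses the median filter, a well-known member of the stack filter family, as the positive Boolean function; the function names, edge padding, and choice of median are assumptions for illustration, and neither the optimal filter design nor the MCC/CCC mapping from the paper is reproduced.

```python
def threshold_decompose(signal, levels):
    """Split a signal with values in 0..levels-1 into levels-1 binary signals."""
    return [[1 if x >= m else 0 for x in signal] for m in range(1, levels)]


def binary_median(window):
    """Majority vote over an odd-length binary window.
    This is a linear threshold test on the sum of the inputs."""
    return 1 if 2 * sum(window) > len(window) else 0


def stack_median_filter(signal, levels, width):
    """Median filter computed by filtering each binary level and stacking (summing)."""
    half = width // 2
    # Replicate edge samples so every output position has a full window.
    padded = [signal[0]] * half + list(signal) + [signal[-1]] * half
    out = [0] * len(signal)
    for plane in threshold_decompose(padded, levels):
        for i in range(len(signal)):
            out[i] += binary_median(plane[i:i + width])
    return out
```

Each binary level is filtered independently of the others, which is where the parallelism noted in the abstract comes from. For the signal `[0, 3, 1, 2, 0]` with 4 levels and window width 3, the result `[0, 1, 2, 1, 0]` matches a direct running median.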

PARALLEL IMAGE RECONSTRUCTION FOR NEW VACUUM SOLAR TELESCOPE

  • Li, Xue-Bao;Wang, Feng;Xiang, Yong Yuan;Zheng, Yan Fang;Liu, Ying Bo;Deng, Hui;Ji, Kai Fan
    • Journal of The Korean Astronomical Society / v.47 no.2 / pp.43-47 / 2014
  • Many advanced ground-based solar telescopes improve the spatial resolution of observation images using an adaptive optics (AO) system. As any AO correction remains only partial, it is necessary to use post-processing image reconstruction techniques such as speckle masking or shift-and-add (SAA) to reconstruct a high-spatial-resolution image from atmospherically degraded solar images. In the New Vacuum Solar Telescope (NVST), the spatial resolution in solar images is improved by frame selection and SAA. In order to overcome the burden of massive speckle data processing, we investigate the possibility of using the speckle reconstruction program in a real-time application at the telescope site. The code has been written in the C programming language and optimized for parallel processing in a multi-processor environment. We analyze the scalability of the code to identify possible bottlenecks, and we conclude that the presented code is capable of being run in real-time reconstruction applications at NVST and future large aperture solar telescopes if care is taken that the multi-processor environment has low latencies between the computation nodes.
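
To make the shift-and-add (SAA) step concrete, here is a minimal pure-Python sketch of the classic brightest-pixel variant: each frame is shifted so that its brightest pixel lands on the frame center, and the shifted frames are averaged. The alignment criterion, the function names, and the omission of frame selection are assumptions; the abstract does not describe how NVST frames are registered, and the original code is written in C.

```python
def brightest(frame):
    """Return (row, col) of the brightest pixel; the first one wins on ties."""
    positions = ((r, c) for r in range(len(frame)) for c in range(len(frame[0])))
    return max(positions, key=lambda rc: frame[rc[0]][rc[1]])


def shift_and_add(frames):
    """Align each frame on its brightest pixel and average the aligned frames."""
    rows, cols = len(frames[0]), len(frames[0][0])
    ref_r, ref_c = rows // 2, cols // 2
    acc = [[0.0] * cols for _ in range(rows)]
    for frame in frames:
        br, bc = brightest(frame)
        dr, dc = ref_r - br, ref_c - bc  # shift that moves the peak to the center
        for r in range(rows):
            for c in range(cols):
                sr, sc = r - dr, c - dc  # source pixel before shifting
                if 0 <= sr < rows and 0 <= sc < cols:
                    acc[r][c] += frame[sr][sc]
    return [[value / len(frames) for value in row] for row in acc]
```

Frames only interact in the final accumulation, so the per-frame shift can be distributed across processors, with the reduction being the point where inter-node latency matters.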

An Iterative Algorithm for the Bottom Up Computation of the Data Cube using MapReduce (맵리듀스를 이용한 데이터 큐브의 상향식 계산을 위한 반복적 알고리즘)

  • Lee, Suan;Jo, Sunhwa;Kim, Jinho
    • Journal of Information Technology and Architecture / v.9 no.4 / pp.455-464 / 2012
  • Due to the recent data explosion, methods that can meet the requirements of large-scale data analysis have been studied. This paper proposes the MRIterativeBUC algorithm, which enables efficient computation of large data cubes through distributed parallel processing with the MapReduce framework. MRIterativeBUC is designed for efficient iterative operation of the BUC method with MapReduce, and it overcomes the storage size and processing capacity limitations caused by large data cube computation. It adopts the idea of the iceberg cube, which computes only the aspects of interest to analysts, and distributes the cube computation in parallel through partitioning and sorting. It thereby reduces the amount of emitted data, which in turn reduces network overload, the processing load on each node, and ultimately the cube computation cost. The bottom-up cube computation and iterative MapReduce algorithm proposed in this paper can be extended in various ways and should be useful in many applications.
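
The bottom-up computation (BUC) with iceberg pruning that this abstract builds on can be sketched on a single machine as follows. This is a plain recursive version for illustration only; the function name, the `"*"` wildcard, and the count-based support threshold are assumptions, and the MapReduce-based iterative scheme of MRIterativeBUC is not reproduced.

```python
from collections import defaultdict


def buc(rows, num_dims, min_support):
    """Return {cell: count} for every group-by cell with count >= min_support.
    A cell is a tuple with one entry per dimension; "*" means "all values"."""
    result = {}

    def recurse(partition, cell, start_dim):
        result[tuple(cell)] = len(partition)
        for d in range(start_dim, num_dims):
            # Partition the current rows on dimension d.
            groups = defaultdict(list)
            for row in partition:
                groups[row[d]].append(row)
            for value, group in sorted(groups.items()):
                # Iceberg pruning: a small partition cannot contain a large cell.
                if len(group) >= min_support:
                    cell[d] = value
                    recurse(group, cell, d + 1)
                    cell[d] = "*"

    if len(rows) >= min_support:
        recurse(rows, ["*"] * num_dims, 0)
    return result
```

Because a partition below the support threshold is never expanded, none of its finer-grained cells are produced, which is the pruning that keeps the amount of emitted data small.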