• Title/Summary/Keyword: 병렬 연산 처리

Search Result 554, Processing Time 0.024 seconds

Design and Performance Analysis of a Parallel Optimal Branch-and-Bound Algorithm for MIN-based Multiprocessors (MIN-based 다중 처리 시스템을 위한 효율적인 병렬 Branch-and-Bound 알고리즘 설계 및 성능 분석)

  • Yang, Myung-Kook
    • Journal of IKEEE
    • /
    • v.1 no.1 s.1
    • /
    • pp.31-46
    • /
    • 1997
  • In this paper, a parallel Optimal Best-First search Branch-and-Bound(B&B) algorithm(pobs) is designed and evaluated for MIN-based multiprocessor systems. The proposed algorithm decomposes a problem into G subproblems, where each subproblem is processed on a group of P processors. Each processor group uses tile sub-Global Best-First search technique to find a local solution. The local solutions are broadcasted through the network to compute the global solution. This broadcast provides not only the comparison of G local solutions but also the load balancing among the processor groups. A performance analysis is then conducted to estimate the speed-up of the proposed parallel B&B algorithm. The analytical model is developed based on the probabilistic properties of the B&B algorithm. It considers both the computation time and communication overheads to evaluate the realistic performance of the algorithm under the parallel processing environment. In order to validate the proposed evaluation model, the simulation of the parallel B&B algorithm on a MIN-based system is carried out at the same time. The results from both analysis and simulation match closely. It is also shown that the proposed Optimal Best-First search B&B algorithm performs better than other reported schemes with its various advantageous features such as: less subproblem evaluations, prefer load balancing, and limited scope of remote communication.

  • PDF

Parallel Processing of Satellite Images using CUDA Library: Focused on NDVI Calculation (CUDA 라이브러리를 이용한 위성영상 병렬처리 : NDVI 연산을 중심으로)

  • LEE, Kang-Hun;JO, Myung-Hee;LEE, Won-Hee
    • Journal of the Korean Association of Geographic Information Studies
    • /
    • v.19 no.3
    • /
    • pp.29-42
    • /
    • 2016
  • Remote sensing allows acquisition of information across a large area without contacting objects, and has thus been rapidly developed by application to different areas. Thus, with the development of remote sensing, satellites are able to rapidly advance in terms of their image resolution. As a result, satellites that use remote sensing have been applied to conduct research across many areas of the world. However, while research on remote sensing is being implemented across various areas, research on data processing is presently insufficient; that is, as satellite resources are further developed, data processing continues to lag behind. Accordingly, this paper discusses plans to maximize the performance of satellite image processing by utilizing the CUDA(Compute Unified Device Architecture) Library of NVIDIA, a parallel processing technique. The discussion in this paper proceeds as follows. First, standard KOMPSAT(Korea Multi-Purpose Satellite) images of various sizes are subdivided into five types. NDVI(Normalized Difference Vegetation Index) is implemented to the subdivided images. Next, ArcMap and the two techniques, each based on CPU or GPU, are used to implement NDVI. The histograms of each image are then compared after each implementation to analyze the different processing speeds when using CPU and GPU. The results indicate that both the CPU version and GPU version images are equal with the ArcMap images, and after the histogram comparison, the NDVI code was correctly implemented. In terms of the processing speed, GPU showed 5 times faster results than CPU. Accordingly, this research shows that a parallel processing technique using CUDA Library can enhance the data processing speed of satellites images, and that this data processing benefits from multiple advanced remote sensing techniques as compared to a simple pixel computation like NDVI.

A Code-level Parallelization Methodology to Enhance Interactivity of Smartphone Entertainment Applications (스마트폰 엔터테인먼트 애플리케이션의 상호작용성 개선을 위한 코드 수준 병렬화 방법론)

  • Kim, Byung-Cheol
    • Journal of Digital Convergence
    • /
    • v.13 no.12
    • /
    • pp.381-390
    • /
    • 2015
  • One of the fundamental requirements of entertainment applications is interactivity with users. The mobile device such as the smartphone, however, does not guarantee it due to the limit of the application processor's computing power, memory size and available electric power of the battery. This paper proposes a methodology to boost responsiveness of interactive applications by taking advantage of the parallel architecture of mobile devices which, for instance, have dual-core, quad-core or octa-core. To harness the multi-core architecture, it exploits the POSIX thread, a platform-independent thread library to be able to be used in various mobile platforms such as Android, iOS, etc. As a useful application example of the methodology, a heavy matrix calculation function was transformed to a parallelized version which showed around 2.5 ~ 3 times faster than the original version in a real-world usage environment.

A Parallel Algorithm for Large DOF Structural Analysis Problems (대규모 자유도 문제의 구조해석을 위한 병렬 알고리즘)

  • Kim, Min-Seok;Lee, Jee-Ho
    • Journal of the Computational Structural Engineering Institute of Korea
    • /
    • v.23 no.5
    • /
    • pp.475-482
    • /
    • 2010
  • In this paper, an efficient two-level parallel domain decomposition algorithm is suggested to solve large-DOF structural problems. Each subdomain is composed of the coarse problem and local problem. In the coarse problem, displacements at coarse nodes are computed by the iterative method that does not need to assemble a stiffness matrix for the whole coarse problem. Then displacements at local nodes are computed by Multi-Frontal Sparse Solver. A parallel version of PCG(Preconditioned Conjugate Gradient Method) is developed to solve the coarse problem iteratively, which minimizes the data communication amount between processors to increase the possible problem DOF size while maintaining the computational efficiency. The test results show that the suggested algorithm provides scalability on computing performance and an efficient approach to solve large-DOF structural problems.

Preliminary Performance Testing of Geo-spatial Image Parallel Processing in the Mobile Cloud Computing Service (모바일 클라우드 컴퓨팅 서비스를 위한 위성영상 병렬 정보처리 성능 예비실험)

  • Kang, Sang-Goo;Lee, Ki-Won;Kim, Yong-Seung
    • Korean Journal of Remote Sensing
    • /
    • v.28 no.4
    • /
    • pp.467-475
    • /
    • 2012
  • Cloud computing services are known that they have many advantages from the point of view in economic saving, scalability, security, sharing and accessibility. So their applications are extending from simple office systems to the expert system for scientific computing. However, research or computing technology development in the geo-spatial fields including remote sensing applications are the beginning stage. In this work, the previously implemented smartphone app for image processing was first migrated to mobile cloud computing linked to Amazon web services. As well, parallel programming was applied for improving operation performance. Industrial needs and technology development cases in terms of mobile cloud computing services are being increased. Thus, a performance testing on a satellite image processing module was carried out as the main purpose of this study. Types of implementation or services for mobile cloud varies. As the result of this testing study in a given condition, the performance of cloud computing server was higher than that of the single server without cloud service. This work is a preliminary case study for the further linkage approach for mobile cloud and satellite image processing.

Design of Low Complexity and High Throughput Encoder for Structured LDPC Codes (구조적 LDPC 부호의 저복잡도 및 고속 부호화기 설계)

  • Jung, Yong-Min;Jung, Yun-Ho;Kim, Jae-Seok
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.10
    • /
    • pp.61-69
    • /
    • 2009
  • This paper presents the design results of a low complexity and high throughput LDPC encoder structure. In order to solve the high complexity problem of the LDPC encoder, a simplified matrix-vector multiplier is proposed instead of the conventional complex matrix-vector multiplier. The proposed encoder also adopts a partially parallel structure and performs column-wise operations in matrix-vector multiplication to achieve high throughput. Implementation results show that the proposed architecture reduces the number of logic gates and memory elements by 37.4% and 56.7%, compared with existing five-stage pipelined architecture. The proposed encoder also supports 800Mbps throughput at 40MHz clock frequency which is improved about three times more than the existing architecture.

High Performance Coprocessor Architecture for Real-Time Dense Disparity Map (실시간 Dense Disparity Map 추출을 위한 고성능 가속기 구조 설계)

  • Kim, Cheong-Ghil;Srini, Vason P.;Kim, Shin-Dug
    • The KIPS Transactions:PartA
    • /
    • v.14A no.5
    • /
    • pp.301-308
    • /
    • 2007
  • This paper proposes high performance coprocessor architecture for real time dense disparity computation based on a phase-based binocular stereo matching technique called local weighted phase-correlation(LWPC). The algorithm combines the robustness of wavelet based phase difference methods and the basic control strategy of phase correlation methods, which consists of 4 stages. For parallel and efficient hardware implementation, the proposed architecture employs SIMD(Single Instruction Multiple Data Stream) architecture for each functional stage and all stages work on pipelined mode. Such that the newly devised pipelined linear array processor is optimized for the case of row-column image processing eliminating the need for transposed memory while preserving generality and high throughput. The proposed architecture is implemented with Xilinx HDL tool and the required hardware resources are calculated in terms of look up tables, flip flops, slices, and the amount of memory. The result shows the possibility that the proposed architecture can be integrated into one chip while maintaining the processing speed at video rate.

Design of Lightweight Artificial Intelligence System for Multimodal Signal Processing (멀티모달 신호처리를 위한 경량 인공지능 시스템 설계)

  • Kim, Byung-Soo;Lee, Jea-Hack;Hwang, Tae-Ho;Kim, Dong-Sun
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.13 no.5
    • /
    • pp.1037-1042
    • /
    • 2018
  • The neuromorphic technology has been researched for decades, which learns and processes the information by imitating the human brain. The hardware implementations of neuromorphic systems are configured with highly parallel processing structures and a number of simple computational units. It can achieve high processing speed, low power consumption, and low hardware complexity. Recently, the interests of the neuromorphic technology for low power and small embedded systems have been increasing rapidly. To implement low-complexity hardware, it is necessary to reduce input data dimension without accuracy loss. This paper proposed a low-complexity artificial intelligent engine which consists of parallel neuron engines and a feature extractor. A artificial intelligent engine has a number of neuron engines and its controller to process multimodal sensor data. We verified the performance of the proposed neuron engine including the designed artificial intelligent engines, the feature extractor, and a Micro Controller Unit(MCU).

A Fast Processor Architecture and 2-D Data Scheduling Method to Implement the Lifting Scheme 2-D Discrete Wavelet Transform (리프팅 스킴의 2차원 이산 웨이브릿 변환 하드웨어 구현을 위한 고속 프로세서 구조 및 2차원 데이터 스케줄링 방법)

  • Kim Jong Woog;Chong Jong Wha
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.42 no.4 s.334
    • /
    • pp.19-28
    • /
    • 2005
  • In this paper, we proposed a parallel fast 2-D discrete wavelet transform hardware architecture based on lifting scheme. The proposed architecture improved the 2-D processing speed, and reduced internal memory buffer size. The previous lifting scheme based parallel 2-D wavelet transform architectures were consisted with row direction and column direction modules, which were pair of prediction and update filter module. In 2-D wavelet transform, column direction processing used the row direction results, which were not generated in column direction order but in row direction order, so most hardware architecture need internal buffer memory. The proposed architecture focused on the reducing of the internal memory buffer size and the total calculation time. Reducing the total calculation time, we proposed a 4-way data flow scheduling and memory based parallel hardware architecture. The 4-way data flow scheduling can increase the row direction parallel performance, and reduced the initial latency of starting of the row direction calculation. In this hardware architecture, the internal buffer memory didn't used to store the results of the row direction calculation, while it contained intermediate values of column direction calculation. This method is very effective in column direction processing, because the input data of column direction were not generated in column direction order The proposed architecture was implemented with VHDL and Altera Stratix device. The implementation results showed overall calculation time reduced from $N^2/2+\alpha$ to $N^2/4+\beta$, and internal buffer memory size reduced by around $50\%$ of previous works.

Design of Web-based Parallel Processing System using Performance-based Task Allocation (성능 기반 태스크 할당을 이용한 웹 기반 병렬처리 시스템의 설계)

  • Han, Youn-Hee;Park, Chan-Yeol;Jeong, Young-Sik;Hwang, Chong-Sun
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.27 no.3
    • /
    • pp.264-276
    • /
    • 2000
  • Recent advances of technologies make easy sharing various information and utilizing system resources on the Internet. Especially, code migration using applets of Java supports the distribution of programs on the web environment, and also browsers executing the applets guarantee the reliability of a migrated codes. In this paper, we describe the design and implementation of a web-based parallel processing system, which distributes migratable codes of a large job, makes the distributed codes to execute in parallel, and controls and gathers the results of each execution. The hosts participate in the computation reside on the Internet, spreaded out geographically, and the heterogeneity and the variability among them are severe. Thus, task allocation considering the performance differences and the adaptability to the severe variability are necessary. We present an adaptive task allocation algorithm applied to our system and the performance evaluation.

  • PDF