• Title/Summary/Keyword: OpenMP 구현 (OpenMP implementation)

An Optimized GPU based Filtered Backprojection method (범용 그래픽스 하드웨어 기반 여과후 역투사 최적화 기법에 관한 연구)

  • Park, Jong-Hyun;Lee, Byeong-Hun;Lee, Ho;Shin, Yeong-Gil
    • Proceedings of the HCI Society of Korea Conference (한국HCI학회:학술대회논문집) / 2009.02a / pp.436-442 / 2009
  • Tomography images reconstructed from cone-beam CT make it possible to observe the inside of the projected object without damaging it, so they are widely used in industrial and medical fields. Recent imaging equipment can produce high-resolution CT images, but reconstructing the resulting large datasets takes a long time. To reduce reconstruction time, we propose an acceleration method that uses the GPU (graphics processing unit). Reconstruction consists mainly of two parts, filtering and back-projection. In the filtering phase, we apply a 4-channel image compression method; in the back-projection phase, we apply a computation reduction method based on the depth test. Experimental results show that, by exploiting the computing power of the parallel GPU, the proposed method runs 50 times faster than a CPU-based program optimized with OpenMP.

The Implementation of Fast Object Recognition Using Parallel Processing on CPU and GPU (CPU와 GPU의 병렬 처리를 이용한 고속 물체 인식 알고리즘 구현)

  • Kim, Jun-Chul;Jung, Young-Han;Park, Eun-Soo;Cui, Xue-Nan;Kim, Hak-Il;Huh, Uk-Youl
    • Journal of Institute of Control, Robotics and Systems / v.15 no.5 / pp.488-495 / 2009
  • This paper presents a fast feature extraction method for autonomous mobile robots that uses parallel processing based on OpenMP, SSE (Streaming SIMD Extensions), and CUDA programming. For the CPU version, the algorithms and code are first optimized and then parallelized. The parallel algorithms are debugged to maintain the same level of performance, and the process of extracting key points and obtaining the dominant orientation of each key point is parallelized. After extraction, a parallel descriptor is constructed with SSE instructions. The GPU version, based on SIFT, is likewise parallelized using CUDA. The GPU-parallel descriptor achieves up to a fivefold acceleration compared with the CPU-parallel descriptor, but it shows lower performance than the CPU version. The CPU version also achieves a speed-up of four and a half times over the original SIFT while maintaining robust performance.

Benchmarking on High-speed Image Processing Techniques based on Multi-processor (멀티프로세서 기반의 고속 영상처리 기술에 대한 벤치마킹)

  • Cui, Xue-Nan;Park, Eun-Soo;Kim, Jun-Chul;Kim, Hak-Il
    • Proceedings of the KIEE Conference / 2007.10a / pp.111-112 / 2007
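  • This paper introduces methods for developing high-speed image processing algorithms on multiprocessors. As image acquisition has advanced, high-resolution and color images have become available, and many image processing applications now require faster algorithms. Meeting this demand calls first of all for algorithms that make full use of recently released multiprocessors. We introduce development methods based on parallel processing and high-speed image processing libraries, including OpenMP, MIL (Matrox Image Library), OpenCV, IPP (Integrated Performance Primitives), and SSE (Streaming SIMD (Single Instruction Multiple Data) Extensions), and we analyze and evaluate the algorithm performance obtained with each method. In the experiments, the Mean, Dilation, Erosion, Open, Closing, and Sobel algorithms implemented with SSE, IPP, and MIL (Thread) achieved good performance of $7{\sim}35$ msec on a $4057{\times}4048$ image, outperforming the other methods.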
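
To illustrate the kind of operator benchmarked in this abstract, here is a minimal sketch of 3×3 erosion and dilation in which each image row is an independent task handed to a worker thread. It is plain Python, not OpenMP, SSE, IPP, or MIL, so it shows only the row-wise decomposition and says nothing about the timings reported in the paper. The function names, the list-of-lists image layout, and the border handling (out-of-range neighbours are ignored) are assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def _morph_row(image, y, reduce_fn):
    """Apply reduce_fn (min or max) over the 3x3 neighbourhood of each pixel in row y."""
    height, width = len(image), len(image[0])
    out_row = []
    for x in range(width):
        # Neighbours outside the image are simply skipped (assumed border rule).
        neighbours = [
            image[ny][nx]
            for ny in range(max(0, y - 1), min(height, y + 2))
            for nx in range(max(0, x - 1), min(width, x + 2))
        ]
        out_row.append(reduce_fn(neighbours))
    return out_row


def morph(image, reduce_fn, workers=4):
    """Rows do not depend on each other, so each row is one task."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda y: _morph_row(image, y, reduce_fn),
                             range(len(image))))


def erode(image, workers=4):
    return morph(image, min, workers)


def dilate(image, workers=4):
    return morph(image, max, workers)
```

In standard CPython the global interpreter lock keeps these threads from running the pixel loop concurrently, so the sketch demonstrates the decomposition only; in C the same per-row independence is what would allow a parallel loop over `y`.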

Implementation of Real time based Multi-object recognition algorithm (실시간 다중 객체인식 알고리즘 구현)

  • Park, Tae-Ryong
    • Journal of IKEEE / v.17 no.1 / pp.51-56 / 2013
  • This paper proposes an improved matching method for implementing multi-object recognition based on the ORB algorithm. SURF, a well-known object recognition algorithm, is robust, but it is poorly suited to real-time operation because its implementation requires a large amount of computation. We therefore propose a modified ORB algorithm that improves the matching stage to recognize multiple objects in real time, yielding a speed improvement of almost 70%.
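
For context on the matching stage mentioned in this abstract: ORB produces binary descriptors, which are normally compared by Hamming distance. The sketch below shows plain brute-force nearest-neighbour matching as a baseline; it is not the improved matching method of the paper, which the abstract does not describe. Descriptors are stored as Python integers, and the function names and the `max_distance` threshold are hypothetical.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def match_descriptors(query, train, max_distance=64):
    """Return (query_idx, train_idx, distance) for each query descriptor whose
    nearest train descriptor lies within max_distance (illustrative threshold)."""
    if not train:
        return []
    matches = []
    for qi, q in enumerate(query):
        best_idx, best_dist = min(
            ((ti, hamming(q, t)) for ti, t in enumerate(train)),
            key=lambda pair: pair[1],
        )
        if best_dist <= max_distance:
            matches.append((qi, best_idx, best_dist))
    return matches
```

Each query descriptor is matched independently of the others, which is why this stage is a common target for speed-ups.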

Implementation of Integrated CPU-GPU for Efficient Uniform Memory Access Method and Verification System (CPU-GPU간 긴밀성을 위한 효율적인 공유메모리 접근 방법과 검증 시스템 구현)

  • Park, Hyun-moon;Kwon, Jinsan;Hwang, Tae-ho;Kim, Dong-Sun
    • IEMEK Journal of Embedded Systems and Applications / v.11 no.2 / pp.57-65 / 2016
  • In this paper, we propose a system for efficient use of shared memory between the CPU and GPU. The system, called Fusion Architecture, ensures consistency of the shared memory and minimizes the cache misses that frequently occur on systems based on Heterogeneous System Architecture or Unified Virtual Memory. It also maximizes performance for memory-intensive jobs through efficient allocation of GPU cores. To compare architectures across various scenarios, we introduce the Fusion Architecture Analyzer, which compares OpenMP, OpenCL, CUDA, and the proposed architecture in terms of memory overhead and processing time. The results show that the Fusion Architecture runs benchmarks 55% faster and reduces memory overhead by 220% on average.

The Survey of Parallel Programming Techniques for Developing Optimized Software in Multi-core System (멀티코어 시스템에서 최적화된 소프트웨어 개발을 위한 병렬처리 프로그래밍 기법 조사)

  • Lee, Ki-Hong;Kim, Jee-Hong;Eom, Young-Ik
    • Proceedings of the Korean Information Science Society Conference / 2012.06a / pp.36-38 / 2012
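  • Multicore CPUs are now commonplace, but because most programming languages evolved with a single core in mind, parallelization remains difficult. Parallel processing techniques are being studied to address this, yet the number of available techniques can leave developers confused. To help developers choose a technique appropriate to their situation, this paper compares and evaluates the major parallel processing techniques: OpenMP, Threading Building Blocks, Cilk Plus, and Parallel Patterns Library. The techniques differ in the characteristics developers must consider when building a program, such as supported features, the form of support, and scheduling, and each has its own strengths and weaknesses. Therefore, rather than relying on a single technique, developers who understand the characteristics of several techniques and select the one that fits the situation can implement parallel processing more efficiently and more easily.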
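
Scheduling is one of the characteristics this survey lists. As one concrete instance of why it matters, the sketch below simulates the two basic OpenMP loop schedules on iterations of uneven cost. A static schedule with no chunk size splits the iterations into contiguous, near-equal blocks up front, while a dynamic schedule with the default chunk size of 1 gives the next iteration to whichever thread becomes free first. This is a simplified Python model rather than OpenMP code: it ignores scheduling overhead, and the function names and cost values are made up for illustration.

```python
import heapq


def static_makespan(costs, threads):
    """Finish time when contiguous, near-equal blocks are assigned up front."""
    base, extra = divmod(len(costs), threads)
    finish, start = 0, 0
    for t in range(threads):
        size = base + (1 if t < extra else 0)
        finish = max(finish, sum(costs[start:start + size]))
        start += size
    return finish


def dynamic_makespan(costs, threads):
    """Finish time when each iteration goes to the earliest-free thread."""
    free_at = [0] * threads
    heapq.heapify(free_at)
    for cost in costs:
        earliest = heapq.heappop(free_at)
        heapq.heappush(free_at, earliest + cost)
    return max(free_at)
```

With costs `[1, 2, 3, 4, 5, 6, 7, 8]` on two threads, the static model finishes at time 26 because the second block holds all the expensive iterations, while the dynamic model finishes at time 20; the ideal split of the total cost of 36 would be 18.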

Smart Alarm Clock using Weather Information and Arduino (날씨 정보와 아두이노를 이용한 스마트 알람 시계)

  • Heo, Gyeongyong;Kim, Koang Hoon
    • Journal of the Korea Institute of Information and Communication Engineering / v.23 no.8 / pp.889-895 / 2019
  • It is not easy to keep appointments on time in a complex daily life. In particular, the growing number of vehicles causes traffic congestion during commuting hours, which leads to delayed arrival, and the delay varies greatly with weather conditions. This paper proposes a smart alarm clock that automatically adjusts the alarm time according to weather conditions and suggests ways to deal with traffic congestion. The proposed smart alarm clock provides the functions of a normal alarm clock through a touch interface. In addition, it retrieves weather information through an open API and automatically changes the alarm time to allow for the expected delay. The design was implemented on an Arduino Mega2560 with a touch TFT-LCD, together with a WiFi module for internet connection, an RTC module for the clock function, and an MP3 player module for alarm sound playback. The proposed design has been filed as a patent and is currently under review.

Improvement of Processing Speed for UAV Attitude Information Estimation Using ROI and Parallel Processing

  • Ha, Seok-Wun;Park, Myeong-Chul
    • Journal of the Korea Society of Computer and Information / v.26 no.1 / pp.155-161 / 2021
  • Recently, research on military uses of UAVs, such as precision tracking and mission completion, has been actively conducted. In particular, if the attitude information of a leading UAV is estimated and a mission UAV uses this information to follow it stealthily and complete its mission, the attitude of the leading UAV must be estimated in real time. Previous research has focused on accurately estimating the attitude of the leading UAV using image processing and Kalman filters, but processing speed has been a problem because the processing steps run sequentially. In this study, we therefore propose improving the processing speed in two ways: image processing is limited to the ROI containing the object rather than the entire image, and the successive processing steps are distributed across OpenMP-based threads and run in parallel, with thread synchronization, to estimate the attitude information. Experimental results confirm that the processing speed improves by more than 45% compared with the basic processing, which makes real-time processing possible, and that improving the tracking and estimation speed of the mission UAV can therefore increase the likelihood of completing the mission.
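
The two ideas in this abstract, restricting work to an ROI and spreading it over synchronized threads, can be sketched as follows. This is an illustrative Python outline, not the paper's OpenMP implementation. The per-pixel threshold is only a stand-in for the image processing steps, which the abstract does not specify, and the function names, the `(top, left, height, width)` ROI layout, and the threshold level are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def crop_roi(image, top, left, height, width):
    """Return the sub-image covering the region of interest."""
    return [row[left:left + width] for row in image[top:top + height]]


def threshold_row(row, level):
    """Stand-in processing step: binarize one row."""
    return [1 if pixel >= level else 0 for pixel in row]


def process_roi(image, roi, level=128, workers=4):
    """Process only the ROI, one row per task.

    Leaving the `with` block waits for every worker, which plays the role
    of the synchronization point before the next processing stage.
    """
    sub = crop_roi(image, *roi)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda r: threshold_row(r, level), sub))
```

For a 4×4 image and the ROI `(1, 1, 2, 2)`, only 4 of the 16 pixels are touched, which is where the ROI saving comes from regardless of how many threads are used.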

A DSP Platform for the HD Multimedia Streaming (HD급 멀티미디어 Streaming을 위한 DSP Platform)

  • Hong, Keun-Pyo;Moon, Jae-Pil;Park, Jong-Son;Kim, Dong-Hwan;Chang, Tae-Gyu
    • Proceedings of the KIEE Conference / 2005.10b / pp.409-411 / 2005
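  • This paper presents a DSP platform that can process HD multimedia streaming. The platform uses a TI C6400-series DSP and can process multichannel audio and HD-quality video data. Because the DSP takes on the decoder function, the hardware is easy to reconfigure, and because codecs are downloaded, multimedia content can be played back flexibly. The DSP platform is installed in a host PC and receives the DSP configuration file and the multimedia streaming data from the PC. The software runs the demux in real time to separate the audio and video data, stores them in the external memory of the DSP platform, and decodes the video and audio at the same time. A buffer control technique that can overcome buffer underrun and overrun of the audio and video data is applied. To stream from the host PC to the DSP platform, a streaming service program was implemented on Windows OS, which is based on an open architecture. Finally, the DSP platform was verified using streaming data composed of an MPEG-2 video MP@ML video codec and a 5.1ch 48kHz AC3 audio codec.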
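
The abstract does not say how its buffer control technique works. Purely as an illustration of the underrun/overrun problem it refers to, the sketch below shows one common approach, a bounded FIFO with low and high watermarks that tell the sender when to pause and resume. The class name, method names, and threshold values are all hypothetical and are not taken from the paper.

```python
from collections import deque


class StreamBuffer:
    """Bounded FIFO with low/high watermarks (all thresholds are illustrative)."""

    def __init__(self, capacity, low, high):
        if not 0 <= low < high <= capacity:
            raise ValueError("need 0 <= low < high <= capacity")
        self.capacity, self.low, self.high = capacity, low, high
        self.items = deque()

    def push(self, packet):
        """Producer side: refuse the packet instead of overrunning the buffer."""
        if len(self.items) >= self.capacity:
            return False
        self.items.append(packet)
        return True

    def pop(self):
        """Consumer side: return None on underrun instead of blocking."""
        return self.items.popleft() if self.items else None

    def producer_should_pause(self):
        return len(self.items) >= self.high

    def producer_should_resume(self):
        return len(self.items) <= self.low
```

Keeping the fill level between the two watermarks leaves headroom on both sides: the decoder has data to consume during a short stall in delivery, and the host has room to keep writing during a short stall in decoding.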

A Study on Parallel Performance Optimization Method for Acceleration of High Resolution SAR Image Processing (고해상도 SAR 영상처리 고속화를 위한 병렬 성능 최적화 기법 연구)

  • Lee, Kyu Beom;Kim, Gyu Bin;An, Sol Bo Reum;Cho, Jin Yeon;Lim, Byoung-Gyun;Kim, Dong-Hyun;Kim, Jeong Ho
    • Journal of the Korean Society for Aeronautical & Space Sciences / v.46 no.6 / pp.503-512 / 2018
  • SAR (Synthetic Aperture Radar) is a technology that acquires images by processing signals obtained from radar, and demand for high-resolution SAR images is increasing. This paper studies SAR image processing algorithms that achieve optimal performance on multi-core computer architectures, with the aim of processing high-resolution SAR image data at high speed. The performance degradation caused by the large volume of input/output data for high-resolution images is reduced by maximizing memory utilization, and the parallelization ratio of the code is increased by using OpenMP's dynamic scheduling and nested parallelism. As a result, the total computation time is reduced, the upper bound of parallel performance is raised, and the actual parallel performance on a multi-core system with 10 cores improves by more than 8 times. The results are expected to be useful in developing high-resolution SAR image processing software for multi-core systems with large memory.
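
The terms "parallelization ratio" and "upper bound of parallel performance" can be related through Amdahl's law, S(N) = 1 / ((1 - p) + p / N), where p is the parallel fraction of the code and N the number of cores. The abstract does not state that this is the model the authors used, so the following is only a worked sketch under that assumption. It treats the reported figure as a speedup of exactly 8 on 10 cores, which is the lower end of "more than 8 times", and the function names are hypothetical.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speedup predicted by Amdahl's law for a given parallel fraction."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


def parallel_fraction_from_speedup(speedup, cores):
    """Invert Amdahl's law: parallel fraction implied by a measured speedup.

    Requires cores > 1.
    """
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / cores)


def speedup_upper_bound(parallel_fraction):
    """Limit of the speedup as the core count grows without bound.

    Requires parallel_fraction < 1.
    """
    return 1.0 / (1.0 - parallel_fraction)


# Assumed reading of the abstract: speedup of 8 on 10 cores.
p = parallel_fraction_from_speedup(8.0, 10)
bound = speedup_upper_bound(p)
```

Under these assumptions p is about 0.972 and the upper bound is about 36, which shows why raising the parallel fraction matters: each small increase in p lifts the ceiling on achievable speedup considerably.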