• Title/Summary/Keyword: parallel computation processing (병렬 연산 처리)

Digital Image based Real-time Sea Fog Removal Technique using GPU (GPU를 이용한 영상기반 고속 해무제거 기술)

  • Choi, Woon-sik;Lee, Yoon-hyuk;Seo, Young-ho;Choi, Hyun-jun
    • Journal of the Korea Institute of Information and Communication Engineering / v.20 no.12 / pp.2355-2362 / 2016
  • Sea fog removal is an important problem in both computer vision and image processing. Sea fog or haze removal is widely used in many fields, such as automatic control systems, CCTV, and image recognition. Color image dehazing techniques have been studied extensively, and the dark channel prior (DCP) technique in particular has been widely used. This paper proposes a fast and efficient GPU-based method that uses the dark channel prior to remove sea fog from a single digital image. We implement a basic parallel program and then optimize it to obtain a speedup of more than 250 times. While parallelizing and optimizing the algorithm, we improve parts of the original serial program and the basic parallel program according to the characteristics of each step. The proposed GPU algorithm and its implementation may be used to advantage as a pre-processing stage in many systems, such as safe ship navigation, topographical surveying, and intelligent vehicles.
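The dark channel that the DCP technique relies on can be illustrated with a minimal sketch. This is not the authors' GPU implementation: it is a plain sequential Python version, and the function name `dark_channel`, the `patch_radius` parameter, and the list-of-tuples image layout are all assumptions made for illustration. It computes, for each pixel, the minimum over the color channels followed by the minimum over a square neighborhood.

```python
def dark_channel(image, patch_radius=1):
    """Compute the dark channel of an RGB image.

    image: list of rows, each row a list of (r, g, b) tuples (hypothetical layout).
    patch_radius: half-width of the square patch (hypothetical parameter).
    """
    height, width = len(image), len(image[0])

    # Step 1: per-pixel minimum over the three color channels.
    channel_min = [[min(pixel) for pixel in row] for row in image]

    # Step 2: minimum over a square patch, clipped at the image borders.
    dark = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            y0, y1 = max(0, y - patch_radius), min(height, y + patch_radius + 1)
            x0, x1 = max(0, x - patch_radius), min(width, x + patch_radius + 1)
            dark[y][x] = min(
                channel_min[j][i] for j in range(y0, y1) for i in range(x0, x1)
            )
    return dark
```

Each output pixel depends only on a small read-only neighborhood of the input, so the two loops over `y` and `x` could in principle be assigned to independent GPU threads, which is the kind of data parallelism a GPU version would exploit.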

Optimization for H.264/AVC De-blocking Filter on the TMS320C64x+ DSP (TMS320C64x+ DSP에서의 H.264/AVC 디블록킹 필터 최적화)

  • Lee, Jin-Seop;Kang, Dae-Beom;Sim, Dong-Gyu;Lee, Soo-Youn
    • Journal of the Institute of Electronics Engineers of Korea SP / v.48 no.2 / pp.41-52 / 2011
  • Reducing the computational complexity of the de-blocking filter is important for real-time implementation, because the filter accounts for a large part of the decoder's total computational complexity. Because the decoding loop contains many conditional branches and memory accesses, the de-blocking filter is not easy to speed up. This paper therefore presents a new de-blocking filter algorithm that minimizes conditional branches and memory accesses. The proposed filter structure allows the filter operations to be parallelized by software pipelining. The proposed optimization method was implemented on a TMS320DM6467 EVM board and achieved approximately 46% cycle reduction compared with FFmpeg.

Accelerated Implementation of NTRU on GPU for Efficient Key Exchange in Multi-Client Environment (다중 사용자 환경에서 효과적인 키 교환을 위한 GPU 기반의 NTRU 고속구현)

  • Seong, Hyoeun;Kim, Yewon;Yeom, Yongjin;Kang, Ju-Sung
    • Journal of the Korea Institute of Information Security & Cryptology / v.31 no.3 / pp.481-496 / 2021
  • It is imperative to migrate current public key cryptosystems to quantum-resistant systems ahead of the realization of large-scale quantum computing technology. The National Institute of Standards and Technology (NIST) is running a public standardization project for Post-Quantum Cryptography (PQC), and many research efforts have been made to apply PQC to TLS (Transport Layer Security) protocols, which are used for Internet communication security. In this paper, we propose a scenario in which a server and multiple clients share session keys over TLS by using parallelized NTRU, a PQC scheme, in the key exchange process. In addition, we propose a method of accelerating NTRU using a GPU and analyze its efficiency in an environment where a server needs to process large-scale data simultaneously.

Analysis of big data using Rhipe (Rhipe를 활용한 빅데이터 처리 및 분석)

  • Ko, Youngjun;Kim, Jinseog
    • Journal of the Korean Data and Information Science Society / v.24 no.5 / pp.975-987 / 2013
  • The Hadoop system was developed by the Apache foundation based on Google's GFS and MapReduce technologies. Many modern systems for managing and processing big data have been developed on top of Hadoop because Hadoop was designed for scalability and distributed computing. R has been considered a well-suited analytic tool in Hadoop-based systems because it interoperates flexibly with other languages and has many libraries for complex analyses. We introduce Rhipe, an R package that makes MapReduce programming on Hadoop easy, and implement a MapReduce program for multiple regression using Rhipe. In addition, we compare the computing speed of our program with that of other packages for processing large data (ff and bigmemory). The simulation results showed that our program was faster than ff and bigmemory as the size of the data increased.
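One common way to express multiple regression in MapReduce form is to have each map task compute the partial sums X^T X and X^T y for its chunk of rows, add those partial sums in the reduce step, and then solve the normal equations once. The sketch below shows that idea in plain Python rather than R or Rhipe. It is an assumption about how such a program could be structured, not the authors' code, and the names `map_chunk`, `combine`, `solve`, and `fit` are hypothetical.

```python
from functools import reduce


def map_chunk(rows):
    """Map step: partial (X^T X, X^T y) for one chunk, with an intercept column.

    rows: list of (features, y) pairs, where features is a sequence of floats.
    """
    p = len(rows[0][0]) + 1
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for features, y in rows:
        x = [1.0] + list(features)
        for i in range(p):
            xty[i] += x[i] * y
            for j in range(p):
                xtx[i][j] += x[i] * x[j]
    return xtx, xty


def combine(a, b):
    """Reduce step: element-wise sum of two partial results."""
    xtx = [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a[0], b[0])]
    xty = [u + v for u, v in zip(a[1], b[1])]
    return xtx, xty


def solve(xtx, xty):
    """Solve (X^T X) beta = X^T y by Gaussian elimination with partial pivoting."""
    n = len(xty)
    m = [row[:] + [rhs] for row, rhs in zip(xtx, xty)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (m[i][n] - tail) / m[i][i]
    return beta


def fit(chunks):
    """Run map over all chunks, reduce the partial sums, and solve once."""
    return solve(*reduce(combine, map(map_chunk, chunks)))
```

Because the partial sums have a fixed size that depends only on the number of predictors, the data passed from map to reduce stays small no matter how many rows each chunk holds.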

Implementation of High-Speed Fresnelet Transform using Daubechies's Filter (드뷔시 필터를 이용한 고속 프레넬릿 변환의 구현)

  • Seo, Young-Ho;Lee, Yoon-Hyuk;Kim, Dong-Wook
    • Journal of the Korea Institute of Information and Communication Engineering / v.21 no.4 / pp.820-828 / 2017
  • Digital hologram services, recognized as a next-generation visual system, require various signal processing technologies. Transforms are the most frequently used tool among signal processing techniques for 2-dimensional (2-D) and 3-D natural pictures. A digital hologram has properties totally different from those of a natural picture, so applying the transform tools used in 2-D image processing to a digital hologram is rarely efficient. To overcome this, a Fresnelet transform for digital holograms has been proposed. We derive a Fresnelet transform using the Daubechies filter after applying a unitary Fresnel transform to a wavelet basis function. We also implement the transform as device code and as kernel code to improve operational performance. In terms of the average time required per pixel, we observed that performance improves by up to 242 and 30 times for the (9,7) and (5,3) filters, respectively, when using device code.

Data Assimilation of Real-time Air Quality Forecast using CUDA (CUDA를 이용한 실시간 대기질 예보 자료동화)

  • Bae, Hyo-Sik;Yu, Suk-Hyun;Kwon, Hee-Yong
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.17 no.2 / pp.271-277 / 2017
  • As a result of rapid industrialization, air pollutants seriously threaten public health, and forecasting is becoming more and more important. In air quality forecasting, it is very important to create a reliable initial field, because the initial field fed to the forecasting model affects the accuracy of the forecast. There are several methods for improving the initial field, and data assimilation is one of the necessary techniques. The number of operations and the time required for data assimilation increase exponentially as the forecasting area widens and the number of observation sites increases. Therefore, as the forecast size increases, it becomes difficult to apply the existing sequential processing method in a field that requires fast processing. In this paper, we propose a method that processes Cressman's method, one of the data assimilation techniques, in real time using CUDA. As a result, the proposed CUDA-based parallel processing method was at least 35 times faster than the conventional sequential method and other parallel processing methods.
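As a rough illustration of why Cressman's method parallelizes well, the sketch below performs a single Cressman correction pass in sequential Python. It assumes the standard weight w = (R^2 - d^2) / (R^2 + d^2) for stations within the influence radius R and zero beyond it. The function names, the flat list of grid points, and the convention that each station carries a precomputed increment (observation minus background at the station) are assumptions made for illustration and are not taken from the paper.

```python
def cressman_weight(dist_sq, radius_sq):
    """Cressman weight for a squared distance; zero at or beyond the radius."""
    if dist_sq >= radius_sq:
        return 0.0
    return (radius_sq - dist_sq) / (radius_sq + dist_sq)


def cressman_update(grid_points, background, stations, radius):
    """One Cressman correction pass.

    grid_points: list of (x, y) grid coordinates.
    background: first-guess value at each grid point.
    stations: list of (x, y, increment), where increment is assumed to be
              the observation minus the background at the station.
    radius: influence radius R.
    """
    radius_sq = radius * radius
    analysis = []
    for (gx, gy), first_guess in zip(grid_points, background):
        weight_sum = 0.0
        weighted_increment = 0.0
        for sx, sy, increment in stations:
            w = cressman_weight((gx - sx) ** 2 + (gy - sy) ** 2, radius_sq)
            weight_sum += w
            weighted_increment += w * increment
        if weight_sum > 0.0:
            analysis.append(first_guess + weighted_increment / weight_sum)
        else:
            # No station within the radius: keep the first guess unchanged.
            analysis.append(first_guess)
    return analysis
```

The outer loop over grid points has no dependencies between iterations, so each grid point could be handled by its own CUDA thread. The cost is proportional to the number of grid points times the number of stations.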

Array Localization for Multithreaded Code Generation (다중스레드 코드 생성을 위한 배열 지역화)

  • Yang, Chang-Mo;Yu, Won-Hui
    • The Transactions of the Korea Information Processing Society / v.3 no.6 / pp.1407-1417 / 1996
  • Recent thread partitioning algorithms break a thread at each long-latency operation and merge threads to obtain longer threads under the given constraints. Due to this limitation, even a program with little parallelism is partitioned into small threads, and context switches occur frequently. In this paper, we propose another method, array localization, which uses information about the array name, the dependence distance (the difference between the accessed element index and the loop index), and the element usage, which indicates whether an element is used or defined. Using this information, we can allocate array elements to the node where the corresponding loop activation is executed. With array localization, remote accesses to array elements can be replaced with local accesses to localized array elements. As a result, the boundaries of some threads are removed, programs can be partitioned into larger threads, and the number of context switches is reduced.

Performance Comparison of Tilera Many-core and x86-64 Multi-core Systems (Tilera 다중코어와 x86-64 멀티코어 시스템의 성능 비교)

  • Choi, HeeSeok;Lyoo, TaeMuk;Park, JiSu;Jung, Daeyong;Lim, JongBeom;Lee, Jungha;Suh, Teaweon;Yu, Heonchang
    • Proceedings of the Korea Information Processing Society Conference / 2013.05a / pp.102-105 / 2013
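  • Multi-core systems have recently been evolving into many-core systems that connect a larger number of cores to improve computer performance. However, the performance of such systems differs depending on the architecture and the number of the cores used. In this paper, to analyze how core architecture and core count affect performance, we compare Tilera's many-core systems, Tile-Gx36 and TilePro64, with Intel's x86-64 multi-core system, Core i5. To examine how performance changes as more cores are used, we measured single-core performance on each system with the SPEC CPU 2006 benchmark, and we measured performance as a function of input data size when using all cores of a system with an OpenMP benchmark program. The experiments showed that, on a single core, Core i5 was about 87% faster than Tile-Gx36 when measured with integer data and about 94% faster when measured with floating-point data. With all cores in use, however, when the integer array size was at or above [size omitted in the source], the Tile-Gx36 system was on average about 7.6 times faster than the Core i5 system. We therefore conclude that Tilera's many-core systems have lower single-core performance because of their clock speed and architecture, but deliver higher performance in high-speed computation that uses parallel processing.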

Artificial Intelligence-based Classification Scheme to improve Time Series Data Accuracy of IoT Sensors (IoT 센서의 시계열 데이터 정확도 향상을 위한 인공지능 기반 분류 기법)

  • Kim, Jin-Young;Sim, Isaac;Yoon, Sung-Hoon
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.21 no.4 / pp.57-62 / 2021
  • As the parallel computing capability available for artificial intelligence improves, artificial intelligence technology is expanding into various industries. In particular, artificial intelligence is being introduced to process the enormous amounts of data generated by IoT sensors. However, applying AI techniques to IoT networks has a limitation, because IoT produces time series data in which the importance of the data changes over time. In this paper, we propose time-weighted and user-state-based artificial intelligence processing techniques to process IoT sensor data effectively. The technique aims to classify IoT sensor data effectively through a pre-processing step, performed before artificial intelligence learning, that personalizes the time series data and weights it according to time and the usage status of personal data. Based on this research, a method of applying artificial intelligence learning in various fields can be proposed.

A Hardware Barrier Synchronization using Multi-drop Scheme in Parallel Computer Systems (병렬 컴퓨터 시스템에서의 Multi-drop 방식을 사용한 하드웨어 장벽 동기화)

  • Lee, June-Bum;Kim, Sung-Chun
    • Journal of KIISE:Computer Systems and Theory / v.27 no.5 / pp.485-495 / 2000
  • Applications such as large-scale business processing and complex computation require parallel computer systems that run parallel programs. One of the crucial operations of a parallel computer system is synchronization, and a representative synchronization method is barrier synchronization. A barrier forces all processes to wait until every process has reached the barrier and then releases all of them. Barrier synchronization can be achieved with software schemes, hardware schemes, or combinations of the two, and the trend is toward hardware schemes. In addition, hardware barrier synchronization makes a parallel computer system faster because it has less start-up overhead. In this paper, we propose a new switch module that implements fast and fault-tolerant barrier synchronization in hardware. The proposed barrier synchronization operates in a processor-driven manner rather than a full-switch-driven manner, so effective barrier synchronization is achieved with inexpensive hardware support. The proposed hardware barrier synchronization is also designed to operate in arbitrary network topologies. In this paper, we show a comparison of barrier synchronization only on a Multistage Interconnection Network. The results show a 24.6-24.8% reduction in average delay. From this result, we can expect lower average delay in irregular networks as well.
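The barrier semantics described in this abstract (every process waits until all have arrived, then all are released) can be demonstrated with a small software sketch. This uses Python's standard `threading.Barrier` and is only an illustration of the semantics: the paper proposes a hardware switch module, which this code does not model. The function name `run_with_barrier` and the event log are hypothetical.

```python
import threading


def run_with_barrier(num_workers):
    """Run num_workers threads that all meet at one barrier.

    Returns the ordered log of ("arrive", id) and ("leave", id) events.
    """
    barrier = threading.Barrier(num_workers)
    events = []
    lock = threading.Lock()

    def worker(worker_id):
        with lock:
            events.append(("arrive", worker_id))
        # Blocks until all num_workers threads have called wait().
        barrier.wait()
        with lock:
            events.append(("leave", worker_id))

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events
```

Because no thread can log a "leave" event before the last thread has logged its "arrive" event, every "arrive" entry in the returned log precedes every "leave" entry, regardless of how the threads are scheduled.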