• Title/Summary/Keyword: GPU Parallel Processing


Redundant Parallel Hopfield Network Configurations: A New Approach to the Two-Dimensional Face Recognitions (병렬 다중 홉 필드 네트워크 구성으로 인한 2-차원적 얼굴인식 기법에 대한 새로운 제안)

  • Kim, Yong Taek;Deo, Kiatama
    • KIPS Transactions on Software and Data Engineering / v.7 no.2 / pp.63-68 / 2018
  • Interest in face recognition has been increasing due to diverse emerging applications. Face recognition from a two-dimensional source can be challenging under varying conditions such as face orientation, illumination level, facial details (for example, with or without glasses), and expressions such as smiling or crying. Hopfield networks have been used especially for pattern recall, generalization, familiarity recognition, and error correction. Based on those abilities, this paper reports an experiment that applies the Redundant Parallel Hopfield Network to a face recognition problem. The new design was tested experimentally and found to be robust in a wide range of practical situations.
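The recall and error-correction abilities mentioned in this abstract can be illustrated with a classic single Hopfield network. The sketch below is not the paper's Redundant Parallel configuration; it assumes Hebbian training, asynchronous updates, and two toy 8-unit bipolar patterns standing in for face images. The names `train_hopfield` and `recall` are hypothetical.

```python
def train_hopfield(patterns):
    """Build a Hebbian weight matrix from bipolar (+1/-1) patterns."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    weights[i][j] += p[i] * p[j] / n
    return weights


def recall(weights, state, max_sweeps=10):
    """Update units one at a time until the state stops changing."""
    state = list(state)
    n = len(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            field = sum(weights[i][j] * state[j] for j in range(n))
            new_value = 1 if field >= 0 else -1
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state


# Two orthogonal toy patterns (assumed data, not from the paper).
stored = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
]
weights = train_hopfield(stored)

# Corrupt one unit of the first pattern and let the network repair it.
noisy = list(stored[0])
noisy[0] = -noisy[0]
restored = recall(weights, noisy)
```

With these two patterns the corrupted input settles back onto the first stored pattern, which is the error-correction behavior the abstract refers to.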

An Analytical Evaluation of 2D Mesh-connected SIMD Architecture for Parallel Matrix Multiplication (2D Mesh SIMD 구조에서의 병렬 행렬 곱셈의 수치적 성능 분석)

  • Kim, Cheong-Ghil
    • Journal of The Institute of Information and Telecommunication Facilities Engineering / v.10 no.1 / pp.7-13 / 2011
  • Matrix multiplication is a fundamental operation of linear algebra and arises in many areas of science and engineering. This paper introduces an efficient parallel matrix multiplication scheme on an $N \times N$ mesh-connected SIMD array processor, called multiple hierarchical SIMD architecture (HMSA). The architectural characteristic of HMSA is its hierarchically structured control units, which consist of a global control unit, $N$ local control units configured diagonally, and $N^2$ processing elements (PEs) arranged in an $N \times N$ array. The PEs communicate through local buses that connect four adjacent neighbor PEs in a mesh-torus network and through global buses that run across the rows and columns, called horizontal buses and vertical buses, respectively. This architecture gives HMSA diagonally indexed concurrent broadcast and alternating access to either the rows (row control mode) or the columns (column control mode) of the 2D PE array. An algorithmic mapping method is used for performance evaluation by mapping matrix multiplication onto the proposed architecture. The asymptotic time complexity is evaluated, and the result shows that parallel matrix multiplication on HMSA can provide a significant performance improvement.
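To make the idea of mapping matrix multiplication onto an $N \times N$ mesh-torus of PEs concrete, here is a minimal sequential simulation of Cannon's algorithm, a standard mesh technique. It is not the HMSA scheme from the paper, which relies on broadcast buses and diagonal control units; the function name `cannon_matmul` and the one-element-per-PE layout are assumptions made for illustration.

```python
def cannon_matmul(a, b):
    """Simulate Cannon's algorithm with one matrix element per PE."""
    n = len(a)
    # Initial skew: row i of A shifts left by i, column j of B shifts up by j.
    a_local = [[a[i][(j + i) % n] for j in range(n)] for i in range(n)]
    b_local = [[b[(i + j) % n][j] for j in range(n)] for i in range(n)]
    c = [[0] * n for _ in range(n)]
    for _ in range(n):
        # Every PE multiplies its two resident elements (the SIMD step).
        for i in range(n):
            for j in range(n):
                c[i][j] += a_local[i][j] * b_local[i][j]
        # Torus shifts: A moves one PE to the left, B moves one PE up.
        a_local = [[a_local[i][(j + 1) % n] for j in range(n)] for i in range(n)]
        b_local = [[b_local[(i + 1) % n][j] for j in range(n)] for i in range(n)]
    return c


product = cannon_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

On real hardware the inner double loop runs on all $N^2$ PEs at once, so the multiplication takes $N$ multiply-and-shift steps rather than the $N^3$ multiply-adds of the sequential algorithm.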

Real-time Color Recognition Based on Graphic Hardware Acceleration (그래픽 하드웨어 가속을 이용한 실시간 색상 인식)

  • Kim, Ku-Jin;Yoon, Ji-Young;Choi, Yoo-Joo
    • Journal of KIISE:Computing Practices and Letters / v.14 no.1 / pp.1-12 / 2008
  • In this paper, we present a real-time algorithm for recognizing vehicle color from indoor and outdoor vehicle images based on GPU (Graphics Processing Unit) acceleration. In the preprocessing step, we construct feature vectors from sample vehicle images with different colors. Then, we combine the feature vectors for each color and store them as a reference texture to be used on the GPU. Given an input vehicle image, the CPU constructs its feature vector, and then the GPU compares it with the sample feature vectors in the reference texture. The similarities between the input feature vector and the sample feature vectors for each color are measured, and the result is transferred to the CPU to recognize the vehicle color. The output colors are categorized into seven colors: three achromatic colors (black, silver, and white) and four chromatic colors (red, yellow, blue, and green). We construct feature vectors from histograms of hue-saturation pairs and hue-intensity pairs. A weight factor is applied to the saturation values. Our algorithm achieves a successful color recognition rate of 94.67% by using a large number of sample images captured in various environments, generating feature vectors that distinguish different colors, and utilizing an appropriate likelihood function. We also accelerate color recognition by utilizing the parallel computation functionality of the GPU. In the experiments, we constructed a reference texture from 7,168 sample images, where 1,024 images were used for each color. The average time for generating a feature vector is 0.509 ms for a $150 \times 113$ resolution image. After the feature vector is constructed, the execution time for GPU-based color recognition is 2.316 ms on average, which is 5.47 times faster than executing the algorithm on the CPU. Our experiments were limited to vehicle images, but the algorithm can be extended to input images of general objects.
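The feature-vector pipeline described above (build a histogram, then compare it with per-color references) can be sketched on the CPU as follows. This is a simplified illustration, not the paper's implementation: the bin counts are assumed, only hue-saturation pairs are used, the saturation weighting is omitted, and histogram intersection stands in for the unspecified likelihood function. All function names and the sample pixels are hypothetical.

```python
import colorsys

HUE_BINS = 8  # assumed bin counts, not taken from the paper
SAT_BINS = 4


def hue_saturation_histogram(pixels):
    """Return a normalized hue-saturation histogram for 8-bit RGB pixels."""
    hist = [0.0] * (HUE_BINS * SAT_BINS)
    for r, g, b in pixels:
        h, s, _ = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        h_bin = min(int(h * HUE_BINS), HUE_BINS - 1)
        s_bin = min(int(s * SAT_BINS), SAT_BINS - 1)
        hist[h_bin * SAT_BINS + s_bin] += 1.0
    total = sum(hist)
    return [v / total for v in hist]


def similarity(u, v):
    """Histogram intersection in [0, 1]; a stand-in for the paper's likelihood."""
    return sum(min(x, y) for x, y in zip(u, v))


def classify(pixels, references):
    """Pick the reference color whose histogram best matches the input."""
    feature = hue_saturation_histogram(pixels)
    return max(references, key=lambda name: similarity(feature, references[name]))


# Tiny synthetic "sample images": a few pixels per color (assumed data).
references = {
    "red": hue_saturation_histogram([(220, 20, 20), (200, 30, 25), (240, 15, 10)]),
    "blue": hue_saturation_histogram([(20, 20, 220), (30, 40, 200), (10, 25, 240)]),
}
label = classify([(210, 30, 25), (230, 20, 20)], references)
```

In the paper the comparison against all reference vectors is the part moved to the GPU; each reference comparison is independent, which is what makes it a good fit for parallel execution.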

Processing Speed Improvement of Software for Automatic Corner Radius Analysis of Laminate Composite using CUDA (CUDA를 이용한 적층 복합재 구조물 코너 부의 자동 구조 해석 소프트웨어의 처리 속도 향상)

  • Hyeon, Ju-Ha;Kang, Moon-Hyae;Moon, Yong-Ho;Ha, Seok-Wun
    • Journal of Convergence for Information Technology / v.9 no.7 / pp.33-40 / 2019
  • With the recent growth of the aerospace industry, there is a need to commercialize composite analysis software. Until now, commercial software has mainly been used for analyzing composites, but it has been difficult to use because of its high price and limited functions. To solve this problem, generalized online software for automatic analysis of both in-plane and corner radius strength has recently been developed. However, this software cannot analyze multiple failure criteria simultaneously. In this paper, we propose a method that greatly improves processing speed while handling the analysis of multiple failure criteria simultaneously, using a parallel processing platform that works only on GPUs equipped with CUDA cores. Experiments measuring analysis speed on large structural data sets yielded satisfactory results.

An Algorithm for Finding Surface Atoms of a Protein Molecule Based on Voxel Map Representation (복셀 맵을 이용한 단백질 표면 원자의 발견 알고리즘)

  • Kim, Byung-Joo;Kim, Ku-Jin;Seong, Joon-Kyung
    • The KIPS Transactions:PartA / v.19A no.2 / pp.73-76 / 2012
  • In this paper, we propose an efficient method to extract surface atoms from a protein molecule. Surface atoms are defined as the set of atoms that can be contacted by a given probe solvent $P$ while $P$ does not collide with the molecule. The atoms in the molecule are represented as a set of spheres with van der Waals radii. The probe solvent is also represented as a sphere. We propose a method to extract the surface atoms by computing the offset surface of the molecule with respect to the radius of $P$. For efficient computation of the offset surface of a molecule, a voxel map is constructed for the offset surfaces of the spheres. Based on GPU (graphics processing unit) acceleration, a data-parallel algorithm extracts the surface atoms in 42.87 milliseconds for a molecule containing up to 6,412 atoms.
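The offset-surface idea can be sketched with a brute-force voxel scan on the CPU. This is a simplified illustration under stated assumptions, not the paper's GPU algorithm: each atom's sphere is grown by the probe radius, a voxel is "free" if the probe center placed there lies outside every grown sphere, and an atom counts as a surface atom if some free voxel lies within one voxel width of its grown sphere. The function name, the voxel size, and the toy molecule are hypothetical.

```python
import math


def surface_atoms(atoms, probe_radius, voxel=0.5):
    """Return indices of atoms a probe sphere can touch, using a voxel scan.

    atoms: list of (x, y, z, radius). The voxel size is an assumed parameter.
    """
    offset_r = [r + probe_radius for (_, _, _, r) in atoms]
    pad = max(offset_r) + voxel
    lo = [min(a[d] for a in atoms) - pad for d in range(3)]
    hi = [max(a[d] for a in atoms) + pad for d in range(3)]
    counts = [int(math.ceil((hi[d] - lo[d]) / voxel)) + 1 for d in range(3)]
    surface = set()
    for ix in range(counts[0]):
        for iy in range(counts[1]):
            for iz in range(counts[2]):
                p = (lo[0] + ix * voxel, lo[1] + iy * voxel, lo[2] + iz * voxel)
                dist = [math.dist(p, a[:3]) for a in atoms]
                # Skip voxels where the probe center would collide with the molecule.
                if any(d < r for d, r in zip(dist, offset_r)):
                    continue
                # A free voxel within one voxel of an offset sphere marks that atom.
                for i, (d, r) in enumerate(zip(dist, offset_r)):
                    if d <= r + voxel:
                        surface.add(i)
    return sorted(surface)


# Toy molecule: one central atom enclosed by six larger neighbors (assumed data).
molecule = [(0.0, 0.0, 0.0, 1.0)]
for axis in range(3):
    for sign in (-1.5, 1.5):
        center = [0.0, 0.0, 0.0]
        center[axis] = sign
        molecule.append((center[0], center[1], center[2], 2.0))
exposed = surface_atoms(molecule, probe_radius=1.4)
```

Each voxel is tested independently of the others, so the triple loop is the natural candidate for a data-parallel GPU kernel of the kind the abstract describes.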

An Implementation of a Convolutional Accelerator based on a GPGPU for a Deep Learning (Deep Learning을 위한 GPGPU 기반 Convolution 가속기 구현)

  • Jeon, Hee-Kyeong;Lee, Kwang-yeob;Kim, Chi-yong
    • Journal of IKEEE / v.20 no.3 / pp.303-306 / 2016
  • In this paper, we propose a method to accelerate convolutional neural networks by utilizing a GPGPU. A convolutional neural network (CNN) is a type of neural network that learns features of images. CNNs are suitable for image processing tasks that require learning from large amounts of data such as images. The convolutional layer of a conventional CNN requires a large number of multiplications, which makes it difficult to operate in real time in an embedded environment. In this paper, we reduce the number of multiplications through the Winograd convolution operation and perform parallel processing of the convolution by utilizing a SIMT-based GPGPU. The experiment was conducted using ModelSim and TestDrive, and the experimental results showed that the processing time improved by about 17% compared with conventional convolution.
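How Winograd convolution reduces multiplications can be seen in its smallest one-dimensional case, F(2,3), which produces two outputs of a 3-tap filter with 4 multiplications instead of 6. The abstract does not say which tile size the authors used, so this sketch is an assumption chosen for brevity; the function names are hypothetical.

```python
def direct_conv(d, g):
    """Two outputs of a 3-tap filter over 4 inputs: 6 multiplications."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """Same two outputs via Winograd F(2,3): 4 data multiplications."""
    # Filter-side terms; in a CNN these can be precomputed once per filter.
    g_sum = (g[0] + g[1] + g[2]) / 2
    g_diff = (g[0] - g[1] + g[2]) / 2
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * g_sum
    m3 = (d[2] - d[1]) * g_diff
    m4 = (d[1] - d[3]) * g[2]
    return [m1 + m2 + m3, m2 - m3 - m4]


data = [1.0, 2.0, 3.0, 4.0]
taps = [1.0, 0.0, -1.0]
reference = direct_conv(data, taps)
fast = winograd_f23(data, taps)
```

The saving comes at the cost of extra additions, which are cheaper than multiplications on most hardware; the two-dimensional variants used in CNN layers nest this same transform along rows and columns.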

Random Partial Haar Wavelet Transformation for Single Instruction Multiple Threads (단일 명령 다중 스레드 병렬 플랫폼을 위한 무작위 부분적 Haar 웨이블릿 변환)

  • Park, Taejung
    • Journal of Digital Contents Society / v.16 no.5 / pp.805-813 / 2015
  • Many researchers expect that compressive sensing and sparse recovery can overcome the limitations of conventional digital techniques. However, these new approaches require solving $\ell_1$-norm optimization problems for signal reconstruction. In the signal reconstruction process, the transform computation, which multiplies a random matrix by a vector, consumes considerable computing power. To address this issue, parallel processing is applied to the optimization problems. In particular, because the original signal is very large, it is hard to store the random matrix directly in memory, so a procedural approach to handling the random matrix is needed. This paper presents a new parallel algorithm for calculating the random partial Haar wavelet transform on a Single Instruction Multiple Threads (SIMT) platform.
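The procedural approach mentioned above, computing matrix-vector products without ever storing the matrix, can be sketched for the Haar case. The code below assumes that "random partial" means a randomly chosen subset of rows of the orthonormal Haar matrix, as in other partial-transform sensing schemes; that reading, the function names, and the seed parameter are assumptions, and the paper's actual SIMT algorithm is not reproduced here.

```python
import math
import random


def haar_coefficient(signal, k):
    """Compute row k of the orthonormal Haar transform without storing the matrix."""
    n = len(signal)  # assumed to be a power of two
    if k == 0:
        return sum(signal) / math.sqrt(n)
    level = k.bit_length() - 1  # k = 2**level + offset
    offset = k - (1 << level)
    width = n >> level  # support length of this wavelet
    start = offset * width
    half = width // 2
    plus = sum(signal[start:start + half])
    minus = sum(signal[start + half:start + width])
    return math.sqrt((1 << level) / n) * (plus - minus)


def random_partial_haar(signal, m, seed=0):
    """Return m randomly chosen Haar row indices and their coefficients."""
    rows = sorted(random.Random(seed).sample(range(len(signal)), m))
    # Each coefficient is independent, so on a SIMT device one thread
    # (or thread group) could evaluate one row.
    return rows, [haar_coefficient(signal, k) for k in rows]


sample = [4.0, 2.0, 5.0, 5.0]
full = [haar_coefficient(sample, k) for k in range(len(sample))]
rows, partial = random_partial_haar(sample, 2)
```

Because each row is generated from its index alone, memory use stays proportional to the signal length rather than to the size of the full transform matrix.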

Real-time Style Transfer for Video (실시간 비디오 스타일 전이 기법에 관한 연구)

  • Seo, Sang Hyun
    • Smart Media Journal / v.5 no.4 / pp.63-68 / 2016
  • Texture transfer is a method for transferring the texture of an input image onto a target image, and it is also used for transferring the artistic style of the input image. This study presents a real-time texture transfer method for generating artistic-style video. To enhance performance, this paper proposes a parallel GPU framework built on the T-shaped kernel used in general texture transfer. To accelerate the motion computation required for maintaining temporal coherence, a multi-scale motion field computed in parallel is proposed. Through this approach, artistic texture transfer for video is achieved with real-time performance.

A Study on Improved Image Matching Method using the CUDA Computing (CUDA 연산을 이용한 개선된 영상 매칭 방법에 관한 연구)

  • Cho, Kyeongrae;Park, Byungjoon;Yoon, Taebok
    • Journal of the Korea Academia-Industrial cooperation Society / v.16 no.4 / pp.2749-2756 / 2015
  • As data quality increases, image processing becomes more time-consuming, and image processing algorithms need to be accelerated. This paper compares the computing speed and performance of a traditional CPU-based recognition system with OpenMP-based and CUDA (Compute Unified Device Architecture)-based systems. For character recognition, the system learns character images of the English alphabet with a constant font and standardized size, recognizes the character region in the input data, and applies an image matching method that computes the degree of matching. Using OpenMP on the four cores of an Intel i5 2500, the speedup over the existing CPU implementation did not reach four times because of delays in the data partition and merge operations; the proposed method improved speed by about 3.2 times. With the CUDA computing technique of the GPGPU (General Purpose GPU) programming platform, parallel processing on the video card was confirmed to give a performance gain of about 21 times over sequential CPU-based processing.

Efficient Parallel Processing for Depth-Map Estimation in Real-Time (실시간 깊이 지도 획득을 위한 효율적인 병렬 처리)

  • Cho, Chil-Suk;Jun, Ji-In;Choo, Hyun-Gon;Park, Jong-Il
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2012.07a / pp.44-46 / 2012
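  • Among the methods for obtaining a depth map, a commonly used one relies on stripe patterns. This method uses a Pro-Cam system: a pattern cast by the projector is captured by the camera, and the depth map is obtained from the geometric relationship between the original pattern and the captured pattern. In this paper, we propose using multi-threading effectively to run such a structured-light depth map acquisition system in real time. Commonly used multi-threading techniques include OpenMP, which uses CPU threads, and CUDA, which uses GPU threads. Because the two techniques execute differently, OpenMP is more efficient for some parts of the pipeline and CUDA for others. We therefore applied to each part the multi-threading technique that suits its characteristics best. As a result, we obtained depth maps at 25 fps or more for $1280 \times 800$ images.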
