• Title/Summary/Keyword: GPU algorithm


Real-time Steel Surface Defects Detection Application based on Yolov4 Model and Transfer Learning (Yolov4와 전이학습을 기반으로한 실시간 철강 표면 결함 검출 연구)

  • Bok-Kyeong Kim;Jun-Hee Bae;NGUYEN VIET HOAN;Yong-Eun Lee;Young Seok Ock
    • The Journal of Bigdata / v.7 no.2 / pp.31-41 / 2022
  • Steel is one of the most fundamental materials in the mechanical industry. However, product quality is greatly affected by surface defects in the steel. Researchers have therefore turned their attention to surface defect detectors, and deep learning methods are the current trend in object detection. There are still limitations and room for improvement: for example, related works focus on developing models but do not take into account real-time application with practical implications for industrial settings. In this paper, a real-time application for steel surface defect detection based on YOLOv4 is proposed. First, because the aim of this work is to deploy the model in a real-time application, we studied related work in this field, focusing in particular on one-stage detectors and the YOLO algorithm, one of the best-known algorithms for real-time object detection. Second, using pre-trained YOLOv4 models on the Darknet platform and transfer learning, we trained and tested on NEU-DET, an open-source dataset of hot-rolled steel defects. In our study, we applied the application to four typical types of steel surface defects, namely patches, pitted surface, inclusion, and scratches. Third, we evaluated the real-time performance of the trained YOLOv4 model for deployment in our system, obtaining an accuracy of 87.1% mAP@0.5 at over 60 fps with GPU processing.
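
The "mAP@0.5" figure above counts a predicted box as a correct detection when its intersection over union (IoU) with a ground-truth box is at least 0.5. A minimal sketch of that criterion follows; the `(x1, y1, x2, y2)` box layout and the function names are assumptions for illustration and are not taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """A prediction matches a ground-truth box when IoU >= threshold."""
    return iou(pred_box, gt_box) >= threshold
```

Averaging precision over recall levels per defect class, and then over classes, would turn these per-box decisions into the mAP value; that step is omitted here.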

GPU-based dynamic point light particles rendering using 3D textures for real-time rendering (실시간 렌더링 환경에서의 3D 텍스처를 활용한 GPU 기반 동적 포인트 라이트 파티클 구현)

  • Kim, Byeong Jin;Lee, Taek Hee
    • Journal of the Korea Computer Graphics Society / v.26 no.3 / pp.123-131 / 2020
  • This study proposes a real-time rendering algorithm for lighting in which each of more than 100,000 moving particles acts as a light source. Two 3D textures are used to dynamically determine the range of influence of each light: the first 3D texture stores light color and the second stores light direction information. Each frame goes through two steps. The first step uses a compute shader to update the particle information required for 3D texture initialization and rendering. The particle position is converted to the sampling coordinates of the 3D texture, and based on these coordinates, the first 3D texture is updated with the color sum of the particle lights affecting the corresponding voxels, and the second 3D texture with the sum of the direction vectors from the corresponding voxels to the particle lights. The second step operates on a general rendering pipeline. First, based on the world position of the polygon to be rendered, the exact sampling coordinates of the 3D texture updated in the first step are calculated. Since the size of the 3D texture corresponds 1:1 to the size of the game world, the world coordinates of the pixel are used as the sampling coordinates. Lighting is then computed from the sampled color and the light direction vector. The 3D texture corresponds 1:1 to the actual game world and assumes a minimum unit of 1 m, so in areas smaller than 1 m, problems such as staircase artifacts caused by the resolution limit occur. Interpolation and supersampling are performed during texture sampling to mitigate these problems. Measurements of the time taken to render a frame showed 146 ms on the forward lighting pipeline and 46 ms on the deferred lighting pipeline with 262,144 particles, and 214 ms on the forward lighting pipeline and 104 ms on the deferred lighting pipeline with 1,024,766 particle lights.
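
The first step described above (mapping each particle to voxel coordinates and accumulating a color sum and a direction sum per voxel) can be sketched on the CPU as follows. This is an illustration only, not the paper's compute shader: dictionaries stand in for the two 3D textures, and the `radius` parameter, the normalization of direction vectors, and all function names are assumptions.

```python
import math


def world_to_voxel(pos, grid_size):
    """Map a world position (meters) to voxel indices, assuming 1 m voxels.

    Returns None when the position lies outside the grid.
    """
    idx = tuple(int(math.floor(c)) for c in pos)
    inside = all(0 <= i < n for i, n in zip(idx, grid_size))
    return idx if inside else None


def accumulate_lights(particles, grid_size, radius=1):
    """Accumulate per-voxel color sums and direction sums.

    particles: list of (position, color) pairs, each a 3-tuple of floats.
    radius: hypothetical range of influence, in voxels, around each particle.
    """
    zero = (0.0, 0.0, 0.0)
    color_sum = {}  # stands in for the first 3D texture
    dir_sum = {}    # stands in for the second 3D texture
    offsets = range(-radius, radius + 1)
    for pos, color in particles:
        center = world_to_voxel(pos, grid_size)
        if center is None:
            continue
        for dx in offsets:
            for dy in offsets:
                for dz in offsets:
                    voxel = (center[0] + dx, center[1] + dy, center[2] + dz)
                    if not all(0 <= v < n for v, n in zip(voxel, grid_size)):
                        continue
                    # Unit vector from the voxel center to the particle light
                    # (normalizing here is an assumption of this sketch).
                    to_light = tuple(p - (v + 0.5) for p, v in zip(pos, voxel))
                    length = math.sqrt(sum(c * c for c in to_light))
                    unit = tuple(c / length for c in to_light) if length > 0 else zero
                    old_c = color_sum.get(voxel, zero)
                    old_d = dir_sum.get(voxel, zero)
                    color_sum[voxel] = tuple(a + b for a, b in zip(old_c, color))
                    dir_sum[voxel] = tuple(a + b for a, b in zip(old_d, unit))
    return color_sum, dir_sum
```

In the second step, a renderer would look up both sums at the voxel containing a pixel's world position and shade with them; because the grid maps 1:1 to world meters, `world_to_voxel` doubles as that lookup.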

Fast Hologram Generating of 3D Object with Super Multi-Light Source using Parallel Distributed Computing (병렬 분산 컴퓨팅을 이용한 초다광원 3차원 물체의 홀로그램 고속 생성)

  • Song, Joongseok;Kim, Changseob;Park, Jong-Il
    • Journal of Broadcast Engineering / v.20 no.5 / pp.706-717 / 2015
  • The computer-generated hologram (CGH) method is a technology that can generate a hologram using only a commonly available personal computer (PC). However, the CGH method requires a huge amount of computation time for a 3D object with a super multi-light source or for a high-definition hologram. Hence, solutions are needed that either reduce the computational complexity of the CGH algorithm or increase the computing performance of the hardware. In this paper, we propose a method that generates a digital hologram of a 3D object with a super multi-light source using parallel distributed computing. Traditional methods are limited in how far they can improve CGH performance because they use a single PC. In the proposed method, by contrast, a server PC efficiently uses the computing power of client PCs, so the CGH of a 3D object with a super multi-light source can be calculated quickly. In the experiments, we verified that the proposed method can generate a digital hologram with a resolution of 1,536×1,536 for a 3D object with 157,771 light sources in 121 ms. In addition, we verified that the proposed method reduces the generation time of a digital hologram in proportion to the number of client PCs.
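
The abstract does not say how the server divides the work among client PCs. The sketch below assumes one simple scheme, splitting hologram rows evenly, and uses a common point-source fringe formula (a sum of cosines of the source-to-pixel distance). The function names, the `wavelength` and `pitch` defaults, and the row-wise partition are all assumptions, and the clients are simulated by sequential calls.

```python
import math


def hologram_rows(points, width, row_start, row_end, wavelength=0.5, pitch=1.0):
    """Compute hologram rows [row_start, row_end) for point light sources.

    points: list of (x, y, z, amplitude) tuples. Units are arbitrary but
    must be consistent with wavelength and pixel pitch.
    """
    k = 2.0 * math.pi / wavelength  # wave number
    rows = []
    for y in range(row_start, row_end):
        row = []
        for x in range(width):
            total = 0.0
            for px, py, pz, amp in points:
                r = math.sqrt((x * pitch - px) ** 2 + (y * pitch - py) ** 2 + pz ** 2)
                total += amp * math.cos(k * r)
            row.append(total)
        rows.append(row)
    return rows


def split_rows(height, n_clients):
    """Split `height` rows into n_clients contiguous, near-equal ranges."""
    base, extra = divmod(height, n_clients)
    ranges, start = [], 0
    for i in range(n_clients):
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def generate_distributed(points, width, height, n_clients):
    """Server-side view: hand one row range to each client, then concatenate.

    Each hologram_rows call stands in for work done on a separate client PC.
    """
    result = []
    for start, end in split_rows(height, n_clients):
        result.extend(hologram_rows(points, width, start, end))
    return result
```

Because every pixel is independent, the per-client work shrinks roughly in proportion to the number of clients, which is consistent with the scaling the abstract reports; network transfer cost is ignored in this sketch.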

Development and Validation of the GPU-based 3D Dynamic Analysis Code for Simulating Rock Fracturing Subjected to Impact Loading (충격 하중 시 암석의 파괴거동해석을 위한 GPGPU 기반 3차원 동적해석기법의 개발과 검증 연구)

  • Min, Gyeong-Jo;Fukuda, Daisuke;Oh, Se-Wook;Cho, Sang-Ho
    • Explosives and Blasting / v.39 no.2 / pp.1-14 / 2021
  • Recently, with the development of high-performance processing devices such as GPGPUs, three-dimensional dynamic analysis techniques that can replace expensive impact tests on rock materials have been actively developed in the defense and aerospace fields. Experimentally observing or measuring the fracture processes occurring in rocks subjected to high impact loads, such as blasting and earth penetration by small-diameter missiles, is difficult due to the inhomogeneity and opacity of rock materials. In this study, a three-dimensional dynamic fracture process analysis technique (3D-DFPA) was developed to simulate the fracture behavior of rocks under impact. To improve computation speed, a GPGPU-capable algorithm was developed for explicit analysis and contact element search. To verify the proposed technique, dynamic fracture toughness tests of Straight Notched Disk Bending (SNDB) limestone samples were simulated, and the reflection and transmission of stress waves at the rock-impact bar interfaces and the fracture process of the rock samples were compared. The dynamic load tests for the SNDB samples used a Pulse Shape controlled Split Hopkinson Pressure Bar (PS-SHPB), which can control the waveform of the incident stress wave; the stress state and the fracture process of the rock models were analyzed against the experimental results.
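
The abstract does not give the integration scheme used in 3D-DFPA. As a generic illustration of why explicit analysis suits GPGPU execution, the sketch below advances a 1D chain of equal masses joined by linear springs by one explicit time step. The model, the semi-implicit Euler update, and all names are assumptions and are far simpler than a 3D fracture code.

```python
def explicit_step(u, v, dt, stiffness, mass):
    """Advance displacements u and velocities v by one explicit time step.

    Models a 1D chain of equal masses joined by linear springs, with both
    ends free. Uses a semi-implicit (symplectic) Euler update.
    """
    n = len(u)
    forces = [0.0] * n
    # Internal force from each spring acts equally and oppositely on its nodes.
    for i in range(n - 1):
        f = stiffness * (u[i + 1] - u[i])
        forces[i] += f
        forces[i + 1] -= f
    # Each node update depends only on values from the previous step,
    # so on a GPU every node could be handled by its own thread.
    v_new = [v[i] + dt * forces[i] / mass for i in range(n)]
    u_new = [u[i] + dt * v_new[i] for i in range(n)]
    return u_new, v_new
```

No linear system is solved at any step, which is the property that makes explicit schemes straightforward to parallelize; the price is a stability limit on `dt`.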

Deep Learning Architectures and Applications (딥러닝의 모형과 응용사례)

  • Ahn, SungMahn
    • Journal of Intelligence and Information Systems / v.22 no.2 / pp.127-142 / 2016
  • A deep learning model is a kind of neural network that allows multiple hidden layers. There are various deep learning architectures, such as convolutional neural networks, deep belief networks, and recurrent neural networks. These have been applied to fields like computer vision, automatic speech recognition, natural language processing, audio recognition, and bioinformatics, where they have been shown to produce state-of-the-art results on various tasks. Among those architectures, convolutional neural networks and recurrent neural networks are classified as supervised learning models. In recent years, these supervised learning models have gained more popularity than unsupervised learning models such as deep belief networks, because they have shown prominent applications in the fields mentioned above. Deep learning models can be trained with the backpropagation algorithm. Backpropagation is an abbreviation for "backward propagation of errors" and is a common method of training artificial neural networks, used in conjunction with an optimization method such as gradient descent. The method calculates the gradient of an error function with respect to all the weights in the network. The gradient is fed to the optimization method, which in turn uses it to update the weights in an attempt to minimize the error function. Convolutional neural networks use a special architecture that is particularly well adapted to classifying images. Using this architecture makes convolutional networks fast to train. This, in turn, helps us train deep, multi-layer networks, which are very good at classifying images. These days, deep convolutional networks are used in most neural networks for image recognition. Convolutional neural networks use three basic ideas: local receptive fields, shared weights, and pooling.
By local receptive fields, we mean that each neuron in the first (or any) hidden layer is connected to a small region of the input (or previous layer's) neurons. Shared weights mean that the same weights and bias are used for each of the local receptive fields. This means that all the neurons in the hidden layer detect exactly the same feature, just at different locations in the input image. In addition to the convolutional layers just described, convolutional neural networks also contain pooling layers. Pooling layers are usually used immediately after convolutional layers. What the pooling layers do is simplify the information in the output from the convolutional layer. Recent convolutional network architectures have 10 to 20 hidden layers and billions of connections between units. Training deep learning networks took weeks several years ago, but thanks to progress in GPUs and algorithm enhancements, training time has been reduced to several hours. Neural networks with time-varying behavior are known as recurrent neural networks, or RNNs. A recurrent neural network is a class of artificial neural network in which connections between units form a directed cycle. This creates an internal state of the network, which allows it to exhibit dynamic temporal behavior. Unlike feedforward neural networks, RNNs can use their internal memory to process arbitrary sequences of inputs. Early RNN models turned out to be very difficult to train, harder even than deep feedforward networks. The reason is the unstable gradient problem, which includes vanishing and exploding gradients. The gradient can get smaller and smaller as it is propagated back through layers. This makes learning in early layers extremely slow. The problem actually gets worse in RNNs, since gradients are propagated backward not just through layers but also through time. If the network runs for a long time, that can make the gradient extremely unstable and hard to learn from.
It has been possible to incorporate an idea known as long short-term memory units (LSTMs) into RNNs. LSTMs make it much easier to get good results when training RNNs, and many recent papers make use of LSTMs or related ideas.
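
The three convolutional ideas named in this abstract (local receptive fields, shared weights, and pooling) can be seen in a few lines of plain Python. This is a minimal sketch under assumptions: a single feature map, no activation function, non-overlapping max pooling, and illustrative function names.

```python
def conv2d_valid(image, kernel, bias=0.0):
    """Slide one kernel over the image (no padding, stride 1).

    Each output value sees only a kernel-sized patch of the input (its local
    receptive field), and the same kernel and bias are reused at every
    position (shared weights).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            s = bias
            for a in range(kh):
                for b in range(kw):
                    s += kernel[a][b] * image[i + a][j + b]
            row.append(s)
        out.append(row)
    return out


def max_pool(fmap, size=2):
    """Keep the maximum of each non-overlapping size x size block."""
    return [
        [
            max(fmap[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]
```

For a 5×5 input and a 2×2 kernel, `conv2d_valid` yields a 4×4 feature map from only four weights and one bias, and `max_pool` then reduces it to 2×2, which is the simplification of the convolutional output that the abstract attributes to pooling layers.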

Korean Sentence Generation Using Phoneme-Level LSTM Language Model (한국어 음소 단위 LSTM 언어모델을 이용한 문장 생성)

  • Ahn, SungMahn;Chung, Yeojin;Lee, Jaejoon;Yang, Jiheon
    • Journal of Intelligence and Information Systems / v.23 no.2 / pp.71-88 / 2017
  • Language models were originally developed for speech recognition and language processing. Using a set of example sentences, a language model predicts the next word or character based on sequential input data. N-gram models have been widely used, but they cannot model the correlation between input units efficiently, since they are probabilistic models based on the frequency of each unit in the training set. Recently, with the development of deep learning algorithms, recurrent neural network (RNN) models and long short-term memory (LSTM) models have been widely used for neural language models (Ahn, 2016; Kim et al., 2016; Lee et al., 2016). These models can reflect dependencies between objects that are entered sequentially into the model (Gers and Schmidhuber, 2001; Mikolov et al., 2010; Sundermeyer et al., 2012). To train a neural language model, texts need to be decomposed into words or morphemes. However, since a training set of sentences generally includes a huge number of words or morphemes, the dictionary is very large, which increases model complexity. In addition, word-level or morpheme-level models can generate only vocabulary contained in the training set. Furthermore, with morphologically rich languages such as Turkish, Hungarian, Russian, Finnish, or Korean, morpheme analyzers are more likely to introduce errors in the decomposition process (Lankinen et al., 2016). Therefore, this paper proposes a phoneme-level language model for Korean based on LSTM models. A phoneme, such as a vowel or a consonant, is the smallest unit that comprises Korean text. We construct the language model using three or four LSTM layers. Each model was trained using the stochastic gradient algorithm and more advanced optimization algorithms such as Adagrad, RMSprop, Adadelta, Adam, Adamax, and Nadam. A simulation study was done with Old Testament texts using the deep learning package Keras running on Theano.
After pre-processing the texts, the dataset included 74 unique characters, including vowels, consonants, and punctuation marks. We then constructed input vectors of 20 consecutive characters, each with the following 21st character as the output. In total, 1,023,411 input-output pairs were included in the dataset, and we divided them into training, validation, and test sets in the proportion 70:15:15. All the simulations were conducted on a system equipped with an Intel Xeon CPU (16 cores) and an NVIDIA GeForce GTX 1080 GPU. We compared the loss function evaluated on the validation set, the perplexity evaluated on the test set, and the time taken to train each model. As a result, all the optimization algorithms except the stochastic gradient algorithm showed similar validation loss and perplexity, which were clearly superior to those of the stochastic gradient algorithm. The stochastic gradient algorithm also took the longest to train for both the 3- and 4-layer LSTM models. On average, the 4-layer LSTM model took 69% longer to train than the 3-layer LSTM model. However, its validation loss and perplexity did not improve significantly and even became worse under specific conditions. On the other hand, when comparing the automatically generated sentences, the 4-layer LSTM model tended to generate sentences closer to natural language than the 3-layer LSTM model. Although there were slight differences in the completeness of the generated sentences between the models, the sentence generation performance was quite satisfactory under all simulation conditions: the models generated only legitimate Korean letters, and the use of postpositions and the conjugation of verbs were almost perfectly grammatical. The results of this study are expected to be widely used for Korean language processing in the fields of language processing and speech recognition, which are the basis of artificial intelligence systems.
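
The data preparation and evaluation described in this abstract (20-character inputs with the 21st character as the target, a 70:15:15 split, and perplexity on the test set) can be sketched as follows. The function names are hypothetical, the split is taken in sequence order because the abstract does not say whether samples were shuffled, and plain characters stand in for the decomposed Korean phonemes.

```python
import math


def make_windows(text, window=20):
    """Pair each run of `window` consecutive characters with the next one."""
    return [(text[i:i + window], text[i + window]) for i in range(len(text) - window)]


def split_dataset(samples, ratios=(70, 15, 15)):
    """Split samples into train/validation/test parts in the given proportion."""
    n = len(samples)
    total = sum(ratios)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    return (
        samples[:n_train],
        samples[n_train:n_train + n_val],
        samples[n_train + n_val:],
    )


def perplexity(probabilities):
    """Perplexity from the probabilities a model gave to each true next character.

    Defined as exp of the mean negative log-likelihood; lower is better.
    """
    nll = -sum(math.log(p) for p in probabilities) / len(probabilities)
    return math.exp(nll)
```

A model that always spread its probability evenly over the 74 characters in the dataset would score a perplexity of 74, which gives a baseline for reading the reported comparisons between optimizers.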