• Title/Summary/Keyword: and Parallel Processing


Virtual Flutter Test of Spanwise Curved Wings Using CFD/CSD Coupled Dynamic Method (CFD/CSD 정밀 연계해석기법을 이용한 3차원 곡면날개의 가상 플러터 시험)

  • Kim, Dong-Hyun;Oh, Se-Won;Kim, Hyun-Jung
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference / 2005.11a / pp.457-464 / 2005
  • A coupled time-integration method with a staggered algorithm, based on computational structural dynamics (CSD), the finite element method (FEM), and computational fluid dynamics (CFD), has been developed to demonstrate physical vibration phenomena caused by dynamic aeroelastic excitations. Virtual flutter tests for a spanwise curved wing model were conducted effectively using the present computational methods with a high-speed parallel processing technique. In addition, the present system can simultaneously record a data file for generating a virtual animation of the flutter safety test. The virtual flutter test results are compared with experimental wind tunnel data. The results show that spanwise curvature tends to decrease the flutter dynamic pressure for the same flight condition.


A Study on Airborne SAR System and Image Formation (항공탑재 SAR 시스템 및 영상형성 연구)

  • Hyo-I Moon;Jae-Hyoung Cho;Dong-Ju Lim;Min-Ho Go
    • The Journal of the Korea institute of electronic communication sciences / v.18 no.3 / pp.475-482 / 2023
  • Synthetic aperture radar (SAR), which images targets using radio signals, enables monitoring at all times regardless of weather conditions. In this paper, a SAR system was installed on a test aircraft to collect raw SAR data over land and sea, and the results of image formation using the backprojection algorithm are presented.

Fine-scalable SPIHT Hardware Design for Frame Memory Compression in Video Codec

  • Kim, Sunwoong;Jang, Ji Hun;Lee, Hyuk-Jae;Rhee, Chae Eun
    • JSTS:Journal of Semiconductor Technology and Science / v.17 no.3 / pp.446-457 / 2017
  • To reduce the size of frame memory or the bus bandwidth, frame memory compression (FMC) recompresses the reconstructed or reference frames of video codecs. This paper proposes a novel FMC design based on discrete wavelet transform (DWT) - set partitioning in hierarchical trees (SPIHT), which supports fine-scalable throughput and is area-efficient. In the proposed design, multiple cores with small block sizes are used in parallel instead of a single core with a large block size. In addition, an appropriate pipelining schedule is proposed. Compared to the previous design, the proposed design achieves a processing speed closer to the target system speed and is therefore more efficient in hardware utilization. In addition, a scheme that merges two passes of SPIHT into one pass, called the merged refinement pass (MRP), is proposed. As the number of shifters decreases and the bit-width of the remaining shifters is reduced, the size of the SPIHT hardware decreases significantly. The proposed FMC encoder and decoder designs achieve throughputs of 4,448 and 4,000 Mpixels/s, respectively, and their gate counts are 76.5K and 107.8K. When the proposed design is applied to high efficiency video codec (HEVC), it achieves 1.96% lower average BDBR and 0.05 dB higher average BDPSNR than the previous FMC design.

Multi-Sever based Distributed Coding based on HEVC/H.265 for Studio Quality Video Editing

  • Kim, Jongho;Lim, Sung-Chang;Jeong, Se-Yoon;Kim, Hui-Yong
    • Journal of Multimedia Information System / v.5 no.3 / pp.201-208 / 2018
  • High Efficiency Video Coding range extensions (HEVC RExt) is an extension of HEVC designed specifically for high-quality images. HEVC RExt is essential for studio editing, which handles very high-quality images of various types. Handling such massive data in studio editing raises several problems, and one of the most important procedures is the re-encoding and decoding performed during editing. Various codecs are widely used for studio data editing, but most share the same problems when handling massive data. First, re-encoding and decoding occur frequently during studio data editing, which costs an enormous amount of time and degrades video quality. In this paper, we suggest a new video coding structure for efficient studio video editing, called "ultra-low delay (ULD)". It has a very simple, low-delay referencing structure. Simplifying the referencing structure minimizes the number of frames that need decoding and re-encoding, and it also prevents the quality degradation caused by frequent re-encoding. Various fast coding algorithms are also proposed for efficient editing, such as tool-level optimization, multi-server based distributed coding, and SIMD (single instruction, multiple data) based parallel processing. These reduce the enormous computational complexity of the editing procedure. The proposed method shows 9500 times faster coding speed with negligible loss of quality, and it also shows better coding gain compared to the "intra only" structure. We confirm that the proposed method efficiently solves the existing problems of studio video editing.

Processing Speed Improvement of Software for Automatic Corner Radius Analysis of Laminate Composite using CUDA (CUDA를 이용한 적층 복합재 구조물 코너 부의 자동 구조 해석 소프트웨어의 처리 속도 향상)

  • Hyeon, Ju-Ha;Kang, Moon-Hyae;Moon, Yong-Ho;Ha, Seok-Wun
    • Journal of Convergence for Information Technology / v.9 no.7 / pp.33-40 / 2019
  • As the aerospace industry has recently become more active, there is a need to commercialize composite analysis software. Until now, commercial software has mainly been used for analyzing composites, but it has been difficult to use because of its high price and limited functions. To solve this problem, automatic analysis software for both in-plane and corner radius strength, made available on-line and generalized, has recently been developed. However, this software has the disadvantage that it cannot analyze multiple failure criteria simultaneously. In this paper, we propose a method that greatly improves the processing speed while handling the analysis of multiple failure criteria simultaneously, using a parallel processing platform that works only with a GPU equipped with CUDA cores. We obtained satisfactory results when the analysis speed was tested on large structure data.

Performance Analysis of Distributed Hadoop Systems (분산 하둡 시스템의 성능 비교 분석)

  • Bae, Byoung-Jin;Kim, Young-Joo;Kim, Young-Kuk
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2014.05a / pp.479-482 / 2014
  • Open-source Hadoop systems are now widely used to efficiently manage fast-growing big data. A Hadoop system consists of a distributed file system called HDFS (Hadoop Distributed File System) and a distributed parallel processing system called MapReduce. MapReduce reads and processes big data from HDFS and then writes the processed results back to HDFS. The system structure behind this processing method differs from one Hadoop version to another. This paper therefore presents a performance analysis of Hadoop systems. To this end, we devise a way to monitor Hadoop systems and use it to measure the occurrence frequency of the processes, threads, and variables generated within the Hadoop system itself. The measured results then serve as indicators for predicting the internal performance of Hadoop systems.
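
The read, process, and write-back flow that the abstract attributes to MapReduce can be illustrated with a minimal single-process sketch. It is not Hadoop code and does not reflect the paper's monitoring method: the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are hypothetical, and a plain Python list stands in for records read from HDFS.

```python
from collections import defaultdict


def map_phase(records):
    """Map: emit a (word, 1) pair for every word in every input record."""
    for record in records:
        for word in record.split():
            yield word.lower(), 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(records):
    """Run the three phases in order; the returned dict stands in for the output written back to storage."""
    return reduce_phase(shuffle_phase(map_phase(records)))


if __name__ == "__main__":
    sample = ["big data on HDFS", "MapReduce reads big data"]
    print(word_count(sample))
```

In a real cluster the map and reduce calls run on many nodes at once and the shuffle moves data over the network; the sketch keeps only the data flow.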


Implementation of Channel Coding System using Viterbi Decoder of Pipeline-based Multi-Window (파이프라인 기반 다중윈도방식의 비터비 디코더를 이용한 채널 코딩 시스템의 구현)

  • Seo Young-Ho;Kim Dong-Wook
    • Journal of the Korea Institute of Information and Communication Engineering / v.9 no.3 / pp.587-594 / 2005
  • In this paper, we propose a Viterbi decoder with multiple buffering and a parallel decoding scheme that expands the time-divided input signal, map it to an FPGA, and implement a channel coding system together with PC-based software. The continuous input signal is buffered in units of the decoding length and decoded in parallel using high-speed cells for Viterbi decoding. The output data rate increases linearly with the number of cells forming the Viterbi decoder, and flexible operation is supported by programming the controller and modifying the input buffer. The cell for the Viterbi decoder consists of an HD block for calculating the Hamming distance, a CM block for calculating the value in each state, a TB block for the trace-back operation, and a LIFO. The implemented Viterbi decoder cell used 351 LABs (Logic Array Blocks) and operated stably at up to 139 MHz in an ALTERA APEX20KC EP20K600CB652-7 FPGA. The whole Viterbi decoder, including the Viterbi decoding cells, input/output buffers, and a controller, occupied 23% of the hardware resources and has an output data rate of 1 Gbps.
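
The roles of the HD, CM, TB, and LIFO blocks named in the abstract can be sketched in software with a small hard-decision Viterbi decoder. This is an illustrative sketch only: the rate-1/2, constraint-length-3 code with generators (7, 5) is an assumption chosen for brevity, not the code used in the paper, and the function names are hypothetical.

```python
G = (0b111, 0b101)  # assumed generator polynomials (7, 5 octal), constraint length 3
N_STATES = 4        # 2^(K-1) trellis states
INF = float("inf")


def parity(x):
    return bin(x).count("1") & 1


def branch_output(state, bit):
    """Coded bit pair produced when `bit` enters the encoder in `state`."""
    reg = (bit << 2) | state  # newest bit first, then the two previous bits
    return tuple(parity(reg & g) for g in G)


def next_state(state, bit):
    return (bit << 1) | (state >> 1)


def encode(bits):
    state, out = 0, []
    for bit in bits:
        out.extend(branch_output(state, bit))
        state = next_state(state, bit)
    return out


def viterbi_decode(received):
    metrics = [0] + [INF] * (N_STATES - 1)  # encoder starts in state 0
    history = []
    for t in range(len(received) // 2):
        pair = received[2 * t: 2 * t + 2]
        new_metrics = [INF] * N_STATES
        prev = [None] * N_STATES
        for state in range(N_STATES):
            if metrics[state] == INF:
                continue
            for bit in (0, 1):
                # HD role: Hamming distance between branch output and received pair
                hd = sum(a != b for a, b in zip(branch_output(state, bit), pair))
                ns = next_state(state, bit)
                # CM role: keep the smaller accumulated metric for each state
                if metrics[state] + hd < new_metrics[ns]:
                    new_metrics[ns] = metrics[state] + hd
                    prev[ns] = (state, bit)
        history.append(prev)
        metrics = new_metrics
    # TB role: trace back from the best final state; bits come out in reverse
    state = metrics.index(min(metrics))
    decoded = []
    for prev in reversed(history):
        state, bit = prev[state]
        decoded.append(bit)
    decoded.reverse()  # LIFO role: restore the original bit order
    return decoded
```

For example, encoding `[1, 0, 1, 1, 0, 0, 0, 0]`, flipping one coded bit, and passing the result to `viterbi_decode` recovers the original message. Running several such decoders on separate buffered windows corresponds to the multi-cell arrangement the abstract describes.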

Improving the Calculation Speed of Ray-tracing Based Simulator for Analyzing an Integrating Sphere with OpenMP Directive and Guaranteeing the Randomness of Monte Carlo Method (광선추적법 기반의 적분구 분석 시뮬레이터에서 OpenMP 지시어를 이용한 속도 향상 및 몬테카를로 방법의 무작위성 보장)

  • Kim, Seung-Yong;Kim, Dae-Chan;O, Beom-Hoan;Park, Se-Geun;Lee, El-Hang;Lee, Seung-Gol
    • Korean Journal of Optics and Photonics / v.22 no.2 / pp.83-89 / 2011
  • To improve the calculation speed of an integrating-sphere simulator based on a ray-tracing method, parallel processing with OpenMP directives was implemented in the simulator, and the randomness of the Monte Carlo method was guaranteed by using a parallel random number generator. It was confirmed that simulation results obtained with more than $10^7$ rays agreed with theoretical results within an error range of 0.5%, and that the calculation speed improved as the number of threads increased. Finally, the spatial response distribution functions of a real integrating sphere were simulated and compared with previous results.
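
The idea of giving each thread its own random stream can be sketched as follows. This is a stand-in, not the paper's simulator: it estimates pi instead of tracing rays, it uses Python threads rather than OpenMP, and seeding one `random.Random` per worker with distinct integers is an assumed simplification of a true parallel random number generator. The names `count_hits` and `estimate_pi` are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def count_hits(seed, n_samples):
    """Count random points in the unit square that fall inside the quarter circle."""
    rng = random.Random(seed)  # private generator: no state is shared between workers
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(n_workers=4, n_per_worker=50_000, base_seed=2011):
    """Combine per-worker counts; each worker draws from its own seeded stream."""
    seeds = [base_seed + i for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        hits = list(pool.map(count_hits, seeds, [n_per_worker] * n_workers))
    return 4.0 * sum(hits) / (n_workers * n_per_worker)


if __name__ == "__main__":
    print(estimate_pi())
```

Because no generator state is shared, the result is reproducible regardless of thread scheduling. CPython threads will not show the speed-up reported in the abstract; the sketch illustrates only the per-worker stream arrangement.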

Embedding Multiple Meshes into a Crossed Cube (다중 메쉬의 교차큐브에 대한 임베딩)

  • Kim, Sook-Yeon
    • Journal of KIISE:Computer Systems and Theory / v.36 no.5 / pp.335-343 / 2009
  • The crossed cube has received great attention because its properties are equal or superior to those of the hypercube, which is widely known as a versatile parallel processing system. It is known that two disjoint copies of a mesh of size $4{\times}2^m$ or four disjoint copies of a mesh of size $8{\times}2^m$ can be embedded into a crossed cube with dilation 1 and expansion 1 [Dong, Yang, Zhao, and Tang, 2008]. However, it is not known whether multiple disjoint copies of a mesh with more than eight rows and columns can be embedded into a crossed cube with dilation 1 and expansion 1. In this paper, we show that $2^{n-1}$ disjoint copies of a mesh of size $2^n{\times}2^m$ can be embedded into a crossed cube with dilation 1 and expansion 1, where $n{\geq}1$ and $m{\geq}3$. Our result is optimal in terms of dilation and expansion, which are important measures of graph embedding. In addition, our result is practically usable for allocating multiple jobs with a mesh structure on a parallel computer with a crossed cube structure.
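
A node count makes the expansion-1 claim concrete. Assuming expansion is measured against the combined node count of all mesh copies (an assumption, since the abstract does not state the host dimension), the copies together contain

$$2^{n-1} \cdot \left(2^n \cdot 2^m\right) = 2^{2n+m-1}$$

nodes, which would match a crossed cube of dimension $2n+m-1$. For $n=2$ this gives two copies of a $4{\times}2^m$ mesh with $2^{m+3}$ nodes in total, and for $n=3$ four copies of an $8{\times}2^m$ mesh with $2^{m+5}$ nodes, consistent with the two earlier cases cited in the abstract.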

Implementation of handwritten digit recognition CNN structure using GPGPU and Combined Layer (GPGPU와 Combined Layer를 이용한 필기체 숫자인식 CNN구조 구현)

  • Lee, Sangil;Nam, Kihun;Jung, Jun Mo
    • The Journal of the Convergence on Culture Technology / v.3 no.4 / pp.165-169 / 2017
  • CNN (Convolutional Neural Network) is one of the machine learning algorithms that show superior performance in image recognition and classification. A CNN is simple, but it requires a large amount of computation and takes a lot of time. Consequently, in this paper we parallelize the convolution layer, pooling layer, and fully connected layer, which consume much of the processing time of a CNN, using the SIMT (Single Instruction Multiple Thread) structure of GPGPU (General-Purpose computing on Graphics Processing Units). We also expect to improve performance by reducing the number of memory accesses, using the output of the convolution layer directly rather than storing it for the pooling layer. We use the MNIST dataset to verify the experiment and confirm that the proposed CNN structure is 12.38% better than the existing structure.
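
One way to read the combined-layer idea is that each pooled value is computed straight from the convolution results, so the intermediate feature map is never stored. The sketch below shows that reading on the CPU in plain Python. It is an interpretation of the abstract, not the paper's GPGPU kernel, and the function names `conv2d_valid`, `max_pool`, and `conv_pool_fused` are hypothetical.

```python
def conv2d_valid(image, kernel):
    """Unpadded 2D convolution as used in CNNs (no kernel flip); returns the full feature map."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(ow)]
            for r in range(oh)]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling over a stored feature map."""
    ph, pw = len(fmap) // size, len(fmap[0]) // size
    return [[max(fmap[pr * size + dr][pc * size + dc]
                 for dr in range(size) for dc in range(size))
             for pc in range(pw)]
            for pr in range(ph)]


def conv_pool_fused(image, kernel, pool=2):
    """Compute each pooled output directly; no intermediate feature map is kept."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    out = []
    for pr in range(oh // pool):
        row = []
        for pc in range(ow // pool):
            best = float("-inf")
            for dr in range(pool):
                for dc in range(pool):
                    r, c = pr * pool + dr, pc * pool + dc
                    acc = sum(image[r + i][c + j] * kernel[i][j]
                              for i in range(kh) for j in range(kw))
                    best = max(best, acc)  # pooling applied as soon as a value exists
            row.append(best)
        out.append(row)
    return out
```

Both paths give the same pooled output; the fused version only avoids writing and re-reading the convolution result, which is where the abstract expects the reduction in memory accesses to come from. On a GPU each pooled output could be assigned to its own thread.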