• Title/Summary/Keyword: Parallel data processing

Search Result 751, Processing Time 0.033 seconds

Three-dimensional Wave Propagation Modeling using OpenACC and GPU (OpenACC와 GPU를 이용한 3차원 파동 전파 모델링)

  • Kim, Ahreum;Lee, Jongwoo;Ha, Wansoo
    • Geophysics and Geophysical Exploration
    • /
    • v.20 no.2
    • /
    • pp.72-77
    • /
    • 2017
  • We calculated 3D frequency- and Laplace-domain wavefields using time-domain modeling and Fourier transform or Laplace transform. We adopted OpenACC and GPU for an efficient parallel calculation. The OpenACC makes it easy to use GPU accelerators by adding directives in conventional C, C++, and Fortran programming languages. Accordingly, one doesn't have to learn new GPGPU programming languages such as CUDA or OpenCL to use GPU. An OpenACC program allocates GPU memory, transfers data between the host CPU and GPU devices and performs GPU operations automatically or following user-defined directives. We compared performance of 3D wave propagation modeling programs using OpenACC and GPU to that using single-core CPU through numerical tests. Results using a homogeneous model and the SEG/EAGE salt model show that the OpenACC programs are approximately 53 and 30 times faster than those using single-core CPU.

Design and Implementation of Accelerator Architecture for Binary Weight Network on FPGA with Limited Resources (한정된 자원을 갖는 FPGA에서의 이진가중치 신경망 가속처리 구조 설계 및 구현)

  • Kim, Jong-Hyun;Yun, SangKyun
    • Journal of IKEEE
    • /
    • v.24 no.1
    • /
    • pp.225-231
    • /
    • 2020
  • In this paper, we propose a method to accelerate BWN based on FPGA with limited resources for embedded system. Because of the limited number of logic elements available, a single computing unit capable of handling Conv-layer, FC-layer of various sizes must be designed and reused. Also, if the input feature map can not be parallel processed at one time, the output must be calculated by reading the inputs several times. Since the number of available BRAM modules is limited, the number of data bits in the BWN accelerator must be minimized. The image classification processing time of the BWN accelerator is superior when compared with a embedded CPU and is faster than a desktop PC and 50% slower than a GPU system. Since the BWN accelerator uses a slow clock of 50MHz, it can be seen that the BWN accelerator is advantageous in performance versus power.

Real-time Eye Contact System Using a Kinect Depth Camera for Realistic Telepresence (Kinect 깊이 카메라를 이용한 실감 원격 영상회의의 시선 맞춤 시스템)

  • Lee, Sang-Beom;Ho, Yo-Sung
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.37 no.4C
    • /
    • pp.277-282
    • /
    • 2012
  • In this paper, we present a real-time eye contact system for realistic telepresence using a Kinect depth camera. In order to generate the eye contact image, we capture a pair of color and depth video. Then, the foreground single user is separated from the background. Since the raw depth data includes several types of noises, we perform a joint bilateral filtering method. We apply the discontinuity-adaptive depth filter to the filtered depth map to reduce the disocclusion area. From the color image and the preprocessed depth map, we construct a user mesh model at the virtual viewpoint. The entire system is implemented through GPU-based parallel programming for real-time processing. Experimental results have shown that the proposed eye contact system is efficient in realizing eye contact, providing the realistic telepresence.

Speech Recognition Performance Improvement using a convergence of GMM Phoneme Unit Parameter and Vocabulary Clustering (GMM 음소 단위 파라미터와 어휘 클러스터링을 융합한 음성 인식 성능 향상)

  • Oh, SangYeob
    • Journal of Convergence for Information Technology
    • /
    • v.10 no.8
    • /
    • pp.35-39
    • /
    • 2020
  • DNN error is small compared to the conventional speech recognition system, DNN is difficult to parallel training, often the amount of calculations, and requires a large amount of data obtained. In this paper, we generate a phoneme unit to estimate the GMM parameters with each phoneme model parameters from the GMM to solve the problem efficiently. And it suggests ways to improve performance through clustering for a specific vocabulary to effectively apply them. To this end, using three types of word speech database was to have a DB build vocabulary model, the noise processing to extract feature with Warner filters were used in the speech recognition experiments. Results using the proposed method showed a 97.9% recognition rate in speech recognition. In this paper, additional studies are needed to improve the problems of improved over fitting.

Implementation of a Cluster VOD Server and an Embedded Client based on Linux (리눅스 기반의 클러스터 VOD서버와 내장형에 클라이언트의 구현)

  • Seo Dongmahn;Bang Cheolseok;Lee Joahyoung;Kim Byounggil;Jung Inbum
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.10 no.6
    • /
    • pp.435-447
    • /
    • 2004
  • For VOD systems, it is important to provide QoS to more users under the limited resources. To analyze QoS issues in real environment, we implement clustered VOD server and embedded client system based on the Linux open source platform. The parallel processing of MPEG data, load balancing for nodes and VCR like functions are implemented in the server side. To provide more user friendly interface, the general TV is used for a VOD client's terminal and the embedded board is used supporting for VCR functions. In this paper, we measure the performance of the implemented VOD system under the various user requirement features and evaluate the sources of performance limitations. From these analyses, we propose the dynamic admission control method based on the availability memory and network bandwidth. The proposed method enhances the utilization of the system resource for the more QoS media streams.

Mutual Authentication Protocol for Safe Data Transmission of Multi-distributed Web Cluster Model (다중 분산 웹 클러스터모델의 안전한 데이터 전송을 위한 상호 인증 프로토콜)

  • Lee, Kee-Jun;Kim, Chang-Won;Jeong, Chae-Yeong
    • The KIPS Transactions:PartC
    • /
    • v.8C no.6
    • /
    • pp.731-740
    • /
    • 2001
  • Multi-distributed web cluster model expanding conventional cluster system is the cluster system which processes large-scaled work demanded from users with parallel computing method by building a number of system nodes on open network into a single imaginary network. Multi-distributed web cluster model on the structured characteristics exposes internal system nodes by an illegal third party and has a potential that normal job performance is impossible by the intentional prevention and attack in cooperative work among system nodes. This paper presents the mutual authentication protocol of system nodes through key division method for the authentication of system nodes concerned in the registration, requirement and cooperation of service code block of system nodes and collecting the results and then designs SNKDC which controls and divides symmetrical keys of the whole system nodes safely and effectively. SNKDC divides symmetrical keys required for performing the work of system nodes and the system nodes transmit encoded packet based on the key provided. Encryption packet given and taken between system nodes is decoded by a third party or can prevent the outflow of information through false message.

  • PDF

Design of Pipeline-based Failure Recovery Method for VOD Server (파이프라인 개념을 이용한 VOD 서버의 장애 복구 방법 연구)

  • Lee, Joa-Hyoung;Park, Chong-Myoung;Jung, In-Bum
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.12 no.5
    • /
    • pp.942-947
    • /
    • 2008
  • A cluster server usually consists of a front end node and multiple backend nodes. Though increasing the number of bookend nodes can result in the more QoS(Quality of Service) streams for clients, the possibility of failures in backend nodes is proportionally increased. The failure causes not only the stop of all streaming service but also the loss of the current playing positions. In this paper, when a backend node becomes a failed state, the recovery mechanisms are studied to support the unceasing streaming service. The basic techniques are hewn as providing very high speed data transfer rates suitable for the video streaming. However, without considering the architecture of cluster-based VOD server, the application of these basic techniques causes the performance bottleneck of the internal network for recovery and also results in the inefficiency CPU usage of backend nodes. To resolve these problems, we propose a new failure recovery mechanism based on the pipeline computing concept.

A Design of Authentication/Security Processor IP for Wireless USB (무선 USB 인증/보안용 프로세서 IP 설계)

  • Yang, Hyun-Chang;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.12 no.11
    • /
    • pp.2031-2038
    • /
    • 2008
  • A small-area and high-speed authentication/security processor (WUSB_Sec) IP is designed, which performs the 4-way handshake protocol for authentication between host and device, and data encryption/decryption of wireless USB system. The PRF-256 and PRF-64 are implemented by CCM (Counter mode with CBC-MAC) operation, and the CCM is designed with two AES (Advanced Encryption Standard) encryption coles working concurrently for parallel processing of CBC mode and CTR mode operations. The AES core that is an essential block of the WUSB_Sec processor is designed by applying composite field arithmetic on AF$(((2^2)^2)^2)$. Also, S-Box sharing between SubByte block and key scheduler block reduces the gate count by 10%. The designed WUSB_Sec processor has 25,000 gates and the estimated throughput rate is about 480Mbps at 120MHz clock frequency.

Two-Level Hierarchical Production Planning for a Semiconductor Probing Facility (반도체 프로브 공정에서의 2단계 계층적 생산 계획 방법 연구)

  • Bang, June-Young
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.38 no.4
    • /
    • pp.159-167
    • /
    • 2015
  • We consider a wafer lot transfer/release planning problem from semiconductor wafer fabrication facilities to probing facilities with the objective of minimizing the deviation of workload and total tardiness of customers' orders. Due to the complexity of the considered problem, we propose a two-level hierarchical production planning method for the lot transfer problem between two parallel facilities to obtain an executable production plan and schedule. In the higher level, the solution for the reduced mathematical model with Lagrangian relaxation method can be regarded as a coarse good lot transfer/release plan with daily time bucket, and discrete-event simulation is performed to obtain detailed lot processing schedules at the machines with a priority-rule-based scheduling method and the lot transfer/release plan is evaluated in the lower level. To evaluate the performance of the suggested planning method, we provide computational tests on the problems obtained from a set of real data and additional test scenarios in which the several levels of variations are added in the customers' demands. Results of computational tests showed that the proposed lot transfer/planning architecture generates executable plans within acceptable computational time in the real factories and the total tardiness of orders can be reduced more effectively by using more sophisticated lot transfer methods, such as considering the due date and ready times of lots associated the same order with the mathematical formulation. The proposed method may be implemented for the problem of job assignment in back-end process such as the assignment of chips to be tested from assembly facilities to final test facilities. Also, the proposed method can be improved by considering the sequence dependent setup in the probing facilities.

Efficient GPU Framework for Adaptive and Continuous Signed Distance Field Construction, and Its Applications

  • Kim, Jong-Hyun
    • Journal of the Korea Society of Computer and Information
    • /
    • v.27 no.3
    • /
    • pp.63-69
    • /
    • 2022
  • In this paper, we propose a new GPU-based framework for quickly calculating adaptive and continuous SDF(Signed distance fields), and examine cases related to rendering/collision processing using them. The quadtree constructed from the triangle mesh is transferred to the GPU memory, and the Euclidean distance to the triangle is processed in parallel for each thread by using it to find the shortest continuous distance without discontinuity in the adaptive grid space. In this process, it is shown through experiments that the cut-off view of the adaptive distance field, the distance value inquiry at a specific location, real-time raytracing, and collision handling can be performed quickly and efficiently. Using the proposed method, the adaptive sign distance field can be calculated quickly in about 1 second even on a high polygon mesh, so it is a method that can be fully utilized not only for rigid bodies but also for deformable bodies. It shows the stability of the algorithm through various experimental results whether it can accurately sample and represent distance values in various models.