• Title/Summary/Keyword: openMP


Parallelizing 3D Frequency-domain Acoustic Wave Propagation Modeling using a Xeon Phi Coprocessor (제온 파이 보조 프로세서를 이용한 3차원 주파수 영역 음향파 파동 전파 모델링 병렬화)

  • Ryu, Donghyun;Jo, Sang Hoon;Ha, Wansoo
    • Geophysics and Geophysical Exploration / v.20 no.3 / pp.129-136 / 2017
  • 3D seismic data processing methods such as full waveform inversion and reverse-time migration require 3D wave propagation modeling, which is computationally heavy. Using 3D frequency-domain wave propagation modeling, we compared the efficiency and accuracy of a Xeon Phi coprocessor with those of a high-end server CPU. We applied OpenMP parallel programming to a time-domain finite-difference algorithm, taking the characteristics of the Xeon Phi coprocessor into account, and obtained the frequency-domain wavefield by applying a Fourier transform computed as a running integration over the time steps. A numerical test of frequency-domain wavefield modeling was performed using the 3D SEG/EAGE salt velocity model. As a result, we obtained an accurate frequency-domain wavefield and a 1.44x speedup with the Xeon Phi coprocessor compared with the CPU.
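The running-integration idea can be illustrated with a minimal C++ sketch, assuming a flattened 3D grid and the sign convention e^(-iωt): after every time step of the time-domain solver, each grid point's current value is multiplied by e^(-iωt)·Δt and added to a running sum, so the frequency-domain wavefield builds up without storing the time history. The class `RunningDFT` and its members are hypothetical names, not the authors' code, and the OpenMP loop over grid points is one straightforward parallelization, not necessarily the Xeon Phi-specific scheme of the paper.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Running-integration DFT (hypothetical sketch): after each solver time step,
// the snapshot u(x, t_n) is multiplied by e^(-i*omega*t_n)*dt and added to a
// running sum. ASSUMPTION: sign convention e^(-i*omega*t).
class RunningDFT {
public:
    RunningDFT(std::size_t n_points, const std::vector<double>& freqs_hz, double dt)
        : n_points_(n_points), dt_(dt),
          spectra_(freqs_hz.size(), std::vector<std::complex<double>>(n_points)) {
        assert(dt > 0.0);
        const double two_pi = 2.0 * std::acos(-1.0);
        for (double f : freqs_hz) omegas_.push_back(two_pi * f);
    }

    // Adds the contribution of snapshot u (one value per grid point) at step `step`.
    void accumulate(const std::vector<double>& u, std::size_t step) {
        assert(u.size() == n_points_);
        const double t = static_cast<double>(step) * dt_;
        for (std::size_t k = 0; k < omegas_.size(); ++k) {
            // e^(-i*omega*t)*dt is the same for every grid point at this step.
            const std::complex<double> kernel = std::polar(dt_, -omegas_[k] * t);
            std::vector<std::complex<double>>& spec = spectra_[k];
            // Grid points are independent, so OpenMP can split this loop freely.
            #pragma omp parallel for
            for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t>(n_points_); ++i) {
                spec[i] += u[i] * kernel;
            }
        }
    }

    // Frequency-domain wavefield for the k-th requested frequency.
    const std::vector<std::complex<double>>& spectrum(std::size_t k) const {
        return spectra_.at(k);
    }

private:
    std::size_t n_points_;
    double dt_;
    std::vector<double> omegas_;
    std::vector<std::vector<std::complex<double>>> spectra_;
};
```

A solver would call `accumulate` once per time step. Compiled with `-fopenmp` (GCC/Clang) the grid loop is shared among threads; without it the pragma is ignored and the code runs serially with the same result.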

Detecting the First Race in OpenMP Program with Nested Parallelism (내포 병렬성을 가지는 OpenMP 프로그램의 최초 경합 탐지)

  • Chon, Byoung-Gyu;Woo, Jong-Jung;Jun, Yong-Kee
    • The KIPS Transactions:PartA / v.8A no.3 / pp.253-260 / 2001
  • Detecting races is important for debugging shared-memory parallel programs, because races cause unintended nondeterministic program execution. Previous on-the-fly race detection techniques cannot guarantee detection of the first race in nested parallel programs. Detecting the first race is important for debugging parallel programs, since removing the first race may make subsequent races disappear. In this paper, we present an on-the-fly technique that detects all of the first races by re-executing the program being debugged. We assume that the program may contain one-way nested parallelism. In the worst case, the number of re-executions is the nesting depth of the program. The space complexity is O(VT), and the time complexity of race detection at each access to the access history is O(T), where V is the number of shared variables and T is the maximum parallelism of the program. These per-execution costs are the same as those of previous on-the-fly detection techniques. Therefore, this technique makes debugging parallel programs more effective and practical.

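To make the complexity figures concrete, here is a minimal C++ sketch of an access history under a strong simplifying assumption: all T threads belong to a single non-nested parallel region with no synchronization, so accesses from any two distinct threads are treated as logically concurrent. Each shared variable keeps one read flag and one write flag per thread (O(VT) space), and each new access is checked against the other threads' flags (O(T) time). The class name `AccessHistory` is hypothetical; the sketch neither handles nested parallelism nor identifies which race is the first one, which is the actual contribution of the paper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified access history: one read flag and one write flag per (variable,
// thread), i.e. O(VT) space. ASSUMPTION: all threads run in a single
// non-nested, unsynchronized parallel region, so any two distinct threads are
// logically concurrent. Nested parallelism and "first race" logic are omitted.
class AccessHistory {
public:
    AccessHistory(std::size_t num_vars, std::size_t num_threads)
        : vars_(num_vars), threads_(num_threads),
          read_(num_vars * num_threads, false),
          written_(num_vars * num_threads, false) {}

    // Records an access and returns true if it races with an earlier access by
    // another thread. Only the other threads' flags are scanned: O(T) time.
    bool access(std::size_t var, std::size_t thread, bool is_write) {
        assert(var < vars_ && thread < threads_);
        bool race = false;
        for (std::size_t t = 0; t < threads_; ++t) {
            if (t == thread) continue;
            const std::size_t slot = var * threads_ + t;
            // Conflicting pairs: write/write, write/read, read/write (not read/read).
            if (written_[slot] || (is_write && read_[slot])) race = true;
        }
        const std::size_t mine = var * threads_ + thread;
        if (is_write) {
            written_[mine] = true;
        } else {
            read_[mine] = true;
        }
        return race;
    }

private:
    std::size_t vars_;
    std::size_t threads_;
    std::vector<bool> read_;
    std::vector<bool> written_;
};
```

In a real on-the-fly detector, calls like `access` would be issued by instrumentation inserted at every shared-memory access.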

Study on Parallel Processing for Efficient Flexible Multibody Analysis based on Subsystem Synthesis Method (병렬 처리를 이용한 부분 시스템 기반 유연다물체 동역학의 효율적인 해석 연구)

  • Han, Jong-Boo;Song, Hajun;Kim, Sung-Soo
    • Transactions of the Korean Society of Mechanical Engineers A / v.41 no.6 / pp.507-515 / 2017
  • Flexible multibody simulations are widely used in industry to design mechanical systems. In flexible multibody dynamics, deformation coordinates are described either relative to a body reference frame floating in space or in the inertial reference frame. Moreover, these deformation coordinates are generated by discretizing the body with the finite element method. The formulation of a flexible multibody system therefore always involves a very large number of degrees of freedom, and numerical solution methods require a substantial amount of computation time. Parallel computation is one way to reduce this cost. However, most parallel computational methods focus on efficiently solving large linear systems of equations, whereas multibody analysis needs an efficient formulation that is itself suited to parallel computation. In this paper, we developed a subsystem synthesis method for flexible multibody systems and proposed parallel computational schemes based on the OpenMP API. Simulations of a rotating blade system consisting of three identical blades were carried out with two different parallel computational schemes, and the actual CPU times were measured to evaluate the efficiency of the proposed schemes.
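The structure that makes a subsystem formulation attractive for OpenMP can be sketched as follows: each subsystem (for example, one blade) is processed independently in a parallel loop, and the results are then combined on the base body serially. This is a hypothetical C++ sketch; `Subsystem`, `Effective`, `reduce_subsystem`, and `base_acceleration` are illustrative names, and the physics is replaced by a trivial placeholder (summing lumped masses and forces and applying Newton's second law), not the flexible-body equations of the paper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-subsystem data: lumped nodal masses and nodal forces.
struct Subsystem {
    std::vector<double> masses;
    std::vector<double> forces;
};

// Effective quantities a subsystem hands to the base body (placeholder model).
struct Effective {
    double mass = 0.0;
    double force = 0.0;
};

// PLACEHOLDER for the expensive local step; a real subsystem synthesis method
// would condense the subsystem's flexible equations here. This just sums.
Effective reduce_subsystem(const Subsystem& s) {
    assert(s.masses.size() == s.forces.size());
    Effective e;
    for (std::size_t i = 0; i < s.masses.size(); ++i) {
        e.mass += s.masses[i];
        e.force += s.forces[i];
    }
    return e;
}

// Subsystems are independent, so their reductions run in parallel;
// the synthesis on the base body is done serially afterwards.
double base_acceleration(const std::vector<Subsystem>& subs, double base_mass,
                         double base_force) {
    std::vector<Effective> eff(subs.size());
    #pragma omp parallel for
    for (std::ptrdiff_t k = 0; k < static_cast<std::ptrdiff_t>(subs.size()); ++k) {
        eff[k] = reduce_subsystem(subs[k]);
    }
    double m = base_mass;
    double f = base_force;
    for (const Effective& e : eff) {
        m += e.mass;
        f += e.force;
    }
    return f / m;
}
```

Note that a loop over three subsystems can keep at most three threads busy; using more cores would require finer-grained parallelism inside each subsystem.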

Multi-core-based Parallel Query of 3D Point Cloud Indexed in Octree (옥트리로 색인한 3차원 포인트 클라우드의 다중코어 기반 병렬 탐색)

  • Han, Soohee
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.31 no.4 / pp.301-310 / 2013
  • The aim of this study is to speed up queries on a large 3D point cloud indexed in an octree by querying in parallel on multiple cores. In particular, it focuses on methods that access multiple leaf nodes of the octree concurrently to find the points lying within a given radius of a query coordinate. To this end, two parallel query methods are proposed that use different strategies to distribute the query workload across cores: one relies on OpenMP's automatic division of 'for' loops in the code, and the other on spatial division. Approximately 18 million 3D points collected by a terrestrial laser scanner were indexed in an octree and tested on a system with an 8-core CPU to compare the performance of a non-parallel method and the two parallel methods. Both parallel methods were several times faster than the non-parallel method, and the relative performance of the two parallel methods varied with the query radius. Parallel queries are expected to become faster with improved strategies for distributing the query workload across cores.
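The first strategy (letting OpenMP divide a `for` loop) can be sketched in C++ as a radius query over a flattened list of octree leaves: each thread takes leaves from the loop, skips leaves whose bounding box lies farther than r from the query point, and collects hits in a private buffer that is merged once at the end. `Leaf`, `Point`, and `radius_query` are hypothetical names, and flattening the leaves into a vector is an assumption made to keep the sketch short; the spatial-division strategy is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Point { double x, y, z; };

// Hypothetical flattened octree leaf: axis-aligned bounds plus its points.
struct Leaf {
    Point lo, hi;
    std::vector<Point> pts;
};

// Squared distance from q to the leaf's bounding box (0 if q is inside).
double box_dist2(const Leaf& leaf, const Point& q) {
    auto axis = [](double v, double lo, double hi) {
        const double d = v < lo ? lo - v : (v > hi ? v - hi : 0.0);
        return d * d;
    };
    return axis(q.x, leaf.lo.x, leaf.hi.x) + axis(q.y, leaf.lo.y, leaf.hi.y) +
           axis(q.z, leaf.lo.z, leaf.hi.z);
}

// Radius query in which OpenMP divides the loop over leaves among threads.
std::vector<Point> radius_query(const std::vector<Leaf>& leaves, Point q, double r) {
    assert(r >= 0.0);
    const double r2 = r * r;
    std::vector<Point> result;
    #pragma omp parallel
    {
        std::vector<Point> local;  // per-thread buffer: no locking per hit
        #pragma omp for schedule(dynamic)
        for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t>(leaves.size()); ++i) {
            const Leaf& leaf = leaves[i];
            if (box_dist2(leaf, q) > r2) continue;  // leaf cannot contain hits
            for (const Point& p : leaf.pts) {
                const double dx = p.x - q.x, dy = p.y - q.y, dz = p.z - q.z;
                if (dx * dx + dy * dy + dz * dz <= r2) local.push_back(p);
            }
        }
        // Merge each thread's hits once, one thread at a time.
        #pragma omp critical
        result.insert(result.end(), local.begin(), local.end());
    }
    return result;
}
```

`schedule(dynamic)` is used because leaves can hold very different numbers of points; the order of the returned points depends on thread timing.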

Parallel Computing on Intensity Offset Tracking Using Synthetic Aperture Radar for Retrieval of Glacier Velocity

  • Hong, Sang-Hoon
    • Korean Journal of Remote Sensing / v.35 no.1 / pp.29-37 / 2019
  • Synthetic Aperture Radar (SAR) observations are powerful tools for accurately monitoring surface displacement induced by earthquakes, volcanoes, ground subsidence, glacier movement, and so on. In particular, radar interferometry (InSAR), which uses phase information related to the distance from the sensor to the target, can generate line-of-sight displacement maps with an accuracy of a few centimeters or millimeters. However, loss of coherence caused by decorrelation often prevents the construction of a differential interferogram in InSAR applications. Offset tracking is an alternative approach that produces a two-dimensional displacement map from intensity rather than phase information. Its limitation is that it requires considerable computing power and time. In this paper, the efficiency of parallel computing on a high-performance computer was investigated for estimating glacier velocity. Two TanDEM-X SAR observations acquired on September 15, 2013 and September 26, 2013 over Narsap Sermia in southwestern Greenland were used. A total of 56 logical processors (28 physical processors with hyperthreading) on 2.4 GHz Intel Xeon CPUs running Linux were used. Offset tracking was performed with the Gamma software, adjusting the number of processors used for OpenMP parallel computing. With a 256 × 256-pixel window patch, offset tracking took 26,344 s on a single core and 2,055 s on 56 cores, so using 56 cores reduced the processing time by a factor of about 13 (12.81). However, parallel computing with all processors prevents other background operations and functions from running, so the optimal number of processors for computing efficiency needs to be evaluated with processing other than offset tracking taken into account.
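The reported times can be turned into the usual scaling metrics: speedup S = T1/TN, parallel efficiency E = S/N, and, if one assumes Amdahl's law S = 1/((1 - p) + p/N), the implied parallel fraction p = (1 - 1/S)/(1 - 1/N). The C++ helpers below are a small sketch of this arithmetic; the function names are hypothetical, and treating all 56 hyperthreaded logical processors as equal workers is an assumption, so p is only a rough indicator.

```cpp
#include <cassert>

// Speedup S = T1 / TN.
double speedup(double t_serial, double t_parallel) {
    assert(t_serial > 0.0 && t_parallel > 0.0);
    return t_serial / t_parallel;
}

// Parallel efficiency E = S / N.
double efficiency(double s, int n) {
    assert(n > 0);
    return s / n;
}

// Parallel fraction p implied by Amdahl's law S = 1 / ((1 - p) + p / N),
// solved for p: p = (1 - 1/S) / (1 - 1/N).
// ASSUMPTION: all N workers are equally fast (ignores hyperthreading effects).
double amdahl_fraction(double s, int n) {
    assert(n > 1 && s > 0.0);
    return (1.0 - 1.0 / s) / (1.0 - 1.0 / n);
}
```

With T1 = 26,344 s, TN = 2,055 s and N = 56, this gives S ≈ 12.8, E ≈ 0.23 and p ≈ 0.94. Under Amdahl's law, a serial share of about 6% alone caps the speedup near 1/(1 - p) ≈ 16 however many processors are added, which illustrates the diminishing returns behind choosing an optimal processor count.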

Smart Alarm Clock using Weather Information and Arduino (날씨 정보와 아두이노를 이용한 스마트 알람 시계)

  • Heo, Gyeongyong;Kim, Koang Hoon
    • Journal of the Korea Institute of Information and Communication Engineering / v.23 no.8 / pp.889-895 / 2019
  • It is not easy to keep appointments on time in busy daily life. In particular, the growing number of vehicles causes traffic congestion during commuting hours, which delays arrival by amounts that vary greatly with the weather. This paper proposes a smart alarm clock that automatically adjusts the alarm time according to weather conditions and suggests ways to deal with traffic congestion. The proposed clock provides the functions of an ordinary alarm clock through a touch interface. In addition, it retrieves weather information through an open API and automatically changes the alarm time to allow for the expected delay. The design was implemented with an Arduino Mega2560 and a touch TFT-LCD, together with a WiFi module for the Internet connection, an RTC module for the clock function, and an MP3 player module for playing the alarm sound. A patent application for the design has been filed and is currently under review.
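The alarm-adjustment rule can be sketched in C++ (the language of Arduino sketches) as a function that moves the alarm earlier by an expected delay looked up from the weather condition. The `Weather` categories and the delay values are purely hypothetical placeholders, not values from the paper, and fetching the weather through the open API is not shown.

```cpp
#include <cassert>

// HYPOTHETICAL weather categories; a real device would map the weather API's
// condition codes onto something like this.
enum class Weather { Clear, Rain, Snow };

// HYPOTHETICAL extra commute delay per condition, in minutes (illustrative only).
int expected_delay_min(Weather w) {
    switch (w) {
        case Weather::Clear: return 0;
        case Weather::Rain:  return 15;
        case Weather::Snow:  return 30;
    }
    return 0;
}

// Moves the alarm earlier by the expected delay.
// Times are minutes after midnight; the result wraps to the previous day if needed.
int adjusted_alarm(int alarm_min, Weather w) {
    const int day = 24 * 60;
    assert(alarm_min >= 0 && alarm_min < day);
    return ((alarm_min - expected_delay_min(w)) % day + day) % day;
}
```

On the device, the adjusted time would be compared against the RTC reading in the main loop in place of the user-set alarm time.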

Improvement of Processing Speed for UAV Attitude Information Estimation Using ROI and Parallel Processing

  • Ha, Seok-Wun;Park, Myeong-Chul
    • Journal of the Korea Society of Computer and Information / v.26 no.1 / pp.155-161 / 2021
  • Recently, research on military uses of UAVs, such as precision tracking and mission completion, has been actively conducted. In particular, if a mission UAV estimates the attitude (posture) of a leading UAV and uses this information to follow it covertly and complete its mission, the attitude of the leading UAV must be estimated in real time. Previous research has estimated the attitude of the leading UAV accurately using image processing and Kalman filters, but processing was slow because the processing steps were executed sequentially. In this study, we therefore propose two ways to improve processing speed: image processing is limited to a region of interest (ROI) containing the object rather than the entire image, and the successive processing steps are distributed across OpenMP-based threads and executed in parallel, with thread synchronization, to estimate the attitude. The experimental results confirmed that processing speed improved by more than 45% compared with the baseline, making real-time processing possible; faster tracking and estimation by the mission UAV can thus increase the likelihood of completing the mission.
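The ROI idea combined with OpenMP can be sketched in C++ as a per-frame step that scans only the pixels inside the ROI, splitting rows across threads and combining partial sums with a `reduction` clause. The step itself is a hypothetical stand-in (the centroid of pixels above a brightness threshold); `Image`, `Roi`, and `bright_centroid` are illustrative names, and this data-parallel loop is a simpler pattern than the paper's distribution of successive processing stages across synchronized threads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Row-major 8-bit grayscale image (hypothetical layout).
struct Image {
    int width, height;
    std::vector<std::uint8_t> px;
    std::uint8_t at(int x, int y) const {
        return px[static_cast<std::size_t>(y) * width + x];
    }
};

struct Roi { int x0, y0, x1, y1; };  // half-open rectangle [x0, x1) x [y0, y1)

struct Centroid { double x, y; long count; };

// STAND-IN processing step: centroid of pixels brighter than `thresh`,
// computed only inside the ROI. Rows are split across OpenMP threads and the
// per-thread partial sums are combined by the reduction clause.
Centroid bright_centroid(const Image& img, Roi roi, std::uint8_t thresh) {
    assert(0 <= roi.x0 && roi.x0 <= roi.x1 && roi.x1 <= img.width);
    assert(0 <= roi.y0 && roi.y0 <= roi.y1 && roi.y1 <= img.height);
    double sx = 0.0, sy = 0.0;
    long n = 0;
    #pragma omp parallel for reduction(+ : sx, sy, n)
    for (int y = roi.y0; y < roi.y1; ++y) {
        for (int x = roi.x0; x < roi.x1; ++x) {
            if (img.at(x, y) > thresh) {
                sx += x;
                sy += y;
                ++n;
            }
        }
    }
    if (n == 0) return {0.0, 0.0, 0};
    return {sx / n, sy / n, n};
}
```

Restricting the scan to the ROI shrinks the work from width × height pixels to the ROI area, independently of how many threads are used.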

Effect of Iron(II)-ascorbate Complex on Protein and DNA of Phages (파아지 단백질 및 DNA에 대한 2가철-아스코르빈산착체의 영향)

  • Lho, Il-Hwan;Murata, Akira
    • Korean Journal of Food Science and Technology / v.25 no.1 / pp.46-51 / 1993
  • The inactivating effect of the iron(II)-ascorbate complex (Fe-Asc) on various phages was reported previously. This paper describes the molecular target in the phage virion that is attacked by Fe-Asc. The effect of Fe-Asc on protein was investigated with bovine serum albumin and the structural protein of phage J1. The SDS-polyacrylamide gel electrophoresis patterns of these two proteins did not differ whether or not they were treated with Fe-Asc, and there were no changes in the amino acid composition or ultraviolet spectrum of the proteins. The effect of Fe-Asc on DNA was investigated with pUC18 DNA, M13mp8 DNA and λ DNA as well as DNA from phage J1. Fe-Asc initially nicked the supercoiled form of pUC18 DNA to yield the open circular form and then the linear form. Strand breaks were also confirmed in M13mp8 DNA and λ DNA as well as J1 DNA. The results indicate that strand breaks in phage DNA could be responsible for the inactivation of phages by Fe-Asc.


The Study on Development of a Digital Internet Radio Receiver (디지털 인터넷 라디오 수신기 구현에 대한 연구)

  • Park, In-Gyu
    • Journal of KIISE:Computing Practices and Letters / v.12 no.2 / pp.102-110 / 2006
  • This paper explains the design and development of a stand-alone, high-sound-quality Internet Radio system aimed at small embedded audio devices rather than general-purpose PCs. The device is designed to work with an Internet connection. Such systems have not yet been standardized, and the related algorithms are not publicly available. It was therefore necessary to analyze the receiving algorithms of several current radio receivers and to develop our own hardware in order to overcome these obstacles and achieve high-quality radio sound. The main electronic components of this Internet Radio are TCP/IP interfaces, an MP3 audio decoder, an I/O interface, and a Flash memory card, with advanced audio multicasting for next-generation Internet Radio. The basic structure and implementation issues of a versatile next-generation digital music player and Internet Radio receiver are discussed.

pVC, a Small Cryptic Plasmid from the Environmental Isolate of Vibrio cholerae MP-1

  • Zhang, Ruifu;Wang, Yanling;Leung, Pak Chow;Gu, Ji-Dong
    • Journal of Microbiology / v.45 no.3 / pp.193-198 / 2007
  • A marine bacterium was isolated from the Mai Po Nature Reserve in Hong Kong and identified as Vibrio cholerae MP-1. It contains a small 3.8-kb plasmid designated pVC. Four open reading frames (ORFs) were identified on the plasmid, but none of them shows homology to any known protein. A database search indicated that a 440-bp fragment is 96% identical to a fragment found in a small plasmid of another V. cholerae. Further experiments demonstrated that a 2.3-kb EcoRI fragment containing the complete ORF1, part of ORF4, and their intergenic region could self-replicate. Additional analyses revealed that the sequence upstream of ORF1 has features characteristic of theta-type replicons. The protein encoded by ORF1 has two characteristic motifs found in most replication initiator proteins (Rep): a leucine zipper (LZ) motif in the N-terminal region and an alpha helix-turn-alpha helix (HTH) motif at the C-terminal end. The results suggest that pVC replicates via the theta-type mechanism and is likely a novel type of theta replicon.