• Title/Summary/Keyword: openMP


Horizon Run 5: the largest cosmological hydrodynamic simulation

  • Kim, Juhan;Shin, Jihye;Snaith, Owain;Lee, Jaehyun;Kim, Yonghwi;Kwon, Oh-Kyung;Park, Chan;Park, Changbom
    • The Bulletin of The Korean Astronomical Society
    • /
    • v.44 no.1
    • /
    • pp.33.2-33.2
    • /
    • 2019
  • Horizon Run 5 is the most massive cosmological hydrodynamic simulation performed to date. Owing to its large spatial volume ($717 \times 80 \times 80\ [\mathrm{cMpc}/h]^3$) and high resolution down to 1 kpc, it lets us study cosmological effects on star and galaxy formation over a wide range of mass scales, from dwarfs to clusters. We modified the publicly available Ramses code to exploit OpenMP parallelism, which is necessary for running simulations on a machine as large as the KISTI supercomputer Nurion. Using 2500 computing nodes of Nurion, we evolved the simulation from z=200 to z=2.3 within an allotted period of 50 days. During the run, we saved snapshot data at 97 redshifts and two sets of light-cone data, which will be used for research in various areas of galaxy formation and cosmology. We will close this talk by listing possible research topics in which this simulation can help us take the lead.


Improvement and verification of the DeCART code for HTGR core physics analysis

  • Cho, Jin Young;Han, Tae Young;Park, Ho Jin;Hong, Ser Gi;Lee, Hyun Chul
    • Nuclear Engineering and Technology
    • /
    • v.51 no.1
    • /
    • pp.13-30
    • /
    • 2019
  • This paper presents recent improvements to the DeCART code for HTGR analysis. A new 190-group DeCART cross-section library based on ENDF/B-VII.0 was generated using the KAERI library processing system for HTGR. Two methods for the eigen-mode adjoint flux calculation were implemented. An azimuthal angle discretization method based on Gaussian quadrature was implemented to reduce the azimuthal discretization error. A two-level parallelization using MPI and OpenMP was adopted for massively parallel computations. A quadratic depletion solver was implemented to reduce the error involved in Gd depletion. A module to generate equivalent group constants was implemented for the nodal codes. The geometry handling capabilities of the DeCART code were improved to include an approximate treatment of a cylindrical outer boundary, an explicit border model, the R-G-B checker-board model, and a super-cell model for a hexagonal geometry. The new and improved functionalities were verified against various numerical benchmarks, such as the OECD/MHTGR-350 benchmark phase III problems, two-dimensional high-temperature gas-cooled reactor benchmark problems derived from the MHTGR-350 reference design, and numerical benchmark problems based on the compact nuclear power source experiment, by comparing the DeCART solutions with Monte Carlo reference solutions obtained using the McCARD code.

New application programming approach for MPSoC programming platform (A new application description method for linking an MPSoC programming platform with a retargetable compiler)

  • Yongjoo Kim;Jongwon Lee;Sanghyun Park;Jonghee Yoon;Doosan Cho;Yongin Kwon;Yunheung Paek
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2008.11a
    • /
    • pp.846-848
    • /
    • 2008
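  • A great deal of research has recently been conducted on MPSoC programming methods. It ranges from long-studied model-based programming approaches and model-based languages such as UML, to the more recently studied programming methods based on MPI[1] and OpenMP[2], along with methodologies that take various other approaches. In most of the research to date, however, the final output is produced as C code. A compiler for the MPSoC environment must therefore be built separately, and because many different heterogeneous MPSoC environments exist, this places a heavy burden on compiler development. This paper describes the MPSoC programming platform that the authors studied previously and the form of the input information used by the platform. It then modifies the form of that input information so that the platform can be linked with a retargetable compiler, allowing the final output to be generated in binary form.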

A Research about Open Source Distributed Computing System for Realtime CFD Modeling (SU2 with OpenCL and MPI)

  • Lee, Jun-Yeob;Oh, Jong-woo;Lee, DongHoon
    • Proceedings of the Korean Society for Agricultural Machinery Conference
    • /
    • 2017.04a
    • /
    • pp.171-171
    • /
    • 2017
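  • Research is under way on precise control of the interior environment of smart farms using computational fluid dynamics (CFD). To overcome the difficulty of dynamic analysis of time-series data, we considered using artificial neural networks, a type of nonlinear modeling technique. Earlier work confirmed that using TensorFlow for nonlinear modeling of environmental data delivers superior performance thanks to hardware acceleration. Nevertheless, we judged that an alternative was needed to neural-network modeling, which is limited to offline batch processing, and to high-performance hardware computing devices that cannot be deployed in the field. SU2 (http://su2.stanford.edu) was used as the solver for CFD analysis. The operating systems and compilers selected were 1) Mac OS X Sierra 10.12.2: Apple LLVM version 8.0.0 (clang-800.0.38), 2) Windows 10 x64: Intel C++ Compiler version 16.0, update 2, 3) Linux (Ubuntu 16.04 x64): g++ 5.4.0, and 4) Clustered Linux (Ubuntu 16.04 x32): MPICC 3.3.a2. In the fourth development environment, the parallel system, hardware acceleration used the OpenCL (https://www.khronos.org/opencl/) engine, and 32 ODROID-XU4 (Hardkernel, AnYang, Korea) SBCs (Single Board Computers), each equipped with the octa-core Samsung Exynos5422 chip, a type of low-power ARM processor, were configured in parallel. The distributed computing environment consisted of NFS (Network File System) on a Gbit local network and MPICH (http://www.mpich.org/). To define the unknown boundary information that arises when the spatial resolution is divided more finely than the measurement interval, a three-dimensional Kriging spatial interpolation method was applied experimentally. For environments 1, 2, and 3, in which a parallel system could not be configured, the OpenMP (http://www.openmp.org/) library was used to exploit the multiple cores already present in each machine. Environments 1, 2, and 3, running on 8 parallel 64-bit cores, computed roughly 2 times faster than the environment running on 128 parallel 32-bit cores. From this we judged that the performance of distributed computing for real-time CFD is determined by processor speed and by the operating system's ability to distribute information. To verify this, a fifth environment was built by upgrading the operating system of environment 4 to 64-bit. In contrast to the earlier result, the distributed computing environment running on 72 64-bit cores computed about 2.5 times faster than the single-processor multi-core environments (1, 2, and 3). Because 64-bit operating systems for ARM processors are still immature, continued examination is needed to achieve successful real-time CFD modeling in the future.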


Postpartum Reproductive Management Based on the Routine Farm Records of a Dairy Herd: Relationship between the Metabolic Parameters and Postpartum Ovarian Activity

  • Takagi, Mitsuhiro;Hirai, Toshiya;Moriyama, Naoki;Ohtani, Masayuki;Miyamoto, Akio;Wijayagunawardane, Missaka P.B.
    • Asian-Australasian Journal of Animal Sciences
    • /
    • v.18 no.6
    • /
    • pp.787-794
    • /
    • 2005
  • The aim of this study was 1) to confirm the practical efficiency of a routine milk P4 monitoring system for postpartum reproductive management of a dairy herd, and 2) to evaluate the relationship between the time of postpartum resumption of ovarian activity in dairy cows and the blood metabolic profiles, milk quality, and body weight of individual cows in the farm records, which may reflect postpartum nutritional condition. A total of 116 Holstein cows were used in the present study. First, during Experiment 1, postpartum reproductive management was conducted based on the weekly measured milk P4 concentration of individual cows. Compared with the reproductive records of the previous two years without P4 monitoring, the interval from calving to first AI did not change, but both the number of AI until pregnancy (with P4: 1.9 times vs. without P4: 2.9 times) and the days open (with P4: 95.1 days vs. without P4: 135.8 days and 133.8 days) decreased significantly. In Experiment 2, blood constituents such as albumin, blood urea nitrogen, packed cell volume, ammonia, glucose, total cholesterol, non-esterified, AST and $\gamma$-GTP were measured in blood samples taken once, approximately 14 days postpartum, to monitor both health and nutritional condition. The milk constituent parameters, such as milk protein (MP), milk fat (MF), SNF and lactose, collected from the monthly progeny test of individual cows, were used to monitor postpartum nutritional status. Furthermore, the data obtained from routine measurements of body weight were used to calculate the rate of peripartum body weight loss. The day of resumption of the postpartum estrous cycle was estimated from the milk P4 profiles of individual cows. There was no clear relationship between any blood parameter and the resumption time. However, low values of MP and SNF significantly affected the resumption of the postpartum estrous cycle. Similarly, a higher rate of body weight loss was associated with a significant delay (more than 1 month) in the resumption of the postpartum estrous cycle, compared with the groups that had a medium or lower rate of body weight loss. The results of the present study demonstrate that routine milk P4 monitoring-based postpartum reproductive management, together with the milk quality parameters and routine BW data available in field conditions, may be used as a practical approach to increasing the postpartum reproductive efficiency of a high-yielding dairy herd.

Retrieval of Legal Information Through Discovery Layers: A Case Study Related to Indian Law Libraries

  • Kushwah, Shivpal Singh;Singh, Ritu
    • Journal of Information Science Theory and Practice
    • /
    • v.4 no.3
    • /
    • pp.71-83
    • /
    • 2016
  • Purpose. The purpose of this paper is to analyze and evaluate discovery layer search tools for retrieval of legal information in Indian law libraries. This paper covers current practices in legal information retrieval with special reference to Indian academic law libraries, and analyses its importance in the domain of law. Design/Methodology/Approach. A web survey and observational study method are used to collect the data. Data related to the discovery tools were collected by email, and further discussions were held with the discovery layer/tool/product developers and their representatives. Findings. Results show that most Indian law libraries subscribe to bundles of legal information resources such as Hein Online, JSTOR, LexisNexis Academic, Manupatra, Westlaw India, SCC web, AIR Online (CDROM), and so on. International legal and academic resources are compatible with discovery tools because they support various standards related to online publishing and dissemination such as OAI-PMH, OpenURL, MARC21, and Z39.50, but Indian legal resources such as Manupatra, AIR, and SCC are not compatible with the discovery layers. The central index is one of the important components of a discovery search interface, and discovery layer services/tools could also be useful for Indian law libraries if they included multiple legal and academic resources in their central index. But present practices and observations reveal that discovery layers do not provide a facility to cover legal information resources. Therefore, in their present form, discovery tools are not very useful; they are an incomplete solution for Indian libraries because not all of the Indian legal resources available in the law libraries are covered. Originality/Value. Very limited research or published literature is available in the area of discovery layers and their compatibility with legal information resources.

CPU Parallel Processing and GPU-accelerated Processing of UHD Video Sequence using HEVC

  • Hong, Sung-Wook;Lee, Yung-Lyul
    • Journal of Broadcast Engineering
    • /
    • v.18 no.6
    • /
    • pp.816-822
    • /
    • 2013
  • The latest video coding standard, HEVC, was developed jointly by the JCT-VC (Joint Collaborative Team on Video Coding) of ITU-T VCEG and ISO/IEC MPEG. The HEVC standard reduces the BD-Bitrate by about 50% compared with the H.264/AVC standard. However, the various methods used to obtain these coding gains have increased its complexity. The proposed method reduces the complexity of HEVC by using both CPU parallel processing and GPU-accelerated processing. In experiments on UHD ($3840{\times}2144$) video sequences, the proposed method achieves an encoding/decoding performance of 15 fps. We expect that future hardware speedups in data transfer between the CPU and GPU will reduce encoding/decoding times further.

Parallelization and application of SACOS for whole core thermal-hydraulic analysis

  • Gui, Minyang;Tian, Wenxi;Wu, Di;Chen, Ronghua;Wang, Mingjun;Su, G.H.
    • Nuclear Engineering and Technology
    • /
    • v.53 no.12
    • /
    • pp.3902-3909
    • /
    • 2021
  • The SACOS series of subchannel analysis codes has been developed by XJTU-NuTheL over many years and is used for the thermal-hydraulic safety analysis of various reactor cores. To achieve fine whole-core pin-level analysis, the input preprocessing and parallel capabilities of the code were developed in this study. The preprocessing supports modeling of rectangular and hexagonal assemblies with less error-prone input; the parallelization is based on the domain decomposition method with a hybrid of MPI and OpenMP. For domain decomposition, a more flexible method is proposed that determines an appropriate task division of the core domain according to the number of processors of the server. The code parallelization was verified by evaluating the calculation time for several PWR assembly problems with different numbers of processors. Subsequent analysis results for rectangular- and hexagonal-assembly cores imply that the code can be used to model and perform pin-level core safety analysis with acceptable computational efficiency.
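
The abstract above does not say how SACOS divides the core domain among processors. As a rough illustration of the general idea only, the following minimal sketch splits a list of cell indices into contiguous, near-equal blocks, one per worker. The function name `split_domain` and the one-dimensional indexing are assumptions made for this example, not the authors' method.

```python
from typing import List


def split_domain(n_cells: int, n_workers: int) -> List[range]:
    """Split cell indices 0..n_cells-1 into contiguous, near-equal blocks.

    Hypothetical helper for illustration; not taken from SACOS.
    """
    if n_cells < 0 or n_workers < 1:
        raise ValueError("n_cells must be >= 0 and n_workers must be >= 1")
    # Every worker gets `base` cells; the first `extra` workers get one more.
    base, extra = divmod(n_cells, n_workers)
    blocks = []
    start = 0
    for rank in range(n_workers):
        size = base + (1 if rank < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


if __name__ == "__main__":
    # Ten cells over three workers gives block sizes 4, 3, 3.
    for rank, block in enumerate(split_domain(10, 3)):
        print(rank, list(block))
```

In a hybrid MPI and OpenMP code, a block of this kind would typically be assigned to one MPI process and then shared among that process's threads; the sketch covers only the first of those two levels.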

Comparative Behavioral Correlation of High and Low-Performing Mice in the Forced Swim Test

  • Valencia, Schley;Gonzales, Edson Luck;Adil, Keremkleroo Jym;Jeon, Se Jin;Kwon, Kyoung Ja;Cho, Kyu Suk;Shin, Chan Young
    • Biomolecules & Therapeutics
    • /
    • v.27 no.4
    • /
    • pp.349-356
    • /
    • 2019
  • Behavioral analysis in mice has made important contributions to understanding and treating numerous neurobehavioral and neuropsychiatric disorders. Behavioral performance differs widely among individual animals and humans, but the neurobehavioral mechanism of this innate difference is seldom investigated. Many neurologic conditions share comorbid symptoms that may have a common pathophysiology and therapeutic strategy. The forced swim test (FST) has been commonly used to evaluate the "antidepressant" properties of drugs, yet individual differences in this test have scarcely been investigated, along with their possible connection to other behavioral domains. This study conducted FST screening in outbred CD-1 male mice and segregated them into three groups: high performers (HP) or active swimmers, middle performers (MP), and low performers (LP) or floaters. A series of behavioral experiments was then performed to measure their behavioral responses in the open field, elevated plus maze, Y maze, three-chamber social assay, novel object recognition, delay discounting task, and cliff avoidance reaction. The behavioral test battery revealed that the three groups displayed seemingly correlated differences in locomotor activity and novel object recognition but not in other behaviors. This study suggests that the HP group in the FST has higher locomotor activity and novelty-seeking tendencies than the other groups. These results may have important implications for creating a behavior database in animal models that could be used to predict interconnections among various behavioral domains, which would eventually help in understanding the neurobiological mechanisms controlling behavior in individual subjects.

Performance Comparison of Tilera Many-core and x86-64 Multi-core Systems

  • Choi, HeeSeok;Lyoo, TaeMuk;Park, JiSu;Jung, Daeyong;Lim, JongBeom;Lee, Jungha;Suh, Teaweon;Yu, Heonchang
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2013.05a
    • /
    • pp.102-105
    • /
    • 2013
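  • Multi-core systems have recently been evolving into many-core systems that connect ever more cores to improve computer performance. However, the performance of a multi-core system varies with the architecture and the number of cores used. To analyze how core architecture and core count affect performance, this paper compares Tilera's many-core systems, Tile-Gx36 and TilePro64, with Intel's x86-64 multi-core system, Core i5. To examine how performance differs as more cores are used, we measured the single-core performance of each system with the SPEC CPU 2006 benchmark, and used an OpenMP benchmark program to measure performance as a function of input data size when all cores of the system were in use. In the single-core experiments, Core i5 was about 87% faster than Tile-Gx36 when measured with integer data and about 94% faster when measured with floating-point data. In the all-core results, however, when the integer array size was at or above a threshold size (value not given), the processing speed of the Tile-Gx36 system was on average about 7.6 times that of the Core i5 system. We therefore conclude that Tilera's many-core systems have lower single-core performance because of their clock speed and architecture, but deliver better performance in high-speed computation that uses parallel processing.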