• Title/Summary/Keyword: parallel program

Design of 32 bit Parallel Processor Core for High Energy Efficiency using Instruction-Levels Dynamic Voltage Scaling Technique

  • Yang, Yil-Suk;Roh, Tae-Moon;Yeo, Soon-Il;Kwon, Woo-H.;Kim, Jong-Dae
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.9 no.1
    • /
    • pp.1-7
    • /
    • 2009
  • This paper describes the design of a high-energy-efficiency 32-bit parallel processor core using instruction-level data gating and dynamic voltage scaling (DVS) techniques. The instruction-level data gating technique controls the activation and switching activity of the function units. The instruction-level DVS technique controls the power of the function units without a DC-DC converter or a voltage scheduler controlled by the operating system. The proposed DVS technique has a simpler architecture than conventional DVS, which relies on a DC-DC converter and an OS-controlled voltage scheduler, and it is easy to implement in hardware. Even so, with a dual power supply, its energy efficiency is similar to that of the conventional DVS. We ran circuit simulations of a test program using Spectra. The reduced power supply was set to 0.667 times the main power supply. The proposed 32-bit parallel processor core with instruction-level data gating and DVS improves energy efficiency by about 88.4% compared with the same core without these techniques. The designed core can be used as a coprocessor for processing massive data at high speed.
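As a rough illustration of why a reduced supply saves energy, the sketch below applies the standard first-order CMOS relation (dynamic switching energy proportional to C·V²). It assumes the 0.667 figure in the abstract refers to supply voltage, and it ignores leakage, data gating, and timing effects, so it is not expected to reproduce the reported 88.4% figure. The function names and the `fraction_low` parameter are hypothetical.

```python
def dynamic_energy_ratio(v_scale):
    """Relative dynamic switching energy at a scaled supply (first-order E ~ C * V^2)."""
    return v_scale ** 2


def mixed_energy(fraction_low, v_scale):
    """Relative dynamic energy when `fraction_low` of operations use the reduced supply.

    fraction_low is a hypothetical workload parameter, not a value from the paper.
    """
    return fraction_low * dynamic_energy_ratio(v_scale) + (1.0 - fraction_low)


if __name__ == "__main__":
    print(dynamic_energy_ratio(0.667))   # per-operation energy relative to full supply
    print(mixed_energy(0.5, 0.667))      # half of the operations on the reduced supply
```

Under this model, an operation executed at 0.667 of the supply voltage uses a bit under half of the dynamic energy it would use at the full supply.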

Evaluation of DES key search stability using Parallel Computing (병렬 컴퓨팅을 이용한 DES 키 탐색 안정성 분석)

  • Yoon, JunWeon;Choi, JangWon;Park, ChanYeol;Kong, Ki-Sik
    • Journal of Digital Contents Society
    • /
    • v.14 no.1
    • /
    • pp.65-72
    • /
    • 2013
  • Parallel computing models, current and future, have been proposed for running and solving large-scale application problems in fields such as climate, biology, cryptology, and astronomy. Parallel computing is a form of computation in which many calculations are carried out simultaneously. It shortens program execution time and extends the scale of problems that can be solved. In this paper, we run an actual cryptographic algorithm through parallel processing and evaluate its efficiency. Key length, which is the criterion for the stability of a cryptographic algorithm, is judged by the amount of exhaustive enumeration computation it requires. We therefore present a detailed procedure for DES key search that performs the enumeration in a parallel processing environment, and we simulate it on a cluster system. As a result, we can measure the security and robustness of the cryptographic algorithm.
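The enumeration procedure described above can be sketched as a key-space partition followed by independent range searches. This is a minimal sketch under stated assumptions, not the paper's procedure: `toy_encrypt` is a made-up 16-bit stand-in cipher (not DES, whose key space is 2^56), and threads are used only to keep the example self-contained, where a cluster would use separate processes or nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_keyspace(key_bits, workers):
    """Split [0, 2**key_bits) into `workers` contiguous, near-equal ranges."""
    total = 1 << key_bits
    base, extra = divmod(total, workers)
    ranges, start = [], 0
    for w in range(workers):
        size = base + (1 if w < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def toy_encrypt(key, block):
    """Hypothetical 16-bit stand-in cipher (NOT DES): XOR, rotate, XOR."""
    x = (block ^ key) & 0xFFFF
    x = ((x << 5) | (x >> 11)) & 0xFFFF
    return x ^ ((key * 0x9E37) & 0xFFFF)


def search_range(key_range, plaintext, ciphertext):
    """Exhaustively test every key in one worker's range; return all matches."""
    start, end = key_range
    return [k for k in range(start, end) if toy_encrypt(k, plaintext) == ciphertext]


def parallel_key_search(key_bits, workers, plaintext, ciphertext):
    """Search the whole key space, one contiguous range per worker."""
    ranges = partition_keyspace(key_bits, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda r: search_range(r, plaintext, ciphertext), ranges))
    return [k for part in parts for k in part]


if __name__ == "__main__":
    secret = 0xBEEF
    ct = toy_encrypt(secret, 0x1234)
    print(secret in parallel_key_search(16, 6, 0x1234, ct))
```

Because the ranges are disjoint and need no communication until a match is reported, the work divides almost evenly across workers, which is what makes exhaustive key search a natural benchmark for a cluster.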

A Study on the Development and Application of a Design Program for Centrifugal Turbo Fan (원심 터보홴 설계용 프로그램의 개발 및 응용에 대한 연구)

  • Kim, Jang-Kweon;Oh, Seok-Hyung
    • Journal of Power System Engineering
    • /
    • v.20 no.6
    • /
    • pp.71-79
    • /
    • 2016
  • This paper introduces a design method for centrifugal turbo fans and the process of developing a design program for them. The applicability of the developed program was confirmed with experimental performance data. We propose new velocity coefficients and consider various losses, such as impeller inlet loss, vane passage flow loss, casing pressure loss, recirculation loss power, and disk friction loss power. In particular, the inlet and outlet widths of the impeller were newly determined by reflecting the experimental results. As a result, the fan design program shows good performance regardless of impeller type and is expected to be a very useful design tool.

Effect of Program Promoting Intention to Exercise Performance Based Theory of Planned Behavior in the Elderly (농촌 지역 퇴행성 관절염 노인을 대상으로 한 운동수행 의도 증진프로그램의 효과)

  • Kim, Jin-Soon;Hyun, Hye-Jin
    • Journal of Korean Biological Nursing Science
    • /
    • v.17 no.1
    • /
    • pp.1-10
    • /
    • 2015
  • Purpose: This study aimed to determine the effect of a program promoting the intention to perform exercise, based on the theory of planned behavior, in elderly people with degenerative joint diseases (DJDs) living in rural areas. Methods: There were two groups, 32 people in the experimental group and 24 in the control group, all above the age of 60. The program was applied to the experimental group for 12 weeks. Results: Compared with the control group, the experimental group showed significant increases in attitude toward exercise, subjective norm, perceived behavior control, exercise intention, and exercise performance. In addition, pain as a physical function, joint stiffness, ADLs, body flexibility, parallel, perceived health state as a psychological function, and life satisfaction improved significantly. Conclusion: We expect that the program can be used in nursing practice for elderly people with DJDs who need to manage their lifestyle.

Dynamic Load Balancing Algorithm using Execution Time Prediction on Cluster Systems

  • Yoon, Wan-Oh;Jung, Jin-Ha;Park, Sang-Bang
    • Proceedings of the IEEK Conference
    • /
    • 2002.07a
    • /
    • pp.176-179
    • /
    • 2002
  • In recent years, a growing body of computer network research has focused on cluster systems as a way to achieve higher performance at lower cost. Load imbalance is the major factor that reduces the performance of a cluster system running parallel programs in SPMD (Single Program Multiple Data) form. Load imbalance is also a problem for MPP (Massively Parallel Processors) and distributed systems. Because a cluster system is a loosely coupled distributed system, it has higher communication overhead than an MPP. Dynamic load balancing can solve the load imbalance problem of a cluster system and reduce its communication cost. The cluster systems considered in this paper consist of P heterogeneous nodes connected by a switch-based network. The master node predicts the average task execution time for each slave node based on information from that node. It then redistributes the remaining tasks to each node, taking into account the predicted execution time and the communication overhead of task migration. The proposed dynamic load balancing uses execution time prediction to optimize task redistribution. Performance factors such as the number of nodes, the number of tasks, and communication cost are considered to improve the performance of the cluster system. Simulation results verify the effectiveness of the proposed dynamic load balancing algorithm.
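The master-side steps described above (predict per-node task time, then redistribute the remaining tasks with migration overhead in mind) can be sketched as follows. The abstract does not give the paper's prediction or redistribution formulas, so this uses a simple running average and a greedy earliest-finish assignment as stand-ins. All function and parameter names are hypothetical.

```python
import heapq


def predict_avg_time(elapsed_seconds, completed_tasks):
    """Average per-task execution time observed so far on one slave node."""
    return elapsed_seconds / completed_tasks


def redistribute(remaining, avg_times, migrate_cost):
    """Greedily give each remaining task to the node that would finish it earliest.

    avg_times[i]    : predicted seconds per task on node i
    migrate_cost[i] : assumed per-task communication overhead of moving a task to node i
    Returns the number of tasks assigned to each node.
    """
    per_task = [t + c for t, c in zip(avg_times, migrate_cost)]
    counts = [0] * len(per_task)
    # Each heap entry is (finish time if this node takes one more task, node index).
    heap = [(cost, i) for i, cost in enumerate(per_task)]
    heapq.heapify(heap)
    for _ in range(remaining):
        finish, i = heapq.heappop(heap)
        counts[i] += 1
        heapq.heappush(heap, (finish + per_task[i], i))
    return counts


if __name__ == "__main__":
    speeds = [predict_avg_time(10.0, 10), predict_avg_time(10.0, 5), predict_avg_time(8.0, 2)]
    print(redistribute(7, speeds, [0.0, 0.0, 0.0]))   # faster nodes receive more tasks
    print(redistribute(7, speeds, [3.0, 0.0, 0.0]))   # costly migration shrinks node 0's share
```

With heterogeneous nodes, the faster nodes receive proportionally more tasks, and raising the migration cost of a node reduces its share even when it computes quickly.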

Refined fixed granularity algorithm on Networks of Workstations (NOW 환경에서 개선된 고정 분할 단위 알고리즘)

  • Gu, Bon-Geun
    • The KIPS Transactions:PartA
    • /
    • v.8A no.2
    • /
    • pp.117-124
    • /
    • 2001
  • On a NOW (Network of Workstations), load sharing plays a very important role in improving performance. The known load sharing strategies are fixed-granularity, variable-granularity, and adaptive-granularity. The variable-granularity algorithm is sensitive to various parameters, whereas the Send algorithm, which implements the fixed-granularity strategy, is robust to task granularity, and the performance difference between Send and the variable-granularity algorithm is not substantial. However, in the Send algorithm, computing time and communication time do not overlap, so long network latency affects the execution time of the parallel program. In this paper, we propose the preSend algorithm, in which the master node sends data to the slave nodes in advance, without waiting for partial results from the slaves. Because the master node has already sent the next data, the slave nodes can process it without idle time. The preSend algorithm thus overlaps computing time with communication time, which reduces the influence of long network latency on the execution time of the parallel program on the NOW. To compare the execution times of the two algorithms, we use $320{\times}320$ matrix multiplication. The results show that the preSend algorithm has a shorter execution time than the Send algorithm.
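The benefit of overlapping communication with computation can be illustrated with a small timing model for one master and one slave. This is a simplified sketch under assumptions not stated in the abstract: every chunk has the same transfer time and compute time, the master sends chunks back to back, the slave can buffer them, and returning results costs nothing. The function names are hypothetical.

```python
def send_time(chunks, comm, compute):
    """No overlap: each chunk is transferred, then computed, before the next transfer."""
    return chunks * (comm + compute)


def presend_time(chunks, comm, compute):
    """Overlap: the next chunk is transferred while the current one is being computed."""
    arrive = 0.0  # time at which the current chunk has fully arrived at the slave
    done = 0.0    # time at which the slave finishes its current computation
    for _ in range(chunks):
        arrive += comm                       # master sends chunks back to back
        done = max(done, arrive) + compute   # slave starts once it is free and data is there
    return done


if __name__ == "__main__":
    print(send_time(4, 1.0, 3.0), presend_time(4, 1.0, 3.0))   # compute-bound case
    print(send_time(4, 3.0, 1.0), presend_time(4, 3.0, 1.0))   # communication-bound case
```

In this model the overlapped schedule costs roughly one transfer plus all the computation when computation dominates, or all the transfers plus one computation when the network dominates, instead of the full sum of both.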

Optimal Amplify-and-Forward Scheme for Parallel Relay Networks with Correlated Relay Noise

  • Liu, Binyue;Yang, Ye
    • ETRI Journal
    • /
    • v.36 no.4
    • /
    • pp.599-608
    • /
    • 2014
  • This paper studies a parallel relay network where the relays employ an amplify-and-forward (AF) relaying scheme and are subject to individual power constraints. We consider correlated effective relay noise, which arises in practical scenarios when the relays are exposed to common interferers. Assuming that the noise covariance and the full channel state information are available, we investigate the problem of finding the optimal AF scheme in terms of maximum end-to-end transmission rate. It is shown that the maximization problem can be equivalently transformed to a convex semi-definite program, which can be efficiently solved. Then an upper bound on the maximum achievable AF rate of this network is provided to further evaluate the performance of the optimal AF scheme. It is proved that the upper bound can be asymptotically achieved in two special regimes, when the transmit power of the source node or of the relays is sufficiently large. Finally, both theoretical and numerical results are given to show that, on average, noise correlation is beneficial to the transmission rate, whether or not the relays know the noise covariance matrix.

Dynamics of moored arctic spar interacting with drifting level ice using discrete element method

  • Jang, HaKun;Kim, MooHyun
    • Ocean Systems Engineering
    • /
    • v.11 no.4
    • /
    • pp.313-330
    • /
    • 2021
  • In this study, the dynamic interaction between an Arctic Spar and drifting level ice is examined in the time domain using a newly developed ice-hull-mooring coupled dynamics program. The in-house program CHARM3D, a hull-riser-mooring coupled dynamic simulator, is extended by coupling it with the open-source discrete element method (DEM) simulator LIGGGHTS. In the LIGGGHTS module, the parallel-bonding method is implemented to model the level ice as an assembly of multiple bonded spherical particles. As a case study, a spread-moored Arctic Spar platform, whose hull surface near the waterline has an inverted conical shape, is chosen. To determine the breaking-related DEM parameter (the critical bonding strength), a four-point numerical bending test is used. A series of numerical simulations is systematically performed under various ice conditions, including ice drift velocity, flexural strength, and thickness. Then, the effects of these parameters on the ice force, platform motions, and mooring tensions are discussed. The simulations reveal various features of the dynamic interaction between the drifting ice and the moored platform under various ice conditions, including a novel synchronous resonance at low ice speed. The newly developed simulator is promising and can be reused for future design and analysis involving ice-floater-mooring coupled dynamics.

Building a Dynamic Analyzer for CUDA based System.

  • SALAH T. ALSHAMMARI
    • International Journal of Computer Science & Network Security
    • /
    • v.23 no.8
    • /
    • pp.77-84
    • /
    • 2023
  • The use of GPUs in general-purpose computers is on the rise due to their increased programmability and growing performance requirements. Tools such as NVIDIA's CUDA allow programmers to code algorithms in a C-like language for execution on graphics processing units (GPUs). Unfortunately, many performance and correctness bugs occur in parallel programs, and CUDA's tool support for such programs is not yet mature. A dynamic analyzer that finds performance and correctness bugs in CUDA programs is therefore valuable, especially given modern computing requirements. Race-condition bugs affect program correctness, while detecting shared memory bank conflicts helps improve overall performance. The technique instruments programs to track the memory locations accessed by different threads and to check for bugs in the program code. The instrumented source code is run directly in CUDA's device emulation mode and reports all errors to the user. This degree of automation helps programmers resolve subtle bugs in highly complex programs or in programs that cannot be analyzed manually.
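The two checks named above can be sketched as an offline analysis of a recorded access log. This is a minimal illustration, not the paper's analyzer: the log format `(thread_id, address, is_write)` is hypothetical, the race check assumes all accesses fall within one barrier interval, and the bank model assumes 32 banks of 4-byte words, with threads that read the same word not counted as conflicting.

```python
from collections import defaultdict


def find_races(accesses):
    """Return addresses touched by two or more threads where at least one access is a write.

    accesses: iterable of (thread_id, address, is_write) within one barrier interval.
    """
    readers, writers = defaultdict(set), defaultdict(set)
    for tid, addr, is_write in accesses:
        (writers if is_write else readers)[addr].add(tid)
    races = []
    for addr, writing in writers.items():
        if len(writing | readers.get(addr, set())) > 1:
            races.append(addr)
    return sorted(races)


def bank_conflict_degree(addresses, banks=32, word_bytes=4):
    """Largest number of distinct words that one warp-wide access maps to a single bank.

    A result of 1 means conflict-free; a result of n means an n-way conflict.
    """
    words_per_bank = defaultdict(set)
    for addr in addresses:
        word = addr // word_bytes
        words_per_bank[word % banks].add(word)
    return max((len(words) for words in words_per_bank.values()), default=0)


if __name__ == "__main__":
    log = [(0, 0x10, True), (1, 0x10, False), (2, 0x20, False), (3, 0x20, False)]
    print(find_races(log))                                    # only 0x10 has a writer
    print(bank_conflict_degree([4 * i for i in range(32)]))   # unit stride
    print(bank_conflict_degree([8 * i for i in range(32)]))   # stride of two words
```

A real analyzer would collect the log through instrumentation and would reset the race bookkeeping at every synchronization barrier.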