• Title/Summary/Keyword: Parallel Processing

Search results: 2,013

Embedding a Mesh of Size $2^n \times 2^m$ Into a Twisted Cube

  • Kim, Sook-Yeon
    • The KIPS Transactions: Part A, v.16A no.4, pp.223-226, 2009
  • The twisted cube has received great attention as an interconnection network for parallel systems because it has several properties, especially its diameter, that are superior to those of the hypercube. It was recently shown that, for even $m$, a mesh of size $2 \times 2^m$ can be embedded into a twisted cube with dilation 1 and expansion 1, and a mesh of size $4 \times 2^m$ with dilation 1 and expansion 2 [Lai and Tsai, 2008]. However, to the best of our knowledge, whether a mesh with more than eight rows and columns can be embedded into a twisted cube with dilation 1 has remained a conjecture. In this paper, we show that a mesh of size $2^n \times 2^m$ can be embedded into a twisted cube with dilation 1 and expansion $2^{n-1}$ for even $m$, and with dilation 1 and expansion $2^n$ for odd $m$, where $1 \leq n \leq m$.
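To make these metrics concrete: an embedding maps each guest (mesh) node to a host (twisted cube) node; its dilation is the maximum host-graph distance between the images of adjacent mesh nodes, and its expansion is the ratio of host to guest node counts. The following minimal Python sketch is not the authors' construction; it merely checks both metrics for a candidate embedding, where `phi` (the node mapping) and `host_adj` (the host adjacency dict) are assumed inputs:

```python
from itertools import product
from collections import deque

def bfs_dist(adj, src):
    """Breadth-first distances from src in an undirected graph."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def dilation_and_expansion(mesh_shape, phi, host_adj):
    """Dilation and expansion of an embedding phi of a rows x cols mesh.
    Runs one BFS per mesh edge; fine for a sketch, not for large graphs."""
    rows, cols = mesh_shape
    dil = 0
    for i, j in product(range(rows), range(cols)):
        for ni, nj in ((i + 1, j), (i, j + 1)):   # the two forward mesh edges
            if ni < rows and nj < cols:
                d = bfs_dist(host_adj, phi[(i, j)])[phi[(ni, nj)]]
                dil = max(dil, d)
    return dil, len(host_adj) / (rows * cols)
```

A dilation-1 result, as in the paper, means every mesh edge maps onto a single twisted-cube edge.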

Mutual Authentication Protocol for Safe Data Transmission of Multi-distributed Web Cluster Model

  • Lee, Kee-Jun;Kim, Chang-Won;Jeong, Chae-Yeong
    • The KIPS Transactions: Part C, v.8C no.6, pp.731-740, 2001
  • The multi-distributed web cluster model, which extends the conventional cluster system, processes large-scale jobs requested by users in parallel by organizing a number of system nodes on an open network into a single virtual network. Because of this structural characteristic, the model exposes its internal system nodes to illegal third parties, and intentional interference or attacks on the cooperative work among system nodes can make normal job processing impossible. This paper presents a mutual authentication protocol based on a key-division method for authenticating the system nodes involved in registering, requesting, and cooperating on service code blocks and in collecting the results, and then designs SNKDC, which safely and effectively manages and divides the symmetric keys of all system nodes. SNKDC divides the symmetric keys required for the nodes' work, and the system nodes transmit packets encrypted with the keys they are given. Encrypted packets exchanged between system nodes therefore cannot be decoded by a third party, and the leakage of information through forged messages can be prevented.
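The abstract does not spell out SNKDC's key-division scheme, but a common primitive for this kind of n-party key division is XOR secret sharing, where every node must contribute its share to reconstruct the symmetric key. A minimal sketch of that primitive (illustrative only, not the paper's protocol):

```python
import os
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def split_key(key: bytes, n: int) -> list[bytes]:
    """Split key into n shares; the XOR of all n shares recovers the key,
    and any n-1 shares reveal nothing about it."""
    shares = [os.urandom(len(key)) for _ in range(n - 1)]
    shares.append(reduce(xor_bytes, shares, key))
    return shares

def recover_key(shares: list[bytes]) -> bytes:
    return reduce(xor_bytes, shares)

# Example: a key-distribution centre hands one share to each of 3 nodes;
# only by cooperating can they reconstruct the shared symmetric key.
key = os.urandom(16)
assert recover_key(split_key(key, 3)) == key
```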


Two-Level Hierarchical Production Planning for a Semiconductor Probing Facility

  • Bang, June-Young
    • Journal of Korean Society of Industrial and Systems Engineering, v.38 no.4, pp.159-167, 2015
  • We consider a wafer lot transfer/release planning problem from semiconductor wafer fabrication facilities to probing facilities, with the objective of minimizing the deviation of workload and the total tardiness of customers' orders. Due to the complexity of the problem, we propose a two-level hierarchical production planning method for the lot transfer problem between two parallel facilities to obtain an executable production plan and schedule. In the higher level, the solution of a reduced mathematical model, obtained with the Lagrangian relaxation method, serves as a coarse lot transfer/release plan with daily time buckets; in the lower level, discrete-event simulation with a priority-rule-based scheduling method produces detailed lot processing schedules at the machines, and the lot transfer/release plan is evaluated. To evaluate the performance of the suggested planning method, we provide computational tests on problems obtained from a set of real data, together with additional test scenarios in which several levels of variation are added to the customers' demands. The results showed that the proposed architecture generates executable plans within acceptable computational time in real factories, and that the total tardiness of orders can be reduced more effectively by using more sophisticated lot transfer methods, such as considering in the mathematical formulation the due dates and ready times of lots associated with the same order. The proposed method may also be applied to job assignment problems in back-end processes, such as assigning chips to be tested from assembly facilities to final test facilities, and it can be improved further by considering sequence-dependent setups in the probing facilities.
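For readers unfamiliar with the higher-level technique: Lagrangian relaxation moves hard constraints into the objective with multipliers and tightens the resulting bound by subgradient updates. A generic sketch of that loop follows; the paper's actual lot-transfer model is not reproduced, and `solve_relaxed` and `violation` are assumed callbacks supplied by the modeller:

```python
def lagrangian_subgradient(solve_relaxed, violation, lam0, steps=50, t0=1.0):
    """Generic subgradient loop for Lagrangian relaxation.

    solve_relaxed(lam) -> (x, bound): an optimum of the relaxed problem for
        multipliers lam, plus its objective value (a dual bound).
    violation(x) -> list of violations g_i(x) of the dualized constraints
        (written so that g_i(x) <= 0 means feasible).
    """
    lam, best = list(lam0), float("-inf")
    for k in range(steps):
        x, bound = solve_relaxed(lam)
        best = max(best, bound)                     # keep the best dual bound
        g = violation(x)
        step = t0 / (k + 1)                         # diminishing step size
        lam = [max(0.0, l + step * gi) for l, gi in zip(lam, g)]
    return best, lam
```

In the paper's setting, the relaxed higher-level solution then seeds the lower-level discrete-event simulation.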

Optimization Study of Toom-Cook Algorithm in NIST PQC SABER Utilizing ARM/NEON Processor

  • Song, JinGyo;Kim, YoungBeom;Seo, Seog Chung
    • Journal of the Korea Institute of Information Security & Cryptology, v.31 no.3, pp.463-471, 2021
  • Since 2016, the National Institute of Standards and Technology (NIST) has been conducting a post-quantum cryptography standardization project in preparation for a quantum computing environment. The third round is currently in progress, and most of the finalists (5 of 7) are lattice-based. Lattice-based post-quantum cryptography is considered applicable even in embedded environments with limited resources, since it provides efficient operation and appropriate key lengths. Among the finalists, SABER KEM adopts a modulus that is efficient to reduce and uses the Toom-Cook algorithm for its computation-intensive polynomial multiplications. In this paper, we present an optimized implementation of the evaluation and interpolation steps of SABER's Toom-Cook algorithm utilizing ARM/NEON on the ARMv8-A platform. For the evaluation step, we propose an efficient ARM/NEON interleaving method, and for the interpolation step, we introduce an optimized implementation methodology applicable to various embedded environments. As a result, the proposed implementation achieved 3.5 times faster performance in the evaluation step and 5 times faster performance in the interpolation step than the previous reference implementation.
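To illustrate the evaluation/interpolation structure being optimized, here is a Toom-Cook sketch. For brevity it is the 3-way variant over plain integers rather than SABER's 4-way variant over polynomial rings, but both phases have the same shape: evaluate at a few points, multiply pointwise, then interpolate with exact small divisions:

```python
def toom3_mul(a: int, b: int, limb_bits: int = 64) -> int:
    """Toom-Cook-3 multiplication sketch: split each operand into three
    limbs, evaluate at 5 points, multiply pointwise, interpolate."""
    B = 1 << limb_bits
    a0, a1, a2 = a % B, (a // B) % B, a // B**2
    b0, b1, b2 = b % B, (b // B) % B, b // B**2
    # Evaluation at x = 0, 1, -1, 2 and infinity
    r0   = a0 * b0
    r1   = (a0 + a1 + a2) * (b0 + b1 + b2)
    rm1  = (a0 - a1 + a2) * (b0 - b1 + b2)
    r2   = (a0 + 2*a1 + 4*a2) * (b0 + 2*b1 + 4*b2)
    rinf = a2 * b2
    # Interpolation: solve for the 5 product coefficients.
    # All divisions below are exact, a key property Toom-Cook relies on.
    c0, c4 = r0, rinf
    c2 = (r1 + rm1) // 2 - c0 - c4
    odd = (r1 - rm1) // 2                      # equals c1 + c3
    c3 = ((r2 - c0 - 4*c2 - 16*c4) // 2 - odd) // 3
    c1 = odd - c3
    return c0 + c1*B + c2*B**2 + c3*B**3 + c4*B**4

assert toom3_mul(123456789, 987654321, limb_bits=8) == 123456789 * 987654321
```

The paper's contribution is vectorizing exactly these two phases with NEON SIMD instructions; the interleaving lets independent limb operations fill the vector lanes.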

Efficient GPU Framework for Adaptive and Continuous Signed Distance Field Construction, and Its Applications

  • Kim, Jong-Hyun
    • Journal of the Korea Society of Computer and Information, v.27 no.3, pp.63-69, 2022
  • In this paper, we propose a new GPU-based framework for quickly calculating adaptive and continuous SDFs (signed distance fields), and we examine use cases in rendering and collision handling. A quadtree constructed from the triangle mesh is transferred to GPU memory, where each thread uses it to compute, in parallel, the Euclidean distance to the nearest triangle, yielding the shortest continuous distance without discontinuities in the adaptive grid space. Experiments show that cut-away views of the adaptive distance field, distance queries at specific locations, real-time ray tracing, and collision handling can all be performed quickly and efficiently. With the proposed method, the adaptive signed distance field can be computed in about one second even for a high-polygon mesh, so the method can be fully utilized not only for rigid bodies but also for deformable bodies. Various experimental results demonstrate the stability of the algorithm and show that it samples and represents distance values accurately across a range of models.
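The per-thread workload in such a framework is essentially an exact point-to-triangle distance query. A CPU-side Python sketch of that primitive and a brute-force unsigned distance field follows; the paper's quadtree pruning, sign computation, and GPU parallelism are omitted:

```python
import numpy as np

def point_segment_dist(p, a, b):
    ab = b - a
    t = np.clip(np.dot(p - a, ab) / np.dot(ab, ab), 0.0, 1.0)
    return np.linalg.norm(p - (a + t * ab))

def point_triangle_dist(p, a, b, c):
    n = np.cross(b - a, c - a)
    n2 = np.dot(n, n)
    # Barycentric coordinates of the projection of p onto the triangle plane
    w = np.array([np.dot(np.cross(b - p, c - p), n),
                  np.dot(np.cross(c - p, a - p), n),
                  np.dot(np.cross(a - p, b - p), n)]) / n2
    if np.all(w >= 0.0):                       # projection falls inside
        return abs(np.dot(p - a, n)) / np.sqrt(n2)
    return min(point_segment_dist(p, a, b),    # otherwise nearest edge wins
               point_segment_dist(p, b, c),
               point_segment_dist(p, c, a))

def brute_force_udf(points, tris):
    """Unsigned distance of each sample point to the whole mesh. The paper
    evaluates this per GPU thread and uses the quadtree to skip far triangles."""
    return np.array([min(point_triangle_dist(p, *t) for t in tris)
                     for p in points])
```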

Analysis Method of X-Ray Diffraction Characteristic Values and Measured Strain for Steep Stress Gradient of Metal Material Surface Layer

  • Chang-Suk Han;Chan-Woo Lee
    • Korean Journal of Materials Research, v.33 no.2, pp.54-62, 2023
  • The most comprehensive and reliable method for non-destructively measuring the residual stress of a metal's surface layer is the $\sin^2\psi$ method. When X-rays were used, however, the $\varepsilon_{\phi\psi}$-$\sin^2\psi$ relationship measured on the surface layer of a machined metal was not linear. In this case, because the effective penetration depth changes with the incidence direction of the X-ray, $\sigma_\phi$ becomes a function of $\sin^2\psi$; since $\sigma_\phi$ cannot be treated as a constant, the $\varepsilon_{\phi\psi}$-$\sin^2\psi$ relationship cannot be linear. In this paper, therefore, the orthogonal-function method based on Warren's diffraction theory was combined with a normally distributed basic profile, and the X-ray diffraction profile in the presence of a linear strain (stress) gradient at the surface was calculated and examined. When a strain gradient exists, the X-ray diffraction profile becomes asymmetric, so the peak position, the half-maximum position, and the centroid position take different values. The difference between the peak position and the centroid position became clearer as the strain (stress) gradient grew larger and the basic profile width grew smaller. The weighted-average strain, based on the strain value corresponding to the centroid position of the diffracted X-rays, enables stress analysis in the presence of a strain (stress) gradient. At one-fifth of the maximum diffraction intensity $I_{max}$, the position that bisects the diffracted X-ray profile, found by drawing a straight line parallel to the background, corresponds approximately to the centroid position.
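The three characteristic positions compared in the paper can be computed directly from a sampled profile. A small sketch on a synthetic asymmetric profile; this is illustrative only, and real data would need proper background subtraction:

```python
import numpy as np

def profile_positions(two_theta, intensity):
    """Peak, half-maximum midpoint, and centroid of a diffraction profile.
    For an asymmetric profile these three positions differ, which is the
    signature of a strain (stress) gradient discussed in the paper."""
    i_bg = intensity - intensity.min()            # crude background removal
    peak = two_theta[np.argmax(i_bg)]
    half = i_bg >= 0.5 * i_bg.max()               # half-maximum region
    half_mid = 0.5 * (two_theta[half][0] + two_theta[half][-1])
    centroid = np.sum(two_theta * i_bg) / np.sum(i_bg)
    return peak, half_mid, centroid

# Synthetic asymmetric profile: a narrow Gaussian plus a broad right shoulder
x = np.linspace(42.0, 46.0, 800)
y = np.exp(-((x - 44.0) / 0.25) ** 2) + 0.5 * np.exp(-((x - 44.6) / 0.5) ** 2)
print(profile_positions(x, y))   # the three values differ for this profile
```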

Resolving Memory Bottlenecks in Hardware Accelerators with Data Prefetch

  • Hyein Lee;Jinoo Joung
    • Journal of the Korea Society of Computer and Information, v.29 no.6, pp.1-12, 2024
  • Deep learning with faster and more accurate results requires large amounts of storage and computation. Accordingly, many studies use hardware accelerators for fast, accurate calculation, but data movement between the hardware accelerator and the CPU creates a performance bottleneck. In this paper, we propose a data prefetch strategy that efficiently reduces this bottleneck. The core idea is to predict the data needed for the next task and upload it to local memory while the hardware accelerator (a matrix multiplication unit, MMU) is still working on the current task. The strategy is enhanced with a dual buffer that allows read and write operations to proceed simultaneously, reducing the latency and execution time of data transfers. Through simulations, we demonstrate a 24% improvement in accelerator performance by maximizing parallelism with the dual buffer and by hiding memory bottlenecks with data prefetch.
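A minimal model of the idea, with Python threads standing in for the DMA engine and the MMU: a two-slot queue acts as the dual buffer, so fetching tile k+1 overlaps with computing on tile k. Here `fetch` and `compute` are assumed callbacks, not the paper's interfaces:

```python
import threading
from queue import Queue

def run_pipeline(tiles, fetch, compute):
    """Double-buffered execution: a prefetch thread loads the next tile into
    one buffer slot while the compute stage consumes the current one."""
    buffers = Queue(maxsize=2)                 # two slots = the dual buffer

    def prefetcher():
        for t in tiles:
            buffers.put(fetch(t))              # blocks when both slots full
        buffers.put(None)                      # end-of-stream marker

    threading.Thread(target=prefetcher, daemon=True).start()
    results = []
    while (data := buffers.get()) is not None:
        results.append(compute(data))          # overlaps with the next fetch
    return results
```

With balanced fetch and compute times, this hides nearly all transfer latency, which is the effect the paper's simulation quantifies.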

A Pipelined Design of the Block Cipher Algorithm SEED

  • Eom, Seong-Yong;Lee, Kyu-Won;Park, Seon-Hwa
    • Journal of KIISE: Computer Systems and Theory, v.30 no.3-4, pp.149-159, 2003
  • The need for information security has recently increased interest in cipher algorithms. In particular, transmitting large volumes of data over high-bandwidth communication networks requires faster encryption and decryption techniques for real-time processing, and implementing the cipher algorithm as a hardware circuit is a good solution to this problem. Previous work taking this approach has focused on repeatedly executing the core part of the algorithm to minimize chip size, even though most cipher algorithms are inherently parallel. In this paper, we propose a new design for the SEED block cipher algorithm, developed by KISA (Korea Information Security Agency) in 1998 as the Korean standard cipher algorithm. The design exploits the inherent parallelism of the algorithm and implements it in a pipelined fashion. We described the design in VHDL, performed functional simulations confirming that it works correctly, and then synthesized it, verifying that it fits in a single FPGA chip; the new design can therefore be used practically for hardware implementations of high-speed, high-performance cipher systems.
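The throughput argument behind pipelining a 16-round cipher: once all 16 stages are filled, one block completes every cycle instead of one block every 16 cycles. A cycle-by-cycle Python model of that structure follows; the round function is a placeholder, and the real SEED F-function (G-function, S-boxes, key mixing) is deliberately not implemented here:

```python
ROUNDS = 16          # SEED is a 16-round Feistel cipher

def round_fn(left, right, rk):
    """Placeholder Feistel round; stands in for one hardware pipeline stage."""
    return right, left ^ ((right + rk) & 0xFFFFFFFF)

def pipelined_encrypt(blocks, round_keys):
    """Cycle-by-cycle pipeline model: stages[s] holds the block that has
    completed rounds 0..s. After the fill latency, one block exits per cycle."""
    stages = [None] * ROUNDS
    out, feed = [], iter(blocks)
    while True:
        if stages[-1] is not None:             # a fully-rounded block exits
            out.append(stages[-1])
        for s in range(ROUNDS - 1, 0, -1):     # shift blocks one stage forward
            prev = stages[s - 1]
            stages[s] = round_fn(*prev, round_keys[s]) if prev is not None else None
        nxt = next(feed, None)                 # feed a new (L, R) block, if any
        stages[0] = round_fn(*nxt, round_keys[0]) if nxt is not None else None
        if nxt is None and not any(s is not None for s in stages):
            return out

# Example: four 64-bit blocks as (left, right) 32-bit halves
keys = list(range(ROUNDS))
print(pipelined_encrypt([(i, i + 1) for i in range(0, 8, 2)], keys))
```

The hardware design in the paper realizes each stage as its own circuit, so all 16 stages operate on different blocks concurrently.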

Fabrication of Porous Mo by Freeze-Drying and Hydrogen Reduction of $MoO_3$/Camphene Slurry

  • Lee, Wonsuk;Oh, Sung-Tag
    • Journal of Powder Materials, v.19 no.6, pp.446-450, 2012
  • In order to fabricate porous Mo with controlled pore characteristics, a unique process using $MoO_3$ powder as the source material and camphene as the sublimable vehicle is introduced. Camphene-based 15 vol% $MoO_3$ slurries, prepared by milling at $50^{\circ}C$ with a small amount of dispersant, were frozen at $-25^{\circ}C$. Pores were then generated by sublimating the camphene while drying in air for 48 h. The green body was hydrogen-reduced at $750^{\circ}C$ and sintered at $1000-1100^{\circ}C$ for 1 h. After heat treatment in a hydrogen atmosphere, the $MoO_3$ powders were completely converted to metallic Mo without any reaction phases. The sintered samples showed large pores about $150\,\mu m$ in size, aligned parallel to the camphene growth direction. The internal walls of the large pores and the near-bottom part of the specimen also had relatively small pores, due to differences in camphene growth rate during the freezing process. The size of the small pores decreased with increasing sintering temperature, while that of the large pores was unchanged. These results strongly suggest that porous metals with the required pore characteristics can be successfully fabricated by a freeze-drying process using metal-oxide powders.

A Comparative Performance Analysis of Spark-Based Distributed Deep-Learning Frameworks

  • Jang, Jaehee;Park, Jaehong;Kim, Hanjoo;Yoon, Sungroh
    • KIISE Transactions on Computing Practices, v.23 no.5, pp.299-303, 2017
  • By stacking hidden layers in artificial neural networks, deep learning delivers outstanding performance on high-level abstraction problems such as object/speech recognition and natural language processing. However, deep-learning users often struggle with the tremendous amounts of time and resources required to train deep neural networks, and many approaches in a diversity of areas have been proposed to alleviate this computational challenge. In this work, two existing Apache Spark-based acceleration frameworks for deep learning (SparkNet and DeepSpark) are compared and analyzed in terms of training accuracy and time demands. In the authors' experiments with the CIFAR-10 and CIFAR-100 benchmark datasets, SparkNet showed more stable convergence behavior than DeepSpark, but DeepSpark delivered approximately 15% higher classification accuracy. In some cases, DeepSpark also outperformed the sequential implementation running on a single machine in terms of both accuracy and running time.
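SparkNet's published synchronization scheme is iterative parameter averaging: workers run SGD locally on their data shards, and a driver periodically averages the weights (DeepSpark relaxes this synchronization to reduce waiting). A framework-agnostic sketch of that scheme, with `grad` and `shards` as assumed inputs rather than either framework's API:

```python
import numpy as np

def parameter_averaging_sgd(shards, grad, w0, rounds=10, local_steps=50, lr=0.01):
    """Data-parallel training by iterative parameter averaging: each worker
    runs SGD on its own shard, then the driver averages the weight vectors."""
    w = np.asarray(w0, dtype=float)
    for _ in range(rounds):
        local = []
        for shard in shards:                   # conceptually parallel workers
            wk = w.copy()
            for _ in range(local_steps):
                wk -= lr * grad(wk, shard)     # local SGD step on this shard
            local.append(wk)
        w = np.mean(local, axis=0)             # driver averages the weights
    return w
```

The trade-off the paper measures follows directly from this loop: strict averaging (SparkNet) converges more stably, while looser synchronization (DeepSpark) can reach higher accuracy in the same wall-clock budget.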