• Title/Summary/Keyword: 프로세서 구조

Search Result 1,042, Processing Time 0.025 seconds

Implementation of Multicore-Aware Load Balancing on Clusters through Data Distribution in Chapel (클러스터 상에서 다중 코어 인지 부하 균등화를 위한 Chapel 데이터 분산 구현)

  • Gu, Bon-Gen;Carpenter, Patrick;Yu, Weikuan
    • The KIPS Transactions:PartA
    • /
    • v.19A no.3
    • /
    • pp.129-138
    • /
    • 2012
  • In distributed memory architectures like clusters, each node stores a portion of data. How data is distributed across nodes influences the performance of such systems. The data distribution scheme is the strategy to distribute data across nodes and realize parallel data processing. Due to various reasons such as maintenance, scale up, upgrade, etc., the performance of nodes in a cluster can often become non-identical. In such clusters, data distribution without considering performance cannot efficiently distribute data on nodes. In this paper, we propose a new data distribution scheme based on the number of cores in nodes. We use the number of cores as the performance factor. In our data distribution scheme, each node is allocated an amount of data proportional to the number of cores in it. We implement our data distribution scheme using the Chapel language. To show our data distribution is effective in reducing the execution time of parallel applications, we implement Mandelbrot Set and ${\pi}$-Calculation programs with our data distribution scheme, and compare the execution times on a cluster. Based on experimental results on clusters of 8-core and 16-core nodes, we demonstrate that data distribution based on the number of cores can contribute to a reduction in the execution times of parallel programs on clusters.

Development of the HEMP Generation, Propagation Analysis, and Optimal Shelter Design Tool (고 고도 전자기파(HEMP) 발생과 전파해석 및 방호실 최적 설계 Tool 개발)

  • Kim, Dong Il;Min, Gyeong Chan
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.18 no.10
    • /
    • pp.2331-2338
    • /
    • 2014
  • The HEMP threat may have acquired new, and urgent, relevance as the proliferation of nuclear weapons and missile technology accelerates of the North Korea, for example, is assessed as already having developed few atomic weapons, and is on the verge of North Korea already has missiles capable of delivering a nuclear warhead against South Korea. ITU K.78, K81 and IEC recommended its counter-measuring for the industrial facilities with navigation and sailing facilities in order to obviate the all of processor equipped system malfunctions from the EMP/HEMP but its simulation must only be done by the computer simulation which had studied on the 1960-1990 years USA/AFWL papers. This result has a significant activities to the South Korea being under the North Korea threat because all of HEMP related products was strongly limited for export. The HEMP cord which was developed newly by the KTI including the HEMP generation & propagation analysis, optimal shelter design tool, essential EM energy attenuation in multi-layered various soils and rocks and HEMP filter design tool. Especially, the least square fitting method was adopted to analysis for the EM energy attenuation in the soils and rocks because it has a various characteristics based on the many times field test reports.

A Hardware Design Space Exploration toward Low-Area and High-Performance Architecture for the 128-bit Block Cipher Algorithm SEED (128-비트 블록 암호화 알고리즘 SEED의 저면적 고성능 하드웨어 구조를 위한 하드웨어 설계 공간 탐색)

  • Yi, Kang
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.13 no.4
    • /
    • pp.231-239
    • /
    • 2007
  • This paper presents the trade-off relationship between area and performance in the hardware design space exploration for the Korean national standard 128-bit block cipher algorithm SEED. In this paper, we compare the following four hardware design types of SEED algorithm : (1) Design 1 that is 16 round fully pipelining approach, (2) Design 2 that is a one round looping approach, (3) Design 3 that is a G function sharing and looping approach, and (4) Design 4 that is one round with internal 3 stage pipelining approach. The Design 1, Design 2, and Design 3 are the existing design approaches while the Design 4 is the newly proposed design in this paper. Our new design employs the pipeline between three G-functions and adders consisting of a F function, which results in the less area requirement than Design 2 and achieves the higher performance than Design 2 and Design 3 due to pipelining and module sharing techniques. We design and implement all the comparing four approaches with real hardware targeting FPGA for the purpose of exact performance and area analysis. The experimental results show that Design 4 has the highest performance except Design 1 which pursues very aggressive parallelism at the expanse of area. Our proposed design (Design 4) shows the best throughput/area ratio among all the alternatives by 2.8 times. Therefore, our new design for SEED is the most efficient design comparing with the existing designs.

Parallel Range Query processing on R-tree with Graphics Processing Units (GPU를 이용한 R-tree에서의 범위 질의의 병렬 처리)

  • Yu, Bo-Seon;Kim, Hyun-Duk;Choi, Won-Ik;Kwon, Dong-Seop
    • Journal of Korea Multimedia Society
    • /
    • v.14 no.5
    • /
    • pp.669-680
    • /
    • 2011
  • R-trees are widely used in various areas such as geographical information systems, CAD systems and spatial databases in order to efficiently index multi-dimensional data. As data sets used in these areas grow in size and complexity, however, range query operations on R-tree are needed to be further faster to meet the area-specific constraints. To address this problem, there have been various research efforts to develop strategies for acceleration query processing on R-tree by using the buffer mechanism or parallelizing the query processing on R-tree through multiple disks and processors. As a part of the strategies, approaches which parallelize query processing on R-tree through Graphics Processor Units(GPUs) have been explored. The use of GPUs may guarantee improved performances resulting from faster calculations and reduced disk accesses but may cause additional overhead costs caused by high memory access latencies and low data exchange rate between GPUs and the CPU. In this paper, to address the overhead problems and to adapt GPUs efficiently, we propose a novel approach which uses a GPU as a buffer to parallelize query processing on R-tree. The use of buffer algorithm can give improved performance by reducing the number of disk access and maximizing coalesced memory access resulting in minimizing GPU memory access latencies. Through the extensive performance studies, we observed that the proposed approach achieved up to 5 times higher query performance than the original CPU-based R-trees.

Persistent Page Table and File System Journaling Scheme for NVM Storage (비휘발성 메모리 저장장치를 위한 영속적 페이지 테이블 및 파일시스템 저널링 기법)

  • Ahn, Jae-hyeong;Hyun, Choul-seung;Lee, Dong-hee
    • Journal of IKEEE
    • /
    • v.23 no.1
    • /
    • pp.80-90
    • /
    • 2019
  • Even though Non-Volatile Memory (NVM) is used for data storage, a page table should be built to access data in it. And this observation leads us to the Persistent Page Table (PPT) scheme that keeps the page table in NVM persistently. By the way, processors have different page table structures and really operational page table cannot be built without virtual and physical addresses of NVM. However, those addresses are determined dynamically when NVM storage is attached to the system. Thus, the PPT should have system-independent and also address-independent structure and really working system-dependent page table should be built from the PPT. Moreover, entries of PPT should be updated atomically and, in this paper, we describe the design of PPT that meets those requirements. And we investigate how file systems can decrease the journaling overhead with the swap operation, which is a new operation created by the PPT. We modified the Ext4 file system in Linux and experiments conducted with Filebench workloads show that the swap operation enhances file system performance up to 60%.

A Security SoC embedded with ECDSA Hardware Accelerator (ECDSA 하드웨어 가속기가 내장된 보안 SoC)

  • Jeong, Young-Su;Kim, Min-Ju;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.26 no.7
    • /
    • pp.1071-1077
    • /
    • 2022
  • A security SoC that can be used to implement elliptic curve cryptography (ECC) based public-key infrastructures was designed. The security SoC has an architecture in which a hardware accelerator for the elliptic curve digital signature algorithm (ECDSA) is interfaced with the Cortex-A53 CPU using the AXI4-Lite bus. The ECDSA hardware accelerator, which consists of a high-performance ECC processor, a SHA3 hash core, a true random number generator (TRNG), a modular multiplier, BRAM, and control FSM, was designed to perform the high-performance computation of ECDSA signature generation and signature verification with minimal CPU control. The security SoC was implemented in the Zynq UltraScale+ MPSoC device to perform hardware-software co-verification, and it was evaluated that the ECDSA signature generation or signature verification can be achieved about 1,000 times per second at a clock frequency of 150 MHz. The ECDSA hardware accelerator was implemented using hardware resources of 74,630 LUTs, 23,356 flip-flops, 32kb BRAM, and 36 DSP blocks.

A Study on Pose Control for Inverted Pendulum System using PID Algorithm (PID 알고리즘을 이용한 역 진자 시스템의 자세 제어에 관한 연구)

  • Jin-Gu Kang
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.16 no.6
    • /
    • pp.400-405
    • /
    • 2023
  • Currently, inverted pendulums are being studied in many fields, including posture control of missiles, rockets, etc, and bipedal robots. In this study, the vertical posture control of the pendulum was studied by constructing a rotary inverted pendulum using a 256-pulse rotary encoder and a DC motor. In the case of nonlinear systems, complex algorithms and controllers are required, but a control method using the classic and relatively simple PID(Proportional Integral Derivation) algorithm was applied to the rotating inverted pendulum system, and a simple but desired method was studied. The rotating inverted pendulum system used in this study is a nonlinear and unstable system, and a PID controller using Microchip's dsPIC30F4013 embedded processor was designed and implemented in linear modeling. Usually, PID controllers are designed by combining one or two or more types, and have the advantage of having a simple structure compared to excellent control performance and that control gain adjustment is relatively easy compared to other controllers. In this study, the physical structure of the system was analyzed using mathematical methods and control for vertical balance of a rotating inverted pendulum was realized through modeling. In addition, the feasibility of controlling with a PID controller using a rotating inverted pendulum was verified through simulation and experiment.

A Hardware Implementation of the Underlying Field Arithmetic Processor based on Optimized Unit Operation Components for Elliptic Curve Cryptosystems (타원곡선을 암호시스템에 사용되는 최적단위 연산항을 기반으로 한 기저체 연산기의 하드웨어 구현)

  • Jo, Seong-Je;Kwon, Yong-Jin
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.8 no.1
    • /
    • pp.88-95
    • /
    • 2002
  • In recent years, the security of hardware and software systems is one of the most essential factor of our safe network community. As elliptic Curve Cryptosystems proposed by N. Koblitz and V. Miller independently in 1985, require fewer bits for the same security as the existing cryptosystems, for example RSA, there is a net reduction in cost size, and time. In this thesis, we propose an efficient hardware architecture of underlying field arithmetic processor for Elliptic Curve Cryptosystems, and a very useful method for implementing the architecture, especially multiplicative inverse operator over GF$GF (2^m)$ onto FPGA and futhermore VLSI, where the method is based on optimized unit operation components. We optimize the arithmetic processor for speed so that it has a resonable number of gates to implement. The proposed architecture could be applied to any finite field $F_{2m}$. According to the simulation result, though the number of gates are increased by a factor of 8.8, the multiplication speed We optimize the arithmetic processor for speed so that it has a resonable number of gates to implement. The proposed architecture could be applied to any finite field $F_{2m}$. According to the simulation result, though the number of gates are increased by a factor of 8.8, the multiplication speed and inversion speed has been improved 150 times, 480 times respectively compared with the thesis presented by Sarwono Sutikno et al. [7]. The designed underlying arithmetic processor can be also applied for implementing other crypto-processor and various finite field applications.

Design and Implementation of a Bluetooth Baseband Module with DMA Interface (DMA 인터페이스를 갖는 블루투스 기저대역 모듈의 설계 및 구현)

  • Cheon, Ik-Jae;O, Jong-Hwan;Im, Ji-Suk;Kim, Bo-Gwan;Park, In-Cheol
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.39 no.3
    • /
    • pp.98-109
    • /
    • 2002
  • Bluetooth technology is a publicly available specification proposed for Radio Frequency (RF) communication for short-range :1nd point-to-multipoint voice and data transfer. It operates in the 2.4㎓ ISM(Industrial, Scientific and Medical) band and offers the potential for low-cost, broadband wireless access for various mobile and portable devices at range of about 10 meters. In this paper, we describe the structure and the test results of the bluetooth baseband module with direct memory access method we have developed. This module consists of three blocks; link controller, UART interface, and audio CODEC. This module has a bus interface for data communication between this module and main processor and a RF interface for the transmission of bit-stream between this module and RF module. The bus interface includes DMA interface. Compared with the link controller with FIFOs, The module with DMA has a wide difference in size of module and speed of data processing. The small size module supplies lorr cost and various applications. In addition, this supports a firmware upgrade capability through UART. An FPGA and an ASIC implementation of this module, designed as soft If, are tested for file and bit-stream transfers between PCs.

Cache Memory and Replacement Algorithm Implementation and Performance Comparison

  • Park, Na Eun;Kim, Jongwan;Jeong, Tae Seog
    • Journal of the Korea Society of Computer and Information
    • /
    • v.25 no.3
    • /
    • pp.11-17
    • /
    • 2020
  • In this paper, we propose practical results for cache replacement policy by measuring cache hit and search time for each replacement algorithm through cache simulation. Thus, the structure of each cache memory and the four types of alternative policies of FIFO, LFU, LRU and Random were implemented in software to analyze the characteristics of each technique. The paper experiment showed that the LRU algorithm showed hit rate and search time of 36.044% and 577.936ns in uniform distribution, 45.636% and 504.692ns in deflection distribution, while the FIFO algorithm showed similar performance to the LRU algorithm at 36.078% and 554.772ns in even distribution and 45.662% and 489.574ns in bias distribution. Then LFU followed, Random algorithm was measured at 30.042% and 622.866ns at even distribution, 36.36% at deflection distribution and 553.878ns at lowest performance. The LRU replacement method commonly used in cache memory has the complexity of implementation, but it is the most efficient alternative to conventional alternative algorithms, indicating that it is a reasonable alternative method considering the reference information of data.