• Title/Summary/Keyword: implementation algorithm

Search Result 4,233, Processing Time 0.028 seconds

Efficient Collaboration Method Between CPU and GPU for Generating All Possible Cases in Combination (조합에서 모든 경우의 수를 만들기 위한 CPU와 GPU의 효율적 협업 방법)

  • Son, Ki-Bong;Son, Min-Young;Kim, Young-Hak
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.7 no.9
    • /
    • pp.219-226
    • /
    • 2018
  • One of the systematic ways to generate the number of all cases is a combination to construct a combination tree, and its time complexity is O($2^n$). A combination tree is used for various purposes such as the graph homogeneity problem, the initial model for calculating frequent item sets, and so on. However, algorithms that must search the number of all cases of a combination are difficult to use realistically due to high time complexity. Nevertheless, as the amount of data becomes large and various studies are being carried out to utilize the data, the number of cases of searching all cases is increasing. Recently, as the GPU environment becomes popular and can be easily accessed, various attempts have been made to reduce time by parallelizing algorithms having high time complexity in a serial environment. Because the method of generating the number of all cases in combination is sequential and the size of sub-task is biased, it is not suitable for parallel implementation. The efficiency of parallel algorithms can be maximized when all threads have tasks with similar size. In this paper, we propose a method to efficiently collaborate between CPU and GPU to parallelize the problem of finding the number of all cases. In order to evaluate the performance of the proposed algorithm, we analyze the time complexity in the theoretical aspect, and compare the experimental time of the proposed algorithm with other algorithms in CPU and GPU environment. Experimental results show that the proposed CPU and GPU collaboration algorithm maintains a balance between the execution time of the CPU and GPU compared to the previous algorithms, and the execution time is improved remarkable as the number of elements increases.

A Crypto-processor Supporting Multiple Block Cipher Algorithms (다중 블록 암호 알고리듬을 지원하는 암호 프로세서)

  • Cho, Wook-Lae;Kim, Ki-Bbeum;Bae, Gi-Chur;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.20 no.11
    • /
    • pp.2093-2099
    • /
    • 2016
  • This paper describes a design of crypto-processor that supports multiple block cipher algorithms of PRESENT, ARIA, and AES. The crypto-processor integrates three cores that are PRmo (PRESENT with mode of operation), AR_AS (ARIA_AES), and AES-16b. The PRmo core implementing 64-bit block cipher PRESENT supports key length 80-bit and 128-bit, and four modes of operation including ECB, CBC, OFB, and CTR. The AR_AS core supporting key length 128-bit and 256-bit integrates two 128-bit block ciphers ARIA and AES into a single data-path by utilizing resource sharing technique. The AES-16b core supporting key length 128-bit implements AES with a reduced data-path of 16-bit for minimizing hardware. Each crypto-core contains its own on-the-fly key scheduler, and consecutive blocks of plaintext/ciphertext can be processed without reloading key. The crypto-processor was verified by FPGA implementation. The crypto-processor implemented with a $0.18{\mu}m$ CMOS cell library occupies 54,500 gate equivalents (GEs), and it can operate with 55 MHz clock frequency.

Design and Analysis of a Digit-Serial $AB^{2}$ Systolic Arrays in $GF(2^{m})$ ($GF(2^{m})$ 상에서 새로운 디지트 시리얼 $AB^{2}$ 시스톨릭 어레이 설계 및 분석)

  • Kim Nam-Yeun;Yoo Kee-Young
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.32 no.4
    • /
    • pp.160-167
    • /
    • 2005
  • Among finite filed arithmetic operations, division/inverse is known as a basic operation for public-key cryptosystems over $GF(2^{m})$ and it is computed by performing the repetitive $AB^{2}$ multiplication. This paper presents a digit-serial-in-serial-out systolic architecture for performing the $AB^2$ operation in GF$(2^{m})$. To obtain L×L digit-serial-in-serial-out architecture, new $AB^{2}$ algorithm is proposed and partitioning, index transformation and merging the cell of the architecture, which is derived from the algorithm, are proposed. Based on the area-time product, when the digit-size of digit-serial architecture, L, is selected to be less than about m, the proposed digit-serial architecture is efficient than bit-parallel architecture, and L is selected to be less than about $(1/5)log_{2}(m+1)$, the proposed is efficient than bit-serial. In addition, the area-time product complexity of pipelined digit-serial $AB^{2}$ systolic architecture is approximately $10.9\%$ lower than that of nonpipelined one, when it is assumed that m=160 and L=8. Additionally, since the proposed architecture can be utilized for the basic architecture of crypto-processor and it is well suited to VLSI implementation because of its simplicity, regularity and pipelinability.

Implementation of High-radix Modular Exponentiator for RSA using CRT (CRT를 이용한 하이래딕스 RSA 모듈로 멱승 처리기의 구현)

  • 이석용;김성두;정용진
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.10 no.4
    • /
    • pp.81-93
    • /
    • 2000
  • In a methodological approach to improve the processing performance of modulo exponentiation which is the primary arithmetic in RSA crypto algorithm, we present a new RSA hardware architecture based on high-radix modulo multiplication and CRT(Chinese Remainder Theorem). By implementing the modulo multiplier using radix-16 arithmetic, we reduced the number of PE(Processing Element)s by quarter comparing to the binary arithmetic scheme. This leads to having the number of clock cycles and the delay of pipelining flip-flops be reduced by quarter respectively. Because the receiver knows p and q, factors of N, it is possible to apply the CRT to the decryption process. To use CRT, we made two s/2-bit multipliers operating in parallel at decryption, which accomplished 4 times faster performance than when not using the CRT. In encryption phase, the two s/2-bit multipliers can be connected to make a s-bit linear multiplier for the s-bit arithmetic operation. We limited the encryption exponent size up to 17-bit to maintain high speed, We implemented a linear array modulo multiplier by projecting horizontally the DG of Montgomery algorithm. The H/W proposed here performs encryption with 15Mbps bit-rate and decryption with 1.22Mbps, when estimated with reference to Samsung 0.5um CMOS Standard Cell Library, which is the fastest among the publications at present.

A Study on AR Algorithm Modeling for Indoor Furniture Interior Arrangement Using CNN

  • Ko, Jeong-Beom;Kim, Joon-Yong
    • Journal of the Korea Society of Computer and Information
    • /
    • v.27 no.10
    • /
    • pp.11-17
    • /
    • 2022
  • In this paper, a model that can increase the efficiency of work in arranging interior furniture by applying augmented reality technology was studied. In the existing system to which augmented reality is currently applied, there is a problem in that information is limitedly provided depending on the size and nature of the company's product when outputting the image of furniture. To solve this problem, this paper presents an AR labeling algorithm. The AR labeling algorithm extracts feature points from the captured images and builds a database including indoor location information. A method of detecting and learning the location data of furniture in an indoor space was adopted using the CNN technique. Through the learned result, it is confirmed that the error between the indoor location and the location shown by learning can be significantly reduced. In addition, a study was conducted to allow users to easily place desired furniture through augmented reality by receiving detailed information about furniture along with accurate image extraction of furniture. As a result of the study, the accuracy and loss rate of the model were found to be 99% and 0.026, indicating the significance of this study by securing reliability. The results of this study are expected to satisfy consumers' satisfaction and purchase desires by accurately arranging desired furniture indoors through the design and implementation of AR labels.

Design and Implementation of a Fault-Tolerant Caching System for Dynamic Heterogeneous Cache Server Networks (동적 이기종 캐시 서버 네트워크에서의 내결함성 캐싱 시스템 설계 및 구현)

  • Hyeon-Gi Kim;Gyu-Sik Ham;Jin-Woo Kim;Soo-Young Jang;Chang-Beom Choi
    • Journal of IKEEE
    • /
    • v.28 no.3
    • /
    • pp.458-464
    • /
    • 2024
  • This study proposes a fault-tolerant caching system to address the issue of caching content imbalance caused by the dynamic departure and participation of cache servers in a heterogeneous cache server network, and validates it in both real and virtual environments. With the increase of large-scale media content requiring various types and resolutions, the necessity of cache servers as key components to reduce response time to user requests and alleviate network load has been growing. In particular, research on heterogeneous cache server networks utilizing edge computing and low-power devices has been actively conducted recently. However, in such environments, the irregular departure and participation of cache servers can occur frequently, leading to content imbalance among the cache servers deployed in the network, which can degrade the performance of the cache server network. The fault-tolerant caching algorithm proposed in this study ensures stable service quality by maintaining balance among media contents even when cache servers depart. Experimental results confirmed that the proposed algorithm effectively maintains content distribution despite the departure of cache servers. Additionally, we built a network composed of seven heterogeneous cache servers to verify the practicality of the proposed caching system and demonstrated its performance and scalability through a large-scale cache server network in a virtual environment.

Design and Implementation of a Concuuuency Control Manager for Main Memory Databases (주기억장치 데이터베이스를 위한 동시성 제어 관리자의 설계 및 구현)

  • Kim, Sang-Wook;Jang, Yeon-Jeong;Kim, Yun-Ho;Kim, Jin-Ho;Lee, Seung-Sun;Choi, Wan
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.25 no.4B
    • /
    • pp.646-680
    • /
    • 2000
  • In this paper, we discuss the design and implementation of a concurrency control manager for a main memory DBMS(MMDBMS). Since an MMDBMS, unlike a disk-based DBMS, performs all of data update or retrieval operations by accessing main memory only, the portion of the cost for concurrency control in the total cost for a data update or retrieval is fairly high. Thus, the development of an efficient concurrency control manager highly accelerates the performance of the entire system. Our concurrency control manager employs the 2-phase locking protocol, and has the following characteristics. First, it adapts the partition, an allocation unit of main memory, as a locking granule, and thus, effectively adjusts the trade-off between the system concurrency and locking cost through the analysis of applications. Second, it enjoys low locking costs by maintaining the lock information directly in the partition itself. Third, it provides the latch as a mechanism for physical consistency of system data. Our latch supports both of the shared and exclusive modes, and maximizes the CPU utilization by combining the Bakery algorithm and Unix semaphore facility. Fourth, for solving the deadlock problem, it periodically examines whether a system is in a deadlock state using lock waiting information. In addition, we discuss various issues arising in development such as mutual exclusion of a transaction table, mutual exclusion of indexes and system catalogs, and realtime application supports.

  • PDF

Smart Ship Container With M2M Technology (M2M 기술을 이용한 스마트 선박 컨테이너)

  • Sharma, Ronesh;Lee, Seong Ro
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.38C no.3
    • /
    • pp.278-287
    • /
    • 2013
  • Modern information technologies continue to provide industries with new and improved methods. With the rapid development of Machine to Machine (M2M) communication, a smart container supply chain management is formed based on high performance sensors, computer vision, Global Positioning System (GPS) satellites, and Globle System for Mobile (GSM) communication. Existing supply chain management has limitation to real time container tracking. This paper focuses on the studies and implementation of real time container chain management with the development of the container identification system and automatic alert system for interrupts and for normal periodical alerts. The concept and methods of smart container modeling are introduced together with the structure explained prior to the implementation of smart container tracking alert system. Firstly, the paper introduces the container code identification and recognition algorithm implemented in visual studio 2010 with Opencv (computer vision library) and Tesseract (OCR engine) for real time operation. Secondly it discusses the current automatic alert system provided for real time container tracking and the limitations of those systems. Finally the paper summarizes the challenges and the possibilities for the future work for real time container tracking solutions with the ubiquitous mobile and satellite network together with the high performance sensors and computer vision. All of those components combine to provide an excellent delivery of supply chain management with outstanding operation and security.

Parallel Distributed Implementation of GHT on Ethernet Multicluster (이더넷 다중 클러스터에서 GHT의 병렬 분산 구현)

  • Kim, Yeong-Soo;Kim, Myung-Ho;Choi, Heung-Moon
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.46 no.3
    • /
    • pp.96-106
    • /
    • 2009
  • Extending the scale of the distributed processing in a single Ethernet cluster is physically restricted by maximum ports per switch. This paper presents an implementation of MPI-based multicluster consisting of multiple Ethernet switches for extending the scale of distributed processing, and a asymptotical analysis for communication overhead through execution-time analysis model. To determine an optimum task partitioning, we analyzed the processing time for various partitioning schemes, and AAP(accumulator array partitioning) scheme was finally chosen to minimize the overall communication overhead. The scope of data partitioned in AAP was modified to fit for incremented nodes, and suitable load balancing algorithm was implemented. We tried to alleviate the communication overhead through exploiting the pipelined broadcast and flat-tree based result gathering, and overlapping of the communication and the computation time. We used the linear pipeline broadcast to reduce the communication overhead in intercluster which is interconnected by a single link. Experimental results shows nearly linear speedup by the proposed parallel distributed GHT implemented on MPI-based Ethernet multicluster with four 100Mbps Ethernet switches and up to 128 nodes of Pentium PC.

Performance Analysis of Implementation on Image Processing Algorithm for Multi-Access Memory System Including 16 Processing Elements (16개의 처리기를 가진 다중접근기억장치를 위한 영상처리 알고리즘의 구현에 대한 성능평가)

  • Lee, You-Jin;Kim, Jea-Hee;Park, Jong-Won
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.49 no.3
    • /
    • pp.8-14
    • /
    • 2012
  • Improving the speed of image processing is in great demand according to spread of high quality visual media or massive image applications such as 3D TV or movies, AR(Augmented reality). SIMD computer attached to a host computer can accelerate various image processing and massive data operations. MAMS is a multi-access memory system which is, along with multiple processing elements(PEs), adequate for establishing a high performance pipelined SIMD machine. MAMS supports simultaneous access to pq data elements within a horizontal, a vertical, or a block subarray with a constant interval in an arbitrary position in an $M{\times}N$ array of data elements, where the number of memory modules(MMs), m, is a prime number greater than pq. MAMS-PP4 is the first realization of the MAMS architecture, which consists of four PEs in a single chip and five MMs. This paper presents implementation of image processing algorithms and performance analysis for MAMS-PP16 which consists of 16 PEs with 17 MMs in an extension or the prior work, MAMS-PP4. The newly designed MAMS-PP16 has a 64 bit instruction format and application specific instruction set. The author develops a simulator of the MAMS-PP16 system, which implemented algorithms can be executed on. Performance analysis has done with this simulator executing implemented algorithms of processing images. The result of performance analysis verifies consistent response of MAMS-PP16 through the pyramid operation in image processing algorithms comparing with a Pentium-based serial processor. Executing the pyramid operation in MAMS-PP16 results in consistent response of processing time while randomly response time in a serial processor.