• Title/Summary/Keyword: Parallel Implementation


Design and Hardware Implementation of High-Speed Variable-Length RSA Cryptosystem (가변길이 고속 RSA 암호시스템의 설계 및 하드웨어 구현)

  • 박진영;서영호;김동욱
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.27 no.9C
    • /
    • pp.861-870
    • /
    • 2002
  • In this paper, targeting the operational-speed drawback of RSA, a new 1024-bit RSA cryptosystem is proposed and implemented in hardware to increase the operating speed and to perform variable-length encryption. The proposed cryptosystem consists mainly of a modular exponentiation part and a modular multiplication part. For the modular exponentiation, the RL-binary method, which performs squaring and modular multiplication in parallel, was improved and then applied. A 4-stage CSA structure and the radix-4 Booth algorithm were applied to support variable-length operation and to reduce the number of partial products in the modular multiplication. The proposed RSA cryptosystem, which can operate on up to 1024 bits at a time, was mapped into an integrated circuit using the Hynix Phantom Cell Library for the Hynix 0.35 μm 2-poly 4-metal CMOS process. The results of a software implementation, written prior to the hardware work, were used to verify the operation of the hardware system. The hardware implementation occupied about 190k gates and operated at a clock frequency of 150 MHz. Considering a variable-length modulus, the data rate of the proposed scheme is about one and a half times that of previous works. Therefore, the proposed high-speed variable-length RSA cryptosystem should be usable in various information security systems that require high-speed operation.
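
The core of the speed improvement described above is RL-binary (right-to-left) exponentiation, in which the conditional multiply and the squaring of each iteration are independent and can run on separate modular multipliers. A minimal software sketch of that idea, not the authors' hardware design, is shown below using small textbook RSA parameters.

```python
# Minimal sketch of RL-binary modular exponentiation (not the paper's hardware).
# In each loop iteration the conditional multiply uses the *current* square, so
# it does not depend on the squaring that follows; the two modular
# multiplications can therefore be executed in parallel in hardware.
def rl_binary_modexp(base: int, exponent: int, modulus: int) -> int:
    result = 1
    square = base % modulus
    while exponent > 0:
        if exponent & 1:
            result = (result * square) % modulus    # multiply step
        square = (square * square) % modulus        # squaring step
        exponent >>= 1
    return result

# Toy check with classic textbook RSA parameters (p=61, q=53, n=3233, e=17, d=2753)
n, e, d = 3233, 17, 2753
m = 65
c = rl_binary_modexp(m, e, n)           # encrypt
assert rl_binary_modexp(c, d, n) == m   # decrypt recovers the message
```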

Research Priorities to Support Mandatory Implementation of a Total Pollutant Load Management System (TPLMS) in the Han River Basin (한강수계 의무적 수질오염총량관리제시행지원을 위한 조사·연구의 우선순위 설정)

  • Lee, Chang-Hee;Lee, Bum-Yeon;Lee, Su-Woong
    • Journal of Environmental Policy
    • /
    • v.8 no.4
    • /
    • pp.25-36
    • /
    • 2009
  • The Total Pollutant Load Management System (TPLMS) in the Han River basin is being changed from a voluntary to a mandatory system. Accordingly, this study suggests directions and priorities for research that can support implementation of the TPLMS through an objective approach that deploys gap analysis and the analytic hierarchy process (AHP). Gap analysis indicated that TPLMS in Korea is still focused on compliance with regulations, and that implementation of TPLMS is still in its early stage. Improvements are thus needed in flexibility and effectiveness, including the introduction of emissions rights trading and upgrading to a renewable emissions permit system. The AHP study indicated that R&D will need to proceed in parallel in multiple areas to improve systems and resolve scientific uncertainties. Balanced R&D will be needed in both the institutional and technical groups. Subgroup analysis indicated that developing a reasonable process to establish water quality management targets is of the highest priority in the institutional group. In the technical group, higher priority will need to be given to improving model reliability and developing innovative pollution load reduction technologies.


A Fast Processor Architecture and 2-D Data Scheduling Method to Implement the Lifting Scheme 2-D Discrete Wavelet Transform (리프팅 스킴의 2차원 이산 웨이브릿 변환 하드웨어 구현을 위한 고속 프로세서 구조 및 2차원 데이터 스케줄링 방법)

  • Kim Jong Woog;Chong Jong Wha
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.42 no.4 s.334
    • /
    • pp.19-28
    • /
    • 2005
  • In this paper, we propose a fast parallel 2-D discrete wavelet transform hardware architecture based on the lifting scheme. The proposed architecture improves the 2-D processing speed and reduces the internal buffer memory size. Previous lifting-scheme-based parallel 2-D wavelet transform architectures consisted of row-direction and column-direction modules, each a pair of prediction and update filter modules. In the 2-D wavelet transform, column-direction processing uses the row-direction results, which are generated in row-direction rather than column-direction order, so most hardware architectures need internal buffer memory. The proposed architecture focuses on reducing the internal buffer memory size and the total calculation time. To reduce the total calculation time, we propose a 4-way data flow scheduling and a memory-based parallel hardware architecture. The 4-way data flow scheduling increases row-direction parallelism and reduces the initial latency of the row-direction calculation. In this architecture, the internal buffer memory is not used to store the results of the row-direction calculation; instead it holds intermediate values of the column-direction calculation. This is very effective in column-direction processing, because the input data for the column direction are not generated in column-direction order. The proposed architecture was implemented in VHDL on an Altera Stratix device. The implementation results show that the overall calculation time is reduced from $N^2/2+\alpha$ to $N^2/4+\beta$, and that the internal buffer memory size is reduced by around $50\%$ compared with previous works.
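
A minimal software sketch of the lifting idea underlying the architecture above is given below, assuming the LeGall 5/3 wavelet and periodic boundary extension for brevity (the paper describes a hardware data flow, not this code).

```python
import numpy as np

# Minimal sketch of one level of a lifting-scheme 2-D DWT (LeGall 5/3), assuming
# periodic boundary extension for brevity. The lifting scheme replaces filtering
# by a predict step (odd samples predicted from even neighbours) and an update
# step, applied first along rows and then along columns.
def lift_53_1d(x: np.ndarray) -> np.ndarray:
    even, odd = x[0::2].astype(float), x[1::2].astype(float)
    d = odd - 0.5 * (even + np.roll(even, -1))   # predict: detail coefficients
    a = even + 0.25 * (d + np.roll(d, 1))        # update: approximation coefficients
    return np.concatenate([a, d])

def lift_53_2d(img: np.ndarray) -> np.ndarray:
    rows = np.apply_along_axis(lift_53_1d, 1, img)   # row-direction pass
    return np.apply_along_axis(lift_53_1d, 0, rows)  # column-direction pass

tile = np.arange(64, dtype=float).reshape(8, 8)
coeffs = lift_53_2d(tile)   # four subbands after one decomposition level
```

The column pass consumes the row-pass outputs in row order, which is exactly the ordering mismatch that the 4-way scheduling and the reduced internal buffer in the paper are designed around.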

Parallel Processing of Satellite Images using CUDA Library: Focused on NDVI Calculation (CUDA 라이브러리를 이용한 위성영상 병렬처리 : NDVI 연산을 중심으로)

  • LEE, Kang-Hun;JO, Myung-Hee;LEE, Won-Hee
    • Journal of the Korean Association of Geographic Information Studies
    • /
    • v.19 no.3
    • /
    • pp.29-42
    • /
    • 2016
  • Remote sensing allows the acquisition of information over a large area without physical contact with the observed objects, and it has therefore developed rapidly and been applied in many fields. With this development, satellite image resolution has advanced rapidly, and satellite remote sensing has been used for research in many areas of the world. However, while remote sensing research is being carried out in various areas, research on data processing remains insufficient; as satellite resources are further developed, data processing continues to lag behind. Accordingly, this paper discusses how to maximize the performance of satellite image processing by utilizing NVIDIA's CUDA (Compute Unified Device Architecture) library, a parallel processing technique. The discussion proceeds as follows. First, standard KOMPSAT (Korea Multi-Purpose Satellite) images of various sizes are subdivided into five types. NDVI (Normalized Difference Vegetation Index) is then computed for the subdivided images using ArcMap and two implementations, one CPU-based and one GPU-based. The histograms of the images are compared after each implementation, and the processing speeds of the CPU and GPU versions are analyzed. The results indicate that both the CPU-based and GPU-based images match the ArcMap images, and the histogram comparison confirms that the NDVI code was implemented correctly. In terms of processing speed, the GPU was about 5 times faster than the CPU. Accordingly, this research shows that a parallel processing technique using the CUDA library can enhance the processing speed of satellite images, and that such processing would benefit more advanced remote sensing techniques even more than a simple per-pixel computation like NDVI.
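
NDVI itself is a simple per-pixel expression, which is why it parallelizes so well on a GPU. A minimal NumPy sketch of the computation (not the paper's CUDA code) is shown below, with hypothetical band values.

```python
import numpy as np

# Minimal sketch: NDVI = (NIR - Red) / (NIR + Red), computed per pixel.
# Every pixel is independent, so a CUDA kernel can assign one thread per pixel.
def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    nir = nir.astype(np.float32)
    red = red.astype(np.float32)
    denom = nir + red
    out = np.zeros_like(denom)
    # avoid division by zero on no-data pixels
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out  # values in [-1, 1]

# toy 2x2 example with hypothetical reflectance values
nir_band = np.array([[0.50, 0.60], [0.30, 0.00]])
red_band = np.array([[0.10, 0.20], [0.30, 0.00]])
print(ndvi(nir_band, red_band))
```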

R Based Parallelization of a Climate Suitability Model to Predict Suitable Area of Maize in Korea (국내 옥수수 재배적지 예측을 위한 R 기반의 기후적합도 모델 병렬화)

  • Hyun, Shinwoo;Kim, Kwang Soo
    • Korean Journal of Agricultural and Forest Meteorology
    • /
    • v.19 no.3
    • /
    • pp.164-173
    • /
    • 2017
  • Alternative cropping systems would be one of the options for climate change adaptation. Suitable areas for a crop can be identified using a climate suitability model. The EcoCrop model has been used to assess the climate suitability of crops using monthly climate surfaces, e.g., digital climate maps at high spatial resolution. Still, a high-performance computing approach is needed to assess climate suitability while taking into account the complex terrain in Korea, which requires considerably large climate data sets. The objectives of this study were to implement a script for R, an open-source statistical analysis platform, so that the EcoCrop model can be run in a parallel computing environment, and to assess the climate suitability of maize using digital climate maps at high spatial resolution, e.g., 1 km. The total running time decreased as the number of CPU (Central Processing Unit) cores increased, although the speedup was not linear. For example, the wall-clock time for assessing the climate suitability index at 1 km spatial resolution was reduced by 90% with 16 CPU cores. However, computing the climate suitability index took about 1.5 times the theoretical time for the given number of CPU cores. Implementation of a climate suitability assessment system based on MPI (Message Passing Interface) would allow support for digital climate maps at ultra-high spatial resolution, e.g., 30 m, which would help the site-specific design of cropping systems for climate change adaptation.
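
The parallelization pattern is to split the climate grid into chunks and evaluate the suitability index for each chunk on a separate core. The sketch below illustrates that pattern in Python rather than R (the paper uses an R script), with a hypothetical suitability_index() standing in for the EcoCrop-style calculation.

```python
from multiprocessing import Pool
import numpy as np

# Illustrative sketch only: the per-cell function and its thresholds are hypothetical.
def suitability_index(monthly_temps: np.ndarray) -> float:
    # toy placeholder: fraction of months inside an assumed optimal range
    return float(np.mean((monthly_temps >= 15.0) & (monthly_temps <= 30.0)))

def process_chunk(chunk: np.ndarray) -> np.ndarray:
    # chunk: (n_cells, 12) monthly climate values for a block of grid cells
    return np.array([suitability_index(cell) for cell in chunk])

if __name__ == "__main__":
    grid = np.random.uniform(-5, 35, size=(100_000, 12))  # stand-in climate grid
    chunks = np.array_split(grid, 16)                      # one chunk per core
    with Pool(processes=16) as pool:
        result = np.concatenate(pool.map(process_chunk, chunks))
```

Overheads of this kind of scheme (splitting, scheduling, and gathering results) are one plausible reason the observed running time exceeds the theoretical time for a given core count.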

Status and Perspective of Biomass Co-firing to Pulverized Coal Power Plants (미분탄 석탄화력발전에서의 바이오매스 혼소 동향 및 전망)

  • Yang, Won
    • KEPCO Journal on Electric Power and Energy
    • /
    • v.2 no.4
    • /
    • pp.525-529
    • /
    • 2016
  • Biomass co-firing in existing thermal power plants is one of the most economical and efficient ways to reduce $CO_2$ emissions from a plant. There are several methods of co-firing, which can be categorized as (1) parallel co-firing, (2) indirect co-firing, and (3) direct co-firing. Parallel co-firing is the most expensive route to high-ratio co-firing because it requires a dedicated biomass boiler. Direct co-firing is widely used because it requires lower capital cost than the other two methods. Direct co-firing can in turn be classified into three methods. Method 1 does not need retrofitting of facilities because it uses the existing coal mills to pulverize the biomass fuel; in this case a high co-firing ratio cannot be achieved because of the poor grindability of biomass fuels. Method 2 needs dedicated biomass mills and revision of the fuel streams of the combustion system, and Method 3 needs additional retrofitting of the boiler as well as biomass mills; it can achieve the highest share of biomass co-firing of the three methods. In Korea, many coal power plants have adopted Method 1 to cope with RPS (Renewable Portfolio Standards). Higher co-firing ratios (>5% thermal share) have not been considered in Korean power plants because of the policy limiting biomass co-firing for securing RECs (Renewable Energy Certificates). On the other hand, high-share co-firing of biomass using dedicated biomass mills is widely used in Europe and the US, following policies in those countries to enhance the utilization of renewable energy. Technical problems that can be caused by an increasing share of biomass in coal power plants are summarized and discussed in this report. $CO_2$ abatement has become a more and more critical issue for coal power plants since the Paris Agreement (2015), and demand for a higher share of biomass in coal power plants will rapidly increase in Korea as well. Torrefaction of biomass can be one of the best options, because torrefied biomass has a higher heating value and better grindability than other biomass fuels. The outlook for biomass torrefaction for co-firing is discussed; the economic feasibility of torrefaction will be crucial for the implementation of this technology.

SSQUSAR : A Large-Scale Qualitative Spatial Reasoner Using Apache Spark SQL (SSQUSAR : Apache Spark SQL을 이용한 대용량 정성 공간 추론기)

  • Kim, Jonghoon;Kim, Incheol
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.6 no.2
    • /
    • pp.103-116
    • /
    • 2017
  • In this paper, we present the design and implementation of a large-scale qualitative spatial reasoner that can efficiently derive new qualitative spatial knowledge representing both topological and directional relationships between two arbitrary spatial objects using Apache Spark SQL. Apache Spark SQL is well known as a distributed parallel programming environment that provides both efficient join operations and query processing functions over a variety of data in Hadoop cluster computer systems. In our spatial reasoner, the overall reasoning process is divided into six jobs: knowledge encoding, inverse reasoning, equal reasoning, transitive reasoning, relation refining, and knowledge decoding; the execution order of these jobs is then determined in consideration of both logical causal relationships and computational efficiency. The knowledge encoding job reduces the size of the knowledge base to reason over by transforming the input knowledge from XML/RDF form into a more precise form. Repetition of the transitive reasoning and relation refining jobs usually consumes most of the computational time and storage of the overall reasoning process. To improve these jobs, our reasoner finds the minimal disjunctive relations for qualitative spatial reasoning and, based upon them, not only reduces the composition table used for the transitive reasoning job but also optimizes the relation refining job. Through experiments using a large-scale benchmark spatial knowledge base, the proposed reasoner showed high performance and scalability.
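
The transitive reasoning job described above is essentially a repeated self-join over a relation table. A minimal Spark SQL sketch of one such round is shown below (not the SSQUSAR implementation; the example triples and view name are made up).

```python
from pyspark.sql import SparkSession

# Minimal sketch: one round of transitive (composition-based) reasoning as a
# self-join in Spark SQL. Relations are (subject, relation, object) triples.
spark = SparkSession.builder.appName("qsr-sketch").getOrCreate()

triples = spark.createDataFrame(
    [("A", "inside", "B"), ("B", "inside", "C"), ("C", "northOf", "D")],
    ["s", "r", "o"],
)
triples.createOrReplaceTempView("rel")

# Join pairs of triples that share a middle object: (A r1 B) and (B r2 C).
derived = spark.sql("""
    SELECT a.s AS s, a.r AS r1, b.r AS r2, b.o AS o
    FROM rel a JOIN rel b ON a.o = b.s
""")
derived.show()
# In the real reasoner, each (r1, r2) pair would be mapped through a (reduced)
# composition table to the resulting relation(s), duplicates removed, and the
# step repeated together with relation refining until a fixed point is reached.
```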

Implementation of Massive FDTD Simulation Computing Model Based on MPI Cluster for Semi-conductor Process (반도체 검증을 위한 MPI 기반 클러스터에서의 대용량 FDTD 시뮬레이션 연산환경 구축)

  • Lee, Seung-Il;Kim, Yeon-Il;Lee, Sang-Gil;Lee, Cheol-Hoon
    • The Journal of the Korea Contents Association
    • /
    • v.15 no.9
    • /
    • pp.21-28
    • /
    • 2015
  • In the semiconductor process, a simulation is performed to detect defects by analyzing the behavior of impurities through physical-quantity calculations of the inner elements. To perform the simulation, the Finite-Difference Time-Domain (FDTD) algorithm is used. As semiconductors composed of nanoscale elements continue to improve, the size of the simulation keeps growing. Problems arise in which a single processor such as a CPU or GPU cannot perform the simulation because of the massive matrix size, or a computer consisting of multiple processors cannot handle a massive FDTD. To address these problems, studies have turned to parallel/distributed computing; however, in the past only a single type of processor was used. A GPU is fast but has limited memory, whereas a CPU is slower than a GPU. To solve this problem, we implemented a computing model that can handle an FDTD simulation of any size on a cluster consisting of heterogeneous processors. We tested the simulation using MPI libraries based on point-to-point communication and verified that it operates correctly regardless of the number and type of nodes. We also analyzed the performance by measuring the total execution time and the time of specific stages of the simulation in each test.
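
In a distributed FDTD run, each node updates its own sub-domain and exchanges only the boundary (ghost) cells with its neighbours via point-to-point messages. The sketch below shows that halo-exchange pattern with mpi4py on a 1-D decomposition; it illustrates the communication pattern only and is not the paper's code.

```python
from mpi4py import MPI
import numpy as np

# Minimal sketch of 1-D domain decomposition with point-to-point halo exchange.
# Run with e.g. `mpiexec -n 4 python halo_sketch.py` (the file name is arbitrary).
comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

local_n = 1000                       # interior cells owned by this rank
field = np.zeros(local_n + 2)        # +2 ghost cells at the ends
field[1:-1] = rank                   # dummy initial values

left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

for step in range(10):
    # exchange boundary cells with neighbours (ghost-cell update)
    comm.Sendrecv(sendbuf=field[1:2], dest=left, recvbuf=field[-1:], source=right)
    comm.Sendrecv(sendbuf=field[-2:-1], dest=right, recvbuf=field[0:1], source=left)
    # toy averaging update standing in for the FDTD E/H field stencil
    field[1:-1] = 0.5 * (field[:-2] + field[2:])
```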

An Implementation of 3D Graphic Accelerator for Phong Shading (퐁 음영법을 위한 3차원 그래픽 가속기의 구현)

  • Lee, Hyung;Park, Youn-Ok;Park, Jong-Won
    • Journal of Korea Multimedia Society
    • /
    • v.3 no.5
    • /
    • pp.526-534
    • /
    • 2000
  • There has been much research on 3D graphic accelerators for high speed, driven by the needs of CAD/CAM, 3D modeling, virtual reality, and medical imaging. In this paper, an SIMD processor architecture for a 3D graphic accelerator is proposed in order to improve the processing time of 3D graphics, and a parallel Phong shading algorithm is presented to estimate the performance of the proposed architecture. The proposed SIMD processor architecture consists of a PCI local bus interface, 16 Processing Elements (PEs), and Park's multi-access memory system (MAMS), which has 17 memory modules. A serial algorithm for Phong shading is modified for the architecture; the main idea is to divide a polygon into $4\times4$ squares. For processing a square, 4 PEs are logically regarded as a PE Group. Since the MAMS supports block access with interval 1, 4 PE Groups can process a square at a time; in consequence, 16 pixels are processed simultaneously. The proposed SIMD processor architecture was simulated with Cadence Verilog-XL, a hardware simulation package. With the same simulation results as the serial algorithm, the speedup of the parallel algorithm over the serial one is 5.68.
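
Each pixel's Phong intensity depends only on its own interpolated normal, so the 16 pixels of a $4\times4$ square can be shaded independently; that is what the 16-PE SIMD organization exploits. A minimal per-tile sketch of the Phong model itself (not the paper's hardware) is given below.

```python
import numpy as np

# Minimal sketch of the Phong model I = ka + kd*(N.L) + ks*(R.V)^n, assuming
# unit light intensities, evaluated over a 4x4 tile of pixels. The coefficients
# and vectors below are example values.
def phong_tile(normals, light_dir, view_dir,
               ka=0.1, kd=0.6, ks=0.3, shininess=16.0):
    # normals: (4, 4, 3) interpolated unit normals for the tile
    n_dot_l = np.clip(np.einsum("ijk,k->ij", normals, light_dir), 0.0, None)
    reflect = 2.0 * n_dot_l[..., None] * normals - light_dir   # R = 2(N.L)N - L
    r_dot_v = np.clip(np.einsum("ijk,k->ij", reflect, view_dir), 0.0, None)
    return ka + kd * n_dot_l + ks * r_dot_v ** shininess        # scalar intensity

normals = np.tile(np.array([0.0, 0.0, 1.0]), (4, 4, 1))         # flat facet
light = np.array([0.0, 0.5, 1.0]); light /= np.linalg.norm(light)
view = np.array([0.0, 0.0, 1.0])
print(phong_tile(normals, light, view))
```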


Implementation of Hardware Data Prefetcher Adaptable for Various State-of-the-Art Workload (다양한 최신 워크로드에 적용 가능한 하드웨어 데이터 프리페처 구현)

  • Kim, KangHee;Park, TaeShin;Song, KyungHwan;Yoon, DongSung;Choi, SangBang
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.53 no.12
    • /
    • pp.20-35
    • /
    • 2016
  • In this paper, in order to reduce the delay and area of the partial product accumulation (PPA) of a parallel decimal multiplier, a tree architecture composed of multi-operand decimal CSAs and an improved CLA is proposed. The proposed tree using multi-operand CSAs reduces the partial products quickly. Since the input range of the CSA recoder is limited, the CSA can use the simplest logic. In addition, using the multi-operand decimal CSAs to add decimal numbers that have a limited range at specific locations of the architecture reduces the partial products efficiently. The final BCD result can also be obtained faster by improving the logic of the decimal CLA. To evaluate the performance of the proposed partial product accumulation, synthesis was performed using Design Compiler with a 180 nm CMOS technology library. The synthesis results show that the delay of the proposed partial product accumulation is reduced by 15.6% and the area by 16.2% compared with the general method. The total delay and area are still reduced even though the delay and area of the CLA increase.
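
The heart of the scheme is carry-save accumulation: a CSA compresses three operands into a sum word and a carry word without propagating carries, so a tree of CSAs can reduce many partial products before a single final carry-propagate (CLA) addition. The sketch below illustrates that idea in binary in software; the paper's CSAs operate on decimal (BCD) digits in hardware.

```python
# Illustrative binary carry-save sketch (the paper uses decimal/BCD CSAs).
def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    s = a ^ b ^ c                        # bitwise sum without carry propagation
    carry = (a & b) | (b & c) | (a & c)  # majority function gives the carry bits
    return s, carry << 1                 # carries enter the next bit position

def csa_tree_sum(operands: list[int]) -> int:
    ops = list(operands)
    while len(ops) > 2:                  # 3:2 compression until two words remain
        a, b, c = ops.pop(), ops.pop(), ops.pop()
        s, carry = carry_save_add(a, b, c)
        ops += [s, carry]
    return sum(ops)                      # final carry-propagate addition (CLA role)

partials = [13, 7, 21, 5, 9]             # stand-ins for partial products
assert csa_tree_sum(partials) == sum(partials)
```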