• 제목/요약/키워드: 병렬 연산 처리

Search Result 552, Processing Time 0.028 seconds

Effcient Neural Network Architecture for Fat Target Detection and Recognition (목표물의 고속 탐지 및 인식을 위한 효율적인 신경망 구조)

  • Weon, Yong-Kwan;Baek, Yong-Chang;Lee, Jeong-Su
    • The Transactions of the Korea Information Processing Society
    • /
    • v.4 no.10
    • /
    • pp.2461-2469
    • /
    • 1997
  • Target detection and recognition problems, in which neural networks are widely used, require translation invariant and real-time processing in addition to the requirements that general pattern recognition problems need. This paper presents a novel architecture that meets the requirements and explains effective methodology to train the network. The proposed neural network is an architectural extension of the shared-weight neural network that is composed of the feature extraction stage followed by the pattern recognition stage. Its feature extraction stage performs correlational operation on the input with a weight kernel, and the entire neural network can be considered a nonlinear correlation filter. Therefore, the output of the proposed neural network is correlational plane with peak values at the location of the target. The architecture of this neural network is suitable for implementing with parallel or distributed computers, and this fact allows the application to the problems which require realtime processing. Net training methodology to overcome the problem caused by unbalance of the number of targets and non-targets is also introduced. To verify the performance, the proposed network is applied to detection and recognition problem of a specific automobile driving around in a parking lot. The results show no false alarms and fast processing enough to track a target that moves as fast as about 190 km per hour.

  • PDF

A Pipelined Hash Join Method for Load Balancing (부하 균형 유지를 고려한 파이프라인 해시 조인 방법)

  • Moon, Jin-Gue;Park, No-Sang;Kim, Pyeong-Jung;Jin, Seong-Il
    • The KIPS Transactions:PartD
    • /
    • v.9D no.5
    • /
    • pp.755-768
    • /
    • 2002
  • We investigate the effect of the data skew of join attributes on the performance of a pipelined multi-way hash join method, and propose two new hash join methods with load balancing capabilities. The first proposed method allocates buckets statically by round-robin fashion, and the second one allocates buckets adaptively via a frequency distribution. Using hash-based joins, multiple joins can be pipelined so that the early results from a join, before the whole join is completed, are sent to the next join processing without staying on disks. Unless the pipelining execution of multiple hash joins includes some load balancing mechanisms, the skew effect can severely deteriorate system performance. In this paper, we derive an execution model of the pipeline segment and a cost model, and develop a simulator for the study. As shown by our simulation with a wide range of parameters, join selectivities and sizes of relations deteriorate the system performance as the degree of data skew is larger. But the proposed method using a large number of buckets and a tuning technique can offer substantial robustness against a wide range of skew conditions.

A Load Balancing Method using Partition Tuning for Pipelined Multi-way Hash Join (다중 해시 조인의 파이프라인 처리에서 분할 조율을 통한 부하 균형 유지 방법)

  • Mun, Jin-Gyu;Jin, Seong-Il;Jo, Seong-Hyeon
    • Journal of KIISE:Databases
    • /
    • v.29 no.3
    • /
    • pp.180-192
    • /
    • 2002
  • We investigate the effect of the data skew of join attributes on the performance of a pipelined multi-way hash join method, and propose two new harsh join methods in the shared-nothing multiprocessor environment. The first proposed method allocates buckets statically by round-robin fashion, and the second one allocates buckets dynamically via a frequency distribution. Using harsh-based joins, multiple joins can be pipelined to that the early results from a join, before the whole join is completed, are sent to the next join processing without staying in disks. Shared nothing multiprocessor architecture is known to be more scalable to support very large databases. However, this hardware structure is very sensitive to the data skew. Unless the pipelining execution of multiple hash joins includes some dynamic load balancing mechanism, the skew effect can severely deteriorate the system performance. In this parer, we derive an execution model of the pipeline segment and a cost model, and develop a simulator for the study. As shown by our simulation with a wide range of parameters, join selectivities and sizes of relations deteriorate the system performance as the degree of data skew is larger. But the proposed method using a large number of buckets and a tuning technique can offer substantial robustness against a wide range of skew conditions.

Implementation of RSA modular exponentiator using Division Chain (나눗셈 체인을 이용한 RSA 모듈로 멱승기의 구현)

  • 김성두;정용진
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.12 no.2
    • /
    • pp.21-34
    • /
    • 2002
  • In this paper we propos a new hardware architecture of modular exponentiation using a division chain method which has been proposed in (2). Modular exponentiation using the division chain is performed by receding an exponent E as a mixed form of multiplication and addition with divisors d=2 or $d=2^I +1$ and respective remainders r. This calculates the modular exponentiation in about $1.4log_2$E multiplications on average which is much less iterations than $2log_2$E of conventional Binary Method. We designed a linear systolic array multiplier with pipelining and used a horizontal projection on its data dependence graph. So, for k-bit key, two k-bit data frames can be inputted simultaneously and two modular multipliers, each consisting of k/2+3 PE(Processing Element)s, can operate in parallel to accomplish 100% throughput. We propose a new encoding scheme to represent divisors and remainders of the division chain to keep regularity of the data path. When it is synthesized to ASIC using Samsung 0.5 um CMOS standard cell library, the critical path delay is 4.24ns, and resulting performance is estimated to be abort 140 Kbps for a 1024-bit data frame at 200Mhz clock In decryption process, the speed can be enhanced to 560kbps by using CRT(Chinese Remainder Theorem). Futhermore, to satisfy real time requirements we can choose small public exponent E, such as 3,17 or $2^{16} +1$, in encryption and verification process. in which case the performance can reach 7.3Mbps.

Performance Optimization of Numerical Ocean Modeling on Cloud Systems (클라우드 시스템에서 해양수치모델 성능 최적화)

  • JUNG, KWANGWOOG;CHO, YANG-KI;TAK, YONG-JIN
    • The Sea:JOURNAL OF THE KOREAN SOCIETY OF OCEANOGRAPHY
    • /
    • v.27 no.3
    • /
    • pp.127-143
    • /
    • 2022
  • Recently, many attempts to run numerical ocean models in cloud computing environments have been tried actively. A cloud computing environment can be an effective means to implement numerical ocean models requiring a large-scale resource or quickly preparing modeling environment for global or large-scale grids. Many commercial and private cloud computing systems provide technologies such as virtualization, high-performance CPUs and instances, ether-net based high-performance-networking, and remote direct memory access for High Performance Computing (HPC). These new features facilitate ocean modeling experimentation on commercial cloud computing systems. Many scientists and engineers expect cloud computing to become mainstream in the near future. Analysis of the performance and features of commercial cloud services for numerical modeling is essential in order to select appropriate systems as this can help to minimize execution time and the amount of resources utilized. The effect of cache memory is large in the processing structure of the ocean numerical model, which processes input/output of data in a multidimensional array structure, and the speed of the network is important due to the communication characteristics through which a large amount of data moves. In this study, the performance of the Regional Ocean Modeling System (ROMS), the High Performance Linpack (HPL) benchmarking software package, and STREAM, the memory benchmark were evaluated and compared on commercial cloud systems to provide information for the transition of other ocean models into cloud computing. Through analysis of actual performance data and configuration settings obtained from virtualization-based commercial clouds, we evaluated the efficiency of the computer resources for the various model grid sizes in the virtualization-based cloud systems. We found that cache hierarchy and capacity are crucial in the performance of ROMS using huge memory. The memory latency time is also important in the performance. Increasing the number of cores to reduce the running time for numerical modeling is more effective with large grid sizes than with small grid sizes. Our analysis results will be helpful as a reference for constructing the best computing system in the cloud to minimize time and cost for numerical ocean modeling.

Performance of Initial Timing Acquisition in the DS-UWB Systems with Different Transmit Pulse Shaping Filters (DS-UWB 시스템에서 송신 필터에 따른 초기 동기 획득 성능 비교)

  • Kang, Kyu-Min
    • The Journal of Korean Institute of Electromagnetic Engineering and Science
    • /
    • v.20 no.5
    • /
    • pp.493-502
    • /
    • 2009
  • In this paper, we compare the performance of initial timing acquisition in direct sequence ultra-wideband(DS-UWB) systems with different transmit pulse shaping filters through extensive computer simulations. Simulation results show that the timing acquisition performance of the DS-UWB system, whose chip rate is 1.32 Gchip/s, employing a rectangular transmit filter is similar to that employing a square root raised cosine(SRRC) filter with an interpolation factor of 4 in the realistic UWB channels(CM1 and CM3) as well as the additive white Gaussian noise(AWGN) channel. Additionally, we present both a 24-parallel digital correlator structure and a 24-parallel processing searcher operating at a 55 MHz system clock, and then briefly discuss the initial timing acquisition procedure. Because we can adopt an 1.32 Gsample/s digital-to-analog(D/A) converter and an 1.32 Gsample/s analog-to-digital(AID) converter in the DS-UWB system by employing the rectangular transmit filter, we have a realistic solution for the DS-UWB chipset development.

CMOS Image Automatic Exposure System With Real-time and Robustness Style for the Journal of Korean Contents (실시간성과 강건성을 갖는 CMOS 자동노출 시스템)

  • Choi, Wonseok;Kim, HeeSu;Kim, Jaehyun;Cho, Youngki;Choi, Sungho;Lee, Yongseon
    • The Journal of the Korea Contents Association
    • /
    • v.20 no.10
    • /
    • pp.1-13
    • /
    • 2020
  • There are many factors that influence the image quality of CMOS camera images, among which the image exposure time is an important factor. If the image exposure time is long, the entire image on the screen becomes brighter. If the exposure time is shorter, the entire image becomes darker. When photographing a still image, real time is not required because the automatic exposure system is given sufficient time to obtain an appropriate exposure time. However, if the surroundings and environment change rapidly like the black box of a driving car, the exposure time should be applied in response to real time. To this end, a robust automatic exposure system for real-time performance and ambient light environment is required. An automatic exposure system that has real-time capability and is robust against the ambient light environment is required. we designed a real-time control sysem capable of parallel operation processing through the design of an embedded system using zynq's logic and ARM core, and developed a real-time CMOS automatic exposure system that is robust to noise and converges to a desired target value within 66 ms through PID control.

Variable Radix-Two Multibit Coding and Its VLSI Implementation of DCT/IDCT (가변길이 다중비트 코딩을 이용한 DCT/IDCT의 설계)

  • 김대원;최준림
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.39 no.12
    • /
    • pp.1062-1070
    • /
    • 2002
  • In this paper, variable radix-two multibit coding algorithm is presented and applied in the implementation of discrete cosine transform(DCT) and inverse discrete cosine transform(IDCT). Variable radix-two multibit coding means the 2k SD (signed digit) representation of overlapped multibit scanning with variable shift method. SD represented by 2k generates partial products, which can be easily implemented with shifters and adders. This algorithm is most powerful for the hardware implementation of DCT/IDCT with constant coefficient matrix multiplication. This paper introduces the suggested algorithm, it's proof and the implementation of DCT/IDCT The implemented IDCT chip with 8 PEs(Processing Elements) and one transpose memory runs at a tate of 400 Mpixels/sec at 54MHz frequency for high speed parallel signal processing, and it's verified in HDTV and MPEG decoder.

A Design of AES-based CCMP Core for IEEE 802.11i Wireless LAN Security (IEEE 802.11i 무선 랜 보안을 위한 AES 기반 CCMP Core 설계)

  • Hwang Seok-Ki;Lee Jin-Woo;Kim Chay-Hyeun;Song You-Soo;Shin Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.9 no.4
    • /
    • pp.798-803
    • /
    • 2005
  • This paper describes a design of AES(Advanced Encryption Standard)-based CCMP core for IEEE 802.1li wireless LAN security. To maximize its performance, two AES cores ate used, one is for counter mode for data confidentiality and the other is for CBC(Cipher Block Chaining) mode for authentication and data integrity. The S-box that requires the largest hardware in AES core is implemented using composite field arithmetic, and the gate count is reduced by about $20\%$ compared with conventional LUT(Lookup Table)-based design. The CCMP core designed in Verilog-HDL has 13,360 gates, and the estimated throughput is about 168 Mbps at 54-MHz clock frequency. The functionality of the CCMP core is verified by Excalibur SoC implementation.

Research on efficient HW/SW co-design method of light-weight cryptography using GEZEL (경량화 암호의 GEZEL을 이용한 효율적인 하드웨어/소프트웨어 통합 설계 기법에 대한 연구)

  • Kim, Sung-Gon;Kim, Hyun-Min;Hong, Seok-Hie
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.24 no.4
    • /
    • pp.593-605
    • /
    • 2014
  • In this paper, we propose the efficient HW/SW co-design method of light-weight cryptography such as HIGHT, PRESENT and PRINTcipher using GEZEL. At first the symmetric cryptographic algorithms were designed using the GEZEL language which is efficiently used for HW/SW co-design. And for the improvement of performance the HW optimization theory such as unfolding, retiming and so forth were adapted to the cryptographic HW module conducted by FSMD. Also, the operation modes of those algorithms were implemented using C language in 8051 microprocessor, it can be compatible to various platforms. For providing reliable communication between HW/SW and preventing the time delay the improved handshake protocol was chosen for enhancing the performance of the connection between HW/SW. The improved protocol can process the communication-core and cryptography-core on the HW in parallel so that the messages can be transmitted to SW after HW operation and received from SW during encryption operation.