• Title/Summary/Keyword: 병렬 연산 처리 (parallel computation processing)


Design and Implementation of a Fault Simulation System for Mixed-level Combinational Logic Circuits (혼합형 조합 회로용 고장 시뮬레이션 시스템의 설계 및 구현)

  • Park, Yeong-Ho;Son, Jin-U;Park, Eun-Se
    • The Transactions of the Korea Information Processing Society / v.4 no.1 / pp.311-323 / 1997
  • This paper presents a fast fault simulation system for detecting stuck-at faults in mixed-level combinational logic circuits containing both gate-level and switch-level primitives. To be practical, the simulator is not restricted to static switch-level and/or gate-level circuits; it also handles dynamic switch-level circuits. To efficiently resolve multiple-signal contention at wired logic elements, we propose a six-valued logic system and logic calculus that are used together with signal-strength information. As the basic fault simulation algorithm, the well-known gate-level parallel pattern single fault propagation (PPSFP) technique is extended to switch-level circuits so that it handles pass-transistor circuits and precharged logic circuits as well as static CMOS circuits. Finally, we demonstrate the efficiency of the system with experimental results on switch-level ISCAS85 benchmark combinational circuits and various industrial mixed-level circuits.
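
The core PPSFP idea can be illustrated with a minimal sketch, assuming a plain two-valued gate-level circuit: every bit of one machine word holds a different test pattern, so one pass through the netlist simulates all packed patterns, and a single injected stuck-at fault is detected by XOR-ing faulty and fault-free outputs. The paper's six-valued switch-level logic with signal strengths is not modeled here, and the three-gate netlist and the names `eval_gate`, `simulate`, and `detecting_patterns` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (output net, gate type, input nets), listed in topological order.
Netlist = List[Tuple[str, str, List[str]]]

def eval_gate(kind: str, vals: List[int], mask: int) -> int:
    """Evaluate one gate on all packed patterns at once (bit k = pattern k)."""
    if kind in ("AND", "NAND"):
        r = mask
        for v in vals:
            r &= v
    elif kind in ("OR", "NOR"):
        r = 0
        for v in vals:
            r |= v
    elif kind == "NOT":
        r = vals[0]
    else:
        raise ValueError(f"unsupported gate {kind}")
    return (~r & mask) if kind in ("NAND", "NOR", "NOT") else r

def simulate(netlist: Netlist, inputs: Dict[str, int], mask: int,
             fault: Optional[Tuple[str, int]] = None) -> Dict[str, int]:
    """Two-valued parallel-pattern simulation; fault = (net, stuck value) forces that net."""
    def forced(net: str, value: int) -> int:
        if fault is not None and fault[0] == net:
            return mask if fault[1] == 1 else 0
        return value
    values = {net: forced(net, v) for net, v in inputs.items()}
    for out, kind, ins in netlist:
        values[out] = forced(out, eval_gate(kind, [values[n] for n in ins], mask))
    return values

def detecting_patterns(netlist: Netlist, inputs: Dict[str, int], mask: int,
                       outputs: List[str], fault: Tuple[str, int]) -> int:
    """Single fault propagation: bit k is set if pattern k makes some output differ."""
    good = simulate(netlist, inputs, mask)
    bad = simulate(netlist, inputs, mask, fault)
    diff = 0
    for o in outputs:
        diff |= good[o] ^ bad[o]
    return diff

# Hypothetical example circuit: y = NAND(NAND(a, b), NAND(b, c)) = a*b + b*c.
CIRCUIT: Netlist = [
    ("n1", "NAND", ["a", "b"]),
    ("n2", "NAND", ["b", "c"]),
    ("y", "NAND", ["n1", "n2"]),
]
# All 8 input combinations in one word: pattern k uses a = bit 2, b = bit 1, c = bit 0 of k.
MASK = 0xFF
INPUTS = {name: sum(((k >> shift) & 1) << k for k in range(8))
          for name, shift in (("a", 2), ("b", 1), ("c", 0))}
```

For example, `detecting_patterns(CIRCUIT, INPUTS, MASK, ["y"], ("n1", 1))` returns a word whose only set bit is pattern 6 (a=1, b=1, c=0), the single pattern that exposes n1 stuck-at-1.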


Image Pattern Classification and Recognition by Using the Associative Memory with Cellular Neural Networks (셀룰라 신경회로망의 연상메모리를 이용한 영상 패턴의 분류 및 인식방법)

  • Shin, Yoon-Cheol;Park, Yong-Hun;Kang, Hoon
    • Journal of the Korean Institute of Intelligent Systems / v.13 no.2 / pp.154-162 / 2003
  • In this paper, an associative memory built with Cellular Neural Networks (CNNs) is used as an image-processing operator to classify and recognize image patterns. Like other neural networks, a CNN processes nonlinear data in real time, but it is made of cells that, as in cellular automata, communicate directly with their neighboring cells. CNNs have been applied to optimization problems, associative memory, pattern recognition, and computer vision. CNN-based processing suits 2-D images because each cell corresponds to one pixel and all cells are processed simultaneously in parallel. This paper presents a method for designing a CNN-based associative memory that produces the output image by choosing the most appropriate weight pattern among all of the learned weight patterns stored in memory. Each template holds the weights between cells, which are updated through learning. The Hebbian rule is used to learn the template weights, and the LMS algorithm is used for classification.
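
The cell-level parallelism described above can be sketched with the standard Chua-Yang CNN state equation, dx/dt = -x + A*y + B*u + z, integrated by forward Euler on a grid of cells. This is a minimal sketch under that assumption: it shows only the cell dynamics with 3x3 feedback (A) and control (B) templates, not the paper's associative-memory design or its Hebbian/LMS training, and the function names are hypothetical.

```python
from typing import List

Grid = List[List[float]]
Template = List[List[float]]  # 3x3 weights over a cell's neighborhood

def output(x: float) -> float:
    """Standard piecewise-linear CNN output: linear in [-1, 1], saturating outside."""
    return 0.5 * (abs(x + 1.0) - abs(x - 1.0))

def neighborhood_sum(t: Template, grid: Grid, i: int, j: int) -> float:
    """Sum of t[di+1][dj+1] * grid[i+di][j+dj]; cells outside the grid count as 0."""
    rows, cols = len(grid), len(grid[0])
    s = 0.0
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            r, c = i + di, j + dj
            if 0 <= r < rows and 0 <= c < cols:
                s += t[di + 1][dj + 1] * grid[r][c]
    return s

def cnn_step(x: Grid, u: Grid, a: Template, b: Template, z: float, dt: float) -> Grid:
    """One Euler step for every cell, computed from the previous state only,
    so all cells are updated as if simultaneously."""
    y = [[output(v) for v in row] for row in x]
    return [
        [x[i][j] + dt * (-x[i][j] + neighborhood_sum(a, y, i, j)
                         + neighborhood_sum(b, u, i, j) + z)
         for j in range(len(x[0]))]
        for i in range(len(x))
    ]
```

Because each new cell value depends only on the old state of its 3x3 neighborhood, every cell's update is independent within a step, which is what makes the per-pixel mapping attractive for parallel hardware.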

A Design and Implementation of the Division/square-Root for a Redundant Floating Point Binary Number using High-Speed Quotient Selector (고속 지수 선택기를 이용한 여분 부동 소수점 이진수의 제산/스퀘어-루트 설계 및 구현)

  • 김종섭;조상복
    • Journal of the Institute of Electronics Engineers of Korea TE / v.37 no.5 / pp.7-16 / 2000
  • This paper describes the design and implementation of a division/square-root unit for redundant floating-point binary numbers using a high-speed quotient selector. The unit uses redundant binary addition at a 25 MHz clock speed. Because carry propagation is eliminated, two numbers can be added in constant time, independent of the word length. We developed a 16-bit VLSI circuit that applies this addition in each iterative step of the division and square-root operations. The circuit performs division and square root by redundant binary addition to the shifted binary number every 16 cycles, and it uses the nonrestoring method to obtain the quotient. So that it can be implemented as a simple circuit, the quotient selection logic uses only the three leading digits of the partial remainder. Performance is further improved by new quotient selection addition logic that processes the quotient decision field in parallel. The result is a 13% speed-up over previously presented schemes that use the same algorithms.
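
Why a few leading digits are enough for quotient selection can be seen in textbook radix-2 SRT division, shown here as a minimal sketch with exact rational arithmetic. This is a related, swapped-in technique, not the paper's 16-bit circuit: quotient digits come from the redundant set {-1, 0, 1}, and each digit is chosen only by comparing 2r against +-1/2, which needs just the leading bits of the partial remainder. The normalization 1/2 <= d < 1 and |x| < d is an assumption of this sketch.

```python
from fractions import Fraction
from typing import List, Tuple

HALF = Fraction(1, 2)

def srt_divide(x: Fraction, d: Fraction, n: int) -> Tuple[List[int], Fraction]:
    """Radix-2 SRT division producing n quotient digits in {-1, 0, 1}.

    Assumes a normalized divisor 1/2 <= d < 1 and |x| < d. The returned
    quotient q satisfies |q - x/d| <= 2**-n.
    """
    if not (HALF <= d < 1) or not (abs(x) < d):
        raise ValueError("expected 1/2 <= d < 1 and |x| < d")
    r = x
    digits: List[int] = []
    q = Fraction(0)
    for i in range(1, n + 1):
        r2 = 2 * r
        # Selection compares 2r only against +-1/2: the redundant digit 0
        # tolerates an inexact comparison, so a few leading bits suffice.
        if r2 >= HALF:
            digit = 1
        elif r2 < -HALF:
            digit = -1
        else:
            digit = 0
        r = r2 - digit * d          # partial remainder stays within [-d, d]
        digits.append(digit)
        q += Fraction(digit, 2 ** i)
    return digits, q
```

In hardware the subtraction `r2 - digit * d` is where a carry-free redundant adder pays off, since the remainder update then takes constant time regardless of word length.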


Efficient VLSI Architecture for Disparity Calculation based on Geodesic Support-weight (Geodesic Support-weight 기반 깊이정보 추출 알고리즘의 효율적인 VLSI 구조)

  • Ryu, Donghoon;Park, Taegeun
    • Journal of the Institute of Electronics and Information Engineers / v.52 no.9 / pp.45-53 / 2015
  • Adaptive support-weight algorithms produce better disparity maps than generic area-based algorithms and can also be implemented as real-time systems. In this paper, we propose a real-time system based on geodesic support-weight, which segments objects in the window better. The data scheduling is analyzed for an efficient hardware design with better performance, and a parallel architecture is proposed for the weight update, which has the longest delay. Based on careful error analysis, the exponential function is implemented efficiently with a simple step function. The proposed architecture is designed in Verilog HDL and synthesized with the Dongbu Hitek 0.18 µm standard cell library. The proposed system shows a 2.22% error rate and runs at an operating frequency of up to 260 MHz (25 fps) with 182K gates.
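
Replacing the exponential in the support weight with a step function can be sketched as a small lookup table: exp(-x) is approximated by exp(-floor(x/step)*step), so hardware needs only a shift or truncation and a ROM read. This is a minimal sketch; the step size, table range, and function names are assumptions, not values from the paper.

```python
import math
from typing import List

def build_step_table(step: float, limit: float) -> List[float]:
    """Precompute exp(-k*step) for k = 0 .. ceil(limit/step); hardware would hold these in a small ROM."""
    n = int(math.ceil(limit / step))
    return [math.exp(-k * step) for k in range(n + 1)]

def exp_neg_step(x: float, table: List[float], step: float) -> float:
    """Step-function approximation of exp(-x) for x >= 0.

    Uses exp(-floor(x/step)*step) and clamps to the last entry for large x,
    so the result never underestimates exp(-x) and overshoots by less than step.
    """
    if x < 0:
        raise ValueError("support-weight costs are assumed non-negative")
    k = int(x // step)
    return table[min(k, len(table) - 1)]
```

A support weight of the usual form exp(-cost/gamma) would then be read as `exp_neg_step(cost / gamma, table, step)`; choosing `step` is the error-versus-table-size tradeoff that an error analysis settles.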

Hardware Design of Bilateral Filter Based on Window Division (윈도우 분할 기반 양방향 필터의 하드웨어 설계)

  • Hyun, Yongho;Park, Taegeun
    • The Journal of Korean Institute of Communications and Information Sciences / v.41 no.12 / pp.1844-1850 / 2016
  • The bilateral filter reduces noise while preserving details by computing the output at each pixel as a weighted average of neighboring pixels. In this paper, we propose a real-time system based on window division. Overall performance is increased by a parallel architecture that computes five rows of the kernel window simultaneously, combined with pipelined scheduling. We consider the tradeoff between filter performance and hardware cost, and the bit allocation is determined by PSNR analysis. The proposed architecture is designed in Verilog HDL and synthesized with the Dongbu Hitek 110nm standard cell library. It achieves a throughput of 416 Mpixels/s (397 fps) at an operating frequency of 416 MHz with 132K gates.
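
A direct, unoptimized bilateral filter makes the per-pixel computation concrete: each output is a weighted average in which a spatial Gaussian term falls off with distance and a range Gaussian term falls off with intensity difference, so pixels across an edge contribute almost nothing. This is a minimal sketch on a grayscale image; a radius of 2 (a 5x5 window, matching the five kernel rows mentioned above) is an assumption, and the window-division hardware scheme itself is not reproduced.

```python
import math
from typing import List

Image = List[List[float]]

def bilateral_filter(img: Image, radius: int, sigma_s: float, sigma_r: float) -> Image:
    """Direct bilateral filter on a grayscale image; borders use only in-image neighbors."""
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            center = img[i][j]
            num = den = 0.0
            # Each window row (di) is independent: a hardware design can
            # compute the rows in parallel and sum the partial results.
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    r, c = i + di, j + dj
                    if 0 <= r < rows and 0 <= c < cols:
                        spatial = math.exp(-(di * di + dj * dj) / (2 * sigma_s ** 2))
                        rng = math.exp(-((img[r][c] - center) ** 2) / (2 * sigma_r ** 2))
                        w = spatial * rng
                        num += w * img[r][c]
                        den += w
            out[i][j] = num / den  # den >= 1 because the center pixel has weight 1
    return out
```

With a small `sigma_r`, a sharp step edge passes through essentially unchanged while flat regions are smoothed, which is the detail-preserving behavior the abstract describes.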

A Study on the Instruction Set Architecture of Multimedia Extension Processor (멀티미디어 확장 프로세서의 명령어 집합 구조에 관한 연구)

  • O, Myeong-Hun;Lee, Dong-Ik;Park, Seong-Mo
    • Journal of the Institute of Electronics Engineers of Korea SD / v.38 no.6 / pp.420-435 / 2001
  • With the recent rapid growth of multimedia technology, much research has been done on processing multimedia data efficiently with general-purpose processors. In this paper, we propose multimedia instructions that process multimedia data effectively, together with a processor architecture for those instructions. The processor is described in Verilog HDL at the behavioral level and simulated with CADENCE™ tools. The 48 proposed multimedia instructions fall into 7 groups. Multimedia data have a 64-bit format and are processed as parallel subwords: eight 8-bit bytes, four 16-bit halfwords, or two 32-bit words. The modeled processor is based on the Integer Unit of SPARC V9 and has a five-stage pipelined RISC architecture following the Harvard principle.
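
Subword parallelism on a 64-bit word can be sketched in software with the well-known SWAR trick for a partitioned add: the low 7 bits of every byte lane are added together, then the lane MSBs are patched in with XOR, so no carry crosses a lane boundary. This is a minimal sketch of an eight-lane 8-bit wraparound add; it is not one of the paper's 48 instructions, and the helper names are hypothetical.

```python
LANE_HIGH_BITS = 0x8080808080808080  # MSB of each of the eight 8-bit lanes
LANE_LOW_BITS = 0x7F7F7F7F7F7F7F7F   # low 7 bits of each lane
WORD_MASK = 0xFFFFFFFFFFFFFFFF

def padd8(a: int, b: int) -> int:
    """Add eight packed unsigned bytes of two 64-bit words, wrapping modulo 256 per lane."""
    # Adding only the low 7 bits cannot overflow a lane; the carry lands in the lane's bit 7.
    low = (a & LANE_LOW_BITS) + (b & LANE_LOW_BITS)
    # Bit 7 of each lane = carry-in XOR a7 XOR b7; the carry out of bit 7 is discarded.
    return (low ^ ((a ^ b) & LANE_HIGH_BITS)) & WORD_MASK

def pack8(lanes: list) -> int:
    """Pack eight byte values into one 64-bit word, lane 0 in the least significant byte."""
    word = 0
    for k, v in enumerate(lanes):
        word |= (v & 0xFF) << (8 * k)
    return word

def unpack8(word: int) -> list:
    """Split a 64-bit word into its eight byte lanes."""
    return [(word >> (8 * k)) & 0xFF for k in range(8)]
```

A dedicated multimedia instruction performs the same lane-isolated addition in a single operation by cutting the adder's carry chain at lane boundaries, which is what makes the 8 x 8-bit, 4 x 16-bit, and 2 x 32-bit formats cheap to support.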


Fast GF(2^m) Multiplier Architecture Based on Common Factor Post-Processing Method (공통인수 후처리 방식에 기반한 고속 유한체 곱셈기)

  • 문상국
    • Journal of the Korea Institute of Information and Communication Engineering / v.8 no.6 / pp.1188-1193 / 2004
  • So far, there have been broadly three types of studies on GF(2^m) multiplier architecture: serial multiplication, array multiplication, and hybrid multiplication. Serial multiplication was first suggested by Mastrovito (1) and is known as the basic GF(2^m) multiplication architecture; this method was adopted in the array multiplier (2), which uses m times as many resources in parallel to obtain m times the speed. In 1999, Paar extended this work to combine the benefits of both architectures, presenting the hybrid multiplication architecture (3). However, the hybrid architecture has the drawback that it applies only to composite fields, i.e., fields GF(2^m) whose degree m is composite. In this paper, we propose a novel serial multiplier architecture based on Mastrovito's, obtained by modifying the numerical formula of polynomial-basis serial multiplication. The proposed architecture was described and implemented in HDL, so it was simulated and verified at the hardware level as well as in software. When the numerical expression is partitioned into t parts, the implemented GF(2^m) multiplier is t times as fast as the traditional one.
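
The baseline that serial GF(2^m) multipliers start from can be sketched as bit-serial polynomial-basis multiplication: one bit of b is consumed per step, MSB first, by shifting the accumulator (multiplying by x), reducing modulo the field polynomial when the degree reaches m, and XOR-adding a when the bit is 1. This is a minimal sketch of that generic serial scheme only; the paper's common-factor post-processing and its t-part partitioning are not reproduced, and the example field polynomial x^4 + x + 1 is an assumption.

```python
def gf2m_mul(a: int, b: int, m: int, poly: int) -> int:
    """Bit-serial polynomial-basis multiplication in GF(2^m), processing b MSB first.

    a and b are field elements (< 2**m); poly is the full irreducible
    polynomial including the x^m term, e.g. 0b10011 for x^4 + x + 1.
    """
    result = 0
    for i in range(m - 1, -1, -1):
        result <<= 1                  # multiply the accumulator by x
        if (result >> m) & 1:         # degree reached m: reduce modulo poly
            result ^= poly
        if (b >> i) & 1:              # addition in GF(2) is XOR
            result ^= a
    return result
```

Each step costs one shift, one conditional reduction, and one conditional XOR, so a full product takes m cycles; processing several bits of b per cycle is the usual way to trade area for speed in such designs.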

A Cache Consistency Control for B-Tree Indices in a Database Sharing System (데이타베이스 공유 시스템에서 B-트리 인덱스를 위한 캐쉬 일관성 제어)

  • On, Gyeong-O;Jo, Haeng-Rae
    • The KIPS Transactions:PartD / v.8D no.5 / pp.593-604 / 2001
  • A database sharing system (DSS) is a system for high-performance transaction processing. In a DSS, the processing nodes are coupled via a high-speed network and share a common database at the disk level. Each node has its own local memory and a separate copy of the operating system. To reduce the number of disk accesses, each node caches data pages and index pages in its memory buffer. In general, B-tree index pages are accessed more often than their corresponding data pages and are thus cached at more processing nodes. The B-tree also supports complicated operations such as Fetch, Fetch Next, Insertion, and Deletion. Therefore, an efficient cache consistency scheme that supports a high level of concurrency is required. In this paper, we propose cache consistency schemes that use the identifiers of index pages and the page_LSN of leaf pages. The proposed schemes can improve system throughput by reducing the message traffic between nodes and the number of index re-traversals.
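
The role of a page_LSN in cache validation can be illustrated with a hypothetical, simplified sketch: each cached copy remembers the LSN it was read at, and a copy is treated as stale when the current LSN of that page is newer. This is a generic LSN comparison, not the paper's protocol; in particular, the sketch reads the current LSN directly from a shared object, whereas a real DSS would learn it through a message or lock grant, and the class and method names are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

@dataclass
class SharedDisk:
    """Hypothetical stand-in for the shared database: current contents and page_LSN per page."""
    page_lsn: Dict[int, int] = field(default_factory=dict)
    contents: Dict[int, str] = field(default_factory=dict)

    def write(self, page_id: int, data: str) -> int:
        """Apply an update; each update advances the page's LSN."""
        self.page_lsn[page_id] = self.page_lsn.get(page_id, 0) + 1
        self.contents[page_id] = data
        return self.page_lsn[page_id]

@dataclass
class NodeBuffer:
    """One node's buffer; each cached entry keeps the page_LSN it was read at."""
    disk: SharedDisk
    cache: Dict[int, Tuple[int, str]] = field(default_factory=dict)
    fetches: int = 0

    def read(self, page_id: int) -> str:
        current_lsn = self.disk.page_lsn.get(page_id, 0)
        cached = self.cache.get(page_id)
        if cached is None or cached[0] < current_lsn:
            # Missing or stale copy: re-read the page and remember its LSN.
            self.cache[page_id] = (current_lsn, self.disk.contents.get(page_id, ""))
            self.fetches += 1
        return self.cache[page_id][1]
```

The point of comparing LSNs rather than always re-reading is that an unchanged leaf page can be reused without transferring the page again, which is the kind of traffic saving the abstract targets.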


Non-Photorealistic Rendering Using CUDA-Based Image Segmentation (CUDA 기반 영상 분할을 사용한 비사실적 렌더링)

  • Yoon, Hyun-Cheol;Park, Jong-Seung
    • KIPS Transactions on Software and Data Engineering / v.4 no.11 / pp.529-536 / 2015
  • When three-dimensional objects and photo images are rendered together, the non-photorealistic rendering (NPR) results are visually inconsistent because the two kinds of content have independent color distributions. This paper proposes a non-photorealistic rendering technique that renders both three-dimensional objects and photo images in styles such as cartoons and sketches. The proposed technique computes the color distribution of the photo images and reduces the number of colors in both the photo images and the 3D objects. NPR is then performed based on the reduced colormaps and edge features. To present natural scenes better, image region segmentation is preferably applied when extracting and applying the colormaps. However, image segmentation is computationally expensive, so non-photorealistic rendering of large frames takes a long time. To speed up the time-consuming segmentation procedure, we use GPGPU parallel computing on the GPU. As a result, the execution speed of the algorithm is significantly improved.
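
Reducing the number of colors can be sketched with plain k-means palette quantization, which is a swapped-in stand-in rather than the paper's segmentation-based colormap extraction. The sketch is useful because its assignment step is independent per pixel, exactly the kind of work that maps onto one GPU thread per pixel. The deterministic initialization (first k distinct colors) and the function names are assumptions for illustration.

```python
from typing import List, Tuple

Color = Tuple[float, float, float]

def dist2(p: Color, q: Color) -> float:
    """Squared Euclidean distance between two RGB colors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def reduce_colors(pixels: List[Color], k: int, iterations: int = 10) -> Tuple[List[Color], List[int]]:
    """k-means palette reduction: returns the palette and each pixel's palette index."""
    # Deterministic initialization: the first k distinct colors in the image.
    palette: List[Color] = []
    for p in pixels:
        if p not in palette:
            palette.append(p)
        if len(palette) == k:
            break
    labels = [0] * len(pixels)
    for _ in range(iterations):
        # Assignment step: each pixel independently picks its nearest palette entry
        # (the data-parallel part a GPU kernel would run with one thread per pixel).
        labels = [min(range(len(palette)), key=lambda c: dist2(p, palette[c])) for p in pixels]
        # Update step: move each palette entry to the mean of the pixels assigned to it.
        for c in range(len(palette)):
            members = [p for p, lab in zip(pixels, labels) if lab == c]
            if members:
                palette[c] = tuple(sum(ch) / len(members) for ch in zip(*members))
    return palette, labels
```

Applying the same palette to both the photo and the rendered 3D objects is one simple way to give the two kinds of content a shared color distribution.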

Priority-based Multi-DNN scheduling framework for autonomous vehicles (자율주행차용 우선순위 기반 다중 DNN 모델 스케줄링 프레임워크)

  • Cho, Ho-Jin;Hong, Sun-Pyo;Kim, Myung-Sun
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.3 / pp.368-376 / 2021
  • With the recent development of deep learning technology, autonomous things technology is attracting attention, and DNNs are widely used in embedded systems such as drones and autonomous vehicles. Embedded systems are being released that can perform large-scale operations and run multiple DNNs for high recognition accuracy without relying on the cloud. The DNNs within these systems have various levels of priority. DNNs related to safety-critical applications of autonomous vehicles have the highest priority and must be handled first. In this paper, we propose a priority-based scheduling framework for multiple DNNs executed simultaneously. Even if a low-priority DNN started executing first, a high-priority DNN can preempt it, which guarantees the fast response required by safety-critical applications of autonomous vehicles. Extensive experiments on an actual commercial board show a performance improvement of up to 76.6%.
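
Priority-based preemption among DNNs can be illustrated with a small discrete-event sketch of a single accelerator. It assumes preemption happens only at layer boundaries (the abstract does not state the preemption granularity) and that a smaller number means higher priority; `DnnJob`, `schedule`, and the job names are hypothetical.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class DnnJob:
    name: str
    priority: int        # smaller number = higher priority (assumed convention)
    arrival: int         # time at which the job becomes ready
    layers: List[int]    # execution time of each layer; preemption only between layers

def schedule(jobs: List[DnnJob]) -> Dict[str, int]:
    """Return each job's finish time when the accelerator always runs the next
    layer of the highest-priority ready job."""
    pending = sorted(jobs, key=lambda j: j.arrival)
    ready: List[Tuple[int, int, int, DnnJob]] = []   # (priority, arrival, index, job)
    next_layer = {j.name: 0 for j in jobs}
    finish: Dict[str, int] = {}
    time, idx = 0, 0
    while idx < len(pending) or ready:
        # Admit every job that has arrived by now; the index breaks ties.
        while idx < len(pending) and pending[idx].arrival <= time:
            j = pending[idx]
            heapq.heappush(ready, (j.priority, j.arrival, idx, j))
            idx += 1
        if not ready:
            time = pending[idx].arrival              # idle until the next arrival
            continue
        _, _, _, job = ready[0]
        time += job.layers[next_layer[job.name]]     # run one layer without interruption
        next_layer[job.name] += 1
        if next_layer[job.name] == len(job.layers):
            heapq.heappop(ready)
            finish[job.name] = time
    return finish
```

For example, if a low-priority job with three 4-unit layers starts at time 0 and a high-priority job with two 2-unit layers arrives at time 1, the high-priority job takes over after the first layer and finishes at time 8 instead of waiting until time 16.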