• Title/Summary/Keyword: 병렬처리 알고리즘

Search Result 697, Processing Time 0.029 seconds

Parallel clustering technology for real-time LWIR band image processing (실시간 LWIR 밴드 영상 처리를 위한 병렬 클러스터링 기술)

  • Cho, Yongjin;Lee, Kyou-seung;Hong, Seongha;Oh, Jong-woo;Lee, DongHoon
    • Proceedings of the Korean Society for Agricultural Machinery Conference
    • /
    • 2017.04a
    • /
    • pp.158-158
    • /
    • 2017
  • 비닐포장 하부에 위치한 콩의 생장 초기에 발생한 초엽을 인식하기 위한 연구를 수행중이다. 선행 연구에서 비닐포장에 접촉한 콩 초엽으로 인해 비닐포장 상부 표면의 열 반응 분포에 변화가 있음을 발견하였다. 현장에서 주행 중에 콩 초엽의 위치를 실시간으로 인식하고 연동된 선형 또는 회전형 엑츄에이터를 제어하여 정확한 위치에 천공을 수행하기 위해서는 계측 시스템과 제어 시스템간의 시간적 차이를 최소할 수 있는 실시간 신호 처리 기술이 필수적이다. 선행 연구에서 사용한 다중 IR 센서의 분해능은 $16{\times}4pixel$이며 주파수는 3 Hz로, 폭이 30cm 내외인 비닐포장 상부의 정밀 분석에 한계가 있음을 발견하였다. 이를 해결하기 위하여 분해능과 계측 주기를 개선할 수 있는 초소형 ($1cm{\times}1cm{\times}1cm$) 열화상 센서를 이용하였다. LWIR(Longwave infrared)영역에 해당하는 $8{\mu}m{\sim}14{\mu}m$의 영역에서 $0.05^{\circ}C$의 분해능을 보이는 $ Lepton^{TM}$ (500-0690-00, FLIR, Goleta, CA)모델을 사용하였다. 프레임당 $80{\times}60$ 픽셀의 정보가 2 Byte의 단위로 계측이 되며 9 Hz의 주파수로 대상면의 열 분포를 측정할 수 있다. 이론적으로 초당 정보 전송량은 86,400 Byte ($80{\times}60{\times}2{\times}9$)이며, 1 m를 진행하는 주행형 천공기에 적용할 경우 1 프레임당 10cm 정도의 면적을 측정하므로, 최대 위치 판정 분해능은 약 10 cm / 60 pixel = 0.17 cm/pixel로 상대적으로 정밀한 위치 판별이 가능하다. $80{\times}60{\times}2Byet$의 정보를 0.1초 이내에 분석해야 하는 기술적 과제를 해결하기 위하여 천공 작업기에 적합한 상용 SBC(Single board computer)의 클럭 속도(1 Ghz)로 처리 가능한 공간 분포 분석 알고리즘을 개발하였다. 전체 이미지 도메인을 한 번에 분석하는데 소요되는 시간을 최소화하기 위하여 공간정보 행렬을 균등히 배분하고 별도의 프로세서에서 Feature를 분석한 후 개별 프로세서의 결과를 경합식으로 판정하는 기술을 연구하였다. 오픈 소스인 MPICH(www.mpich.org) 라이브러리를 이용하여 개발한 신호 분석 프로그램을 클러스터링으로 연동된 개별 코어에 설치/수행 하였다. 2D 행렬인 열분포 정보를 공간적으로 균등 분배하여 개별 코어에서 행렬의 Spatial domain analysis를 수행하였다. $20{\times}20$의 클러스터링 단위를 이용할 경우 총 12개의 코어가 필요하였으며, 초당 10회의 연산이 가능함을 확인하였다. 병렬 클러스터링 기술을 이용하여 1m/s 내외의 주행 속도에 대응이 가능한 비닐포장 상부 열 분포 분석 시스템을 구현하였다.

  • PDF

ATM Cell Encipherment Method using Rijndael Algorithm in Physical Layer (Rijndael 알고리즘을 이용한 물리 계층 ATM 셀 보안 기법)

  • Im Sung-Yeal;Chung Ki-Dong
    • The KIPS Transactions:PartC
    • /
    • v.13C no.1 s.104
    • /
    • pp.83-94
    • /
    • 2006
  • This paper describes ATM cell encipherment method using Rijndael Algorithm adopted as an AES(Advanced Encryption Standard) by NIST in 2001. ISO 9160 describes the requirement of physical layer data processing in encryption/decryption. For the description of ATM cell encipherment method, we implemented ATM data encipherment equipment which satisfies the requirements of ISO 9160, and verified the encipherment/decipherment processing at ATM STM-1 rate(155.52Mbps). The DES algorithm can process data in the block size of 64 bits and its key length is 64 bits, but the Rijndael algorithm can process data in the block size of 128 bits and the key length of 128, 192, or 256 bits selectively. So it is more flexible in high bit rate data processing and stronger in encription strength than DES. For tile real time encryption of high bit rate data stream. Rijndael algorithm was implemented in FPGA in this experiment. The boundary of serial UNI cell was detected by the CRC method, and in the case of user data cell the payload of 48 octets (384 bits) is converted in parallel and transferred to 3 Rijndael encipherment module in the block size of 128 bits individually. After completion of encryption, the header stored in buffer is attached to the enciphered payload and retransmitted in the format of cell. At the receiving end, the boundary of ceil is detected by the CRC method and the payload type is decided. n the payload type is the user data cell, the payload of the cell is transferred to the 3-Rijndael decryption module in the block sire of 128 bits for decryption of data. And in the case of maintenance cell, the payload is extracted without decryption processing.

Design and Implementation of a Fault Simulation System for Mixed-level Combinational Logic Circuits (혼합형 조합 회로용 고장 시뮬레이션 시스템의 설계 및 구현)

  • Park, Yeong-Ho;Son, Jin-U;Park, Eun-Se
    • The Transactions of the Korea Information Processing Society
    • /
    • v.4 no.1
    • /
    • pp.311-323
    • /
    • 1997
  • This paper presents a fast fault simulation system for detecting stuck-at faults in mixed-level combinational logic circuits with gale level and switch -level primitives. For a practical fault simulator, the types are not restricted to static switch-level and/or gate-level circuits, but include dynamic switch-level circuits. To efficiently handle the multiple signal contention problems at wired logic elements, we propose a six-valued logic system and its logic calculus which are used together with signal strength information. As a basic algorithm for the fault simulation process, a well -known gate-level parallel pattern single fault propagation(PPSFP) technique is extended to switch-level circuits in order to handle pass-transistor circuits and precharged logic circuits as well as static CMOS circuits. Finally, we demonstrate the efficiency of our system through the experimental results for switch-level ISCAS85 benchmark combinational circuits and various industrial mixed-level circuits.

  • PDF

Image Pattern Classification and Recognition by Using the Associative Memory with Cellular Neural Networks (셀룰라 신경회로망의 연상메모리를 이용한 영상 패턴의 분류 및 인식방법)

  • Shin, Yoon-Cheol;Park, Yong-Hun;Kang, Hoon
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.13 no.2
    • /
    • pp.154-162
    • /
    • 2003
  • In this paper, Associative Memory with Cellular Neural Networks classifies and recognizes image patterns as an operator applied to image process. CNN processes nonlinear data in real-time like neural networks, and made by cell which communicates with each other directly through its neighbor cells as the Cellular Automata does. It is applied to the optimization problem, associative memory, pattern recognition, and computer vision. Image processing with CNN is appropriate to 2-D images, because each cell which corresponds to each pixel in the image is simultaneously processed in parallel. This paper shows the method for designing the structure of associative memory based on CNN and getting output image by choosing the most appropriate weight pattern among the whole learned weight pattern memories. Each template represents weight values between cells and updates them by learning. Hebbian rule is used for learning template weights and LMS algorithm is used for classification.

A Design and Implementation of the Division/square-Root for a Redundant Floating Point Binary Number using High-Speed Quotient Selector (고속 지수 선택기를 이용한 여분 부동 소수점 이진수의 제산/스퀘어-루트 설계 및 구현)

  • 김종섭;조상복
    • Journal of the Institute of Electronics Engineers of Korea TE
    • /
    • v.37 no.5
    • /
    • pp.7-16
    • /
    • 2000
  • This paper described a design and implementation of the division/square-root for a redundant floating point binary number using high-speed quotient selector. This division/square-root used the method of a redundant binary addition with 25MHz clock speed. The addition of two numbers can be performed in a constant time independent of the word length since carry propagation can be eliminated. We have developed a 16-bit VLSI circuit for division and square-root operations used extensively in each iterative step. It performed the division and square-toot by a redundant binary addition to the shifted binary number every 16 cycles. Also the circuit uses the nonrestoring method to obtain a quotient. The quotient selection logic used a leading three digits of partial remainders in order to be implemented in a simple circuit. As a result, the performance of the proposed scheme is further enhanced in the speed of operation process by applying new quotient selection addition logic which can be parallelly process the quotient decision field. It showed the speed-up of 13% faster than previously presented schemes used the same algorithms.

  • PDF

VLSI architecture design of CAVLC entropy encoder/decoder for H.264/AVC (H.264/AVC를 위한 CAVLC 엔트로피 부/복호화기의 VLSI 설계)

  • Lee Dae-joon;Jeong Yong-jin
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.30 no.5C
    • /
    • pp.371-381
    • /
    • 2005
  • In this paper, we propose an advanced hardware architecture for the CAVLC entropy encoder/decoder engine for real time video compression. The CAVLC (Context-based Adaptive Variable Length Coding) is a lossless compression method in H.264/AVC and it has high compression efficiency but has computational complexity. The reference memory size is optimized using partitioned storing method and memory reuse method which are based on partiality of memory referencing. We choose the hardware architecture which has the most suitable one in several encoder/decoder architectures for the mobile devices and improve its performance using parallel processing. The proposed architecture has been verified by ARM-interfaced emulation board using Altera Excalibur and also synthesized on Samsung 0.18 um CMOS technology. The synthesis result shows that the encoder can process about 300 CIF frames/s at 150MHz and the decoder can process about 250 CIF frames/s at 140Mhz. The hardware architectures are being used as core modules when implementing a complete H.264/AVC video encoder/decoder chip for real-time multimedia application.

Non-Photorealistic Rendering Using CUDA-Based Image Segmentation (CUDA 기반 영상 분할을 사용한 비사실적 렌더링)

  • Yoon, Hyun-Cheol;Park, Jong-Seung
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.4 no.11
    • /
    • pp.529-536
    • /
    • 2015
  • When rendering both three-dimensional objects and photo images together, the non-photorealistic rendering results are in visual discord since the two contents have their own independent color distributions. This paper proposes a non-photorealistic rendering technique which renders both three-dimensional objects and photo images such as cartoons and sketches. The proposed technique computes the color distribution property of the photo images and reduces the number of colors of both photo images and 3D objects. NPR is performed based on the reduced colormaps and edge features. To enhance the natural scene presentation, the image region segmentation process is preferred when extracting and applying colormaps. However, the image segmentation technique needs a lot of computational operations. It takes a long time for non-photorealistic rendering for large size frames. To speed up the time-consuming segmentation procedure, we use GPGPU for the parallel computing using the GPU. As a result, we significantly improve the execution speed of the algorithm.

Hardware/Software Partitioning Methodology for Reconfigurable System (재구성형 시스템을 위한 하드웨어/소프트웨어 분할 기법)

  • Kim, Jun-Yong;Ahn, Seong-Yong;Lee, Jeong-A.
    • The KIPS Transactions:PartA
    • /
    • v.11A no.5
    • /
    • pp.303-312
    • /
    • 2004
  • In this paper, we propose a methodology solving the problem of the hardware-software partitioning in reconfigurable systems using a Y-chart design space exploration and implement a simulator according to the methodology. The methodology generates a mapping set between tasks and hardware elements using the hardware element model and the application model. We evaluate the throughput by simulating cases in each mapping set. With the throughput evaluation result, we can select the mapping case with the highest throughput. We also propose an heuristic improving the simulation time by reducing the mapping set on the basis of the relationship between workload and parallelism. Simulation results show that we can reduce the size of mapping set which poses difficulties on hardware-software partitioning by up to 80%.

Ring Embedding in (n.K) Star Graphs with Faulty Nodes (결함 노드를 갖는 (n,K)-스타 그래프에서의 링 임베딩)

  • Chang, Jung-Hwan;Kim, Jin-Soo
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.29 no.1
    • /
    • pp.22-34
    • /
    • 2002
  • In this paper, we consider ring embeding problem in faulty (n,k) star graphs which is recently proposed as an alternative interconnection network topology, By effectively utilizing such strategies as series of dimension expansions and even distribution of faulty nodes into sub-stars in graph itself. we prove that it is possible to construct a maximal fault-free ring excluding only faulty nodes when the number of faults is no more than n-3 and $n-k{\geq}2$, and also propose an algorithm which can embed the corresponding ring in (n.k)-star graphs This results will be applied into the multicasting applications that the underlying cycle properties on the multi-computer system.

High Throughput Parallel KMP Algorithm Considering CPU-GPU Memory Hierarchy (CPU-GPU 메모리 계층을 고려한 고처리율 병렬 KMP 알고리즘)

  • Park, Soeun;Kim, Daehee;Lee, Myungho;Park, Neungsoo
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.67 no.5
    • /
    • pp.656-662
    • /
    • 2018
  • Pattern matching algorithm is widely used in many application fields such as bio-informatics, intrusion detection, etc. Among many string matching algorithms, KMP (Knuth-Morris-Pratt) algorithm is commonly used because of its fast execution time when using large texts. However, the processing speed of KMP algorithm is also limited when the text size increases significantly. In this paper, we propose a high throughput parallel KMP algorithm considering CPU-GPU memory hierarchy based on OpenCL in GPGPU (General Purpose computing on Graphic Processing Unit). We focus on the optimization for the allocation of work-times and work-groups, the local memory copy of the pattern data and the failure table, and the overlapping of the data transfer with the string matching operations. The experimental results show that the execution time of the optimized parallel KMP algorithm is about 3.6 times faster than that of the non-optimized parallel KMP algorithm.