• Title/Summary/Keyword: parallel operation processing (병렬 연산 처리)

Multiple-Valued Logic Multiplier for System-On-Panel (System-On-Panel을 위한 다치 논리 곱셈기 설계)

  • Hong, Moon-Pyo;Jeong, Ju-Young
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.44 no.2
    • /
    • pp.104-112
    • /
    • 2007
  • We developed a $7 \times 7$ parallel multiplier using LTPS-TFT. The proposed multiplier consists of a multiple-valued logic 7-3 compressor with folding, a 3-2 compressor, and a final carry-propagation adder. The architecture minimizes carry propagation, and power consumption is reduced by switching the current source to the circuit, which operates in current mode. Compared with a Wallace tree multiplier, the proposed multiplier improves PDP by 23%, EDP by 59%, and propagation delay by 47%.
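
The compressor-plus-final-adder structure named in this abstract can be illustrated with a plain binary carry-save sketch. This is only an assumption-laden illustration: it uses ordinary two-valued logic and 3-2 compressors alone, not the paper's multiple-valued, current-mode 7-3 compressor with folding, and the function names are hypothetical.

```python
def compress_3_2(a, b, c):
    """3-2 compressor (carry-save step) applied bitwise to whole words.

    Reduces three non-negative integers to a (sum, carry) pair such that
    a + b + c == sum + carry, without propagating carries between bits.
    """
    s = a ^ b ^ c                                # per-bit sum
    carry = ((a & b) | (b & c) | (a & c)) << 1   # per-bit majority, shifted up
    return s, carry


def multiply(x, y, bits=7):
    """Multiply two `bits`-wide unsigned integers with carry-save reduction."""
    # One shifted partial product per multiplier bit.
    rows = [(x << i) if (y >> i) & 1 else 0 for i in range(bits)]
    # Each pass replaces three rows with two, so no carry ripples here.
    while len(rows) > 2:
        s, carry = compress_3_2(rows[0], rows[1], rows[2])
        rows = rows[3:] + [s, carry]
    # Single final carry-propagating addition of the remaining rows.
    return sum(rows)
```

Carry propagation happens only once, in the final addition, which is the general reason compressor trees shorten the critical path.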

Automatic Stereo Matching for Auto-stereoscopic 3D display (무안경식 3D 디스플레이를 위한 자동 스테레오 정합)

  • Choi, Ho Yeol;Park, Jiho;Kim, Y.H.
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2012.07a
    • /
    • pp.140-141
    • /
    • 2012
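  • The recent keywords in the imaging field can be summarized as ultra-high quality, ultra-realism, and smart functionality. Among these, auto-stereoscopic (glasses-free) 3D is one of the key application areas for achieving ultra-realism. However, several areas still need to be studied before auto-stereoscopic 3D devices can be deployed successfully. This paper proposes an automatic stereo matching technique needed to produce high-quality auto-stereoscopic 3D smart content. Previously studied disparity-map extraction algorithms that use global optimization require long execution times because the computational load grows with image resolution and depth range. In addition, it is difficult to extract an accurate disparity map from the intensity information of the left and right images alone. For these reasons, this paper generates a confidence map and edge information from inter-frame information in a video stream and extracts an accurate, high-quality disparity map using belief propagation stereo matching. To address the computational load of the algorithm, we also studied parallel processing on GPUs (graphics processing units), an area of active recent research, as a speed-up method. Finally, to improve the reliability of the results, experiments on a variety of data confirmed that high-quality image information can be extracted at high speed.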

Acceleration Method of Inter Prediction using Advanced SIMD (Advanced SIMD를 이용한 화면 간 예측 고속화방법)

  • Kim, Wan-Su;Lee, Jae-Heung
    • Journal of IKEEE
    • /
    • v.16 no.4
    • /
    • pp.382-388
    • /
    • 2012
  • This paper presents a fast motion estimation method for H.264/AVC. It uses NEON, the Advanced SIMD extension and one of the parallel processing methods supported on the ARM Cortex-A9 dual-core platform. NEON is applied to full search, one of several motion estimation methods, and the SAD operation count of each macroblock is reduced to 1/4. Pixel values of the corresponding macroblock are assigned to eight 16-bit NEON registers, and NEON intrinsic functions carry out 128-bit arithmetic operations at once. In this way, the exact motion vector with the minimum SAD value among the calculated SAD values can be determined. Experimental results show that performance improves by more than 30% on average, depending on the image and macroblock size.
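
As a rough illustration of the full-search technique this abstract refers to, the following scalar sketch computes the SAD for every candidate displacement and keeps the minimum. It is not the authors' NEON implementation; the function names, the frame layout (lists of rows), and the parameters are assumptions made for the example.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, search_range):
    """Return (min_sad, dx, dy) over all displacements within +/- search_range."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates that fall outside the reference frame.
            if bx + dx < 0 or by + dy < 0:
                continue
            if bx + dx + n > width or by + dy + n > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

The inner loops of `sad` are the part a SIMD unit would vectorize, processing several pixel differences per instruction instead of one.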

Programmable Multimedia Platform for Video Processing of UHD TV (UHD TV 영상신호처리를 위한 프로그래머블 멀티미디어 플랫폼)

  • Kim, Jaehyun;Park, Goo-man
    • Journal of Broadcast Engineering
    • /
    • v.20 no.5
    • /
    • pp.774-777
    • /
    • 2015
  • This paper introduces the world's first programmable video-processing platform for enhancing the video quality of 8K (7680x4320) UHD (Ultra High Definition) TV operating at up to 60 frames per second. To support the required computing capacity and memory bandwidth, the proposed platform implements several key features, such as a symmetric multi-cluster architecture for parallel data processing, a ring data path between the clusters for data pipelining, and hardware accelerators for computing filter operations. The proposed platform, based on an RP (Reconfigurable Processor), processes video quality enhancement algorithms and effectively handles new UHD broadcasting standards and display panels.

Design Plan of Signal Processing Structure for Real-Time Application in Drone Detection Radar (실시간 적용을 위한 드론 탐지 레이다용 신호처리 구조 설계 방안)

  • Kong, Young-Joo;Sohn, Sung-Hwan;Hyun, Jun-Seok;Yoo, Dong-Gil;Cho, In-Cheol
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.22 no.3
    • /
    • pp.31-36
    • /
    • 2022
  • Recently, drones are being used in various fields, and drone technology continues to develop. As the risks posed by drones increase, technology to detect them is becoming important. However, it is extremely difficult to detect and recognize drones because of the low radar cross section of commercial drones. In this paper, a miniaturized, lightweight signal processor structure was designed. To process large amounts of data in real time, parallel processing was performed for each channel, and an algorithm was applied to shorten the operation time of each step. Detection performance tests confirmed that the designed structure works in real time.

Research for Improving the Speed of Scrambler in the WAVE System (WAVE 시스템에서 스크램블러의 속도 향상을 위한 연구)

  • Lee, Dae-Sik;You, Young-Mo;Lee, Sang-Youn;Oh, Se-Kab
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.37A no.9
    • /
    • pp.799-808
    • /
    • 2012
  • Bit operations of the scrambler in the WAVE system are inefficient because they cannot be processed in parallel in hardware or software. In this paper, we propose an algorithm to find the starting position in the matrix table. We also compared the performance of the scrambler's bit operation algorithm, the matrix table algorithm, and the algorithm that finds the starting position in the matrix table, using 8-bit, 16-bit, and 32-bit processing units. As a result, 2917.8 times more processing operations per second could be performed with 8-bit units, 5432.1 times more with 16-bit units, and 10277.8 times more with 32-bit units. Therefore, the algorithm for finding the starting position in the matrix table improves the speed of the scrambler in WAVE, as well as the speed and precision of receiving and gathering a variety of information over Vehicle-to-Infrastructure or Vehicle-to-Vehicle links in Intelligent Transport Systems.
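
The contrast between a bit-serial scrambler and a table lookup with a computed starting position can be sketched as follows. This is a minimal illustration, not the paper's algorithm: it assumes the IEEE 802.11 scrambler polynomial $x^7 + x^4 + 1$ (the abstract does not state it), works one bit per step rather than in 8-, 16-, or 32-bit units, and all names are hypothetical.

```python
def lfsr_step(state):
    """Advance the 7-bit scrambler state once; return (new_state, output_bit)."""
    fb = ((state >> 6) ^ (state >> 3)) & 1   # x^7 XOR x^4 taps
    return ((state << 1) | fb) & 0x7F, fb


def scramble_bitwise(bits, seed):
    """Bit-serial scrambling: every output bit waits on the previous state."""
    out, state = [], seed
    for b in bits:
        state, fb = lfsr_step(state)
        out.append(b ^ fb)
    return out


def build_table():
    """Precompute the 127-bit scrambling sequence and, for each nonzero
    state, the position in that sequence where its output begins."""
    seq, start_of = [], {}
    state = 0x7F
    for k in range(127):
        start_of[state] = k
        state, fb = lfsr_step(state)
        seq.append(fb)
    return seq, start_of


SEQ, START_OF = build_table()


def scramble_table(bits, seed):
    """Table-based scrambling: look up the start position for `seed`
    (must be nonzero), then XOR the data with the stored sequence."""
    start = START_OF[seed]
    return [b ^ SEQ[(start + i) % 127] for i, b in enumerate(bits)]
```

Because the stored sequence no longer depends on a running state, chunks of it can be XORed with data words independently, which is what makes wider processing units usable. Scrambling twice with the same seed restores the original bits.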

A VLSI Implementation of Real-time 8$\times$8 2-D DCT Processor for the Subprimary Rate Video Codec (저 전송률 비디오 코덱용 실시간 8$\times$8 이차원 DCT 처리기의 VLSI 구현)

  • 권용무;김형곤
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.15 no.1
    • /
    • pp.58-70
    • /
    • 1990
  • This paper describes a VLSI implementation of a real-time two-dimensional DCT processor for the subprimary rate video codec system. The proposed architecture exploits the parallelism and concurrency of the distributed architecture for the vector inner product operation of the DCT and meets the CCITT performance requirements of the video codec for full CSIF at 30 frames/sec. Bit-level simulation of the suggested architecture also shows that it satisfies all the CCITT IDCT accuracy specifications. An efficient VLSI design methodology for the suggested architecture is considered, and module-generator-oriented design environments are constructed on a SUN 3/150C workstation. Using the constructed design environments, the suggested architecture has been designed in double-metal 2-micron CMOS technology. The chip area of the designed 8x8 2-D DA-DCT (Distributed Arithmetic DCT) processor is about 3.9 mm x 4.8 mm.

The Design of 10-bit 200MS/s CMOS Parallel Pipeline A/D Converter (10-비트 200MS/s CMOS 병렬 파이프라인 아날로그/디지털 변환기의 설계)

  • Chung, Kang-Min
    • The KIPS Transactions:PartA
    • /
    • v.11A no.2
    • /
    • pp.195-202
    • /
    • 2004
  • This paper introduces the design of a parallel pipeline high-speed analog-to-digital converter (ADC) for high-resolution video applications that require very precise sampling. The overall architecture consists of a 4-channel parallel time-interleaved 10-bit pipeline ADC structure allowing a 200 MSample/s sampling speed, which corresponds to a 4-times improvement over the sampling speed per channel. The key building blocks are the front-end sample-and-hold amplifier (SHA), the dynamic comparator, and the 2-stage fully differential operational amplifier. The 1-bit DAC, comparator, and gain-2 amplifier are used internally in each stage and were integrated into a single switched-capacitor architecture, allowing high-speed operation as well as low power consumption. In this work, the gain of the operational amplifier was enhanced significantly using a negative resistance element. In the ADC, a delay line is designed for each stage using D flip-flops to align the bit signals and minimize the timing error in the conversion. The converter dissipates 280 mW at a 3.3 V power supply. Measured performance includes DNL of +0.7/-0.6 LSB and INL of +0.9/-0.3 LSB.

Profiler Design for Evaluating Performance of WebCL Applications (WebCL 기반 애플리케이션의 성능 평가를 위한 프로파일러 설계 및 구현)

  • Kim, Cheolwon;Cho, Hyeonjoong
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.4 no.8
    • /
    • pp.239-244
    • /
    • 2015
  • WebCL was proposed for highly complex computing in JavaScript. Since WebCL-based applications are distributed to and executed on an unspecified number of general clients, it is important to profile their performance on different clients. Several profilers have been introduced to support various programming languages, but a WebCL profiler has not yet been developed. In this paper, we present a WebCL profiler that evaluates WebCL-based applications and monitors the status of the GPU on which they run. This profiler helps developers learn the execution time of applications, the memory read/write time, and GPU status such as power consumption, temperature, and clock speed.

Image Pattern Classification and Recognition by using Associative Memories with Cellular Neural Networks (셀룰라신경회로망의 연상메모리를 이용한 영상 패턴의 분류 및 인식 방법)

  • Shin, Yoon-Cheol;Park, Yong-Hun;Kang, Hoon
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 2002.05a
    • /
    • pp.231-234
    • /
    • 2002
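  • Using the associative memory of a cellular neural network, we classify and recognize image patterns by operating on visual input data. Like general neural networks, a cellular neural network can process nonlinear data in real time, and like a cellular automaton it consists of cells arranged in a lattice that exchange information directly with adjacent cells. Its applications include optimization, linearization/nonlinearization, associative memory, pattern recognition, and computer vision. Because image pixels are mapped to the cells of the network and the entire image can be processed by all cells simultaneously in parallel, it is well suited to 2-D image processing. This paper designs an associative memory structure based on a cellular neural network and presents a method that selects the most appropriate weights from the learned weight memory to obtain an output that exactly matches the learned image. To implement the associative memory through learning, each neuron uses a different, non-uniform template. Each template represents the connection weights between neurons and is updated through learning. For learning, Hebb's rule, the simplest rule for adjusting connection weights between neurons, was used to learn the template weights, and the LMS algorithm was used to learn the classification values.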
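
The Hebbian weight learning named in this abstract can be illustrated with a small associative-memory sketch. It is a simplification under stated assumptions: the network here is fully connected (Hopfield-style) rather than a cellular neural network with local templates, patterns are bipolar (+1/-1), the LMS classification stage is omitted, and the function names are hypothetical.

```python
def hebb_train(patterns):
    """Hebbian learning: strengthen w[i][j] whenever units i and j agree.

    `patterns` is a list of equal-length bipolar (+1/-1) vectors.
    Self-connections are left at zero.
    """
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w


def recall(w, x):
    """One synchronous update: each unit takes the sign of its weighted input."""
    out = []
    for row in w:
        h = sum(wij * xj for wij, xj in zip(row, x))
        out.append(1 if h >= 0 else -1)
    return out
```

Every unit in `recall` computes its output independently of the others, which is the property that lets such a network update all cells in parallel.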