• Title/Summary/Keyword: Fully-parallel architecture

An embedded vision system based on an analog VLSI Optical Flow vision sensor

  • Becanovic, Vlatako;Matsuo, Takayuki;Stocker, Alan A.
    • Proceedings of the Korea Society of Information Technology Applications Conference / 2005.11a / pp.285-288 / 2005
  • We propose a novel programmable miniature vision module based on a custom-designed analog VLSI (aVLSI) chip. The vision module consists of the optical flow vision sensor embedded with commercial off-the-shelf digital hardware, in our case the Intel XScale PXA270 processor augmented with a programmable gate array device. The aVLSI sensor provides gray-scale imager data as well as smooth optical flow estimates, so each pixel gives a triplet of information that can be continuously read out as three independent images. The particular computational architecture of the custom-designed sensor, which is fully parallel and analog, allows for efficient real-time estimation of the smooth optical flow. The Intel XScale PXA270 controls the sensor read-out and, together with the programmable gate array, allows for additional higher-level processing of the intensity image and optical flow data. It also provides the necessary standard interface so that the module can be easily programmed and integrated into different vision systems, or even form a complete stand-alone vision system itself. The low power consumption, small size and flexible interface of the proposed vision module suggest that it could be particularly well suited as a vision system in an autonomous robotics platform, and especially for educational projects in the robotic sciences.

Input-Series-Output-Parallel Connected DC/DC Converter for a Photovoltaic PCS with High Efficiency under a Wide Load Range

  • Lee, Jong-Pil;Min, Byung-Duk;Kim, Tae-Jin;Yoo, Dong-Wook;Yoo, Ji-Yoon
    • Journal of Power Electronics / v.10 no.1 / pp.9-13 / 2010
  • This paper proposes an input-series-output-parallel connected ZVS full bridge converter with interleaved control for photovoltaic power conditioning systems (PV PCS). The input-series connection enables a fully modular power-system architecture, where low-voltage, standard power modules can be connected in any combination at the input and/or at the output to realize any given specification. Further, the input-series connection enables the use of low-voltage MOSFETs that are optimized for a very low on-resistance (RDS(on)), resulting in lower conduction losses. System cost decreases owing to the reduced current, and the volume of the output filters decreases owing to the interleaving technique. By analyzing the PV module characteristics, a topology for a photovoltaic (PV) dc/dc converter is proposed that can substantially reduce the converter power rating and increase the efficiency of a PV system. A control scheme consisting of an output voltage loop, a current loop and input voltage balancing loops is proposed to achieve input voltage sharing and output current sharing. The total PV system is implemented as a 10-kW PV power conditioning system (PCS). This system has a dc/dc converter with a 3.6-kW power rating, which is only about one-third of the total PV PCS power. A 3.6-kW prototype PV dc/dc converter is introduced to experimentally verify the proposed topology. Experimental results show that the proposed topology exhibits good performance.

An Efficient Parallelized Algorithm of SEED Block Cipher on Cell BE (CELL 프로세서를 이용한 SEED 블록 암호화 알고리즘의 효율적인 병렬화 기법)

  • Kim, Deok-Ho;Yi, Jae-Young;Ro, Won-Woo
    • The KIPS Transactions:PartA / v.17A no.6 / pp.275-280 / 2010
  • In this paper, we propose an efficiently parallelized block cipher algorithm for the CELL BE processor. Considering the heterogeneous nature of the CELL BE architecture, we apply different encoding/decoding methods to the PPE and the SPEs to improve throughput. Our implementation was fully tested, and the execution results show high throughput, capable of supporting network speeds as high as 2.59 Gbps. Compared to various parallel implementations on multi-core systems, our approach provides a speedup of 1.34 in terms of encoding/decoding speed.

"Buildings Without Walls:" A Tectonic Case for Two "First" Skyscrapers

  • Leslie, Thomas
    • International Journal of High-Rise Buildings / v.9 no.1 / pp.53-60 / 2020
  • "A practical architect might not unnaturally conceive the idea of erecting a vast edifice whose frame should be entirely of iron, and clothing the frame--preserving it--by means of a casing of stone…that shell must be regarded only as an envelope, having no function other than supporting itself..." --Viollet-le-Duc, 1868. Viollet-le-Duc's recipe for an encased iron frame foresaw the separation of structural and enclosing functions into discrete systems. This separation is an essential characteristic of skyscrapers today, but at the time of his writing cast iron's brittle nature meant that iron frames could not, on their own, resist lateral forces in tall structures. Instead, tall buildings had to be braced with masonry shear walls, which often also served as environmental enclosure. The commercial availability of steel after the 1880s allowed for self-braced metal frames while parallel advances in glass and terra cotta allowed exterior walls to achieve vanishingly thin proportions. Two Chicago buildings by D.H. Burnham & Co. were the first to match a frame "entirely of iron" with an "envelope" supporting only itself. The Reliance Building (1895) was the first of these, but the Fisher Building (1896) more fully exploited this new constructive typology, eschewing brick entirely, to become the first "building without walls," a break with millennia of tall construction reliant upon masonry.

Revolutionizing Brain Tumor Segmentation in MRI with Dynamic Fusion of Handcrafted Features and Global Pathway-based Deep Learning

  • Faizan Ullah;Muhammad Nadeem;Mohammad Abrar
    • KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.1 / pp.105-125 / 2024
  • Gliomas are the most common malignant brain tumor and cause the most deaths. Manual brain tumor segmentation is expensive, time-consuming, error-prone, and dependent on the radiologist's expertise and experience, and the segmentations produced by different radiologists for the same patient may differ. Thus, more robust and dependable methods are needed. Medical imaging researchers have produced numerous semi-automatic and fully automatic brain tumor segmentation algorithms using ML pipelines built on handcrafted feature-based or data-driven strategies. Current methods use CNNs or handcrafted features such as symmetry analysis, alignment-based feature analysis, or textural qualities. CNN approaches learn features automatically, while handcrafted features model domain knowledge. Cascaded algorithms may outperform purely feature-based or purely data-driven methods such as CNNs. A cascaded strategy is presented that supplies the CNN with prior information from handcrafted feature-based ML algorithms. For each patient, manual ground truth and four MRI modalities (T1, T1c, T2, and FLAIR) are available. Handcrafted features and deep learning are combined to segment brain tumors in a Global Convolutional Neural Network (GCNN). The proposed GCNN architecture with two parallel CNNs, CSPathways CNN (CSPCNN) and MRI Pathways CNN (MRIPCNN), segmented BraTS brain tumors with high accuracy. The proposed model achieved a Dice score of 87%, higher than the state of the art. This research could improve brain tumor segmentation, helping clinicians diagnose and treat patients.
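
The Dice score quoted in the abstract above measures the overlap between a predicted segmentation mask and the ground-truth mask. A minimal sketch follows, assuming binary masks flattened to sequences of 0/1; the function name `dice_score`, the toy masks, and the convention for two empty masks are illustrative assumptions and are not taken from the paper.

```python
def dice_score(pred, truth):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two binary masks.

    Both masks are flat sequences of 0/1 values of equal length
    (an assumed representation, chosen only for illustration).
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy example: 3 predicted tumor voxels, 3 true tumor voxels, 2 in common.
pred = [0, 1, 1, 1, 0, 0]
truth = [0, 1, 1, 0, 1, 0]
score = dice_score(pred, truth)  # 2 * 2 / (3 + 3)
```

A score of 1.0 means the masks coincide exactly and 0.0 means they do not overlap at all, so a reported 87% corresponds to a value of 0.87 on this scale.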

A Dynamic Service Binding Framework for Embedded Devices (임베디드 장치를 위한 동적 서비스 연결 프레임워크)

  • Yeom, Gwy-Duk;Lee, Jeong-Geum
    • The KIPS Transactions:PartA / v.14A no.2 / pp.117-124 / 2007
  • In this paper we present a translation lookaside buffer (TLB) system with low power consumption for embedded processors. The proposed TLB is constructed as multiple banks, each with an associated block buffer and a corresponding comparator. Either the block buffer or the main bank is selectively accessed on the basis of two bits in the block buffer (tag buffer). Dynamic power savings are achieved by reducing the number of entries accessed in parallel, as a result of using the tag buffer as a filtering mechanism. The performance overhead of the proposed TLB is negligible compared with other hierarchical TLB structures. For example, the two-cycle overhead of the proposed TLB is only about 1%, compared with 5% overhead for a filter (micro) TLB and 14% overhead for the same structure without the continuous accessing distinction algorithm. We show that the average hit ratios of the block buffers and the main banks of the proposed TLB are 95% and 5% respectively. Dynamic power is reduced by about 95% with respect to a fully associative TLB, 90% with respect to a filter TLB, and 40% relative to the same structure without the continuous accessing distinction algorithm.

A Development of JPEG-LS Platform for Micro Display Environment in AR/VR Device (AR/VR 마이크로 디스플레이 환경을 고려한 JPEG-LS 플랫폼 개발)

  • Park, Hyun-Moon;Jang, Young-Jong;Kim, Byung-Soo;Hwang, Tae-Ho
    • The Journal of the Korea Institute of Electronic Communication Sciences / v.14 no.2 / pp.417-424 / 2019
  • This paper presents the design of a JPEG-LS codec for lossless image compression in AR/VR devices. The proposed JPEG-LS (LosSless) codec is mainly composed of a context modeling block, a context update block, a pixel prediction block, a prediction error coding block, a data packetizer block, and a memory block. All operations are organized in a fully pipelined architecture for real-time image processing, and the LOCO-I compression algorithm uses an improved 2D approach to comply with the SBT coding. Compared with a similar JPEG-LS study, the Block-RAM size of the proposed STB-FLC architecture is reduced to 1/3, and the parallel design of the prediction block improves the processing speed.

Analysis on the Active/Inactive Status of Computational Resources for Improving the Performance of the GPU (GPU 성능 저하 해결을 위한 내부 자원 활용/비활용 상태 분석)

  • Choi, Hongjun;Son, Dongoh;Kim, Jongmyon;Kim, Cheolhong
    • The Journal of the Korea Contents Association / v.15 no.7 / pp.1-11 / 2015
  • In recent high-performance computing systems, GPGPU has been widely used to process general-purpose applications as well as graphics applications, since the GPU provides computational resources optimized for massively parallel processing. Unfortunately, GPGPU does not fully exploit the computational resources of the GPU when executing general-purpose applications, because the applications cannot be optimized for the GPU architecture. Therefore, we provide a GPU research guideline for improving the performance of computing systems that use GPGPU. To accomplish this, we analyze the factors that negatively affect GPU performance. In order to clearly classify the causes of these factors, the GPU core status is divided into five statuses: fully active status, partial active status, idle status, memory stall status and GPU core stall status. All statuses except the fully active status cause performance degradation. We evaluate the ratio of each GPU core status depending on the characteristics of the benchmarks to find the specific reasons that degrade GPU performance. According to our simulation results, partial active status, idle status, memory stall status and GPU core stall status are induced by computational resource underutilization, low parallelism, high memory requests, and structural hazards, respectively.

A New Hardware Design for Generating Digital Holographic Video based on Natural Scene (실사기반 디지털 홀로그래픽 비디오의 실시간 생성을 위한 하드웨어의 설계)

  • Lee, Yoon-Hyuk;Seo, Young-Ho;Kim, Dong-Wook
    • Journal of the Institute of Electronics and Information Engineers / v.49 no.11 / pp.86-94 / 2012
  • In this paper we propose a hardware architecture for a high-speed CGH (computer-generated hologram) generation processor, which in particular reduces the number of memory accesses to avoid the bottleneck in the memory access operation. For this, we use three main schemes. The first is pixel-by-pixel calculation rather than light source-by-source calculation. The second is a parallel calculation scheme obtained by modifying the previous recursive calculation scheme. The last is a fully pipelined calculation scheme with precisely structured timing scheduling, achieved by adjusting the hardware. The proposed hardware is structured to calculate a row of a CGH in parallel, and each hologram pixel in a row is calculated independently. It consists of an input interface, an initial parameter calculator, hologram pixel calculators, a line buffer, and a memory controller. The implemented hardware, which calculates a row of a $1,920 \times 1,080$ CGH in parallel, uses 168,960 LUTs, 153,944 registers, and 19,212 DSP blocks in an Altera FPGA environment. It operates stably at 198 MHz. Because of the three schemes, the time to access the external memory is reduced to about 1/20,000 of that of previous designs at the same calculation speed.
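
The difference between the pixel-by-pixel and light source-by-source calculation orders named in the abstract above can be sketched in software. The example below assumes a common point-source fringe model, a cosine of the source-to-pixel distance; this model, the constants `WAVELENGTH` and `PITCH`, and all function names are illustrative assumptions, not the authors' hardware design. It only counts how often hologram memory is touched under each loop order.

```python
import math

WAVELENGTH = 633e-9  # assumed wavelength in metres (illustrative)
PITCH = 10e-6        # assumed hologram pixel pitch in metres (illustrative)


def contribution(px, py, src):
    """Fringe contribution of one point source (x, y, z, amplitude) at one pixel."""
    sx, sy, sz, amp = src
    r = math.sqrt((px * PITCH - sx) ** 2 + (py * PITCH - sy) ** 2 + sz ** 2)
    return amp * math.cos(2.0 * math.pi * r / WAVELENGTH)


def cgh_source_by_source(width, height, sources):
    """Outer loop over sources: each pixel in memory is updated once per source."""
    holo = [[0.0] * width for _ in range(height)]
    accesses = 0
    for src in sources:
        for y in range(height):
            for x in range(width):
                holo[y][x] += contribution(x, y, src)  # read-modify-write
                accesses += 1
    return holo, accesses


def cgh_pixel_by_pixel(width, height, sources):
    """Outer loop over pixels: accumulate locally, write each pixel exactly once."""
    holo = [[0.0] * width for _ in range(height)]
    accesses = 0
    for y in range(height):
        for x in range(width):
            acc = 0.0  # local accumulator, stands in for an on-chip register
            for src in sources:
                acc += contribution(x, y, src)
            holo[y][x] = acc
            accesses += 1
    return holo, accesses


# Three hypothetical point sources: (x, y, z, amplitude).
sources = [(0.0, 0.0, 0.10, 1.0), (1e-4, 2e-4, 0.12, 0.5), (-2e-4, 1e-4, 0.15, 0.8)]
holo_a, n_a = cgh_source_by_source(8, 4, sources)
holo_b, n_b = cgh_pixel_by_pixel(8, 4, sources)
```

In this sketch both orders produce the same hologram, but the source-by-source order touches hologram memory once per source per pixel, while the pixel-by-pixel order touches it once per pixel. The ratio here is only the number of sources and is not meant to reproduce the 1/20,000 figure reported in the abstract.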

Realization of the Pulse Doppler Radar Signal Processor with an Expandable Feature using the Multi-DSP Based Morocco-2 Board (다중 DSP 구조의 Morocco-2 보드를 이용한 확장성을 갖는 펄스 도플러 레이다 신호처리기 구현)

  • 조명제;임중수
    • The Journal of Korean Institute of Electromagnetic Engineering and Science / v.12 no.7 / pp.1147-1156 / 2001
  • In this paper, a new architecture for a real-time radar signal processor is proposed. It has been designed and implemented so as to minimize the inter-processor communication overhead and to maintain coherence in the Doppler pulse domain and in the range domain. Its structure can be easily reconfigured and reprogrammed when a function algorithm is added or the operational scenario is modified. Because the task configuration for parallel processing was designed from measurements of the computation time of the function algorithms and the transmission time of the signal processing results, data exchange between processors during execution of the function algorithms could be completely eliminated. A Morocco-2 board equipped with the ADSP-21060 processor from Analog Devices Inc. and APEX-3.2, developed for the SHARC DSP, were used to construct the radar signal processor.
