• Title/Summary/Keyword: and Parallel Processing

Search Result 2,013, Processing Time 0.037 seconds

A Network Time Server using CPS (GPS를 이용한 네트워크 시각 서버)

  • 황소영;유동희
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.8 no.5
    • /
    • pp.1004-1009
    • /
    • 2004
  • Precise time synchronization is a main technology in high-speed communications, parallel and distributed processing systems, Internet information industry and electronic commerce. Synchronized clocks are useful for many leasers. Often a distributed system is designed to realize some synchronized behavior, especially in real-time processing in factories, aircraft, space vehicles, and military applications. Nowadays, time synchronization has been compulsory thing as distributed processing and network operations are generalized. A network time server obtains, keeps accurate and precise time by synchronizing its local clock to standard reference time source and distributes time information through standard time synchronization protocol. This paper describes design issues and implementation of a network time server for time synchronization especially based on a clock model. The system uses GPS (Global Positioning System) as a standard reference time source and offers UTC (universal Time coordinated) through NTP (Network Time protocol). Implementation result and performance analysis are also presented.

Design of High-performance Pedestrian and Vehicle Detection Circuit using Haar-like Features (Haar-like 특징을 이용한 고성능 보행자 및 차량 인식 회로 설계)

  • Kim, Soo-Jin;Park, Sang-Kyun;Lee, Seon-Young;Cho, Kyeong-Soon
    • The KIPS Transactions:PartA
    • /
    • v.19A no.4
    • /
    • pp.175-180
    • /
    • 2012
  • This paper describes the design of high-performance pedestrian and vehicle detection circuit using the Haar-like features. The proposed circuit uses a sliding window for every image frame in order to extract Haar-like features and to detect pedestrians and vehicles. A total of 200 Haar-like features per sliding window is extracted from Haar-like feature extraction circuit and the extracted features are provided to AdaBoost classifier circuit. In order to increase the processing speed, the proposed circuit adopts the parallel architecture and it can process two sliding windows at the same time. We described the proposed high-performance pedestrian and vehicle detection circuit using Verilog HDL and synthesized the gate-level circuit using the 130nm standard cell library. The synthesized circuit consists of 1,388,260 gates and its maximum operating frequency is 203MHz. Since the proposed circuit processes about 47.8 $640{\times}480$ image frames per second, it can be used to provide the real-time detection of pedestrians and vehicles.

Parallel Video Processing Using Divisible Load Scheduling Paradigm

  • Suresh S.;Mani V.;Omkar S. N.;Kim H.J.
    • Journal of Broadcast Engineering
    • /
    • v.10 no.1 s.26
    • /
    • pp.83-102
    • /
    • 2005
  • The problem of video scheduling is analyzed in the framework of divisible load scheduling. A divisible load can be divided into any number of fractions (parts) and can be processed/computed independently on the processors in a distributed computing system/network, as there are no precedence relationships. In the video scheduling, a frame can be split into any number of fractions (tiles) and can be processed independently on the processors in the network, and then the results are collected to recompose the single processed frame. The divisible load arrives at one of the processors in the network (root processor) and the results of the computation are collected and stored in the same processor. In this problem communication delay plays an important role. Communication delay is the time to send/distribute the load fractions to other processors in the network. and the time to collect the results of computation from other processors by the root processors. The objective in this scheduling problem is that of obtaining the load fractions assigned to each processor in the network such that the processing time of the entire load is a minimum. We derive closed-form expression for the processing time by taking Into consideration the communication delay in the load distribution process and the communication delay In the result collection process. Using this closed-form expression, we also obtain the optimal number of processors that are required to solve this scheduling problem. This scheduling problem is formulated as a linear pro-gramming problem and its solution using neural network is also presented. Numerical examples are presented for ease of understanding.

A Study on Distributed Processing of Big Data and User Authentication for Human-friendly Robot Service on Smartphone (인간 친화적 로봇 서비스를 위한 대용량 분산 처리 기술 및 사용자 인증에 관한 연구)

  • Choi, Okkyung;Jung, Wooyeol;Lee, Bong Gyou;Moon, Seungbin
    • Journal of Internet Computing and Services
    • /
    • v.15 no.1
    • /
    • pp.55-61
    • /
    • 2014
  • Various human-friendly robot services have been developed and mobile cloud computing is a real time computing service that allows users to rent IT resources what they want over the internet and has become the new-generation computing paradigm of information society. The enterprises and nations are actively underway of the business process using mobile cloud computing and they are aware of need for implementing mobile cloud computing to their business practice, but it has some week points such as authentication services and distributed processing technologies of big data. Sometimes it is difficult to clarify the objective of cloud computing service. In this study, the vulnerability of authentication services on mobile cloud computing is analyzed and mobile cloud computing model is constructed for efficient and safe business process. We will also be able to study how to process and analyze unstructured data in parallel to this model, so that in the future, providing customized information for individuals may be possible using unstructured data.

Reconfiguration Problems in VLSI and WSI Cellular Arrays (초대규모 집적 또는 웨이퍼 규모 집적을 이용한 셀룰러 병렬 처리기의 재구현)

  • 한재일
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.18 no.10
    • /
    • pp.1553-1571
    • /
    • 1993
  • A significant amount of research has focused on the development of highly parallel architectures to obtain far more computational power than conventional computer systems. These architectures usually comprise of a large number of processors communicating through an interconnection network. The VLSI (Very Large Scale Integration) and WSI (Wafer Scale Integration) cellular arrays form one important class of those parallel architectures, and consist of a large number of simple processing cells, all on a single chip or wafer, each interconnected only to its neighbors. This paper studies three fundamental issues in these arrays : fault-tolerant reconfiguration. functional reconfiguration, and their integration. The paper examines conventional techniques, and gives an in-depth discussion about fault-tolerant reconfiguration and functional reconfiguration, presenting testing control strategy, configuration control strategy, steps required f4r each reconfiguration, and other relevant topics. The issue of integrating fault tolerant reconfiguration and functional reconfiguration has been addressed only recently. To tackle that problem, the paper identifies the relation between fault tolerant reconfiguration and functional reconfiguration, and discusses appropriate testing and configuration control strategy for integrated reconfiguration on VLSI and WSI cellular arrays.

  • PDF

Integration of Ba0.5Sr0.5TiO3Epitaxial Thin Films on Si Substrates and their Dielectric Properties (Si기판 위에 Ba0.5Sr0.5TiO3 산화물 에피 박막의 집적화 및 박막의 유전 특성에 관한 연구)

  • Kim, Eun-Mi;Moon, Jong-Ha;Lee, Won-Jae;Kim, Jin-Hyeok
    • Journal of the Korean Ceramic Society
    • /
    • v.43 no.6 s.289
    • /
    • pp.362-368
    • /
    • 2006
  • Epitaxial $Ba_{0.5}Sr_{0.5}TiO_3$ (BSTO) thin films have been grown on TiN buffered Si (001) substrates by Pulsed Laser Deposition (PLD) method and the effects of substrate temperature and oxygen partial pressure during the deposition on their dielectric properties and crystallinity were investigated. The crystal orientation, epitaxy nature, and microstructure of oxide thin films were investigated using X-Ray Diffraction (XRD) and Transmission Electron Microscopy (TEM). Thin films were prepared with laser fluence of $4.2\;J/cm^2\;and\;3\;J/cm^2$, repetition rate of 8 Hz and 10 Hz, substrate temperatures of $700^{\circ}C$ and ranging from $350^{\circ}C\;to\;700^{\circ}C$ for TiN and oxide respectively. BSTO thin-films were grown on TiN-buffered Si substrates at various oxygen partial pressure ranging from $1{\times}10^{-4}$ torr to $1{\times}10^{-5}$ torr. The TiN buffer layer and BSTO thin films were grown with cube-on-cube epitaxial orientation relationship of $[110](001)_{BSTO}{\parallel}[110](001)_{TiN}{\parallel}[110](001)_{Si}$. The crystallinity of BSTO thin films was improved with increasing substrate temperature. C-axis lattice parameters of BSTO thin films, calculated from XRD ${\theta}-2{\theta}$ scans, decreased from 0.408 m to 0.404 nm and the dielectric constants of BSTO epitaxial thin films increased from 440 to 938 with increasing processing oxygen partial pressure.

Variable Radix-Two Multibit Coding and Its VLSI Implementation of DCT/IDCT (가변길이 다중비트 코딩을 이용한 DCT/IDCT의 설계)

  • 김대원;최준림
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.39 no.12
    • /
    • pp.1062-1070
    • /
    • 2002
  • In this paper, variable radix-two multibit coding algorithm is presented and applied in the implementation of discrete cosine transform(DCT) and inverse discrete cosine transform(IDCT). Variable radix-two multibit coding means the 2k SD (signed digit) representation of overlapped multibit scanning with variable shift method. SD represented by 2k generates partial products, which can be easily implemented with shifters and adders. This algorithm is most powerful for the hardware implementation of DCT/IDCT with constant coefficient matrix multiplication. This paper introduces the suggested algorithm, it's proof and the implementation of DCT/IDCT The implemented IDCT chip with 8 PEs(Processing Elements) and one transpose memory runs at a tate of 400 Mpixels/sec at 54MHz frequency for high speed parallel signal processing, and it's verified in HDTV and MPEG decoder.

Embedding Multiple Meshes into a Twisted Cube (다중 메쉬의 꼬인 큐브에 대한 임베딩)

  • Kim, Sook-Yeon
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.37 no.2
    • /
    • pp.61-65
    • /
    • 2010
  • The twisted cube has received great attention because it has several superior properties to the hypercube that is widely known as a versatile parallel processing system. In this paper, we show that node-disjoint $2^{n-1}$ meshes of size $2^n{\times}2^m$ can be embedded into a twisted cube with dilation 1 where $1{\leq}n{\leq}m$. The expansion is 1 for even m and 2 for odd m.

A Design and Implementation of a Java Parallel Processing System based on the WWW and Its Performance Improvement Schemes (WWW기반 자바 병렬 처리 시스템의 설계 및 구현과 성능 향상 기법)

  • 한연희;박찬열;정영식;황종선
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 1998.10a
    • /
    • pp.715-717
    • /
    • 1998
  • 인터넷이 급속도로 발전하여 이러한 환경에서 네트워크 연결된 여러 호스트들의 자원을 이용하는 시도가 활발하게 이루이지고 있다. 본 논문은 이러한 환경에서 의뢰인-병렬처리서버-작업자 구성을 이용하여, 작업자 애플릿을 임의의 호스트에 분산시키고, 대량의 연산 수행을 지닌 작업을 배분하여 수행시틴 뒤, 그 결과를 의뢰인에게 보여주는 WWW기반 자바 병렬 시스템의 설계 및 구현에 관하여 기술한다. 성능 향상을 위해서 자바의 원격 메소드 호출(Remote Method Invocation)을 이용한 애플릿간 통신 메커니즘을 구현하고, 작업자의 결과를 의뢰인에게 서버를 거치지 않고 곧바로 보내도록 한다. 또한 각 작업자마다의 성능비를 분석하여 태스크들을 할당하는 방법을 통해 작업 시간을 단축시킨다. 이 시스템에 연산 수행량이 많은 프랙탈 이미지 처리 작업을 배분하여 수행시키고, 작업 태스크의 크기에 따른 수행성능과 작업 배분방법에 따른 수행성능을 측정하여 그 결과를 비교, 제시한다.

  • PDF

A Low-Power LSI Design of Japanese Word Recognition System

  • Yoshizawa, Shingo;Miyanaga, Yoshikazu;Wada, Naoya;Yoshida, Norinobu
    • Proceedings of the IEEK Conference
    • /
    • 2002.07a
    • /
    • pp.98-101
    • /
    • 2002
  • This paper reports a parallel architecture in a HMM based speech recognition system for a low-power LSI design. The proposed architecture calculates output probability of continuous HMM (CHMM) by using concurrent and pipeline processing. They enable to reduce memory access and have high computing efficiency. The novel point is the efficient use of register arrays that reduce memory access considerably compared with any conventional method. The implemented system can achieve a real time response with lower clock in a middle size vocabulary recognition task (100-1000 words) by using this technique.

  • PDF