• Title/Summary/Keyword: Parallel data processing


FPGA-Based Low-Power and Low-Cost Portable Beamformer Design (FPGA 기반 저전력 및 저비용 휴대용 빔포머 설계)

  • Jeong, GabJoong;Park, CheolYoung
    • Journal of Korea Society of Industrial Information Systems / v.24 no.1 / pp.31-38 / 2019
  • In this paper, we develop a beamforming front-end platform, based on a pipelined circuit configuration, that can be applied to various clinical diagnostic applications of ultrasound imaging. The hardware design targets compression applications as well as scalable applications in which power, integration level, and replicability are important. The firmware was designed to achieve an optimal level of FPGA parallel processing by building new IP and a system-oriented design environment with Vivado HLS, a next-generation high-level synthesis tool, in order to improve design productivity. The beamformer supports high-speed management of scan data, can create an image area arbitrarily, and can be corrected and supplemented appropriately when the system is reconfigured or its specifications change in the future.

NoSQL-based Sensor Web System for Fine Particles Analysis Services (미세먼지 분석 서비스를 위한 NoSQL 기반 센서 웹 시스템)

  • Kim, Jeong-Joon;Kwak, Kwang-Jin;Park, Jeong-Min
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.19 no.2 / pp.119-125 / 2019
  • Fine particles have recently become a social problem. There are more people wearing masks and more weather alerts and disaster notices, and research and policy work are actively underway. Meteorologically, the greatest damage from fine particles is associated with the inversion layer phenomenon. In this study, we designed a system that warns of fine particles by analyzing the inversion layer and wind direction. The proposed weather information system supports scalability and efficient parallel processing by using the OGC Sensor Web Enablement system and NoSQL storage for sensor control and data exchange.

A Study on Machine Learning Compiler and Modulo Scheduler (머신러닝 컴파일러와 모듈로 스케쥴러에 관한 연구)

  • Doosan Cho
    • Journal of the Korean Society of Industry Convergence / v.27 no.1 / pp.87-95 / 2024
  • This study examines modulo scheduling algorithms for multicore processors in machine learning applications. Machine learning algorithms perform large numbers of vector and matrix operations in order to process large data streams quickly. To support this volume of computation, processor architectures for applications such as artificial intelligence, neural networks, and machine learning are designed for parallel processing, for example as multicore processors. Various compiler techniques are being used and studied to utilize these multicore hardware resources effectively. Among these techniques, we analyzed the modulo scheduler, which is especially important in a single core's computation pipeline. This paper compares the iterative modulo scheduler and the swing modulo scheduler, which are the most widely used and studied. Both schedulers provided similar performance, and when register pressure was measured as an indicator, the swing modulo scheduler performed slightly better. In this study, a technique that divides the recurrence edge is proposed to improve the minimum initiation interval of the modulo schedulers.
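
As background for the minimum initiation interval mentioned above, the following is a minimal sketch of the textbook lower bound used in modulo scheduling, MII = max(ResMII, RecMII). It is not the recurrence-edge division technique proposed in the paper. The function names, resource classes, and loop figures are hypothetical and chosen only for illustration.

```python
from math import ceil

def res_mii(op_counts, unit_counts):
    """Resource bound: ceil(uses / available units), maximized over resource classes."""
    return max(ceil(op_counts[r] / unit_counts[r]) for r in op_counts)

def rec_mii(cycles):
    """Recurrence bound: each dependence cycle is (total latency, total iteration distance)."""
    return max((ceil(lat / dist) for lat, dist in cycles), default=1)

def min_initiation_interval(op_counts, unit_counts, cycles):
    """Lower bound on the initiation interval of a software-pipelined loop."""
    return max(res_mii(op_counts, unit_counts), rec_mii(cycles))

# Hypothetical loop body: 6 ALU ops on 2 ALUs, 2 memory ops on 1 port,
# and one recurrence with total latency 7 that spans 2 iterations.
ops = {"alu": 6, "mem": 2}
units = {"alu": 2, "mem": 1}
recurrences = [(7, 2)]
mii = min_initiation_interval(ops, units, recurrences)
```

In this made-up example the recurrence bound (4) dominates the resource bound (3), which is the situation in which shortening or splitting a recurrence would lower the bound.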

A Design of Fractional Motion Estimation Engine with 4×4 Block Unit of Interpolator & SAD Tree for 8K UHD H.264/AVC Encoder (8K UHD(7680×4320) H.264/AVC 부호화기를 위한 4×4블럭단위 보간 필터 및 SAD트리 기반 부화소 움직임 추정 엔진 설계)

  • Lee, Kyung-Ho;Kong, Jin-Hyeung
    • Journal of the Institute of Electronics and Information Engineers / v.50 no.6 / pp.145-155 / 2013
  • In this paper, we propose a $4{\times}4$ block parallel interpolation architecture for high-performance H.264/AVC Fractional Motion Estimation in real-time 8K UHD ($7680{\times}4320$) video processing. To improve throughput, we design the interpolator to process $4{\times}4$ blocks in parallel. To supply the $10{\times}10$ reference data needed for interpolation, we design a 2D cache buffer consisting of $10{\times}10$ memory arrays. We minimize redundant storage of reference pixels by applying the Search Area Stripe Reuse (SASR) scheme, and implement a high-speed plane interpolator with a 3-stage pipeline (horizontal/vertical 1/2 interpolation, diagonal 1/2 interpolation, and 1/4 interpolation). The proposed architecture was simulated with a 0.13 μm standard cell library. The gate count is 436.5K gates. The proposed H.264/AVC Fractional Motion Estimation engine can support 8K UHD at 30 frames per second when running at 187 MHz.
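
To put the reported figures in perspective, the short calculation below estimates the average cycle budget per macroblock from the numbers stated in the abstract (7680×4320, 30 frames per second, 187 MHz). It assumes the standard 16×16 H.264 macroblock and a single engine handling every macroblock; the second assumption is ours and is not stated in the abstract.

```python
# Figures taken from the abstract.
WIDTH, HEIGHT = 7680, 4320
FPS = 30
CLOCK_HZ = 187_000_000

# H.264 macroblocks are 16x16 luma samples.
MB = 16

mbs_per_frame = (WIDTH // MB) * (HEIGHT // MB)   # 480 * 270 macroblocks
mbs_per_second = mbs_per_frame * FPS
# Assumption: one engine processes all macroblocks sequentially.
cycles_per_mb = CLOCK_HZ / mbs_per_second

print(f"{mbs_per_frame} macroblocks/frame, about {cycles_per_mb:.1f} cycles per macroblock")
```

Under these assumptions the engine has roughly 48 clock cycles per macroblock on average, which suggests why block-level parallelism and pipelining matter at this resolution.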

Livestock Disease Forecasting and Smart Livestock Farm Integrated Control System based on Cloud Computing (클라우드 컴퓨팅기반 가축 질병 예찰 및 스마트 축사 통합 관제 시스템)

  • Jung, Ji-sung;Lee, Meong-hun;Park, Jong-kweon
    • Smart Media Journal / v.8 no.3 / pp.88-94 / 2019
  • Livestock disease is a very important issue in the livestock industry because the damage can be devastating if an outbreak is not responded to quickly enough. To address the occurrence of livestock disease, it is necessary to diagnose disease status in advance and to develop systematic, scientific livestock feeding technologies. However, there is a lack of domestic studies on such technologies in Korea. This paper therefore proposes a Livestock Disease Forecasting and Livestock Farm Integrated Control System using cloud computing to manage livestock disease quickly. The proposed system collects a variety of livestock data from wireless sensor networks and applications, and stores and manages the data in Hadoop HBase, a column-oriented database management system. It provides livestock disease forecasting and integrated livestock farm control services through parallel data processing based on the MapReduce model. Lastly, it provides a REST-based web service so that users can access the service on various platforms, such as PCs and mobile devices.

Generating Local Addresses for Block-Cyclic Distributed Array (블록-순환으로 분배된 배열의 지역 주소 생성)

  • Kwon, Oh-Young;Kim, Tae-Geun;Han, Tack-Don;Yang, Sung-Bong;Kim, Shin-Dug
    • The Transactions of the Korea Information Processing Society / v.5 no.11 / pp.2835-2844 / 1998
  • Most data parallel languages provide the block-cyclic distribution (cyclic(k)), which is one of the most general regular distributions. Generating local addresses for an array section A(l:h:s) under a block-cyclic distribution requires efficient compile-time or run-time methods. In this paper, two local address generation methods for the block-cyclic distribution are presented. One is a simple scan method modified from the virtual-block scheme. The other is a linear-time method for constructing the ${\Delta}M$ table, which contains the local memory access information. This method is simpler than other algorithms for generating a ${\Delta}M$ table. Experimental results show that the simple scan method performs poorly, but the linear-time ${\Delta}M$ table generation method is faster than other algorithms in both ${\Delta}M$ table generation time and access time for 10,000 array elements.
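
To make the addressing problem concrete, here is a brute-force reference sketch of the standard cyclic(k) index mapping applied to an array section l:h:s. It is neither of the two methods presented in the paper; it simply tests every element of the section. Indices are 0-based and the upper bound h is inclusive, both of which are assumptions made for this illustration, as are the function names.

```python
def owner_and_local(g, k, p_count):
    """Map global index g to (owning processor, local address) under cyclic(k)."""
    block = g // k                       # which block of size k holds g
    return block % p_count, (block // p_count) * k + g % k

def local_addresses(l, h, s, k, p_count, proc):
    """Local addresses on processor `proc` of the section l:h:s (h inclusive)."""
    result = []
    for g in range(l, h + 1, s):
        own, loc = owner_and_local(g, k, p_count)
        if own == proc:
            result.append(loc)
    return result

# Hypothetical case: block size 4, 3 processors, section 0:40:3, processor 1.
addrs = local_addresses(0, 40, 3, k=4, p_count=3, proc=1)
# Differences between successive local addresses (the local memory strides).
gaps = [b - a for a, b in zip(addrs, addrs[1:])]
```

The scan visits every section element regardless of owner, so its cost grows with the section length rather than with the number of local elements. Table-driven approaches aim to avoid that overhead by precomputing the repeating pattern of local strides.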


Efficient Workload Distribution of Photomosaic Using OpenCL into a Heterogeneous Computing Environment (이기종 컴퓨팅 환경에서 OpenCL을 사용한 포토모자이크 응용의 효율적인 작업부하 분배)

  • Kim, Heegon;Sa, Jaewon;Choi, Dongwhee;Kim, Haelyeon;Lee, Sungju;Chung, Yongwha;Park, Daihee
    • KIPS Transactions on Computer and Communication Systems / v.4 no.8 / pp.245-252 / 2015
  • Recently, parallel processing methods using accelerators have been introduced into high-performance computing and mobile computing. The photomosaic application can be parallelized by exploiting its inherent data parallelism on an accelerator. In this paper, we propose a way to distribute the workload of the photomosaic application across a heterogeneous CPU and GPU computing environment. Specifically, the photomosaic application is parallelized using both CPU and GPU resources with the asynchronous mode of OpenCL, and the optimal workload distribution rate is then estimated by measuring the execution times of the CPU-only and GPU-only configurations. The proposed approach is simple but very effective, and can be applied to parallelize other applications in a heterogeneous CPU and GPU computing environment. Based on the experimental results, we confirm that performance in the heterogeneous computing environment with the optimal workload distribution improves by 141% compared with the GPU-only method.
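
The abstract does not give the estimator itself. The sketch below shows one simple way a distribution rate could be derived from CPU-only and GPU-only timings, under the assumptions that execution time scales linearly with the share of work and that transfer and synchronization overheads are negligible. The function names and timing values are hypothetical.

```python
def gpu_share(t_cpu_only, t_gpu_only):
    """Fraction of work for the GPU so that both devices finish together.

    Solves share * t_gpu_only == (1 - share) * t_cpu_only.
    """
    return t_cpu_only / (t_cpu_only + t_gpu_only)

def predicted_time(share, t_cpu_only, t_gpu_only):
    """Predicted wall time when both devices run concurrently (linear model)."""
    return max(share * t_gpu_only, (1 - share) * t_cpu_only)

# Hypothetical measurements in seconds, not taken from the paper.
t_cpu, t_gpu = 12.0, 4.0
share = gpu_share(t_cpu, t_gpu)
combined = predicted_time(share, t_cpu, t_gpu)
print(f"GPU share {share:.2f}, predicted time {combined:.2f} s")
```

With these made-up timings the model assigns three quarters of the work to the GPU and predicts 3 seconds instead of 4 for GPU-only. Real measurements would deviate from the linear model, so the estimate would normally be treated as a starting point for tuning.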

DESIGN OF COMPACT PARTICLE DETECTOR SYSTEM USING FPGA FOR SPACE PARTICLE ENVIRONMENT MEASUREMENT (FPGA를 이용한 우주 입자환경 관측용 초소형 입자검출기 시스템 설계)

  • Ryu, K.;Oh, D.S.;Kim, S.J.;Kim, H.J.;Lee, J.J.;Shin, G.H.;Ko, D.H.;Min, K.W.;Hwang, J.A.
    • Journal of Astronomy and Space Sciences / v.24 no.2 / pp.155-166 / 2007
  • We have designed a high-resolution proton and electron telescope for detecting high-energy particles, which constitute a major part of the space environment. The particle flux in satellite orbits can vary abruptly with position and solar activity. In this study, we produced a conceptual design of a detector that adapts to these variations with high energy resolution, and estimated its performance. In addition, a parallel processing algorithm was devised and implemented on an FPGA for high-speed onboard data processing, capable of detecting high flux without losing energy resolution.

VLSI Implementation of Low-Power Motion Estimation Using Reduced Memory Accesses and Computations (메모리 호출과 연산횟수 감소기법을 이용한 저전력 움직임추정 VLSI 구현)

  • Moon, Ji-Kyung;Kim, Nam-Sub;Kim, Jin-Sang;Cho, Won-Kyung
    • The Journal of Korean Institute of Communications and Information Sciences / v.32 no.5A / pp.503-509 / 2007
  • Low-power motion estimation is required for video coding in portable information devices. In this paper, we propose a low-power motion estimation algorithm and a 1-D systolic array VLSI architecture based on the full search block matching algorithm (FSBMA). The main sources of power dissipation in FSBMA are complex computations and frequent memory accesses for data in the search area. The proposed algorithm reduces memory accesses and computations by using a 1-D PE (processing element) array architecture that performs motion estimation of two neighboring blocks in parallel and by skipping unnecessary computations during motion estimation. The VLSI implementation results show that the proposed architecture reduces power dissipation by 9.3% and operates two times faster than an existing low-power motion estimator.
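
For readers unfamiliar with FSBMA, the following is a plain software sketch of full-search block matching with the sum of absolute differences (SAD) as the cost. It illustrates only the baseline algorithm, not the paper's systolic architecture or its computation-skipping scheme. Frames are nested lists of integers, and all names and test data are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between an n x n block and its displaced match."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))

def full_search(cur, ref, bx, by, n, search):
    """Try every displacement in [-search, search]^2; return (cost, dx, dy) of the best."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that fall outside the reference frame.
            if by + dy < 0 or bx + dx < 0 or by + dy + n > h or bx + dx + n > w:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best

# Synthetic 8x8 frames: the current frame is the reference shifted by (2, 1).
ref = [[x + 8 * y for x in range(8)] for y in range(8)]
cur = [[ref[min(y + 1, 7)][min(x + 2, 7)] for x in range(8)] for y in range(8)]
best = full_search(cur, ref, bx=2, by=2, n=2, search=2)
```

Every candidate reads a full block from the search area, which is why the abstract identifies memory accesses and repeated computation as the main sources of power dissipation.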

Fast View Synthesis Using GPGPU (GPGPU를 이용한 고속 영상 합성 기법)

  • Shin, Hong-Chang;Park, Han-Hoon;Park, Jong-Il
    • Journal of Broadcast Engineering / v.13 no.6 / pp.859-874 / 2008
  • In this paper, we develop a fast view synthesis method that generates multiple intermediate views in real time for a 3D display system when the camera geometry and depth maps of the reference views are given in advance. The proposed method achieves faster view synthesis than previous GPU-based approaches by performing all the computations required for view synthesis in parallel. Specifically, we use CUDA™ (by NVIDIA) to control the GPU device. To increase processing speed, we adapted all the view synthesis processes to the single instruction multiple data (SIMD) structure that is a main feature of CUDA, maximized the use of the high-speed memories on the GPU device, and optimized the implementation. As a result, we could synthesize 9 intermediate view images of 720 by 480 pixels within 0.128 seconds.