

A Pipelined Hash Join Method for Load Balancing (부하 균형 유지를 고려한 파이프라인 해시 조인 방법)

  • Moon, Jin-Gue;Park, No-Sang;Kim, Pyeong-Jung;Jin, Seong-Il
    • The KIPS Transactions:PartD / v.9D no.5 / pp.755-768 / 2002
  • We investigate the effect of data skew in join attributes on the performance of a pipelined multi-way hash join method, and propose two new hash join methods with load-balancing capabilities. The first method allocates buckets statically in round-robin fashion; the second allocates buckets adaptively according to a frequency distribution. With hash-based joins, multiple joins can be pipelined so that early results from one join are sent to the next join before the first join completes, without being staged on disk. Unless the pipelined execution of multiple hash joins includes a load-balancing mechanism, skew can severely degrade system performance. In this paper, we derive an execution model of the pipeline segment and a cost model, and develop a simulator for the study. Our simulations over a wide range of parameters show that join selectivities and relation sizes degrade system performance more severely as the degree of data skew increases. However, the proposed method, combined with a large number of buckets and a tuning technique, offers substantial robustness across a wide range of skew conditions.
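
The two bucket-allocation strategies named in this abstract can be contrasted with a minimal sketch. The abstract does not give the authors' algorithms, so everything here is an assumption for illustration: the function names, the toy skewed key set, and in particular the greedy "largest bucket to least-loaded processor" rule standing in for the frequency-based adaptive allocation.

```python
from collections import Counter
import heapq


def round_robin_allocation(num_buckets, num_procs):
    """Static allocation: bucket b goes to processor b mod num_procs."""
    return {b: b % num_procs for b in range(num_buckets)}


def frequency_allocation(bucket_sizes, num_procs):
    """Adaptive allocation (assumed greedy heuristic, not the paper's):
    place the largest buckets first, each on the least-loaded processor."""
    heap = [(0, p) for p in range(num_procs)]  # entries are (load, processor)
    heapq.heapify(heap)
    assignment = {}
    ordered = sorted(bucket_sizes.items(), key=lambda kv: (-kv[1], kv[0]))
    for bucket, size in ordered:
        load, proc = heapq.heappop(heap)
        assignment[bucket] = proc
        heapq.heappush(heap, (load + size, proc))
    return assignment


def processor_loads(assignment, bucket_sizes, num_procs):
    """Total number of tuples each processor receives under an assignment."""
    loads = [0] * num_procs
    for bucket, proc in assignment.items():
        loads[proc] += bucket_sizes.get(bucket, 0)
    return loads


# Hypothetical skewed join-attribute values: two keys dominate the relation.
keys = [0] * 60 + [4] * 30 + list(range(1, 4)) + list(range(5, 12))
num_buckets, num_procs = 8, 4
bucket_sizes = Counter(k % num_buckets for k in keys)

rr_loads = processor_loads(
    round_robin_allocation(num_buckets, num_procs), bucket_sizes, num_procs)
freq_loads = processor_loads(
    frequency_allocation(bucket_sizes, num_procs), bucket_sizes, num_procs)

print(rr_loads)    # [91, 3, 3, 3]
print(freq_loads)  # [61, 30, 5, 4]
```

In this toy case round-robin happens to place both heavy buckets on the same processor, while the frequency-aware rule separates them. The remaining imbalance comes from a single oversized bucket, which no allocation of whole buckets can split.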

A Study on the Digital Filter Design using Software for Analysis of Observation Data in Radio Astronomy (전파천문 관측데이터 분석을 위해 소프트웨어를 이용한 디지털필터 설계에 관한 연구)

  • Yeom, Jae-Hwan;Oh, Se-Jin;Roh, Duk-Gyoo;Oh, Chung-Sik;Jung, Dong-Kyu;Shin, Jae-Sik;Kim, Hyo-Ryoung;Hwang, Ju-Yeon
    • Journal of the Institute of Convergence Signal Processing / v.16 no.4 / pp.175-181 / 2015
  • In this paper, we propose a software-based design method for a digital filter to analyze radio astronomy observation data. Owing to advances in computer systems, analysis in radio astronomy observing systems has recently been shifting from hardware to software. An existing hardware system cannot easily change its specification because it is implemented to meet special requirements, and any change is costly and time-consuming. Software, in contrast, can be implemented at low cost if open software is used, and can be changed flexibly to satisfy the desired specification. However, analyzing massive data such as radio astronomy data in software requires a high-performance computer system. Therefore, this paper proposes a software-based digital filter design method that matches the performance of the hardware digital filter in the observation system operated by the KVN (Korean VLBI Network). The proposed method is implemented in standard C, and simulations are conducted with GNU (GNU's Not Unix) Octave to show its effectiveness. In addition, for high-speed operation of the designed digital filter, the SSE (Streaming SIMD Extensions) library is adopted for parallel operation. With the proposed digital filter, digital filtering was performed on wide-band observation data in the KVN observation mode; the narrow-band filtering result shows no ripple inside the stop band, confirming the effectiveness of the proposed method.

Efficient Multiple Joins using the Synchronization of Page Execution Time in Limited Processors Environments (한정된 프로세서 환경에서 페이지 실행시간 동기화를 이용한 효율적인 다중 결합)

  • Lee, Kyu-Ock;Weon, Young-Sun;Hong, Man-Pyo
    • Journal of KIISE:Databases / v.28 no.4 / pp.732-741 / 2001
  • In relational database systems, the join operation is one of the most time-consuming query operations. Many parallel join algorithms have been developed to reduce the execution time. The multiple hash join algorithm using an allocation tree is one of the most efficient. However, processing at each node of the allocation tree may be delayed; the delay occurs in the tuple-probing phase because of the difference between the time to read one page of the outer relation and the time to process a page that has already been read. We previously solved this delay problem using the concept of synchronization of page execution time. In this paper, the performance improvement at each node of the allocation tree is extended to the whole allocation tree, and its performance is evaluated. In addition, we propose an efficient algorithm for multiple hash joins in environments with a limited number of processors, based on the relationship between the number of input relations in the allocation tree and the number of processors allocated to the tree. Finally, we analyze the performance by building an analytical cost model and verify its validity through various performance comparisons with the previous method.

The Hardware Design of Effective Deblocking Filter for HEVC Encoder (HEVC 부호기를 위한 효율적인 디블록킹 하드웨어 설계)

  • Park, Jae-Ha;Park, Seung-yong;Ryoo, Kwang-ki
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2014.10a / pp.755-758 / 2014
  • In this paper, we propose an effective deblocking filter hardware architecture for a High Efficiency Video Coding (HEVC) encoder. The architecture combines reduced processing time, filter ordering for a low-area design, an effective memory architecture, and a four-pipeline structure for a high-performance HEVC encoder. The proposed filter ordering reduces the delay caused by preprocessing. It supports real-time reads and writes on a single-port SRAM and enables parallel processing with two filters. Using 10 memories resolves the hazard caused by the single-port SRAM. The proposed filter also suits low-voltage design through a clock-gating architecture in the four-pipeline structure. The proposed deblocking filter architecture is designed in Verilog HDL and implemented with 100k logic gates in a TSMC 0.18 µm process. At 150 MHz, it supports 4K Ultra HD video encoding at 30 fps, and it operates at a maximum frequency of 200 MHz.

Design and Performance Evaluation of Expansion Buffer Cache (확장 버퍼 캐쉬의 설계 및 성능 평가)

  • Hong, Won-Kee
    • The KIPS Transactions:PartA / v.11A no.7 s.91 / pp.489-498 / 2004
  • The VLIW processor is considered appropriate for embedded systems because its simple hardware structure provides high performance with low power consumption. Unfortunately, the VLIW processor often suffers from high memory access latency due to the variable length of I-packets, which consist of independent instructions to be issued in parallel. Because I-packet length varies, some I-packets, called straddle I-packets, span two cache blocks, so two cache accesses are required to fetch them. In this paper, an expansion buffer cache is proposed to improve both the instruction fetch bandwidth and the power consumption of the I-cache at moderate hardware cost. The expansion buffer cache adds to the main cache a small expansion buffer that holds a fraction of a straddle packet, reducing the additional cache accesses caused by straddle I-packets. With this large reduction in cache accesses, the expansion buffer cache achieves a 5-9% improvement over conventional I-caches in the $\mathrm{Delay} \cdot \mathrm{Power} \cdot \mathrm{Area}$ metric.
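
The straddle condition described in this abstract reduces to a simple address check. The sketch below illustrates only the problem being addressed (the extra access per straddle packet), not the expansion buffer itself. It assumes byte-addressed packets laid out back to back, each no longer than one cache block; the function names and the example lengths are hypothetical.

```python
def is_straddle(start, length, block_size):
    """True if a packet occupying [start, start + length) spans two cache blocks.

    Assumes 0 < length <= block_size, so a packet touches at most two blocks.
    """
    first_block = start // block_size
    last_block = (start + length - 1) // block_size
    return first_block != last_block


def count_fetch_accesses(packet_lengths, block_size):
    """Count cache accesses for back-to-back packets in a conventional I-cache:
    one access per packet, plus one extra for each straddle packet."""
    addr = 0
    accesses = 0
    straddles = 0
    for length in packet_lengths:
        if is_straddle(addr, length, block_size):
            straddles += 1
            accesses += 2
        else:
            accesses += 1
        addr += length
    return accesses, straddles


# Hypothetical packet lengths in bytes with a 16-byte cache block.
accesses, straddles = count_fetch_accesses([12, 8, 12, 4, 8, 8], block_size=16)
print(accesses, straddles)  # 8 2
```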

Performance Evaluation of Scheduling Algorithms according to Communication Cost in the Grid System of Co-allocation Environment (Co-allocation 환경의 그리드 시스템에서 통신비용에 따른 스케줄링 알고리즘의 성능 분석)

  • Kang, Oh-Han;Kang, Sang-Seong;Kim, Jin-Suk
    • The KIPS Transactions:PartA / v.14A no.2 / pp.99-106 / 2007
  • Grid computing, a mechanism that uses geographically distributed heterogeneous systems, is drawing attention as a new paradigm for next-generation parallel and distributed computing. Communication cost is especially important in grid computing because it furnishes users with an integrated virtual computing service in which a number of computer systems are connected by a high-speed network. Therefore, to reduce execution time, a scheduling algorithm in a grid environment should take communication cost into consideration as well as the computing ability of resources. However, most scheduling algorithms have ignored communication cost by assuming that all tasks are handled in one cluster, and have not considered communication overhead when tasks are processed across a number of clusters. In this paper, the functions of the original scheduling algorithms are analyzed. More importantly, the algorithms are compared and analyzed with communication cost taken into account in a co-allocation environment, in which a task is executed separately across many clusters.

Implementation of a GPU Cluster System using Inexpensive Graphics Devices (저가의 그래픽스 장치를 이용한 GPU 클러스터 시스템 구현)

  • Lee, Jong-Min;Lee, Jung-Hwa;Kim, Seong-Woo
    • Journal of Korea Multimedia Society / v.14 no.11 / pp.1458-1466 / 2011
  • Research on GPGPU has recently been active as GPU performance has increased rapidly. In this paper, we propose a system architecture, modeled on existing supercomputer architectures, for a cost-effective system that uses the GPUs in low-cost graphics devices, and we implement a GPU cluster system with eight GPUs. We also build a software development environment suited to the GPU cluster system and use it for performance evaluation by implementing the n-body problem. The results show that, because of communication cost, using multiple GPUs is efficient when the problem size is large. In addition, by computing block by block to mitigate the problem-size constraint imposed by limited GPU resources, we could handle up to eight million celestial bodies.

All-port Broadcasting Algorithms on Wormhole Routed Star Graph Networks (웜홀 라우팅을 지원하는 스타그래프 네트워크에서 전 포트 브로드캐스팅 알고리즘)

  • Kim, Cha-Young;Lee, Sang-Kyu;Lee, Ju-Young
    • Journal of KIISE:Computer Systems and Theory / v.29 no.2 / pp.65-74 / 2002
  • Star networks have recently been considered by many researchers as attractive alternatives to the widely used hypercube for interconnection networks in parallel processing systems. One of the fundamental communication problems on star graph networks is broadcasting. In this paper, we consider the broadcasting problem in star graph networks using wormhole routing. In a wormhole-routed system, minimizing link contention is more critical to system performance than the distance between two communicating nodes. We use Hamiltonian paths in the star graph to set up link-disjoint communication paths. We present a broadcast algorithm for the n-dimensional star graph of $N (= n!)$ nodes whose total completion time is no larger than $\lfloor \log_n n! \rfloor + 1$ steps, where $\lfloor \log_n n! \rfloor + 1$ is the lower bound. This result is a significant improvement over the previous $(n-1)$-step broadcasting algorithm.
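
To make the step counts in this abstract concrete, the short calculation below compares the stated bound with the earlier n-1 steps for a few dimensions. It assumes the bracket in the bound is a floor (Gauss bracket), which is a reading of the listing rather than something the abstract states; the function names are hypothetical. Integer arithmetic is used to avoid floating-point rounding in the logarithm.

```python
from math import factorial


def floor_log(base, value):
    """Largest integer k with base**k <= value, using exact integer arithmetic."""
    k, power = 0, base
    while power <= value:
        power *= base
        k += 1
    return k


def broadcast_steps(n):
    """Bound from the abstract, read as floor(log_n(n!)) + 1 (assumed floor)."""
    return floor_log(n, factorial(n)) + 1


# Columns: n, number of nodes n!, bound from the abstract, earlier n-1 steps.
for n in (4, 5, 6, 8, 10):
    print(n, factorial(n), broadcast_steps(n), n - 1)
# 4 24 3 3
# 5 120 3 4
# 6 720 4 5
# 8 40320 6 7
# 10 3628800 7 9
```

Under this reading the two counts coincide at n = 4 and the gap widens as n grows.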

Deep Learning-based Real-Time Super-Resolution Architecture Design (경량화된 딥러닝 구조를 이용한 실시간 초고해상도 영상 생성 기술)

  • Ahn, Saehyun;Kang, Suk-Ju
    • Journal of Broadcast Engineering / v.26 no.2 / pp.167-174 / 2021
  • Deep learning technology is now widely used in computer vision applications such as object recognition, classification, and image generation. In particular, deep learning-based super-resolution has achieved significant performance improvements. The fast super-resolution convolutional neural network (FSRCNN) is a well-known deep learning-based super-resolution algorithm in which the output image is generated by a deconvolutional layer. In this paper, we propose an FPGA-based convolutional neural network accelerator that considers parallel computing efficiency. In addition, we propose Optimal-FSRCNN, a modification of the FSRCNN structure. The number of multipliers is reduced by a factor of 3.47 compared to FSRCNN, while the PSNR remains similar to that of FSRCNN. We developed a real-time image processing technology implemented on an FPGA.

Hybrid Transactional Memory using Sampling-based Retry Policy in Multi-Core Environment (멀티코어 환경에서 샘플링 기반 재시도 정책을 이용한 하이브리드 트랜잭셔널 메모리)

  • Kang, Moon-Hwan;Jang, Yeon-Woo;Yoon, Min;Chang, Jae-Woo
    • The Journal of Korean Institute of Next Generation Computing / v.13 no.2 / pp.49-61 / 2017
  • Transactional Memory (TM) has greatly changed the parallel programming paradigm for transaction processing and is classified into STM, HTM, and HyTM according to whether it relies on hardware or software frameworks. However, existing studies apply a static retry policy to all workloads. To solve this problem, we propose a hybrid transactional memory scheme using a sampling-based adaptive retry policy in a multi-core environment. First, the proposed scheme determines whether to use STM or HTM according to the characteristics of a transaction. Otherwise, it executes HTM and STM concurrently by using a Bloom filter. Second, the proposed scheme provides an adaptive retry policy for HTM according to the characteristics of the transactions in each workload. Finally, in an experimental performance evaluation using STAMP, the proposed scheme shows 10-20% better performance than the existing schemes.