• Title/Summary/Keyword: parallel computer processing

Search Result 652, Processing Time 0.027 seconds

Performance of Initial Timing Acquisition in the DS-UWB Systems with Different Transmit Pulse Shaping Filters (DS-UWB 시스템에서 송신 필터에 따른 초기 동기 획득 성능 비교)

  • Kang, Kyu-Min
    • The Journal of Korean Institute of Electromagnetic Engineering and Science
    • /
    • v.20 no.5
    • /
    • pp.493-502
    • /
    • 2009
  • In this paper, we compare the performance of initial timing acquisition in direct sequence ultra-wideband(DS-UWB) systems with different transmit pulse shaping filters through extensive computer simulations. Simulation results show that the timing acquisition performance of the DS-UWB system, whose chip rate is 1.32 Gchip/s, employing a rectangular transmit filter is similar to that employing a square root raised cosine(SRRC) filter with an interpolation factor of 4 in the realistic UWB channels(CM1 and CM3) as well as the additive white Gaussian noise(AWGN) channel. Additionally, we present both a 24-parallel digital correlator structure and a 24-parallel processing searcher operating at a 55 MHz system clock, and then briefly discuss the initial timing acquisition procedure. Because we can adopt an 1.32 Gsample/s digital-to-analog(D/A) converter and an 1.32 Gsample/s analog-to-digital(AID) converter in the DS-UWB system by employing the rectangular transmit filter, we have a realistic solution for the DS-UWB chipset development.

Design of Web-based Parallel Processing System using Performance-based Task Allocation (성능 기반 태스크 할당을 이용한 웹 기반 병렬처리 시스템의 설계)

  • Han, Youn-Hee;Park, Chan-Yeol;Jeong, Young-Sik;Hwang, Chong-Sun
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.27 no.3
    • /
    • pp.264-276
    • /
    • 2000
  • Recent advances of technologies make easy sharing various information and utilizing system resources on the Internet. Especially, code migration using applets of Java supports the distribution of programs on the web environment, and also browsers executing the applets guarantee the reliability of a migrated codes. In this paper, we describe the design and implementation of a web-based parallel processing system, which distributes migratable codes of a large job, makes the distributed codes to execute in parallel, and controls and gathers the results of each execution. The hosts participate in the computation reside on the Internet, spreaded out geographically, and the heterogeneity and the variability among them are severe. Thus, task allocation considering the performance differences and the adaptability to the severe variability are necessary. We present an adaptive task allocation algorithm applied to our system and the performance evaluation.

  • PDF

Implementation of Parallel Volume Rendering Using the Sequential Shear-Warp Algorithm (순차 Shear-Warp 알고리즘을 이용한 병렬볼륨렌더링의 구현)

  • Kim, Eung-Kon
    • The Transactions of the Korea Information Processing Society
    • /
    • v.5 no.6
    • /
    • pp.1620-1632
    • /
    • 1998
  • This paper presents a fast parallel algorithm for volume rendering and its implementation using C language and MPI MasPar Programming Language) on the 4,096 processor MasPar MP-2 machine. This parallel algorithm is a parallelization hased on the Lacroute' s sequential shear - warp algorithm currently acknowledged to be the fastest sequential volume rendering algorithm. This algorithm reduces communication overheads by using the sheared space partition scheme and the load balancing technique using load estimates from the previous iteration, and the number of voxels to be processed by using the run-length encoded volume data structure.Actual performance is 3 to 4 frames/second on the human hrain scan dataset of $128\times128\times128$ voxels. Because of the scalability of this algorithm, performance of ]2-16 frames/sc.'cond is expected on the 16,384 processor MasPar MP-2 machine. It is expected that implementation on more current SIMD or MIMD architectures would provide 3O~60 frames/second on large volumes.

  • PDF

Effective Parallel Hash Join Algorithm Based on Histoftam Equalization in the Presence of Data Skew (데이터 편재 하에서 히스토그램 변환기법에 기초한 효율적인 병렬 해쉬 결합 알고리즘)

  • Park, Ung-Gyu;Choe, Hwang-Gyu;Kim, Tak-Gon
    • The Transactions of the Korea Information Processing Society
    • /
    • v.4 no.2
    • /
    • pp.338-348
    • /
    • 1997
  • In this pater, we first propose a data distribution framework to resolve load imbalance and bucket oerflow in parallel hash join.Using the histogram equalization technique, the framework transforms a histogram of skewed data to the desired uniform distribution that corresponds to the relative computing power of node processors in the system.Next we propose an effcient parallel hash join algorithm for handing skwed data based on the proposed data distribution methodology.For performance comparison of our algorithm with other hash join algorithms.we perform similation experiments and actual exeution on COREDB database computer with 8-node hyperube architecture. In these experiments, skwed data distebution of the join atteibute is modeled using a Zipf-like distribution.The perfomance studies undicate that our algorithm outperforms other algorithms in the skewed cases.

  • PDF

VIA-Based PC Cluster System for Efficient Information Retrieval (효율적인 정보 검색을 위한 VIA 기반 PC 클러스터 시스템)

  • Kang, Na-Young;Chung, Sang-Hwa;Jang, Han-Kook
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.29 no.10
    • /
    • pp.539-549
    • /
    • 2002
  • PC cluster-based Information Retrieval (IR) systems improve their performances by parallel processing of query terms using cluster nodes. However TCP/IP based communication used to exchange data between cluster nodes prevents the performance from being improved further. The user-level communication mechanisms solve the problem by eliminating the time-consuming kernel access in exchanging data between cluster nodes. The Virtual Interface Architecture (VIA) is one of the representative user-level communication mechanisms which provide low latency and high bandwidth. In this paper, we propose a VIA-based parallel IR system on a PC cluster. The IR system is implemented using the following three communication methods: Sealable Coherent Interface (SCI) based VIA, MPI on SCI based VIA, MPI on Fast Ethernet based VIA. Through experiments, the performances of the three methods are analyzed in various aspects.

Performance Comparison of Particle Simulation Using GPU Between OpenGL and Unity (OpenGL과 Unity간의 GPU를 이용한 Particle Simulation의 성능 비교)

  • Kim, Min Sang;Sung, Nak-Jun;Choi, Yoo-Joo;Hong, Min
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.6 no.10
    • /
    • pp.479-486
    • /
    • 2017
  • Recently, GPGPU has been able to increase the degradation of computer performance, and it is now possible to run physically based real-time simulations on PCs that require high computational complexity. Physical calculations applied in physics simulation can be performed by parallel processing, and can be efficiently performed using parallel computation using Compute shader recently supported by OpenGL 4.3 and Unity 4.0. In this paper, we measure and compare the number of performance in real - time physics simulation in OpenGL running on various platforms and Unity, a content creation tool supporting various platforms. Particle simulation experiments show that particle simulation using Unity performs faster than 136.04%. It is expected that it will be able to select better development tools for future multi - platform support.

Circuit Design of Parallel Power Operation Equipment for Peak Power Reduction (상전원의 피크치 전력 감소를 위한 전력병합장치 회로설계)

  • Yang, Jaesoo;Kim, Donghan;Kim, ManDo
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.3 no.9
    • /
    • pp.273-278
    • /
    • 2014
  • Recent use of electricity during peak hours electricity supply-demand imbalance is inevitable that limit power use force. Therefore, in this paper, a circuit of parallel power operation equipment for peak power reduction which saves the power to electricity storage device during the non-peak power time and supply from the storage power during the expected power shortages time is designed Through this circuitry, the peak power of the commercial power supply with the parallel operation and connection of the commercial power supply and the power supply of the inverter from electricity storage that is a key feature of PRS(Peak power Reduction System) can be controlled. In addition, in order to increase the efficiency, a Transless Power Circuit DC-AC inverter is developed. Moreover, a variable impedance control is applied to the storage of electric power of an Uninterruptible Power Supply associated with a commercial power source.

A Study on the Pixel-Paralled Image Processing System for Image Smoothing (영상 평활화를 위한 화소-병렬 영상처리 시스템에 관한 연구)

  • Kim, Hyun-Gi;Yi, Cheon-Hee
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.39 no.11
    • /
    • pp.24-32
    • /
    • 2002
  • In this paper we implemented various image processing filtering using the format converter. This design method is based on realized the large processor-per-pixel array by integrated circuit technology. These two types of integrated structure are can be classify associative parallel processor and parallel process DRAM(or SRAM) cell. Layout pitch of one-bit-wide logic is identical memory cell pitch to array high density PEs in integrate structure. This format converter design has control path implementation efficiently, and can be utilize the high technology without complicated controller hardware. Sequence of array instruction are generated by host computer before process start, and instructions are saved on unit controller. Host computer is executed the pixel-parallel operation starting at saved instructions after processing start. As a result, we obtained three result that 1)simple smoothing suppresses higher spatial frequencies, reducing noise but also blurring edges, 2) a smoothing and segmentation process reduces noise while preserving sharp edges, and 3) median filtering, like smoothing and segmentation, may be applied to reduce image noise. Median filtering eliminates spikes while maintaining sharp edges and preserving monotonic variations in pixel values.

Color Media Instructions for Embedded Parallel Processors (임베디드 병렬 프로세서를 위한 칼라미디어 명령어 구현)

  • Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.35 no.7
    • /
    • pp.305-317
    • /
    • 2008
  • As a mobile computing environment is rapidly changing, increasing user demand for multimedia-over-wireless capabilities on embedded processors places constraints on performance, power, and sire. In this regard, this paper proposes color media instructions (CMI) for single instruction, multiple data (SIMD) parallel processors to meet the computational requirements and cost goals. While existing multimedia extensions store and process 48-bit pixels in a 32-bit register, CMI, which considers that color components are perceptually less significant, supports parallel operations on two-packed compressed 16-bit YCbCr (6 bit Y and 5 bits Cb, Cr) data in a 32-bit datapath processor. This provides greater concurrency and efficiency for YCbCr data processing. Moreover, the ability to reduce data format size reduces system cost. The reduction in data bandwidth also simplifies system design. Experimental results on a representative SIMD parallel processor architecture show that CMI achieves an average speedup of 6.3x over the baseline SIMD parallel processor performance. This is in contrast to MMX (a representative Intel's multimedia extensions), which achieves an average speedup of only 3.7x over the same baseline SIMD architecture. CMI also outperforms MMX in both area efficiency (a 52% increase versus a 13% increase) and energy efficiency (a 50% increase versus an 11% increase). CMI improves the performance and efficiency with a mere 3% increase in the system area and a 5% increase in the system power, while MMX requires a 14% increase in the system area and a 16% increase in the system power.

Design of a High Throughput Parallel Turbo Decoder (고 처리율 병렬 터보 복호기 설계)

  • Lee, Won-Ho;Park, Heemin;Rim, Chong S.
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.50 no.11
    • /
    • pp.50-57
    • /
    • 2013
  • This paper provides a design of high-throughput parallel turbo decoder that is able to decode several packets of various length simultaneously. For high-speed communications, designing of Turbo decoder as parallel structures reduces the long decoding time caused by iterative turbo decode way. Also, by employing the double buffer structure for input and output packets improves the decoder throughput by enabling continuous decoding. Because parallel turbo decoder is designed to be able to decode the packet of the longest length, there exist idle PE's(Processing Element) in the case of decoding packets of short length. The main idea of this paper is to increase the utilization of PE's in parallel Turbo decoder and to improve the decoder throughput by using the idle PE's immediately for the subsequent packets decoding. For this, the control is necessary to enable the concurrent decoding of several short packets and we propose the method of this control. Applying the proposed method, we implemented Turbo Decoder with 32 PE's that can decode packets of 6144 bits maximum. Compared to the conventional Turbo decoder, although the area was increased about 16%, the decoder throughput was improved 28 times for short packets.