• Title/Summary/Keyword: Parallel data processing

751 search results

Hardware Implementation of Minimized Serial-Divider for Image Frame-Unit Processing in Mobile Phone Camera. (Mobile Phone Camera의 이미지 프레임 단위 처리를 위한 소형화된 Serial-Divider의 하드웨어 구현)

  • Kim, Kyung-Rin;Lee, Sung-Jin;Kim, Hyun-Soo;Kim, Kang-Joo;Kang, Bong-Soon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2007.10a / pp.119-122 / 2007
  • In this paper, we propose a hardware design method for the division operation used in image frame-unit processing in a mobile phone camera. In general, there are two types of data processing: parallel and serial. The parallel type makes real-time processing possible, but it requires a large hardware area because of its many comparators and buffer memories. Compared with the parallel type, the serial type is smaller in hardware size because it uses only one comparator, but it cannot process in real time. To use hardware resources efficiently, we employ the serial divider, since frame-unit operation for image processing does not need real-time processing. When both dividers are compared at the same bit width and operating frequency, the hardware size of the serial divider is approximately 13 percent of that of the parallel divider.

  • PDF
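
For readers unfamiliar with the trade-off described in the entry above, the following is a minimal Python sketch of a bit-serial restoring divider, one common form of the serial type mentioned in the abstract. The restoring scheme, bit width, and function names are illustrative assumptions, not the paper's actual RTL design.

```python
def serial_restoring_divide(dividend: int, divisor: int, width: int = 16):
    """Bit-serial restoring division: one compare/subtract per clock cycle.

    A parallel (array) divider would instantiate `width` comparator stages;
    the serial form reuses a single compare/subtract unit over `width` cycles,
    trading latency for area.
    """
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    remainder = 0
    quotient = 0
    for i in reversed(range(width)):              # one iteration per clock cycle
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        if remainder >= divisor:                  # the single comparator
            remainder -= divisor
            quotient |= (1 << i)
    return quotient, remainder


if __name__ == "__main__":
    # 50000 / 123 -> quotient 406, remainder 62
    print(serial_restoring_divide(50000, 123))
```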

Implementation of Parallel Processor for Sound Synthesis of Guitar (기타의 음 합성을 위한 병렬 프로세서 구현)

  • Choi, Ji-Won;Kim, Yong-Min;Cho, Sang-Jin;Kim, Jong-Myon;Chong, Ui-Pil
    • The Journal of the Acoustical Society of Korea / v.29 no.3 / pp.191-199 / 2010
  • Physical modeling is a synthesis method that produces high-quality sound close to that of a real musical instrument. However, since physical modeling requires many parameters to synthesize the sound of an instrument, it hinders real-time processing for instruments that must produce a large number of sounds simultaneously. To solve this problem, this paper proposes a single instruction multiple data (SIMD) parallel processor that supports real-time sound synthesis of the guitar, a representative plucked-string instrument. To control the six strings of the guitar, we used a SIMD parallel processor consisting of six processing elements (PEs), where each PE models the corresponding string. The proposed SIMD processor can generate the synthesized sounds of all six strings simultaneously when the parallel synthesis algorithm receives the excitation signals and parameters of each string as input. Experimental results using a 44.1 kHz sampling rate and 16-bit quantization indicate that the sounds synthesized by the proposed parallel processor were very similar to the original sounds. In addition, the proposed parallel processor outperforms TI's commercial TMS320C6416 in execution time (8.9x better) and energy efficiency (39.8x better).
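
The abstract above does not give the synthesis equations, so the sketch below uses the classic Karplus-Strong plucked-string model as a stand-in for per-string physical modeling; one such loop per processing element would correspond to one string. The parameter names, damping value, and noise-burst excitation are assumptions for illustration, not the paper's actual model.

```python
import numpy as np

def pluck_string(frequency: float, duration: float,
                 sample_rate: int = 44100, damping: float = 0.996) -> np.ndarray:
    """Karplus-Strong style plucked-string synthesis for a single string.

    In a six-PE SIMD arrangement, each processing element would run one such
    loop for its own string, driven by its own excitation signal.
    """
    n_samples = int(duration * sample_rate)
    delay = int(sample_rate / frequency)                # delay-line length sets the pitch
    buf = np.random.uniform(-1.0, 1.0, delay)           # noise burst as excitation
    out = np.empty(n_samples)
    for n in range(n_samples):
        out[n] = buf[n % delay]
        # low-pass averaging filter inside the feedback loop (string damping)
        buf[n % delay] = damping * 0.5 * (buf[n % delay] + buf[(n + 1) % delay])
    return out

if __name__ == "__main__":
    # Standard-tuning open-string frequencies (E2 A2 D3 G3 B3 E4), one per "PE"
    freqs = [82.41, 110.00, 146.83, 196.00, 246.94, 329.63]
    mix = sum(pluck_string(f, 1.0) for f in freqs) / len(freqs)
    print(mix.shape, mix.dtype)
```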

Parallelization of Genome Sequence Data Pre-Processing on Big Data and HPC Framework (빅데이터 및 고성능컴퓨팅 프레임워크를 활용한 유전체 데이터 전처리 과정의 병렬화)

  • Byun, Eun-Kyu;Kwak, Jae-Hyuck;Mun, Jihyeob
    • KIPS Transactions on Computer and Communication Systems / v.8 no.10 / pp.231-238 / 2019
  • Analyzing next-generation genome sequencing data in the conventional way on a single server may take several tens of hours depending on the data size. However, to cope with emergency situations where results are needed within a few hours, the performance of a single genome analysis must be improved. In this paper, we propose a parallelized method for pre-processing genome sequence data that reduces analysis time by utilizing big data technology and a high-performance computing cluster connected by a high-speed network and sharing a parallel file system. For the reliability of the analysis results, we chose a strategy of parallelizing the existing analysis tools and algorithms for the new environment. Parallelized processing, data distribution, and parallel merging techniques have been developed, and the performance improvements have been confirmed through experiments.
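
The entry above describes chunk-wise data distribution, parallel execution of existing tools, and merging of results, but not the exact pipeline. The sketch below is a generic Python illustration of that pattern using multiprocessing on FASTQ chunks; the file names, chunk size, and the placeholder per-chunk step are assumptions, not the paper's actual tools.

```python
from multiprocessing import Pool
from pathlib import Path

def split_fastq(path: Path, out_dir: Path, records_per_chunk: int = 1_000_000) -> list[Path]:
    """Split a FASTQ file into chunks of whole records (4 lines per record)."""
    out_dir.mkdir(exist_ok=True)
    chunks, buf = [], []
    with path.open() as fh:
        for line in fh:
            buf.append(line)
            if len(buf) == 4 * records_per_chunk:
                chunks.append(out_dir / f"chunk_{len(chunks):04d}.fastq")
                chunks[-1].write_text("".join(buf))
                buf = []
    if buf:
        chunks.append(out_dir / f"chunk_{len(chunks):04d}.fastq")
        chunks[-1].write_text("".join(buf))
    return chunks

def preprocess_chunk(chunk: Path) -> Path:
    """Stand-in for an existing per-chunk tool (trimming, filtering, ...)."""
    out = chunk.with_suffix(".processed.fastq")
    out.write_text(chunk.read_text())   # a real pipeline would invoke the tool here
    return out

def run(path: Path, workers: int = 8) -> Path:
    chunks = split_fastq(path, Path("chunks"))         # data distribution
    with Pool(workers) as pool:                        # parallel processing
        processed = pool.map(preprocess_chunk, chunks)
    merged = Path("merged.processed.fastq")            # ordered merge of results
    with merged.open("w") as out:
        for part in processed:
            out.write(part.read_text())
    return merged
```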

Efficient Data Management for Finite Element Analysis with Pre-Post Processing of Large Structures (전-후 처리 과정을 포함한 거대 구조물의 유한요소 해석을 위한 효율적 데이터 구조)

  • 박시형;박진우;윤태호;김승조
    • Proceedings of the Computational Structural Engineering Institute Conference / 2004.04a / pp.389-395 / 2004
  • We consider the interface between a parallel distributed-memory multifrontal solver and the finite element method. We describe in detail the requirements and the data structure of the parallel FEM interface, which includes the element data and the node array. The full procedure for solving a large-scale structural problem is assumed to include pre- and post-processors, whose algorithms are not considered in this paper. The main advantage of implementing the parallel FEM interface appears when a distributed-memory system with a large number of processors is used to solve a very large-scale problem. The memory efficiency and the performance are examined by analyzing some examples on the Pegasus cluster system.

  • PDF
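
Since the abstract above mentions a parallel FEM interface built around element data and a node array but gives no field layout, the following is a minimal, assumed Python sketch of how one partition of such a mesh might be represented per process; every field name here is hypothetical and may differ from the paper's interface.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class FEMPartition:
    """Per-process slice of a mesh handed to a distributed multifrontal solver.

    All fields are illustrative; the actual interface in the paper may organize
    element and node data differently.
    """
    rank: int                    # owning process
    node_ids: np.ndarray         # global ids of locally stored nodes
    coords: np.ndarray           # (n_local_nodes, 3) nodal coordinates
    element_conn: np.ndarray     # (n_local_elems, nodes_per_elem) connectivity
    ghost_node_ids: np.ndarray = field(default_factory=lambda: np.empty(0, dtype=np.int64))

    def local_dof_count(self, dof_per_node: int = 3) -> int:
        return dof_per_node * len(self.node_ids)

# Example: a single 8-node hexahedral element owned by rank 0
part = FEMPartition(
    rank=0,
    node_ids=np.arange(8, dtype=np.int64),
    coords=np.array([[x, y, z] for x in (0, 1) for y in (0, 1) for z in (0, 1)], dtype=float),
    element_conn=np.arange(8, dtype=np.int64).reshape(1, 8),
)
print(part.local_dof_count())   # 24 degrees of freedom
```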

Processing parallel-disk viscometry data in the presence of wall slip

  • Leong, Yee-Kwong;Campbell, Graeme R.;Yeow, Y. Leong;Withers, John W.
    • Korea-Australia Rheology Journal / v.20 no.2 / pp.51-58 / 2008
  • This paper describes a two-step Tikhonov regularization procedure for converting the steady shear data generated by parallel-disk viscometers, in the presence of wall slip, into a shear stress-shear rate function and a wall shear stress-slip velocity function. If the material under test has a yield stress, or a critical wall shear stress below which no slip is observed, the method also provides an estimate of these stresses. Amplification of measurement noise is kept under control by introducing two separate regularization parameters, and Generalized Cross Validation is used to guide the selection of these parameters. The performance of this procedure is demonstrated by applying it to parallel-disk data for an oil-in-water emulsion, a foam, and a mayonnaise.
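
The entry above names Tikhonov regularization with Generalized Cross Validation (GCV) for the regularization parameters. As a hedged illustration of those two ingredients only, not of the paper's specific two-step procedure or its viscometry kernels, the sketch below solves a generic linear Tikhonov problem and selects the parameter by minimizing the standard GCV score.

```python
import numpy as np

def tikhonov_solve(A: np.ndarray, b: np.ndarray, lam: float) -> np.ndarray:
    """Solve min ||A x - b||^2 + lam * ||x||^2 (zeroth-order Tikhonov)."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

def gcv_score(A: np.ndarray, b: np.ndarray, lam: float) -> float:
    """Generalized Cross Validation score for a given regularization parameter."""
    n = A.shape[1]
    # influence (hat) matrix H(lam) = A (A^T A + lam I)^-1 A^T
    H = A @ np.linalg.solve(A.T @ A + lam * np.eye(n), A.T)
    residual = b - H @ b
    m = len(b)
    return (np.sum(residual**2) / m) / (np.trace(np.eye(m) - H) / m) ** 2

def choose_lambda(A, b, candidates):
    """Pick the regularization parameter that minimizes the GCV score."""
    return min(candidates, key=lambda lam: gcv_score(A, b, lam))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.normal(size=(50, 10))
    x_true = rng.normal(size=10)
    b = A @ x_true + 0.05 * rng.normal(size=50)
    lam = choose_lambda(A, b, np.logspace(-6, 2, 25))
    x = tikhonov_solve(A, b, lam)
    print(lam, np.linalg.norm(x - x_true))
```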

Development of Big-data Management Platform Considering Docker Based Real Time Data Connecting and Processing Environments (도커 기반의 실시간 데이터 연계 및 처리 환경을 고려한 빅데이터 관리 플랫폼 개발)

  • Kim, Dong Gil;Park, Yong-Soon;Chung, Tae-Yun
    • IEMEK Journal of Embedded Systems and Applications / v.16 no.4 / pp.153-161 / 2021
  • Real-time access is required to handle continuous and unstructured data, and management must remain flexible under dynamic conditions. A platform can be built to collect, store, and process data on a single local server or across multiple servers. The former, centralized approach is easy to control, but it creates an overload problem because all processing runs in one unit; the latter, distributed approach performs parallel processing, so it responds quickly and scales easily, but its design is more complex. This paper presents a platform built in the latter, distributed manner that collects and processes the various data held by an enterprise or agency to derive meaningful insights, makes them intuitively available on dashboards, and uses Spark to improve distributed processing performance. All services are distributed and managed as Docker containers. The data used in this study was collected entirely through Kafka; for a 4.4-gigabyte file, processing in Spark cluster mode took 2 minutes 15 seconds, about 3 minutes 19 seconds faster than local mode.
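
The entry above names Kafka for ingestion and Spark for distributed processing but does not show the job itself. The sketch below is an assumed PySpark Structured Streaming skeleton for reading a Kafka topic and aggregating it; the broker address, topic name, and aggregation are placeholders, not the platform's real configuration.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Requires the spark-sql-kafka connector on the classpath.
# Broker address and topic name below are placeholders.
spark = (SparkSession.builder
         .appName("kafka-ingest-sketch")
         .getOrCreate())

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "kafka:9092")
       .option("subscribe", "sensor-events")
       .load())

# Kafka delivers key/value as binary; decode the value and count events per minute.
events = raw.select(F.col("value").cast("string").alias("payload"),
                    F.col("timestamp"))
counts = (events
          .withWatermark("timestamp", "10 minutes")
          .groupBy(F.window("timestamp", "1 minute"))
          .count())

query = (counts.writeStream
         .outputMode("update")
         .format("console")
         .start())
query.awaitTermination()
```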

Eager Data Transfer Mechanism for Reducing Communication Latency in User-Level Network Protocols

  • Won, Chul-Ho;Lee, Ben;Park, Kyoung;Kim, Myung-Joon
    • Journal of Information Processing Systems / v.4 no.4 / pp.133-144 / 2008
  • Clusters have become a popular alternative for building high-performance parallel computing systems. Today's high-performance system area network (SAN) protocols such as VIA and IBA significantly reduce user-to-user communication latency by implementing protocol stacks outside the operating system kernel. However, emerging parallel applications require a further significant improvement in communication latency. Since the time required to transfer data between host memory and the network interface (NI) makes up a large portion of the overall communication latency, reducing data transfer time is crucial for achieving low-latency communication. In this paper, an Eager Data Transfer (EDT) mechanism is proposed to reduce the time for data transfers between the host and the network interface. The EDT employs cache coherence interface hardware to transfer data directly between the host and the NI. An EDT-based network interface was modeled and simulated on Linux/SimOS, a Linux-based complete system simulation environment. Our simulation results show that the EDT approach significantly reduces data transfer time compared to DMA-based approaches. The EDT-based NI attains a 17% to 38% reduction in user-to-user message time compared to cache-coherent DMA-based NIs for a range of message sizes (64 bytes to 4 Kbytes) in a SAN environment.

The Parallel ANN(Artificial Neural Network) Simulator using Mobile Agent (이동 에이전트를 이용한 병렬 인공신경망 시뮬레이터)

  • Cho, Yong-Man;Kang, Tae-Won
    • The KIPS Transactions: Part B / v.13B no.6 s.109 / pp.615-624 / 2006
  • The objective of this paper is to implement a parallel multi-layer ANN (Artificial Neural Network) simulator based on a mobile agent system, executed in parallel in a virtual parallel distributed computing environment. A multi-layer neural network can be parallelized at the level of training session, training data, layer, node, and weight. In this study, we developed and evaluated a simulator that parallelizes the ANN at the training-session and training-data levels, because these levels generate relatively little network traffic. The results verify that the parallelization achieves a speedup of about 3.3 times for training-session and training-data parallelism. The significance of this paper is that the performance of ANN execution on the virtual parallel computer is similar to that on an existing supercomputer. We therefore expect the virtual parallel computer to be considerably helpful in developing neural networks, because it reduces the long training time they require.
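
The entry above identifies training-data parallelism as one of the levels the simulator exploits. The sketch below illustrates that idea only, as a plain NumPy gradient-averaging loop rather than the paper's mobile-agent implementation; the tiny linear model, shard layout, and learning rate are made up for the example.

```python
import numpy as np

def local_gradient(W, X, y):
    """Gradient of mean squared error for a one-layer linear model on one data shard."""
    pred = X @ W
    return 2.0 * X.T @ (pred - y) / len(X)

def data_parallel_step(W, shards, lr=0.05):
    """One synchronous step of training-data parallelism.

    Each shard plays the role of one worker (in the paper, a mobile agent):
    workers compute gradients on their own data, then the averaged gradient
    is applied to the shared weights.
    """
    grads = [local_gradient(W, X, y) for X, y in shards]   # done in parallel in practice
    W -= lr * np.mean(grads, axis=0)
    return W

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(800, 5))
    true_W = rng.normal(size=(5, 1))
    y = X @ true_W
    shards = [(X[i::4], y[i::4]) for i in range(4)]        # four "workers"
    W = np.zeros((5, 1))
    for _ in range(200):
        W = data_parallel_step(W, shards)
    print(np.linalg.norm(W - true_W))                      # should be near zero
```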

Fuzzy Inference of Large Volumes in Parallel Computing Environments (병렬컴퓨팅 환경에서의 대용량 퍼지 추론)

  • 김진일;이상구
    • Journal of the Korean Institute of Intelligent Systems / v.10 no.4 / pp.293-298 / 2000
  • In fuzzy expert systems or database systems that hold large volumes of fuzzy data or large fuzzy rule sets, inference time increases greatly. Therefore, a high-performance parallel fuzzy computing environment is needed. In this paper, we propose a parallel fuzzy inference mechanism for parallel computing environments, in which fuzzy rules are distributed and executed simultaneously. The ONE_TO_ALL algorithm is used to broadcast the fuzzy input vector to all nodes, and the results of the MIN/MAX operations are transferred to the output processor by the ALL_TO_ONE algorithm. By processing fuzzy rules and data in parallel, the parallel fuzzy inference algorithm achieves effective inference and a good speedup.

  • PDF
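
The entry above describes a broadcast of the input vector (ONE_TO_ALL), local MIN/MAX evaluation of distributed rules, and a gather to the output processor (ALL_TO_ONE). The sketch below maps those steps onto mpi4py broadcast and reduce calls; the triangular membership functions and random rule set are illustrative assumptions, not the paper's rule base.

```python
# Run with e.g.:  mpiexec -n 4 python parallel_fuzzy.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each rank holds its own share of the fuzzy rules.  A rule here is just a set
# of triangular membership-function centres, one per input dimension.
rng = np.random.default_rng(rank)
local_rules = rng.uniform(0.0, 10.0, size=(100, 3))   # 100 rule centres, 3 inputs

def triangular(x, centre, width=2.0):
    return np.clip(1.0 - np.abs(x - centre) / width, 0.0, 1.0)

# ONE_TO_ALL: the root broadcasts the crisp input vector to every node.
x = np.array([3.0, 7.5, 1.2]) if rank == 0 else None
x = comm.bcast(x, root=0)

# MIN across antecedents gives each rule's firing strength;
# MAX across the local rules gives this node's partial result.
firing = triangular(x, local_rules).min(axis=1)
local_max = firing.max()

# ALL_TO_ONE: gather the partial MAX results back to the output processor.
global_max = comm.reduce(local_max, op=MPI.MAX, root=0)
if rank == 0:
    print("aggregated firing strength:", global_max)
```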

Parallel Approximate String Matching with k-Mismatches for Multiple Fixed-Length Patterns in DNA Sequences on Graphics Processing Units (GPU을 이용한 다중 고정 길이 패턴을 갖는 DNA 시퀀스에 대한 k-Mismatches에 의한 근사적 병열 스트링 매칭)

  • Ho, ThienLuan;Kim, HyunJin;Oh, SeungRohk
    • The Transactions of The Korean Institute of Electrical Engineers / v.66 no.6 / pp.955-961 / 2017
  • In this paper, we propose a parallel approximate string matching algorithm with k-mismatches for multiple fixed-length patterns (PMASM) in DNA sequences. PMASM is developed from parallel single-pattern approximate string matching algorithms to efficiently calculate the Hamming distances for multiple patterns of a fixed length. In the preprocessing phase of PMASM, all target patterns are binary encoded and stored in a look-up memory. With each input character from the input string, the Hamming distances between a substring and all patterns can be updated at the same time based on the binary encoding information in the look-up memory. Moreover, PMASM adopts graphics processing units (GPUs) to perform the data computations in parallel. This paper presents three PMASM implementation methods on GPUs: the thread PMASM, block-thread PMASM, and shared-mem PMASM methods. The shared-mem PMASM method shows how to make effective use of the GPU's parallel capacity, and it also exploits features of the CUDA (Compute Unified Device Architecture) memory hierarchy to optimize performance. In experiments with DNA sequences, the proposed PMASM on GPU is 385, 77, and 64 times faster than the traditional naive algorithm, the shift-add algorithm, and the single-thread PMASM implementation on CPU, respectively. On the same NVIDIA GPU model, the performance of the proposed approach is improved by up to 44% and 21% compared with the naive and shift-add algorithms, respectively.
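
The entry above outlines binary-encoded patterns held in a look-up memory and per-position Hamming-distance tests against multiple fixed-length patterns, but the CUDA kernels themselves are not reproduced. The sketch below is a plain Python reference that computes the same k-mismatches result with a 2-bit-per-base encoding; it is meant only to show what each GPU thread (one text position and pattern pair) would evaluate, and the encoding and function names are assumptions.

```python
# Reference (CPU) computation of k-mismatches matching with 2-bit base encoding.
# In a PMASM-style GPU kernel, each thread would evaluate one (position, pattern) pair.
ENCODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}

def encode(seq: str) -> int:
    value = 0
    for base in seq:
        value = (value << 2) | ENCODE[base]
    return value

def hamming_2bit(a: int, b: int, length: int) -> int:
    """Number of mismatching bases between two 2-bit-encoded strings of equal length."""
    diff = a ^ b
    mismatches = 0
    for _ in range(length):
        if diff & 0b11:
            mismatches += 1
        diff >>= 2
    return mismatches

def k_mismatch_search(text: str, patterns: list[str], k: int):
    """Report (position, pattern index) pairs where the Hamming distance is <= k."""
    m = len(patterns[0])                        # fixed pattern length
    encoded = [encode(p) for p in patterns]     # plays the role of the look-up memory
    hits = []
    for i in range(len(text) - m + 1):
        window = encode(text[i:i + m])
        for j, pat in enumerate(encoded):
            if hamming_2bit(window, pat, m) <= k:
                hits.append((i, j))
    return hits

if __name__ == "__main__":
    # Expected hits: positions 0 and 4 for both patterns with k = 1
    print(k_mismatch_search("ACGTACGTGG", ["ACGT", "ACGG"], k=1))
```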