• Title/Summary/Keyword: Parallel data processing

Spark Framework Based on a Heterogenous Pipeline Computing with OpenCL (OpenCL을 활용한 이기종 파이프라인 컴퓨팅 기반 Spark 프레임워크)

  • Kim, Daehee;Park, Neungsoo
    • The Transactions of The Korean Institute of Electrical Engineers / v.67 no.2 / pp.270-276 / 2018
  • Apache Spark is a high-performance in-memory computing framework for big-data processing. Recently, general-purpose computing on graphics processing units (GPGPU) has been applied to the Apache Spark framework to improve performance. Previous Spark-GPGPU frameworks focus on overcoming the implementation difficulties that result from the differences between the GPGPU and Spark computation environments. In this paper, we propose a Spark framework based on heterogeneous pipeline computing with OpenCL to further improve performance. The proposed framework overlaps the CPU's Java-to-native memory copies with CPU-GPU communication (DMA) and GPU kernel computation to hide CPU idle time. In addition, the CPU-GPU communication buffers are implemented as switching dual buffers, which shrinks the mapped memory region and thereby reduces memory-mapping overhead. Experimental results show that the proposed framework is up to 2.13 times faster than the previous Spark framework using OpenCL.
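
The overlap of a copy stage with a compute stage through two switching buffers can be illustrated with a small sketch. This is not the authors' implementation: it uses plain Python threads in place of OpenCL, and `run_pipeline`, `stage_copy`, and `stage_compute` are hypothetical names chosen for illustration. The copy stage stands in for the Java-to-native copy and the compute stage for the DMA transfer plus GPU kernel.

```python
import queue
import threading


def run_pipeline(chunks, stage_copy, stage_compute, num_buffers=2):
    """Overlap a copy stage with a compute stage using a small buffer pool.

    With num_buffers=2 the two stages alternate between two buffers:
    while one buffer is being computed on, the other is being filled.
    """
    buffers = [None] * num_buffers
    free = queue.Queue()   # indices of buffers the copy stage may fill
    ready = queue.Queue()  # indices of filled buffers awaiting compute
    for i in range(num_buffers):
        free.put(i)

    def producer():
        for chunk in chunks:
            idx = free.get()                  # block until a buffer is free
            buffers[idx] = stage_copy(chunk)  # stand-in for the memory copy
            ready.put(idx)
        ready.put(None)                       # sentinel: no more work

    worker = threading.Thread(target=producer, daemon=True)
    worker.start()

    results = []
    while True:
        idx = ready.get()
        if idx is None:
            break
        results.append(stage_compute(buffers[idx]))  # stand-in for the kernel
        free.put(idx)                                # hand the buffer back
    worker.join()
    return results


if __name__ == "__main__":
    data = [[1, 2], [3, 4], [5]]
    print(run_pipeline(data, stage_copy=list, stage_compute=sum))
```

Because only two buffers exist, the copy stage can run at most two chunks ahead of the compute stage, which keeps the staged memory small while still letting the stages overlap.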

Autonomous Vehicle Tracking Using Two TDNN Neural Networks (뉴럴네트워크를 이용한 무인 전방차량 추적방법)

  • Lee, Hee-Man
    • The Transactions of the Korea Information Processing Society / v.3 no.5 / pp.1037-1045 / 1996
  • In this paper, the parallel model for a stereo camera is employed to find the heading angle and the distance between a leading vehicle and the following vehicle, BART (Binocular Autonomous Research Team vehicle). Two TDNNs (Time Delay Neural Networks), S-TDNN and A-TDNN, are introduced to control BART. S-TDNN controls the speed of the following vehicle, while A-TDNN controls the steering angle of BART. A human drives BART to collect the data used to train these networks. The trained networks performed the vehicle-tracking function satisfactorily under the same driving conditions as those of the human driver. The neural network approach has good portability, which decreases costs and saves development time for different types of vehicles.

Development of Multi-Laser Vision System For 3D Surface Scanning (3 차원 곡면 데이터 획득을 위한 멀티 레이져 비젼 시스템 개발)

  • Lee, J.H.;Kwon, K.Y.;Lee, H.C.;Doe, Y.C.;Choi, D.J.;Park, J.H.;Kim, D.K.;Park, Y.J.
    • Proceedings of the KSME Conference / 2008.11a / pp.768-772 / 2008
  • Various scanning systems have been studied in many industrial areas to acquire range data or to reconstruct an explicit 3D model. Optical technology is currently in wide use because it is non-contact and highly accurate. In this paper, we describe a 3D laser scanning system developed to reconstruct the 3D surface of a large-scale object such as a curved plate of a ship hull. Our scanning system comprises four-channel parallel laser vision modules using a triangulation technique. For multi-laser vision, a calibration method based on the least-squares technique is applied. For global scanning, we present an effective method that avoids the difficult problem of matching the scanning results of the individual cameras. A minimal image processing algorithm and a robot-based calibration technique are also applied. A prototype has been implemented for testing.

A design of CAVLC(Context-Adaptive Variable Length Coding) for H.264 (H.264 CAVLC(Context-Adaptive Variable Length Coding)설계)

  • Lee, Yong-Ju;Suh, Ki-Bum
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2008.10a / pp.108-111 / 2008
  • In this paper, we propose an advanced hardware architecture for the CAVLC entropy encoder engine for real-time Full HD video compression. Since each macroblock contains 384 data coefficients, the sum of 376 AC coefficients and 8 DC coefficients, up to 384 coefficients must be processed per macroblock in the worst case for real-time processing. We propose a novel architecture that combines a parallel architecture with pipeline processing and a reduction of "0" entries in the AC/DC coefficient table. To verify the proposed architecture, we developed reference C code for CAVLC and verified the designed circuit with test vectors from the reference C code.
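
The worst-case workload quoted in the abstract can be checked with a short calculation. The sketch below is one reading that is consistent with the abstract's totals, not something the abstract states: it assumes 4:2:0 sampling, counts all 16 coefficients of each 4x4 luma block together with the chroma AC coefficients, and treats the 8 DC coefficients as the two 2x2 chroma DC blocks. The coded frame size of 1920x1088 and the 30 frames per second rate are also assumptions for illustration.

```python
# H.264 macroblock layout under 4:2:0 sampling.
LUMA_BLOCKS = 16           # 4x4 luma blocks in one 16x16 macroblock
COEFFS_PER_LUMA_BLOCK = 16
CHROMA_BLOCKS = 8          # four Cb plus four Cr 4x4 blocks
CHROMA_AC_PER_BLOCK = 15   # each chroma block minus its DC term
CHROMA_DC = 2 * 4          # one 2x2 DC block per chroma plane


def coefficients_per_macroblock():
    """Return (ac_like, dc, total) coefficient counts for one macroblock."""
    ac_like = (LUMA_BLOCKS * COEFFS_PER_LUMA_BLOCK
               + CHROMA_BLOCKS * CHROMA_AC_PER_BLOCK)
    return ac_like, CHROMA_DC, ac_like + CHROMA_DC


def worst_case_coefficients_per_second(width=1920, height=1088, fps=30):
    """Worst-case coefficient rate; frame size and fps are assumed values."""
    macroblocks = (width // 16) * (height // 16)
    _, _, total = coefficients_per_macroblock()
    return macroblocks * total * fps


if __name__ == "__main__":
    print(coefficients_per_macroblock())         # (376, 8, 384)
    print(worst_case_coefficients_per_second())  # 94003200
```

Under these assumptions a Full HD encoder has to handle on the order of 94 million coefficients per second in the worst case, which is the kind of load that motivates a parallel, pipelined encoder.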

An implementation of a unified ALU in multi-core GPGPU based on SIMT architecture (SIMT 구조 기반 멀티코어 GPGPU의 통합 ALU 설계)

  • Kyung, Gyu-taek;Kwak, Jae-Chang;Lee, Kwang-yeob
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2013.10a / pp.540-543 / 2013
  • This paper describes an implementation of a unified ALU for a multi-core GPGPU based on the SIMT architecture. Our unified ALU can execute conditional branch, data movement, integer arithmetic, and floating-point arithmetic instructions. Since a multi-core GPGPU contains many ALUs for parallel processing of various types, the main goal of this paper is to design a minimum-size ALU by unifying the similar processing of each operation at the circuit level. All instructions were tested with a test program, and the results were compared with the results of CPU operations to verify the ALU. The unified ALU's gate count is approximately 20,000 and its maximum operating frequency is 430 MHz.

Design and Analysis of Continuous Media Representation Mechanism to minimize the Variance of Latency Time (지연시간의 변화를 고려한 연속미디어 표현 메카니즘의 설계 및 분석)

  • Yeo, In-Kook;Hwang, Dae-Hoon
    • The Transactions of the Korea Information Processing Society / v.2 no.5 / pp.787-796 / 1995
  • In this paper, we propose a mechanism to enhance the utilization of tertiary storage devices such as optical disks and digital tape. For this purpose, we suggest a storage structure with a CD-ROM and multiple disks and design an efficient mechanism to represent the continuous media stored on the CD-ROM. The continuous media storage structure proposed in this paper can enhance information retrieval capability by sending data over a shared high-bandwidth bus. The continuous media representation mechanism not only reduces discontinuity of representation by performing fetch and representation in parallel, but also minimizes the variance of service time and the miss ratio of re-reference requests by keeping frequently accessed objects on disk.

A Study on the Design of Echo-Canceller using SIA(Stochastic Iteration Algorithm) (SIA(Stochastic Iteration Algorithm)을 이용한 반향제거기 설계에 관한 연구)

  • Cho, Hyon-Mook;Kim, Sang-Hoon;Park, Nho-Kyung;Moon, Dai-Tchul;Tchah, Kyun-Hyon
    • The Journal of the Acoustical Society of Korea / v.13 no.2 / pp.38-49 / 1994
  • This paper proposes an echo canceller for simultaneous two-way ('full-duplex') transmission of data signals over two-wire circuits, which can be achieved by using a hybrid coupler. The echo canceller uses sequential processing instead of the parallel processing of a conventional adaptive digital filter. This structure reduces the number of multipliers and is therefore much more suitable for IC implementation. The echo canceller operates according to the 'Stochastic Iteration Algorithm (SIA)', which has the merits of good performance and a small hardware requirement.

Signal Processing and Implementation of Transmitter for Cochlear Implant (인공 와우를 위한 신호 처리 및 전달부의 구현)

  • Chae, D.;Choi, D.;Byun, J.;Baeck, S.;Kong, H.;Park, S.
    • Proceedings of the KIEE Conference / 1993.07a / pp.284-286 / 1993
  • Software and hardware for a cochlear implant system have been developed to create a speech signal processing system that extracts, in real time, model parameters including formants, pitch, and amplitude information. The system is based on the Texas Instruments TMS320 family. In hardware, a computer interface has been designed and implemented that allows presentation of biphasic pulse stimuli to hearing-impaired patients. The host computer sends a stream of bytes to the parallel port. Upon receipt of the data, the interface generates the appropriate burst sequence, which is delivered to the patient's external transmitter coil. The coded information is interpreted by the Nucleus-22 internal receiver, which delivers the pulse to the specified electrodes at the specified amplitude and pulse width.

Extension of the NEAMS workbench to parallel sensitivity and uncertainty analysis of thermal hydraulic parameters using Dakota and Nek5000

  • Delchini, Marc-Olivier G.;Swiler, Laura P.;Lefebvre, Robert A.
    • Nuclear Engineering and Technology / v.53 no.10 / pp.3449-3459 / 2021
  • With the increasing availability of high-performance computing (HPC) platforms, uncertainty quantification (UQ) and sensitivity analyses (SA) can be efficiently leveraged to optimize design parameters of complex engineering problems using modeling and simulation tools. The workflow involved in such studies relies heavily on HPC resources and hence requires pre-processing and post-processing capabilities for large amounts of data along with remote submission capabilities. The NEAMS Workbench addresses all aspects of the workflows involved in these studies by relying on a user-friendly graphical user interface and a Python application programming interface. This paper highlights the NEAMS Workbench capabilities by presenting a semiautomated coupling scheme between Dakota and any given package integrated with the NEAMS Workbench, yielding a simplified workflow for users. This new capability is demonstrated by running an SA of a turbulent flow in a pipe using the open-source Nek5000 CFD code. A total of 54 jobs were run on an HPC platform using the remote capabilities of the NEAMS Workbench. The results demonstrate that the semiautomated coupling scheme involving Dakota can be efficiently used for UQ and SA while keeping scripting tasks to a minimum for users. All input and output files used in this work are available at https://code.ornl.gov/neams-workbench/dakota-nek5000-study.

A Study of Dark Photon at the Electron-Positron Collider Experiments Using KISTI-5 Supercomputer

  • Park, Kihong;Cho, Kihyeon
    • Journal of Astronomy and Space Sciences / v.38 no.1 / pp.55-63 / 2021
  • The universe is well known to consist of dark energy, dark matter, and the standard model (SM) particles. Dark matter dominates the density of matter in the universe. Dark matter is thought to be linked with dark photons, which are hypothetical hidden-sector particles similar to photons in electromagnetism but potentially proposed as force carriers. Because the cross-section of dark matter is extremely small, a large amount of data needs to be processed. Therefore, we need to optimize the central processing unit (CPU) time. In this work, using MadGraph5 as a simulation toolkit, we examined the CPU time and the cross-section of dark matter at the electron-positron collider, considering three parameters: the center-of-mass energy, the dark photon mass, and the coupling constant. The signal process pertained to a dark photon that couples only to heavy leptons. We dealt only with the case of a dark photon decaying into two muons. We used the simplified model, which covers dark matter particles and dark photon particles as well as the SM particles. To compare the CPU time of the simulation, one or more cores of the KISTI-5 supercomputer (Nurion Knights Landing and Skylake) and a local Linux machine were used. Our results can help optimize high-energy physics software through high-performance computing and enable users to incorporate parallel processing.