• Title/Summary/Keyword: High Performance Massive Computing

Large-scale 3D fast Fourier transform computation on a GPU

  • Jaehong Lee;Duksu Kim
    • ETRI Journal / v.45 no.6 / pp.1035-1045 / 2023
  • We propose a novel graphics processing unit (GPU) algorithm that can handle a large-scale 3D fast Fourier transform (i.e., 3D-FFT) problem whose data size is larger than the GPU's memory. To work around the limited device memory, the 3D-FFT is computed with a 1D FFT-based approach. Moreover, to reduce the communication overhead between the CPU and GPU, we propose a 3D data-transposition method that converts the target 1D vector into a contiguous memory layout and improves data transfer efficiency. The transposed data are communicated between the host and device memories efficiently through the pinned buffer and multiple streams. We apply our method to various large-scale benchmarks and compare its performance with the state-of-the-art multicore CPU FFT library (i.e., fastest Fourier transform in the West [FFTW]) and a prior GPU-based 3D-FFT algorithm. Our method outperforms FFTW by up to 2.89 times, and the performance gap widens as the data size increases. The performance of the prior GPU algorithm decreases considerably in massive-scale problems, whereas our method's performance is stable.
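
The 1D FFT-based decomposition mentioned in this abstract can be illustrated on the CPU. The sketch below is not the authors' GPU algorithm: it is a minimal pure-Python illustration, assuming power-of-two dimensions, of how a 3D transform splits into three passes of 1D FFTs. The names `fft1d` and `fft3d` are hypothetical, and the gather/scatter loops only hint at why the paper transposes strided lines into a contiguous layout before transfer.

```python
import cmath


def fft1d(values):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft1d(values[0::2])
    odd = fft1d(values[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft3d(volume):
    """3D FFT of volume[x][y][z] as three passes of 1D FFTs, one per axis."""
    nx, ny, nz = len(volume), len(volume[0]), len(volume[0][0])
    data = [[list(row) for row in plane] for plane in volume]
    # Pass 1: lines along z are already contiguous in this layout.
    for x in range(nx):
        for y in range(ny):
            data[x][y] = fft1d(data[x][y])
    # Pass 2: lines along y are strided, so gather, transform, scatter.
    for x in range(nx):
        for z in range(nz):
            line = fft1d([data[x][y][z] for y in range(ny)])
            for y in range(ny):
                data[x][y][z] = line[y]
    # Pass 3: lines along x are strided as well.
    for y in range(ny):
        for z in range(nz):
            line = fft1d([data[x][y][z] for x in range(nx)])
            for x in range(nx):
                data[x][y][z] = line[x]
    return data
```

Because each pass touches only one line at a time, the working set per step is a single 1D vector (or a batch of them), which is what makes the approach attractive when the full volume does not fit in device memory.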

Three-dimensional human activity recognition by forming a movement polygon using posture skeletal data from depth sensor

  • Vishwakarma, Dinesh Kumar;Jain, Konark
    • ETRI Journal / v.44 no.2 / pp.286-299 / 2022
  • Human activity recognition in real time is a challenging task. Recently, a plethora of studies using deep learning architectures have been proposed. Implementing these architectures requires high computing power and a massive database. In contrast, machine learning models based on handcrafted features need less computing power and are very accurate when the features are effectively extracted. In this study, we propose a handcrafted model based on three-dimensional sequential skeleton data. The movement of the human body skeleton over a frame is computed from the joint positions in that frame. The joints of these skeletal frames are projected into two-dimensional space, forming a "movement polygon." These polygons are further transformed into a one-dimensional space by computing amplitudes at different angles from the centroid of the polygons. The feature vector is formed by sampling these amplitudes at different angles. The performance of the algorithm is evaluated using a support vector machine on four public datasets: MSR Action3D, Berkeley MHAD, TST Fall Detection, and NTU-RGB+D. The highest accuracies achieved on these datasets are 94.13%, 93.34%, 95.7%, and 86.8%, respectively. These accuracies are compared with those of similar state-of-the-art methods and show superior performance.
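
The polygon-to-1D step described in this abstract can be sketched roughly as follows. The abstract does not give the exact sampling rule, so this is an assumption-laden simplification: the function name `polygon_signature`, the `num_bins` parameter, and the choice to keep the largest centroid distance per angular sector are all hypothetical, not the authors' feature definition.

```python
import math


def polygon_signature(points, num_bins=8):
    """Map 2D polygon vertices to a fixed-length vector of centroid distances.

    Assumption: each angular sector around the centroid keeps the largest
    vertex distance that falls inside it; empty sectors stay at 0.0.
    """
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    signature = [0.0] * num_bins
    width = 2 * math.pi / num_bins
    for x, y in points:
        # Angle of the vertex as seen from the centroid, in [0, 2*pi).
        angle = math.atan2(y - cy, x - cx) % (2 * math.pi)
        index = min(int(angle / width), num_bins - 1)
        amplitude = math.hypot(x - cx, y - cy)
        signature[index] = max(signature[index], amplitude)
    return signature
```

Measuring from the centroid makes the vector independent of where the polygon sits in the image plane, which is one plausible reason for the centroid-based formulation.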

Performance Analysis of IEEE 1394 High Speed Serial Bus for Massive Multimedia Transmission (대용량 멀티미디어 전송을 위한 IEEE 1394고속 직렬 버스의 성능 분석)

  • 이희진;민구봉;김종권
    • Journal of KIISE:Information Networking / v.30 no.4 / pp.494-503 / 2003
  • The IEEE 1394 High Speed Serial Bus is a versatile, high-performance, and low-cost method of promoting interoperability between all types of A/V and computing devices. IEEE 1394 provides two transfer modes: asynchronous mode for best effort service and isochronous mode for best effort service with bandwidth reservation. This paper evaluates the bus performance and compares the transfer modes, first at the link level and then at the application level. For the application-level performance, we use polling systems to analyze bus systems with fixed and adaptive interfaces applied between the upper layer and the 1394 layer. We also verify the analysis models with simulation studies. Based on our analysis, we conclude that the adaptive interface reduces the bus access time and thus increases the bus utilization.

Implementation of Efficient Power Method on CUDA GPU (CUDA 기반 GPU에서 효율적인 Power Method의 구현)

  • Kim, Jung-Hwan;Kim, Jin-Soo
    • Journal of the Korea Society of Computer and Information / v.16 no.2 / pp.9-16 / 2011
  • GPU computing is emerging in high-performance application areas because it can exploit massive parallelism cost-effectively. The power method, which finds the eigenvector of a given matrix, is widely used in various applications such as PageRank, which calculates the importance of web pages. In this research, we parallelized the power method efficiently on a GPU and also suggested how its performance can be improved further. The power method mainly consists of matrix-vector products, which can be easily parallelized. However, it must also decide whether the eigenvector has converged and subsequently scale the vector. Such operations incur several calls to GPU kernels and data movement between host and GPU memories. We improved the performance of the power method by reducing the number of GPU kernel calls, optimizing thread allocation, and enhancing the convergence decision operation.
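
The three operations this abstract names (matrix-vector product, scaling, and the convergence decision) map onto a short CPU reference loop. The sketch below is a plain sequential version for illustration only, not the paper's CUDA implementation. The function name `power_method`, the all-ones starting vector, the max-component scaling, and the tolerance are assumptions.

```python
def power_method(matrix, tol=1e-10, max_iter=1000):
    """Estimate the dominant eigenvalue and eigenvector of a square matrix."""
    n = len(matrix)
    vec = [1.0] * n  # assumed starting vector
    eigenvalue = 0.0
    for _ in range(max_iter):
        # Matrix-vector product: each row is independent, hence easy to parallelize.
        product = [sum(row[j] * vec[j] for j in range(n)) for row in matrix]
        # Scaling: divide by the largest-magnitude component.
        scale = max(product, key=abs)
        if scale == 0.0:
            raise ValueError("iterate collapsed to the zero vector")
        new_vec = [value / scale for value in product]
        # Convergence decision: largest change in any component.
        delta = max(abs(a - b) for a, b in zip(new_vec, vec))
        vec, eigenvalue = new_vec, scale
        if delta < tol:
            break
    return eigenvalue, vec
```

On a GPU, the scaling and the convergence check are both reductions over the vector, which is why the abstract singles them out as sources of extra kernel calls and host-device traffic.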

BARAM: VIRTUAL WIND-TUNNEL SYSTEM FOR CFD SIMULATION (BARAM: 전산유체 해석을 위한 가상풍동 시스템)

  • Kim, Min Ah;Lee, Joong-Youn;Gu, Gibeom;Her, Young-Ju;Lee, Sehoon;Park, Soo Hyung;Kim, Kyu Hong;Cho, Kumwon
    • Journal of computational fluids engineering / v.20 no.4 / pp.28-35 / 2015
  • The BARAM system, whose name means 'wind' in Korean, has been established as a virtual wind tunnel system for aircraft design. Its aim is to provide researchers with an easy-to-use, production-level environment for all stages of CFD simulation. To achieve this goal, an integrated environment with a set of CFD solvers has been developed and coupled with highly efficient visualization software. BARAM offers three improvements over previous CFD simulation environments. First, it provides a new automatic mesh generation method for structured and unstructured grids. Second, it provides real-time visualization of massive CFD data sets. Third, it includes more high-fidelity CFD solvers than commercial solvers.

DJFS: Providing Highly Reliable and High-Performance File System with Small-Sized NVRAM

  • Kim, Junghoon;Lee, Minho;Song, Yongju;Eom, Young Ik
    • ETRI Journal / v.39 no.6 / pp.820-831 / 2017
  • File systems and applications try to implement their own update protocols to guarantee data consistency, which is one of the most crucial aspects of computing systems. However, we found that storage devices are substantially under-utilized when preserving data consistency because these protocols generate massive storage write traffic with many disk cache flush operations and force-unit-access (FUA) commands. In this paper, we present DJFS (Delta-Journaling File System), which provides both a high level of performance and data consistency for different applications. We made three technical contributions to achieve our goal. First, to remove all storage accesses with disk cache flush operations and FUA commands, DJFS uses small-sized NVRAM for the file system journal. Second, to reduce the access latency and space requirements of NVRAM, DJFS attempts to journal the compressed differences in the modified blocks. Finally, to relieve explicit checkpointing overhead, DJFS aggressively reflects the checkpoint transactions to the file system area in units of the specified region. Our evaluation on the TPC-C SQLite benchmark shows that, using our novel optimization schemes, DJFS outperforms Ext4 by up to 64.2 times with only 128 MB of NVRAM.
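
The idea of journaling compressed differences rather than whole blocks can be illustrated in a few lines. The abstract does not say how DJFS encodes a delta, so the byte-wise XOR and the use of `zlib` below are stand-in assumptions, and `make_delta` and `apply_delta` are hypothetical names rather than DJFS interfaces.

```python
import zlib


def make_delta(old_block: bytes, new_block: bytes) -> bytes:
    """Return a compressed byte-wise XOR of two equally sized blocks."""
    if len(old_block) != len(new_block):
        raise ValueError("blocks must have the same size")
    # Unchanged bytes XOR to zero, so a small update yields a highly compressible delta.
    xor = bytes(a ^ b for a, b in zip(old_block, new_block))
    return zlib.compress(xor)


def apply_delta(old_block: bytes, delta: bytes) -> bytes:
    """Rebuild the new block from the old block and a journaled delta."""
    xor = zlib.decompress(delta)
    if len(xor) != len(old_block):
        raise ValueError("delta does not match the block size")
    return bytes(a ^ b for a, b in zip(old_block, xor))
```

For a 4 KB block in which only a few bytes change, the delta is a small fraction of the block size, which is the kind of saving that would let a small NVRAM journal hold many transactions.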

Design and Prototyping of Scientific Collaboration Platform over KREONET (KREONET 기반의 과학기술협업연구 플랫폼(RealLab) 설계 및 프로토타입 구축)

  • Kwon, Yoonjoo;Hong, Wontaek
    • KIPS Transactions on Computer and Communication Systems / v.4 no.9 / pp.297-306 / 2015
  • Cloud computing has been increasingly used in various fields due to its flexibility, scalability, and cost effectiveness. Recently, many scientific communities have been attempting to use cloud computing to deal with the difficulties of constructing and operating a research infrastructure. In particular, because these communities need various forms of network-based collaboration, such as sharing experimental data and redistributing experimental results, they require a cloud computing environment that supports high-performance networking. To address these issues, we propose RealLab, a high-performance cloud platform for collaborative research that provides a virtual experimental research environment and a data sharing infrastructure over KREONET/GLORIAD. Additionally, we describe some RealLab use cases that show the swift creation of an experimental environment and explain how massive experimental data can be transferred and shared among community members.

Design and implementation of a Shared-Concurrent File System in distributed UNIX environment (분산 UNIX 환경에서 Shared-Concurrent File System의 설계 및 구현)

  • Jang, Si-Ung;Jeong, Gi-Dong
    • The Transactions of the Korea Information Processing Society / v.3 no.3 / pp.617-630 / 1996
  • In this paper, a shared-concurrent file system (S-CFS) is designed and implemented using conventional disks as disk arrays on a Workstation Cluster that can be used as a small-scale server. Since it is implemented on UNIX operating systems, S-CFS is not only portable and flexible but also efficient in resource usage because it does not require additional I/O nodes. The results of the research show that, on small-scale systems with enough disks, the performance of the concurrent file system on transaction processing applications is bounded by the CPUs' computing power, while its performance on massive data I/O is bounded by the time required to copy data between buffers. The concurrent file system, which has been implemented on a Workstation Cluster with 8 disks, shows a throughput of 388 tps in the case of transaction processing applications and can provide a bandwidth of 15.8 Mbytes/sec in the case of massive data processing applications. Moreover, the concurrent file system has been designed to enhance the throughput of applications requiring high-performance I/O by controlling the parallelism of the concurrent file system on the user's side.

Big Data Security and Privacy: A Taxonomy with Some HPC and Blockchain Perspectives

  • Alsulbi, Khalil;Khemakhem, Maher;Basuhail, Abdullah;Eassa, Fathy;Jambi, Kamal Mansur;Almarhabi, Khalid
    • International Journal of Computer Science & Network Security / v.21 no.7 / pp.43-55 / 2021
  • The amount of Big Data generated from multiple sources is continuously increasing. Traditional storage methods lack the capacity for such massive amounts of data. Consequently, most organizations have shifted to the use of cloud storage as an alternative option to store Big Data. Despite the significant developments in cloud storage, it still faces many challenges, such as privacy and security concerns. This paper discusses Big Data, its challenges, and different classifications of security and privacy challenges. Furthermore, it proposes a new classification of Big Data security and privacy challenges and offers some perspectives to provide solutions to these challenges.

RLDB: Robust Local Difference Binary Descriptor with Integrated Learning-based Optimization

  • Sun, Huitao;Li, Muguo
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.9 / pp.4429-4447 / 2018
  • Local binary descriptors are well-suited for many real-time and/or large-scale computer vision applications, but their low computational complexity is usually accompanied by limited performance. In this paper, we propose a new optimization framework, RLDB (Robust-LDB), to improve a typical region-based binary descriptor, LDB (local difference binary), while maintaining its computational simplicity. RLDB extends the multi-feature strategy of LDB and applies a more complete region-comparing configuration. A cascade bit selection method is used to select the more representative patterns from massive comparison pairs, and an online learning strategy further optimizes the descriptor for each specific patch separately. Both incorporate the LDP (linear discriminant projections) principle to jointly guarantee the robustness and distinctiveness of the features from various scales. Experimental results demonstrate that this integrated learning framework significantly enhances LDB. The improved descriptor achieves a performance comparable to floating-point descriptors on many benchmarks and retains a high computing speed similar to most binary descriptors, which better satisfies the demands of applications.