• Title/Summary/Keyword: computational scalability


Multi-View Supporting VR/AR Visualization System for Supercomputing-based Engineering Analysis Services (슈퍼컴퓨팅 기반의 공학해석 서비스 제공을 위한 멀티 뷰 지원 VR/AR 가시화 시스템 개발)

  • Seo, Dong Woo;Lee, Jae Yeol;Lee, Sang Min;Kim, Jae Seong;Park, Hyung Wook
    • Korean Journal of Computational Design and Engineering
    • /
    • v.18 no.6
    • /
    • pp.428-438
    • /
    • 2013
  • The demand for high-performance visualization of engineering analyses of digital products is increasing, since current analysis problems are ever larger and more complex, requiring high-performance codes as well as high-performance computing systems. On the other hand, many companies and customers do not own such facilities or have difficulty accessing those computing resources. In this paper, we present a multi-view supporting VR/AR system for providing supercomputing-based engineering analysis services. The proposed system is designed to provide different views supporting VR/AR visualization services depending on customer requirements. It provides sophisticated VR rendering backed directly by a supercomputing resource as well as remotely accessible AR visualization. By providing multi-view-centric analysis services, the proposed system can be applied more easily to various customers requiring different levels of high-performance computing resources. We show the scalability and vision of the proposed approach through illustrative examples with different levels of complexity.

Distributed Target Localization with Inaccurate Collaborative Sensors in Multipath Environments

  • Feng, Yuan;Yan, Qinsiwei;Tseng, Po-Hsuan;Hao, Ganlin;Wu, Nan
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.13 no.5
    • /
    • pp.2299-2318
    • /
    • 2019
  • Location-aware networks are of great importance for both civilian and military applications. Methods based on line-of-sight (LOS) measurements suffer severe performance loss in harsh environments such as indoor scenarios, where sensors receive both LOS and non-line-of-sight (NLOS) measurements. In this paper, we propose a data association (DA) process based on the expectation-maximization (EM) algorithm, which enables us to exploit multipath components (MPCs). By treating the mapping between measurements and scatterers as a latent variable, the coefficients of a Gaussian mixture model are estimated. Moreover, considering the misalignment of sensor positions, we propose a space-alternating generalized expectation-maximization (SAGE)-based algorithm to jointly update the target localization and the sensor position information. A two-dimensional (2-D) circularly symmetric Gaussian distribution is employed to approximate the probability density function of the sensor's position uncertainty via minimization of the Kullback-Leibler divergence (KLD), which enables the expectation step to be computed with low computational complexity. In addition, a distributed implementation is derived based on the average consensus method to improve the scalability of the proposed algorithm. Simulation results demonstrate that the proposed centralized and distributed algorithms perform close to the Monte Carlo-based method with much lower communication overhead and computational complexity.
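The distributed implementation mentioned above rests on average consensus, in which each sensor repeatedly averages with its neighbors until all local estimates agree on the network-wide mean. A minimal sketch of that iteration (the function name, graph, and step size are our illustrative choices, not the paper's):

```python
import numpy as np

def average_consensus(values, neighbors, epsilon=0.2, iters=100):
    """Iteratively average local estimates over a sensor graph.

    values:    initial local estimate at each node (1-D sequence)
    neighbors: adjacency list, neighbors[i] = indices adjacent to node i
    epsilon:   step size (should be below 1/max_degree for convergence)
    """
    x = np.asarray(values, dtype=float).copy()
    for _ in range(iters):
        x_new = x.copy()
        for i, nbrs in enumerate(neighbors):
            # Each node moves toward the values held by its neighbors.
            x_new[i] += epsilon * sum(x[j] - x[i] for j in nbrs)
        x = x_new
    return x

# 4-node ring: every node converges to the mean of the initial values (2.5).
vals = [1.0, 2.0, 3.0, 4.0]
ring = [[1, 3], [0, 2], [1, 3], [0, 2]]
result = average_consensus(vals, ring)
```

Because each update uses only neighbor-to-neighbor messages, this is the step that lets the localization run without a fusion center.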

THE EFFECT OF NUMBER OF VIRTUAL CHANNELS ON NOC EDP

  • Senejani, Mahdieh Nadi;Ghadiry, Mahdiar Hossein;Dermany, Mohamad Khalily
    • Journal of applied mathematics & informatics
    • /
    • v.28 no.1_2
    • /
    • pp.539-551
    • /
    • 2010
  • The low scalability and power efficiency of the shared bus in SoCs motivate the use of on-chip networks instead of traditional buses. In this paper we modify the Orion power model to obtain an analytical model that estimates the average message energy in k-ary n-cubes, with a focus on the number of virtual channels. Using this power model together with the performance model proposed in [11], the effect of the number of virtual channels on the energy-delay product is analyzed. In addition, a cycle-accurate power and performance simulator has been implemented in VHDL to verify the results.
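The core trade-off the paper analyzes can be sketched numerically: router energy grows with the number of virtual channels (more buffer space), while delay shrinks (less head-of-line blocking), so their product is minimized at an intermediate VC count. The toy cost trends below are ours, not the Orion model or the paper's constants:

```python
def message_energy(num_vc, hops, e_fixed=1.0, e_per_vc=0.2):
    # Toy trend: per-hop router energy grows with buffer space,
    # which scales with the number of virtual channels.
    return hops * (e_fixed + e_per_vc * num_vc)

def message_delay(num_vc, hops, d_hop=1.0, blocking=2.0):
    # Toy trend: extra virtual channels reduce head-of-line blocking,
    # with diminishing returns.
    return hops * (d_hop + blocking / num_vc)

def edp(num_vc, hops=8):
    # Energy-delay product for an average message.
    return message_energy(num_vc, hops) * message_delay(num_vc, hops)

# Sweeping the VC count shows EDP minimized at an intermediate value,
# mirroring the trade-off the paper studies with real models.
sweep = {v: edp(v) for v in (1, 2, 4, 8)}
```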

An Embedded ACELP Speech Coding Based on the AMR-WB Codec

  • Byun, Kyung-Jin;Eo, Ik-Soo;Jeong, Hee-Bum;Hahn, Min-Soo
    • ETRI Journal
    • /
    • v.27 no.2
    • /
    • pp.231-234
    • /
    • 2005
  • This letter proposes a new embedded speech coding structure based on the Adaptive Multi-Rate Wideband (AMR-WB) standard codec. The proposed coding scheme consists of three different bitrates, where the two lower bitrates are embedded into the highest one. The embedded bitstream is achieved by modifying the algebraic codebook search procedure adopted in the AMR-WB codec. The proposed method provides the advantage of scalability due to the embedded bitstream, while it inevitably requires some additional computational complexity to obtain two different code vectors for the higher bitrate modes. Compared to the AMR-WB codec, the embedded coder shows improved speech quality for the two higher bitrate modes, with a slightly increased bitrate caused by the decreased coding efficiency of the algebraic codebook.


Fast Enhancement Layer Encoding Method using CU Depth Correlation between Adjacent Layers for SHVC

  • Kim, Kyeonghye;Lee, Seonoh;Ahn, Yongjo;Sim, Donggyu
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.50 no.6
    • /
    • pp.260-264
    • /
    • 2013
  • This paper proposes a fast enhancement-layer coding method to reduce the computational complexity of Scalable HEVC (SHVC), which is based on High Efficiency Video Coding (HEVC). The proposed method decreases encoding time by simplifying Rate-Distortion Optimization (RDO) for enhancement layers (EL). The simplification is achieved by restricting CU depths based on the correlation of coding unit (CU) depths between adjacent layers and on the scalability type (spatial or quality) of the EL. Compared with the SHM 1.0 software encoder, the proposed method reduces encoding time by up to 31.5%.
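The depth-restriction idea can be illustrated with a small sketch: instead of testing every CU depth in the enhancement layer, only depths near the co-located base-layer depth are searched, with the window chosen per scalability type. The specific window rule below is our illustration, not the exact condition used in the paper or in SHM:

```python
def allowed_el_depths(bl_depth, scalability, max_depth=3):
    """Restrict the CU depths searched in the enhancement layer (EL)
    based on the depth of the co-located base-layer (BL) CU.

    Illustrative rule: for quality (SNR) scalability the layers share
    the same resolution, so the BL depth is a strong predictor and the
    window is tight; for spatial scalability the layers differ in
    resolution, so one extra depth on each side is kept.
    """
    if scalability == "quality":
        lo, hi = bl_depth, min(bl_depth + 1, max_depth)
    else:  # "spatial"
        lo, hi = max(bl_depth - 1, 0), min(bl_depth + 1, max_depth)
    return list(range(lo, hi + 1))

# Full RDO would test depths 0..3; the restricted search skips most.
depths = allowed_el_depths(bl_depth=2, scalability="quality")
```

Skipping RDO for the excluded depths is where the reported encoding-time saving comes from.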

Performance Analysis of the Supercomputer CRAY T3E with a Parallelization Technique for an Inviscid Compressible Code (비점성 압축성 코드의 병렬화 기법에 의한 슈퍼컴퓨터 CRAY T3E의 성능 분석)

  • Go Deok-Gon
    • Proceedings of the Korea Society of Computational Fluids Engineering Conference (한국전산유체공학회: 학술대회논문집)
    • /
    • 1997.10a
    • /
    • pp.17-22
    • /
    • 1997
  • The performance of the CRAY T3E and CRAY C90 was compared from an aerodynamics standpoint. The CRAY C90 was run with and without the highest vector option, and the CRAY T3E was run with various numbers of processors (from 1 PE to 32 PEs). The MPI and SHMEM communication utilities were used to exchange boundary data between processors. A DADI Euler solver, an implicit scheme using the central difference method, was employed together with the domain decomposition method. The results show that the CRAY C90 with the highest vector option is 5.7 times faster than the CRAY T3E with one processor. However, owing to the scalability of the CRAY T3E, the T3E with more than 6 processors is faster than the C90. With 32 processors, the CRAY T3E is 6 times faster than the CRAY C90 with the highest vector option.
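The two ratios reported in the abstract imply a T3E parallel speedup that can be worked out directly (simple bookkeeping of the stated numbers, not new measurements):

```python
# Relative speeds, normalized to one T3E processor.
t3e_1pe = 1.0
c90_vector = 5.7 * t3e_1pe      # C90 (vector) is 5.7x a single T3E PE
t3e_32pe = 6.0 * c90_vector     # 32 T3E PEs are 6x the C90

speedup_32 = t3e_32pe / t3e_1pe  # 5.7 * 6.0 = 34.2 over one PE
efficiency = speedup_32 / 32     # ~1.07, i.e. slightly superlinear
```

The implied efficiency above 1 is consistent with the good scalability the abstract emphasizes (superlinear figures on distributed-memory machines typically reflect per-processor cache and memory effects once the decomposed subdomains fit locally).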


Design and Implementation of a Massively Parallel Multithreaded Architecture: DAVRID

  • Sangho Ha;Kim, Junghwan;Park, Eunha;Yoonhee Hah;Sangyong Han;Daejoon Hwang;Kim, Heunghwan;Seungho Cho
    • Journal of Electrical Engineering and information Science
    • /
    • v.1 no.2
    • /
    • pp.15-26
    • /
    • 1996
  • MPAs (Massively Parallel Architectures) must address two fundamental issues for scalability: synchronization and communication latency. Dataflow architectures face excessive synchronization overhead and inefficient execution of sequential programs, even though they offer the ability to exploit the massive parallelism inherent in programs. In contrast, MPAs based on the von Neumann computational model may suffer from inefficient synchronization mechanisms and communication latency. DAVRID (DAtaflow/Von Neumann RISC hybrID) is a massively parallel multithreaded architecture that takes advantage of both the von Neumann and dataflow models. It achieves good single-thread performance while tolerating synchronization and communication latency. In this paper, we describe the DAVRID architecture in detail and evaluate its performance through simulation runs over several benchmarks.


Study on Accelerating Distributed ML Training in Orchestration

  • Su-Yeon Kim;Seok-Jae Moon
    • International journal of advanced smart convergence
    • /
    • v.13 no.3
    • /
    • pp.143-149
    • /
    • 2024
  • As the size of data and models in machine learning training continues to grow, training on a single server is becoming increasingly challenging. Consequently, the importance of distributed machine learning, which distributes computational loads across multiple machines, is becoming more prominent. However, several unresolved issues remain regarding the performance enhancement of distributed machine learning, including communication overhead, inter-node synchronization challenges, data imbalance and bias, as well as resource management and scheduling. In this paper, we propose ParamHub, which utilizes orchestration to accelerate training speed. This system monitors the performance of each node after the first iteration and reallocates resources to slow nodes, thereby speeding up the training process. This approach ensures that resources are appropriately allocated to nodes in need, maximizing the overall efficiency of resource utilization and enabling all nodes to perform tasks uniformly, resulting in a faster training speed overall. Furthermore, this method enhances the system's scalability and flexibility, allowing for effective application in clusters of various sizes.
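The reallocation step described above (measure each node's first iteration, then shift resources toward stragglers) can be sketched as follows. The function name and the proportional policy are our illustration, not ParamHub's actual API:

```python
def reallocate(cpu_shares, iter_times):
    """Shift resources toward slow nodes after the first iteration.

    cpu_shares: current CPU share per node (sums to the cluster total)
    iter_times: measured first-iteration time per node

    New shares are made proportional to measured demand (time x share,
    a rough proxy for the work each node had to do), so slower nodes
    receive more resources and per-iteration times even out.
    """
    total = sum(cpu_shares)
    demand = [t * s for t, s in zip(iter_times, cpu_shares)]
    scale = total / sum(demand)
    return [d * scale for d in demand]

# Node 2 took twice as long as its peers, so it gets twice the share.
shares = reallocate([1.0, 1.0, 1.0], iter_times=[10.0, 10.0, 20.0])
```

In synchronous data-parallel training each step waits for the slowest node, so evening out per-node times directly shortens the step time.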

Measuring Hadoop Optimality by Lorenz Curve (로렌츠 커브를 이용한 하둡 플랫폼의 최적화 지수)

  • Kim, Woo-Cheol;Baek, Changryong
    • The Korean Journal of Applied Statistics
    • /
    • v.27 no.2
    • /
    • pp.249-261
    • /
    • 2014
  • Ever-increasing "big data" can only be processed effectively by parallel computing. Parallel computing refers to a high-performance computational method that divides a big query into smaller subtasks and aggregates their results to produce an output. However, it is well known that parallel computing does not automatically achieve scalability, i.e., linear performance improvement as more computers are added, because it requires very careful assignment of tasks to each node and timely collection of results. Hadoop is one of the most successful platforms for attaining scalability. In this paper, we propose a measure of Hadoop optimization that utilizes a Lorenz curve as a proxy for the inequality of hardware resources. Our proposed index takes into account the intrinsic overheads of Hadoop systems such as CPU, disk I/O, and network. It therefore also indicates whether, and by how much, a given Hadoop deployment can be improved. The proposed method is illustrated with experimental data and substantiated by Monte Carlo simulations.
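The Lorenz curve idea is concrete: sort per-node loads, plot the cumulative share of load against the cumulative share of nodes, and summarize the gap from the perfectly balanced diagonal. A minimal sketch using the standard Gini coefficient as the inequality summary (the paper's actual optimality index builds on the Lorenz curve but its exact formula is not reproduced here):

```python
import numpy as np

def gini(loads):
    """Gini coefficient of per-node loads via the Lorenz curve.

    0 means perfectly balanced nodes; values near 1 mean a few nodes
    carry almost all the work.
    """
    x = np.sort(np.asarray(loads, dtype=float))
    n = len(x)
    # Lorenz curve ordinates: cumulative load share, prefixed with 0.
    lorenz = np.concatenate(([0.0], np.cumsum(x) / x.sum()))
    # Trapezoid area under the Lorenz curve on n equal-width segments;
    # Gini = 1 - 2 * area (the diagonal has area 1/2).
    area = ((lorenz[:-1] + lorenz[1:]) / 2.0).sum() / n
    return 1.0 - 2.0 * area

balanced = gini([10, 10, 10, 10])   # perfectly even cluster -> 0.0
skewed = gini([1, 1, 1, 37])        # one hot node carries the load
```

Applied to measured CPU, disk-I/O, or network loads per node, a rising value flags the kind of imbalance the proposed index is designed to expose.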

A Ray-Tracing Algorithm Based On Processor Farm Model (프로세서 farm 모델을 이용한 광추적 알고리듬)

  • Lee, Hyo Jong
    • Journal of the Korea Computer Graphics Society
    • /
    • v.2 no.1
    • /
    • pp.24-30
    • /
    • 1996
  • The ray tracing method, one of many photorealistic rendering techniques, requires heavy computation to synthesize images. Parallel processing can be used to reduce this computation time. A parallel ray tracing algorithm has been implemented and executed for various images on transputer systems. To develop a scalable parallel algorithm, a processor farming technique was exploited. Since each image is divided and distributed among the farm processors, scalability and load balancing are achieved naturally in the proposed algorithm. The parallel algorithm attains efficiency of up to 95% on nine processors. However, the best task size is much larger for simple images, because each pixel requires less computation. Efficiency degradation is observed for large-granularity tasks, because a large task causes load imbalance. Overall, transputer systems behave as good scalable parallel processing systems with respect to the cost-performance ratio.
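The granularity trade-off in a processor farm (tiny chunks waste time on messages, huge chunks cause a straggler at the end) can be captured in a toy cost model. All constants below are illustrative, not measurements from the paper:

```python
def farm_efficiency(pixels, chunk, workers, t_pixel=1.0, t_msg=50.0):
    """Toy cost model for a processor-farm renderer.

    The image is split into chunks of `chunk` pixels handed out on
    demand. Each chunk costs its compute time plus a fixed message
    overhead (t_msg), so small chunks pay heavy communication costs
    while large chunks leave workers idle waiting for the last chunk.
    """
    chunks = -(-pixels // chunk)                  # ceil division
    work = pixels * t_pixel + chunks * t_msg      # total cost w/ overhead
    # Ideal parallel time plus up to one trailing chunk of imbalance.
    t_par = work / workers + chunk * t_pixel
    t_seq = pixels * t_pixel
    return t_seq / (workers * t_par)

eff_small = farm_efficiency(90_000, chunk=30, workers=9)      # msg-bound
eff_mid = farm_efficiency(90_000, chunk=1_000, workers=9)     # sweet spot
eff_large = farm_efficiency(90_000, chunk=10_000, workers=9)  # imbalanced
```

The intermediate chunk size wins, matching the abstract's observation that the best task size depends on per-pixel cost: cheaper pixels shift the optimum toward larger chunks.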
