• Title/Summary/Keyword: Distributed processing. Cluster

Effects of Hypervisor on Distributed Big Data Processing in Virtualizated Cluster Environment (가상화 클러스터 환경에서 빅 데이터 분산 처리 성능에 하이퍼바이저가 미치는 영향)

  • Chung, Haejin;Nah, Yunmook
    • KIISE Transactions on Computing Practices / v.22 no.2 / pp.89-94 / 2016
  • Cluster computing environments have recently been shifting toward virtualized cluster environments. This shift has a great impact on the performance of large-volume distributed processing, and many domestic and international IT companies have therefore invested heavily in research on cluster environments. In this paper, we show how the hypervisor affects the performance of distributed processing of a large volume of data. We present a performance comparison of MapReduce processing in two virtualized cluster environments, one built using the Xen hypervisor and the other built using the container-based Docker. Our results show that Docker is faster than Xen.

Performance Factor of Distributed Processing of Machine Learning using Spark (스파크를 이용한 머신러닝의 분산 처리 성능 요인)

  • Ryu, Woo-Seok
    • The Journal of the Korea institute of electronic communication sciences / v.16 no.1 / pp.19-24 / 2021
  • In this paper, we study the performance factors of machine learning in a distributed environment using Apache Spark and present an efficient distributed processing method through experiments. We first classify the performance factors of machine learning on a distributed cluster into cluster performance, data size, and configuration of the Spark engine. We then study the performance of regression analysis using Spark MLlib running on a Hadoop cluster while changing the configuration of the nodes and the Spark executors. The experiments confirmed that the effective number of executors was affected by the number of data blocks but, depending on the cluster size, its maximum and minimum values were limited by the number of cores and the number of worker nodes, respectively.
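
One way to read the last finding is as a clamp: the block count suggests how many executors are useful, bounded below by the number of worker nodes and above by the number of cores. The sketch below is only an illustration of that reading; the function name `effective_executors` and the exact clamping rule are assumptions and are not taken from the paper.

```python
def effective_executors(num_blocks: int, worker_nodes: int, total_cores: int) -> int:
    """Illustrative estimate of a useful executor count (assumed rule, not the paper's).

    The count follows the number of data blocks, but never drops below the
    number of worker nodes and never exceeds the total number of cores.
    """
    if worker_nodes < 1 or total_cores < worker_nodes:
        raise ValueError("need at least one worker node and at least one core per node")
    return max(worker_nodes, min(num_blocks, total_cores))


if __name__ == "__main__":
    # Hypothetical cluster: 4 worker nodes with 16 cores in total.
    for blocks in (2, 10, 100):
        print(blocks, effective_executors(blocks, worker_nodes=4, total_cores=16))
```

With few blocks the lower bound applies, with many blocks the upper bound applies, and in between the block count itself decides.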

Dynamic Cluster Management of Hadoop Distributed Filesystem (하둡 분산 파일시스템의 동적 클러스터 관리 기법)

  • Ryu, Wooseok
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2016.10a / pp.435-437 / 2016
  • Hadoop Distributed File System (HDFS) is a file system for distributed processing of big data that replicates data across distributed data nodes. An HDFS cluster scales well, up to thousands of nodes, but it assumes an exclusive cluster with numerous nodes dedicated to big data processing. Worker systems used for day-to-day office operations are rarely considered part of the cluster. This paper discusses this problem and proposes a dynamic cluster management technique to increase the storage capacity and analytic performance of a Hadoop cluster. The proposed technique can add legacy systems to the cluster and remove them from it dynamically, depending on their availability.

A Data Transfer Method of the Sub-Cluster Group based on the Distributed and Shared Memory (분산 공유메모리를 기반으로 한 서브 클러스터 그룹의 자료전송방식)

  • Lee, Kee-Jun
    • The KIPS Transactions:PartA / v.10A no.6 / pp.635-642 / 2003
  • Rapid advances in network technology provide the foundation for building fast, inexpensive cluster systems. Conventional cluster systems are generally built from systems above a fixed level on stable, high-speed local networks. A multi-distributed web cluster group is a web cluster model that achieves high performance, high efficiency, and high availability from low-cost, low-speed system nodes on the network, through effective job division, cooperation among system nodes, parallel execution of the given work, and the shared memory of the SC-Server. To this end, the multi-distributed web cluster group binds multiple system nodes into a sub-cluster group over a single virtual network and uses the web distributed shared memory of the system nodes for effective data transmission within sub-cluster groups. Because the presented model applies load balancing and parallel computing to large-scale work requested by users, it can maximize processing efficiency.

An Internet-based computing framework for the simulation of multi-scale response of structural systems

  • Chen, Hung-Ming;Lin, Yu-Chih
    • Structural Engineering and Mechanics / v.37 no.1 / pp.17-37 / 2011
  • This paper presents a new Internet-based computational framework for the realistic simulation of multi-scale response of structural systems. The framework involves two levels of parallel processing: multiple local distributed computing environments are connected by the Internet to form a cluster-to-cluster distributed computing environment. To use such an environment for a realistic simulation, the simulation task of a structural system is separated into a simulation of a simplified global model and several detailed component models at various scales. These related multi-scale simulation tasks are distributed among clusters and connected to form a multi-level hierarchy. The Internet is used to coordinate geographically distributed simulation tasks. This paper also presents the development of a software framework that supports the multi-level hierarchical simulation approach in a cluster-to-cluster distributed computing environment. The architectural design of the program also allows several multi-scale models to be integrated as clients and servers under a single platform. Such integration can combine geographically distributed computing resources to produce realistic simulations of structural systems.

A Token Based Protocol for Mutual Exclusion in Mobile Ad Hoc Networks

  • Sharma, Bharti;Bhatia, Ravinder Singh;Singh, Awadhesh Kumar
    • Journal of Information Processing Systems / v.10 no.1 / pp.36-54 / 2014
  • Resource sharing is a major advantage of distributed computing. However, a distributed computing system may have some physical or virtual resource that can be accessed by only a single process at a time. The mutual exclusion problem is to ensure that no more than one process at a time is allowed to access such a shared resource. The article proposes a token-based mutual exclusion algorithm for clustered mobile ad hoc networks (MANETs). The mechanism used to handle token passing at the inter-cluster level differs from that used at the intra-cluster level. This makes our algorithm message efficient and thus suitable for MANETs. In the interest of efficiency, we implemented a centralized token passing scheme at the intra-cluster level. Centralized schemes are inherently failure prone, so we present an intra-cluster token passing scheme that is able to tolerate a failure. To enhance reliability, we applied a distributed token circulation scheme at the inter-cluster level. More importantly, the message complexity of the proposed algorithm is independent of N, the total number of nodes in the system. Also, under a heavy load, it turns out to be inversely proportional to n, the average number of nodes per cluster. We substantiate our claim with a correctness proof, complexity analysis, and simulation results. Finally, we present a simple approach to make our protocol fault tolerant.
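
As a rough illustration of the centralized intra-cluster idea only, the sketch below has a cluster head hand a single token to one requester at a time and queue the rest. The class `ClusterHead` and its methods are hypothetical names chosen for this example; the inter-cluster circulation, mobility handling, and failure tolerance described in the abstract are not modeled.

```python
from collections import deque


class ClusterHead:
    """Hypothetical coordinator that grants one token within a single cluster."""

    def __init__(self):
        self.holder = None       # node currently holding the token, if any
        self.waiting = deque()   # nodes waiting for the token, in FIFO order

    def request(self, node):
        """Grant the token at once if it is free; otherwise queue the node."""
        if self.holder is None:
            self.holder = node
            return True
        self.waiting.append(node)
        return False

    def release(self, node):
        """Take the token back and pass it to the next waiting node, if any."""
        if self.holder != node:
            raise ValueError("only the current holder can release the token")
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder


if __name__ == "__main__":
    head = ClusterHead()
    print(head.request("a"))   # token is free, so "a" gets it
    print(head.request("b"))   # "b" has to wait
    print(head.release("a"))   # token moves on to "b"
```

Because exactly one node is recorded as `holder` at any time, at most one node can be in its critical section, which is the mutual exclusion property the abstract describes.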

Design and Implementation of Distributed Cluster Supporting Dynamic Down-Scaling of the Cluster (노드의 동적 다운 스케일링을 지원하는 분산 클러스터 시스템의 설계 및 구현)

  • Woo-Seok Ryu
    • The Journal of the Korea institute of electronic communication sciences / v.18 no.2 / pp.361-366 / 2023
  • Apache Hadoop, a representative framework for distributed processing of big data, can scale a cluster up to thousands of nodes to improve parallel distributed processing performance. However, reducing the size of the cluster is limited to permanently decommissioning nodes with defects or degraded performance, which makes it hard to operate nodes flexibly in small clusters. In this paper, we discuss the problems that occur when removing nodes from a Hadoop cluster and propose a dynamic down-scaling technique to manage the distributed cluster more flexibly. To this end, we design and implement a modified Hadoop system and interfaces that support dynamic down-scaling: a node being removed from the cluster is temporarily paused and reconnected when necessary, rather than decommissioned. Experimental results verify that effective downsizing can be performed without performance degradation.

A Cluster-based Routing Protocol with Energy Consumption Balance in Distributed Wireless Sensor Networks (분산 무선센서 네트워크의 클러스터-기반 에너지 소비 균형 라우팅 프로토콜)

  • Kim, Tae-Hyo;Ju, Yeon-Jeong;Oh, Ho-Suck;Kim, Min-Kyu;Jung, Yong-Bae
    • Journal of the Institute of Convergence Signal Processing / v.15 no.4 / pp.155-161 / 2014
  • In this paper, a cluster-based routing protocol for distributed sensor networks is proposed that balances energy consumption among sensor nodes densely deployed in the sensor field. The routing protocol is implemented on clusters with a hierarchical scheme. Clusters are formed from closely located sensor nodes. The node with the maximum residual energy in a cluster can be selected as the cluster head. In routing, one of the nodes in the intersection area between two clusters is selected as a relay node; by balancing the consumption of communication energy, this method can extend the lifetime of all the sensor nodes.
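
The cluster head rule can be illustrated with a minimal sketch that picks the node with the highest residual energy. The function name `select_cluster_head`, the node identifiers, and the energy values are made up for this example; relay-node selection and the rest of the protocol are not modeled.

```python
def select_cluster_head(residual_energy: dict) -> str:
    """Return the id of the node with the largest residual energy.

    `residual_energy` maps a node id to its remaining energy (arbitrary units).
    If several nodes tie, the one listed first in the mapping is returned.
    """
    if not residual_energy:
        raise ValueError("a cluster needs at least one node")
    return max(residual_energy, key=residual_energy.get)


if __name__ == "__main__":
    # Hypothetical cluster of three sensor nodes.
    cluster = {"n1": 0.4, "n2": 0.9, "n3": 0.7}
    print(select_cluster_head(cluster))
```

Re-running the selection as energy levels change rotates the head role toward nodes that have spent less energy, which is one way the balanced consumption described in the abstract could come about.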

Implementation of AIoT Edge Cluster System via Distributed Deep Learning Pipeline

  • Jeon, Sung-Ho;Lee, Cheol-Gyu;Lee, Jae-Deok;Kim, Bo-Seok;Kim, Joo-Man
    • International journal of advanced smart convergence / v.10 no.4 / pp.278-288 / 2021
  • IoT systems have recently become cloud-based, so that the large, continuous streams of data collected from sensor nodes are processed in a data server through the cloud. However, in the centralized configuration of large-scale cloud computing, computational processing must be performed at a physical location where data collection and processing take place, and the need for edge computers to reduce the network load of the cloud system is gradually expanding. In this paper, a cluster system consisting of 6 inexpensive Raspberry Pi boards was constructed to perform fast data processing. We propose a "Kubernetes cluster system (KCS)" for large-scale data collection and analysis using model distribution and a data pipeline. To evaluate performance, an ensemble deep learning model was built, and the accuracy, processing performance, and processing time of the proposed KCS and of model distribution were compared and analyzed. As a result, the ensemble model was excellent in accuracy, but the KCS implemented as a data pipeline proved to be superior in processing speed.

Load Balancing Strategies for Network-based Cluster System

  • Jung, Hoon-Jin;Choung Shik Park;Park, Sang-Bang
    • Proceedings of the IEEK Conference / 2000.07a / pp.314-317 / 2000
  • Cluster systems provide attractive scalability in terms of computation power and memory size. With advances in high-speed computer network technology, cluster systems are becoming increasingly competitive with expensive parallel machines. In a parallel program, the load of each task is difficult to predict before the program runs, and tasks depend on one another in many ways. Load imbalance is an obstacle to system performance. Most research on load balancing has been concerned with distributed systems, and little has addressed cluster systems. In a cluster system, a dynamic load balancing algorithm evaluates each processor's load at runtime so that load is evenly distributed across nodes. However, if the communication cost or node complexity becomes high, having all nodes take part in the load balancing process is not effective. In such circumstances, it is better to reduce the number of nodes that take part in load balancing. We have modeled cluster systems and proposed marginal dynamic load balancing algorithms suitable for such circumstances.
