• Title/Abstract/Keywords: Cluster Computing

425 search results (processing time: 0.028 s)

Scalable Prediction Models for Airbnb Listing in Spark Big Data Cluster using GPU-accelerated RAPIDS

  • Muralidharan, Samyuktha;Yadav, Savita;Huh, Jungwoo;Lee, Sanghoon;Woo, Jongwook
    • Journal of information and communication convergence engineering / Vol. 20, No. 2 / pp.96-102 / 2022
  • We aim to build predictive models for Airbnb prices using GPU-accelerated RAPIDS in a big data cluster. The Airbnb Listings datasets are used for the predictive analysis. Several machine-learning algorithms have been adopted to build models that predict the price of Airbnb listings. We compare the results of traditional and big data approaches to machine learning for price prediction and discuss the performance of the models. We built big data models using the Databricks Spark cluster, a distributed parallel computing system. Furthermore, we implemented models on multiple GPUs using RAPIDS in the Spark cluster. The GPU model was developed using the XGBoost algorithm, whereas the other models were developed using traditional central processing unit (CPU)-based algorithms. This study compared all models in terms of accuracy metrics and computing time. We observed that the XGBoost model with RAPIDS on GPUs performed best in both accuracy and computing time.
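The two comparison axes named in the abstract, accuracy metrics and computing time, can be illustrated with a small sketch. It assumes RMSE and R² as the accuracy metrics and uses two deliberately simple stand-in models (a mean-price baseline and a one-feature least-squares line) in place of the paper's XGBoost and CPU-based models; the feature (guests accommodated) and the prices are hypothetical toy values, not Airbnb data.

```python
import math
import time
from statistics import mean

def rmse(y_true, y_pred):
    """Root-mean-square error (lower is better)."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))

def r2(y_true, y_pred):
    """Coefficient of determination (1.0 is a perfect fit)."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Stand-in model 1: always predict the mean training price.
def fit_mean(xs, ys):
    return mean(ys)

def predict_mean(model, x):
    return model

# Stand-in model 2: one-feature ordinary least squares; model = (slope, intercept).
def fit_linear(xs, ys):
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def predict_linear(model, x):
    slope, intercept = model
    return slope * x + intercept

def evaluate(name, fit, predict, xs, ys):
    """Time the fit step, then score the predictions on the same data."""
    start = time.perf_counter()
    model = fit(xs, ys)
    fit_seconds = time.perf_counter() - start
    preds = [predict(model, x) for x in xs]
    return {"model": name, "rmse": rmse(ys, preds), "r2": r2(ys, preds),
            "fit_seconds": fit_seconds}

# Hypothetical toy data: guests accommodated vs. nightly price.
GUESTS = [1, 2, 3, 4, 5, 6]
PRICES = [50, 80, 105, 140, 160, 195]

if __name__ == "__main__":
    for name, fit, predict in [("mean baseline", fit_mean, predict_mean),
                               ("least squares", fit_linear, predict_linear)]:
        print(evaluate(name, fit, predict, GUESTS, PRICES))
```

The sketch scores on the training data only to stay short; a real comparison like the one in the paper would evaluate on a held-out split and time training on the full cluster.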

Integrated Linux Cluster System Administration Tool

  • 김은회;김지연;박용관;권성주;최재영
    • Journal of KIISE: Computing Practices and Letters / Vol. 8, No. 6 / pp.639-646 / 2002
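  • This paper discusses the system configuration and design issues of CATS-i' (Cluster Administration ToolS on the Internet), an integrated administration tool for Linux cluster systems. CATS-i' was developed to install and manage Linux cluster systems easily, quickly, and safely. It integrates functions ranging from installing the cluster operating system and application packages, to monitoring and managing the resources of cluster nodes in real time, to submitting and managing batch jobs, thereby providing users with a single system image. It also provides a powerful Java-based graphical user interface that lets users easily and conveniently check and manage the state of the cluster regardless of platform.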

RDP: A storage-tier-aware Robust Data Placement strategy for Hadoop in a Cloud-based Heterogeneous Environment

  • Muhammad Faseeh Qureshi, Nawab;Shin, Dong Ryeol
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 10, No. 9 / pp.4063-4086 / 2016
  • Cloud computing is a robust technology that helps resolve many parallel and distributed computing issues in the modern Big Data environment. Hadoop is an ecosystem that processes large datasets in a distributed computing environment. HDFS is the Hadoop filesystem, which distributes data blocks to the cluster nodes. Data block placement has become a bottleneck for overall performance in a Hadoop cluster. The current placement policy assumes that all DataNodes have equal computing capacity to process data blocks, meaning the same storage media and the same processing performance on every node. As a result, Hadoop cluster performance is affected by unbalanced workloads, inefficient use of storage tiers, network traffic congestion, and HDFS integrity issues. This paper proposes a storage-tier-aware Robust Data Placement (RDP) scheme that systematically resolves unbalanced workloads, reduces network congestion to an optimal state, makes effective use of storage tiers, and minimizes HDFS integrity issues. The experimental results show that the proposed approach reduced the unbalanced workload issue to 72%. Moreover, the approach resolved the storage-tier compatibility problem to 81% by predicting storage for block jobs, and improved overall data block placement by 78% through pre-calculated computing capacity allocations and the execution of map files over the respective NameNode and DataNodes.
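The contrast the abstract draws, placing blocks according to each DataNode's capacity and storage tier instead of treating all nodes as equal, can be sketched with a greedy heuristic. This is not the RDP algorithm from the paper: the `DataNode` fields, the relative capacity units, the SSD/HDD tier labels, and the least-loaded-ratio rule are all illustrative assumptions, and HDFS replication and rack awareness are ignored.

```python
from dataclasses import dataclass, field

@dataclass
class DataNode:
    name: str
    capacity: float   # relative computing capacity (hypothetical units)
    tier: str         # storage tier label, e.g. "SSD" or "HDD" (illustrative)
    blocks: list = field(default_factory=list)

    def load_ratio(self) -> float:
        # Blocks already assigned, normalised by what this node can handle.
        return len(self.blocks) / self.capacity

def place_blocks(blocks, nodes):
    """Greedy capacity- and tier-aware placement (a sketch, not RDP itself).

    blocks: iterable of (block_id, preferred_tier) pairs.
    Each block goes to the node with the lowest load/capacity ratio among the
    nodes offering its preferred tier; if no node offers that tier, all nodes
    are eligible. Ties go to the node listed first.
    """
    for block_id, preferred_tier in blocks:
        candidates = [n for n in nodes if n.tier == preferred_tier] or nodes
        target = min(candidates, key=DataNode.load_ratio)
        target.blocks.append(block_id)
    return {n.name: list(n.blocks) for n in nodes}

if __name__ == "__main__":
    nodes = [DataNode("dn1", 2.0, "SSD"), DataNode("dn2", 1.0, "SSD"),
             DataNode("dn3", 1.0, "HDD")]
    blocks = [(f"b{i}", "SSD") for i in range(6)] + [("h0", "HDD"), ("h1", "HDD")]
    print(place_blocks(blocks, nodes))
```

With SSD capacities in a 2:1 ratio, the six SSD-preferring blocks split 4:2 between `dn1` and `dn2`, whereas an equal-capacity assumption would split them 3:3.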

Construction and Performance Evaluation of Windows-based Parallel Computing Environment

  • 신재렬;김명호;최정열
    • Korean Society of Computational Fluids Engineering: Conference Proceedings / 2001 Fall Conference Proceedings / pp.58-62 / 2001
  • A parallel computing environment was constructed based on the Windows 2000 operating system. The cluster was configured with Fast Ethernet to connect the clients within a network domain. For parallel computation, MPI implementations for Windows, namely MPICH.NT.1.2.2 and MP-MPICHNT.1.2, were used with the Compaq Visual Fortran compiler, which produces well-optimized executables for x86 systems. The cluster's performance was evaluated using a preconditioned Navier-Stokes code for the 2D analysis of a compressible, viscous flow around a compressor blade. Parallel performance was compared with that of previously studied Linux clusters by varying the number of processors, the problem size, and the MPI library. The results of the test problems show that the parallel performance of the low-cost Fast Ethernet Windows cluster is superior to that of a Linux cluster of similar configuration and comparable to that of a Myrinet cluster.
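Comparisons of parallel performance across processor counts, like the Windows-versus-Linux comparison above, are commonly expressed as speedup S_p = T_1 / T_p and parallel efficiency E_p = S_p / p. The sketch below computes both from wall-clock timings; the timings in the example are hypothetical, not measurements from the paper.

```python
def speedup_table(timings):
    """Return (p, speedup, efficiency) rows from {processors: wall_clock_seconds}.

    The single-processor run T_1 is the baseline, so timings must contain key 1.
    """
    t1 = timings[1]
    rows = []
    for p in sorted(timings):
        s = t1 / timings[p]          # speedup    S_p = T_1 / T_p
        rows.append((p, s, s / p))   # efficiency E_p = S_p / p
    return rows

if __name__ == "__main__":
    # Hypothetical timings (seconds) for one cluster, one problem size, one MPI library.
    example = {1: 120.0, 2: 64.0, 4: 35.0, 8: 20.0}
    for p, s, e in speedup_table(example):
        print(f"{p:>2} processors: speedup {s:5.2f}, efficiency {e:6.1%}")
```

Building one table per cluster and MPI library and comparing efficiency at equal processor counts gives the kind of side-by-side comparison the abstract reports.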

On the Performance of Oracle Grid Engine Queuing System for Computing Intensive Applications

  • Kolici, Vladi;Herrero, Albert;Xhafa, Fatos
    • Journal of Information Processing Systems / Vol. 10, No. 4 / pp.491-502 / 2014
  • In this paper, we present research results on computing-intensive applications using modern high-performance architectures, from the perspective of high computational needs. Computing-intensive applications are an important family of applications in the distributed computing domain. They have been studied using different distributed computing paradigms and infrastructures. Such applications are distinguished by their heavy demand for CPU computing, independently of the amount of data associated with the problem instance. Among computing-intensive applications are simulation-based applications, which aim to maximize the use of system resources to process large simulation computations. In this research work, we consider an application that simulates scheduling and resource allocation in a Grid computing system using Genetic Algorithms. Such an application requires a rather large number of simulations to extract meaningful statistics about the behavior of the simulation results. We study the performance of Oracle Grid Engine for this application running on a cluster with high computing capacity. Several scenarios were generated to measure the response time and queuing time under different workloads and numbers of nodes in the cluster.
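The two quantities measured in the study, queuing time and response time, can be illustrated with a minimal first-come-first-served model of a cluster with a fixed number of identical nodes, each running one job at a time. This is a simplification assumed for illustration: Oracle Grid Engine's actual scheduler supports multiple queues, priorities, and resource requests that are not modelled here, and the job lists are hypothetical.

```python
import heapq

def simulate_fifo(jobs, num_nodes):
    """FCFS dispatch of jobs to the earliest-free node.

    jobs: list of (arrival_time, run_time) pairs.
    Returns (mean_queue_time, mean_response_time), where queue time is
    start - arrival and response time is finish - arrival.
    """
    if not jobs:
        return 0.0, 0.0
    free_at = [0.0] * num_nodes          # min-heap: time at which each node becomes free
    heapq.heapify(free_at)
    total_wait = total_resp = 0.0
    for arrival, run in sorted(jobs):    # serve jobs in arrival order
        node_free = heapq.heappop(free_at)
        start = max(arrival, node_free)  # wait if every node is busy
        finish = start + run
        heapq.heappush(free_at, finish)
        total_wait += start - arrival
        total_resp += finish - arrival
    n = len(jobs)
    return total_wait / n, total_resp / n

if __name__ == "__main__":
    # Hypothetical workload: 8 jobs of 10 time units, one arriving every 2 units.
    workload = [(2 * i, 10) for i in range(8)]
    for nodes in (2, 4, 8):
        wait, resp = simulate_fifo(workload, nodes)
        print(f"{nodes} nodes: mean queue time {wait:.2f}, mean response time {resp:.2f}")
```

Sweeping `num_nodes` and the arrival rate mirrors, in miniature, the study's scenarios of varying workloads and cluster sizes.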

Development of a PBS-based Information Provider of Cluster System for MDS

  • 조광문
    • The Journal of the Korea Contents Association / Vol. 5, No. 2 / pp.207-211 / 2005
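  • Cluster systems have recently become a widely deployed computing environment in high-performance computing fields such as grid environments. However, in grid environments, the key information about cluster systems is not adequately provided to other systems. This paper proposes a framework for providing such information, so that cluster system information is made available in the grid environment. On this basis, the usability of grid information services can be increased.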

A Token Based Protocol for Mutual Exclusion in Mobile Ad Hoc Networks

  • Sharma, Bharti;Bhatia, Ravinder Singh;Singh, Awadhesh Kumar
    • Journal of Information Processing Systems / Vol. 10, No. 1 / pp.36-54 / 2014
  • Resource sharing is a major advantage of distributed computing. However, a distributed computing system may have physical or virtual resources that can be accessed by only one process at a time. The mutual exclusion problem is to ensure that no more than one process at a time is allowed to access a shared resource. This article proposes a token-based mutual exclusion algorithm for clustered mobile ad hoc networks (MANETs). The mechanism adopted to handle token passing at the inter-cluster level differs from that at the intra-cluster level, which makes our algorithm message-efficient and thus suitable for MANETs. In the interest of efficiency, we implemented a centralized token passing scheme at the intra-cluster level. Because centralized schemes are inherently failure-prone, we present an intra-cluster token passing scheme that can tolerate a failure. To enhance reliability, we applied a distributed token circulation scheme at the inter-cluster level. More importantly, the message complexity of the proposed algorithm is independent of N, the total number of nodes in the system. Moreover, under heavy load, it turns out to be inversely proportional to n, the (average) number of nodes per cluster. We substantiate our claims with a correctness proof, complexity analysis, and simulation results. Finally, we present a simple approach to make our protocol fault tolerant.
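The two-level structure described above, a token circulating among cluster heads at the inter-cluster level while each cluster head grants access centrally within its own cluster, can be sketched as a sequential simulation. This is a simplified illustration, not the authors' protocol: it assumes a fixed logical ring of cluster heads, no failures or mobility, and a hypothetical message model in which each grant, each release, and each token hop between cluster heads costs one message.

```python
from collections import deque

def circulate_token(requests):
    """One circulation of the token around a logical ring of cluster heads.

    requests[c] lists the member nodes of cluster c waiting for the critical
    section, in request order. Returns (entry_order, message_count).
    """
    queues = [deque(r) for r in requests]
    entry_order, messages = [], 0
    for cluster_id, queue in enumerate(queues):   # token visits heads in ring order
        while queue:                              # head holds the token: grant locally, FIFO
            node = queue.popleft()
            entry_order.append((cluster_id, node))  # only this node is in the critical section
            messages += 2                         # GRANT to the member + RELEASE back to the head
        messages += 1                             # hand the token to the next cluster head
    return entry_order, messages

if __name__ == "__main__":
    # Heavy load: every node in each of 4 clusters of n = 5 nodes is waiting.
    n, clusters = 5, 4
    reqs = [[f"c{c}n{i}" for i in range(n)] for c in range(clusters)]
    order, msgs = circulate_token(reqs)
    print(len(order), msgs, msgs / len(order))
```

Under this toy message model, with k clusters each holding n waiting nodes, one circulation costs 2nk + k messages for nk entries, i.e. 2 + 1/n messages per entry: the token-passing share shrinks as 1/n and the total node count never enters directly. It only makes the abstract's heavy-load claim concrete and does not reproduce the paper's complexity analysis.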

A Fundamental Study of Thermal-Fluid Flow Analysis using High Performance Computing under the Grid

  • 홍승도;이대성;이재룡;하만영;이상산
    • Korean Society of Mechanical Engineers: Conference Proceedings / KSME 2003 Fall Conference / pp.928-933 / 2003
  • Simulating three-dimensional turbulent flow with LES and DNS takes considerable time and expense with currently available computing resources, and simulating turbulent flow at high Reynolds numbers is nearly impossible. Grid computing is therefore emerging as an alternative source of the required computing power and working environment. In this study, a CFD code was parallelized for parallel computing in the Grid environment. First, the Grid environment was built by connecting PC-cluster facilities belonging to different institutions through a communication network. CFD applications were then run to check the performance of the parallel code developed for the Grid environment. Although it is a fundamental study, it is significant as a first step in Grid research.

Application of Urban Computing to Explore Living Environment Characteristics in Seoul : Integration of S-Dot Sensor and Urban Data

  • Daehwan Kim;Woomin Nam;Keon Chul Park
    • Journal of Internet Computing and Services / Vol. 24, No. 4 / pp.65-76 / 2023
  • This paper identifies the patterns of living environment elements (PM2.5, PM10, noise) throughout Seoul and the urban characteristics that affect them, using big data from Seoul's S-Dot sensors, which have recently attracted considerable attention. Specifically, it proposes a big-data-based urban computing research methodology and research direction for confirming the relationship between urban characteristics and the living environments that directly affect citizens. The temporal range is 2020 to 2021, the period for which S-Dot sensor time-series data are available, and the spatial range covers all of Seoul on a 500 m × 500 m grid. First, as part of analyzing specific living environment patterns, simple trends are identified through exploratory data analysis (EDA), and cluster analysis is conducted based on those trends. Then, to derive the specific urban planning factors of each cluster, basic statistical analyses such as ANOVA, OLS regression, and multinomial logit (MNL) analysis were conducted to confirm more specific characteristics. As a result, cluster patterns of the environmental elements (PM2.5, PM10, noise) and the urban factors that affect them are identified, and some areas show relatively high or low long-term living environment values compared with other regions. The results are expected to serve as a reference for urban planning management measures for areas with vulnerable living environments, and this exploratory study is expected to provide directions for the urban computing field, especially research involving environmental data.
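Two steps described above, aggregating sensor readings onto a 500 m × 500 m grid and clustering the resulting per-cell profiles, can be sketched in a few lines. The abstract does not name the clustering method, so plain k-means (Lloyd's algorithm) is used here as an assumed stand-in; the projected metre coordinates, the three-feature profiles (PM2.5, PM10, noise), and the deterministic first-k initialisation are illustrative choices, and the sample values are hypothetical.

```python
import math

def grid_cell(x_m, y_m, size_m=500):
    """Map projected coordinates in metres to a size_m x size_m grid cell index."""
    return (math.floor(x_m / size_m), math.floor(y_m / size_m))

def _sq_dist(a, b):
    # Squared Euclidean distance between two feature vectors.
    return sum((u - v) ** 2 for u, v in zip(a, b))

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; centroids start at the first k points (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: _sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

if __name__ == "__main__":
    # Hypothetical per-cell profiles: (mean PM2.5, mean PM10, mean noise dB).
    cells = [(15, 30, 55), (40, 70, 68), (16, 32, 54), (42, 75, 70), (14, 29, 56)]
    print(grid_cell(1234, 980))
    print(kmeans(cells, 2))
```

Because PM concentrations and noise levels are on different scales, a real analysis would normally standardise the features before clustering, and would use the grid index to average each sensor's readings per cell beforehand.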

Effects of Hypervisor on Distributed Big Data Processing in Virtualized Cluster Environment

  • 정혜진;나연묵
    • KIISE Transactions on Computing Practices / Vol. 22, No. 2 / pp.89-94 / 2016
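  • Cluster environments in the cloud computing market have recently been shifting from conventional clusters to virtualized clusters. This change affects large-scale distributed processing performance, and many IT companies in Korea and abroad are competitively investing in related research and services. To compare the effect of the hypervisor on large-scale distributed data processing performance, this paper builds virtual cluster environments using hypervisor-based Xen and container-based Docker and measures the performance of MapReduce. The results verify that Docker, which does not use a hypervisor, performs about 1.44 to 2.92 times better.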