• Title/Summary/Keyword: cluster file system

Search results: 91

Design and Implementation of Inline Data Deduplication in Cluster File System (클러스터 파일 시스템에서 인라인 데이터 중복제거 설계 및 구현)

  • Kim, Youngchul;Kim, Cheiyol;Lee, Sangmin;Kim, Youngkyun
    • KIISE Transactions on Computing Practices / v.22 no.8 / pp.369-374 / 2016
  • The growing demand for virtual computing and storage resources in the cloud computing environment has led to the adoption of deduplication in storage systems for effective reduction and utilization of storage space. In particular, a large reduction in storage space is possible on the virtual desktop infrastructure by preventing data with identical content, such as virtual desktop images, from being stored redundantly. However, in order to provide reliable virtual desktop services, the storage system must handle a variety of virtual desktop workloads, such as the performance overhead caused by deduplication, periodic data I/O storms, and frequent random I/O operations. In this paper, we design and implement a cluster file system to support virtual desktop and storage services in the cloud computing environment. The proposed cluster file system achieves low storage consumption by means of inline deduplication of virtual desktop images. In addition, it reduces the performance overhead of deduplication by running the deduplication process in the data server rather than on the virtual host on which the virtual desktops are running.
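
As a rough illustration of the inline-deduplication idea summarized above (not the paper's actual design), the following Python sketch hashes fixed-size blocks as a file is written and keeps only blocks whose content hash has not been seen before; the block size, the SHA-256 hash, and the in-memory index are assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed-size chunking, for illustration only

class InlineDedupStore:
    """Toy inline-deduplication store: identical blocks are kept only once."""
    def __init__(self):
        self.blocks = {}      # content hash -> block data (stands in for the data server's store)
        self.file_maps = {}   # file name -> list of content hashes (recipe to rebuild the file)

    def write(self, name, data):
        recipe = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            # Inline check: the block is stored only if its hash is new.
            self.blocks.setdefault(digest, block)
            recipe.append(digest)
        self.file_maps[name] = recipe

    def read(self, name):
        return b"".join(self.blocks[d] for d in self.file_maps[name])

store = InlineDedupStore()
store.write("vm1.img", b"A" * 8192)
store.write("vm2.img", b"A" * 8192)          # identical content: no new blocks stored
assert store.read("vm2.img") == b"A" * 8192
print(len(store.blocks), "unique block(s) stored")   # -> 1
```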

Sim-Hadoop : Leveraging Hadoop Distributed File System and Parallel I/O for Reliable and Efficient N-body Simulations (Sim-Hadoop : 신뢰성 있고 효율적인 N-body 시뮬레이션을 위한 Hadoop 분산 파일 시스템과 병렬 I / O)

  • Awan, Ammar Ahmad;Lee, Sungyoung;Chung, Tae Choong
    • Proceedings of the Korea Information Processing Society Conference / 2013.05a / pp.476-477 / 2013
  • Gadget-2 is a scientific simulation code that has been used for many different types of simulations, such as colliding galaxies, cluster formation, and the popular Millennium Simulation. The code is parallelized with the Message Passing Interface (MPI) and is written in the C language. There is also a Java adaptation of the original code, called Java Gadget, written using MPJ Express. Java Gadget writes a large amount of checkpoint data, which may or may not use the HDF-5 file format. Since HDF-5 is MPI-IO compliant, we can use our MPJ-IO library to perform parallel reading and writing of the checkpoint files and improve I/O performance. Additionally, to add reliability to the code execution, we propose using the Hadoop Distributed File System (HDFS) for writing the intermediate data (checkpoint files) and the final data (output files). The current code writes and reads the input, output, and checkpoint files sequentially, which can easily become a bottleneck for large-scale simulations. In this paper, we propose Sim-Hadoop, a framework that leverages HDFS and MPJ-IO to improve the I/O performance of the Java Gadget code.
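
As a hedged sketch of the parallel checkpoint-writing pattern the abstract refers to, the snippet below uses mpi4py (standing in here for the MPJ-IO layer mentioned in the paper) so that every rank writes its slice of a checkpoint at its own byte offset instead of funneling all data through one writer; the file name, particle count, and record layout are illustrative assumptions.

```python
# Minimal MPI-IO checkpoint sketch; run with e.g. `mpiexec -n 4 python checkpoint_sketch.py`.
# Each rank writes its own slice of particle data at a rank-dependent offset,
# so the checkpoint is written in parallel instead of sequentially.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

PARTICLES_PER_RANK = 1024                        # assumed size for illustration
local = np.full(PARTICLES_PER_RANK, rank, dtype=np.float64)

fh = MPI.File.Open(comm, "checkpoint.bin",
                   MPI.MODE_CREATE | MPI.MODE_WRONLY)
offset = rank * local.nbytes                     # byte offset of this rank's slice
fh.Write_at_all(offset, local)                   # collective, parallel write
fh.Close()
```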

A Non-Shared Metadata Management Scheme for Large Distributed File Systems (대용량 분산파일시스템을 위한 비공유 메타데이타 관리 기법)

  • Yun, Jong-Byeon;Park, Yang-Bun;Lee, Seok-Jae;Jang, Su-Min;Yoo, Jae-Soo;Kim, Hong-Yeon;Kim, Young-Kyun
    • Journal of KIISE: Computer Systems and Theory / v.36 no.4 / pp.259-273 / 2009
  • Most large-scale distributed file systems decouple metadata operations from read and write operations on a file. In such distributed file systems, a dedicated server called a metadata server (MDS) maintains the file system's metadata, such as access information for a file, the position of a file in the repository, the namespace of the file system, and so on. However, existing systems use restrictive metadata management schemes, because most distributed file systems are designed to focus on distributed management and the input/output performance of data rather than on the metadata. Therefore, in existing systems, the metadata throughput and the expandability of the metadata server are limited. In this paper, we propose a new non-shared metadata management scheme that provides high metadata throughput and scalability for a cluster of MDSs. First, we derive a dictionary partitioning scheme as a new metadata distribution technique. Then, we present a load balancing technique based on this distribution technique. It is shown through various experiments that our scheme outperforms existing metadata management schemes in terms of scalability and load balancing.
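
The abstract does not spell out the dictionary partitioning scheme itself, so the sketch below only illustrates the general shape of non-shared metadata management: every path is owned by exactly one metadata server, so MDSs never share state. The hash-based owner function is an assumption standing in for the paper's actual partitioning rule.

```python
import hashlib

class MetadataCluster:
    """Toy non-shared metadata distribution: every path is owned by exactly one MDS."""
    def __init__(self, num_mds):
        self.num_mds = num_mds
        self.mds_tables = [dict() for _ in range(num_mds)]   # one private table per MDS

    def owner(self, path):
        # Deterministic mapping of a path to an MDS (hash-based stand-in for
        # the paper's dictionary partitioning).
        h = int(hashlib.md5(path.encode()).hexdigest(), 16)
        return h % self.num_mds

    def create(self, path, meta):
        self.mds_tables[self.owner(path)][path] = meta

    def lookup(self, path):
        return self.mds_tables[self.owner(path)].get(path)

cluster = MetadataCluster(num_mds=4)
cluster.create("/home/alice/report.txt", {"size": 42, "inode": 7})
print(cluster.owner("/home/alice/report.txt"),
      cluster.lookup("/home/alice/report.txt"))
```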

BeanFS: A Distributed File System for Large-scale E-mail Services (BeanFS: 대규모 이메일 서비스를 위한 분산 파일 시스템)

  • Jung, Wook;Lee, Dae-Woo;Park, Eun-Ji;Lee, Young-Jae;Kim, Sang-Hoon;Kim, Jin-Soo;Kim, Tae-Woong;Jun, Sung-Won
    • Journal of KIISE: Computer Systems and Theory / v.36 no.4 / pp.247-258 / 2009
  • Distributed file systems running on a cluster of inexpensive commodity hardware are being recognized as an effective solution to support the explosive growth of storage demand in large-scale Internet service companies. This paper presents the design and implementation of BeanFS, a distributed file system for large-scale e-mail services. BeanFS is adapted to e-mail services as follows. First, a volume-based replication scheme alleviates the metadata management overhead of the central metadata server in dealing with a very large number of small files. Second, BeanFS employs a lightweight consistency maintenance protocol tailored to the simple access patterns of e-mail messages. Third, transient and permanent failures are treated separately, and recovery from transient failures is fast and incurs little overhead.
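
To make the volume-based replication idea concrete, here is a small sketch under invented assumptions (volume capacity, replication factor, and placement policy): many small files are packed into volumes, and the central directory tracks replica locations per volume rather than per file, which is what shrinks the metadata load.

```python
import itertools

REPLICAS = 3          # assumed replication factor
FILES_PER_VOLUME = 4  # tiny volume capacity, just for the demo

class VolumeDirectory:
    """Toy volume-based placement: the central server maps volumes (not files) to replicas."""
    def __init__(self, servers):
        self.servers = servers
        self.volume_replicas = {}    # volume id -> servers holding a replica of that volume
        self.file_volume = {}        # file name -> volume id
        self._vol_ids = itertools.count()
        self._current, self._fill = None, 0

    def _new_volume(self):
        vid = next(self._vol_ids)
        start = vid % len(self.servers)          # illustrative round-robin placement
        self.volume_replicas[vid] = [self.servers[(start + i) % len(self.servers)]
                                     for i in range(REPLICAS)]
        return vid

    def add_file(self, name):
        if self._current is None or self._fill >= FILES_PER_VOLUME:
            self._current, self._fill = self._new_volume(), 0
        self.file_volume[name] = self._current
        self._fill += 1

    def locate(self, name):
        return self.volume_replicas[self.file_volume[name]]

d = VolumeDirectory(servers=["s1", "s2", "s3", "s4"])
for i in range(10):
    d.add_file(f"mail-{i}.eml")
print(d.locate("mail-7.eml"))   # replica servers of the volume holding mail-7.eml
```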

A Content-Aware Load Balancing Technique Based on Histogram Transformation in a Cluster Web Server (클러스터 웹 서버 상에서 히스토그램 변환을 이용한 내용 기반 부하 분산 기법)

  • Hong Gi Ho;Kwon Chun Ja;Choi Hwang Kyu
    • Journal of Internet Computing and Services / v.6 no.2 / pp.69-84 / 2005
  • As the number of Internet users increases rapidly, cluster web server systems have attracted much attention from researchers and Internet service providers. The cluster web server has been developed to efficiently support a larger number of users and to provide a highly scalable and highly available system. In order to achieve high performance in the cluster web server, efficient load distribution is important, and recently many content-aware request distribution techniques have been proposed. In this paper, we propose a new content-aware load balancing technique that can evenly distribute the workload to each node in the cluster web server. The proposed technique is based on a hash histogram transformation, in which each URL entry of the web log file is hashed, and the access frequency and file size are accumulated into a histogram. Each user request is assigned to a node through the (hashed value, server node) mapping of the histogram transformation. In the proposed technique, the histogram is updated periodically, so an even distribution of user requests can be maintained continuously. In addition to load balancing, our technique can exploit the cache effect to improve performance. The simulation results show that the performance of our technique is considerably better than that of the traditional round-robin method, and that it improves performance by more than 10% compared with the existing workload-aware load balancing (WARD) method.
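
A minimal sketch of the hash-histogram scheme as described in the abstract (the bucket count, the frequency-times-size weighting, and the greedy bucket-to-node mapping are assumptions for illustration): URLs from the web log are hashed into buckets, per-bucket load is accumulated, and buckets are mapped to server nodes so that the accumulated load stays even; routing by the same mapping also keeps repeated requests for the same content on the same node.

```python
import hashlib
from collections import defaultdict

NUM_BUCKETS = 64   # assumed histogram size
NODES = ["node0", "node1", "node2", "node3"]

def bucket(url):
    return int(hashlib.md5(url.encode()).hexdigest(), 16) % NUM_BUCKETS

# 1) Build the histogram from the web log: weight = access frequency x file size.
weights = defaultdict(int)
web_log = [("/index.html", 2048), ("/video/a.mp4", 5_000_000), ("/index.html", 2048)]
for url, size in web_log:
    weights[bucket(url)] += size

# 2) Map buckets to nodes so that accumulated weight per node stays even
#    (greedy policy; the paper's exact mapping procedure may differ).
load = {n: 0 for n in NODES}
bucket_to_node = {}
for b, w in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
    target = min(load, key=load.get)
    bucket_to_node[b] = target
    load[target] += w

# 3) At request time, the dispatcher routes by the (hash bucket -> node) mapping,
#    which also keeps requests for the same content on the same node (cache effect).
def dispatch(url):
    return bucket_to_node.get(bucket(url), NODES[bucket(url) % len(NODES)])

print(dispatch("/index.html"), dispatch("/video/a.mp4"))
```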

Performance Improvement of Cluster File System Regarding Usage Frequency of File Resources (파일 자원 사용 빈도를 고려한 클러스터 파일시스템의 성능 향상)

  • Choi, Chang-Yeol;Chung, Ji-Yung;Kim, Sung-Soo
    • Proceedings of the Korea Information Processing Society Conference / 2000.10b / pp.949-952 / 2000
  • By providing multiple access paths to the shared storage, a cluster file system can tolerate a fault occurring at a single node. Multi-path access to the shared storage is realized through file migration and file replication. In this paper, to reduce the average response time of the global file system, we propose a method that prioritizes files according to the access rate of file resources, in order to prevent the performance degradation and service interruption of the cluster file system that can occur during file migration and file replication. The access rate of a file resource can be obtained from the usage frequency of resources involved in file processing, such as blocks, inodes, and superblocks.
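
A toy sketch of the access-rate bookkeeping described above, under assumed weights (the paper does not give a formula): usage counts of blocks and inodes per file are accumulated, and the resulting priority decides which files should be migrated or replicated first.

```python
from collections import Counter

class AccessTracker:
    """Toy access-rate bookkeeping: resource usage counts decide which files
    get priority when files are migrated or replicated."""
    def __init__(self):
        self.block_hits = Counter()   # file -> block accesses
        self.inode_hits = Counter()   # file -> inode accesses

    def record(self, name, blocks=0, inodes=0):
        self.block_hits[name] += blocks
        self.inode_hits[name] += inodes

    def priority(self, name):
        # Assumed weighting: block traffic dominates, inode lookups count less.
        return 2 * self.block_hits[name] + self.inode_hits[name]

    def hottest(self):
        files = set(self.block_hits) | set(self.inode_hits)
        return sorted(files, key=self.priority, reverse=True)

t = AccessTracker()
t.record("movie.dat", blocks=500, inodes=10)
t.record("notes.txt", blocks=5, inodes=50)
print(t.hottest())   # files in the order they would be replicated/migrated first
```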

Design and Implementation of The Communication Module for a Linux Cluster File System (리눅스 클러스터 파일 시스템을 위한 통신모듈의 설계 및 구현)

  • 박의수;유찬곤;손호신;최현호;김형식;유관종
    • Proceedings of the Korean Information Science Society Conference / 2002.04a / pp.649-651 / 2002
  • A cluster file system applies existing clustering techniques to the file system in order to overcome the storage-space and bandwidth limitations that arise when a file system is built on a per-node basis. Because a cluster file system stores a single original file divided across multiple nodes, it requires efficient inter-node data communication, and within each node it must also support a dedicated data-exchange mechanism between the cluster file system and applications. In this paper, we design and implement the communication module required for a cluster file system to store multimedia data in a distributed manner that is both reliable and efficient.
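
As a loose illustration of the data distribution such a communication module would have to carry (the stripe size and round-robin placement are assumptions, and real inter-node transfer is omitted), the sketch below splits a byte stream into stripe units, assigns them to nodes, and reassembles the original stream.

```python
BLOCK = 64 * 1024   # assumed stripe unit

def stripe(data, num_nodes):
    """Split a file's data into stripe units and assign them round-robin to nodes."""
    placement = {n: [] for n in range(num_nodes)}
    for i in range(0, len(data), BLOCK):
        placement[(i // BLOCK) % num_nodes].append(data[i:i + BLOCK])
    return placement

def reassemble(placement):
    """Interleave the per-node stripe lists back into the original byte stream."""
    out, nodes = bytearray(), sorted(placement)
    chunk_lists = [list(placement[n]) for n in nodes]
    i = 0
    while any(chunk_lists):
        node = i % len(nodes)
        if chunk_lists[node]:
            out += chunk_lists[node].pop(0)
        i += 1
    return bytes(out)

data = bytes(range(256)) * 1024          # 256 KiB of sample "multimedia" data
placed = stripe(data, num_nodes=3)
assert reassemble(placed) == data
print({n: len(chunks) for n, chunks in placed.items()})
```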

Design and Implementation of Consistency Check Program for Storage Area Network Based Cluster File System (SAN기반 클러스터 파일 시스템의 일관성 검증을 위한 검사 프로그램의 설계 및 구현)

  • 백주현;박성용;이장선;오상규
    • Proceedings of the Korean Information Science Society Conference / 2002.04a / pp.70-72 / 2002
  • In this paper, we discuss the design elements and implementation methods required for a program that checks the consistency of a cluster file system, based on an analysis of consistency check programs for existing file systems and of the characteristics of cluster file systems. Furthermore, we verify the functionality of the check program by applying a program implemented according to this design to an actual commercial cluster file system. To this end, we adopt a mutation test technique, examine how the reliability of the consistency check program can be verified, and review the resulting test results.
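
The mutation-testing idea can be shown with a toy example in which the metadata layout, the checker, and the mutation operator are all invented for illustration: a consistent metadata image is deliberately corrupted, and a trustworthy consistency checker is expected to flag ("kill") every mutant.

```python
import copy
import random

def check_consistency(meta):
    """Toy checker: total size recorded in the superblock must match the sum of file sizes."""
    return meta["superblock"]["total_size"] == sum(f["size"] for f in meta["files"].values())

def mutate(meta, rng):
    """Mutation operator: corrupt one randomly chosen file size."""
    mutant = copy.deepcopy(meta)
    victim = rng.choice(list(mutant["files"]))
    mutant["files"][victim]["size"] += rng.randint(1, 100)
    return mutant

image = {
    "superblock": {"total_size": 300},
    "files": {"a": {"size": 100}, "b": {"size": 200}},
}
assert check_consistency(image)          # the original image is consistent

rng = random.Random(0)
killed = sum(not check_consistency(mutate(image, rng)) for _ in range(20))
print(f"mutation score: {killed}/20")    # a reliable checker should detect (kill) every mutant
```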

Failure Detection in the Linux Cluster File System SANique™ (리눅스 클러스터 화일 시스템 SANique™의 오류 탐지 기법)

  • 임화정;이규웅;이장선;오상규
    • Proceedings of the Korean Information Science Society Conference / 2002.04b / pp.37-39 / 2002
  • This paper describes the failure detection and recovery techniques of SANique™, a SAN-based Linux cluster shared file system in which network-attached storage devices are connected directly to the SAN (storage area network), so that data can be transferred directly without a file server. We define the "split-brain" failure situation and the problems caused by node failures within the cluster in a shared file system environment, and present a failure detection technique that can solve this problem.
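
A minimal sketch of one common way to detect the split-brain condition described above (heartbeats plus a majority quorum; the timeout and the quorum rule are assumptions, not necessarily SANique's actual mechanism): a node keeps serving the shared storage only while it can still see a strict majority of the cluster.

```python
import time

HEARTBEAT_TIMEOUT = 3.0   # seconds without a heartbeat before a peer is suspected (assumed)

class MembershipMonitor:
    """Toy heartbeat monitor: a node may keep serving shared storage only while it
    can still see a majority of the cluster, which rules out split-brain writes."""
    def __init__(self, node_id, peers):
        self.node_id = node_id
        self.peers = peers
        self.last_seen = {p: time.monotonic() for p in peers}

    def heartbeat(self, peer):
        self.last_seen[peer] = time.monotonic()

    def alive_peers(self):
        now = time.monotonic()
        return [p for p, t in self.last_seen.items() if now - t < HEARTBEAT_TIMEOUT]

    def in_majority_partition(self):
        cluster_size = len(self.peers) + 1            # peers plus this node
        visible = len(self.alive_peers()) + 1
        return visible > cluster_size // 2            # strict majority avoids two active halves

m = MembershipMonitor("n1", peers=["n2", "n3", "n4", "n5"])
m.heartbeat("n2")
print(m.in_majority_partition())   # True only while this node still sees a majority
```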

Parallel Processing of k-Means Clustering Algorithm for Unsupervised Classification of Large Satellite Images: A Hybrid Method Using Multicores and a PC-Cluster (대용량 위성영상의 무감독 분류를 위한 k-Means Clustering 알고리즘의 병렬처리: 다중코어와 PC-Cluster를 이용한 Hybrid 방식)

  • Han, Soohee;Song, Jeong Heon
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.37 no.6 / pp.445-452 / 2019
  • In this study, parallel processing codes for the k-means clustering algorithm were developed and implemented on a PC-cluster for unsupervised classification of large satellite images. We implemented intra-node code using the multicores of a CPU (Central Processing Unit) based on OpenMP (Open Multi-Processing), inter-node code using a PC-cluster based on the message passing interface, and hybrid code using both. The PC-cluster consists of one master node and eight slave nodes, and each node is equipped with eight multicores. Two operating systems, Microsoft Windows and Canonical Ubuntu, were installed in the PC-cluster in turn and tested to compare parallel processing performance. Two multispectral satellite images were tested: a medium-capacity LANDSAT 8 OLI (Operational Land Imager) image and a high-capacity Sentinel 2A image. To evaluate the performance of parallel processing, speedup and efficiency were measured. Overall, the speedup was over N/2 and the efficiency was over 0.5. From the comparison of the two operating systems, the Ubuntu system showed two to three times faster performance. To confirm that the results of the sequential and parallel processing coincide with each other, the center value of each band and the number of classified pixels were compared, and the result images were examined by pixel-by-pixel comparison. It was found that care should be taken to avoid false sharing of OpenMP in the intra-node implementation. To process large satellite images on a PC-cluster, the code and hardware should be designed to reduce the performance degradation caused by file I/O. Also, it was found that performance can differ depending on the operating system installed on the PC-cluster.
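
The paper's hybrid code combines OpenMP inside a node with MPI across nodes; as a language-neutral sketch of the intra-node part (image size, band count, cluster count, and iteration count are assumed), the snippet below splits pixels across worker processes and has each worker accumulate its own partial sums and counts, the same private-accumulator pattern that avoids the false sharing mentioned in the abstract.

```python
import numpy as np
from multiprocessing import Pool

K = 5                      # number of clusters (assumed)
BANDS = 4                  # number of spectral bands (assumed)

def partial_kmeans(args):
    """Assign one chunk of pixels and return local sums/counts only
    (each worker keeps private accumulators, the analogue of avoiding
    false sharing on shared counters in the OpenMP version)."""
    pixels, centers = args
    dists = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    sums = np.zeros_like(centers)
    counts = np.zeros(len(centers), dtype=np.int64)
    for k in range(len(centers)):
        mask = labels == k
        sums[k] = pixels[mask].sum(axis=0)
        counts[k] = mask.sum()
    return sums, counts

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((100_000, BANDS))                    # flattened multispectral image
    centers = image[rng.choice(len(image), K, replace=False)]

    with Pool(processes=4) as pool:                         # intra-node workers
        for _ in range(10):                                 # fixed iteration count for the demo
            chunks = np.array_split(image, 4)
            results = pool.map(partial_kmeans, [(c, centers) for c in chunks])
            sums = sum(r[0] for r in results)               # reduce the private accumulators
            counts = sum(r[1] for r in results)
            centers = sums / np.maximum(counts, 1)[:, None]
    print(centers)
```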