
Study of Optimization through Performance Analysis of Parallel Distributed Filesystem (병렬 분산파일시스템의 성능 분석을 통한 최적화 연구)

  • Yoon, JunWeon;Song, Ui-Sung
    • Journal of Digital Contents Society, v.17 no.5, pp.409-416, 2016
  • Recently, Big Data has become a buzzword, and universities, industries, and research institutes have been making efforts to collect and analyze a wide variety of data. This includes data accumulated in the past which, even if it cannot be analyzed immediately, has potential value. Valuable results can be obtained from large collections of data through semantic analysis. As a result, demand for high-performance storage systems that can handle large amounts of data is increasing around the world. Such a system must also provide a stable parallel distributed file system that allows multiple users to run a variety of analyses on the accumulated data at the same time. In this study, we identify the I/O bandwidth and metadata performance of the storage system that must be considered to provide a stable file system, and propose a method for configuring an optimal environment.
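
The abstract names two quantities to examine, I/O bandwidth and metadata performance, without describing how they were measured. As a rough illustration only, the sketch below times a sequential write (with fsync, so the page cache does not inflate the figure) and a burst of file creates and deletes. The function names and default sizes are hypothetical; production parallel file system evaluations usually rely on dedicated tools such as IOR and mdtest, run from many clients at once.

```python
import os
import tempfile
import time


def write_bandwidth_mib_s(path: str, total_mib: int = 8, block_kib: int = 1024) -> float:
    """Sequential write bandwidth in MiB/s.

    The fsync forces data to stable storage, so the page cache does not
    make the storage look faster than it is.
    """
    block = os.urandom(block_kib * 1024)
    n_blocks = max(1, (total_mib * 1024) // block_kib)
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(n_blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())
    elapsed = time.perf_counter() - start
    written_mib = n_blocks * block_kib / 1024
    return written_mib / elapsed


def metadata_ops_per_s(directory: str, n_files: int = 500) -> float:
    """Metadata operations per second: create and then unlink n_files empty files."""
    names = [os.path.join(directory, f"meta_{i}") for i in range(n_files)]
    start = time.perf_counter()
    for name in names:
        with open(name, "wb"):
            pass  # create (and close) an empty file: one metadata operation
    for name in names:
        os.unlink(name)  # remove it: a second metadata operation
    elapsed = time.perf_counter() - start
    return (2 * n_files) / elapsed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(f"write: {write_bandwidth_mib_s(os.path.join(d, 'bw.dat')):.1f} MiB/s")
        print(f"metadata: {metadata_ops_per_s(d):.0f} ops/s")
```

Pointing the temporary directory at a mount of the file system under test is what makes the numbers meaningful; on a local disk they only show the method.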

Distributed In-Memory Caching Method for ML Workload in Kubernetes (쿠버네티스에서 ML 워크로드를 위한 분산 인-메모리 캐싱 방법)

  • Dong-Hyeon Youn;Seokil Song
    • Journal of Platform Technology, v.11 no.4, pp.71-79, 2023
  • In this paper, we analyze the characteristics of machine learning workloads and, based on them, propose a distributed in-memory caching technique to improve their performance. The core of a machine learning workload is model training, which is computationally intensive. Running machine learning workloads in a Kubernetes-based cloud environment, where the computing framework and storage are separated, allows resources to be allocated effectively, but delays can occur because I/O must be performed over the network. The proposed distributed in-memory caching technique targets workloads running in such an environment. In particular, we propose a new method that precaches the data required by machine learning workloads into the distributed in-memory cache by taking into account Kubeflow Pipelines, a Kubernetes-based machine learning pipeline management tool.

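
The precaching idea can be sketched as follows, assuming each pipeline step declares the dataset keys it reads, so the cache can load the next step's inputs before that step starts. `PipelineStep`, `RemoteStore`, `InMemoryCache`, and `run_pipeline` are hypothetical names for illustration, not part of the Kubeflow Pipelines SDK, and a single-process LRU cache stands in for the distributed cache.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class PipelineStep:
    # Hypothetical stand-in for a pipeline component: a name plus the dataset keys it reads.
    name: str
    inputs: list[str]


class RemoteStore:
    """Simulated network-attached storage that counts remote reads."""

    def __init__(self, objects: dict):
        self.objects = objects
        self.reads = 0

    def get(self, key: str) -> bytes:
        self.reads += 1
        return self.objects[key]


class InMemoryCache:
    """LRU cache in front of the remote store."""

    def __init__(self, store: RemoteStore, capacity: int):
        self.store = store
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def _insert(self, key: str, value: bytes) -> None:
        self.entries[key] = value
        self.entries.move_to_end(key)
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry

    def prefetch(self, key: str) -> None:
        if key not in self.entries:
            self._insert(key, self.store.get(key))

    def get(self, key: str) -> bytes:
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)
            return self.entries[key]
        self.misses += 1  # the step stalls on a network read
        value = self.store.get(key)
        self._insert(key, value)
        return value


def run_pipeline(steps: list, cache: InMemoryCache, lookahead: bool = True) -> None:
    if lookahead and steps:
        for key in steps[0].inputs:
            cache.prefetch(key)
    for i, step in enumerate(steps):
        if lookahead and i + 1 < len(steps):
            # Load the next step's inputs; a real system would overlap this with the current step.
            for key in steps[i + 1].inputs:
                cache.prefetch(key)
        for key in step.inputs:
            cache.get(key)
```

Both modes perform the same number of remote reads; the difference is whether those reads land on the training step's critical path (misses) or happen ahead of time.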

EPICS Based Vacuum Chamber Temperature Control System for PAL Storage Ring (가속기 저장링 진공 챔버 온도측정용 제어시스템 개발 및 EPICS 적용)

  • Yoon, J.C.;Choi, J.Y.;Kang, H.S.
    • Proceedings of the KIEE Conference, 2005.07d, pp.2652-2654, 2005
  • The vacuum chamber temperature control system of the Pohang Accelerator Laboratory (PAL) storage ring is a subsystem of the upgraded PAL control system, which is based on the Experimental Physics and Industrial Control System (EPICS) [1]. It has two control components at the PAL storage ring: the data acquisition system (SA120 data logger) and the control system IOC (Input/Output Controller). The storage ring has 240 vacuum chambers, and monitoring the temperatures of so many chambers distributed around the ring was a very important problem to solve. The IOC connects the MODBUS/JBUS field network to asynchronous serial ports for communication with serial devices, and it can control up to four data acquisition systems simultaneously. Upon receiving a command over the network from an IOC running under Windows 2000, the IOC communicates with the SA120 through the slave serial interface ports. We added some software components on top of the EPICS toolkit. This paper describes the development of the vacuum chamber temperature control system and discusses its design.

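
To make the serial link concrete, the sketch below builds a Modbus RTU "Read Holding Registers" (function 0x03) request with its CRC-16, assuming the MODBUS/JBUS link uses standard Modbus RTU framing. The abstract does not give the SA120's register map, so the slave address and register range in the usage line are hypothetical; only the frame layout and CRC follow the Modbus RTU rules.

```python
import struct


def crc16_modbus(data: bytes) -> int:
    """CRC-16/MODBUS: reflected polynomial 0xA001, initial value 0xFFFF, no final XOR."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc


def read_holding_registers_frame(slave: int, start: int, count: int) -> bytes:
    """Modbus RTU request: address, function 0x03, start register, register count, CRC."""
    body = struct.pack(">BBHH", slave, 0x03, start, count)  # fields are big-endian
    return body + struct.pack("<H", crc16_modbus(body))     # CRC is sent low byte first


# Hypothetical example: poll 10 registers starting at 0 from data logger address 1.
frame = read_holding_registers_frame(slave=1, start=0, count=10)
print(frame.hex(" "))
```

A receiver can validate a frame by running the same CRC over the whole frame, CRC bytes included; a correct frame yields 0.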

Securing Sensitive Data in Cloud Storage (클라우드 스토리지에서의 중요데이터 보호)

  • Lee, Shir-Ly;Lee, Hoon-Jae
    • Proceedings of the Korea Information Processing Society Conference, 2011.04a, pp.871-874, 2011
  • The rapid emergence of network technology and the high demand for computing resources have prompted many organizations to outsource their storage and computing needs. Cloud-based storage services such as Microsoft's Azure and Amazon's S3 allow customers to store and retrieve any amount of data, at any time, from anywhere via the Internet. The scalability and dynamism of cloud storage services help customers reduce IT administration and maintenance costs. Cloud-based storage services undoubtedly bring many benefits to their customers, significantly reducing costs through optimization and increased operating and economic efficiency. However, without appropriate security and privacy solutions in place, they could become a major issue for an organization. As data is produced, transferred, and stored in off-premise, multi-tenant cloud storage, it becomes vulnerable to unauthorized disclosure and unauthorized modification. An attacker may be able to modify data while it is in flight or while it is stored on disk, so it is very important to secure data throughout its entire life cycle. Traditional cryptographic primitives for data security cannot be adopted directly, for three reasons: first, users lose control of their data on off-premise cloud servers; second, cloud storage is not just a third-party data warehouse, since data stored in the cloud is frequently updated by users; and third, cloud computing runs in a simultaneous, cooperative, and distributed manner. Our proposed mechanism protects the integrity, authentication, and confidentiality of cloud-based data with an encrypt-then-upload approach, using a modified proxy re-encryption protocol.
The whole process never reveals plaintext data to any third party, including the cloud provider, at any stage. This ensures that only authorized users who own the corresponding token can access the data, and it prevents data from being shared without the data owner's permission. Besides preventing cloud storage providers from gaining unauthorized access and from illegally authorizing access to the data, our scheme also protects data integrity by using a hash function.
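
The hash-based integrity part of the scheme can be illustrated in isolation. The sketch below assumes the data is already encrypted on the client (encryption and proxy re-encryption are out of scope here and treated as opaque bytes) and that the owner keeps a SHA-256 digest locally, so tampering by the provider is detected on download. `UntrustedCloudStore`, `upload`, and `download_verified` are hypothetical names, not the paper's protocol.

```python
import hashlib
import hmac


class UntrustedCloudStore:
    """Simulated cloud storage; the provider can read or modify stored objects."""

    def __init__(self):
        self.objects = {}

    def put(self, key: str, blob: bytes) -> None:
        self.objects[key] = blob

    def get(self, key: str) -> bytes:
        return self.objects[key]


def upload(store: UntrustedCloudStore, key: str, ciphertext: bytes) -> bytes:
    """Encrypt-then-upload: store ciphertext only and return the digest the owner keeps."""
    store.put(key, ciphertext)
    return hashlib.sha256(ciphertext).digest()


def download_verified(store: UntrustedCloudStore, key: str, expected_digest: bytes) -> bytes:
    """Fetch the ciphertext and reject it if it no longer matches the owner's digest."""
    blob = store.get(key)
    # compare_digest avoids leaking the mismatch position through timing.
    if not hmac.compare_digest(hashlib.sha256(blob).digest(), expected_digest):
        raise ValueError(f"integrity check failed for {key!r}")
    return blob
```

Because the digest never leaves the owner's side, the provider cannot alter the object and a matching digest together.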

An Efficient Implementation of Mobile Raspberry Pi Hadoop Clusters for Robust and Augmented Computing Performance

  • Srinivasan, Kathiravan;Chang, Chuan-Yu;Huang, Chao-Hsi;Chang, Min-Hao;Sharma, Anant;Ankur, Avinash
    • Journal of Information Processing Systems, v.14 no.4, pp.989-1009, 2018
  • Rapid advances in science and technology, together with the exponential growth of smart mobile devices, workstations, supercomputers, smart gadgets, and network servers, have been witnessed over the past few years. The sudden increase in the Internet population and the manifold growth in Internet speeds have led to the generation of an enormous amount of data, now termed 'big data'. In this scenario, storing data on local servers or personal computers is a problem, which can be resolved by cloud computing. Several cloud computing service providers are currently available to address big data issues. This paper establishes a framework that builds Hadoop clusters on the new single-board computer (SBC), the Mobile Raspberry Pi. These clusters offer facilities for both storage and computing. Regular data centers not only require large amounts of energy for operation but also need cooling equipment and occupy prime real estate. These energy consumption and physical space constraints can be addressed by Mobile Raspberry Pi Hadoop clusters, which provide a cost-effective, low-power, high-speed solution along with micro-data center support for big data. Hadoop provides the modules required for distributed processing of big data using the MapReduce programming model. In this work, the performance of SBC clusters was compared with that of a single computer. The experimental data show that the SBC clusters outperform a single computer by around 20%. Furthermore, the cluster's processing speed for large volumes of data can be increased by adding SBC nodes. Data storage is handled by the Hadoop Distributed File System (HDFS), which offers more flexibility and greater scalability than a single computer system.
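
For readers unfamiliar with the MapReduce model the clusters run, the sketch below simulates its three phases (map, shuffle, reduce) for a word count in a single Python process. It is not the Hadoop API; the hash-based partitioning only mirrors the idea behind Hadoop's default partitioner, and in a real cluster each partition would be reduced on a different node.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line: str):
    """Map: emit (word, 1) for every word in an input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs, num_reducers: int):
    """Shuffle: route each key to one reducer partition and group its values."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        partitions[hash(key) % num_reducers][key].append(value)
    return partitions


def reduce_phase(key: str, values: list) -> tuple:
    """Reduce: combine all values collected for one key."""
    return key, sum(values)


def word_count(lines, num_reducers: int = 2) -> dict:
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    result = {}
    for partition in shuffle(mapped, num_reducers):
        for key, values in partition.items():
            k, v = reduce_phase(key, values)
            result[k] = v
    return result


print(word_count(["the cat", "The dog the end"]))
```

Adding nodes in this model means adding reducer partitions (and map tasks), which is the scaling lever the abstract refers to when it increases the number of SBC nodes.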

Dynamic Cluster Management of Hadoop Distributed Filesystem (하둡 분산 파일시스템의 동적 클러스터 관리 기법)

  • Ryu, Wooseok
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference, 2016.10a, pp.435-437, 2016
  • The Hadoop Distributed File System (HDFS) is a file system for distributed processing of big data that replicates data across distributed data nodes. An HDFS cluster scales to thousands of nodes, but it assumes a dedicated cluster with numerous nodes for big data processing. The various systems that offices use for everyday operational work are rarely considered part of a cluster. This paper discusses this problem and proposes a dynamic cluster management technique to increase the storage capacity and analytic performance of a Hadoop cluster. The proposed technique can add legacy systems to the cluster and remove them from it dynamically, depending on their availability.

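
The add/remove behaviour can be pictured with a toy model of HDFS-style block replication, assuming the cluster periodically learns which office machines are currently available. `DynamicCluster` and its methods are hypothetical; real HDFS decommissioning copies blocks off a node before it leaves, whereas this sketch simply re-replicates from the surviving replicas after removal.

```python
class DynamicCluster:
    """Toy model of block placement while nodes join and leave the cluster."""

    def __init__(self, replication: int = 3):
        self.replication = replication
        self.nodes: set = set()
        self.block_locations: dict = {}  # block id -> set of node names holding a replica

    def _repair(self) -> None:
        """Bring every block back up to the replication factor where nodes allow."""
        for locations in self.block_locations.values():
            candidates = sorted(self.nodes - locations)
            while len(locations) < self.replication and candidates:
                locations.add(candidates.pop(0))

    def add_node(self, node: str) -> None:
        self.nodes.add(node)
        self._repair()  # under-replicated blocks can now use the new node

    def remove_node(self, node: str) -> None:
        self.nodes.discard(node)
        for locations in self.block_locations.values():
            locations.discard(node)
        self._repair()  # re-replicate the removed node's blocks elsewhere

    def put_block(self, block_id: str) -> None:
        self.block_locations[block_id] = set()
        self._repair()

    def sync_availability(self, available) -> None:
        """Add machines that became available and remove those that did not."""
        available = set(available)
        for node in sorted(available - self.nodes):
            self.add_node(node)
        for node in sorted(self.nodes - available):
            self.remove_node(node)
```

Calling `sync_availability` on a schedule (for example, after office hours) is one plausible way to realise "depending on their availability"; the paper's actual trigger is not described in the abstract.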

A Group based Privacy-preserving Data Perturbation Technique in Distributed OSN (분산 OSN 환경에서 프라이버시 보호를 위한 그룹 기반의 데이터 퍼튜베이션 기법)

  • Lee, Joohyoung;Park, Seog
    • KIISE Transactions on Computing Practices, v.22 no.12, pp.675-680, 2016
  • The development of various mobile devices and mobile platform technologies has led to a steady increase in the number of online social network (OSN) users. OSN users can freely communicate and share information through activities such as social networking, but this raises new user privacy issues. Various distributed OSN architectures have been introduced to address user privacy concerns; however, they do not give users technically complete control over their data. In this study, OSN users retain control over their data by using personal data storage (PDS). We propose a technique to improve data privacy protection that groups the user's friends and generates and provides fake text data based on the user's real text data. The fake text data is generated according to the sensitivity values of the user's words, so that the user's friends receive differentiated data. As a result, we propose a system architecture that addresses the tradeoff between service utility and user privacy in OSNs.
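
One way to picture sensitivity-based perturbation is shown below, assuming each word carries a sensitivity score in [0, 1] and each friend group has a threshold: words more sensitive than the group's threshold are replaced by decoy words before the text is shared with that group. The function name, the scores, the thresholds, and the decoy list are all hypothetical; the abstract does not specify how the fake text is generated.

```python
import random


def perturb_for_group(text: str, sensitivity: dict, group_threshold: float,
                      decoys: list, seed: int = 0) -> str:
    """Replace words whose sensitivity exceeds the group's threshold with decoys.

    A higher threshold means a more trusted group, which sees more of the real text.
    """
    rng = random.Random(seed)  # seeded so each group's fake text is reproducible
    out = []
    for word in text.split():
        if sensitivity.get(word.lower(), 0.0) > group_threshold:
            out.append(rng.choice(decoys))
        else:
            out.append(word)
    return " ".join(out)


scores = {"hospital": 0.9, "seoul": 0.5}
post = "Visited hospital in Seoul today"
print(perturb_for_group(post, scores, group_threshold=1.0, decoys=["park"]))  # close friends
print(perturb_for_group(post, scores, group_threshold=0.4, decoys=["park"]))  # acquaintances
```

The same post thus yields different versions per group, which is the "differentiated data" trade-off between utility for close friends and privacy toward distant ones.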

A Data Centric Storage based on Adaptive Local Trajectory for Sensor Networks (센서네트워크를 위한 적응적 지역 트라젝토리 기반의 데이터 저장소 기법)

  • Lim, Hwa-Jung;Lee, Joa-Hyoung;Yang, Dong-Il;Tscha, Yeong-Hwan;Lee, Heon-Guil
    • The KIPS Transactions:PartC, v.15C no.1, pp.19-30, 2008
  • In the data-centric storage method for sensor networks, sensor nodes are used as storage space. A sensor node saves data to the node determined by a hash table, and users access that node through the same hash table to retrieve the data. One problem with data-centric storage is that queries from many users interested in popular data can be concentrated on a single node. In this case, query responses can be delayed and the energy of the heavily loaded node can be depleted quickly, which shortens the network lifetime. In this paper, we propose ALT (Data Centric Storage based on Adaptive Local Trajectory), a scalable data-centric storage method for sensor networks. ALT constructs a trajectory around the storage node, and the scope of the trajectory is increased or decreased according to query frequency. ALT distributes the query processing load across several nodes, so response delay is reduced and energy dissipation is also distributed.
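
The two mechanisms in the abstract, hashing an event name to its storage node and widening or narrowing the serving region with query frequency, can be sketched as below. The sketch simplifies the sensor field to a ring of node IDs instead of geographic positions, and `home_node`, `AdaptiveScope`, and the `high`/`low` thresholds are hypothetical illustration choices rather than ALT's actual parameters.

```python
import hashlib


def home_node(key: str, num_nodes: int) -> int:
    """Data-centric storage: hash the event name to the ID of the node that stores it."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_nodes


class AdaptiveScope:
    """Per-key trajectory radius that grows or shrinks with query frequency.

    Nodes are modelled as a ring of IDs 0..num_nodes-1 (a simplification of a
    geographic neighbourhood); num_nodes should exceed 2 * max_radius + 1.
    """

    def __init__(self, num_nodes: int, high: int = 10, low: int = 2, max_radius: int = 3):
        self.num_nodes = num_nodes
        self.high, self.low, self.max_radius = high, low, max_radius
        self.radius = {}         # key -> current radius in hops
        self.window_counts = {}  # key -> queries seen in the current window

    def nodes_for(self, key: str) -> list:
        """Nodes that hold the key and can share its query load."""
        home = home_node(key, self.num_nodes)
        r = self.radius.get(key, 0)
        return [(home + offset) % self.num_nodes for offset in range(-r, r + 1)]

    def record_query(self, key: str) -> None:
        self.window_counts[key] = self.window_counts.get(key, 0) + 1

    def end_window(self) -> None:
        """Widen the trajectory for hot keys and narrow it for cold ones."""
        for key in set(self.radius) | set(self.window_counts):
            count = self.window_counts.get(key, 0)
            r = self.radius.get(key, 0)
            if count > self.high and r < self.max_radius:
                self.radius[key] = r + 1
            elif count < self.low and r > 0:
                self.radius[key] = r - 1
        self.window_counts.clear()
```

Shrinking the radius when demand drops keeps replication overhead, and therefore energy use, proportional to how popular the data currently is.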

Non-Disruptive Server Management for Sustainable Resource Service Based on On-Premise (온-프레미스 기반 지속적인 자원 서비스를 위한 서버 무중단 기법)

  • Kim, Hyun-Woo
    • KIPS Transactions on Computer and Communication Systems, v.7 no.12, pp.295-300, 2018
  • With the rapid development of IT, many conventional manual jobs have been automated. This automation increases people's leisure time, and various services are being developed for them. In addition, with the advent of compact, portable smart devices, various Internet services can be used regardless of time and place. Various virtualization-based studies are under way to efficiently store and process the large volumes of data generated by these devices and services. Among them, Desktop Storage Virtualization (DSV), which integrates on-premise distributed desktop resources and provides them to users, uses virtualization to consolidate unused resources within distributed legacy desktops. It is very important for DSV to provide high reliability to users. Research on hierarchical structures and resource integration for efficient distributed data storage processing in distributed desktop-based resource integration environments is also under way. However, there has been little research on efficient operation when a server fails in an on-premise resource integration environment. In this paper, we propose Non-disruptive Server Management (NSM), which can actively cope with desktop server failures in an on-premise distributed desktop storage environment. NSM makes it easy to add and remove desktops in a desktop-based integrated environment. In addition, when a failure occurs, an alternative server actively takes over.
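
A common way to realise "an alternative server takes over" is heartbeat-based failover, sketched below under that assumption; the abstract does not say how NSM detects failures. `ServerManager` is a hypothetical name, and timestamps are passed in explicitly rather than read from a clock so the behaviour is deterministic.

```python
class ServerManager:
    """Heartbeat-based failover among desktop servers (illustrative only)."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.last_seen = {}  # server name -> time of last heartbeat (insertion order = priority)
        self.active = None

    def add_server(self, name: str, now: float) -> None:
        self.last_seen[name] = now
        if self.active is None:
            self.active = name

    def remove_server(self, name: str) -> None:
        self.last_seen.pop(name, None)
        if self.active == name:
            self.active = None  # the next check() promotes a live standby

    def heartbeat(self, name: str, now: float) -> None:
        if name in self.last_seen:
            self.last_seen[name] = now

    def check(self, now: float):
        """Promote the first live standby if the active server missed its heartbeats."""
        alive = [s for s, t in self.last_seen.items() if now - t <= self.timeout]
        if self.active not in alive:
            self.active = alive[0] if alive else None
        return self.active
```

Desktops can be added or removed at any time through `add_server` and `remove_server`, matching the abstract's point that NSM makes membership changes easy.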

Design and Implementation of Multiple Filter Distributed Deduplication System Applying Cuckoo Filter Similarity (쿠쿠 필터 유사도를 적용한 다중 필터 분산 중복 제거 시스템 설계 및 구현)

  • Kim, Yeong-A;Kim, Gea-Hee;Kim, Hyun-Ju;Kim, Chang-Geun
    • Journal of Convergence for Information Technology, v.10 no.10, pp.1-8, 2020
  • In recent years, technologies based on data generated by enterprises' business activities have emerged as a key to business success, creating a need for techniques to store, manage, and retrieve alternative data. Existing big data platform systems must load large amounts of data generated in real time without delay in order to process unstructured data, a type of alternative data, and must manage storage space efficiently by applying deduplication across different storage systems when redundant data occurs. In this paper, we propose a multi-layer distributed data deduplication system that uses the similarity of the Cuckoo hashing filter technique, taking the characteristics of big data into account. Similarity between virtual machines is applied through Cuckoo hashing so that individual storage nodes can improve performance through efficient deduplication, and a multi-layer Cuckoo filter is applied to reduce processing time. Experimental results show that the proposed method shortens the processing time by 8.9% and increases the deduplication rate by 10.3%.
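
The building block here is the cuckoo filter, so a single-layer version is sketched below to show how it answers "have I stored this chunk before?". It uses standard partial-key cuckoo hashing: each item stores a short fingerprint in one of two buckets, and the alternate bucket is computed from the current bucket and the fingerprint alone. The multi-layer arrangement and VM-similarity logic from the paper are not reproduced; `CuckooFilter`, `deduplicate`, and the parameter defaults are illustrative choices.

```python
import hashlib
import random


class CuckooFilter:
    """Approximate set membership via partial-key cuckoo hashing.

    num_buckets must be a power of two so that alt(alt(i, fp), fp) == i.
    """

    def __init__(self, num_buckets=1024, bucket_size=4, fp_bits=16, max_kicks=500, seed=0):
        assert num_buckets & (num_buckets - 1) == 0, "num_buckets must be a power of two"
        self.mask = num_buckets - 1
        self.bucket_size = bucket_size
        self.fp_mask = (1 << fp_bits) - 1
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self.rng = random.Random(seed)

    @staticmethod
    def _hash(data: bytes) -> int:
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def _alt_index(self, index: int, fp: int) -> int:
        return (index ^ self._hash(fp.to_bytes(4, "big"))) & self.mask

    def _locate(self, item: bytes):
        fp = (self._hash(b"fp" + item) & self.fp_mask) or 1  # never use 0 as a fingerprint
        i1 = self._hash(item) & self.mask
        return fp, i1, self._alt_index(i1, fp)

    def insert(self, item: bytes) -> bool:
        fp, i1, i2 = self._locate(item)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        i = self.rng.choice((i1, i2))
        for _ in range(self.max_kicks):
            # Both buckets full: evict a random fingerprint and move it to its alternate bucket.
            j = self.rng.randrange(len(self.buckets[i]))
            fp, self.buckets[i][j] = self.buckets[i][j], fp
            i = self._alt_index(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        return False  # treated as full; this sketch drops the last evicted fingerprint

    def contains(self, item: bytes) -> bool:
        fp, i1, i2 = self._locate(item)
        return fp in self.buckets[i1] or fp in self.buckets[i2]


def deduplicate(chunks, cf: CuckooFilter) -> list:
    """Keep only chunks the filter has not seen; a hit may be a false positive."""
    stored = []
    for chunk in chunks:
        if cf.contains(chunk):
            continue
        cf.insert(chunk)
        stored.append(chunk)
    return stored
```

Because a filter hit can be a false positive, a production deduplicator would confirm a hit against a full content hash before discarding the chunk; unlike a Bloom filter, a cuckoo filter also allows fingerprints to be deleted when data is removed.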