• Title/Summary/Keyword: Large-Capacity Storage (대용량 스토리지)

Design of System for Avoiding upload of Identical-file using SA Hash Algorithm (SA 해쉬 알고리즘을 이용한 중복파일 업로드 방지 시스템 설계)

  • Hwang, Sung-Min; Kim, Seog-Gyu
    • Journal of the Korea Society of Computer and Information / v.19 no.10 / pp.81-89 / 2014
  • In this paper, we propose the SA hash algorithm to avoid uploading identical files, and we design a server system using it. A client that wants to upload a file first computes the file's SA hash value; if the same file is found on the server, the client uses the existing file without uploading. The SA hash algorithm, which detects identical files, divides the original file into n-bit blocks and folds them into an n-bit hash: bit i of the original file is XORed into bit (i mod n) of the output hash value, and repeating this XOR until the end of the original file is the algorithm's main routine. Using the SA hash algorithm, which is more efficient than MD5, SHA-1, and SHA-2, we design a server system that avoids uploading identical files, saving storage capacity and upload time.
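
The abstract describes the digest as an XOR fold of the file into an n-bit value. The following is a minimal sketch of that reading in Python; the function name, parameter names, and the 128-bit default are our own assumptions, not the paper's.

    def sa_hash(data: bytes, n_bits: int = 128) -> int:
        """Fold the input into an n-bit digest: bit i of the file is
        XORed into bit (i % n_bits) of the digest (XOR folding)."""
        digest = 0
        for i, byte in enumerate(data):
            for b in range(8):
                bit = (byte >> b) & 1
                digest ^= bit << ((i * 8 + b) % n_bits)
        return digest

    # Client-side duplicate check against a hypothetical server API:
    # if server.has_hash(sa_hash(file_bytes)): reuse the stored file, skip upload.

Such a fold is far cheaper than a cryptographic hash but also far weaker against collisions, which matches the paper's framing of efficiency rather than security.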

A Design and Implementation for a Reliable Data Storage in a Digital Tachograph (디지털 자동차운행기록계에서 안정적인 데이터 저장을 위한 설계 및 구현)

  • Baek, Sung Hoon; Son, Myunghee
    • KIPS Transactions on Computer and Communication Systems / v.1 no.2 / pp.71-78 / 2012
  • The digital tachograph is a device that automatically records the speed and distance of a vehicle, together with the driver's activity and the vehicle's status at an accident. It records vehicle speed, brake status, acceleration, engine RPM, GPS longitude and latitude, accumulated distance, and so on. A European Commission regulation made digital tachographs mandatory for all trucks from 2005. The Republic of Korea made them mandatory for all new business vehicles from 2011 and is widening the range of vehicles that must install them year by year. The device is used to analyze a driver's daily driving information and car accidents. In a car accident, where device reliability becomes unpredictable, it is very important for the device's original mission that driving information be stored with maximum reliability. We designed and implemented a practical digital tachograph. This paper presents a storage scheme that combines a first storage device of small capacity and high reliability with a second storage device of large capacity and low cost, in order to record data reliably on low-cost hardware. The first storage device records data in an SLC NAND flash memory in a log-structured style. We present a reverse partial scan that overcomes the slow boot-stage scan time of log-structured storage, reducing the scan time of the first storage device to 1/50. In addition, our design includes a scheme that stores data at the moment of an accident in 1/20 of the data transfer time of the normal method.
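
The reverse partial scan can be illustrated with a small sketch (our own simplification under an assumed record layout, not the paper's implementation): instead of replaying the log forward from the head at boot, recovery walks backward from the tail and stops as soon as it has collected the newest valid records it needs.

    RECORD_SIZE = 64  # assumed fixed-size log record

    def reverse_partial_scan(log: bytes, needed: int) -> list[bytes]:
        """Walk the log backward from the tail, collecting the most recent
        `needed` valid records, instead of scanning the whole log forward."""
        records = []
        offset = (len(log) // RECORD_SIZE) * RECORD_SIZE
        while offset > 0 and len(records) < needed:
            offset -= RECORD_SIZE
            rec = log[offset:offset + RECORD_SIZE]
            if rec[0] == 1:  # first byte as a valid flag (assumed layout)
                records.append(rec)
        return records

Because boot-time recovery usually needs only the most recent state, stopping early in this way is what allows the large reduction in scan time reported above.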

An Efficient Method for Estimating Optimal Path of Secondary Variable Calculation on CFD Applications (전산유체역학 응용에서의 효율적인 최적 2차 변수 계산 경로 추정 기법)

  • Lee, Joong-Youn; Kim, Min Ah; Hur, Youngju
    • The Journal of the Korea Contents Association / v.16 no.12 / pp.1-9 / 2016
  • Computational Fluid Dynamics (CFD) is a branch of fluid mechanics that uses computers to solve the partial differential equations representing fluid flows as a set of algebraic equations. Although such simulations involve multifarious variables, only selected ones are stored because of limited storage capacity, which makes it necessary to calculate secondary variables at analysis time. In this paper, we suggest an efficient method for estimating optimal calculation paths for secondary variables. First, we suggest a technique for converting a dependency graph into an ordinary directed graph. We also suggest a technique for finding the shortest path from any initial variables to the target variables. We applied our method to a data analysis and visualization tool to evaluate its efficiency.
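
A minimal sketch of the shortest-path step in Python (the graph, variable names, and costs below are invented for illustration): stored variables and derivable variables are nodes, an edge carries the cost of one derivation, and Dijkstra's algorithm returns the cheapest calculation path to the target variable.

    import heapq

    def cheapest_cost(edges: dict, start: str, target: str) -> float:
        """Dijkstra over a variable-derivation graph.
        edges maps a variable to a list of (derived_variable, cost) pairs."""
        dist = {start: 0.0}
        queue = [(0.0, start)]
        while queue:
            d, v = heapq.heappop(queue)
            if v == target:
                return d
            if d > dist.get(v, float("inf")):
                continue
            for w, cost in edges.get(v, []):
                nd = d + cost
                if nd < dist.get(w, float("inf")):
                    dist[w] = nd
                    heapq.heappush(queue, (nd, w))
        return float("inf")

    # Example: vorticity derived from a stored velocity field (made-up costs).
    edges = {"velocity": [("gradient", 3.0)], "gradient": [("vorticity", 1.0)]}
    print(cheapest_cost(edges, "velocity", "vorticity"))  # 4.0

Multiple stored initial variables can be handled by adding a zero-cost virtual source node connected to each of them.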

Implications for Memory Reference Analysis and System Design to Execute AI Workloads in Personal Mobile Environments (개인용 모바일 환경의 AI 워크로드 수행을 위한 메모리 참조 분석 및 시스템 설계 방안)

  • Seokmin Kwon; Hyokyung Bahn
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.24 no.1 / pp.31-36 / 2024
  • Recently, mobile apps that utilize AI technologies have been increasing. In a personal mobile environment, performance degradation may occur during the training phase of large AI workloads due to limited memory capacity. In this paper, we extract memory reference traces of AI workloads and analyze their characteristics. From this analysis, we observe that AI workloads can cause frequent storage accesses, due to weak temporal locality and irregular popularity bias in their memory write operations, which degrades the performance of mobile devices. Based on this observation, we discuss ways to efficiently manage the memory write operations of AI workloads using persistent memory-based swap devices. Through simulation experiments, we show that the system architecture proposed in this paper can improve the I/O time of mobile systems by more than 80%.
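
As a rough sketch of the kind of trace-driven comparison described above (the LRU memory model, trace, and device latencies below are our own illustrative assumptions, not the paper's measured configuration), one can replay a write trace against a fixed number of page frames and charge each eviction to either conventional storage or a persistent-memory swap device:

    from collections import OrderedDict

    STORAGE_US = 100.0  # assumed per-page swap-out latency, flash storage
    PMEM_US = 1.0       # assumed per-page swap-out latency, persistent memory

    def replay_writes(trace, frames, swap_latency_us):
        """LRU-managed memory of `frames` pages; every eviction pays one
        swap-out (all references here are writes, so pages are dirty)."""
        lru = OrderedDict()
        io_time = 0.0
        for page in trace:
            if page in lru:
                lru.move_to_end(page)
            else:
                if len(lru) >= frames:
                    lru.popitem(last=False)  # evict least-recently-used page
                    io_time += swap_latency_us
                lru[page] = None
        return io_time

    trace = [1, 2, 3, 1, 4, 5, 2, 6] * 100
    print(replay_writes(trace, 4, STORAGE_US))  # swap-outs to storage
    print(replay_writes(trace, 4, PMEM_US))     # swap-outs to persistent memory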

Wear Leveling Technique using Random Selection Method in Flash Storage (플래시 스토리지에서 랜덤 선택 방법을 활용한 마모도 평준화 기법)

  • Jung Kyu Park; Eun Young Park
    • Journal of Internet of Things and Convergence / v.10 no.3 / pp.13-18 / 2024
  • Recently, reliability has become more important as flash-based storage devices are actively used in cloud servers and data centers. Flash memory blocks endure only a limited number of erase/write cycles, so if writes are concentrated in one location, that part of the chip can no longer be used. To solve this problem and improve reliability, the wear of the flash memory chip must be equalized. However, the work required to equalize wear grows in proportion to the ever-increasing capacity; in particular, searching all blocks of a flash memory chip for the block with the maximum/minimum erase count incurs a cost that rises with the capacity of the storage device. In this paper, a random block selection method is applied to solve this problem: instead of examining every block, k blocks are selected at random and the best among them is used. Experimental results confirmed that a k value of 4 or more yields results similar to searching all blocks.
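
A minimal sketch of the random selection idea in Python (function and variable names are ours): rather than scanning every block for the global minimum erase count, sample k blocks and take the best of the sample.

    import random

    def pick_min_erase_block(erase_counts, k=4):
        """Wear-leveling victim selection by random sampling: examine only
        k randomly chosen blocks instead of all of them and return the
        least-erased block among the sample."""
        candidates = random.sample(range(len(erase_counts)), k)
        return min(candidates, key=lambda i: erase_counts[i])

    # Example with 100,000 blocks; cost is O(k) instead of O(#blocks).
    counts = [random.randint(0, 1000) for _ in range(100_000)]
    print(pick_min_erase_block(counts, k=4))

The same sampling works for the maximum-erase-count search by replacing min with max.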

Design of the Flexible Buffer Node Technique to Adjust the Insertion/Search Cost in Historical Index (과거 위치 색인에서 입력/검색 비용 조정을 위한 가변 버퍼 노드 기법 설계)

  • Jung, Young-Jin; Ahn, Bu-Young; Lee, Yang-Koo; Lee, Dong-Gyu; Ryu, Keun-Ho
    • The KIPS Transactions: Part D / v.18D no.4 / pp.225-236 / 2011
  • Various applications of LBS (Location-Based Services) are being developed to provide customized services depending on a user's location, with the progress of wireless communication technology and the miniaturization of personal devices. To process a large amount of vehicle location data effectively, LBS requires techniques for vehicle observation, data communication, data insertion and search, and user query processing. In this paper, we propose a historical location index, GIP-FB (Group Insertion tree with Flexible Buffer Node), and a flexible buffer node technique that adjusts the cost of data insertion and search. The designed GIP+-based index employs a buffer node and projection storage to cut the cost of insertion and search, and it adjusts that cost by changing the number of line segments held in the buffer node over a user-defined time interval. In the experiments, the buffer node size influences the performance of GIP-FB by changing the number of non-leaf nodes of the index. The proposed flexible buffer node can thus be used to tune the performance of the historical location index to the given LBS application, as sketched below.
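
The buffer-node trade-off can be sketched as follows (our own simplification of the idea, not the paper's GIP+ implementation): segments accumulate in a buffer and are bulk-inserted into the index only when the buffer fills, so a larger buffer capacity lowers insertion cost while raising search cost, and vice versa.

    class FlexibleBufferNode:
        """Buffer incoming trajectory line segments and flush them to the
        index (standing in for the GIP+ tree) in one group insertion when
        `capacity` is reached."""

        def __init__(self, capacity, index):
            self.capacity = capacity  # buffered segments before a flush
            self.buffer = []
            self.index = index

        def insert(self, segment):
            self.buffer.append(segment)
            if len(self.buffer) >= self.capacity:
                self.index.extend(self.buffer)  # one bulk (group) insertion
                self.buffer.clear()

        def search(self, predicate):
            # A query must scan both the index and the unflushed buffer.
            return [s for s in self.index + self.buffer if predicate(s)]

Changing `capacity` is the knob the abstract describes: it shifts cost between the insertion path and the search path.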

Analysis of Factors for Korean Women's Cancer Screening through Hadoop-Based Public Medical Information Big Data Analysis (Hadoop기반의 공개의료정보 빅 데이터 분석을 통한 한국여성암 검진 요인분석 서비스)

  • Park, Min-hee; Cho, Young-bok; Kim, So Young; Park, Jong-bae; Park, Jong-hyock
    • Journal of the Korea Institute of Information and Communication Engineering / v.22 no.10 / pp.1277-1286 / 2018
  • In this paper, we build an Apache Hadoop-based cloud environment for analyzing public medical information big data, providing flexible scalability of computing resources in the cloud. In particular, it includes the ability to quickly and flexibly extend storage, memory, and other resources as the data accumulate or grow over time. In addition, when real-time analysis of the accumulated unstructured data is required, the system adopts a Hadoop-based analysis module to overcome the processing limits of existing analysis tools, providing fast and reliable parallel distributed processing of large amounts of data. For the big data analysis, we perform frequency analysis and chi-square tests, followed by multivariate logistic regression at the 0.05 significance level; multivariate logistic regression on the significant variables (p<0.05) was then performed for each of the three models.
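
For the chi-square step, a minimal example with SciPy (the contingency table below is made up for illustration and is not the study's data):

    from scipy.stats import chi2_contingency

    # Rows: screened / not screened; columns: factor present / absent.
    table = [[120, 80],
             [60, 140]]

    chi2, p, dof, expected = chi2_contingency(table)
    print(f"chi2={chi2:.2f}, p={p:.4f}, dof={dof}")
    # Variables with p < 0.05 would then enter the multivariate
    # logistic regression step described above.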

Design and Implementation of MongoDB-based Unstructured Log Processing System over Cloud Computing Environment (클라우드 환경에서 MongoDB 기반의 비정형 로그 처리 시스템 설계 및 구현)

  • Kim, Myoungjin; Han, Seungho; Cui, Yun; Lee, Hanku
    • Journal of Internet Computing and Services / v.14 no.6 / pp.71-84 / 2013
  • Log data, which record the multitude of information created when operating computer systems, are utilized in many processes, from carrying out computer system inspection and process optimization to providing customized user services. In this paper, we propose a MongoDB-based unstructured log processing system in a cloud environment for processing the massive amount of log data of banks. Most of the log data generated during banking operations come from handling a client's business. Therefore, in order to gather, store, categorize, and analyze the log data generated while processing the client's business, a separate log data processing system needs to be established. However, the realization of flexible storage expansion functions for processing a massive amount of unstructured log data and executing a considerable number of functions to categorize and analyze the stored unstructured log data is difficult in existing computing environments. Thus, in this study, we use cloud computing technology to realize a cloud-based log data processing system for processing unstructured log data that are difficult to process using the existing computing infrastructure's analysis tools and management system. The proposed system uses the IaaS (Infrastructure as a Service) cloud environment to provide a flexible expansion of computing resources and includes the ability to flexibly expand resources such as storage space and memory under conditions such as extended storage or a rapid increase in log data. Moreover, to overcome the processing limits of the existing analysis tool when a real-time analysis of the aggregated unstructured log data is required, the proposed system includes a Hadoop-based analysis module for quick and reliable parallel-distributed processing of the massive amount of log data. Furthermore, because the HDFS (Hadoop Distributed File System) stores data by generating copies of the block units of the aggregated log data, the proposed system offers automatic restore functions that let the system continue operating after it recovers from a malfunction. Finally, by establishing a distributed database using the NoSQL-based MongoDB, the proposed system provides methods of effectively processing unstructured log data. Relational databases such as MySQL have complex schemas that are inappropriate for processing unstructured log data. Further, strict schemas like those of relational databases cannot expand nodes in the case wherein the stored data are distributed to various nodes when the amount of data rapidly increases. NoSQL does not provide the complex computations that relational databases may provide but can easily expand the database through node dispersion when the amount of data increases rapidly; it is a non-relational database with an appropriate structure for processing unstructured data. The data models of NoSQL are usually classified as Key-Value, column-oriented, and document-oriented types. Of these, the representative document-oriented data model, MongoDB, which has a free schema structure, is used in the proposed system. MongoDB is introduced to the proposed system because it makes it easy to process unstructured log data through a flexible schema structure, facilitates flexible node expansion when the amount of data rapidly increases, and provides an Auto-Sharding function that automatically expands storage.
The proposed system is composed of a log collector module, a log graph generator module, a MongoDB module, a Hadoop-based analysis module, and a MySQL module. When the log data generated over the entire client business process of each bank are sent to the cloud server, the log collector module collects and classifies the data according to the type of log data and distributes them to the MongoDB module and the MySQL module. The log graph generator module generates the results of the log analysis of the MongoDB module, the Hadoop-based analysis module, and the MySQL module per analysis time and type of the aggregated log data, and provides them to the user through a web interface. Log data that require real-time analysis are stored in the MySQL module and provided in real time by the log graph generator module. The log data aggregated per unit time are stored in the MongoDB module and plotted in a graph according to the user's various analysis conditions. The aggregated log data in the MongoDB module are processed in a parallel-distributed manner by the Hadoop-based analysis module. A comparative evaluation against a log data processing system that uses only MySQL, covering log data insertion and query performance, demonstrates the proposed system's superiority. Moreover, an optimal chunk size is confirmed through a log data insert performance evaluation of MongoDB for various chunk sizes.
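
A minimal sketch of the log collector's MongoDB insertion path (assuming a locally running mongod; the connection string, database, and collection names are our own placeholders, not the paper's):

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    logs = client["bank_logs"]["aggregated"]

    def collect(log_record):
        """Free-schema insert: heterogeneous log records are stored without
        a predefined schema, which is what suits MongoDB to unstructured
        logs; sharding then scales out the write path."""
        logs.insert_one(log_record)

    collect({"type": "transfer", "branch": 17, "elapsed_ms": 42})
    collect({"type": "login", "device": "mobile"})  # different fields are fine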

SSD-based RAID-6 System Architecture for Reliability and Performance Enhancement (신뢰성 향상과 성능개선을 위해 다양한 Erasure 코드를 적용한 SSD 기반 RAID-6 시스템 구조)

  • Song, Jae-Seok; Huh, Joon-Moo; Yang, Yu-Seok; Kim, Deok-Hwan
    • Journal of the Institute of Electronics Engineers of Korea CI / v.47 no.6 / pp.47-56 / 2010
  • HDD-based RAID has been used in high-capacity storage systems for traditional data servers. However, its data reliability is relatively low and it consumes a lot of power, since hard disk drives are vulnerable to shock and their frequent spindle motor operation draws substantial power. Therefore, this paper presents a new SSD-based RAID system architecture using various erasure codes. The proposed method applies the Reed-Solomon, EVENODD, and Liberation coding schemes at both the file system level and the device driver level, and it uses a data allocation method to minimize the side effect of reducing the lifespan of SSDs. Detailed experimental results show that the Liberation code increases the wear-leveling rate of SSD-based RAID-6 more than the other codes. The SSD-based RAID system applying erasure codes at the device driver level shows better performance than the one applying them at the file system level. The I/O performance of the RAID-6 system using SSDs is 4.5%~8.5% higher than that using HDDs, and the power consumption of the RAID system using SSDs is 18%~40% less than that using HDDs.
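
As background for the erasure codes compared above, here is a minimal sketch of standard RAID-6 P/Q parity generation over GF(2^8) (the textbook construction, not the paper's specific implementation):

    def gf_mul(a, b):
        """Multiply in GF(2^8) with polynomial x^8+x^4+x^3+x^2+1 (0x11d),
        the field conventionally used for RAID-6 Q parity."""
        p = 0
        for _ in range(8):
            if b & 1:
                p ^= a
            carry = a & 0x80
            a = (a << 1) & 0xFF
            if carry:
                a ^= 0x1D
            b >>= 1
        return p

    def pq_parity(stripe):
        """Compute P (plain XOR) and Q (generator-weighted XOR) parity
        over one stripe of equally sized data blocks."""
        p = bytearray(len(stripe[0]))
        q = bytearray(len(stripe[0]))
        g = 1  # g walks the generator powers 2^0, 2^1, ...
        for block in stripe:
            for j, byte in enumerate(block):
                p[j] ^= byte
                q[j] ^= gf_mul(g, byte)
            g = gf_mul(g, 2)
        return bytes(p), bytes(q)

    # With P and Q, any two lost blocks in a stripe can be reconstructed.
    print(pq_parity([b"\x01\x02", b"\x03\x04", b"\x05\x06"]))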

Design and Implementation of an Execution-Provenance Based Simulation Data Management Framework for Computational Science Engineering Simulation Platform (계산과학공학 플랫폼을 위한 실행-이력 기반의 시뮬레이션 데이터 관리 프레임워크 설계 및 구현)

  • Ma, Jin; Lee, Sik; Cho, Kum-won; Suh, Young-kyoon
    • Journal of Internet Computing and Services / v.19 no.1 / pp.77-86 / 2018
  • For the past few years, KISTI has been operating an online simulation execution platform, called EDISON, that allows users to conduct simulations of various scientific applications supplied by diverse computational science and engineering disciplines. Typically, these simulations involve large-scale computation and accordingly produce a huge volume of output data. One critical issue arising when conducting such simulations on an online platform stems from the fact that many users simultaneously submit simulation requests (or jobs) with the same (or almost unchanged) input parameters or files, placing a significant burden on the platform. In other words, identical computing jobs lead to duplicate consumption of computing and storage resources at an undesirably fast pace. To overcome the excessive resource usage caused by such identical simulation requests, in this paper we introduce a novel framework, called IceSheet, that efficiently manages simulation data based on execution metadata, that is, provenance. The IceSheet framework captures and stores the provenance associated with each conducted simulation. The collected provenance records are utilized not only for detecting duplicate simulation requests but also for searching existing simulation results via the open-source search engine ElasticSearch. In particular, this paper elaborates on the core components of the IceSheet framework that support search over and reuse of the stored simulation results. We implemented a prototype of the proposed framework using this search engine in conjunction with the online simulation execution platform. Our evaluation of the framework was performed on real simulation execution-provenance records collected on the platform. Once the prototyped IceSheet framework fully functions with the platform, users can quickly search for past parameter values entered into the desired simulation software and, if results for the same input parameter values exist, receive them directly. Therefore, we expect the proposed framework to eliminate duplicate resource consumption and significantly reduce execution time for requests identical to previously executed simulations.
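
The duplicate-detection idea can be sketched with a provenance key (a simplification under our own assumptions; the key layout and function names are not IceSheet's actual schema): canonicalize the inputs that determine a simulation's output, hash them, and reuse the stored result when the key has been seen before.

    import hashlib
    import json

    _results = {}  # provenance key -> location of stored result

    def provenance_key(solver, version, params):
        """Hash a canonical form of the inputs that determine the output,
        so identical requests map to the same key."""
        canon = json.dumps({"solver": solver, "version": version,
                            "params": params}, sort_keys=True)
        return hashlib.sha256(canon.encode()).hexdigest()

    def submit(solver, version, params):
        key = provenance_key(solver, version, params)
        if key in _results:
            return _results[key]             # duplicate: reuse, do not rerun
        result = f"/data/{key[:12]}/output"  # placeholder for a real run
        _results[key] = result
        return result

    print(submit("cfd2d", "1.0", {"mach": 0.8, "aoa": 2.5}))
    print(submit("cfd2d", "1.0", {"aoa": 2.5, "mach": 0.8}))  # same key, reused

In the actual framework the provenance records are indexed in ElasticSearch, so the lookup is a search query rather than an in-memory dictionary.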