• Title/Summary/Keyword: SmallFile


Processing Method of Mass Small File Using Hadoop Platform (하둡 플랫폼을 이용한 대량의 스몰파일 처리방법)

  • Kim, Chang-Bok; Chung, Jae-Pil
    • Journal of Advanced Navigation Technology, v.18 no.4, pp.401-408, 2014
  • Hadoop consists of the MapReduce programming model for distributed processing and the HDFS distributed file system. Hadoop is a suitable framework for big data processing, but processing massive numbers of small files raises problems: one mapper is created per file, and a large amount of memory is needed to store the files' metadata. This paper presents a comparative evaluation of methods for processing massive numbers of small files on the Hadoop platform. Processing files in a general compression format is inadequate because each file is processed by a single mapper regardless of its size. Processing with SequenceFile and Hadoop archive (HAR) files removes the namenode memory problem by compressing and combining the small files, and the Hadoop archive is faster than SequenceFile with respect to small-file combine time. Processing with the CombineFileInputFormat class requires no prior combining of small files and achieves speeds similar to ordinary big data processing (a hedged packing sketch follows this entry).
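
A minimal sketch of the SequenceFile packing approach the abstract evaluates, assuming Hadoop 2.x+; the input directory and HDFS output path are hypothetical arguments, not from the paper:

```java
import java.io.File;
import java.nio.file.Files;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Packs every file in a local directory into one SequenceFile, keyed by
// file name, so the namenode tracks one large file instead of thousands.
public class SmallFilePacker {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path out = new Path(args[1]);            // e.g. hdfs:///packed/small.seq
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(out),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class),
                SequenceFile.Writer.compression(SequenceFile.CompressionType.BLOCK))) {
            for (File f : new File(args[0]).listFiles()) {  // args[0]: a directory
                byte[] body = Files.readAllBytes(f.toPath());
                writer.append(new Text(f.getName()), new BytesWritable(body));
            }
        }
    }
}
```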

A File Merging Scheme for Efficient Handling of Small Files in Hadoop Distributed File System (Hadoop Distribute file system에서 Small file을 효과적으로 처리하기 위한 파일 병합 기법 연구)

  • Park, Jong-Chang; Youn, Hee-Yong
    • Proceedings of the Korea Information Processing Society Conference, 2013.11a, pp.15-17, 2013
  • HDFS (Hadoop Distributed File System) was designed for handling large files and is currently in the spotlight as an ideal distributed file system. HDFS has much in common with existing distributed file systems, but it is distinguished by providing fault tolerance and a streaming data-access pattern, which lets it store large files efficiently. In practice, however, small files account for a considerable share of real HDFS data sets, and such large numbers of small files not only incur high data-processing costs but also degrade the file handling and memory performance of the master node. This paper therefore analyzes the impact of small files on HDFS and proposes a file merging scheme based on a local index file to resolve these problems (a hedged merging sketch follows this entry).
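
The abstract does not give the paper's index format; the sketch below merges small files into one HDFS file while recording hypothetical (offset, length) index entries per original file:

```java
import java.io.File;
import java.nio.file.Files;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Merges local small files into one HDFS file and records (offset, length)
// per original file in an in-memory index (hypothetical format).
public class IndexedMerger {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Map<String, long[]> index = new HashMap<>();   // name -> {offset, length}
        try (FSDataOutputStream out = fs.create(new Path(args[1]))) {
            for (File f : new File(args[0]).listFiles()) {
                byte[] body = Files.readAllBytes(f.toPath());
                index.put(f.getName(), new long[]{out.getPos(), body.length});
                out.write(body);
            }
        }
        // A real implementation would persist the index next to the merged
        // file so readers can seek() directly to any original file.
        index.forEach((n, ol) -> System.out.println(n + " @" + ol[0] + "+" + ol[1]));
    }
}
```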

Co-Writing Multiple Files Based on Directory Locality for High Performance of Small File Writes (디렉토리 지역성을 활용한 작은 파일들의 모아 쓰기 기법)

  • Lee, Kyung-Jae; Ahn, Woo-Hyun; Oh, Jae-Won
    • The KIPS Transactions: Part A, v.15A no.5, pp.275-286, 2008
  • Fast File System (FFS) exploits large disk bandwidth to improve the write performance of large files, for example by writing multiple blocks of a large file in a single disk I/O. The performance of small-file writes, however, is limited not by disk bandwidth but by disk access times, which are dominated by disk movements such as seeks and rotations, because FFS issues a separate disk write for each small file. We propose CW-FFS (Co-Writing Fast File System) to improve the write performance of small files by minimizing the disk movements needed to write them to disk. Its key technique, the co-writing scheme, dynamically collects multiple small files named under a given directory and writes them in a single disk I/O to contiguous disk locations. Co-writing several small files in one disk I/O reduces the multiple disk movements needed for small-file writes to a single movement, increasing the overall performance of write-intensive applications. Furthermore, a file allocation scheme is introduced to prevent the co-writing scheme from degrading the disk spatial locality of small files named under a given directory. Measurements of our implementation in OpenBSD 4.0 show that CW-FFS improves small-file write performance over FFS by 5 to 35% on the Postmark benchmark (a user-level batching sketch follows this entry).
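
CW-FFS does its co-writing inside the kernel's FFS block allocator; the sketch below only mimics the idea at the application level, batching small payloads that share a directory into one contiguous write. The flush threshold is an assumption:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: collects small "files" and flushes them back to back
// in a single write, turning many disk movements into one.
public class CoWriter {
    private final List<byte[]> pending = new ArrayList<>();
    private int pendingBytes = 0;
    private static final int FLUSH_THRESHOLD = 64 * 1024;  // assumed batch size

    public void submit(byte[] smallFile, RandomAccessFile log) throws IOException {
        pending.add(smallFile);
        pendingBytes += smallFile.length;
        if (pendingBytes >= FLUSH_THRESHOLD) flush(log);
    }

    public void flush(RandomAccessFile log) throws IOException {
        byte[] batch = new byte[pendingBytes];
        int off = 0;
        for (byte[] f : pending) {               // pack files contiguously
            System.arraycopy(f, 0, batch, off, f.length);
            off += f.length;
        }
        log.write(batch);                         // one contiguous disk write
        pending.clear();
        pendingBytes = 0;
    }
}
```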

A Chinese Restaurant Game for Distributed Cooperative Caching in Small Cell Networks

  • Chen, Junliang; Wang, Gang; Wang, Fuxiang
    • KSII Transactions on Internet and Information Systems (TIIS), v.13 no.1, pp.222-236, 2019
  • Wireless content caching in small cell networks has recently been considered a promising way to alleviate backhaul congestion in emerging heterogeneous cellular networks. However, selecting the files to cache in small base stations (SBSs) and making the SBSs work together are important issues in cooperative caching research aimed at reducing file download time. In this paper, a Cooperative-Greedy strategy (CGS) among cache-enabled SBSs in a small cell network is proposed in order to minimize file download time. The problem is formulated as a Chinese restaurant game. Using this game model, we can configure file caching schemes based on file popularity and on the spectrum resources allocated to several adjacent SBSs. Both the existence and the uniqueness of a Nash equilibrium are proved. In the theoretical analysis, SBSs cooperate with each other to cache as many popular files as possible near the UEs. Simulation results show that the CGS scheme outperforms other schemes in terms of file download time (an illustrative greedy-placement sketch follows this entry).
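
The game-theoretic CGS itself is not reproduced here; the sketch below shows only the generic cooperative-greedy intuition, with adjacent SBSs taking turns caching the most popular file no neighbor holds yet. Popularity values and sizes are made up:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative greedy cooperative placement (not the paper's CGS): the
// cluster collectively covers as many distinct popular files as possible.
public class GreedyCoopCache {
    public static List<Set<Integer>> place(double[] popularity, int sbsCount, int capacity) {
        List<Set<Integer>> caches = new ArrayList<>();
        for (int s = 0; s < sbsCount; s++) caches.add(new HashSet<>());
        Set<Integer> cachedSomewhere = new HashSet<>();
        for (int round = 0; round < capacity; round++) {
            for (Set<Integer> cache : caches) {
                int best = -1;
                for (int f = 0; f < popularity.length; f++) {
                    boolean fresh = !cachedSomewhere.contains(f);
                    if (fresh && (best < 0 || popularity[f] > popularity[best])) best = f;
                }
                if (best < 0) break;             // every file already cached
                cache.add(best);
                cachedSomewhere.add(best);
            }
        }
        return caches;
    }

    public static void main(String[] args) {
        double[] pop = {0.4, 0.25, 0.15, 0.1, 0.06, 0.04};  // made-up popularity
        System.out.println(place(pop, 2, 2));   // 2 SBSs, capacity 2 each
    }
}
```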

The Study on Reduction of Scanning Path Build Time According to Control of STL file Slicing Height - Application of Small Jewellery (STL File 슬라이싱 높이 조정에 따른 주사경로 생성시간 저감에 관한 연구 - 소형 보석류에 적용)

  • Kim Tae Ho; Kim Min Ju; Lee Seung Soo; Jeon Eon Chan
    • Journal of the Korean Society for Precision Engineering, v.22 no.12 s.177, pp.205-210, 2005
  • This paper addresses the correlation between the change in file size and the scanning path build time as the slicing height of an STL file is varied. Although STL files have been studied quite actively, the scanning path build time for STL files has not been investigated to a satisfactory degree. The file size depends on the number of polygons created at the specified slicing height, and this number of polygons increases at a regular rate. The correlation between the number of polygons and the scanning path build time is examined and verified (a back-of-envelope slice-count sketch follows this entry).
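
A back-of-envelope illustration of the driving relation: a smaller slicing height yields more slices, hence more polygons, a larger file, and a longer path build time. The model height and slicing heights below are hypothetical, not the paper's data:

```java
// Smaller slicing height -> more slices to scan (illustrative numbers only).
public class SliceEstimate {
    public static void main(String[] args) {
        double modelHeight = 12.0;                 // mm, assumed jewellery model
        double[] sliceHeights = {0.05, 0.1, 0.2};  // mm, assumed settings
        for (double h : sliceHeights) {
            long slices = (long) Math.ceil(modelHeight / h);
            System.out.printf("slice height %.2f mm -> %d slices%n", h, slices);
        }
    }
}
```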

The File Splitting Distribution Scheme Using the P2P Networks with The Mesh topology (그물망 위상의 P2P 네트워크를 활용한 파일 분리 분산 방안)

  • Lee Myoung-Hoon; Park Jung-Su; Kim Jin-Hong; Jo In-June
    • Journal of the Korea Institute of Information and Communication Engineering, v.9 no.8, pp.1669-1675, 2005
  • Recently, small wireless terminals have had difficulty processing large files, owing to the twin trends of smaller terminals and larger files. Moreover, web servers and file servers become overloaded because large numbers of files are concentrated on them, and processing data in units of independent files creates a security vulnerability. To resolve these problems, this paper proposes a new file splitting distribution scheme using a P2P network with a mesh topology. The proposed scheme distributes the blocks of a file across the peers of the P2P network. As a result, small wireless terminals can process large files; the overload problem of web and file servers is solved because the files are decentralized; and the security vulnerability of data processing is mitigated because processing is distributed to the peers in units of blocks (a splitting sketch follows this entry).
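
A minimal sketch of the splitting step only: cut a file into fixed-size blocks and assign each block to a peer round-robin. Peer transport, the mesh topology, and reassembly are omitted; the block size and peer names are assumptions:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Splits a file into blocks and maps each block to a peer (round-robin).
public class FileSplitter {
    static final int BLOCK_SIZE = 64 * 1024;        // assumed block size

    public static void main(String[] args) throws IOException {
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        String[] peers = {"peerA", "peerB", "peerC"};  // hypothetical peers
        for (int i = 0, b = 0; i < data.length; i += BLOCK_SIZE, b++) {
            byte[] block = Arrays.copyOfRange(data, i,
                    Math.min(i + BLOCK_SIZE, data.length));
            String peer = peers[b % peers.length];
            System.out.printf("block %d (%d bytes) -> %s%n", b, block.length, peer);
        }
    }
}
```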

Distributed File Placement and Coverage Expansion Techniques for Network Throughput Enhancement in Small-cell Network (소형셀 네트워크 전송용량 향상을 위한 분산 파일저장 및 커버리지 확장 기법)

  • Hong, Jun-Pyo
    • Journal of the Korea Institute of Information and Communication Engineering, v.22 no.1, pp.183-189, 2018
  • This paper proposes distributed file placement and coverage expansion techniques for mitigating the backhaul traffic bottleneck in small-cell networks. To minimize the backhaul load with limited memory space, the proposed scheme controls the coverage and file placement of each base station according to the file popularity distribution and the base stations' memory space. In other words, since the cache hit ratio is low when memory capacity is small or file popularity is widely spread, the base stations expand their coverage and cache different sets of files, so that users located in the overlapped area can exploit the multiple cached file sets of the base stations. Our simulation results show that the proposed scheme outperforms the conventional caching strategy in terms of network throughput when memory capacity is small or the file popularity distribution is widespread (a hit-ratio arithmetic sketch follows this entry).
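
A sketch of the underlying arithmetic under a Zipf popularity model, a common assumption in caching papers (the exponent and sizes below are made up, not the paper's parameters): caching the top M of N files yields a hit ratio equal to the summed popularity of those M files, and two overlapped base stations caching disjoint top sets act like one cache of size 2M for users in the overlap:

```java
// Cache-hit arithmetic for a Zipf(s) popularity distribution.
public class HitRatio {
    static double zipfHit(int cached, int files, double s) {
        double norm = 0, hit = 0;
        for (int r = 1; r <= files; r++) norm += 1.0 / Math.pow(r, s);
        for (int r = 1; r <= cached; r++) hit += 1.0 / Math.pow(r, s);
        return hit / norm;
    }

    public static void main(String[] args) {
        int files = 1000, m = 50;
        double s = 0.8;                            // assumed Zipf exponent
        System.out.printf("single BS,  top %d: %.3f%n", m, zipfHit(m, files, s));
        System.out.printf("overlapped, top %d: %.3f%n", 2 * m, zipfHit(2 * m, files, s));
    }
}
```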

A Study on the Improving Performance of Massively Small File Using the Reuse JVM in MapReduce (MapReduce에서 Reuse JVM을 이용한 대규모 스몰파일 처리성능 향상 방법에 관한 연구)

  • Choi, Chul Woong; Kim, Jeong In; Kim, Pan Koo
    • Journal of Korea Multimedia Society, v.18 no.9, pp.1098-1104, 2015
  • With the widespread use of smartphones and the IoT (Internet of Things), data are being generated on a large scale, and demand for analyzing such data has increased; hence, distributed processing systems have gained much attention. Hadoop, a distributed processing system, stores the metadata of saved files in name nodes, which leads to the following problems: memory becomes insufficient; massive numbers of small files cause heavy load; and scheduling and file processing time grow as the number of small files increases. In this paper, we propose a solution that addresses the increase in processing time caused by massive numbers of small files, and thus improves processing performance, using the Reuse JVM feature provided by Hadoop. Through a configuration setting, Reuse JVM changes the behavior whereby a JVM is conventionally spawned for every task, so that multiple tasks run sequentially in one reused JVM. As a final outcome, the Reuse JVM method showed the best processing performance when used together with CombineFileInputFormat (the configuration sketch follows this entry).
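
JVM reuse is a knob of classic MapReduce (MRv1); MRv2/YARN dropped it, with uber mode as the closest analogue. Assuming the paper used the MRv1 setting, a minimal sketch:

```java
import org.apache.hadoop.mapred.JobConf;

// Setting the number of tasks per JVM to -1 lets one JVM run an unlimited
// number of tasks sequentially, avoiding JVM startup cost for every
// small-file split.
public class ReuseJvmConfig {
    public static void main(String[] args) {
        JobConf conf = new JobConf(ReuseJvmConfig.class);
        conf.setNumTasksToExecutePerJvm(-1);     // reuse the JVM without limit
        // equivalently: conf.setInt("mapred.job.reuse.jvm.num.tasks", -1);
        System.out.println(conf.get("mapred.job.reuse.jvm.num.tasks"));
    }
}
```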

A File System for Large-scale NAND Flash Memory Based Storage System

  • Son, Sunghoon
    • Journal of the Korea Society of Computer and Information, v.22 no.9, pp.1-8, 2017
  • In this paper, we propose a file system for flash memory that remedies the shortcomings of existing flash memory file systems. Besides supporting a large block size, the proposed file system significantly reduces file system initialization time by adopting a logical address composed of an erase block number and a per-block page bitmap used to locate a page. The file system is suitable for embedded systems with limited main memory since its in-memory data structures are small. It also provides efficient management of obsolete and free blocks, which contributes to reducing file update time. Finally, the proposed file system can easily configure the maximum file size and file system size limits, making it portable to emerging larger flash memories. Performance evaluation studies show that the proposed file system can contribute to performance improvements in embedded systems (an address-scheme sketch follows this entry).
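
A sketch of the logical address scheme the abstract describes: an erase block number plus a bitmap over the pages inside that block. The field widths and pages-per-block count are assumptions; the paper's exact encoding is not given in the abstract:

```java
// Logical address = erase block number + live-page bitmap (illustrative).
public class FlashLogicalAddress {
    static final int PAGES_PER_BLOCK = 64;     // assumed geometry

    final int eraseBlock;      // which erase block
    final long pageBitmap;     // bit i set => page i of the block is live

    FlashLogicalAddress(int eraseBlock, long pageBitmap) {
        this.eraseBlock = eraseBlock;
        this.pageBitmap = pageBitmap;
    }

    // Physical page index of the n-th live page, found by scanning the bitmap.
    int physicalPage(int n) {
        for (int i = 0, seen = 0; i < PAGES_PER_BLOCK; i++) {
            if ((pageBitmap & (1L << i)) != 0 && seen++ == n) {
                return eraseBlock * PAGES_PER_BLOCK + i;
            }
        }
        throw new IllegalArgumentException("fewer than " + (n + 1) + " live pages");
    }

    public static void main(String[] args) {
        FlashLogicalAddress a = new FlashLogicalAddress(3, 0b1011); // pages 0,1,3 live
        System.out.println(a.physicalPage(2));  // third live page -> 3*64 + 3 = 195
    }
}
```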

DJFS: Providing Highly Reliable and High-Performance File System with Small-Sized NVRAM

  • Kim, Junghoon; Lee, Minho; Song, Yongju; Eom, Young Ik
    • ETRI Journal, v.39 no.6, pp.820-831, 2017
  • File systems and applications implement their own update protocols to guarantee data consistency, one of the most crucial aspects of computing systems. However, we found that storage devices are substantially under-utilized when preserving data consistency, because these protocols generate massive write traffic with many disk cache flush operations and force-unit-access (FUA) commands. In this paper, we present DJFS (Delta-Journaling File System), which provides both high performance and data consistency for different applications. We make three technical contributions to achieve this goal. First, to remove all storage accesses involving disk cache flush operations and FUA commands, DJFS uses a small NVRAM for the file system journal. Second, to reduce the access latency and space requirements of NVRAM, DJFS journals the compressed differences (deltas) of the modified blocks. Finally, to relieve explicit checkpointing overhead, DJFS aggressively reflects checkpoint transactions to the file system area in units of a specified region. Our evaluation on a TPC-C workload over SQLite shows that, using our novel optimization schemes, DJFS outperforms Ext4 by up to 64.2 times with only 128 MB of NVRAM (a delta-journaling sketch follows this entry).
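
A user-level illustration of the delta-journaling idea (DJFS does this in the kernel against NVRAM): instead of journaling a whole modified 4 KiB block, journal the compressed XOR delta against the old block, which is tiny for small in-place updates. The block size and compression level are assumptions:

```java
import java.util.Arrays;
import java.util.zip.Deflater;

// Journal record = compressed XOR delta of old vs. new block (illustrative).
public class DeltaJournal {
    public static byte[] compressedDelta(byte[] oldBlock, byte[] newBlock) {
        byte[] delta = new byte[newBlock.length];
        for (int i = 0; i < newBlock.length; i++) {
            delta[i] = (byte) (oldBlock[i] ^ newBlock[i]);  // mostly zeros
        }
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(delta);
        deflater.finish();
        byte[] out = new byte[newBlock.length];  // assumes delta compresses
        int n = deflater.deflate(out);           // bytes actually produced
        deflater.end();
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        byte[] oldBlock = new byte[4096];
        byte[] newBlock = oldBlock.clone();
        newBlock[128] = 1;                       // small in-place update
        System.out.println("journal record: "
                + compressedDelta(oldBlock, newBlock).length + " bytes vs 4096");
    }
}
```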