Design of an Adaptive Deduplication Algorithm Based on File Type and Size

  • Received : 2019.10.15
  • Reviewed : 2019.11.07
  • Published : 2020.02.29

Abstract

As user data grows, so does the amount of duplicated data, and many deduplication techniques have been studied in response. Research targeting personal storage, however, remains comparatively sparse. Unlike high-performance computers, personal storage must perform deduplication while keeping CPU and memory usage low. In this paper, we propose an adaptive algorithm that selectively applies fixed size chunking (FSC) or whole file chunking (WFH) according to file type and size, in order to maintain a reasonable deduplication rate while reducing the load on personal storage. Experiments show that, compared with LessFS, the proposed file system takes more than 1.3 times as long on the first write of a file but reduces memory usage by a factor of more than 3, and that on rewrite it completes writes more than 2.5 times faster.
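The selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the extension list, the small-file threshold, the 4 KiB chunk size, the use of SHA-256, and the names `choose_strategy`, `split`, and `DedupStore` are all assumptions made for illustration, since the abstract does not state the actual policy values. The sketch treats files of certain types, or below a size limit, as a single unit (WFH) and splits everything else into fixed-size chunks (FSC).

```python
import hashlib
from pathlib import Path

# Hypothetical policy values: the abstract does not give the paper's real ones.
WHOLE_FILE_TYPES = {".jpg", ".png", ".mp3", ".mp4", ".zip", ".gz"}
SMALL_FILE_LIMIT = 8 * 1024   # bytes; files at or below this are kept whole
CHUNK_SIZE = 4 * 1024         # bytes; fixed chunk size used for FSC


def choose_strategy(name: str, size: int) -> str:
    """Return "WFH" or "FSC" based on the file's extension and size."""
    ext = Path(name).suffix.lower()
    if ext in WHOLE_FILE_TYPES or size <= SMALL_FILE_LIMIT:
        return "WFH"
    return "FSC"


def split(name: str, data: bytes) -> list[bytes]:
    """Split data into the units that will be fingerprinted."""
    if choose_strategy(name, len(data)) == "WFH":
        return [data]  # the whole file is one unit
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]


class DedupStore:
    """In-memory stand-in for a deduplicating store (illustrative only)."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}      # fingerprint -> unit contents
        self.files: dict[str, list[str]] = {}   # file name -> ordered fingerprints

    def write(self, name: str, data: bytes) -> int:
        """Store a file and return how many new bytes had to be kept."""
        new_bytes = 0
        recipe = []
        for piece in split(name, data):
            digest = hashlib.sha256(piece).hexdigest()
            if digest not in self.blocks:   # unseen unit: store it
                self.blocks[digest] = piece
                new_bytes += len(piece)
            recipe.append(digest)
        self.files[name] = recipe
        return new_bytes

    def read(self, name: str) -> bytes:
        """Reassemble a file from its stored units."""
        return b"".join(self.blocks[d] for d in self.files[name])
```

Under these assumptions, writing the same content a second time stores no new bytes, and changing one 4 KiB region of a chunked file stores only that chunk. Keeping a single fingerprint for whole-file cases is what would keep the index small for types that rarely share partial content.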
