Storage System Performance Enhancement Using Duplicated Data Management Scheme

중복 데이터 관리 기법을 통한 저장 시스템 성능 개선

  • Published: 2010.02.15

Abstract

Traditional storage servers suffer from duplicated data blocks, which waste storage space and network bandwidth. To address this problem, various de-duplication mechanisms have been proposed. However, most of this work is limited to backup servers that exploit Content-Defined Chunking (CDC). Because CDC can easily trace duplicated blocks by using anchors, it is widely used in backup systems, where file updates are easy to observe. In this paper, we propose a new de-duplication mechanism for improving storage system performance. We focus on an efficient algorithm for a general-purpose de-duplication server, with the goal of making it applicable to a variety of systems such as backup servers, P2P servers, and FTP servers. The key idea is to apply a stride scheme to the traditional fixed-block duplication checking mechanism. Experimental results show that the proposed mechanism minimizes the computation time needed to detect duplicated regions of blocks and manages storage systems efficiently.
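
As a rough illustration of the traditional fixed-block duplication checking that the abstract takes as its starting point, the sketch below splits data into fixed-size blocks, hashes each block, and stores only blocks whose hash has not been seen before. It is not the proposed stride scheme, whose details are not given in the abstract. The block size, the choice of SHA-1, and the names `split_fixed`, `dedup_store`, and `restore` are assumptions made for this example.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical; tiny so the example is easy to follow


def split_fixed(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut data into consecutive fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def dedup_store(data: bytes, store: dict, block_size: int = BLOCK_SIZE) -> list[str]:
    """Store each unique block once; return the list of digests (the recipe)."""
    recipe = []
    for block in split_fixed(data, block_size):
        digest = hashlib.sha1(block).hexdigest()
        store.setdefault(digest, block)  # keep only the first copy of a block
        recipe.append(digest)
    return recipe


def restore(recipe: list[str], store: dict) -> bytes:
    """Rebuild the original data from its recipe."""
    return b"".join(store[d] for d in recipe)


store = {}
original = b"AAAABBBBCCCCDDDD"
shifted = b"X" + original  # same content with one byte inserted at the front

r1 = dedup_store(original, store)
blocks_after_first = len(store)      # four unique blocks stored

r2 = dedup_store(original, store)
blocks_after_duplicate = len(store)  # identical data adds no new blocks

r3 = dedup_store(shifted, store)
blocks_after_shift = len(store)      # every block boundary moved
```

In this sketch the second copy of `original` adds nothing to the store, while the one-byte insertion in `shifted` moves every block boundary, so none of its blocks match the stored ones. That boundary-shift weakness is the reason anchor-based CDC is preferred when files are updated in place.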
