HTSC and FH HTSC: XOR-based Codes to Reduce Access Latency in Distributed Storage Systems

  • Shuai, Qiqi (Department of Electrical and Electronic Engineering, The University of Hong Kong)
  • Li, Victor O.K. (Department of Electrical and Electronic Engineering, The University of Hong Kong)
  • Received : 2015.04.30
  • Published : 2015.12.31

Abstract

Massive distributed storage systems are the foundation of big data operations. Access latency is a key metric in such systems because it greatly affects user experience, yet existing codes mainly focus on other metrics such as storage overhead and repair cost. In this paper, we design two new XOR-based erasure codes, the hierarchical tree structure code (HTSC) and the high failure tolerant HTSC (FH HTSC), which generate parity nodes from other parity nodes to reduce access latency in distributed storage systems. Comparing them with popular and representative codes, we show that, under the same repair cost, HTSC and FH HTSC reduce access latency while maintaining favorable performance in other metrics. In particular, under the same repair cost, FH HTSC achieves lower access latency, equal or higher failure tolerance, and lower computation cost than the representative codes, with similar storage overhead. FH HTSC is therefore a strong choice for applications that require both low access latency and high failure tolerance.
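
The general idea of building a parity node from other parity nodes can be illustrated with plain XOR. The sketch below is only a generic two-level example and should not be read as the HTSC or FH HTSC construction, which the abstract does not specify. The block contents, the grouping into two local groups, and the helper `xor_blocks` are all hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the bytewise XOR of equal-length byte strings (hypothetical helper)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Four made-up data blocks, split into two local groups of two (an assumption).
data = [b"\x01\x02", b"\x10\x20", b"\x0f\x0f", b"\xa0\x0b"]

# First-level parities, each computed from data blocks.
p_left = xor_blocks(data[0:2])
p_right = xor_blocks(data[2:4])

# Second-level parity, computed from the two parities rather than from data.
p_root = xor_blocks([p_left, p_right])

# A lost data block is repaired from its sibling and its group parity.
recovered_d1 = xor_blocks([data[0], p_left])

# A lost first-level parity is repaired from the other parities alone.
recovered_p_left = xor_blocks([p_right, p_root])
```

In this toy layout, a single lost block is rebuilt by reading two other blocks, whether the lost block holds data or parity. How the actual codes arrange their parity nodes, and how that affects latency, is described in the paper itself.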

Acknowledgement

Supported by: The University of Hong Kong
