TeT: Distributed Tera-Scale Tensor Generator

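  • ByungSoo Jeon (Department of Computer Science and Engineering, Seoul National University)
  • JungWoo Lee (Department of Computer Science and Engineering, Seoul National University)
  • U Kang (Department of Computer Science and Engineering, Seoul National University)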
  • Received : 2016.03.30
  • Accepted : 2016.06.24
  • Published : 2016.08.15

Abstract

A tensor is a multi-dimensional array that can represent many kinds of data, such as (user, user, time) records in a social network. A tensor generator is an important tool for multi-dimensional data mining research, with applications including simulation, multi-dimensional data modeling and understanding, and sampling/extrapolation. However, existing tensor generators cannot produce tensors that are both sparse and power-law distributed, as real-world tensors are. They are also limited in the tensor sizes they can handle, and a generated tensor must be uploaded to a distributed system before further analysis, which costs additional time. In this study, we propose TeT, a distributed tera-scale tensor generator, to solve these problems. TeT generates sparse random tensors, as well as sparse R-MAT (Recursive MATrix) and Kronecker tensors that follow a power law, without any limit on tensor size. In addition, a tensor generated by TeT is immediately ready for further analysis on the same distributed system. Thanks to its careful design, TeT scales nearly linearly with the number of machines.
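
To make the idea of a sparse R-MAT tensor concrete, the following is a minimal single-machine sketch of one way the recursive R-MAT scheme can be extended from matrices to tensors. It is not TeT's implementation, and it is not distributed. The function name `rmat_tensor`, its parameters, and the sub-block probabilities are all hypothetical choices made for illustration. At each recursion level the sampler picks one of the 2^order sub-blocks of the tensor according to `probs`; skewed probabilities concentrate entries in a few regions, which is what yields a skewed, power-law-like distribution.

```python
import random


def rmat_tensor(order, levels, nnz, probs, seed=0):
    """Sample up to `nnz` distinct coordinates of a sparse tensor.

    Hypothetical illustration only, not the TeT algorithm.
    Each mode has 2**levels indices. `probs` holds one weight per
    sub-block, so it must contain 2**order values.
    """
    if len(probs) != 2 ** order:
        raise ValueError("probs must contain 2**order weights")
    rng = random.Random(seed)
    blocks = range(2 ** order)
    entries = set()
    attempts = 0
    max_attempts = 100 * nnz  # guard against endless collisions
    while len(entries) < nnz and attempts < max_attempts:
        attempts += 1
        idx = [0] * order
        for _ in range(levels):
            # Pick one sub-block; bit `mode` of its id says whether the
            # index along that mode falls in the lower or upper half.
            block = rng.choices(blocks, weights=probs)[0]
            for mode in range(order):
                bit = (block >> mode) & 1
                idx[mode] = (idx[mode] << 1) | bit
        entries.add(tuple(idx))  # the set drops duplicate coordinates
    return entries


# Example: a 3-mode tensor with 1024 indices per mode and 1000 nonzeros.
coords = rmat_tensor(order=3, levels=10, nnz=1000,
                     probs=[0.3] + [0.1] * 7, seed=1)
print(len(coords), "nonzero coordinates sampled")
```

Only the coordinates of the nonzero entries are stored, so memory grows with the number of nonzeros rather than with the full tensor size. A distributed generator would additionally need to split this sampling across machines and handle duplicates that arise on different machines, which this sketch does not attempt.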

Acknowledgement

Supported by: National Research Foundation of Korea
