Graph Database Solution for Higher Order Spatial Statistics in the Era of Big Data

  • Sabiu, Cristiano G. (Yonsei University, Department of Astronomy)
  • Kim, Juhan (Korea Institute for Advanced Study, Center for Advanced Computation)
  • Published : 2019.04.10

Abstract

We present an algorithm for the fast computation of the general N-point spatial correlation functions of any discrete point set embedded within a Euclidean space $\mathbb{R}^n$. Utilizing the concepts of kd-trees and graph databases, we describe how to count all possible N-tuples in binned configurations within a given length scale, e.g. all pairs of points or all triplets of points with side lengths $< r_{\rm max}$. Through benchmarking we show the computational advantage of our new graph-based algorithm over more traditional methods. We show that the counting of all 3-point configurations up to and beyond the Baryon Acoustic Oscillation scale (~200 Mpc in physical units) can be performed on current Sloan Digital Sky Survey (SDSS) data in reasonable time. Finally, we present the first measurements of the 4-point correlation function of ~0.5 million SDSS galaxies over the redshift range $0.43 < z < 0.7$. The code, GRAMSCI (GRAph Made Statistics for Cosmological Information; bitbucket.org/csabiu/gramsci), is publicly available under a GNU General Public License.
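
The counting idea in the abstract can be illustrated with a minimal Python sketch. It is not the GRAMSCI implementation: the function names are hypothetical, a uniform grid of cell size r_max stands in for the kd-tree, an in-memory adjacency list stands in for the graph database, and binning by side length is omitted. The sketch links every pair of points closer than r_max, then counts triplets as triangles in the resulting graph, so that no triplet with a side longer than r_max is ever visited.

```python
import itertools
import math
import random
from collections import defaultdict


def build_neighbor_graph(points, rmax):
    """Return adjacency sets linking every pair of points closer than rmax.

    Assumption: a uniform grid with cell size rmax replaces the kd-tree,
    so only points in the same or adjacent cells need to be compared.
    """
    dim = len(points[0])
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        cells[tuple(math.floor(c / rmax) for c in p)].append(idx)

    offsets = list(itertools.product((-1, 0, 1), repeat=dim))
    graph = [set() for _ in points]
    for cell, members in cells.items():
        for off in offsets:
            other = tuple(c + o for c, o in zip(cell, off))
            for i in members:
                for j in cells.get(other, ()):
                    # i < j ensures each pair is tested from one side only.
                    if i < j and math.dist(points[i], points[j]) < rmax:
                        graph[i].add(j)
                        graph[j].add(i)
    return graph


def count_pairs(graph):
    """Number of edges, i.e. pairs with separation < rmax."""
    return sum(len(nbrs) for nbrs in graph) // 2


def count_triplets(graph):
    """Number of triangles, i.e. triplets with all three sides < rmax."""
    total = 0
    for i, nbrs in enumerate(graph):
        for j in nbrs:
            if j > i:
                # Common neighbours of i and j close the triangle;
                # k > j counts each triangle exactly once (i < j < k).
                total += sum(1 for k in nbrs & graph[j] if k > j)
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    pts = [(rng.random(), rng.random(), rng.random()) for _ in range(500)]
    g = build_neighbor_graph(pts, rmax=0.1)
    print("pairs:", count_pairs(g), "triplets:", count_triplets(g))
```

Because the graph stores only separations below r_max, the cost of the triplet count depends on the number of neighbours per point rather than on the cube of the total number of points. A brute-force loop over all triplets has no such advantage.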

Keywords