Spatial Computation on Spark Using GPGPU

GPGPU를 활용한 스파크 기반 공간 연산 (Spark-Based Spatial Computation Using GPGPU)

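  • 손찬승 (Department of Computer Engineering, Konkuk University)
  • 김대희 (Department of Computer Engineering, Konkuk University)
  • 박능수 (Department of Computer Engineering, Konkuk University)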
  • Received : 2016.07.25
  • Accepted : 2016.08.03
  • Published : 2016.08.31


As the volume of spatial data grows, interest in spatial information processing has increased. Spatial database systems that extend traditional relational databases have difficulty handling large data sets because of limited scalability. SpatialHadoop, which extends Hadoop, suffers from low performance because its spatial computations write many intermediate results to disk. This paper proposes Spatial Computation Spark (SC-Spark), an in-memory distributed processing framework that extends Spark to perform spatial operations efficiently on large-scale data. A GPGPU-based SC-Spark is also developed to further improve performance. SC-Spark exploits Spark's ability to hold intermediate results in memory, and GPGPU-based SC-Spark performs spatial operations in parallel across the many processing elements of a GPU. To evaluate the proposed work, experiments were performed on a single AMD system using SC-Spark and GPGPU-based SC-Spark for Point-in-Polygon and spatial join operations. The results show that SC-Spark and GPGPU-based SC-Spark were up to 8 times faster than SpatialHadoop.
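To make the two benchmark operations concrete, the following is a minimal single-threaded sketch, not the authors' implementation. The abstract does not say which Point-in-Polygon algorithm SC-Spark uses; ray casting is assumed here as a common choice, and the function names `point_in_polygon` and `spatial_join` are hypothetical. Each point test is independent of the others, which is the property that would allow the work to be spread across partitions or GPU processing elements.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(p: Point, polygon: List[Point]) -> bool:
    """Ray-casting test (assumed algorithm, not taken from the paper).

    Casts a horizontal ray from p toward +x and counts how many polygon
    edges it crosses; an odd count means p lies inside.
    """
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through p can cross the ray.
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def spatial_join(points: List[Point],
                 polygons: List[List[Point]]) -> List[Tuple[int, int]]:
    """Naive nested-loop join: (point index, polygon index) for each containment."""
    return [(i, j)
            for i, p in enumerate(points)
            for j, poly in enumerate(polygons)
            if point_in_polygon(p, poly)]
```

For example, with a square `[(0, 0), (4, 0), (4, 4), (0, 4)]`, the point `(2, 2)` tests as inside and `(5, 2)` as outside. A nested-loop join like this costs one test per point-polygon pair, so a practical system would normally add spatial partitioning or an index to prune candidate pairs first.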


Supported by: Konkuk University

