
Development of Big Data System for Energy Big Data


  • 송민구 (Advanced Institute of Convergence Technology, Seoul National University)
  • Received : 2017.08.22
  • Accepted : 2017.11.16
  • Published : 2018.01.15

Abstract

This paper proposes a Big Data system for energy Big Data aggregated in real time from industrial and public sources. The system is built on Hadoop, and the Spark framework, which supports in-memory distributed computing, is applied alongside it for Big Data processing. We focus on Big Data in the form of heat energy for district heating and describe methods for storing, managing, processing, and analyzing Big Data aggregated in real time while taking the properties of energy input and output into account. Incoming Big Data is stored and managed according to a relational database schema designed inside the system, and the stored Big Data is processed and analyzed according to set objectives. Along with the proposed system, the paper presents a case in which the system is applied to heat energy Big Data gathered in real time from a number of heat demand plants related to district heating, which serve as industrial sources.

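As a rough illustration of the store-then-analyze flow summarized in the abstract, the sketch below loads a few heat energy readings into a relational table and aggregates them per plant. It is only a sketch under stated assumptions: the table name `heat_reading`, its columns, the plant identifiers, and the sample values are all hypothetical, and an in-memory SQLite database stands in for the Hadoop and Spark stack, whose actual schema the abstract does not specify.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only and
# are not taken from the paper.
SCHEMA = """
CREATE TABLE heat_reading (
    plant_id     TEXT NOT NULL,
    measured_at  TEXT NOT NULL,   -- ISO 8601 timestamp
    heat_in_kwh  REAL NOT NULL,   -- heat energy entering the plant
    heat_out_kwh REAL NOT NULL,   -- heat energy leaving the plant
    PRIMARY KEY (plant_id, measured_at)
);
"""


def store_readings(conn, readings):
    """Insert (plant_id, measured_at, heat_in_kwh, heat_out_kwh) tuples."""
    conn.executemany("INSERT INTO heat_reading VALUES (?, ?, ?, ?)", readings)
    conn.commit()


def total_heat_by_plant(conn):
    """Return {plant_id: (total_heat_in_kwh, total_heat_out_kwh)}."""
    rows = conn.execute(
        "SELECT plant_id, SUM(heat_in_kwh), SUM(heat_out_kwh) "
        "FROM heat_reading GROUP BY plant_id ORDER BY plant_id"
    ).fetchall()
    return {plant: (h_in, h_out) for plant, h_in, h_out in rows}


# SQLite in memory is used here only as a stand-in for the real storage layer.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Made-up sample readings for two hypothetical plants.
store_readings(conn, [
    ("plant_a", "2017-08-22T00:00:00", 120.0, 100.0),
    ("plant_a", "2017-08-22T01:00:00", 130.0, 110.0),
    ("plant_b", "2017-08-22T00:00:00", 80.0, 70.0),
])

totals = total_heat_by_plant(conn)
print(totals)
```

In a deployment like the one described, the same grouping query could plausibly be expressed through Spark SQL over data held in the Hadoop cluster; that mapping is an assumption and is not stated in the abstract.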


Acknowledgement

Supported by: Korea Institute of Energy Technology Evaluation and Planning (KETEP)
