참고문헌
- Armbrust, M., Xin, R. S., Lian, C., Huai, Y., Liu, D., Bradley, J. K., et al. (2015). Spark SQL: Relational data processing in Spark. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pages 1383-1394. ACM.
- Bahmani, B., Moseley, B., Vattani, A., Kumar, R., and Vassilvitskii, S. (2012). Scalable k-means++. In Proceedings of the VLDB Endowment, 5, 622-633.
- Dean, J. and Ghemawat, S. (2008). MapReduce: simplified data processing on large clusters. Communications of the ACM, 51, 107-113.
- dplyr (nd). dplyr: A grammar of data manipulation. https://github.com/hadley/dplyr. Accessed on: 2016-08-27.
- H2O.ai (nda). H2O.ai - AI for Business. http://www.h2o.ai/. Accessed on: 2016-08-30.
- H2O.ai (ndb). Sparkling Water. http://www.h2o.ai/product/sparkling-water/. Accessed on: 2016-08-30.
- HBase (nd). Apache HBase. https://hbase.apache.org. Accessed on: 2016-08-27.
- Hindman, B., Konwinski, A., Zaharia, M., Ghodsi, A., Joseph, A. D., Katz, R. H., Shenker, S., and Stoica, I. (2011). Mesos: A platform for fine-grained resource sharing in the data center. In Proceedings of the 13th USENIX conference on Networked Systems Design and Implementation. USENIX Association.
- Hunter, T., Moldovan, T., Zaharia, M., Merzgui, S., Ma, J., Franklin, M. J., Abbeel, P., and Bayen, A. M. (2011). Scaling the mobile millennium system in the cloud. In Proceedings of the 2nd ACM Symposium on Cloud Computing. ACM.
- Kim, H., Park, J., Jang, J., and Yoon, S. (2016). DeepSpark: Spark-based deep learning supporting asynchronous updates and Caffe compatibility. arXiv preprint arXiv:1602.08191.
- Kraska, T., Talwalkar, A., Duchi, J. C., Griffith, R., Franklin, M. J., and Jordan, M. I. (2013). MLbase: A distributed machine-learning system. In The 6th biennial Conference on Innovative Data Systems Research.
- Lakshman, A. and Malik, P. (2010). Cassandra: a decentralized structured storage system. ACM SIGOPS Operating Systems Review, 44, 35-40.
- Lehoucq, R. B., Sorensen, D. C., and Yang, C. (1998). ARPACK users' guide: solution of large-scale eigenvalue problems with implicitly restarted Arnoldi methods, 6, SIAM.
- Meng, X., Bradley, J., Yuvaz, B., Sparks, E., Venkataraman, S., Liu, D., et al. (2016). MLlib: Machine learning in apache spark. Journal of Machine Learning Research, 17, 1-7.
- Moritz, P., Nishihara, R., Stoica, I., and Jordan, M. I. (2015). SparkNet: Training deep networks in Spark. arXiv preprint arXiv:1511.06051.
- Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., et al. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825-2830.
- RStudio (nd). sparklyr-R interface for Apache Spark. http://spark.rstudio.com. Accessed on: 2016-08-27.
- Scala (nd). The Scala programming language. http://www.scala-lang.org. Accessed on: 2016-08-27.
- Shvachko, K., Kuang, H., Radia, S., and Chansler, R. (2010). The Hadoop distributed file system. In 2010 IEEE 26th symposium on mass storage systems and technologies (MSST), pages 1-10. IEEE.
- Spark (nd). Apache spark. https://spark.apache.org/. Accessed on: 2016-08-27.
- Spark-cassandra-connector (nd). Spark Cassandra Connector. https://github.com/datastax/spark-cassandraconnector. Accessed on: 2016-08-30.
- Spark-sklearn (nd). Scikit-learn integration package for Apache Spark. https://github.com/databricks/sparksklearn. Accessed on: 2016-08-30.
- Spark-tfocs (nd). TFOCS for Spark: A community port of TFOCS for Apache Spark. https://github.com/databricks/spark-tfocs. Accessed on: 2016-08-27.
- Spark Wiki (nda). Committers. https://cwiki.apache.org/confluence/display/SPARK/Committers. Accessed on: 2016-08-27.
- Spark Wiki (ndb). Powered By Spark. https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark. Accessed on: 2016-08-27.
- Sparkit-learn (nd). Sparkit-learn. https://github.com/lensacom/sparkit-learn. Accessed on: 2016-08-30.
- SparkR (nd). SparkR (R on spark). https://spark.apache.org/docs/latest/sparkr.html. Accessed on: 2016-08-27.
- TopicModeling (nd). Topic modeling on Apache Spark. https://github.com/intel-analytics/TopicModeling. Accessed on: 2016-08-30.
- Vavilapalli, V. K., Murthy, A. C., Douglas, C., Agarwal, S., Konar, M., Evans, R., et al. (2013). Apache Hadoop YARN: Yet another resource negotiator. In Proceedings of the 4th annual Symposium on Cloud Computing, ACM.
- Xin, R., Deyhim, P., Ghodsi, A., Meng, X., and Zaharia, M. (2014a). GraySort on Apache Spark by Databricks. GraySort Competition.
- Xin, R. S., Crankshaw, D., Dave, A., Gonzalez, J. E., Franklin, M. J., and Stoica, I. (2014b). GraphX: Unifying data-parallel and graph-parallel analytics. arXiv preprint arXiv:1402.2394.
- Zadeh, R. B., Meng, X., Ulanov, A., Yavuz, B., Pu, L., Venkataraman, S., Sparks, E., Staple, A., and Zaharia, M. (2016). Matrix computations and optimization in Apache Spark. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 31-38), ACM.
- Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauley, M., Franklin, M. J., Shenker, S., and Stoica, I. (2012). Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation. USENIX Association.
- Zaharia, M., Chowdhury, M., Franklin, M. J., Shenker, S., and Stoica, I. (2010). Spark: cluster computing with working sets. In Proceedings of the 2nd USENIX conference on Hot topics in cloud computing. USENIX Association.
- Zaharia, M., Das, T., Li, H., Hunter, T., Shenker, S., and Stoica, I. (2013). Discretized streams: Fault-tolerant streaming computation at scale. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (pp. 423-438), ACM.
- Zeppelin (nd). Apache Zeppelin. https://zeppelin.apache.org/. Accessed on: 2016-08-30.