Fig. 1. Example taxonomy trees of Age and Gender
Fig. 2. HDFS Architecture
Fig. 3. Hadoop Yarn Architecture
Fig. 4. Spark Workflow
Fig. 5. The Pseudocode for k-Anonymity in Spark Distributed Environment
Fig. 6. Load Files to HDFS
Fig. 7. Generalization Lattice Tree
Fig. 8. Make RDD, Partition, and Cache
Fig. 9. k-Anonymity on Spark Map & Reduce
Fig. 10. The Pseudocode of Map for k-Anonymity
Fig. 11. The Pseudocode of Reduce for k-Anonymity
Fig. 12. Execution time comparison between non-distributed k-anonymity and distributed k-anonymity for a varying number of records (k = 500)
Fig. 13. Execution time comparison between non-distributed k-anonymity and distributed k-anonymity for varying values of k (data size = 0.5 GB)
Fig. 14. k-Anonymity on Spark Distributed System
Table 1. Original table and 2-Anonymized table
Table 2. Server Specs and Spark Options Used in the Experiments
