Funding
This study was supported by a grant from the JSPS Grants-in-Aid for Scientific Research (No. 20K11836) to Kai Cheng.
References
- A. Adir, R. Levy, and T. Salman, "Dynamic test data generation for data intensive applications," in Hardware and Software: Verification and Testing. Heidelberg, Germany: Springer, 2012, pp. 219-233.
- T. S. Buda, T. Cerqueus, J. Murphy, and M. Kristiansen, "VFDS: an application to generate fast sample databases," in Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, Shanghai, China, 2014, pp. 2048-2050.
- T. Rabl, M. Frank, H. M. Sergieh, and H. Kosch, "A data generator for cloud-scale benchmarking," in Performance Evaluation, Measurement and Characterization of Complex Systems. Heidelberg, Germany: Springer, 2011, pp. 41-56.
- T. Rabl, M. Danisch, M. Frank, S. Schindler, and H. A. Jacobsen, "Just can't get enough: synthesizing big data," in Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Australia, 2015, pp. 1457-1462.
- K. Taneja, Y. Zhang, and T. Xie, "MODA: automated test generation for database applications via mock objects," in Proceedings of the IEEE/ACM International Conference on Automated Software Engineering, Antwerp, Belgium, 2010, pp. 289-292.
- H. Wu, Y. Ning, P. Chakraborty, J. Vreeken, N. Tatti, and N. Ramakrishnan, "Generating realistic synthetic population datasets," ACM Transactions on Knowledge Discovery from Data (TKDD), vol. 12, no. 4, pp. 1-22, 2018. https://doi.org/10.1145/3182383
- K. Mason, S. Vejdan, and S. Grijalva, "An 'on the fly' framework for efficiently generating synthetic big data sets," in Proceedings of the 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, 2019, pp. 3379-3387.
- B. C. Fung, K. Wang, R. Chen, and P. S. Yu, "Privacy-preserving data publishing: a survey of recent developments," ACM Computing Surveys, vol. 42, no. 4, pp. 1-53, 2010. https://doi.org/10.1145/1749603.1749605
- M. Elliot and J. Domingo-Ferrer, "The future of statistical disclosure control," 2018 [Online]. Available: https://arxiv.org/abs/1812.09204.
- A. Dries, "Declarative data generation with ProbLog," in Proceedings of the 6th International Symposium on Information and Communication Technology (SoICT), Hue City, Vietnam, 2015, pp. 17-24.
- D. C. Ince, "The automatic generation of test data," The Computer Journal, vol. 30, no. 1, pp. 63-69, 1987. https://doi.org/10.1093/comjnl/30.1.63
- J. E. Hoag and C. W. Thompson, "A parallel general-purpose synthetic data generator," ACM SIGMOD Record, vol. 36, no. 1, pp. 19-24, 2007. https://doi.org/10.1145/1276301.1276305
- L. Burnett, K. Barlow-Stewart, A. L. Proos, and H. Aizenberg, "The 'GeneTrustee': a universal identification system that ensures privacy and confidentiality for human genetic databases," Journal of Law and Medicine, vol. 10, no. 4, pp. 506-513, 2003.
- H. Surendra and H. S. Mohan, "A review of synthetic data generation methods for privacy preserving data publishing," International Journal of Scientific & Technology Research, vol. 6, no. 3, pp. 95-101, 2017.
- A. Dandekar, R. A. M. Zen, and S. Bressan, "Comparative evaluation of synthetic data generation methods," 2017 [Online]. Available: https://sgcsc.sg/wp-content/uploads/sites/10/2020/05/RF-04.pdf.
- J. Fan, T. Liu, G. Li, J. Chen, Y. Shen, and X. Du, "Relational data synthesis using generative adversarial networks: a design space exploration," Proceedings of the VLDB Endowment, vol. 13, no. 11, pp. 1962-1975, 2020. https://doi.org/10.14778/3407790.3407802
- R. Malhotra and M. Garg, "An adequacy based test data generation technique using genetic algorithms," Journal of Information Processing Systems, vol. 7, no. 2, pp. 363-384, 2011. https://doi.org/10.3745/JIPS.2011.7.2.363
- S. Sabharwal and M. Aggarwal, "Test set generation for pairwise testing using genetic algorithms," Journal of Information Processing Systems, vol. 13, no. 5, pp. 1089-1102, 2017.
- P. Fisher, N. Aljohani, and J. Baek, "Generation of finite inductive, pseudo random, binary sequences," Journal of Information Processing Systems, vol. 13, no. 6, pp. 1554-1574, 2017. https://doi.org/10.3745/JIPS.01.0021
- J. Kwak and Y. Sung, "Path generation method of UAV autopilots using max-min algorithm," Journal of Information Processing Systems, vol. 14, no. 6, pp. 1457-1463, 2018.
- J. Gray, P. Sundaresan, S. Englert, K. Baclawski, and P. J. Weinberger, "Quickly generating billion-record synthetic databases," in Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data, Minneapolis, MN, 1994, pp. 243-252.
- J. M. Stephens and M. Poess, "MUDD: a multi-dimensional data generator," ACM SIGSOFT Software Engineering Notes, vol. 29, no. 1, pp. 104-109, 2004. https://doi.org/10.1145/974043.974060
- N. Bruno and S. Chaudhuri, "Flexible database generators," in Proceedings of the 31st International Conference on Very Large Data Bases (VLDB), Trondheim, Norway, 2005, pp. 1097-1107.
- R. Cox, "Regular expression matching can be simple and fast (but is slow in Java, Perl, PHP, Python, Ruby, ...)," 2007 [Online]. Available: https://swtch.com/~rsc/regexp/regexp1.html.
- M. D. McIlroy, "Enumerating the strings of regular languages," Journal of Functional Programming, vol. 14, no. 5, pp. 503-518, 2004. https://doi.org/10.1017/S0956796803004982
- K. Thompson, "Programming techniques: regular expression search algorithm," Communications of the ACM, vol. 11, no. 6, pp. 419-422, 1968. https://doi.org/10.1145/363347.363387
- M. Poess and C. Floyd, "New TPC benchmarks for decision support and web commerce," ACM SIGMOD Record, vol. 29, no. 4, pp. 64-71, 2000. https://doi.org/10.1145/369275.369291
- A. Crolotte and A. Ghazal, "Introducing skew into the TPC-H benchmark," in Topics in Performance Evaluation, Measurement and Characterization. Heidelberg, Germany: Springer, 2012, pp. 137-145.
- D. J. DeWitt, J. F. Naughton, D. A. Schneider, and S. Seshadri, "Practical skew handling in parallel joins," in Proceedings of the 18th International Conference on Very Large Data Bases (VLDB), Vancouver, Canada, 1992, pp. 27-40.
- E. Lo, N. Cheng, W. W. Lin, W. K. Hon, and B. Choi, "MyBenchmark: generating databases for query workloads," The VLDB Journal, vol. 23, pp. 895-913, 2014. https://doi.org/10.1007/s00778-014-0354-1
- M. O. Rabin and D. Scott, "Finite automata and their decision problems," IBM Journal of Research and Development, vol. 3, no. 2, pp. 114-125, 1959. https://doi.org/10.1147/rd.32.0114
- M. Cognetta, Y. S. Han, and S. C. Kwon, "Incremental computation of infix probabilities for probabilistic finite automata," in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 2018, pp. 2732-2741.
- GitHub, "ReverseRegex: use regular expressions to generate text strings," 2020 [Online]. Available: https://github.com/icomefromthenet/ReverseRegex.
- H. Ping, J. Stoyanovich, and B. Howe, "DataSynthesizer: privacy-preserving synthetic datasets," in Proceedings of the 29th International Conference on Scientific and Statistical Database Management, Chicago, IL, 2017, pp. 1-5.
- J. Drechsler, "Using support vector machines for generating synthetic datasets," in Privacy in Statistical Databases. Heidelberg, Germany: Springer, 2010, pp. 148-161.
- G. Caiola and J. P. Reiter, "Random forests for generating partially synthetic, categorical data," Transactions on Data Privacy, vol. 3, no. 1, pp. 27-42, 2010.
- X. Wu, Y. Wang, S. Guo, and Y. Zheng, "Privacy preserving database generation for database application testing," Fundamenta Informaticae, vol. 78, no. 4, pp. 595-612, 2007.
- J. Zhang, G. Cormode, C. M. Procopiuc, D. Srivastava, and X. Xiao, "PrivBayes: private data release via Bayesian networks," in Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, Snowbird, UT, 2014, pp. 1423-1434.
- N. C. Abay, Y. Zhou, M. Kantarcioglu, B. Thuraisingham, and L. Sweeney, "Privacy preserving synthetic data release using deep learning," in Machine Learning and Knowledge Discovery in Databases. Cham, Switzerland: Springer, 2019, pp. 510-526.