Query Optimization on Large Scale Nested Data with Service Tree and Frequent Trajectory

  • Wang, Li (Basic Courses Department, Shanghai Institute of Tourism)
  • Wang, Guodong (Teaching Affairs Office, Shanghai Institute of Tourism)
  • Received : 2020.09.15
  • Accepted : 2020.12.04
  • Published : 2021.02.28

Abstract

Query applications over nested data, the most common form of data representation on the web, are becoming more widely used, precise queries in particular. MapReduce, a distributed architecture for parallel computation, is well suited to big data processing. In practice, however, query requests usually arrive concurrently, which creates a processing bottleneck on the server. To address this problem, this paper first combines a column storage structure with an inverted index to build an index for nested data on MapReduce. On this basis, it proposes an optimization strategy that combines a query execution service tree with frequent sub-query trajectories to reduce the response time of frequent queries and further improve the efficiency of multi-user concurrent queries on large-scale nested data. Experiments show that the method greatly improves the efficiency of nested data queries.
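
To make the indexing idea concrete, the following is a minimal single-process sketch, not the authors' MapReduce implementation. It assumes nested records are flattened into dotted column paths, that the inverted index maps each (path, value) pair to the set of record ids containing it, and that a sub-query is treated as "frequent" once it has been seen a fixed number of times. The names `flatten`, `NestedIndex`, and `hot_threshold` are hypothetical, and the service tree itself is not modeled here.

```python
from collections import Counter, defaultdict


def flatten(record, prefix=""):
    """Yield (column_path, value) pairs for every leaf of a nested record."""
    if isinstance(record, dict):
        for key, child in record.items():
            yield from flatten(child, f"{prefix}.{key}" if prefix else key)
    elif isinstance(record, list):
        # Repeated fields share the path of their parent column.
        for child in record:
            yield from flatten(child, prefix)
    else:
        yield prefix, record


class NestedIndex:
    """Inverted index over column paths, with a cache for frequent sub-queries."""

    def __init__(self, records, hot_threshold=2):
        # (column_path, value) -> ids of the records that contain that pair
        self.postings = defaultdict(set)
        for rid, rec in enumerate(records):
            for path, value in flatten(rec):
                self.postings[(path, value)].add(rid)
        self.hits = Counter()   # how often each sub-query has been evaluated
        self.cache = {}         # results of sub-queries considered frequent
        self.hot_threshold = hot_threshold  # assumed cut-off, not from the paper

    def query(self, predicates):
        """Return ids of records matching every (path, value) predicate."""
        key = frozenset(predicates)
        if key in self.cache:
            return self.cache[key]
        sets = [self.postings.get(p, set()) for p in key]
        result = frozenset(set.intersection(*sets)) if sets else frozenset()
        self.hits[key] += 1
        if self.hits[key] >= self.hot_threshold:
            self.cache[key] = result
        return result


records = [
    {"user": {"name": "ann", "tags": ["a", "b"]}, "city": "sh"},
    {"user": {"name": "bob", "tags": ["b"]}, "city": "sh"},
]
index = NestedIndex(records)
matches = index.query([("city", "sh"), ("user.tags", "b")])
```

In this sketch a precise query is a conjunction of equality predicates, so answering it reduces to intersecting posting sets. Caching the result of a repeated predicate set stands in, very loosely, for reusing the results of frequent sub-queries across concurrent users.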

Acknowledgement

The authors gratefully acknowledge financial support from the Cultivating Academic Key Teacher project of the Shanghai Institute of Tourism (No. E3-0250-20-001-031).
