Efficiently Processing Skyline Query on Multi-Instance Data

  • Chiu, Shu-I (Computer Science, National Chengchi University)
  • Hsu, Kuo-Wei (Computer Science, National Chengchi University)
  • Received : 2017.05.16
  • Accepted : 2017.07.20
  • Published : 2017.10.31

Abstract

Related to the maximum vector problem, a skyline query discovers the dominating tuples in a set of tuples, each of which describes an object (such as a hotel) in several dimensions (such as the price and the distance to the beach). A tuple, which is an instance of an object, dominates another tuple if it is equally good or better in all dimensions and strictly better in at least one dimension. Traditionally, skyline queries are defined upon single-instance data, that is, upon objects each of which is associated with a single instance. However, in some cases, an object is associated not with a single instance but with multiple instances. For example, on a review website, many users assign scores to a product or a service, and each user's score is an instance of the object representing that product or service. Such data is an example of multi-instance data. Unlike most (if not all) other work, which considers the traditional setting, we consider skyline queries defined upon multi-instance data. We define the dominance calculation and propose an algorithm to reduce its computational cost. We use synthetic and real data to evaluate the proposed methods, and the results demonstrate their utility.
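The dominance relation described above can be illustrated with a minimal Python sketch. It assumes that smaller values are better in every dimension (as with price and distance to the beach). The function names `dominates`, `skyline`, and `dominance_fraction` and the sample data are hypothetical. In particular, `dominance_fraction` is only one simple way to compare two multi-instance objects (the share of instance pairs in which the first object's instance dominates the second's); it is an illustrative assumption, not the dominance calculation or the algorithm proposed in the paper.

```python
from itertools import product
from typing import List, Sequence

Instance = Sequence[float]


def dominates(a: Instance, b: Instance) -> bool:
    """Return True if a is at least as good as b in all dimensions
    and strictly better in at least one (assumption: smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: List[Instance]) -> List[Instance]:
    """Naive O(n^2) skyline on single-instance data: keep every tuple
    that no other tuple dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def dominance_fraction(obj_a: List[Instance], obj_b: List[Instance]) -> float:
    """Hypothetical multi-instance comparison (not the paper's definition):
    the fraction of instance pairs (a, b) in which a dominates b."""
    pairs = list(product(obj_a, obj_b))
    return sum(dominates(a, b) for a, b in pairs) / len(pairs)


# Made-up hotels described as (price, distance to the beach).
hotels = [(50, 8), (60, 2), (80, 1), (70, 5), (90, 9)]
print(skyline(hotels))  # → [(50, 8), (60, 2), (80, 1)]
```

Even under this simple pairwise comparison, two objects with m and n instances require m × n tuple comparisons, which suggests why reducing the cost of the dominance calculation matters in the multi-instance setting.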
