Finding Top-k Answers in Node Proximity Search Using Distribution State Transition Graph

  • Park, Jaehui (SW & Contents Research Laboratory, ETRI)
  • Lee, Sang-Goo (Computer Science and Engineering Department, Seoul National University)
  • Received : 2015.03.11
  • Accepted : 2016.04.04
  • Published : 2016.08.01

Abstract

Graph data processing has received considerable attention in recent years. Computing node proximity efficiently is one of the most challenging problems in applications such as recommendation systems and social networks. For large-scale, mutable datasets and user queries, top-k query processing has gained significant interest. This paper presents a novel method for finding top-k answers in a node proximity search based on a well-known measure, Personalized PageRank (PPR). First, we introduce a distribution state transition graph (DSTG) to depict the iterative steps for solving the PPR equation. Second, we propose a weight distribution model of a DSTG to capture the states of intermediate PPR scores and their distribution. Using a DSTG, we can selectively follow and compare multiple random paths of different lengths to find the most promising nodes. Moreover, we prove that the results of our method are equivalent to the PPR results. Comparative performance studies using two real datasets clearly show that our method is practical and accurate.
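
For readers unfamiliar with PPR, the baseline that a top-k method must match can be sketched as a plain power iteration followed by a top-k selection. This is not the DSTG method of the paper, only the standard iterative solution of the PPR equation that the abstract refers to. The function names, the adjacency-dictionary input format, the restart probability of 0.15, and the handling of dangling nodes (their weight is returned to the source) are all assumptions made for illustration.

```python
import heapq


def personalized_pagerank(graph, source, restart=0.15, tol=1e-10, max_iter=1000):
    """Approximate PPR scores for `source` by power iteration.

    `graph` maps each node to a list of its out-neighbors; every node that
    appears as a neighbor is assumed to also appear as a key (hypothetical
    input format chosen for this sketch).
    """
    nodes = list(graph)
    scores = {v: 0.0 for v in nodes}
    scores[source] = 1.0  # all weight starts at the query node

    for _ in range(max_iter):
        nxt = {v: 0.0 for v in nodes}
        nxt[source] += restart  # restart step: jump back to the source
        for u in nodes:
            out = graph[u]
            if not out:
                # Assumption: a dangling node sends its weight to the source.
                nxt[source] += (1.0 - restart) * scores[u]
                continue
            share = (1.0 - restart) * scores[u] / len(out)
            for v in out:
                nxt[v] += share  # walk step: split weight over out-edges
        delta = sum(abs(nxt[v] - scores[v]) for v in nodes)
        scores = nxt
        if delta < tol:  # stop once the L1 change is below the tolerance
            break
    return scores


def top_k(graph, source, k, **kwargs):
    """Return the k highest-scoring (node, score) pairs, excluding the source."""
    scores = personalized_pagerank(graph, source, **kwargs)
    candidates = ((v, s) for v, s in scores.items() if v != source)
    return heapq.nlargest(k, candidates, key=lambda pair: pair[1])


# Small illustrative graph; node "d" cannot be reached from "a".
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranking = top_k(graph, "a", 2)
```

This baseline computes a score for every node before ranking, which is the cost that top-k methods aim to avoid by exploring only the most promising nodes.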
