Design of a Ranking Framework for Folksonomy Sites: A Semantic Graph-based Approach

A Folksonomy Ranking Framework: A Semantic Graph-based Approach

  • Park, Hyun-Jung (Institute of Management Research, Seoul National University)
  • Rho, Sang-Kyu (Graduate School of Business, Seoul National University)
  • Received: 2011.02.18
  • Reviewed: 2011.06.28
  • Published: 2011.06.30

Abstract

In collaborative tagging systems such as Delicious.com and Flickr.com, users assign keywords, or tags, to the resources they upload, such as bookmarks and pictures, for future use or for sharing. The collection of resources and tags generated by a user is called a personomy, and the collection of all personomies constitutes the folksonomy. The most significant need of folksonomy users is to efficiently find useful resources or experts on specific topics. An excellent ranking algorithm would assign higher rankings to more useful resources or experts. What resources are considered useful in a folksonomic system? Does a standard superior to frequency or freshness exist? A resource recommended by more users with more expertise should be worthy of attention. This ranking paradigm can be implemented through a graph-based ranking algorithm. Two well-known representatives of such a paradigm are PageRank by Google and HITS (Hypertext Induced Topic Selection) by Kleinberg. Both PageRank and HITS assign a higher evaluation score to pages that are linked to by more highly scored pages. HITS differs from PageRank in that it uses two kinds of scores: authority scores and hub scores. The ranking objects of these algorithms are limited to Web pages, whereas the ranking objects of a folksonomic system are heterogeneous (i.e., users, resources, and tags). Therefore, uniformly applying the link-based voting notion of PageRank and HITS to a folksonomy would be unreasonable. In a folksonomic system, each link corresponding to a property can point in either direction, depending on whether the property is expressed in the active or the passive voice (e.g., "user tags resource" versus "resource is tagged by user"). The current research stems from the idea that a graph-based ranking algorithm could be applied to a folksonomic system using the concept of mutual interactions between entities, rather than the voting notion of PageRank or HITS.
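To make the voting notion concrete, the following Python sketch runs Kleinberg's HITS by power iteration on a small, made-up user-to-resource graph. The node names and the choice of edge direction are illustrative assumptions, not data from the study. Because every edge is read as a vote from a user to a resource, only resources can receive authority and only users can receive hub scores; writing the same property in the passive voice (resource to user) would swap those roles, which is the direction sensitivity discussed above.

```python
import math


def _unit(scores):
    """Normalise a score dictionary to unit L2 length."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {k: v / norm for k, v in scores.items()}


def hits(edges, iterations=50):
    """Kleinberg's HITS by power iteration.

    edges: list of directed (source, target) pairs, read as "source votes for target".
    Returns (authority, hub) dictionaries, each normalised to unit L2 length.
    """
    nodes = {node for edge in edges for node in edge}
    auth = dict.fromkeys(nodes, 1.0)
    hub = dict.fromkeys(nodes, 1.0)
    for _ in range(iterations):
        # Authority of t: sum of hub scores of the nodes that point to t.
        new_auth = dict.fromkeys(nodes, 0.0)
        for s, t in edges:
            new_auth[t] += hub[s]
        auth = _unit(new_auth)
        # Hub of s: sum of authority scores of the nodes that s points to.
        new_hub = dict.fromkeys(nodes, 0.0)
        for s, t in edges:
            new_hub[s] += auth[t]
        hub = _unit(new_hub)
    return auth, hub


# Hypothetical bookmarks written in the active voice: user -> resource.
edges = [("u1", "r1"), ("u2", "r1"), ("u3", "r1"), ("u1", "r2"), ("u2", "r3")]
auth, hub = hits(edges)
print(max(auth, key=auth.get))  # r1
```

In this toy graph every user ends with an authority score of exactly zero, so a user-expertise ranking would have to come from the hub scores alone, and only because of the direction in which the edges happened to be written.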
The concept of mutual interactions, proposed for ranking Semantic Web resources, enables the calculation of importance scores for various resources independently of link direction. The weight of a property representing the mutual interaction between classes is assigned according to the relative significance of that property to the importance of resources in each class. This class-oriented approach is based on the fact that the Semantic Web contains many heterogeneous classes, so applying a different appraisal standard to each class is more reasonable. This resembles the way humans evaluate things: different items are assigned specific weights, and the weighted item scores are then summed into a weighted average. Missing properties can also be detected more easily with this approach than with predicate-oriented approaches. A user of a tagging system usually assigns more than one tag to the same resource, and there can be more than one tag with the same subject and object. When many users assign similar tags to the same resource, the users need to be graded differently depending on the order in which they made the assignment. This idea comes from studies in psychology in which expertise involves the ability to select the information most relevant to achieving a goal. An expert should be someone who not only has a large collection of documents annotated with a particular tag, but also tends to add high-quality documents to his or her collection. Such documents are identified by the number, as well as the expertise, of the users who have the same documents in their collections. In other words, the expertise of a user and the quality of a document reinforce each other. In addition, there is a need to rank the entities that are closely related to a given entity. Because the popularity of a topic in social media is temporary, recent data should carry more weight than old data.
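The mutual reinforcement between user expertise and document quality, with extra credit for users who add a document early, can be sketched as a HITS-style loop over a weighted user-document matrix. This is a minimal illustration under stated assumptions, not the authors' algorithm: the credit function (1 plus the number of users who added the same document later), the function names, and the sample assignments are all hypothetical, and each (user, document) pair is assumed to appear once.

```python
import math
from collections import defaultdict


def earliness_credit(assignments):
    """Map each (user, doc) pair to a hypothetical credit score.

    assignments: list of (user, doc, timestamp); each (user, doc) pair is
    assumed to appear once. Credit = 1 + number of users who added the same
    doc later, so earlier discoverers earn more.
    """
    by_doc = defaultdict(list)
    for user, doc, ts in assignments:
        by_doc[doc].append((ts, user))
    credit = {}
    for doc, entries in by_doc.items():
        entries.sort()  # oldest assignment first
        n = len(entries)
        for pos, (_, user) in enumerate(entries):
            credit[(user, doc)] = 1.0 + (n - 1 - pos)
    return credit


def _l2_normalise(scores):
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {k: v / norm for k, v in scores.items()}


def expertise_and_quality(assignments, iterations=50):
    """HITS-style mutual reinforcement: experts hold good documents early,
    and good documents are held (early) by experts."""
    credit = earliness_credit(assignments)
    users = {u for u, _ in credit}
    docs = {d for _, d in credit}
    expertise = dict.fromkeys(users, 1.0)
    quality = dict.fromkeys(docs, 1.0)
    for _ in range(iterations):
        new_quality = dict.fromkeys(docs, 0.0)
        for (u, d), c in credit.items():
            new_quality[d] += c * expertise[u]
        quality = _l2_normalise(new_quality)
        new_expertise = dict.fromkeys(users, 0.0)
        for (u, d), c in credit.items():
            new_expertise[u] += c * quality[d]
        expertise = _l2_normalise(new_expertise)
    return expertise, quality


# Made-up tagging events for one tag: (user, document, time step).
assignments = [
    ("alice", "d1", 1), ("bob", "d1", 2), ("carol", "d1", 3),
    ("carol", "d2", 1),
]
expertise, quality = expertise_and_quality(assignments)
```

With this credit function an early discoverer's credit grows every time another user follows, so such expertise weights keep rising as the data ages; this is one way to see the tension with time-valued ranking described in the comparison with the previous HITS-based algorithm.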
We propose a comprehensive folksonomy ranking framework that addresses all of these considerations and can easily be customized to each folksonomy site for ranking purposes. To examine the validity of our ranking algorithm and to show the mechanism for adjusting property, time, and expertise weights, we first use a dataset designed to analyze the effect of each ranking factor independently. We then show the ranking results for a real folksonomy site with the ranking factors combined. Because the ground truth of a given dataset is not known when it comes to ranking, we inject simulated data, whose ranking results can be predicted, into the real dataset and compare the ranking results of our algorithm with those of a previous HITS-based algorithm. Our semantic ranking algorithm, based on the concept of mutual interaction, appears preferable to the HITS-based algorithm as a flexible folksonomy ranking framework. The concrete differences are as follows. First, with the time concept applied to the property weights, our algorithm performs better at lowering the scores of older data and raising the scores of newer data. Second, by applying the time concept to the expertise weights as well as to the property weights, our algorithm controls the conflicting influence of the expertise weights and improves the overall consistency of time-valued ranking. The expertise weights of the previous study can act as an obstacle to time-valued ranking because the number of followers increases over time. Third, many new properties and classes can be included in our framework. The previous HITS-based algorithm, being based on the voting notion, loses ground when the domain consists of more than two classes, or when other important properties, such as "sent through twitter" or "registered as a friend," are added to the domain. Fourth, there is a large difference in calculation time and memory use between the two algorithms.
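The abstract does not state the form of its time weight, so the sketch below assumes, purely for illustration, an exponential decay controlled by a hypothetical half-life parameter `half_life_days`, multiplied into a property weight to form a link weight. The function names and the sample bookmarks are likewise hypothetical. It shows only the intended effect: a resource with fewer but recent bookmarks can outrank one with more but older bookmarks.

```python
def time_weight(age_days, half_life_days=30.0):
    """Hypothetical exponential time weight: halves every half_life_days."""
    return 0.5 ** (age_days / half_life_days)


def link_weight(property_weight, age_days, half_life_days=30.0):
    """Hypothetical link weight: a class-specific property weight scaled by recency."""
    return property_weight * time_weight(age_days, half_life_days)


def decayed_popularity(bookmarks, now, property_weight=1.0, half_life_days=30.0):
    """Sum time-weighted 'bookmarked' links per resource.

    bookmarks: list of (resource, day_bookmarked); now: the current day.
    """
    scores = {}
    for resource, day in bookmarks:
        w = link_weight(property_weight, now - day, half_life_days)
        scores[resource] = scores.get(resource, 0.0) + w
    return scores


# Made-up data: 'old' has four bookmarks from day 0, 'new' has two from day 170.
bookmarks = [("old", 0)] * 4 + [("new", 170)] * 2
raw = {}
for resource, _ in bookmarks:
    raw[resource] = raw.get(resource, 0) + 1
decayed = decayed_popularity(bookmarks, now=180)
print(raw)      # plain counts favour 'old'
print(decayed)  # time-weighted scores favour 'new'
```

A half-life parameterisation is only one convenient choice; any monotonically decreasing function of age would produce the same qualitative reordering.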
While the multiplication of two matrices has to be executed twice for the previous HITS-based algorithm, this is unnecessary with our algorithm. In our ranking framework, various folksonomy ranking policies can be expressed by combining the ranking factors, and our approach works even if the folksonomy site is not implemented with Semantic Web languages. Above all, the time weight proposed in this paper will be applicable to various domains where time value is considered important, including social media.
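The direction independence of the mutual-interaction idea, and the point that it does not require Semantic Web languages, can be illustrated on plain `(subject, property, object)` tuples. The sketch below is a hypothetical formulation, not the paper's equations: each triple lets both endpoints draw score from each other, each contribution is scaled by a class-specific property weight (the `prop_weight` values are assumptions), and a PageRank-style damping term, borrowed here for stability, keeps the iteration well behaved. In this sketch each iteration is a single pass over every node's incident links, and reversing every triple leaves the scores unchanged; users and resources both receive scores from the same pass.

```python
from collections import defaultdict


def mutual_interaction_rank(triples, node_class, prop_weight,
                            damping=0.85, iterations=100):
    """Hypothetical direction-independent ranking over (subject, property, object) tuples.

    node_class:  {node: class name}
    prop_weight: {(class name, property): weight}, i.e. how much a property
                 contributes to the importance of nodes of that class.
    """
    # Store each link on both endpoints so the written direction is irrelevant.
    neighbours = defaultdict(list)
    for s, p, o in triples:
        neighbours[s].append((o, p))
        neighbours[o].append((s, p))
    nodes = sorted(neighbours)
    n = len(nodes)
    score = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(iterations):
        new = {}
        for node in nodes:
            cls = node_class[node]
            # Neighbours' scores, weighted per (class of this node, property).
            inflow = sum(prop_weight.get((cls, p), 0.0) * score[other]
                         for other, p in neighbours[node])
            new[node] = (1 - damping) / n + damping * inflow
        total = sum(new.values())
        score = {k: v / total for k, v in new.items()}  # keep scores summing to 1
    return score


triples = [
    ("alice", "bookmarked", "r1"),
    ("bob", "bookmarked", "r1"),
    ("carol", "bookmarked", "r1"),
    ("carol", "bookmarked", "r2"),
]
node_class = {"alice": "User", "bob": "User", "carol": "User",
              "r1": "Resource", "r2": "Resource"}
prop_weight = {("User", "bookmarked"): 1.0, ("Resource", "bookmarked"): 1.0}

forward = mutual_interaction_rank(triples, node_class, prop_weight)
# The same facts written in the opposite ("passive") direction.
reversed_triples = [(o, p, s) for s, p, o in triples]
backward = mutual_interaction_rank(reversed_triples, node_class, prop_weight)
print(sorted(forward, key=forward.get, reverse=True))
```

Lowering a weight such as `("Resource", "bookmarked")` changes only how much that property counts toward resource importance, which mirrors the class-oriented weighting described in the abstract without claiming to reproduce it.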

Keywords

Funding Information

Research project host institution: Seoul National University

References

  1. Abel, F., Henze, N., and Krause, D., "Analyzing Ranking Algorithms in Folksonomy Systems," Technical Report, L3S Research Center, 2008.
  2. Abel, F., Henze, N., and Krause, D., "Ranking in Folksonomy Systems: Can Context Help?," Proceedings of the 17th ACM Conference on Information and Knowledge Management, 2008, pp. 1429-1430.
  3. Bao, S., Wu, S., Fei, B., Xue, G., Su, Z., and Yu, Y., "Optimizing Web Search Using Social Annotations," WWW, Banff, Alberta, Canada, 2007.
  4. Brin, S. and Page, L., "The Anatomy of a Large-Scale Hypertextual Web Search Engine," Computer Networks and ISDN Systems, Vol. 30, No. 1-7, 1998, pp. 107-119. https://doi.org/10.1016/S0169-7552(98)00110-X
  5. Dom, B., Eiron, I., Cozzi, A., and Zhang, Y., "Graph-based Ranking Algorithms for Email Expertise Analysis," In Proc. of ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, USA, 2003, pp. 42-48.
  6. Feltovich, P.J., Prietula, M.J., and Ericsson, K.A., "Studies of Expertise from Psychological Perspectives," In the Cambridge Handbook of Expertise and Expert Performance, Cambridge University Press, USA, 2006, pp. 41-68.
  7. Hammond, T., Hannay, T., Lund, B., and Scott, J., "Social Bookmarking Tools (I): A General Review," D-Lib Magazine, Vol. 11, No. 4, 2005.
  8. Haveliwala, T.H., "Topic-Sensitive PageRank: A Context-Sensitive Ranking Algorithm for Web Search," IEEE Transactions on Knowledge and Data Engineering, Vol. 15, No. 4, 2003, pp. 784-796. https://doi.org/10.1109/TKDE.2003.1208999
  9. Haveliwala, T.H., "Efficient Computation of PageRank," Unpublished Manuscript, Stanford University, 1999.
  10. Hayes, C. and Avesani, P., "Using Tags and Clustering to Identify Topic-relevant Blogs," In International Conference on Weblogs and Social Media, 2007.
  11. Hayes, C., Avesani, P., and Veeramachaneni, S., "An Analysis of the Use of Tags in a Blog Recommender System," In Proceedings of the IJCAI, 2007.
  12. Hotho, A., Jaschke, R., Schmitz, C., and Stumme, G., "Information Retrieval in Folksonomies: Search and Ranking," In York Sure and John Domingue (Eds.), The Semantic Web: Research and Applications, LNAI, Heidelberg, Springer, Vol. 4011, 2006, pp. 411-426.
  13. Hotho, A., Jaschke, R., Schmitz, C., and Stumme, G., "FolkRank: A Ranking Algorithm for Folksonomies," In Proc. of FGIR 2006.
  14. Kendall, M.G., "A New Measure of Rank Correlation," Biometrika, Vol. 30, No. 1-2, 1938, pp. 81-93. https://doi.org/10.1093/biomet/30.1-2.81
  15. Kleinberg, J., "Authoritative Sources in a Hyperlinked Environment," Journal of the ACM, Vol. 46, No. 5, 1999, pp. 604-632. https://doi.org/10.1145/324133.324140
  16. Klyne, G. and Carroll, J. (Eds.), "Resource Description Framework (RDF): Concepts and Abstract Syntax," W3C Recommendation, 2004.
  17. Manola, F. and Miller, E. (Eds.), "RDF Primer," W3C Recommendation, 2004.
  18. Mika, P., "Ontologies are us: A Unified Model of Social Networks and Semantics," Journal of Web Semantics, Vol. 5, No. 1, 2007, pp. 5-15. https://doi.org/10.1016/j.websem.2006.11.002
  19. Noll, M.G. and Meinel, C., "Exploring Social Annotations for Web Document Classification," In Proc. of ACM Symposium on Applied Computing, Fortaleza, Brazil, 2008, pp. 2315-2320.
  20. Noll, M.G., Yeung, C.A., Gibbins, N., Meinel, C., and Shadbolt, N., "Telling Experts from Spammers: Expertise Ranking in Folksonomies," In SIGIR: Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, New York, NY, USA: ACM, 2009, pp. 612-619.
  21. Orlicki, J.I., Fierens, P.I., and Alvarez-Hamelin, J.I., "Faceted Ranking in Collaborative Tagging Systems: Efficient Algorithms for Ranking Users Based on a Set of Tags," In INSTICC Press, WEBIST 2009: Proceedings of the 5th International Conference on Web Information Systems and Technologies, Lisboa, Portugal, 2009, pp. 626-633.
  22. Page, L., Brin, S., Motwani, R., and Winograd, T., "The PageRank Citation Ranking: Bringing Order to the Web," Technical Report, Stanford University, 1998.
  23. Park, H., Rho, S., and Park, J., "A Link-Based Ranking Algorithm for Semantic Web Resources: A Class-Oriented Approach Independent of Link Direction," Journal of Database Management, Vol. 22, No. 1, 2011, pp. 1-25.
  24. Wang, J., Chen, Z., Tao, L., Ma, W.-Y., and Wenyin, L., "Ranking User's Relevance to a Topic through Link Analysis on Web Logs," In WIDM: Proceedings of the 4th International Workshop on Web Information and Data Management, USA, 2002, pp. 49-54.
  25. Wetzker, R., Zimmermann, C., and Bauckhage, C., "Analyzing Social Bookmarking Systems: A Del.icio.us Cookbook," In Proc. of Mining Social Data Workshop, 2008, pp. 26-30.
  26. Zhang, J., Ackerman, M.S., and Adamic, L., "Expertise Networks in Online Communities: Structure and Algorithms," In Proc. of WWW Conference, Banff, Canada, 2007, pp. 221-230.
  27. Zhou, D., Orshanskiy, S.A., Zha, H., and Giles, C.L., "Co-ranking Authors and Documents in a Heterogeneous Network," In Proc. of 7th IEEE International Conference on Data Mining, Washington, USA, 2007, pp. 739-744.