Impact of snowball sampling ratios on network characteristics estimation: A case study of Cyworld

(Korean title: Changes in Network Characteristics with Snowball Sampling Ratio: A Case Study of Cyworld)

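  • Haewoon Kwak (곽해운), Department of Computer Science, KAIST
  • Seungyeop Han (한승엽), Department of Computer Science, KAIST
  • Yong-Yeol Ahn (안용열), Department of Physics, KAIST
  • Sue Moon (문수복), Department of Computer Science, KAIST
  • Hawoong Jeong (정하웅), Department of Physics, KAIST
  • Published: 2006.10.20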

Abstract

Today's social networking services have tens of millions of users and are growing fast. Their sheer size makes capturing and analyzing their topological characteristics a significant challenge. Snowball sampling is a popular method for crawling and sampling network topologies, but it requires a high sampling ratio to estimate certain metrics accurately. In this work, we evaluate how closely the topological characteristics of snowball-sampled networks match those of the complete network. Instead of a synthetically generated topology, we use the complete topology of the Cyworld ilchon (friendship) network. Our goal is to determine the sampling ratios needed for accurate estimation of key topological characteristics, such as the degree distribution, the degree correlation, the assortativity, and the clustering coefficient.
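The evaluation described above compares metrics computed on snowball samples against the same metrics on the complete network. The following minimal Python sketch illustrates that comparison under stated assumptions. It assumes an undirected graph stored as a dict mapping each node to a set of neighbours, breadth-first expansion from a single seed node, and the induced subgraph on the sampled nodes as "the sample". The paper's actual crawling procedure may differ, and all function names here are hypothetical.

```python
from collections import Counter, defaultdict, deque


def snowball_sample(adj, ratio, seed_node):
    """BFS-style snowball sample (assumed variant): expand from seed_node
    until round(ratio * |V|) nodes are collected, then return the
    subgraph induced by the sampled nodes."""
    target = max(1, round(ratio * len(adj)))
    kept = {seed_node}
    queue = deque([seed_node])
    while queue and len(kept) < target:
        u = queue.popleft()
        for v in sorted(adj[u]):  # sorted only to make the sample reproducible
            if len(kept) >= target:
                break
            if v not in kept:
                kept.add(v)
                queue.append(v)
    # Keep only edges whose two endpoints were both sampled.
    return {u: {v for v in adj[u] if v in kept} for u in kept}


def degree_distribution(adj):
    """Fraction of nodes P(k) having each degree k."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    n = len(adj)
    return {k: c / n for k, c in sorted(counts.items())}


def knn_by_degree(adj):
    """Degree correlation as k_nn(k): mean neighbour degree, averaged
    over all nodes of degree k (isolated nodes are skipped)."""
    sums, counts = defaultdict(float), Counter()
    for nbrs in adj.values():
        k = len(nbrs)
        if k == 0:
            continue
        sums[k] += sum(len(adj[v]) for v in nbrs) / k
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in counts}


def degree_assortativity(adj):
    """Degree assortativity r: Pearson correlation of the degrees at the
    two ends of each edge. Every undirected edge is listed in both
    directions, so both ends share the same mean and variance."""
    xs, ys = [], []
    for u, nbrs in adj.items():
        for v in nbrs:
            xs.append(len(adj[u]))
            ys.append(len(adj[v]))
    m = len(xs)
    if m == 0:
        return float("nan")
    mean = sum(xs) / m
    exy = sum(x * y for x, y in zip(xs, ys)) / m
    var = sum(x * x for x in xs) / m - mean ** 2
    if abs(var) < 1e-12:  # regular graph: r is undefined
        return float("nan")
    return (exy - mean ** 2) / var


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0
    (one common convention)."""
    total = 0.0
    for u, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for v in nbrs for w in nbrs if v < w and w in adj[v])
        total += links / (k * (k - 1) / 2)
    return total / len(adj) if adj else 0.0
```

In use, one would compute each metric on the full graph and then on `snowball_sample(adj, ratio, seed)` for increasing values of `ratio`, and note the smallest ratio at which each sampled value stays close to the full-graph value. BFS order is what makes this a snowball sample rather than uniform node sampling: nodes near the seed are always collected first, which is the source of the bias the paper measures.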

Keywords