Twitter Crawling System

  • Received: 2015.09.20
  • Accepted: 2015.10.12
  • Published: 2015.09.30

Abstract

We live in an information age in which the Internet touches every aspect of our lives. It provides a wide range of services, each of which benefits people in different ways; e-mail, the File Transfer Protocol (FTP), voice/video communication, and search engines are prominent examples. Among these, Social Network Services (SNSs) have steadily gained popularity over the past years. The most popular SNSs, such as Facebook, Weibo, and Twitter, generate millions of data items every minute. Twitter is an SNS that allows its users to post short instant messages; in 2012, its 100 million users posted 340 million tweets per day [1]. Such a large volume of data often contains a great deal of noise, that is, uninteresting and unclassifiable content. Nevertheless, researchers can take advantage of this wealth of information to analyze it and extract meaningful and interesting features. SNS data, including tweets, are collected by crawlers, and Twitter crawlers have recently emerged as a useful tool for collecting Twitter data. In this project, we develop a Twitter crawler system that extracts Twitter data. We implemented the system in Java with a MySQL database, using Twitter4J, a Java library for communicating with the Twitter API. The application first connects to the Twitter API, then retrieves tweets, and finally stores them in the database. We also develop crawling strategies that extract tweets efficiently in terms of both time and volume.
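The connect, retrieve, and store loop described in the abstract can be illustrated with a small, offline Java sketch. This is not the authors' implementation: `TweetSource` is a hypothetical interface standing in for a Twitter4J client, `InMemoryStore` is a hypothetical stand-in for the MySQL table, and both the max_id paging and the per-window request budget are assumptions about what a time- and volume-aware crawling strategy might look like. The budget of 180 is purely illustrative; the actual per-window limits are the ones published on the Twitter rate-limit page [9].

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class TwitterCrawlerSketch {

    /** Minimal tweet record (hypothetical); Twitter4J's Status carries many more fields. */
    static final class Tweet {
        final long id;
        final String user;
        final String text;

        Tweet(long id, String user, String text) {
            this.id = id;
            this.user = user;
            this.text = text;
        }
    }

    /** Hypothetical stand-in for the Twitter search API, e.g. a wrapper around a Twitter4J client. */
    interface TweetSource {
        /** Returns up to count tweets matching query with id <= maxId, newest first. */
        List<Tweet> search(String query, long maxId, int count);
    }

    /** Hypothetical stand-in for the MySQL table, keyed by tweet id so re-crawled tweets are not duplicated. */
    static final class InMemoryStore {
        final Map<Long, Tweet> rows = new LinkedHashMap<>();

        void save(Tweet t) {
            rows.putIfAbsent(t.id, t);
        }
    }

    /**
     * Connect, retrieve, store: pages backwards through the results with a max_id cursor
     * and stops after requestBudget calls, mimicking a per-window rate limit.
     * Returns the number of requests actually made.
     */
    static int crawl(TweetSource source, InMemoryStore store, String query,
                     int pageSize, int requestBudget) {
        long maxId = Long.MAX_VALUE;
        int requests = 0;
        while (requests < requestBudget) {
            List<Tweet> page = source.search(query, maxId, pageSize);
            requests++;
            if (page.isEmpty()) {
                break; // nothing older is left to fetch
            }
            for (Tweet t : page) {
                store.save(t);
                maxId = Math.min(maxId, t.id - 1); // next page starts just below the oldest id seen
            }
        }
        return requests;
    }

    /** Fake source holding tweets with ids 1..n, used only to exercise crawl() without network access. */
    static TweetSource fakeSource(int n) {
        List<Tweet> all = new ArrayList<>();
        for (long i = 1; i <= n; i++) {
            all.add(new Tweet(i, "user" + (i % 3), "sample tweet " + i));
        }
        return (query, maxId, count) -> all.stream()
                .filter(t -> t.id <= maxId && t.text.contains(query))
                .sorted(Comparator.comparingLong((Tweet t) -> t.id).reversed())
                .limit(count)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        int requests = crawl(fakeSource(25), store, "tweet", 10, 180);
        System.out.println("requests=" + requests + ", stored=" + store.rows.size());
    }
}
```

Running `main` prints `requests=4, stored=25`: three pages of 10, 10, and 5 tweets, plus one empty response that ends the crawl. With a budget of 2 requests the crawl stops early after storing 20 tweets, which is the trade-off between time and amount that a crawling strategy has to manage. In a real deployment, `search` would wrap a Twitter4J search call and `save` would issue a JDBC `INSERT`; keying rows on the tweet id keeps repeated crawls from duplicating data.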

References

  1. https://en.wikipedia.org/, 2015
  2. Z. Xu, R. Lu, L. Xiang and Q. Yang, "Discovering User Interest on Twitter with a Modified Author-Topic Model," IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 422-429, 2011.
  3. X. Wang, L. Tokarchuk, F. Cuadrado and S. Poslad, "Exploiting Hashtags for Adaptive Microblog Crawling," IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pp. 311-315, 2013.
  4. Y. Kim and K. Shim, "TWITOBI: A Recommendation System for Twitter Using Probabilistic Modeling," 11th IEEE International Conference on Data Mining, pp. 340-349, 2011.
  5. M. Yang and H. Rim, "Identifying interesting Twitter contents using topical analysis," Expert Systems with Applications, Vol. 41, pp. 4330-4336, 2014. https://doi.org/10.1016/j.eswa.2013.12.051
  6. M. Yigit, B. Bilgin and A. Karahoca, "Extended topology based recommendation system for unidirectional social networks," Expert Systems with Applications, Vol. 42, pp. 3653-3661, 2015. https://doi.org/10.1016/j.eswa.2014.12.043
  7. S. Saif, Y. He, Z. Fernandez and H. Alani, "Contextual semantics for sentiment analysis of Twitter," Information Processing and Management, 2015.
  8. L. Cagliero, T. Cerquitelli, P. Garza and Grimaudo, "Twitter data analysis by means of Strong Flipping Generalized Itemsets," Journal of Systems and Software, Vol. 94, pp. 16-29, 2014. https://doi.org/10.1016/j.jss.2014.03.060
  9. https://dev.twitter.com/rest/public/rate-limits, 2015