
Semantic-Based K-Means Clustering for Microblogs Exploiting Folksonomy

  • Heu, Jee-Uk (Dept. of Computer Science and Engineering, Hanyang University)
  • Received: 2018.07.17
  • Reviewed: 2018.08.25
  • Published: 2018.12.31

Abstract

Recently, with the development of Internet technologies and the spread of smart devices, the use of microblogs such as Facebook, Twitter, and Instagram has been increasing rapidly. Many users check microblogs for new information because the content on their timelines is continually updated. Clustering algorithms are therefore needed to organize microblog content into groups for users who want the newest information. However, microblog posts are subject to length limits, so each post carries too little information for content clustering to analyze. In this paper, we propose a semantic-based K-means clustering algorithm that measures not only the similarity between posts represented in a vector space model but also the semantic similarity between them by exploiting TagCluster. Experimental results on the RepLab2013 Twitter dataset show the effectiveness of the semantic-based K-means clustering algorithm.
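The abstract does not spell out how the two similarities are combined, so the following Python sketch is only an illustration under stated assumptions: posts are already tokenized, the vector space model uses raw term frequencies, the hypothetical `TAG_CLUSTER` dictionary stands in for the folksonomy-derived TagCluster, and the two cosine similarities are mixed with a hypothetical weight `alpha`. It is not the author's implementation.

```python
from collections import Counter
from math import sqrt

# HYPOTHETICAL TagCluster table: maps a word to the id of the folksonomy
# tag cluster it belongs to. In the paper this comes from folksonomy data;
# this toy dictionary only stands in for it.
TAG_CLUSTER = {
    "iphone": "apple", "ios": "apple", "macbook": "apple",
    "galaxy": "samsung", "note": "samsung", "android": "samsung",
}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def term_vector(tokens):
    """Vector space model: raw term frequencies of a tokenized post."""
    return dict(Counter(tokens))


def tag_vector(tokens):
    """Semantic view: how many tokens of the post fall in each tag cluster."""
    return dict(Counter(TAG_CLUSTER[t] for t in tokens if t in TAG_CLUSTER))


def mean(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for vec in vectors:
        total.update(vec)  # Counter.update adds weights key by key
    return {t: w / len(vectors) for t, w in total.items()}


def similarity(doc, centroid, alpha):
    """Assumed scoring rule: alpha * VSM cosine + (1 - alpha) * tag cosine."""
    return (alpha * cosine(doc[0], centroid[0])
            + (1 - alpha) * cosine(doc[1], centroid[1]))


def semantic_kmeans(posts, k, alpha=0.5, max_iter=20):
    """Cluster tokenized posts; returns one cluster label per post."""
    docs = [(term_vector(p), tag_vector(p)) for p in posts]
    centroids = docs[:k]  # deterministic seeding: the first k posts
    labels = None
    for _ in range(max_iter):
        # Assignment step: pick the most similar centroid (ties -> lowest index).
        new_labels = [max(range(k), key=lambda c: similarity(d, centroids[c], alpha))
                      for d in docs]
        if new_labels == labels:
            break  # assignments are stable, so the loop has converged
        labels = new_labels
        # Update step: recompute both halves of each non-empty cluster's centroid.
        for c in range(k):
            members = [d for d, label in zip(docs, labels) if label == c]
            if members:
                centroids[c] = (mean([m[0] for m in members]),
                                mean([m[1] for m in members]))
    return labels


posts = [
    ["new", "iphone", "released"],
    ["galaxy", "note", "battery"],
    ["ios", "update", "today"],
    ["android", "update", "today"],
]
print(semantic_kmeans(posts, k=2, alpha=0.5))  # [0, 1, 0, 1]
```

In this toy run, the third and fourth posts share only the words "update" and "today". With `alpha=0.5` the tag clusters separate them into the apple and samsung groups, whereas with `alpha=1.0` (term vectors only) both fall into cluster 0, giving `[0, 1, 0, 0]`. A real setup would likely use TF-IDF weights and a better seeding strategy such as k-means++, which this sketch omits for brevity.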

Keywords


Fig. 1. System architecture.


Fig. 2. Comparison of semantic-based and original K-means results.


Fig. 3. Ratio of each semantic-based K-means cluster result (K=20).

Table 1. Results of semantic-based K-means clustering (K=20)

