Semantic-Based K-Means Clustering for Microblogs Exploiting Folksonomy

Heu, Jee-Uk;

doi:10.3745/JIPS.04.0097

Journal of Information Processing Systems

Vol. 14, No. 6 / Pages 1438-1444 / 2018 / 1976-913X (pISSN) / 2092-805X (eISSN)

Korea Information Processing Society

Heu, Jee-Uk (Dept. of Computer Science and Engineering, Hanyang University)

Received: 2018.07.17
Reviewed: 2018.08.25
Published: 2018.12.31

https://doi.org/10.3745/JIPS.04.0097

Abstract

Recently, with the development of Internet technologies and the spread of smart devices, the use of microblogs such as Facebook, Twitter, and Instagram has been increasing rapidly. Many users check microblogs for new information because the content on their timelines is continually updated. Clustering algorithms are therefore needed to arrange microblog content into groups for users who want the newest information. However, microblog posts have word limits, so a single post does not contain enough information to analyze for content clustering. In this paper, we propose a semantic-based K-means clustering algorithm that measures not only the similarity between data represented in a vector space model but also the semantic similarity between the data by exploiting TagCluster for clustering. Experimental results on the RepLab2013 Twitter dataset show the effectiveness of the semantic-based K-means clustering algorithm.
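
The idea in the abstract, combining vector-space similarity with a semantic similarity during K-means assignment, can be illustrated with a minimal Python sketch. This is not the paper's implementation: the `TAG_CLUSTER` dictionary is a hypothetical stand-in for the TagCluster resource, and the Jaccard overlap and the `alpha` weighting are assumptions chosen only to make the example concrete.

```python
import math
from collections import Counter

# Hypothetical stand-in for a TagCluster lookup (word -> concept id).
TAG_CLUSTER = {
    "iphone": "tech", "android": "tech", "laptop": "tech",
    "goal": "sports", "match": "sports", "league": "sports",
}

def tf_vector(text):
    """Term-frequency vector of a short post."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def concepts(vec):
    """Concept ids covered by the words of a vector."""
    return {TAG_CLUSTER[w] for w in vec if w in TAG_CLUSTER}

def jaccard(a, b):
    """Set overlap, used here as an assumed semantic similarity."""
    return len(a & b) / len(a | b) if a | b else 0.0

def combined(doc, centroid, alpha):
    """Assumed weighting: alpha * lexical + (1 - alpha) * semantic."""
    sem = jaccard(concepts(doc), concepts(centroid))
    return alpha * cosine(doc, centroid) + (1 - alpha) * sem

def semantic_kmeans(texts, seeds, alpha=0.5, iters=10):
    """K-means variant that assigns by the combined similarity."""
    docs = [tf_vector(t) for t in texts]
    centroids = [docs[i] for i in seeds]
    labels = [-1] * len(docs)
    for _ in range(iters):
        new_labels = [
            max(range(len(centroids)),
                key=lambda k: combined(d, centroids[k], alpha))
            for d in docs
        ]
        if new_labels == labels:  # converged
            break
        labels = new_labels
        for k in range(len(centroids)):
            members = [d for d, l in zip(docs, labels) if l == k]
            if members:  # centroid = mean term vector of its members
                total = Counter()
                for m in members:
                    total.update(m)
                centroids[k] = Counter(
                    {w: c / len(members) for w, c in total.items()})
    return labels

texts = [
    "new iphone released",
    "android laptop deal",
    "late goal wins match",
    "league match tonight",
]
labels = semantic_kmeans(texts, seeds=[2, 0], alpha=0.5)
lexical_only = semantic_kmeans(texts, seeds=[2, 0], alpha=1.0)
print(labels)        # [1, 1, 0, 0]
print(lexical_only)  # [1, 0, 0, 0]
```

In this toy data, "android laptop deal" shares no words with either seed post, so cosine similarity alone cannot place it; the shared concept id lets the combined score group it with the other technology post. That is the short-text sparsity problem the abstract describes.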

Fig. 1. System architecture.

Fig. 2. Comparison of semantic-based and original K-means results.

Fig. 3. Ratio of each semantic-based K-means cluster result (K=20).

Table 1. Results of semantic-based K-means clustering (K=20)

