User Oriented clustering of news articles using Tweets Heterogeneous Information Network

  • Received : 2013.10.31
  • Accepted : 2013.12.02
  • Published : 2013.12.31

Abstract

With the emergence of the World Wide Web, and Web 2.0 in particular, the rapidly growing volume of news articles has made it difficult for users to select articles that match their requirements. To overcome this problem, various clustering mechanisms have been proposed to categorize news articles broadly. However, these techniques are entirely machine-oriented and give users no part in deciding cluster membership. To address this lack of user participation, this paper proposes a framework that clusters news articles by combining the articles with the judgments that users post about them on Twitter. We use Twitter hashtags for this purpose. Furthermore, we compute each user's credibility from how frequently their tweets are retweeted, in order to improve the accuracy of the cluster membership function. To test the performance of the proposed methodology, we performed experiments on tweets posted during the 2013 general election in Pakistan. Our results support our claim that incorporating users' judgments yields better outcomes than ordinary clustering algorithms.
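The abstract does not give the credibility or membership formulas, so the following is only a minimal sketch of the general idea under stated assumptions: credibility is taken to be a user's mean retweets per tweet, normalized by the highest such mean, and an article's membership in a hashtag-labelled cluster is taken to be the credibility-weighted share of votes for that hashtag. The function names, data layout, and both formulas are hypothetical illustrations, not the authors' method.

```python
from collections import defaultdict


def user_credibility(retweet_counts):
    """Assumed credibility score in [0, 1] per user.

    retweet_counts maps a user to a list of retweet counts, one per tweet.
    The score is the user's mean retweets per tweet divided by the largest
    mean among all users. Users with no tweets are omitted.
    """
    means = {u: sum(c) / len(c) for u, c in retweet_counts.items() if c}
    top = max(means.values(), default=0)
    return {u: (m / top if top else 0.0) for u, m in means.items()}


def article_memberships(tweets, credibility):
    """Assumed fuzzy membership of each article in each hashtag cluster.

    tweets is an iterable of (user, article_id, hashtags) tuples.
    Each hashtag receives a vote weighted by the tweeting user's
    credibility; votes are normalized so memberships sum to 1 per article.
    """
    votes = defaultdict(lambda: defaultdict(float))
    for user, article, hashtags in tweets:
        weight = credibility.get(user, 0.0)  # unknown users carry no weight
        for tag in hashtags:
            votes[article][tag.lower()] += weight  # hashtags compared case-insensitively

    memberships = {}
    for article, tag_votes in votes.items():
        total = sum(tag_votes.values())
        if total > 0:
            memberships[article] = {t: v / total for t, v in tag_votes.items()}
    return memberships


# Toy data, invented purely for illustration.
retweets = {"alice": [10, 30], "bob": [5], "carol": []}
cred = user_credibility(retweets)
toy_tweets = [
    ("alice", "a1", ["#Election2013"]),
    ("bob", "a1", ["#Cricket"]),
    ("bob", "a1", ["#election2013"]),
]
result = article_memberships(toy_tweets, cred)
```

In this toy run the more frequently retweeted user dominates the vote, so article `a1` leans toward the `#election2013` cluster even though two of the three tweets come from the less credible user. A real system would also need the textual similarity between articles, which this sketch leaves out.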
