Table based Single Pass Algorithm for Clustering News Articles

  • Jo, Tae-Ho (School of Computer and Information Science, Inha University)
  • Published : 2008.09.01

Abstract

This research proposes a modified version of the single pass algorithm specialized for text clustering. Encoding documents as numerical vectors, as the traditional single pass algorithm requires, causes two main problems: huge dimensionality and sparse distribution. To address these two problems, this research modifies the single pass algorithm so that documents are encoded not as numerical vectors but in another form. In the proposed version, documents are mapped into tables, and an operation on two tables is defined so that the single pass algorithm can be applied to them. The goal of this research is to improve the performance of the single pass algorithm for text clustering by specializing it in this way.
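
The abstract does not give the table layout or the table operation, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes each table maps words to raw frequencies, that the operation on two tables is a normalized overlap of shared word frequencies, and that a cluster is summarized by the merged table of its members. The names `to_table`, `table_similarity`, `single_pass`, and the `threshold` value are hypothetical and are not taken from the paper.

```python
from collections import Counter


def to_table(text):
    """Map a document to a word -> frequency table (assumed encoding)."""
    return Counter(text.lower().split())


def table_similarity(a, b):
    """Assumed operation on two tables: shared weight over total weight.

    Returns a value in [0, 1]; 1.0 for identical tables, 0.0 when no
    word is shared.
    """
    shared = sum(min(a[w], b[w]) for w in a.keys() & b.keys())
    total = sum(a.values()) + sum(b.values())
    return 2.0 * shared / total if total else 0.0


def single_pass(docs, threshold=0.3):
    """Cluster documents in one pass over the collection.

    Each document joins the most similar existing cluster if the
    similarity reaches `threshold`; otherwise it starts a new cluster.
    """
    clusters = []  # each cluster: {"table": Counter, "members": [indices]}
    for i, text in enumerate(docs):
        table = to_table(text)
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = table_similarity(table, cluster["table"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(i)
            best["table"].update(table)  # merge word counts into the cluster table
        else:
            clusters.append({"table": table, "members": [i]})
    return [cluster["members"] for cluster in clusters]
```

With four short documents such as "stock market falls", "stock market rises", "football team wins match", and "football team loses match", this sketch groups the first two and the last two together at the default threshold. Because the tables hold only the words that actually occur in a document, no fixed-length vector over the whole vocabulary is needed, which is the dimensionality and sparsity concern the abstract raises.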
