Representation of Texts into String Vectors for Text Categorization

  • Jo, Tae-Ho (School of Computer and Information Engineering, Inha University)
  • Received : 2009.08.07
  • Accepted : 2010.02.16
  • Published : 2010.06.30

Abstract

In this study, we propose a method for encoding documents into string vectors instead of numerical vectors. Traditional approaches to text categorization usually require encoding documents into numerical vectors, and this encoding causes two main problems: huge dimensionality and sparse distribution. We therefore modify existing machine learning-based approaches to text categorization, or create new ones, so that they receive string vectors as input instead of numerical vectors. By avoiding these two problems, we can improve text categorization performance.
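The contrast between the two encodings can be illustrated with a minimal Python sketch. The abstract does not state how a string vector is built, so the rule used here (keep the `size` most frequent words of a document, in ranked order) is an assumption made for illustration only, and the function names `tokenize`, `numerical_vector`, and `string_vector` are hypothetical.

```python
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep purely alphabetic whitespace-separated tokens."""
    return [w for w in text.lower().split() if w.isalpha()]


def numerical_vector(text, vocabulary):
    """Bag-of-words encoding: one word count per vocabulary entry."""
    counts = Counter(tokenize(text))
    return [counts[w] for w in vocabulary]


def string_vector(text, size=3, pad=""):
    """Assumed string-vector encoding: the `size` most frequent words.

    Ties are broken alphabetically so the result is deterministic;
    short documents are padded with `pad`.
    """
    counts = Counter(tokenize(text))
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    top = ranked[:size]
    return top + [pad] * (size - len(top))


docs = [
    "stocks fell as markets reacted to rates",
    "the team won the match and the cup",
]

# The numerical encoding needs one dimension per distinct word in the corpus.
vocabulary = sorted({w for d in docs for w in tokenize(d)})

for d in docs:
    print(numerical_vector(d, vocabulary))  # length grows with the vocabulary
    print(string_vector(d, size=3))         # length stays fixed at 3
```

On a realistic corpus the vocabulary holds thousands of words, so each numerical vector is long and mostly zeros, while the string vector keeps a fixed, small length. Elements of a string vector are words rather than numbers, so a classifier that consumes them needs a word-level similarity measure in place of arithmetic distance, which is why the approaches have to be modified.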
