Figure 2.1. The process of finding an elbow point.
Figure 3.1. Performance comparison of methods M1–M6 across all categories.
Table 2.1. IGM calculation example
Table 2.2. Document-term weighted matrix generation schemes M1–M6
Table 3.1. PDF files and terms of periodical publications by institute
Table 3.2. Performance comparison of methods M1–M6 for individual categories