Acknowledgement
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Korean Government (MSIT) (No. 2021R1F1A1049387). It was also supported by the "Regional Innovation Strategy (RIS)" program through the NRF, funded by the Ministry of Education (MOE) (2021RIS-002).