References
- Y. J. Kim, H. S. Kim, and H. S. Kim, "Understanding the Effects of COVID-19 on the Starbucks Perception through Big Data Analytics: A Comparative Study," Culinary Science & Hospitality Research, vol. 27, no. 6, pp. 276-279, 2021.
- Y. R. Suh, K. P. Koh, and J. W. Lee, "An analysis of the change in media's reports and attitudes about face masks during the COVID-19 pandemic in South Korea: a study using Big Data latent dirichlet allocation (LDA) topic modelling," Journal of the Korea Institute of Information and Communication Engineering, vol. 25, no. 5, pp. 731-740, 2021. https://doi.org/10.6109/JKIICE.2021.25.5.731
- C. H. Lee, K. H. Kang, Y. H. Kim, H. N. Lim, J. H. Ku, and K. H. Kim, "A Study on the Factors of Well-aging through Big Data Analysis: Focusing on Newspaper Articles," Journal of the Korea Academia-Industrial cooperation Society, vol. 22, no. 5, pp. 354-360, 2021. https://doi.org/10.5762/KAIS.2021.22.5.354
- J. H. Lee, "Building an SNS Crawling System Using Python," Journal of the Korea Industrial Information Systems Research, vol. 23, no. 5, pp. 61-76, 2018. https://doi.org/10.9723/JKSIIS.2018.23.5.061
- C. Kohlschütter, P. Fankhauser, and W. Nejdl, "Boilerplate detection using shallow text features," in Proceedings of the third ACM international conference on Web Search and Data Mining (WSDM), New York, NY, pp. 441-450, 2010.
- W. M. Song and M. G. Kim, "Contents Extraction from HTML Documents using Text Block Context," Journal of KISS : Software and Applications, vol. 40, no. 3, pp. 155-163, 2013.
- H. G. Jeon and C. Koh, "Text Extraction Algorithm using the HTML Logical Structure Analysis," Journal of Digital Contents Society, vol. 16, no. 3, pp. 445-455, 2015. https://doi.org/10.9728/DCS.2015.16.3.445
- J. H. Mo and J. M. Yum, "Korean Web Content Extraction using Tag Rank Position and Gradient Boosting," Journal of KIISE, vol. 44, no. 6, pp. 581-586, 2017. https://doi.org/10.5626/JOK.2017.44.6.581
- S. Wu, J. Liu, and J. Fan, "Automatic Web Content Extraction by Combination of Learning and Grouping," in Proceedings of the 24th International Conference on World Wide Web (WWW '15), pp. 1264-1274, 2015.
- S. H. Kim and H. J. Kim, "Logistic Regression Ensemble Method for Extracting Significant Information from Social Texts," KIPS Transactions on Software and Data Engineering, vol. 6, no. 5, pp. 279-284, 2017. https://doi.org/10.3745/KTSDE.2017.6.5.279
- T. Vogels, O. E. Ganea, and C. Eickhoff, "Web2text: Deep structured boilerplate removal," in Proceedings of the 40th European Conference on Information Retrieval, pp. 167-179, 2018.
- J. Leonhardt, A. Anand, and M. Khosla, "Boilerplate Removal using a Neural Sequence Labeling Model," in Companion Proceedings of the Web Conference 2020 (WWW '20), New York, NY, pp. 226-229, 2020.
- J. H. Kim and E. G. Kim, "HTML Text Extraction Using Frequency Analysis," Journal of the Korea Institute of Information and Communication Engineering, vol. 25, no. 9, 2021.
- A. Tharwat, "Classification assessment methods," Applied Computing and Informatics, vol. 17, no. 1, pp. 168-192, 2021. https://doi.org/10.1016/j.aci.2018.08.003