Evaluation of Similarity Analysis of Newspaper Article Using Natural Language Processing

  • Ayako Ohshiro (Department of Business Administration, Okinawa International University)
  • Takeo Okazaki (Faculty of Engineering, University of the Ryukyus)
  • Takashi Kano (Graduate School of Economics, Hitotsubashi University)
  • Shinichiro Ueda (Department of Clinical Research and Quality Management, Graduate School of Medicine, University of the Ryukyus, Nishihara)
  • Received : 2024.06.05
  • Published : 2024.06.30

Abstract

Comparing text features requires evaluating the similarity between texts, and the choice of similarity measure is crucial. This study assessed the similarity between newspaper articles using several techniques, including deep learning and a previously proposed method that combines Pointwise Mutual Information (PMI) and Word Pair Matching (WPM), denoted PMI+WPM. For performance comparison, Japanese law data related to medical research served as validation data for the PMI+WPM method. The comparative analysis revealed that the distribution of similarities varies with the evaluation technique and the genre of the text. For newspaper data, non-deep learning methods evaluated similarity more accurately than deep learning methods. In addition, evaluating similarity is more challenging for law data than for newspaper articles. Although deep learning is the prevalent approach to evaluating textual similarity, this study demonstrates that non-deep learning methods can be effective for Japanese-language texts.
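
As a rough illustration of the PMI component named above, the sketch below computes the standard pointwise mutual information, PMI(x, y) = log2(p(x, y) / (p(x) p(y))), for word pairs that co-occur within a document. It is only a minimal sketch under stated assumptions: the function name `pmi_table`, the toy corpus, and the use of document frequencies to estimate probabilities are all hypothetical choices, and the input is assumed to be already tokenized (Japanese text would first need a morphological analyzer). It does not reproduce the WPM step or the authors' PMI+WPM method.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(docs):
    """Return {(w1, w2): PMI} for word pairs that co-occur in at least one document.

    Probabilities are estimated from document frequencies, which is an
    assumption of this sketch and not necessarily the paper's estimator.
    """
    n_docs = len(docs)
    word_df = Counter()  # number of documents containing each word
    pair_df = Counter()  # number of documents containing both words of a pair
    for tokens in docs:
        vocab = sorted(set(tokens))  # sorted so each pair has one canonical order
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))

    table = {}
    for (w1, w2), n_xy in pair_df.items():
        p_xy = n_xy / n_docs
        p_x = word_df[w1] / n_docs
        p_y = word_df[w2] / n_docs
        table[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return table


# Toy, pre-tokenized corpus (hypothetical data for illustration only).
docs = [
    ["clinical", "research", "law"],
    ["clinical", "research", "ethics"],
    ["newspaper", "article", "law"],
    ["newspaper", "article", "economy"],
]
table = pmi_table(docs)
```

In this toy corpus, pairs that always appear together (such as "clinical" and "research") receive a positive PMI, while a pair that co-occurs only as often as chance would predict (such as "clinical" and "law") receives a PMI of zero. Pairs that never co-occur are simply absent from the table.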
