Analysis Method Study of Film Text using Word Vectors of Language Model

  • 고광호 (Department of Applied Artificial Intelligence, Sungkyunkwan University)
  • 백주련 (Department of Data and Information, Pyeongtaek University)
  • Received : 2024.08.22
  • Accepted : 2024.11.05
  • Published : 2024.11.30

Abstract

LSTM, a deep learning technique for building language models, can be trained easily on systems with modest computing resources, unlike large language models. In this paper, we propose an interdisciplinary technique that trains an LSTM-based language model on a small-scale text and then uses the word vectors of the text's vocabulary to perform an objective analysis of the meaning of, and relationships among, the text's main theme words. As the text, we used the English script of the 2021 film "The Green Knight," directed by David Lowery, and analyzed its main theme words with the word vectors of a small language model trained on that script. By computing similarities between word vectors and examining the words most similar to each theme word, the meaning and symbolism of each theme word can be analyzed objectively through their similarity scores. By plotting the word vectors after reducing them to two dimensions, the relationships among the theme words can be grasped intuitively. Because it relies on a small-scale LSTM language model, the proposed method analyzes complex texts with word vectors while keeping training costs to a minimum.
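
The similarity step described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes cosine similarity as the measure (a common choice for word vectors), and the words and 3-dimensional vectors in the hypothetical `toy_vectors` dictionary stand in for rows of the trained LSTM's embedding layer. The helper `most_similar` (also hypothetical) ranks every other vocabulary word by its score against a query theme word.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(word, vectors, topn=5):
    """Rank every other vocabulary word by cosine similarity to `word`."""
    query = vectors[word]
    scores = [(other, cosine_similarity(query, vec))
              for other, vec in vectors.items() if other != word]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]


# Hypothetical 3-dimensional vectors for illustration only; a trained
# language model would supply real ones, usually with many more dimensions.
toy_vectors = {
    "green":  [0.9, 0.1, 0.3],
    "knight": [0.8, 0.2, 0.4],
    "axe":    [0.7, 0.0, 0.5],
    "lady":   [0.1, 0.9, 0.2],
}

for neighbour, score in most_similar("green", toy_vectors, topn=3):
    print(f"{neighbour}: {score:.3f}")
```

Reading the ranked neighbours of each theme word, together with their scores, is what allows the interpretation of meaning and symbolism to rest on measured similarity rather than on impression alone.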
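
The abstract states that the word vectors are reduced to two dimensions before plotting but does not name the reduction method here. The sketch below assumes PCA (principal component analysis) and implements it with power iteration plus deflation in plain Python; the function name `pca_2d` and the toy vectors are hypothetical. A uniform start vector that happens to be orthogonal to a principal axis would stall the iteration, an edge case this sketch ignores.

```python
import math


def pca_2d(vectors, iterations=200):
    """Project row vectors onto their first two principal components.

    Power iteration with deflation on the sample covariance matrix stands in
    for a library PCA routine. Requires at least two input vectors.
    """
    n, d = len(vectors), len(vectors[0])
    means = [sum(row[j] for row in vectors) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in vectors]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]

    components = []
    for _ in range(2):
        # Uniform unit start vector; repeated multiplication by the covariance
        # matrix lets the dominant eigenvector take over.
        vec = [1.0 / math.sqrt(d)] * d
        for _ in range(iterations):
            nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm == 0.0:  # no variance left to capture
                break
            vec = [x / norm for x in nxt]
        # Rayleigh quotient: the variance along the found direction.
        eigval = sum(vec[i] * cov[i][j] * vec[j]
                     for i in range(d) for j in range(d))
        components.append(vec)
        # Deflation: remove this component so the next pass finds the next one.
        cov = [[cov[i][j] - eigval * vec[i] * vec[j] for j in range(d)]
               for i in range(d)]

    # 2-D coordinates of each centred vector along the two components.
    return [[sum(r[j] * c[j] for j in range(d)) for c in components]
            for r in centered]


# Hypothetical toy vectors; real ones would come from the trained model.
toy_vectors = {
    "green":  [0.9, 0.1, 0.3],
    "knight": [0.8, 0.2, 0.4],
    "axe":    [0.7, 0.0, 0.5],
    "lady":   [0.1, 0.9, 0.2],
}
words = list(toy_vectors)
for word, (x, y) in zip(words, pca_2d([toy_vectors[w] for w in words])):
    print(f"{word}: ({x:.3f}, {y:.3f})")
```

Power iteration keeps the example dependency-free; in practice a library routine such as scikit-learn's `PCA(n_components=2)` would normally replace it, and the resulting 2-D points would then be drawn as a labelled scatter plot so that theme words lying close together can be spotted at a glance.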
