Attention-based word correlation analysis system for big data analysis

  • Chi-Gon Hwang (Department of Computer Engineering, IIT, Kwangwoon University) ;
  • Chang-Pyo Yoon (Department of Computer & Mobile Convergence, GyeongGi University of Science and Technology) ;
  • Soo-Wook Lee (Glocal Education Center, Kwangwoon University)
  • Received : 2022.12.04
  • Accepted : 2022.12.27
  • Published : 2023.01.31

Abstract

With recent advances in machine learning, big data analysis can draw on a wide variety of techniques. However, big data collected in the real world lacks automated refinement techniques that consolidate identical or similar terms based on semantic analysis of the relationships between words. Because most big data is written as ordinary sentences, the meaning of those sentences and the terms within them are difficult to grasp. Solving this problem requires morphological analysis of sentences and an understanding of their meaning. Natural language processing (NLP) techniques address this by capturing both the relationships between words and the meaning of sentences. Among NLP techniques, the Transformer was proposed to overcome the disadvantages of RNN-based seq2seq models by replacing recurrence with self-attention in an encoder-decoder structure. In this paper, Transformers are used to form associations between words in order to understand the words and phrases of sentences extracted from big data.
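The self-attention mechanism the abstract refers to can be sketched minimally as follows. This is an illustrative example, not the authors' implementation: the learned query/key/value projection matrices of the full Transformer are omitted, and the embedding matrix is used directly as queries, keys, and values. The resulting attention weight matrix can be read as a rough word-association score between every pair of words in a sentence.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X):
    """Scaled dot-product self-attention (Vaswani et al., 2017),
    simplified so that Q = K = V = X.

    X: (n_words, d) matrix of word embeddings.
    Returns (output, weights); weights[i, j] can be interpreted as
    the strength of association between word i and word j."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)       # pairwise similarity, scaled by sqrt(d)
    weights = softmax(scores, axis=-1)  # each row is a distribution over words
    return weights @ X, weights

# Toy example: 4 "words" with 8-dimensional embeddings
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
out, w = self_attention(X)
print(w.shape)  # (4, 4): one association distribution per word
```

In the full Transformer, X would instead be projected through learned matrices W_Q, W_K, and W_V before the dot product, and multiple attention heads would run in parallel; the core computation, however, is the one shown above.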

Acknowledgement

This work was supported by the intramural academic research fund of Kwangwoon University in 2022.
