Text Mining and Visualization of Unstructured Data Using Big Data Analytical Tool R

  • Nam, Soo-Tai (Institute of General Education, Pusan National University) ;
  • Shin, Seong-Yoon (School of Computer Information & Communication Engineering, Kunsan National University) ;
  • Jin, Chan-Yong (Division of Information & Electronic Commerce, Wonkwang University)
  • Received: 2021.07.15
  • Reviewed: 2021.08.08
  • Published: 2021.09.30

Abstract

In the era of big data, it is important to analyze effectively not only structured data that is well organized in databases but also unstructured big data, such as web documents, e-mails, and social data, generated in real time on the Internet, social network services, and mobile environments. Big data analysis is the process of creating new value by discovering meaningful new correlations, patterns, and trends in big data held in data stores. This study uses R, a big data analysis tool, to run a frequency analysis on unstructured article data and to summarize and visualize the results. The data set consists of 104 papers published in the January to May 2021 issues of the Journal of the Korea Institute of Information and Communication Engineering. In the final results, the most frequently mentioned keyword was "data", ranking first with 1,538 occurrences. Based on these results, we present the limitations of the study and its theoretical and practical implications.
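To make the idea of keyword frequency analysis concrete, here is a minimal sketch in base R. It is an illustration only: the toy English corpus `docs`, the regex-based tokenization, and the bar chart are assumptions made for this example, since the abstract does not describe the packages or preprocessing steps the study actually used.

```r
# Toy corpus standing in for the article texts (hypothetical data,
# not the 104 papers analyzed in the study).
docs <- c(
  "big data analysis with R",
  "text mining of unstructured data",
  "data visualization and data analysis"
)

# Lower-case the text, then split on runs of non-letters to get word tokens.
tokens <- unlist(strsplit(tolower(docs), "[^a-z]+"))

# Drop empty strings and single-letter tokens (this also drops "r").
tokens <- tokens[nchar(tokens) > 1]

# Count occurrences of each token and sort from most to least frequent.
freq <- sort(table(tokens), decreasing = TRUE)

# Keep the five most frequent terms as a short summary.
top <- head(freq, 5)
print(top)

# Simple visualization of the top terms as a bar chart.
barplot(top, las = 2, main = "Top terms (toy corpus)")
```

On real Korean text, the tokenization step would need a morphological analyzer rather than a simple regular expression, and a word cloud could replace the bar chart; both choices are outside what the abstract specifies.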

Keywords
