• Title/Summary/Keyword: R text mining


BERT-based Classification Model for Korean Documents (한국어 기술문서 분석을 위한 BERT 기반의 분류모델)

  • Hwang, Sangheum;Kim, Dohyun
    • The Journal of Society for e-Business Studies / v.25 no.1 / pp.203-214 / 2020
  • Technical documents such as patents and R&D project reports need to be classified in order to understand trends in technology convergence, interdisciplinary joint research, technology development, and so on. Text mining techniques have mainly been used to classify such documents. However, classifying technical documents with text mining algorithms has the disadvantage that the features representing the documents must be extracted manually. In this study, we propose a BERT-based document classification model that automatically extracts document features from the text of national R&D projects and classifies them. We then verify the applicability and performance of the proposed model for document classification.

Methodology for Issue-related R&D Keywords Packaging Using Text Mining (텍스트 마이닝 기반의 이슈 관련 R&D 키워드 패키징 방법론)

  • Hyun, Yoonjin;Shun, William Wong Xiu;Kim, Namgyu
    • Journal of Internet Computing and Services / v.16 no.2 / pp.57-66 / 2015
  • Considerable research effort is being directed toward analyzing unstructured data such as text files and log files using commercial and noncommercial analytical tools. In particular, researchers are trying to extract meaningful knowledge through text mining not only in business but also in many other areas such as politics, economics, and cultural studies. For instance, several studies have examined national pending issues by analyzing large volumes of text on various social issues. However, it is difficult to provide information services that can successfully identify R&D documents on specific national pending issues. Although users may specify keywords relating to a national pending issue, they often fail to retrieve appropriate R&D information, primarily because of discrepancies between their terms and the terms actually used in R&D documents. An intermediate logic is therefore needed to overcome these discrepancies and to identify and package appropriate R&D information on specific national pending issues. To address this requirement, this study proposes three methodologies: a hybrid methodology for extracting and integrating keywords pertaining to national pending issues, a methodology for packaging R&D information that corresponds to those issues, and a methodology for constructing an associative issue network based on relevant R&D information. Data analysis techniques such as text mining, social network analysis, and association rule mining are used to establish these methodologies. In the experiment, the keyword enhancement rate achieved by the proposed integration methodology was about 42.8%. For the second objective, three key analyses were conducted and a number of association rules between national pending issue keywords and R&D keywords were derived. The experiment for the third objective, issue clustering based on R&D keywords, is still in progress and is expected to yield tangible results in the future.
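
The association rules between issue keywords and R&D keywords mentioned above rest on three standard measures: support, confidence, and lift. The sketch below is a minimal Python illustration of one-to-one rules (A -> B) over keyword sets; it is not the authors' procedure, the keyword sets are hypothetical, and the function name `pair_rules` and its thresholds are assumptions chosen for the example. In R, this kind of mining is usually done with the arules package.

```python
from itertools import permutations


def pair_rules(transactions, min_support=0.2, min_confidence=0.5):
    """Mine one-to-one rules A -> B, returning (A, B, support, confidence, lift)."""
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def support(itemset):
        # Fraction of transactions that contain every item in itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    rules = []
    for a, b in permutations(items, 2):
        s_ab = support({a, b})
        if s_ab < min_support:
            continue
        confidence = s_ab / support({a})  # P(B | A)
        if confidence < min_confidence:
            continue
        lift = confidence / support({b})  # > 1 means A and B co-occur more than chance
        rules.append((a, b, s_ab, confidence, lift))
    return rules


# Hypothetical documents, each reduced to a set of issue and R&D keywords.
docs = [
    {"fine dust", "air filter", "sensor"},
    {"fine dust", "air filter"},
    {"fine dust", "sensor"},
    {"low birth rate", "childcare"},
]
for a, b, s, c, l in pair_rules(docs):
    print(f"{a} -> {b}: support={s:.2f}, confidence={c:.2f}, lift={l:.2f}")
```

Only single-item rules are enumerated here to keep the example short; a full Apriori search would also grow larger itemsets level by level.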

Exploring Information Ethics Issues based on Text Mining using Big Data from Web of Science (Web of Science 빅데이터를 활용한 텍스트 마이닝 기반의 정보윤리 이슈 탐색)

  • Kim, Han Sung
    • The Journal of Korean Association of Computer Education / v.22 no.3 / pp.67-78 / 2019
  • The purpose of this study is to explore information ethics issues based on academic big data from Web of Science (WoS) and to provide implications for information ethics education in the informatics subject. To this end, 318 papers related to information ethics published in WoS were text mined. Specifically, this paper analyzed keyword frequencies (TF, DF, TF-IDF), information ethics issues using topic modeling, and the frequency of appearance of each issue by year. The R packages 'tm' and 'topicmodels' were used for text mining. The main results are as follows. First, TF-IDF analysis confirmed that 'digital', 'student', 'software', and 'privacy' were the main keywords. Second, topic modeling identified eight issues, including 'Professional value', 'Cyber-bullying', and 'AI and Social Impact', of which 'Professional value' and 'Cyber-bullying' accounted for relatively high proportions. Based on these results, this study discusses implications for information ethics education in Korea.
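
The TF, DF, and TF-IDF measures listed above can be made concrete with a short sketch. This Python version uses one common variant (raw term count multiplied by ln(N/df)); R's tm package provides its own weighting functions whose log base and normalization may differ, so treat this as an illustration of the idea rather than a reproduction of the paper's numbers. The three-document corpus is hypothetical.

```python
import math
from collections import Counter


def tf_df_tfidf(docs):
    """Return per-document TF, corpus DF, and per-document TF-IDF (raw tf * ln(N/df))."""
    tf = [Counter(doc.lower().split()) for doc in docs]
    # Each Counter has one key per distinct term, so this counts documents, not tokens.
    df = Counter(term for counts in tf for term in counts)
    n = len(docs)
    tfidf = [
        {term: count * math.log(n / df[term]) for term, count in counts.items()}
        for counts in tf
    ]
    return tf, df, tfidf


# Hypothetical three-document corpus.
docs = ["information privacy software", "information digital digital", "information student"]
tf, df, tfidf = tf_df_tfidf(docs)
print(df["information"], tfidf[0]["information"])  # 3 0.0
```

A term that occurs in every document ("information" here) gets a TF-IDF weight of zero, which is why TF-IDF highlights distinctive keywords better than raw frequency does.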

Topic Modeling on Research Trends of Industry 4.0 Using Text Mining (텍스트 마이닝을 이용한 4차 산업 연구 동향 토픽 모델링)

  • Cho, Kyoung Won;Woo, Young Woon
    • Journal of the Korea Institute of Information and Communication Engineering / v.23 no.7 / pp.764-770 / 2019
  • In this research, text mining techniques were used to analyze papers related to the "4th Industry". A total of 685 papers were collected by searching the Korea Citation Index (KCI) for the keyword "4th industry" over the period 2016 to 2019. A Python-based web scraping program was used to collect the papers, and topic modeling based on the LDA algorithm, implemented in R, was used for data analysis. Perplexity analysis of the collected papers indicated that nine topics were optimal, and nine representative topics were then extracted using Gibbs sampling. The results confirmed that artificial intelligence, big data, the Internet of Things (IoT), digital, network, and similar technologies have emerged as the major technologies, and that research has been conducted on the changes these technologies bring to various fields related to the 4th industry, such as industry, government, education, and jobs.
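
LDA topic extraction with Gibbs sampling, as described above, can be sketched as a collapsed Gibbs sampler: each token's topic is repeatedly resampled from its full conditional given all other assignments. The pure-Python version below is a minimal sketch under assumed hyperparameters (`alpha`, `beta`, `iters`) and a hypothetical toy corpus; it is not the R implementation the authors used, and it omits the perplexity-based choice of the number of topics.

```python
import random


def lda_gibbs(docs, k, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)
    ndk = [[0] * k for _ in docs]            # topic counts per document
    nkw = [[0] * n_vocab for _ in range(k)]  # word counts per topic
    nk = [0] * k                             # total tokens per topic
    z = []                                   # topic assignment of every token

    # Random initial assignment of each token to a topic.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            t = rng.randrange(k)
            z[d].append(t)
            ndk[d][t] += 1
            nkw[t][word_id[w]] += 1
            nk[t] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, t = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][t] -= 1
                nkw[t][v] -= 1
                nk[t] -= 1
                # Full conditional p(z = j | all other assignments), unnormalized.
                weights = [
                    (ndk[d][j] + alpha) * (nkw[j][v] + beta) / (nk[j] + n_vocab * beta)
                    for j in range(k)
                ]
                t = rng.choices(range(k), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = t
                ndk[d][t] += 1
                nkw[t][v] += 1
                nk[t] += 1

    # The three highest-count words in each topic.
    top_words = [
        [vocab[v] for v in sorted(range(n_vocab), key=lambda v: -nkw[j][v])[:3]]
        for j in range(k)
    ]
    return ndk, nkw, top_words


# Hypothetical toy corpus of tokenized abstracts.
docs = [["ai", "data", "ai", "data"], ["iot", "network", "iot"],
        ["ai", "data"], ["network", "iot", "network"]]
ndk, nkw, top_words = lda_gibbs(docs, k=2)
print(top_words)
```

The document-length term of the conditional is dropped because it is the same for every candidate topic, so it does not change the sampling weights.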

Discovering the Knowledge Structure of Graphene Technology by Text Mining National R&D Projects and Newspapers (국가R&D과제와 신문에서 텍스트마이닝을 통한 그래핀 기술의 지식구조 탐색)

  • Lee, Ji-Yeon;Na, Hye-In;Lee, Byeong-Hee;Kim, Tae-Hyun
    • The Journal of the Korea Contents Association / v.21 no.2 / pp.85-99 / 2021
  • Graphene, called the "dream material", is drawing attention as a groundbreaking new material that will lead the era of the 4th Industrial Revolution. Graphene has high strength, excellent electrical and thermal conductivity, excellent optical transmittance, and excellent gas barrier properties. With the South Korean government's recent announcement of the Green New Deal and Digital New Deal policies, this paper analyzes graphene technology, which is also attracting attention for its application to COVID-19 biosensors, to understand its national R&D trends and knowledge structure and to explore its potential applications. First, 4,054 national R&D projects from the last 10 years were collected from the National Science & Technology Information Service (NTIS) to analyze trends in graphene-related R&D. In addition, projects classified as green technology are analyzed in relation to the government's Green New Deal policy. Second, text mining analysis was conducted on 500 recent graphene-related articles collected from e-newspapers. According to the analysis, the field with the largest number of projects was high-efficiency secondary battery technology, which also had the highest share of total research funds. South Korea is expected to lead the development of graphene technology and become a world leader in diverse industries including electric vehicles, cellular phone batteries, next-generation semiconductors, 5G, and biosensors.

'Economic Security' Discourse Analysis Using Text Mining (텍스트 마이닝을 활용한 '경제안보' 담론 분석)

  • Jungjoo Oh;Yeram Lim;Hyesu Cheon;Wonhyung Park
    • Proceedings of the Korea Information Processing Society Conference / 2024.05a / pp.513-516 / 2024
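  • As the U.S.-China competition for technological hegemony intensifies, economic security has emerged as a core element of national security. Major countries are pursuing legislation and policies based on the concepts of economic security they have each adopted. In Korea, however, the concept of economic security remains unclear. This study therefore aims to identify the discourse on economic security in domestic news big data and to lay the groundwork for a Korean conceptualization of economic security. News articles related to economic security were collected through BIGKinds and analyzed using text mining, specifically TF-IDF analysis and LDA topic modeling. As a result, three main topics were derived, and a dual structure of economic security was identified. This study is expected to serve as baseline material for conceptualizing a Korean model of economic security and for developing strategies for it.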

The Analysis of Research Trends in Technology to the Fourth Industrial Revolution using SNA (소셜 네트워크 분석을 이용한 4차 산업혁명 기술 분야의 연구 동향 분석)

  • Kim, Hong-Gwang;Ahn, Jong-Wook
    • Journal of Cadastre & Land InformatiX / v.49 no.1 / pp.113-121 / 2019
  • Fourth industrial revolution technology focuses on the fusion of infrastructure with various city-related advanced technologies, so technical cooperation across many research fields is essential. To promote fourth industrial revolution technologies, it is necessary to study the state of technology in various fields. This paper therefore analyzes domestic and foreign research trends in fourth industrial revolution technology using SNA and text mining of websites. We collected the text and date data of research papers and reports published on websites over five years, from January 1, 2014 to December 31, 2018. Next, we derived the major keywords in the public data through morphological analysis, and then analyzed the core and related keyword lists through SNA. In Korea, the focus is on R&D and legal/institutional solutions related to fourth industrial revolution technology; abroad, by contrast, the focus is on practical technologies for detailed aspects of urban services.
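
The keyword SNA step described above typically links keywords that appear in the same document and then ranks them by a centrality measure. The Python sketch below builds such a co-occurrence network and computes normalized degree centrality; it assumes the keywords have already been extracted (Korean morphological analysis is out of scope), the keyword lists are hypothetical, and degree centrality is only one of several measures an SNA study might use.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(docs_keywords):
    """Build a weighted keyword co-occurrence network and its degree centrality."""
    edges = Counter()
    for keywords in docs_keywords:
        # Every unordered pair of distinct keywords in the same document is an edge.
        for a, b in combinations(sorted(set(keywords)), 2):
            edges[(a, b)] += 1

    nodes = sorted({k for keywords in docs_keywords for k in keywords})
    neighbors = {node: set() for node in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Degree centrality: share of the other nodes each node is linked to.
    denom = max(len(nodes) - 1, 1)
    centrality = {node: len(neighbors[node]) / denom for node in nodes}
    return edges, centrality


# Hypothetical keywords extracted from three documents.
docs = [["smart city", "big data", "iot"], ["smart city", "law"], ["big data", "iot"]]
edges, centrality = cooccurrence_network(docs)
print(max(centrality, key=centrality.get))  # smart city
```

Edge weights (how often a pair co-occurs) are kept in `edges` but ignored by degree centrality; a weighted measure such as strength would sum them instead.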

Text Mining Analysis Technique on ECDIS Accident Report (텍스트 마이닝 기법을 활용한 ECDIS 사고보고서 분석)

  • Lee, Jeong-Seok;Lee, Bo-Kyeong;Cho, Ik-Soon
    • Journal of the Korean Society of Marine Environment & Safety / v.25 no.4 / pp.405-412 / 2019
  • SOLAS requires ECDIS to be installed on ships of more than 500 gross tonnage engaged in international navigation by the first inspection after July 1, 2018. Since ECDIS was introduced as a major new navigation instrument, several accidents related to its use have occurred. Twelve incident reports issued by MAIB, BSU, BEAmer, DMAIB, and DSB were analyzed, and the causes of the accidents were found to be related to the navigating officer's operation and to the ECDIS system. The text was analyzed in R to quantify the words related to the causes of the accidents. Text mining techniques such as Wordcloud, Wordnetwork, and Wordweight were used to represent the importance of words according to their frequency. Wordcloud uses the N-gram model to display word frequencies in cloud form. In the uni-gram analysis, "ECDIS" was the most frequent word, and in the bi-gram analysis, "Safety Contour" was the most frequent phrase. Based on the bi-gram analysis, the causal words were classified into those related to the officer and those related to the ECDIS system, and the related words were represented with Wordnetwork. Finally, the words related to the officer and to the ECDIS system were compiled into word corpora, and Wordweight was applied to analyze the change in corpus frequency by year. Analysis of the trend lines showed that, more recently, the officer corpus has decreased while the ECDIS system corpus has gradually increased.
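
The uni-gram and bi-gram counts described above amount to counting sliding windows of one or two tokens. The Python sketch below shows the idea on a hypothetical sentence (not taken from the actual reports); the stopword list is an assumption, and the paper's R workflow may tokenize and filter differently.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "was", "not", "on", "of", "to"}  # illustrative list


def ngram_counts(text, n, stopwords=STOPWORDS):
    """Count n-grams (as tuples) after lowercasing and removing stopwords."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in stopwords]
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Hypothetical accident-report text.
report = ("The safety contour was not set. ECDIS alarm ignored; "
          "the safety contour value on ECDIS was wrong.")
unigrams = ngram_counts(report, 1)
bigrams = ngram_counts(report, 2)
print(bigrams.most_common(1))  # [(('safety', 'contour'), 2)]
```

Because stopwords are removed before the windows are formed, some bi-grams span words that were not adjacent in the original text (for example "ignored safety"); counting on the unfiltered tokens and discarding stopword-containing bi-grams afterwards is the stricter alternative.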

Exploring the Knowledge Structure of Fuel Cell Electric Vehicle in National R&D Projects for the Hydrogen Economy (수소 경제를 위한 국가R&D과제에서 연료전지전기차의 지식구조 탐색)

  • Choi, Jung Woo;Lee, Ji Yeon;Lee, Byeong-Hee;Kim, Tae-Hyun
    • The Journal of the Korea Contents Association / v.21 no.6 / pp.306-317 / 2021
  • With a global shift from a carbon economy toward a hydrogen economy, leading countries such as the U.S., European countries, China, and Japan are focusing their research capabilities on hydrogen research and development (R&D) by announcing various hydrogen economy policies. South Korea has also followed this trend, announcing a hydrogen economy roadmap in January 2019 and enacting hydrogen economy legislation. In this paper, we examine the national R&D trends and knowledge structure of Fuel Cell Electric Vehicles (FCEVs) using project data from the last 10 years from the National Science & Technology Information Service (NTIS). We collected 1,479 FCEV-related projects and conducted text mining and network analysis. According to the analysis, FCEV-related R&D has been actively carried out across the entire process of hydrogen production, transport, storage, and utilization. The paper also provides insights for the government's policy agenda-setting and market strategy for the hydrogen economy.

Customized recommendation system through product review analysis (상품 리뷰 분석을 통한 사용자 맞춤형 추천 시스템)

  • Hwang, Doyeun;Bae, Sangjung;Kim, Changsoo;Jung, Heokyung
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2018.05a / pp.460-461 / 2018
  • Traditional recommendation systems are developed on the assumption that users behave independently, and they suffer from poor readability and efficiency because they simply sort products or lack functions for associating product attributes with users' tastes. To solve this problem, this study proposes a system that provides user-customized information: product review data are crawled, the unstructured reviews are analyzed with text mining in R, and the results are combined with users' purchase histories to produce meaningful information. This helps users make decisions by providing only the necessary information, without requiring them to analyze massive amounts of product review data themselves.
