• Title/Summary/Keyword: Text frequency analysis

Search Result 464

Text Mining and Network Analysis of News Articles for Deriving Socio-Economic Damage Types of Heat Wave Events in Korea: 2012~2016 Cases (뉴스 기사 텍스트 마이닝과 네트워크 분석을 통한 폭염의 사회·경제적 영향 유형 도출: 2012~2016년 사례)

  • Jung, Jae In;Lee, Kyoungjun;Kim, Seungbum
    • Atmosphere / v.30 no.3 / pp.237-248 / 2020
  • To prepare effectively for damage caused by weather events, it is important to proactively identify the possible impacts of weather phenomena on domestic society and the economy. This paper uses text mining and network analysis to build a database of the types and levels of damage caused by heat waves. We collect news articles about heat waves from the SBS news website and determine their primary and secondary effects through network analysis. In addition, we estimate how much influence each factor has based on the frequency with which each impact keyword is mentioned. As a result, the types of impacts caused by heat waves are efficiently derived. Among these, we find that people in South Korea are mainly interested in algae and heat-related illness. Since this technique can be applied not only to news articles but also to social media content, such as Twitter and Facebook, it is expected to be a useful tool for building weather impact databases.
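
The frequency-based estimate described in the abstract above can be illustrated with a minimal Python sketch. The keyword list, the toy headlines, and the function name `keyword_shares` are hypothetical and are not taken from the paper; the sketch simply counts how many articles mention each impact keyword and converts the counts to shares.

```python
from collections import Counter

# Hypothetical impact keywords and toy headlines; not data from the paper.
IMPACT_KEYWORDS = ["algae", "heat-related illness", "power outage"]


def keyword_shares(articles, keywords):
    """Count the articles mentioning each keyword and return each keyword's share of all mentions."""
    counts = Counter()
    for text in articles:
        lowered = text.lower()
        for kw in keywords:
            if kw in lowered:
                counts[kw] += 1
    total = sum(counts.values())
    return {kw: (counts[kw] / total if total else 0.0) for kw in keywords}


articles = [
    "Algae bloom spreads as heat wave continues",
    "Hospitals report heat-related illness cases; algae warning issued",
    "Heat wave causes power outage downtown",
]
shares = keyword_shares(articles, IMPACT_KEYWORDS)
```

Substring matching is the simplest possible choice here; a real analysis of Korean news text would likely need morphological tokenization before counting.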

Text Analysis on the Research Trends of Nature Restoration in Korea (텍스트 분석을 활용한 국내 자연환경복원 연구동향 분석)

  • Lee, Gil-sang;Jung, Yee-rim;Song, Young-keun;Lee, Sang-hyuk;Son, Seung-Woo
    • Journal of the Korean Society of Environmental Restoration Technology / v.27 no.2 / pp.29-42 / 2024
  • As a global response to climate and biodiversity challenges, there is an emphasis on the conservation and restoration of ecosystems that can simultaneously reduce carbon emissions and enhance biodiversity. This study comprised a text analysis and keyword extraction of 1,100 research papers addressing nature restoration in Korea, aiming to provide a quantitative and systematic evaluation of domestic research trends in this field. To discern the major research topics of these papers, topic modeling was applied and correlations were established through network analysis. Research on nature restoration showed a mainly upward trend over 2002-2022, with a slight recent decline. The most common keywords were "species," "forest," and "water." Research topics were broadly classified into (1) predictions of habitat size and species distribution, (2) the conservation and utilization of natural resources in urban areas, (3) ecosystem and landscape management in protected areas, (4) the planting and growth of vegetation, and (5) habitat formation methods. The number of studies on nature restoration is increasing across various domains in Korea, with each domain experiencing professional development.

Perceived Characteristics of Grains during the Choseon Dynasty - A Study Applying Text Frequency Analysis Using the Choseonwangjoshilrok Data - (조선왕조실록 텍스트 빈도 분석을 통한 조선시대 곡물에 관한 인식 특성 고찰)

  • Mi-Hye, Kim
    • Journal of the Korean Society of Food Culture / v.38 no.1 / pp.26-37 / 2023
  • This study applied the text frequency method to analyze the crops prevalent during the Choseon dynasty, as recorded in the Choseonwangjoshilrok, and categorized the results by king. Contemporary perception of grains was observed by examining the staple crop types. Staple species were examined using word clouds and semantic network analysis. In total, 101,842 crop records were found in the Choseonwangjoshilrok. Of these, 51,337 (50.4%) were grains, 50,407 (49.5%) were beans, and 98 (0.1%) were seeds. Rice was the most frequently consumed grain (37.1%), followed by pii (11.9%), millet (11.3%), barley (4.5%), proso (0.8%), wheat (0.6%), buckwheat (0.1%), and adlay (0.05%). By century, grain records in the Choseon dynasty numbered 15,520 cases in the 15th century (30.2%), 11,201 cases in the 18th century (21.8%), 9,421 cases in the 17th century (18.4%), 9,113 cases in the 16th century (17.8%), and 6,082 cases in the 19th century (11.8%). Interest in grain among the 27 kings of Choseon was evaluated based on the frequency of records. The 15th-century King Sejong showed the greatest interest with 13,363 cases (13.1%), followed by King Jungjo (8,501 cases in the 18th century; 8.4%) and King Sungjong (7,776 cases in the 15th century; 7.6%).

Text Mining Analysis on the Research Field of the Coastal and Ocean Engineering Based on the SCOPUS Bibliographic Information (해안해양공학 연구 분야의 SCOPUS 서지정보 Text Mining 분석)

  • Lee, Gi Seop;Cho, Hong Yeon;Han, Jae Rim
    • Journal of Korean Society of Coastal and Ocean Engineers / v.30 no.1 / pp.19-28 / 2018
  • Numerous research papers have accumulated with the development and computerization of bibliometrics, making it difficult to review all related papers published worldwide before conducting a study. However, with the development of natural language processing techniques, analyzing trends in published research papers has become easier. In this study, text mining analysis using the statistical computing language R was carried out on the bibliographic information of the SCOPUS database in the field of coastal and ocean engineering. As expected, the term 'wave' predominates, and the terms 'numerical model', 'numerical simulation', and 'experimental study' confirmed that numerical analysis and hydraulic experiments remain dominant. In addition, recent use of the term 'wave energy', related to marine energy, was observed. It was also quantitatively confirmed that connections between 'wave' and 'height' or 'energy' were the most frequent, and the results suggest the possibility of higher-resolution analysis by detailed field and period in the future.
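
The "frequency of connection" between terms mentioned in the abstract above can be approximated by counting term co-occurrence. The sketch below is in Python rather than the R used in the study, and the toy token lists and the function name `cooccurrence_counts` are assumptions for illustration only; the abstract does not specify how connections were actually measured.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count how many documents contain each unordered pair of distinct terms."""
    pairs = Counter()
    for tokens in documents:
        # Sorting makes each pair's key order deterministic, e.g. ("height", "wave").
        for a, b in combinations(sorted(set(tokens)), 2):
            pairs[(a, b)] += 1
    return pairs


# Toy tokenized titles; not data from the SCOPUS records used in the paper.
docs = [
    ["wave", "height", "numerical", "model"],
    ["wave", "energy", "converter"],
    ["wave", "height", "experimental", "study"],
]
pairs = cooccurrence_counts(docs)
```

The resulting pair counts can serve as edge weights in a word network, with each term as a node.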

Can Similarities in Medical thought be Quantified? - Focusing on Donguibogam, Uihagibmun and Gyeongagjeonseo - (의학 사상의 유사성은 계량 분석 될 수 있는가 - 『동의보감』과 『의학입문』, 『경악전서』를 중심으로 -)

  • Oh, Junho
    • Journal of Korean Medical classics / v.31 no.2 / pp.71-82 / 2018
  • Objectives: The purpose of this study is to compare the similarities among Donguibogam (DO), Uihagibmun (UI), and Gyeongagjeonseo (GY) in order to examine whether the medical thoughts embedded in the texts can be compared quantitatively. Methods: Under the empirical assumption that medical thoughts can be reduced to the frequency of major key words within a text, we selected as key words fourteen words in four categories that are commonly used to describe physiology and pathology in Korean medicine. The frequency of these key words was measured and compared across the three important medical texts in Korea. Results: In the quantitative analysis based on the $\chi^2$ statistic, the key words were distributed most heterogeneously in DO and most homogeneously in UI. In the comparison of similarity analyzed by the same method, DO and UI were significantly more similar than DO and GY. The word frequency patterns and the similarities of the book contents (CBDF) show that DO is influenced by UI, and the differences between standardized residuals and homogeneity tell us that the internal contexts of the two books are constructed differently. Conclusions: These results support the results of traditional research by experts. We were thus able to confirm that medical thoughts can be reduced to the frequency of major key words within a text and compared through the frequency of such key words.
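
As a minimal sketch of the kind of comparison the abstract above describes, the Python function below computes the Pearson chi-square statistic for a table of key-word counts (rows as texts, columns as key words). The counts are invented for illustration, and the abstract does not state the exact test design the author used, so this should be read as a generic example rather than the study's procedure. It assumes every row and column has a nonzero total.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under the hypothesis that all rows share one distribution.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts of two key words in two texts; not values from the paper.
counts = [
    [10, 20],
    [20, 10],
]
stat = chi_square_statistic(counts)
```

A larger statistic indicates that the key-word distributions of the texts differ more from one another; judging significance also requires the degrees of freedom, (rows - 1) x (columns - 1).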

Exploring Information Ethics Issues based on Text Mining using Big Data from Web of Science (Web of Science 빅데이터를 활용한 텍스트 마이닝 기반의 정보윤리 이슈 탐색)

  • Kim, Han Sung
    • The Journal of Korean Association of Computer Education / v.22 no.3 / pp.67-78 / 2019
  • The purpose of this study is to explore information ethics issues based on academic big data from Web of Science (WoS) and to provide implications for information ethics education in the informatics subject. To this end, 318 published papers from WoS related to information ethics were text mined. Specifically, this paper analyzed the frequency of keywords (TF, DF, TF-IDF), information ethics issues using topic modeling, and the frequency of appearance by year for each issue. The 'tm' and 'topicmodels' packages of R were used for text mining. The main results are as follows. First, TF-IDF confirmed that the words 'digital', 'student', 'software', and 'privacy' were the main keywords. Second, the topic modeling analysis showed 8 issues, including 'Professional value', 'Cyber-bullying', and 'AI and Social Impact', and the proportions of 'Professional value' and 'Cyber-bullying' were relatively high. This study discussed the implications for information ethics education in Korea based on the results of this analysis.
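
The three keyword measures named in the abstract above (TF, DF, TF-IDF) can be sketched in a few lines of Python. This is an illustration, not the study's R code: it assumes raw counts for TF and the natural-log weighting ln(N / DF) for IDF, whereas R packages may use other weighting variants, and the toy documents are invented.

```python
import math
from collections import Counter


def tf_df_tfidf(documents):
    """Return per-document term counts (TF), document frequencies (DF), and TF * ln(N / DF)."""
    n_docs = len(documents)
    tf = [Counter(doc) for doc in documents]
    # DF counts each term once per document, however often it appears there.
    df = Counter(term for doc in documents for term in set(doc))
    tfidf = [
        {term: count * math.log(n_docs / df[term]) for term, count in doc_tf.items()}
        for doc_tf in tf
    ]
    return tf, df, tfidf


# Toy tokenized abstracts; not data from the Web of Science corpus.
docs = [
    ["privacy", "student", "privacy"],
    ["student", "software"],
]
tf, df, tfidf = tf_df_tfidf(docs)
```

Under this weighting, a term that appears in every document (here 'student') receives a TF-IDF of zero, which is why TF-IDF highlights words that are frequent in some documents but not ubiquitous.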

Feature-selection algorithm based on genetic algorithms using unstructured data for attack mail identification (공격 메일 식별을 위한 비정형 데이터를 사용한 유전자 알고리즘 기반의 특징선택 알고리즘)

  • Hong, Sung-Sam;Kim, Dong-Wook;Han, Myung-Mook
    • Journal of Internet Computing and Services / v.20 no.1 / pp.1-10 / 2019
  • Because big-data text mining extracts many features and data, clustering and classification can suffer from high computational complexity and low reliability of the analysis results. In particular, a term-document matrix obtained through text mining represents term-document features but is sparse. We designed an advanced genetic algorithm (GA) to extract features in text mining for a detection model. Term frequency-inverse document frequency (TF-IDF) is used to reflect the document-term relationships in feature extraction. Through a repetitive process, a predetermined number of features are selected. We also used a sparsity score to improve the performance of the detection model: if a spam mail data set has high sparsity, the detection model performs poorly and an optimal detection model is difficult to find. In addition, we find a low-sparsity model that also has a high TF-IDF score by using s(F) in the numerator of the fitness function. We also verified its performance by applying the proposed algorithm to text classification. As a result, we found that our algorithm shows higher performance (speed and accuracy) in attack mail classification.
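
To make the notion of sparsity in the abstract above concrete, the following Python sketch measures the fraction of zero entries in a term-document matrix before and after restricting it to a subset of features. The abstract does not define s(F) or the fitness function, so this uses the common definition of sparsity as an assumption; the matrix values and the helper names `sparsity` and `select_columns` are hypothetical, and no genetic algorithm is implemented here.

```python
def sparsity(matrix):
    """Fraction of entries equal to zero in a dense list-of-lists matrix."""
    total = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for value in row if value == 0)
    return zeros / total if total else 0.0


def select_columns(matrix, feature_indices):
    """Keep only the columns (term features) listed in feature_indices."""
    return [[row[j] for j in feature_indices] for row in matrix]


# Toy document-by-term weights; not data from the paper.
term_document = [
    [1, 0, 0],
    [0, 2, 0],
    [3, 0, 0],
]
full_sparsity = sparsity(term_document)
subset_sparsity = sparsity(select_columns(term_document, [0, 1]))
```

In this toy case, dropping the all-zero third feature lowers the sparsity, which is the kind of effect a feature-selection procedure could reward.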

Investigations on Techniques and Applications of Text Analytics (텍스트 분석 기술 및 활용 동향)

  • Kim, Namgyu;Lee, Donghoon;Choi, Hochang;Wong, William Xiu Shun
    • The Journal of Korean Institute of Communications and Information Sciences / v.42 no.2 / pp.471-492 / 2017
  • Demand for and interest in big data analytics are increasing rapidly. The concept of big data covers not only existing structured data but also various kinds of unstructured data such as text, images, videos, and logs. Among the various types of unstructured data, text has gained particular attention because it is the most representative medium for describing and delivering information. Text analysis is generally performed in the following order: document collection, parsing and filtering, structuring, frequency analysis, and similarity analysis. The results of the analysis can be displayed through word clouds, word networks, topic modeling, document classification, and semantic analysis. Notably, there is an increasing demand to identify trending topics from the rapidly growing text data generated through various social media. Thus, research on and applications of topic modeling have been actively carried out in various fields, since topic modeling can extract the core topics from a huge amount of unstructured text documents and provide the document groups for each topic. In this paper, we review the major techniques and research trends of text analysis. We also introduce some application cases that solve problems in various fields by using topic modeling.
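
The processing order listed in the abstract above (collection, parsing and filtering, structuring, frequency analysis, similarity analysis) can be sketched end to end in Python. Everything concrete here is an assumption for illustration: the stopword list, the regular-expression tokenizer, the bag-of-words structure, and cosine similarity as the similarity measure are common choices, not ones prescribed by the paper.

```python
import math
import re
from collections import Counter

STOPWORDS = {"the", "a", "of", "and", "is", "in", "by"}  # illustrative list only


def parse_and_filter(text):
    """Parsing and filtering: lowercase, keep alphabetic tokens, drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def structure(texts):
    """Structuring: turn each document into a bag-of-words count vector."""
    return [Counter(parse_and_filter(t)) for t in texts]


def cosine_similarity(a, b):
    """Similarity analysis: cosine of the angle between two count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Document collection: a toy in-memory corpus.
collection = [
    "Topic modeling extracts the core topics of documents.",
    "Word clouds display the frequency of words.",
    "Topic modeling groups documents by topic.",
]
vectors = structure(collection)
# Frequency analysis: total term counts over the whole collection.
frequencies = sum(vectors, Counter())
```

For example, `cosine_similarity(vectors[0], vectors[2])` is positive because the two documents share terms, while documents with no terms in common score zero.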

Analysis of key words published with the Korea Society of Emergency Medical Services journal using text mining (텍스트마이닝을 이용한 한국응급구조학회지 중심단어 분석)

  • Kwon, Chan-Yang;Yang, Hyun-Mo
    • The Korean Journal of Emergency Medical Services / v.24 no.1 / pp.85-92 / 2020
  • Purpose: The purpose of this study was to analyze the key words of the English abstracts in the Korea Society of Emergency Medical Services journal using text mining techniques, to determine how well these terms adhere to Medical Subject Headings (MeSH), and to identify key word trends. Methods: We analyzed 212 papers published from 2012 to 2019. Web scraping and frequency analysis of key words were conducted in R using its base and text mining packages. Additionally, the Word Clouds package was used for visualization. Results: The average number of key words used per study was 3.9. Word cloud visualization revealed that CPR was most prominent in the first half of the period and emergency medical technician was most frequently used during the second half. A total of 542 (64.9%) key words exactly matched MeSH terms, and 293 (35.1%) did not. Conclusion: Researchers should follow submission rules. Further, journals should update their respective submission rules. Frequently cited MeSH key words should be suggested for use.
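
The exact-match check against MeSH reported in the abstract above can be illustrated with a small Python sketch. The two-entry vocabulary is a toy stand-in for the full MeSH list, the key words are invented, and the choice to ignore case and surrounding whitespace is an assumption, since the abstract does not say how matching was normalized.

```python
def exact_match_rate(keywords, vocabulary):
    """Return the key words found in the vocabulary and the fraction that matched."""
    normalized_vocab = {term.strip().casefold() for term in vocabulary}
    matched = [kw for kw in keywords if kw.strip().casefold() in normalized_vocab]
    rate = len(matched) / len(keywords) if keywords else 0.0
    return matched, rate


# Toy stand-in for the MeSH vocabulary and hypothetical author key words.
vocabulary = {"Cardiopulmonary Resuscitation", "Emergency Medical Technicians"}
keywords = [
    "CPR",
    "Emergency Medical Technicians",
    "cardiopulmonary resuscitation",
    "Paramedic student",
]
matched, rate = exact_match_rate(keywords, vocabulary)
```

Note that the abbreviation "CPR" does not match here, which shows how exact matching penalizes abbreviations and variant spellings.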

Study on CEO New Year's Address: Using Text Mining Method (텍스트마이닝을 활용한 주요 대기업 신년사 분석)

  • YuKyoung Kim;Daegon Cho
    • Journal of Information Technology Services / v.22 no.2 / pp.93-127 / 2023
  • This study analyzed the CEO New Year's addresses of major Korean companies, extracting the key topics addressed to employees via text mining techniques. An intended contribution of this study is to help reporters, analysts, and researchers better understand the New Year's addresses by elucidating the implicit features of the messages within. To this end, this study collected and analyzed 545 New Year's addresses published between 2012 and 2021 by the top 66 Korean companies in terms of market capitalization. The research methodologies applied include text clustering, word embedding of keywords, frequency analysis, and topic modeling. Our main findings suggest that the messages in the New Year's addresses fall into nine topics: organizational culture, global advancement, sound management, business reorganization, capacity building, market leadership, management innovation, sustainable management, and technology development. This study further analyzed the managerial significance of each topic and discussed their characteristics from the perspectives of time, industry, and corporate groups. Companies were typically found to emphasize sound management, market leadership, and business reorganization during economic downturns while stressing capacity building and organizational culture during market transition periods. Also, companies belonging to corporate groups tended to emphasize founding philosophy and corporate culture.