• Title/Summary/Keyword: Text frequency analysis

Search Result 459, Processing Time 0.028 seconds

Analyzing Different Contexts for Energy Terms through Text Mining of Online Science News Articles (온라인 과학 기사 텍스트 마이닝을 통해 분석한 에너지 용어 사용의 맥락)

  • Oh, Chi Yeong;Kang, Nam-Hwa
    • Journal of Science Education
    • /
    • v.45 no.3
    • /
    • pp.292-303
    • /
    • 2021
  • This study identifies the terms frequently used together with energy in online science news articles and topics of the news reports to find out how the term energy is used in everyday life and to draw implications for science curriculum and instruction about energy. A total of 2,171 online news articles in science category published by 11 major newspaper companies in Korea for one year from March 1, 2018 were selected by using energy as a search term. As a result of natural language processing, a total of 51,224 sentences consisting of 507,901 words were compiled for analysis. Using the R program, term frequency analysis, semantic network analysis, and structural topic modeling were performed. The results show that the terms with exceptionally high frequencies were technology, research, and development, which reflected the characteristics of news articles that report new findings. On the other hand, terms used more than once per two articles were industry-related terms (industry, product, system, production, market) and terms that were sufficiently expected as energy-related terms such as 'electricity' and 'environment.' Meanwhile, 'sun', 'heat', 'temperature', and 'power generation', which are frequently used in energy-related science classes, also appeared as terms belonging to the highest frequency. From a network analysis, two clusters were found including terms related to industry and technology and terms related to basic science and research. From the analysis of terms paired with energy, it was also found that terms related to the use of energy such as 'energy efficiency,' 'energy saving,' and 'energy consumption' were the most frequently used. Out of 16 topics found, four contexts of energy were drawn including 'high-tech industry,' 'industry,' 'basic science,' and 'environment and health.' The results suggest that the introduction of the concept of energy degradation as a starting point for energy classes can be effective. It also shows the need to introduce high-tech industries or the context of environment and health into energy learning.

Media exposure analysis of official sponsors and general companies of mega sport event (메가 스포츠이벤트의 공식스폰서와 일반기업의 미디어 노출 분석)

  • Kim, Joo-Hak;Cho, Sun-Mi
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology
    • /
    • v.8 no.4
    • /
    • pp.171-181
    • /
    • 2018
  • As the proportion of sports events in the sports industry grows, the official sponsor market for sports events is also increasing. But because official sponsors are limited and expensive, some companies approach sporting events by way of Ambush marketing. This study is to analyze the differences of media exposure between official sponsors and general companies of mega sport events. To accomplish the purpose of the study, we collected text articles and analyzed them from the period of 2016 Rio Olympics, one year before the Olympics and one year after the Olympics. Web crawling was performed using Python for the collection of articles. Morphological and frequency analysis was performed using the KoNLP package and the TM package of statistical program R. In addition, the opinions of the related experts group were gathered to classify the companies or organizations in the media as the Organizing Committees for the Olympic Games(OCOGs), official sponsor, and general companies. As a result of the analysis, 5,220 times appeared related to the OCOGs, 7,845 times appeared related to the official sponsor, and 7,028 times appeared related to general companies. There isn't much difference in the frequency of exposure between official sponsors and general companies. It implies that Ambush marketing is recognized as a strategic marketing technique. The International Olympic Committee(IOC) has to recognize these social phenomena and establish reasonable standards for the marketing activities of official sponsors and general companies. And this study will serve as a basis for fair sponsor activities or marketing activities of sports events.

A Study on Differences of Contents and Tones of Arguments among Newspapers Using Text Mining Analysis (텍스트 마이닝을 활용한 신문사에 따른 내용 및 논조 차이점 분석)

  • Kam, Miah;Song, Min
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.3
    • /
    • pp.53-77
    • /
    • 2012
  • This study analyses the difference of contents and tones of arguments among three Korean major newspapers, the Kyunghyang Shinmoon, the HanKyoreh, and the Dong-A Ilbo. It is commonly accepted that newspapers in Korea explicitly deliver their own tone of arguments when they talk about some sensitive issues and topics. It could be controversial if readers of newspapers read the news without being aware of the type of tones of arguments because the contents and the tones of arguments can affect readers easily. Thus it is very desirable to have a new tool that can inform the readers of what tone of argument a newspaper has. This study presents the results of clustering and classification techniques as part of text mining analysis. We focus on six main subjects such as Culture, Politics, International, Editorial-opinion, Eco-business and National issues in newspapers, and attempt to identify differences and similarities among the newspapers. The basic unit of text mining analysis is a paragraph of news articles. This study uses a keyword-network analysis tool and visualizes relationships among keywords to make it easier to see the differences. Newspaper articles were gathered from KINDS, the Korean integrated news database system. KINDS preserves news articles of the Kyunghyang Shinmun, the HanKyoreh and the Dong-A Ilbo and these are open to the public. This study used these three Korean major newspapers from KINDS. About 3,030 articles from 2008 to 2012 were used. International, national issues and politics sections were gathered with some specific issues. The International section was collected with the keyword of 'Nuclear weapon of North Korea.' The National issues section was collected with the keyword of '4-major-river.' The Politics section was collected with the keyword of 'Tonghap-Jinbo Dang.' All of the articles from April 2012 to May 2012 of Eco-business, Culture and Editorial-opinion sections were also collected. All of the collected data were handled and edited into paragraphs. We got rid of stop-words using the Lucene Korean Module. We calculated keyword co-occurrence counts from the paired co-occurrence list of keywords in a paragraph. We made a co-occurrence matrix from the list. Once the co-occurrence matrix was built, we used the Cosine coefficient matrix as input for PFNet(Pathfinder Network). In order to analyze these three newspapers and find out the significant keywords in each paper, we analyzed the list of 10 highest frequency keywords and keyword-networks of 20 highest ranking frequency keywords to closely examine the relationships and show the detailed network map among keywords. We used NodeXL software to visualize the PFNet. After drawing all the networks, we compared the results with the classification results. Classification was firstly handled to identify how the tone of argument of a newspaper is different from others. Then, to analyze tones of arguments, all the paragraphs were divided into two types of tones, Positive tone and Negative tone. To identify and classify all of the tones of paragraphs and articles we had collected, supervised learning technique was used. The Na$\ddot{i}$ve Bayesian classifier algorithm provided in the MALLET package was used to classify all the paragraphs in articles. After classification, Precision, Recall and F-value were used to evaluate the results of classification. Based on the results of this study, three subjects such as Culture, Eco-business and Politics showed some differences in contents and tones of arguments among these three newspapers. In addition, for the National issues, tones of arguments on 4-major-rivers project were different from each other. It seems three newspapers have their own specific tone of argument in those sections. And keyword-networks showed different shapes with each other in the same period in the same section. It means that frequently appeared keywords in articles are different and their contents are comprised with different keywords. And the Positive-Negative classification showed the possibility of classifying newspapers' tones of arguments compared to others. These results indicate that the approach in this study is promising to be extended as a new tool to identify the different tones of arguments of newspapers.

Analyzing the discriminative characteristic of cover letters using text mining focused on Air Force applicants (텍스트 마이닝을 이용한 공군 부사관 지원자 자기소개서의 차별적 특성 분석)

  • Kwon, Hyeok;Kim, Wooju
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.3
    • /
    • pp.75-94
    • /
    • 2021
  • The low birth rate and shortened military service period are causing concerns about selecting excellent military officers. The Republic of Korea entered a low birth rate society in 1984 and an aged society in 2018 respectively, and is expected to be in a super-aged society in 2025. In addition, the troop-oriented military is changed as a state-of-the-art weapons-oriented military, and the reduction of the military service period was implemented in 2018 to ease the burden of military service for young people and play a role in the society early. Some observe that the application rate for military officers is falling due to a decrease of manpower resources and a preference for shortened mandatory military service over military officers. This requires further consideration of the policy of securing excellent military officers. Most of the related studies have used social scientists' methodologies, but this study applies the methodology of text mining suitable for large-scale documents analysis. This study extracts words of discriminative characteristics from the Republic of Korea Air Force Non-Commissioned Officer Applicant cover letters and analyzes the polarity of pass and fail. It consists of three steps in total. First, the application is divided into general and technical fields, and the words characterized in the cover letter are ordered according to the difference in the frequency ratio of each field. The greater the difference in the proportion of each application field, the field character is defined as 'more discriminative'. Based on this, we extract the top 50 words representing discriminative characteristics in general fields and the top 50 words representing discriminative characteristics in technology fields. Second, the number of appropriate topics in the overall cover letter is calculated through the LDA. It uses perplexity score and coherence score. Based on the appropriate number of topics, we then use LDA to generate topic and probability, and estimate which topic words of discriminative characteristic belong to. Subsequently, the keyword indicators of questions used to set the labeling candidate index, and the most appropriate index indicator is set as the label for the topic when considering the topic-specific word distribution. Third, using L-LDA, which sets the cover letter and label as pass and fail, we generate topics and probabilities for each field of pass and fail labels. Furthermore, we extract only words of discriminative characteristics that give labeled topics among generated topics and probabilities by pass and fail labels. Next, we extract the difference between the probability on the pass label and the probability on the fail label by word of the labeled discriminative characteristic. A positive figure can be seen as having the polarity of pass, and a negative figure can be seen as having the polarity of fail. This study is the first research to reflect the characteristics of cover letters of Republic of Korea Air Force non-commissioned officer applicants, not in the private sector. Moreover, these methodologies can apply text mining techniques for multiple documents, rather survey or interview methods, to reduce analysis time and increase reliability for the entire population. For this reason, the methodology proposed in the study is also applicable to other forms of multiple documents in the field of military personnel. This study shows that L-LDA is more suitable than LDA to extract discriminative characteristics of Republic of Korea Air Force Noncommissioned cover letters. Furthermore, this study proposes a methodology that uses a combination of LDA and L-LDA. Therefore, through the analysis of the results of the acquisition of non-commissioned Republic of Korea Air Force officers, we would like to provide information available for acquisition and promotional policies and propose a methodology available for research in the field of military manpower acquisition.

Study on Research Trends (2001~2020) of the Baekdudaegan Mountains with Big Data Analyses of Academic Journals (학술논문 빅데이터 분석을 활용한 백두대간에 관한 연구동향(2001~2020) 분석)

  • Lee, Jinkyu;Sim, Hyung Seok;Lee, Chang-Bae
    • Journal of Korean Society of Forest Science
    • /
    • v.111 no.1
    • /
    • pp.36-49
    • /
    • 2022
  • The purpose of this study was to analyze domestic research trends related to the Baekdudaegan Mountains in the last two decades. In total, 551 academic papers and keyword data related to the Baekdudaegan Mountains were collected using the "Research and Information Service Section" and analyzed using "big data" analysis programs, such as Textom and UCINET. Papers related to the Baekdudaegan Mountains were published in 177 academic journals, and 229 papers (41.6% of all published papers) were published between 2011 and 2015. According to word frequency data (N-gram analyses), the major research topic over the past 20 years was "species diversity." According to CONCOR analysis results, the main research could be divided into 15 areas, the most important of which was "species diversity," followed by "vegetation restoration and management," and "culture." Ecological research comprised 12 groups with a frequency of 78.8%; humanities and social research comprised 2 groups with a frequency of 15.6%. Overall, our study of research areas and quantitative data analyses provides valuable information that could help establish policy formulation.

Implementation and Verification of TCP Congestion Control Algorithm using SDL (SDL을 이용한 TCP 혼잡제어 알고리즘의 구현 및 검증)

  • 이재훈;조성현;이태오;임재홍
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.7 no.2
    • /
    • pp.214-227
    • /
    • 2003
  • Developing an application, it is difficult to catch an exact requirement with the conventional text-based method. It has also problems in verification and analysis at each developing stage. Therefore, if an adjustment is required with an error and change of requirement, a bad effect happen in the whole system. In this case, it also affect adversely on the developing cost and period. Meanwhile, if an analysis or verification is performed, the possibility of an error frequency reduces. Thus, not only is it easier to correct the error but also add an new requirement. This thesis embody a TCP/IP congestion control algorithm with SDL which provides automatically graphic interface, verification and analysis to each developing stage. Using SDL gave a clear representation embodiment in each developing stage and easiness of adjustment due to changing requirements or correcting errors. In addition, the stages of protocol have been certified in a simulation by verification of MSC and the results showed a possibility of developing a better TCP/IP protocol.

Correspondence Strategy for Big Data's New Customer Value and Creation of Business (빅 데이터의 새로운 고객 가치와 비즈니스 창출을 위한 대응 전략)

  • Koh, Joon-Cheol;Lee, Hae-Uk;Jeong, Jee-Youn;Kim, Kyung-Sik
    • Journal of the Korea Safety Management & Science
    • /
    • v.14 no.4
    • /
    • pp.229-238
    • /
    • 2012
  • Within last 10 years, internet has become a daily activity, and humankind had to face the Data Deluge, a dramatic increase of digital data (Economist 2012). Due to exponential increase in amount of digital data, large scale data has become a big issue and hence the term 'big data' appeared. There is no official agreement in quantitative and detailed definition of the 'big data', but the meaning is expanding to its value and efficacy. Big data not only has the standardized personal information (internal) like customer information, but also has complex data of external, atypical, social, and real time data. Big data's technology has the concept that covers wide range technology, including 'data achievement, save/manage, analysis, and application'. To define the connected technology of 'big data', there are Big Table, Cassandra, Hadoop, MapReduce, Hbase, and NoSQL, and for the sub-techniques, Text Mining, Opinion Mining, Social Network Analysis, Cluster Analysis are gaining attention. The three features that 'bid data' needs to have is about creating large amounts of individual elements (high-resolution) to variety of high-frequency data. Big data has three defining features of volume, variety, and velocity, which is called the '3V'. There is increase in complexity as the 4th feature, and as all 4features are satisfied, it becomes more suitable to a 'big data'. In this study, we have looked at various reasons why companies need to impose 'big data', ways of application, and advanced cases of domestic and foreign applications. To correspond effectively to 'big data' revolution, paradigm shift in areas of data production, distribution, and consumption is needed, and insight of unfolding and preparing future business by considering the unpredictable market of technology, industry environment, and flow of social demand is desperately needed.

Knowledge Trend Analysis of Uncertainty in Biomedical Scientific Literature (생의학 학술 문헌의 불확실성 기반 지식 동향 분석에 관한 연구)

  • Heo, Go Eun;Song, Min
    • Journal of the Korean Society for information Management
    • /
    • v.36 no.2
    • /
    • pp.175-199
    • /
    • 2019
  • Uncertainty means incomplete stages of knowledge of propositions due to the lack of consensus of information and existing knowledge. As the amount of academic literature increases exponentially over time, new knowledge is discovered as research develops. Although the flow of time may be an important factor to identify patterns of uncertainty in scientific knowledge, existing studies have only identified the nature of uncertainty based on the frequency in a particular discipline, and they did not take into consideration of the flow of time. Therefore, in this study, we identify and analyze the uncertainty words that indicate uncertainty in the scientific literature and investigate the stream of knowledge. We examine the pattern of biomedical knowledge such as representative entity pairs, predicate types, and entities over time. We also perform the significance testing using linear regression analysis. Seven pairs out of 17 entity pairs show the significant decrease pattern statistically and all 10 representative predicates decrease significantly over time. We analyze the relative importance of representative entities by year and identify entities that display a significant rising and falling pattern.

The Impact of Technology Utilization on Health Research and Development: Case Studies of the Development of Medical Device (합리적 기술 활용이 연구개발에 미치는 영향: 의료기기 개발 사례를 중심으로)

  • Min, Hye Sook;Park, Ji Eun;Kim, Chang-Yup
    • Health Policy and Management
    • /
    • v.31 no.2
    • /
    • pp.148-157
    • /
    • 2021
  • Background: Based on that the key function of health technology is improving the quality of healthcare services, our study purports to explore the process of medical device development in detail and to discuss its policy implications. Methods: A total of 12 in-depth interviews were conducted with four groups of industry, hospital, academia, and civil society. All of the interviewees except those from civil society were involved in the new medical device development between 2009 and 2018. We performed a text network analysis and content analysis of the interview data. Results: The frequency and the degree centrality rankings suggested a close association between the utilization issue and the technology development. Similarly, the results of the content analysis showed that the appropriate intervention in the utilization of technology has a direct impact on the progress of development. Under the continuous industrial effort to boost profits by developing new technology, service providers and citizens should be knowledgeable of and make good use of the new technology for the provision of better services. Conclusion: As the development itself would not guarantee the improvement of service quality and better health outcomes, health technology policies should take a more comprehensive view to serve the unmet needs and even to facilitate the technology development.

Analyzing Trends in Organizational Effectiveness(Job Satisfaction, Organizational Commitment, Organizational Citizenship Behavior) Research: Focusing on SCOPUS DB (조직유효성(직무만족, 조직몰입, 조직시민행동) 연구 동향 분석: SCOPUS DB를 중심으로)

  • Jae-Boong Kim
    • Journal of Industrial Convergence
    • /
    • v.22 no.1
    • /
    • pp.65-73
    • /
    • 2024
  • This paper aims to identify the major research trends in organizational effectiveness over the past 20 years. For this purpose, SCOPUS, an international academic database provided by Elsevier, was used to identify research trends in organizational effectiveness over the past 24 years (2000~2023). According to the frequency analysis, there were 2,789 cases of organizational, 2,714 cases of effectiveness, 850 cases of management, 689 cases of performance, 632 cases of organizations, and 597 cases of leadership. Trend analysis. While effectiveness and organizational have been consistently researched, the trends of leadership and management have been declining in recent years. LDA analysis shows that effectiveness and organizational are important topics. This shows that it is important to be able to predict the future when it is difficult to predict the future. The results of this study can be used as a guide for companies to establish organizational management at a strategic level and improve organizational effectiveness.