• Title/Summary/Keyword: text mining analysis

Research trends in statistics for domestic and international journal using paper abstract data (초록데이터를 활용한 국내외 통계학 분야 연구동향)

  • Yang, Jong-Hoon;Kwak, Il-Youp
    • The Korean Journal of Applied Statistics, v.34 no.2, pp.267-278, 2021
  • Over time, the amount of data has been increasing in government and business alike, both in Korea and abroad. Accordingly, research on big data is increasing in academia. Statistics is one of the major disciplines of big data research, and with the growing number of statistics papers, it is worthwhile to examine research trends in statistics by treating those papers as big data. In this study, we used the abstracts of statistics papers published in Korea and abroad to analyze what research is being conducted. Domestic and international research trends were analyzed through the frequencies of the papers' keywords, and the relationships between keywords were visualized using word embedding. In addition to the keywords selected by the authors, words identified by TextRank as important in statistics papers were also visualized. Lastly, 10 topics were investigated by applying LDA to the abstract data. By analyzing each topic, we investigated which research topics are studied frequently and which words are important.
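TextRank, mentioned above for selecting important words, ranks words by running a PageRank-style iteration over a word co-occurrence graph. The sketch below is a minimal illustration, not the authors' implementation: the window size, the damping factor of 0.85, the iteration count, and the function name `textrank_keywords` are assumptions, and tokens are assumed to be preprocessed already.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=5):
    """Rank words with TextRank over an unweighted co-occurrence graph.

    tokens: preprocessed words in text order (stopword removal assumed done).
    window: two words are linked if they fall within `window` consecutive
            tokens (window=2 links only adjacent words).
    Returns (top_k words, score dictionary).
    """
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            other = tokens[j]
            if other != word:  # ignore self-loops
                neighbors[word].add(other)
                neighbors[other].add(word)

    words = list(neighbors)
    score = {w: 1.0 for w in words}
    # PageRank-style update: each word shares its score equally among neighbors.
    for _ in range(iterations):
        score = {
            w: (1 - damping)
            + damping * sum(score[v] / len(neighbors[v]) for v in neighbors[w])
            for w in words
        }

    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(words, key=lambda w: (-score[w], w))
    return ranked[:top_k], score


tokens = ["regression", "model", "bayesian", "model", "data", "model", "regression"]
top, scores = textrank_keywords(tokens)
print(top)  # ['model', 'bayesian', 'data', 'regression']
```

Here "model" is linked to every other word, so it ranks first; on real abstracts the window size and the stopword list noticeably change which words surface.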

Analysis of Research Topics and Trends on COVID-19 in Korea Using Latent Dirichlet Allocation (LDA)

  • Heo, Seong-Min;Yang, Ji-Yeon
    • Journal of the Korea Society of Computer and Information, v.25 no.12, pp.83-91, 2020
  • This study aims to identify the research topics of COVID-19-related papers on DBpia and examine their trends. Applying latent Dirichlet allocation (LDA), we extracted seven research topics: "International Dynamics", "Technology & Security", "Psychological Impact", "Biomedical-Related", "Economic Impact", "Online Education", and "Religion-Related". In addition, we used a multinomial logistic model to examine the trends of these research topics. We found that before June 2020 the papers mainly covered "International Dynamics" and "Biomedical-Related" topics, but the topics have diversified since then. In particular, "Economic Impact", "Online Education", and "Psychological Impact" have drawn increasing attention from researchers. The findings can provide guidance for collaboration in COVID-19-related research and serve as a reference for active research.
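LDA, used above to extract seven topics, can be fitted in several ways; the sketch below uses collapsed Gibbs sampling, which the abstract does not specify. The priors `alpha` and `beta`, the iteration count, the toy documents, and the names `lda_gibbs` and `top_words` are illustrative assumptions. The multinomial logistic trend model is not covered here.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs: list of token lists (one per abstract).
    Returns (doc_topic, topic_word): doc_topic[d][k] counts tokens of document d
    assigned to topic k; topic_word[k][w] counts word w assigned to topic k.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [defaultdict(int) for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assignments = []

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    topics = range(n_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take the token out of the counts ...
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # ... resample its topic from the full conditional ...
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                # ... and put it back under the new topic.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


def top_words(topic_word, n=3):
    """Most frequent words per topic (zero counts dropped, ties alphabetical)."""
    result = []
    for counts in topic_word:
        ranked = sorted(counts, key=lambda w: (-counts[w], w))
        result.append([w for w in ranked if counts[w] > 0][:n])
    return result


docs = [
    ["vaccine", "infection", "patient", "vaccine"],
    ["infection", "patient", "hospital"],
    ["online", "class", "student", "online"],
    ["student", "class", "teacher"],
]
doc_topic, topic_word = lda_gibbs(docs, n_topics=2)
print(doc_topic)
print(top_words(topic_word))
```

Per-document topic proportions come from normalizing each row of `doc_topic`; a trend model such as the study's multinomial logistic regression would then relate each paper's dominant topic to its publication date.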

A Study on Stock Trend Determination in Stock Trend Prediction

  • Lim, Chungsoo
    • Journal of the Korea Society of Computer and Information, v.25 no.12, pp.35-44, 2020
  • In this study, we analyze how stock trend determination affects trend prediction accuracy. In stock markets, successful investment requires accurate prediction of stock price trends, so a large body of research has sought to improve trend prediction accuracy. For example, information extracted by text mining algorithms from SNS (social networking service) posts and news articles has been used to enhance prediction accuracy, and various machine learning algorithms have been applied. However, stock trend determination itself has not been properly analyzed, and conventional methods have simply been reused. We therefore formulate trend determination as a moving average-based procedure and analyze its impact on stock trend prediction accuracy. The analysis reveals that trend determination can make prediction accuracy vary by as much as 47%, and that prediction accuracy is proportional to the reference window size and inversely proportional to the target window size.
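The abstract does not spell out the moving average-based procedure, so the following is a hypothetical formulation for illustration only: a day is labeled up when the average of the next `target_window` prices exceeds the average of the last `ref_window` prices. The function name and the comparison rule are assumptions; the example only shows how one price series yields different labels under different window sizes.

```python
def label_trends(prices, ref_window, target_window):
    """Label each usable day 1 (up) or 0 (down) with moving averages.

    Hypothetical formulation: compare the mean of the next `target_window`
    prices with the mean of the last `ref_window` prices (today included).
    Days lacking a full window on either side are skipped.
    """
    labels = []
    for t in range(ref_window - 1, len(prices) - target_window):
        reference = sum(prices[t - ref_window + 1 : t + 1]) / ref_window
        target = sum(prices[t + 1 : t + 1 + target_window]) / target_window
        labels.append(1 if target > reference else 0)
    return labels


prices = [10, 11, 12, 11, 10, 9]
print(label_trends(prices, ref_window=1, target_window=1))  # [1, 1, 0, 0, 0]
print(label_trends(prices, ref_window=2, target_window=2))  # [1, 0, 0]
```

Because the labels themselves change with the windows, a predictor's measured accuracy depends on which labeling it is scored against, independent of the learning algorithm.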

Topic modeling and topic change trend analysis for advanced construction technologies (건설신기술에 대한 토픽 모델링 및 토픽 변화추이 분석)

  • Jeong, Seong Yun;Kim, Nam Gon
    • Smart Media Journal, v.10 no.4, pp.102-110, 2021
  • An advanced construction technology endorsement system is currently in operation to promote the development of domestic construction technology. We examined the implicit meanings of advanced construction technologies by analyzing the relationships among highly important emerging terms in the technologies endorsed through this system. To this end, information on 918 advanced construction technologies was collected. Using each technology's endorsement year and summary, we measured the importance of emerging terms for each technology. Then, based on an LDA model, we evaluated the degree of influence among related terms in each of four topic areas, and analyzed the topics by technical application field. We inferred how the highly influential terms in each topic changed from 1990 to 2021. Finally, we predicted future changes in the influence of the topics of environment, machinery, facilities, and structural maintenance and reinforcement, together with their related technology fields.
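The abstract does not name the measure used for the importance of emerging terms. As one common stand-in (an assumption, not the paper's method), the sketch scores terms with TF-IDF after merging all summaries endorsed in the same year, so a term present in every year scores zero and year-specific terms stand out. The function name and toy records are hypothetical.

```python
import math
from collections import Counter


def tfidf_by_year(records):
    """TF-IDF of each term per endorsement year.

    records: (year, tokens) pairs, e.g. one per technology summary.
    All summaries from the same year are merged into one "document", so a
    term that appears in every year scores 0 and year-specific terms stand out.
    """
    by_year = {}
    for year, tokens in records:
        by_year.setdefault(year, Counter()).update(tokens)

    n_years = len(by_year)
    # Document frequency: number of years in which each term occurs.
    doc_freq = Counter()
    for counts in by_year.values():
        doc_freq.update(counts.keys())

    scores = {}
    for year, counts in by_year.items():
        total = sum(counts.values())
        scores[year] = {
            term: (count / total) * math.log(n_years / doc_freq[term])
            for term, count in counts.items()
        }
    return scores


records = [
    (1990, ["concrete", "bridge"]),
    (1990, ["concrete"]),
    (2021, ["concrete", "sensor", "sensor"]),
]
scores = tfidf_by_year(records)
print(max(scores[2021], key=scores[2021].get))  # sensor
```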

Study on Tendency of Cloud Computing Using R and LDA Technique : Focusing on Tendency of Overseas Studies (R과 LDA 기법을 활용한 클라우드 컴퓨팅 동향에 관한 연구: 해외 연구 동향을 중심으로)

  • Kang, Tae-Gu
    • Journal of the Korea Convergence Society, v.13 no.5, pp.261-266, 2022
  • The full-fledged digital age brought about by the Fourth Industrial Revolution, together with the impact of COVID-19, is driving change in many fields, including business. In this rapidly changing digital environment, the cloud market is growing quickly along with the surge in digital services, and the importance of cloud computing is increasingly emphasized. The cloud may be one of the key strategies for sustainable growth and survival, both in related industries and in other fields. Although there have been many studies on the cloud, their overall trends have not been adequately examined. This paper therefore analyzed trends in cloud computing research using SCOPUS, a database of international academic journals, together with R and the LDA technique. The findings showed that cloud computing has attracted strong research interest, and that "cloud computing" was the term most often extracted in the keyword analysis. Apart from the theme of cloud computing itself, various keywords were also extracted, including "cloud", "cloud and computing", and "data and computing". These findings are expected to serve as baseline data, providing a foundation for revitalizing related industries in the practice of cloud computing.
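Keyword combinations such as "cloud and computing" and "data and computing" can be obtained by counting how often two keywords appear in the same paper. The study used R; for consistency with the other examples this is a Python sketch, and the function name, lowercase normalization, and toy keyword lists are assumptions.

```python
from collections import Counter
from itertools import combinations


def keyword_pairs(documents, top_n=3):
    """Count how often two keywords appear in the same document.

    documents: list of keyword lists (one per paper). Keywords are lowercased,
    each pair is counted at most once per document, and pairs are stored in
    alphabetical order so ("cloud", "data") and ("data", "cloud") merge.
    """
    pair_counts = Counter()
    for keywords in documents:
        unique = sorted(set(k.lower() for k in keywords))
        pair_counts.update(combinations(unique, 2))
    return pair_counts.most_common(top_n)


docs = [["Cloud", "computing", "data"], ["cloud", "computing"], ["data", "computing"]]
print(keyword_pairs(docs))
```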

Selection of Effective Herbal Medicines for Parkinson's Disease Based on the Text Mining of the Classical Korean Medical Literature Donguibogam

  • Bae, Hyo Won;Lee, Tae Wook;Choi, Byung Tae;Shin, Hwa Kyoung;Yun, Young Ju
    • The Journal of Korean Medicine, v.42 no.4, pp.120-132, 2021
  • Objectives: The prevalence of Parkinson's disease is rising along with the aging population, but no available treatment halts the progression of neurodegeneration. This study reports a numerical analysis of Donguibogam and suggests novel herbal drugs that have not been researched before but were deemed effective in this study. Methods: Using 71 Korean medicine symptom terms that represent the symptoms of Parkinson's disease, the 4,170 prescriptions described in Donguibogam were classified into two groups according to whether their main effects were effective for Parkinson's disease. To compare the two groups, a chi-square test was performed to select statistically significant herbs, and t-tests, Wilcoxon tests, and descriptive statistics were used to determine the appropriate dose. Results: One hundred and twenty-seven prescriptions effective for Parkinson's disease were identified. The chi-square test identified 17 herbs effective for symptomatic treatment. Among these herbs, the authors suggest Osterici seu Notopterygii Radix et Rhizoma, Ephedrae Herba, Aconiti Tuber, Myrrha, Sinomeni Caulis et Rhizoma, and Aconiti Kusnezoffii Tuber as herbal candidates that have never been studied for Parkinson's disease. Based on the statistical tests, the mean dose across all prescriptions was judged to be the appropriate dose for each herb. Conclusions: Seventeen herbs were selected for Parkinson's disease, and their appropriate daily doses were calculated. Furthermore, this study presented a new process that applies statistical methods to traditional medical literature to preselect herbs deemed effective for specific diseases.
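The herb selection step compares how often each herb appears in the effective group versus the other prescriptions, which fits a 2x2 contingency table. The sketch below computes the Pearson chi-square statistic without continuity correction and its 1-degree-of-freedom p-value; whether the authors applied a continuity correction is not stated, the function names are hypothetical, and the counts in the example are made up. Tables with an empty row or column would divide by zero and are assumed not to occur.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 table (no continuity correction).

                          herb present   herb absent
        effective group        a              b
        other group            c              d
    """
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    # For 1 df, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical counts, for illustration only.
stat = chi_square_2x2(a=10, b=20, c=30, d=40)
print(round(stat, 4), round(p_value_df1(stat), 4))
```

Repeating this for every herb means many tests, so a multiple-comparison adjustment is worth considering when choosing the significance threshold.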

Development of Social Data Collection and Loading Engine-based Reliability analysis System Against Infectious Disease Pandemic (감염병 위기 대응을 위한 소셜 데이터 수집 및 적재 엔진 기반 신뢰도 분석 시스템 개발)

  • Doo Young Jung;Sang-Jun Lee;MIN KYUNG IL;Seogsong Jeong;HyunWook Han
    • The Journal of Bigdata, v.7 no.2, pp.103-111, 2022
  • Many institutions, organizations, and websites are involved in responding to infectious diseases. However, as pandemics such as COVID-19 continue for years, conditions change considerably between the initial phase and the present, and policies and response systems evolve accordingly. As a result, regional gaps arise, and various problems emerge concerning trust and distrust in policies and their implementation. Therefore, as part of analyzing social data, including how information is transmitted, we developed a system focused on Twitter data, since Twitter is one of the major social media platforms carrying inaccurate information from unknown sources, so that facts can be checked in advance. Using social data, which is unstructured, we develop an algorithm that automatically detects infectious disease threats, creating an objective basis for responding to infectious disease crises and strengthening international competitiveness in related fields.

A Study of Information Literacy Curriculum Using Topic Modeling (토픽모델링을 활용한 정보활용교육 연구주제 분석 및 교육내용 제안)

  • Yun, Jihye;Jeong, Yoo Kyung
    • Journal of the Korean Society for Information Management, v.39 no.4, pp.1-21, 2022
  • The aim of this study is to identify research topics and suggest an information literacy curriculum by analyzing research articles on information literacy. For this purpose, we applied the topic modeling technique to 97 scientific articles and identified the core contents of information literacy education, such as media literacy, information literacy instruction, and the use of information resources. Based on the analysis results, we suggested an information literacy curriculum that takes into account the Big 6 model, the information literacy standards of the American Association of School Librarians, and the information literacy competencies of the Association of College and Research Libraries. This study is significant in that it incorporated 'use of information resources' and 'information ethics' into its proposed information literacy education.

Analysis of Waterpark Status and Recognition Using Big Data Analysis (빅데이터 분석을 활용한 워터파크 현황 및 인식 분석)

  • Kim, Jae-Hwan;Lee, Jae-Moon
    • Journal of Digital Convergence, v.15 no.10, pp.525-535, 2017
  • This study aims to examine consumer perceptions and the current status of water parks. Naver and Daum were used as data collection channels, and the keyword 'water park' was used for data retrieval. The analysis period covered two years, from January 1, 2015 to December 31, 2016. First, in the frequency analysis, the top five words were 'hidden camera', 'Lotte Water Park', 'arrest', 'suspect', and 'Gimhae' in 2015, and 'Lotte Water Park', 'swimming', 'summer', 'opening', and 'admission ticket' in 2016. Second, in the degree centrality analysis, the top five were 'hidden camera', 'arrest', 'suspect', 'female', and 'shower room' in 2015, and 'swimming', 'Lotte Water Park', 'summer', 'One Mount', and 'admission ticket' in 2016. Third, in the N-gram network graph, the top five pairs were water park/hidden camera, hidden camera/hidden camera, suspect/arrest, Gimhae/Lotte Water Park, and water park/suspect in 2015, and One Mount/water park, Gimhae/Lotte Water Park, water park/admission ticket, water park/water park, and water park/opening in 2016. Fourth, the CONCOR analysis formed three groups in 2015 and two groups in 2016.
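The N-gram network above links words that appear next to each other, and degree centrality counts each word's distinct neighbors. The sketch below assumes multiword terms such as "water park" are already merged into single tokens; the function name, the normalization by (nodes - 1), and the toy sentences are assumptions rather than the study's exact settings.

```python
from collections import Counter, defaultdict


def bigram_network(sentences):
    """Bigram (2-gram) counts and degree centrality of the resulting word network.

    sentences: token lists; bigrams never cross sentence boundaries.
    Repeated-word bigrams (e.g. "water park/water park") are counted but add
    no edge. Degree centrality = distinct neighbors / (number of nodes - 1).
    """
    bigrams = Counter()
    neighbors = defaultdict(set)
    for tokens in sentences:
        for left, right in zip(tokens, tokens[1:]):
            bigrams[(left, right)] += 1
            if left != right:
                neighbors[left].add(right)
                neighbors[right].add(left)

    n_nodes = len(neighbors)
    centrality = {}
    if n_nodes > 1:
        for word, linked in neighbors.items():
            centrality[word] = len(linked) / (n_nodes - 1)
    return bigrams, centrality


sentences = [
    ["water park", "hidden camera", "suspect"],
    ["suspect", "arrest"],
    ["water park", "suspect"],
]
bigrams, centrality = bigram_network(sentences)
print(bigrams.most_common(2))
print(max(centrality, key=centrality.get))  # suspect
```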

A Study on the Perception of Pit and Fissure Sealant using Unstructured Big Data (비정형 빅데이터를 이용한 치면열구전색(치아홈메우기)에 대한 인식분석)

  • Han-A Cho
    • Journal of Korean Dental Hygiene Science, v.6 no.2, pp.101-114, 2023
  • Background: This study aimed to explore the overall perception of pit and fissure sealants and to suggest ways to revitalize their currently stagnant use. Methods: To determine social perceptions of the changes in the coverage policy for pit and fissure sealants, we divided the study span into five periods: the first (December 1, 2009 to November 30, 2010), the second (December 1, 2010 to September 30, 2012), the third (October 1, 2012 to May 5, 2013), the fourth (May 6, 2013 to September 30, 2017), and the fifth (October 1, 2017 to December 31, 2022). We used text mining, an unstructured big data analysis method. Keywords were collected and analyzed using Textom, and we conducted frequency analysis of the top 30 keywords, analysis of the structural features of the semantic network, centrality analysis, QAP correlation analysis, and co-occurrence analysis. Results: The frequency analysis showed that the top keywords in each period were 'Cavities', 'Treatment', and 'Children'. In the structural features of the semantic network by period, the density index was around 1.00 in all periods. The QAP correlation analysis showed the highest correlation, with a coefficient of 0.834, between the first and second periods and between the fourth and fifth periods. The co-occurrence analysis showed that 'cavities' and 'prevention' were the top two words in all periods. Conclusion: This study showed that pit and fissure sealants are well accepted by society as a preventive treatment for caries. However, awareness of health education related to these sealants was low. Efforts to revitalize the stagnant use of pit and fissure sealants should be strengthened through effective education.
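Network density and QAP correlation can be sketched as follows. Density here is the share of possible undirected edges present in a 0/1 keyword matrix (at least two nodes assumed). QAP correlates the off-diagonal cells of two keyword matrices from different periods and estimates a p-value by permuting the rows and columns of one matrix together. Both matrices are assumed to index the same keywords in the same order; the function names, permutation count, and example matrices are hypothetical, and Textom's exact computations may differ.

```python
import math
import random


def density(adj):
    """Density of an undirected, unweighted network given as a 0/1 matrix."""
    n = len(adj)
    edges = sum(adj[i][j] for i in range(n) for j in range(i + 1, n))
    return edges / (n * (n - 1) / 2)


def off_diagonal(matrix, order=None):
    """Off-diagonal cells, optionally with rows and columns relabeled by `order`."""
    n = len(matrix)
    if order is None:
        order = list(range(n))
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(n) if i != j]


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def qap_correlation(m1, m2, permutations=999, seed=0):
    """QAP correlation between two same-sized keyword matrices; returns (r, p).

    Permuting rows and columns of m2 together keeps its network structure
    but breaks the correspondence between nodes of m1 and m2.
    """
    base = off_diagonal(m1)
    observed = pearson(base, off_diagonal(m2))
    rng = random.Random(seed)
    order = list(range(len(m2)))
    as_extreme = 0
    for _ in range(permutations):
        rng.shuffle(order)
        if abs(pearson(base, off_diagonal(m2, order))) >= abs(observed):
            as_extreme += 1
    return observed, (as_extreme + 1) / (permutations + 1)


# Hypothetical co-occurrence matrices for the same four keywords in two periods.
period_a = [[0, 3, 1, 0], [3, 0, 2, 1], [1, 2, 0, 4], [0, 1, 4, 0]]
period_b = [[0, 2, 1, 0], [2, 0, 3, 1], [1, 3, 0, 5], [0, 1, 5, 0]]
print(density([[0, 1, 1], [1, 0, 0], [1, 0, 0]]))
r, p = qap_correlation(period_a, period_b)
print(round(r, 3), p)
```

With only a handful of keywords the number of distinct permutations is small, so the p-value is coarse; the top-30 keyword matrices used in the study would give a much finer null distribution.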