• Title/Abstract/Keywords: Topic coherence


Tweets analysis using a Dynamic Topic Modeling : Focusing on the 2019 Koreas-US DMZ Summit (트윗의 타임 시퀀스를 활용한 DTM 분석 : 2019 남북미정상회동 이벤트를 중심으로)

  • Ko, EunJi;Choi, SunYoung
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.2 / pp.308-313 / 2021
  • In this study, tweets about the 2019 Koreas-US DMZ Summit were collected in time sequence and analyzed with a sequential topic modeling method, Dynamic Topic Modeling (DTM). On microblogging services such as Twitter, unstructured data mixing news and opinion about a single event appears simultaneously and at large scale, and information and reactions are produced in the same message format. Therefore, to grasp a topic trend, contextual meaning can be found only through pattern analysis that reflects the characteristics of sequential data. After Latent Dirichlet Allocation (LDA) was evaluated with the topic coherence score, DTM was computed; 30 topics related to news reports and opinions were derived, and the occurrence probability and keywords of each topic evolved dynamically. The study concludes that DTM is a suitable model for analyzing how the integrated topics of a specific event trend over time.

The Effect of Cohesive Devices on Memory and Understanding of Scientific Text (응집장치가 과학텍스트의 기억과 이해에 미치는 효과)

  • 김세영;한광희;조숙환
    • Korean Journal of Cognitive Science / v.13 no.2 / pp.1-13 / 2002
  • This paper is concerned with the impact of linguistic markers of coherence, such as causal connectives, repetitions, and anchoring devices, on the comprehension of a scientific text in Korean. A scientific text on the process of lightning formation was selected, and two versions of the text were constructed by varying the strength of coherence. Eighty-two undergraduate students took part in the experiment, in which they were instructed to fill in the blanks in each text in a recall task and a recognition task and to respond to a set of questions in a comprehension test. The results revealed a selective effect of the cohesive markers: the different linguistic signals seem to play a facilitating role to varying degrees, depending on the type of task involved. Moreover, an analysis of topic continuity from the beginning paragraphs through the last revealed that the text was better understood in the paragraphs containing the main topic than in those without it. This finding seems to indicate that the off-line processing of scientific text is not influenced by local bottom-up processing alone. The effect of topic continuity suggests that global, top-down processing has an important role to play, overriding the impact of cohesive devices.


Keyword Reorganization Techniques for Improving the Identifiability of Topics (토픽 식별성 향상을 위한 키워드 재구성 기법)

  • Yun, Yeoil;Kim, Namgyu
    • Journal of Information Technology Services / v.18 no.4 / pp.135-149 / 2019
  • Recently, much research has aimed at extracting meaningful information from large amounts of text data. Among the various approaches, topic modeling, which expresses latent topics as groups of keywords, is widely used. Topic modeling presents several keywords per topic ranked by term/topic weight, and the quality of those keywords is usually evaluated through coherence, which reflects the similarity of the keywords. However, a topic quality evaluation based only on keyword similarity has limitations, because it is difficult to describe the content of a topic accurately enough with just a set of similar words. In this research, therefore, we propose a topic keyword reorganization method to improve the identifiability of topics. To reorganize topic keywords, each document is first labeled with one representative topic, which can be extracted by traditional topic modeling. Classification rules for assigning each document to its label are then generated, and new topic keywords are extracted from these rules. To evaluate the performance of our method, we performed an experiment on 1,000 news articles. The experiment confirmed that the keywords extracted by the proposed method have better identifiability than traditional topic keywords.
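
The abstract above does not say which coherence measure is used. As an illustration only, the sketch below computes a UMass-style coherence score, one common co-occurrence-based choice, for a single topic's ranked keyword list. The function name `umass_coherence`, the token-list input format, and the smoothing constant `eps` are assumptions made for this example, not details from the paper.

```python
import math
from itertools import combinations


def umass_coherence(top_words, documents, eps=1.0):
    """UMass-style coherence for one topic's ranked keyword list.

    top_words: keywords ordered from highest to lowest topic weight.
    documents: iterable of token lists (hypothetical input format).
    eps: smoothing constant so that a zero co-occurrence count stays finite.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain all of the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    # combinations() keeps ranking order: `earlier` outranks `later`.
    for earlier, later in combinations(top_words, 2):
        denom = doc_freq(earlier)
        if denom == 0:
            continue  # keyword absent from the corpus; skip the pair
        score += math.log((doc_freq(earlier, later) + eps) / denom)
    return score


docs = [["a", "b"], ["a", "b"], ["a", "c"], ["c", "d"]]
coherent = umass_coherence(["a", "b"], docs)    # words that co-occur often
incoherent = umass_coherence(["a", "d"], docs)  # words that never co-occur
```

Scores closer to zero indicate keywords that tend to appear in the same documents, which is the sense of keyword similarity the abstract refers to.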

Topic Modeling and Sentiment Analysis of Twitter Discussions on COVID-19 from Spatial and Temporal Perspectives

  • AlAgha, Iyad
    • Journal of Information Science Theory and Practice / v.9 no.1 / pp.35-53 / 2021
  • The study reported in this paper aimed to evaluate the topics and opinions of COVID-19 discussion found on Twitter. It performed topic modeling and sentiment analysis of tweets posted during the COVID-19 outbreak, and compared these results over space and time. In addition, by covering a more recent and a longer period of the pandemic timeline, several patterns not previously reported in the literature were revealed. Author-pooled Latent Dirichlet Allocation (LDA) was used to generate twenty topics that discuss different aspects related to the pandemic. Time-series analysis of the distribution of tweets over topics was performed to explore how the discussion on each topic changed over time, and the potential reasons behind the change. In addition, spatial analysis of topics was performed by comparing the percentage of tweets in each topic among top tweeting countries. Afterward, sentiment analysis of tweets was performed at both temporal and spatial levels. Our intention was to analyze how the sentiment differs between countries and in response to certain events. The performance of the topic model was assessed by being compared with other alternative topic modeling techniques. The topic coherence was measured for the different techniques while changing the number of topics. Results showed that the pooling by author before performing LDA significantly improved the produced topic models.
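
The pooling step mentioned in the abstract can be pictured as merging all tweets by the same author into one pseudo-document before the topic model is trained. The following is a minimal sketch under that reading; the function name `pool_by_author` and the `(author_id, text)` input format are hypothetical, and the paper's actual preprocessing may differ.

```python
from collections import defaultdict


def pool_by_author(tweets):
    """Merge each author's tweets into one pseudo-document.

    tweets: iterable of (author_id, text) pairs (hypothetical input format).
    Returns a dict mapping author_id to that author's concatenated text.
    """
    pooled = defaultdict(list)
    for author, text in tweets:
        pooled[author].append(text)
    # Join in arrival order so each pseudo-document keeps the tweet sequence.
    return {author: " ".join(texts) for author, texts in pooled.items()}


sample = [
    ("u1", "masks work"),
    ("u2", "lockdown again"),
    ("u1", "vaccine news"),
]
pooled_docs = pool_by_author(sample)
```

Pooling gives LDA longer documents to work with than individual tweets, which is the usual motivation for this kind of aggregation.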

Analyzing Customer Experience in Hotel Services Using Topic Modeling

  • Nguyen, Van-Ho;Ho, Thanh
    • Journal of Information Processing Systems / v.17 no.3 / pp.586-598 / 2021
  • Nowadays, users' reviews and feedback on e-commerce sites stored in text create a huge source of information for analyzing customers' experience with goods and services provided by a business. In other words, collecting and analyzing this information is necessary to better understand customer needs. In this study, we first collected a corpus with 99,322 customers' comments and opinions in English. From this corpus we chose the best number of topics (K) using Perplexity and Coherence Score measurements as the input parameters for the model. Finally, we conducted an experiment using the latent Dirichlet allocation (LDA) topic model with K coefficients to explore the topic. The model results found hidden topics and keyword sets with high probability that are interesting to users. The application of empirical results from the model will support decision-making to help businesses improve products and services as well as business management and development in the field of hotel services.
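
The abstract does not state how perplexity and coherence were combined to choose K. The sketch below shows one plausible procedure: compute held-out perplexity from per-document log-likelihoods, then pick the K with the highest coherence and break ties by lower perplexity. The function names, the input formats, the tie-breaking rule, and the sample scores are all assumptions made for illustration.

```python
import math


def perplexity(doc_log_likelihoods, doc_token_counts):
    """Perplexity = exp(-total log-likelihood / total number of tokens)."""
    total_tokens = sum(doc_token_counts)
    return math.exp(-sum(doc_log_likelihoods) / total_tokens)


def pick_num_topics(scores):
    """Choose K from a dict mapping K to (perplexity, coherence).

    Assumed rule: highest coherence wins; ties go to the lower perplexity.
    This presumes a coherence measure where higher is better.
    """
    return max(scores, key=lambda k: (scores[k][1], -scores[k][0]))


# Made-up scores for three candidate topic counts, for illustration only.
candidate_scores = {
    5: (120.0, 0.41),
    10: (100.0, 0.52),
    15: (95.0, 0.52),
}
best_k = pick_num_topics(candidate_scores)
```

In practice the two measures can disagree, so the selection rule is a modeling decision rather than a fixed recipe.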

Topic Expansion based on Infinite Vocabulary Online LDA Topic Model using Semantic Correlation Information (무한 사전 온라인 LDA 토픽 모델에서 의미적 연관성을 사용한 토픽 확장)

  • Kwak, Chang-Uk;Kim, Sun-Joong;Park, Seong-Bae;Kim, Kweon Yang
    • KIISE Transactions on Computing Practices / v.22 no.9 / pp.461-466 / 2016
  • Topic expansion is a method that incorporates external data to improve the quality of learned topics. The online learning topic model is not appropriate for topic expansion using external data, because it does not reflect unseen words in the learned topic model. In this study, we propose a topic expansion method using infinite vocabulary online LDA. When unseen words appear during learning, the proposed method allocates each unseen word to a topic after calculating the semantic correlation between the word and each topic. To evaluate the proposed method, we compared it with an existing topic expansion method. The results indicated that, by reflecting external documents, the proposed method includes additional information not contained in the broadcasting script. The proposed method also outperformed the existing method in the coherence evaluation.

Exploring Secondary Science Teacher Preparation Program and Suggesting its Development Direction: A Case of USA and Korea

  • Park, Young-Shin;Lee, Ki-Young;Morrell, Patricia D.;Schepige, Adele
    • Journal of the Korean Earth Science Society / v.38 no.5 / pp.378-392 / 2017
  • Teacher quality is a topic of international concern, as it impacts student learning and teacher preparation. This study compared the undergraduate secondary science teacher preparation programs of two universities in Korea with those of two institutions in Oregon, USA. We examined the programs' structural curricular coherence, conceptual curricular coherence, and curricular balance. Structural curricular coherence was determined by examining the overarching goals of the institutions' programs, the organization of the programs of study in terms of meeting those goals, and outside bodies of evidence. All universities were in structural coherence for various reasons. Conceptual curricular coherence was determined by examining students' perceptions of the connection between their preparation and their clinical practice. In the case of Korea, most students from both universities were not satisfied with their practical preparation. In the US, the students from both institutions felt well prepared to transition to inservice teaching. To determine curricular balance, we examined the credit hours taken in the four main areas of the teacher knowledge base: GPK (General Pedagogical Knowledge), SMK (Subject Matter Knowledge), PCK (Pedagogical Content Knowledge), and CK (Contextual Knowledge). The total credit hours taken in each category were very similar across the two countries, but the application and field component in the USA was far greater than that in Korea, where the focus was heavily on SMK and PCK. The main reason for these differences may be the nations' licensing and employment processes.

Topic-based Multi-document Summarization Using Non-negative Matrix Factorization and K-means (비음수 행렬 분해와 K-means를 이용한 주제기반의 다중문서요약)

  • Park, Sun;Lee, Ju-Hong
    • Journal of KIISE: Software and Applications / v.35 no.4 / pp.255-264 / 2008
  • This paper proposes a novel method using K-means and non-negative matrix factorization (NMF) for topic-based multi-document summarization. NMF decomposes a weighted term-by-sentence matrix into two sparse non-negative matrices: a semantic feature matrix and a semantic variable matrix. The obtained semantic features are intuitively comprehensible. Weighted similarity between the topic and the semantic features prevents meaningless sentences that are similar to the topic from being selected. K-means clustering removes noise from the sentences so that biased semantics of the documents are not reflected in the summaries. In addition, the coherence of the summaries can be enhanced by arranging the selected sentences in order of rank. The experimental results show that the proposed method achieves better performance than other methods.

Comparing Social Media and News Articles on Climate Change: Different Viewpoints Revealed

  • Kang Nyeon Lee;Haein Lee;Jang Hyun Kim;Youngsang Kim;Seon Hong Lee
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.11 / pp.2966-2986 / 2023
  • Climate change is a constant threat to human life, and it is important to understand the public perception of this issue. Previous studies examining climate change have been based on limited survey data. In this study, the authors used big data such as news articles and social media data, within which the authors selected specific keywords related to climate change. Using these natural language data, topic modeling was performed for discourse analysis regarding climate change based on various topics. In addition, before applying topic modeling, sentiment analysis was adjusted to discover the differences between discourses on climate change. Through this approach, discourses of positive and negative tendencies were classified. As a result, it was possible to identify the tendency of each document by extracting key words for the classified discourse. This study aims to prove that topic modeling is a useful methodology for exploring discourse on platforms with big data. Moreover, the reliability of the study was increased by performing topic modeling in consideration of objective indicators (i.e., coherence score, perplexity). Theoretically, based on the social amplification of risk framework (SARF), this study demonstrates that the diffusion of the agenda of climate change in public news media leads to personal anxiety and fear on social media.

SNS Analysis Using LDA Topic Modeling (LDA 토픽 모델링을 활용한 SNS 분석)

  • Min-Soo Jang;Sun-Young Ihm
    • Proceedings of the Korea Information Processing Society Conference / 2023.05a / pp.402-403 / 2023
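  • The purpose of this study is to examine trends in leisure activities, work and occupations, and housing and living in Korea by analyzing Korean SNS data with LDA topic modeling. Korean SNS data provided by AI Hub were collected and, after morphological analysis and preprocessing, the optimal number of topics was determined based on the coherence score and the topics were extracted. Based on the derived trends, we expect to be able to predict their impact on the fields of management and marketing.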