• Title/Summary/Keyword: latent dirichlet allocation

Extraction of Satisfaction Factors and Evaluation of Tourist Attractions based on Travel Site Review Comments (여행 사이트 리뷰를 활용한 관광지 만족도 요인 추출 및 평가)

  • Cho, Suhyoun;Kim, Boseop;Park, Minsik;Lee, Gichang;Kang, Pilsung
    • Journal of Korean Institute of Industrial Engineers / v.43 no.1 / pp.62-71 / 2017
  • To attract foreign tourists, it is important to understand which factors of domestic tourist spots tourists consider critical and how they evaluate those spots after a visit. However, most research on the tourism business has collected information through surveys of a small number of tourists, which leads to inaccurate and biased conclusions. In this paper, we propose a data-driven methodology to identify tourists' satisfaction factors and estimate sentiment scores for them. To do so, we collected review comments from a popular travel website. Latent Dirichlet allocation is employed to extract key factors, and an elastic net is used to estimate sentiment scores. Then, an aggregated evaluation score is generated by combining the factors with the per-topic sentiment scores. The proposed method can be used to recommend themed travel schedules and to discover new spots.
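The abstract does not say how the LDA model was fitted. As a rough sketch, assuming a standard collapsed Gibbs sampler (not the authors' implementation), the pure-Python function below estimates the document-topic proportions (theta) and topic-word distributions (phi) from which review factors could be read off. The name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the toy reviews are illustrative assumptions.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling (illustrative sketch).

    docs: list of token lists. Returns (theta, phi, vocab), where
    theta[d][k] is document d's topic proportion and phi[k][w] is
    topic k's probability for vocab[w].
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    ndk = [[0] * K for _ in docs]       # document-topic counts
    nkw = [[0] * V for _ in range(K)]   # topic-word counts
    nk = [0] * K                        # total tokens per topic
    z = []                              # current topic of every token

    # Random initial assignment of each token to a topic.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = index[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][wi] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [(ndk[d][t] + alpha) * (nkw[t][wi] + beta) / (nk[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][wi] += 1
                nk[k] += 1

    theta = [[(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[k][w] + beta) / (nk[k] + V * beta) for w in range(V)]
           for k in range(K)]
    return theta, phi, vocab


# Toy review tokens (illustrative only, not data from the paper).
reviews = [
    "beach sea view sunset beach".split(),
    "food restaurant taste price food".split(),
    "sea beach swim view".split(),
    "price restaurant menu taste".split(),
]
theta, phi, vocab = lda_gibbs(reviews, num_topics=2)
for k, row in enumerate(phi):
    top = sorted(range(len(vocab)), key=lambda w: -row[w])[:3]
    print(f"topic {k}:", [vocab[w] for w in top])
```

A real analysis would rely on a tested library and far more documents; this sketch only shows the sampling update and how theta and phi are read out.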

Multi-Topic Sentiment Analysis using LDA for Online Review (LDA를 이용한 온라인 리뷰의 다중 토픽별 감성분석 - TripAdvisor 사례를 중심으로 -)

  • Hong, Tae-Ho;Niu, Hanying;Ren, Gang;Park, Ji-Young
    • The Journal of Information Systems / v.27 no.1 / pp.89-110 / 2018
  • Purpose: Customer reviews contain much information, but finding the key information in a large volume of text is not easy, and business decision makers need a model to solve this problem. In this study, we propose a multi-topic sentiment analysis approach using Latent Dirichlet Allocation (LDA) for user-generated content (UGC). Design/methodology/approach: We collected a total of 104,039 hotel reviews for seven of the world's top tourist destinations from TripAdvisor (www.tripadvisor.com) and used the LDA model to extract 30 hotel-related topics from all customer reviews. Six major dimensions (value, cleanliness, rooms, service, location, and sleep quality) were selected from the 30 extracted topics. The data were analyzed in R. Findings: This study proposes a lexicon-based sentiment analysis approach for the keyword-embedded sentences related to the six dimensions within a review. The performance of the proposed model was evaluated by comparing the sentiment analysis results for each topic with the actual attribute ratings provided by the platform. The results show that the model performs well, with high accuracy and recall. The proposed model is expected to make it possible to analyze customers' sentiments on different topics in reviews that lack detailed attribute ratings.
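A minimal sketch of sentence-level, lexicon-based scoring per dimension follows. It assumes tiny hypothetical keyword sets and a hypothetical ±1 polarity lexicon; `DIMENSION_KEYWORDS`, `LEXICON`, and `dimension_sentiment` are not from the paper, which selects its dimensions from LDA topics and works in R. Each sentence that mentions a dimension keyword contributes its summed polarity to that dimension, and the dimension score is the average over such sentences.

```python
import re

# Hypothetical keyword sets and polarity lexicon, for illustration only.
# The paper derives its six dimensions from 30 LDA topics; these are not its lists.
DIMENSION_KEYWORDS = {
    "cleanliness": {"clean", "dirty", "dust"},
    "location": {"location", "station", "walk"},
    "service": {"staff", "service", "reception"},
}
LEXICON = {"clean": 1, "great": 1, "friendly": 1, "convenient": 1,
           "dirty": -1, "rude": -1, "noisy": -1, "far": -1}


def dimension_sentiment(review):
    """Average lexicon polarity of the sentences that mention each dimension."""
    scores = {}
    for sentence in re.split(r"[.!?]+", review.lower()):
        tokens = re.findall(r"[a-z']+", sentence)
        if not tokens:
            continue
        polarity = sum(LEXICON.get(t, 0) for t in tokens)
        for dim, keywords in DIMENSION_KEYWORDS.items():
            if keywords.intersection(tokens):  # keyword-embedded sentence
                scores.setdefault(dim, []).append(polarity)
    return {dim: sum(vals) / len(vals) for dim, vals in scores.items()}


print(dimension_sentiment(
    "The room was clean. Staff were rude! Great location near the station."))
# → {'cleanliness': 1.0, 'service': -1.0, 'location': 1.0}
```

Comparing the sign of each dimension score with the platform's per-attribute rating is one simple way such output could be checked against ground truth.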

A Development of LDA Topic Association Systems Based on Spark-Hadoop Framework

  • Park, Kiejin;Peng, Limei
    • Journal of Information Processing Systems / v.14 no.1 / pp.140-149 / 2018
  • Social data such as users' comments are inherently unstructured, and current technologies for analyzing such data are constrained by available storage space and processing time when fast storage and processing are required. Moreover, it is difficult to analyze user features quickly from huge amounts of dynamically generated social data. To solve this problem, we design and implement a topic association analysis system based on the latent Dirichlet allocation (LDA) model. Because LDA is unsupervised and needs no labeled training data, it can readily analyze social users' hourly interests in different topics. The proposed system is built on the Spark framework running on top of a Hadoop cluster. It achieves high-speed processing because it minimizes hard-disk access and processes all intermediate data in main memory. In the performance evaluation, analyzing the topics in about 1 TB of test social data (SNS comments) took about 5 hours. Moreover, by analyzing the associations among topics, we can track hourly changes in social users' interests in different topics.
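The abstract does not specify how topic association is measured. One simple possibility, sketched here on a single machine without Spark and under the assumption that association means correlation of hourly topic shares, averages each hour's document-topic proportions and computes a Pearson correlation between two topics' hourly series. All function names and numbers are hypothetical.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def hourly_topic_shares(doc_hours, theta):
    """Mean topic proportion per hour; doc_hours[d] is the hour of document d."""
    by_hour = {}
    for hour, dist in zip(doc_hours, theta):
        by_hour.setdefault(hour, []).append(dist)
    # zip(*dists) yields one column (topic) at a time across that hour's documents.
    return {h: [sum(col) / len(dists) for col in zip(*dists)]
            for h, dists in sorted(by_hour.items())}


def topic_association(shares, a, b):
    """Correlation of topics a and b across the hourly series."""
    series = list(shares.values())
    return pearson([s[a] for s in series], [s[b] for s in series])


# Toy per-comment topic proportions (illustrative), two comments per hour.
hours = [0, 0, 1, 1, 2, 2]
theta = [[0.8, 0.1, 0.1], [0.6, 0.3, 0.1],
         [0.4, 0.4, 0.2], [0.4, 0.2, 0.4],
         [0.1, 0.6, 0.3], [0.1, 0.4, 0.5]]
shares = hourly_topic_shares(hours, theta)
print(round(topic_association(shares, 0, 1), 3))
```

A strongly negative value indicates that interest in one topic rises as interest in the other falls over the day.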

Falling Accidents Analysis in Construction Sites by Using Topic Modeling (토픽 모델링을 이용한 건설현장 추락재해 분석)

  • Ryu, Hanguk
    • Journal of the Korea Convergence Society / v.10 no.7 / pp.175-182 / 2019
  • Using topic modeling, a machine learning technique, we classify topics in fall accidents at construction sites and analyze the causes of the accidents for each topic. To apply topic modeling based on latent Dirichlet allocation, the text data were preprocessed, and the model was evaluated with the perplexity score to improve its reliability. Falling accidents most commonly involved daily workers at small construction sites. In most cases, safety measures did not work properly because of missing safety equipment, inadequate installation and wearing of equipment, or poor equipment performance. To prevent and reduce falling accidents, it is important to educate daily workers at small construction sites, organize the workplace, and check that personal safety equipment and devices are worn.
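Perplexity, the evaluation criterion mentioned above, is the exponentiated negative average log-likelihood per token, where each token's probability mixes the topic-word distributions phi by the document's topic proportions theta. The sketch below computes it for given theta and phi; the function name, vocabulary, and values are hypothetical. In practice it is computed on held-out text, and theta for unseen documents must be inferred first.

```python
import math


def perplexity(docs, theta, phi, vocab_index):
    """exp(-(sum of log p(w | d)) / N), with p(w | d) = sum_k theta[d][k] * phi[k][w]."""
    log_lik, n_tokens = 0.0, 0
    for d, doc in enumerate(docs):
        for w in doc:
            wi = vocab_index[w]
            p = sum(theta[d][k] * phi[k][wi] for k in range(len(phi)))
            log_lik += math.log(p)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)


# Toy 4-word vocabulary and two topics (illustrative values).
vocab_index = {"fall": 0, "scaffold": 1, "harness": 2, "worker": 3}
docs = [["fall", "scaffold", "fall"], ["harness", "worker"]]
theta = [[0.9, 0.1], [0.2, 0.8]]
sharp_phi = [[0.6, 0.3, 0.05, 0.05], [0.05, 0.05, 0.45, 0.45]]
flat_phi = [[0.25] * 4, [0.25] * 4]
print(perplexity(docs, theta, sharp_phi, vocab_index))  # lower is better
print(perplexity(docs, theta, flat_phi, vocab_index))   # 4 (vocabulary size), up to rounding
```

A uniform model scores exactly the vocabulary size, which gives a useful baseline when comparing models with different numbers of topics.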

Topic Modeling and Sentiment Analysis of Twitter Discussions on COVID-19 from Spatial and Temporal Perspectives

  • AlAgha, Iyad
    • Journal of Information Science Theory and Practice / v.9 no.1 / pp.35-53 / 2021
  • This study evaluated the topics and opinions of COVID-19 discussions on Twitter. It performed topic modeling and sentiment analysis of tweets posted during the COVID-19 outbreak and compared the results across space and time. By covering a more recent and longer period of the pandemic timeline, it also revealed several patterns not previously reported in the literature. Author-pooled Latent Dirichlet Allocation (LDA) was used to generate twenty topics covering different aspects of the pandemic. A time-series analysis of the distribution of tweets over topics explored how the discussion of each topic changed over time and the potential reasons behind those changes. A spatial analysis compared the percentage of tweets in each topic among the top tweeting countries. Sentiment analysis of tweets was then performed at both temporal and spatial levels, with the aim of analyzing how sentiment differs between countries and in response to certain events. The performance of the topic model was assessed by comparing it with alternative topic modeling techniques, measuring topic coherence for each technique while varying the number of topics. Results showed that pooling tweets by author before performing LDA significantly improved the resulting topic models.
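Author pooling and topic coherence can be sketched as follows. Pooling concatenates each author's tweets into one pseudo-document before LDA, giving the model longer documents than single tweets. The abstract does not name the coherence measure, so UMass coherence (document co-occurrence counts of a topic's top words) is assumed here; the function names and toy tweets are illustrative.

```python
import math
from collections import defaultdict


def pool_by_author(tweets):
    """Concatenate each author's tweets into one pseudo-document."""
    pooled = defaultdict(list)
    for author, tokens in tweets:
        pooled[author].extend(tokens)
    return dict(pooled)


def umass_coherence(top_words, docs):
    """UMass coherence of a topic's top words, listed from most to least probable.

    Sums log((D(w_m, w_j) + 1) / D(w_j)) over pairs j < m, where D counts the
    documents containing all the given words.
    """
    doc_sets = [set(doc) for doc in docs]

    def doc_freq(*words):
        return sum(1 for s in doc_sets if all(w in s for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for j in range(m):
            score += math.log((doc_freq(top_words[m], top_words[j]) + 1)
                              / doc_freq(top_words[j]))
    return score


tweets = [("u1", ["mask", "vaccine"]), ("u2", ["lockdown", "school"]),
          ("u1", ["vaccine", "dose"]), ("u2", ["school", "online"])]
pooled = pool_by_author(tweets)
print(pooled["u1"])  # ['mask', 'vaccine', 'vaccine', 'dose']
docs = list(pooled.values())
print(umass_coherence(["vaccine", "dose"], docs))    # words co-occur: log(2)
print(umass_coherence(["vaccine", "school"], docs))  # never co-occur: 0.0
```

Higher (less negative) UMass scores indicate top words that tend to appear in the same documents, which is how coherence can be compared across topic counts.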

Topics and Sentiment Analysis Based on Reviews of Omni-Channel Retailing

  • Kim, Soon-Hong;Yoo, Byong-Kook
    • Journal of Distribution Science / v.19 no.4 / pp.25-35 / 2021
  • Purpose: This study uses text mining to analyze the factors affecting customer satisfaction in omni-channel customer reviews posted on Internet blogs, online cafes, and YouTube. Research, data, and methodology: Frequency analysis and LDA (Latent Dirichlet Allocation) are applied to social big data to capture reviewers' reactions to the omni-channel shopping service recently launched by L Shopping Company. Based on the topic analysis, we then conduct a sentiment analysis of purchase reviews and analyze how each topic relates to the positive or negative sentiments of omni-channel app users. Results: The topic analysis yields four main topics: delivery and events, economic value, recommendations and convenience, and product quality and brand awareness. The sentiment analysis reveals that reviewers evaluate the price policy and product promotions positively but app use, delivery, and product quality negatively. Conclusions: Retailers can establish customized marketing strategies by identifying customers' major interests through text mining. In addition, sentiment analysis by topic is an important indicator for developing the products and services customers want, because it identifies the areas that satisfy customers and those that evoke negative reactions.
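One way to connect topics with review sentiment is sketched below under stated assumptions: each review is assigned to its dominant LDA topic, sentiment labels are assumed to come from a separate classifier or lexicon, and the share of positive reviews is reported per topic. `positive_share_by_topic` and the toy data are hypothetical, not the study's procedure.

```python
from collections import Counter


def positive_share_by_topic(theta, labels):
    """Share of 'positive' reviews among reviews whose dominant topic is k."""
    positive, total = Counter(), Counter()
    for dist, label in zip(theta, labels):
        k = max(range(len(dist)), key=dist.__getitem__)  # dominant topic
        total[k] += 1
        if label == "positive":
            positive[k] += 1
    return {k: positive[k] / total[k] for k in sorted(total)}


# Toy topic proportions and sentiment labels (illustrative only).
theta = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]
labels = ["positive", "positive", "negative", "positive"]
for k, share in positive_share_by_topic(theta, labels).items():
    print(f"topic {k}: {share:.0%} positive")
```

Assigning each review to a single dominant topic is the simplest choice; weighting each review's label by its full topic distribution is a common alternative when reviews mix several topics.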

Analyzing Technological Trends of Smart Factory using Topic Modeling

  • Hussain, Adnan;Kim, Chulhyun;Battsengel, Ganchimeg;Jeon, Jeonghwan
    • Asian Journal of Innovation and Policy / v.10 no.3 / pp.380-403 / 2021
  • Smart factories have recently gained significant importance with the development of the fourth industrial revolution and the rise of global industrial competition. Therefore, for industries to survive and keep pace with global market trends, accurate technological planning is required. Although various studies have investigated forecasting technologies and their influence on the smart factory, little significant work has yet analyzed technological trends concerning the smart factory, which is the core focus here. This work analyzes the technological trends of the smart factory, followed by a detailed investigation of recent research hotspots and frontiers in the field. A well-known topic modeling technique, Latent Dirichlet Allocation (LDA), was employed for this purpose. The technological trends were further supported by an in-depth analysis of a smart-factory case study. The findings identify technological trends that have significant potential for determining technological strategies. The results may also help researchers and enterprises forecast and plan future technological evolution.
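The abstract does not describe how trends were quantified. A simple choice, assumed here, is to average the document-topic proportions by year and fit a least-squares slope per topic, so that rising topics get positive slopes. The function name and toy data are hypothetical, and at least two distinct years are needed for the slope to be defined.

```python
def topic_trend_slopes(years, theta):
    """Least-squares slope of each topic's mean yearly proportion versus year.

    Positive slopes mark rising topics, negative slopes declining ones.
    """
    by_year = {}
    for year, dist in zip(years, theta):
        by_year.setdefault(year, []).append(dist)
    xs = sorted(by_year)
    # Mean topic proportions for each year, in year order.
    means = [[sum(col) / len(by_year[y]) for col in zip(*by_year[y])] for y in xs]
    x_bar = sum(xs) / len(xs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slopes = []
    for k in range(len(means[0])):
        ys = [row[k] for row in means]
        y_bar = sum(ys) / len(ys)
        sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        slopes.append(sxy / sxx)
    return slopes


# Toy per-document topic proportions (illustrative), one document per year.
years = [2018, 2019, 2020]
theta = [[0.2, 0.8], [0.3, 0.7], [0.4, 0.6]]
print([round(s, 3) for s in topic_trend_slopes(years, theta)])  # [0.1, -0.1]
```

Because each document's proportions sum to one, the slopes across all topics sum to zero: a topic can only gain share at the expense of others.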

A Study on the Perception of Public Value from Public Corporation in the Agricultural and Rural Sector - The Case of Korea Rural Community Corporation - (농업·농촌 부문 공기업의 공익적 가치 인식 연구 - 한국농어촌공사를 대상으로 -)

  • Lim, Che Hwan;Beom, Jin Woo;An, Dong Hwan;Yoo, Do il
    • Journal of Korean Society of Rural Planning / v.27 no.4 / pp.83-96 / 2021
  • This study analyzes the perception of public value created by the Korea Rural Community Corporation, a representative public corporation in the agricultural and rural sector. We categorize agricultural and rural public values as 'stable food supply,' 'conservation of national environment and nature,' 'formation and cultivation of water resources,' 'prevention of soil loss and flooding,' 'conservation of ecological system,' and 'conservation of rural tradition and culture.' For the qualitative analysis, we apply content analysis; for the quantitative analysis, we use topic modeling with Latent Dirichlet Allocation (LDA), which is widely used in text mining. The results show that the internal perception of value suppliers centers mainly on 'stable food supply,' 'formation and cultivation of water resources,' and 'conservation of rural tradition and culture.' The external perception of value demanders covers all public values, but their evaluations and demands span various aspects, including both positive and negative opinions.

Analysis of Shipping and Logistics News Articles using Topic Modeling (토픽모델링을 활용한 해운물류 뉴스 분석)

  • Yoon, Hee-Young;Kwak, Il-Youp
    • Korea Trade Review / v.46 no.4 / pp.61-76 / 2021
  • This study focuses on three logistics-related news outlets (Logistics Newspaper, Korea Shipping Gazette, and Korea Shipping Newspaper) to present changes in logistics issues centering on COVID-19, which has recently had the greatest impact worldwide. For data collection, news articles from 2019 and 2020 (title, article, content, date, article classification, and article URL) were collected by web crawling the homepages of the three representative logistics media companies with Python's BeautifulSoup and requests modules. The data were analyzed with basic statistical analysis, Latent Dirichlet Allocation (LDA) for topic modeling, and Scattertext. The results are as follows. First, among the three logistics news media, the Korea Shipping Newspaper was the most active. Second, topic modeling with LDA identified eight logistics-related topics, and the keywords and significant issues of each topic were presented. Third, the keywords were visualized with Scattertext. This is the first study to present changes in the logistics field based on articles from representative logistics media in 2019 and 2020. In particular, 2019 and 2020 can be divided into the periods before and after the outbreak of COVID-19, which has had a great impact not only on the logistics field but also on our lives as a whole. Future work requires a multi-faceted approach, such as comparative studies of logistics issues across countries or implications drawn from long-term time-series articles.
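Scattertext visualizes how term usage differs between two categories, here 2019 versus 2020 articles. The sketch below does not reproduce Scattertext's scoring or plots; it assumes a simpler smoothed log ratio of relative term frequencies to rank the terms most characteristic of each period. The function name and toy tokens are illustrative.

```python
import math
from collections import Counter


def distinctive_terms(docs_a, docs_b, top_n=10, smoothing=1.0):
    """Rank terms by the smoothed log ratio of their relative frequency in A vs B.

    Returns (terms most characteristic of A, terms most characteristic of B).
    """
    count_a = Counter(w for doc in docs_a for w in doc)
    count_b = Counter(w for doc in docs_b for w in doc)
    vocab = set(count_a) | set(count_b)
    total_a = sum(count_a.values()) + smoothing * len(vocab)
    total_b = sum(count_b.values()) + smoothing * len(vocab)
    score = {w: math.log((count_a[w] + smoothing) / total_a)
             - math.log((count_b[w] + smoothing) / total_b) for w in vocab}
    # Highest scores favor A, lowest favor B; ties broken alphabetically.
    ranked = sorted(vocab, key=lambda w: (-score[w], w))
    return ranked[:top_n], ranked[::-1][:top_n]


# Toy tokenized articles (illustrative, not the paper's corpus).
articles_2019 = [["port", "freight", "port"], ["container", "port"]]
articles_2020 = [["covid", "freight", "covid"], ["container", "quarantine"]]
print(distinctive_terms(articles_2019, articles_2020, top_n=1))  # (['port'], ['covid'])
```

The additive smoothing keeps terms that appear in only one period from producing infinite scores.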

Aviation Safety Mandatory Report Topic Prediction Model using Latent Dirichlet Allocation (LDA) (잠재 디리클레 할당(LDA)을 이용한 항공안전 의무보고 토픽 예측 모형)

  • Kim, Jun Hwan;Paek, Hyunjin;Jeon, Sungjin;Choi, Young Jae
    • Journal of the Korean Society for Aviation and Aeronautics / v.31 no.3 / pp.42-49 / 2023
  • Not only in the aviation industry but also in other industries, safety data play a key role in improving safety performance. By analyzing safety data such as aviation safety reports (text data), hazards can be identified and removed before they lead to a tragic accident. However, the raw (natural language) data collected from each site must first be preprocessed before a proactive or predictive safety management system can use them. As air traffic volume increases, the amount of accumulated data is also rising, so there are clear limits to analyzing the data manually. In this paper, a topic prediction model for aviation safety mandatory reports is proposed, and its prediction accuracy was verified using actual aviation safety mandatory report data. The model is meaningful in that it not only effectively supports the current analysis of aviation safety mandatory reports but can also be applied to various data produced in the aviation safety field in the future.
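The abstract does not describe how the prediction model assigns topics to new reports. A minimal sketch, assuming an already trained topic-word matrix phi, folds a new report in with a few smoothed EM iterations that keep phi fixed and re-estimate only the report's topic proportions, then picks the most probable topic. `predict_topic`, the vocabulary, and the phi values are hypothetical.

```python
def predict_topic(tokens, phi, vocab_index, alpha=0.1, iters=50):
    """Infer a new report's topic mix with phi held fixed; return (best topic, theta).

    Smoothed EM fold-in: each pass spreads every known token over topics in
    proportion to theta[k] * phi[k][w], then re-estimates theta from those
    soft counts plus a pseudo-count alpha. Unknown words are ignored.
    """
    K = len(phi)
    theta = [1.0 / K] * K
    known = [vocab_index[w] for w in tokens if w in vocab_index]
    if not known:
        return None, theta
    for _ in range(iters):
        counts = [alpha] * K
        for wi in known:
            resp = [theta[k] * phi[k][wi] for k in range(K)]
            total = sum(resp)
            for k in range(K):
                counts[k] += resp[k] / total
        norm = sum(counts)
        theta = [c / norm for c in counts]
    return max(range(K), key=theta.__getitem__), theta


# Toy trained model (illustrative): topic 0 ~ engine issues, topic 1 ~ ground ops.
vocab_index = {"engine": 0, "fuel": 1, "runway": 2, "taxi": 3}
phi = [[0.45, 0.45, 0.05, 0.05], [0.05, 0.05, 0.45, 0.45]]
print(predict_topic(["engine", "fuel", "engine", "gate"], phi, vocab_index)[0])  # 0
print(predict_topic(["runway", "taxi"], phi, vocab_index)[0])  # 1
```

Holding phi fixed keeps the topics learned from past reports stable, so new reports are mapped onto the existing topic scheme rather than reshaping it.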