• Title/Summary/Keyword: Twitter data

A Reply Graph-based Social Mining Method with Topic Modeling (토픽 모델링을 이용한 댓글 그래프 기반 소셜 마이닝 기법)

  • Lee, Sang Yeon;Lee, Keon Myung
    • Journal of the Korean Institute of Intelligent Systems / v.24 no.6 / pp.640-645 / 2014
  • Many people use social network services to communicate, to share information, and to build social relationships with others on the Internet. Twitter is a representative service of this kind: millions of tweets are posted each day, and a huge amount of data has accumulated. Social mining, which extracts meaningful information from such massive data, has been studied intensively. On Twitter, content is easily delivered and retweeted through following-follower relationships. Topic modeling of tweet data is a good tool for issue tracking in social media. However, the LDA topic model, a typical topic modeling method, is ineffective for short textual data. To overcome the restrictions imposed by the short length of tweets, we introduce the notion of a reply graph, whose nodes correspond to users and whose edges correspond to the existence of reply and retweet messages between those users. This paper introduces a topic modeling method that uses the reply graph to reduce the number of short documents and to improve the quality of mining results. The proposed model uses LDA as the topic modeling framework for tweet issue tracking. Experimental results of the proposed method are presented for a collection of Twitter data covering 7 days.
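
The abstract above does not say how the reply graph reduces the number of short documents. As a minimal sketch of one plausible reading, the Python below pools the tweets of users who are connected by reply or retweet edges into a single longer document, which could then be passed to an LDA implementation. The function name `pool_tweets_by_reply_graph`, the input shapes, and the use of connected components are assumptions made for illustration, not the authors' method.

```python
from collections import defaultdict


def pool_tweets_by_reply_graph(tweets, interactions):
    """Pool tweets of users linked by reply/retweet edges into longer documents.

    tweets:       iterable of (user, text) pairs           (assumed input shape)
    interactions: iterable of (user_a, user_b) pairs, one per reply or retweet
    Returns one pooled document (a string) per connected component of users.
    """
    parent = {}

    def find(user):
        # Union-find lookup with path halving.
        parent.setdefault(user, user)
        while parent[user] != user:
            parent[user] = parent[parent[user]]
            user = parent[user]
        return user

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Each reply/retweet is an undirected edge of the reply graph.
    for a, b in interactions:
        union(a, b)

    # Collect every tweet under the representative user of its component.
    pooled = defaultdict(list)
    for user, text in tweets:
        pooled[find(user)].append(text)
    return [" ".join(texts) for texts in pooled.values()]
```

With three users of whom two exchanged a reply, three short tweets become two documents, so the topic model sees fewer but longer texts.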

Twitter Issue Tracking System by Topic Modeling Techniques (토픽 모델링을 이용한 트위터 이슈 트래킹 시스템)

  • Bae, Jung-Hwan;Han, Nam-Gi;Song, Min
    • Journal of Intelligence and Information Systems / v.20 no.2 / pp.109-122 / 2014
  • People nowadays create a tremendous amount of data on social network services (SNS). In particular, the incorporation of SNS into mobile devices has resulted in massive data generation, thereby greatly influencing society. This phenomenon is unmatched in history, and we now live in the Age of Big Data. SNS data qualifies as Big Data because it satisfies the conditions of data amount (volume), data input and output speed (velocity), and variety of data types (variety). Trends of issues discovered in SNS Big Data can serve as an important new source of value creation because this information covers the whole of society. In this study, a Twitter Issue Tracking System (TITS) is designed and built to meet the need for analyzing SNS Big Data. TITS extracts issues from Twitter texts and visualizes them on the web. The proposed system provides the following four functions: (1) it provides the topic keyword set that corresponds to the daily ranking; (2) it visualizes the daily time-series graph of a topic over the duration of a month; (3) it shows the importance of a topic through a treemap based on a scoring system and frequency; and (4) it visualizes the daily time-series graph of a keyword found by keyword search. The present study analyzes the Big Data generated by SNS in real time. SNS Big Data analysis requires various natural language processing techniques, including stop-word removal and noun extraction, to process unrefined, unstructured data. In addition, such analysis requires recent big data technology to process a large amount of real-time data rapidly, such as the Hadoop distributed system or NoSQL databases, which are an alternative to relational databases. We built TITS on Hadoop to optimize the processing of big data because Hadoop is designed to scale up from single-node computing to thousands of machines. Furthermore, we use MongoDB, which is classified as a NoSQL database. MongoDB is an open-source, document-oriented database that provides high performance, high availability, and automatic scaling. Unlike existing relational databases, MongoDB has no schemas or tables, and its most important goals are data accessibility and data processing performance. In the Age of Big Data, visualization is attractive to the Big Data community because it helps analysts examine such data easily and clearly. Therefore, TITS uses the d3.js library as its visualization tool. This library is designed for creating Data-Driven Documents that bind the Document Object Model (DOM) to arbitrary data; interaction with the data is easy, and the library is useful for managing real-time data streams with smooth animation. In addition, TITS uses Bootstrap, a set of pre-configured style sheets and JavaScript plug-ins, to build the web system. The TITS graphical user interface (GUI) is designed using these libraries and is capable of detecting issues on Twitter in an easy and intuitive manner. The proposed work demonstrates the superiority of our issue detection techniques by matching detected issues with corresponding online news articles. The contributions of the present study are threefold. First, we suggest an alternative approach to real-time big data analysis, which has become an extremely important issue. Second, we apply a topic modeling technique that is used in various research areas, including Library and Information Science (LIS); based on this, we can confirm the utility of storytelling and time-series analysis. Third, we develop a web-based system and make it available for the real-time discovery of topics. The present study conducted experiments with nearly 150 million tweets posted in Korea during March 2013.
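
Functions (1), (2), and (4) in the abstract above all rest on per-day keyword counts. The sketch below shows that aggregation step in plain Python under simplifying assumptions: whitespace tokenization stands in for the noun extraction the abstract mentions, the stop-word list is a tiny illustrative set, and the function names are hypothetical. It does not reproduce the Hadoop, MongoDB, or d3.js parts of TITS.

```python
from collections import Counter, defaultdict

# Illustrative stop-word list only; a real system would use a much larger one.
STOP_WORDS = {"the", "a", "is", "of", "in", "on"}


def daily_keyword_counts(tweets):
    """Count keywords per day from (day, text) pairs after stop-word removal."""
    counts = defaultdict(Counter)
    for day, text in tweets:
        # Whitespace splitting is a stand-in for proper noun extraction.
        tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
        counts[day].update(tokens)
    return counts


def keyword_series(counts, keyword):
    """Daily time series of one keyword, ordered by day (0 when absent)."""
    return [(day, counts[day][keyword]) for day in sorted(counts)]


def top_keywords(counts, day, n=3):
    """The n most frequent keywords on a given day (a simple daily ranking)."""
    return [word for word, _ in counts[day].most_common(n)]
```

The series returned by `keyword_series` is the kind of data a front end could plot as a daily time-series graph.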

An Analysis of Image Use in Twitter Message (트위터 상의 이미지 이용에 관한 분석)

  • Chung, EunKyung;Yoon, JungWon
    • Journal of the Korean BIBLIA Society for library and Information Science / v.24 no.4 / pp.75-90 / 2013
  • Given that users actively use social media with embedded multimedia information, the purpose of this study is to demonstrate how images are used within Twitter messages, especially in influential and favorited messages. To this end, the top 200 influential and favorited messages with images were selected out of 1,589 tweets related to the "Boston bombing" in April 2013. The characteristics of the messages, image use, and users were analyzed and compared. Two phases of analysis were conducted on three data sets containing the top 200 influential messages, the top 200 favorited messages, and general messages. In the first phase, coding schemes were developed for three categorical analyses: (1) categorization of tweets, (2) categorization of image use, and (3) categorization of users. The three data sets were then coded using these schemes. In the second phase, comparative analyses were conducted among influential, favorited, and general tweets in terms of tweet type, image use, and user. While messages expressing opinion were found to be most favorited, messages that shared information were recognized as most influential to users. On the other hand, as only four image uses (information dissemination, illustration, emotive/persuasive, and information processing) were found in this data set, the primary image use is likely to be data-driven rather than object-driven. From the perspective of users, user types such as government, celebrity, and photo-sharing sites were found to be favorited and influential. An improved understanding of users' image needs in the context of social media contributes to the body of knowledge on image needs. This study also provides valuable insight into the practical design and implications of image retrieval systems and services.

A Model for Nowcasting Commodity Price based on Social Media Data (소셜 데이터 기반 실시간 식자재 물가 예측 모형)

  • Kim, Jaewoo;Cha, Meeyoung;Lee, Jong Gun
    • Journal of KIISE / v.44 no.12 / pp.1258-1268 / 2017
  • Capturing real-time daily information on food prices is invaluable for helping policymakers and development organizations address food security problems and improve public welfare. This study analyzes the possible use of large-scale online data, available thanks to growing Internet connectivity in developing countries, to provide updates on the food security landscape. We conduct a case study of Indonesia to develop a time-series prediction model that nowcasts daily food prices for four types of food commodities that are essential in the region: beef, chicken, onion, and chilli. By using Twitter price quotes, we demonstrate the capability of social data to function as an affordable and efficient proxy for traditional offline price statistics.

A Study on the Data Collection and Storage of Big Data Systems (빅데이터 시스템의 데이터 수집 및 저장에 관한 연구)

  • Park, Jihun;Kim, Gyunghwan;Jung, Eunsu
    • Proceedings of the Korea Information Processing Society Conference / 2017.11a / pp.48-51 / 2017
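  • Big data refers to vast amounts of data that are either never stored or, even if stored, discarded without being analyzed. In practice, much big data is generated on social networks such as Facebook and Twitter, and interest is growing in how to store and analyze such vast data efficiently. Accordingly, this paper reviews the concept of big data and its future trends and issues, and presents considerations and efficient solutions for how big data systems collect and store data.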

TRED : Twitter based Realtime Event-location Detector (트위터 기반의 실시간 이벤트 지역 탐지 시스템)

  • Yim, Junyeob;Hwang, Byung-Yeon
    • KIPS Transactions on Software and Data Engineering / v.4 no.8 / pp.301-308 / 2015
  • SNS is a web-based online platform service supporting the formation of relations between users. SNS users have usually used a desktop or laptop for this purpose so far. However, the number of SNS users is greatly increasing and their access to the web is improving with the spread of smart phones. They share their daily lives with other users through SNSs. We can detect events if we analyze the contents that are left by SNS users, where the individual acts as a sensor. Such analyses have already been attempted by many researchers. In particular, Twitter is used in related spheres in various ways, because it has structural characteristics suitable for detecting events. However, there is a limitation concerning the detection of events and their locations. Thus, we developed a system that can detect the location immediately based on the district mentioned in Twitter. We tested whether the system can function in real time and evaluated its ability to detect events that occurred in reality. We also tried to improve its detection efficiency by removing noise.

Predicting Movie Success based on Machine Learning Using Twitter (트위터를 이용한 기계학습 기반의 영화흥행 예측)

  • Yim, Junyeob;Hwang, Byung-Yeon
    • KIPS Transactions on Software and Data Engineering / v.3 no.7 / pp.263-270 / 2014
  • This paper suggests a method for predicting the box-office success of a film. With the recent growth of the film industry, a variety of studies on predicting market demand are being performed. Films are cultural goods with a relatively short product life cycle. Therefore, to produce stable profits, marketing costs before release and the number of screens after release need to be planned. To make such a plan, the demand for the product and the scale of economic profit should be estimated beforehand. Existing studies primarily use market competition factors or properties of the film as predictive variables. However, they give relatively little consideration to the potential audiences who purchase the goods. Therefore, in this paper, Twitter was used as a survey sample in order to reflect people's perception of a movie. The existing variables and the information extracted from Twitter are defined as off-line and on-line elements, respectively, and the two are combined and applied to machine learning. The proposed predictive technique is validated through an experiment, in which it predicted whether a film would succeed with about 95% accuracy.

A Collecting Model of Public Opinion on Social Disaster in Twitter: A Case Study in 'Humidifier Disinfectant' (사회적 재난에 대한 트위터 여론 수렴 모델: '가습기 살균제' 사건을 중심으로)

  • Park, JunHyeong;Ryu, Pum-Mo;Oh, Hyo-Jung
    • KIPS Transactions on Software and Data Engineering / v.6 no.4 / pp.177-184 / 2017
  • Recently, social disasters have been occurring frequently in an increasingly complicated social structure, and the scale of damage has also grown. Accordingly, there is a need for a way to prevent further damage by responding rapidly to social disasters. Twitter is attracting attention as a disaster countermeasure because of its immediacy and expandability. In particular, collecting public opinion on Twitter can be a useful tool for preventing disasters through quick response. This study proposes a method for collecting public opinion on Twitter through keyword analysis, issue topic tweet detection, and time trend analysis. Furthermore, we show its feasibility using the case of humidifier disinfectant, a recent social issue.

A Critical Review on Social Media Campaign Studies: Trends and Issues (소셜미디어 선거캠페인 연구 동향과 쟁점)

  • Chang, Woo-young
    • Informatization Policy / v.26 no.1 / pp.3-24 / 2019
  • This study examined the trends and issues of social media campaign studies from three aspects: campaign strategy, the institutional environment regulating social media, and political effect. It then performed an empirical analysis of the 20th general election in order to discuss the political effect, which has been analyzed the least. Specifically, this study empirically examined the trends of candidates' participation in the Twitter campaign, the partial mobilization and voter response, and the platform effect on the election results. The study examined all of the candidates' Twitter accounts and traffic and found the following results. First, the number of participants in the Twitter campaign increased significantly compared to the 19th general election, and the campaign was dominated by only two political parties that had more power to mobilize resources. Second, Twitter was clearly identified as a partisan medium; specifically, those in the mainstream of the Democratic Party mobilized many more supporters. Lastly, the Twitter campaign had a positive impact on the vote share and the chances of winning the election. In particular, the number of followers and the duration of activities were found to be statistically significant, indicating that the promotion of networking and social capital is more important in election campaigns.

Automatic Classification of Malicious Usage on Twitter (트위터 상의 악의적 이용 자동분류)

  • Kim, Meen Chul;Shim, Kyu Seung;Han, Nam Gi;Kim, Ye Eun;Song, Min
    • Journal of the Korean Society for Library and Information Science / v.47 no.1 / pp.269-286 / 2013
  • The advent of Web 2.0 and social media is playing a leading role in the emergence of big data. At the same time, however, informational dysfunction such as the infringement of individual rights and the violation of social order has been increasing sharply. This study therefore aims to define malicious usage, identify its features, and devise an automated method for classifying it. In particular, the rule-based experiment reveals a statistically significant performance enhancement.