• Title/Summary/Keyword: Twitter Corpus


Extracting Core Events Based on Timeline and Retweet Analysis in Twitter Corpus (트위터 문서에서 시간 및 리트윗 분석을 통한 핵심 사건 추출)

  • Tsolmon, Bayar;Lee, Kyung-Soon
    • KIPS Transactions on Software and Data Engineering / v.1 no.1 / pp.69-74 / 2012
  • Many Internet users focus on issues that have been posted on social network services within a very short time. When a major social issue or event occurs, it affects the number of comments and retweets on Twitter that day. In this paper, we propose a method for extracting core events based on timeline analysis, sentiment features, and retweet information in Twitter data. To validate our method, we compared it with methods using only word frequency, word frequency with sentiment analysis, only the chi-square method, and sentiment analysis with the chi-square method. To evaluate the proposed approach, we measured the accuracy of correct answers in the top 10 results; the proposed method achieved 94.9%. The experimental results show that the proposed method is effective for extracting core events from a Twitter corpus.
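One component named above is chi-square scoring of words against a date. The following is a minimal sketch of that component only, assuming a 2x2 contingency table that counts how many tweets on the target date versus other dates contain a candidate word. The function names and whitespace tokenization are hypothetical; this is not the authors' implementation and it omits the sentiment and retweet features.

```python
def chi_square(a: int, b: int, c: int, d: int) -> float:
    """2x2 chi-square statistic for a word/date association.

    a: target-date tweets containing the word
    b: other-date tweets containing the word
    c: target-date tweets without the word
    d: other-date tweets without the word
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def rank_words_for_date(day_tweets, other_tweets, top_k=10):
    """Rank words from the target date by chi-square association (hypothetical helper)."""
    day_sets = [set(t.lower().split()) for t in day_tweets]
    other_sets = [set(t.lower().split()) for t in other_tweets]
    vocab = set().union(*day_sets)
    scores = {}
    for word in vocab:
        a = sum(word in s for s in day_sets)
        b = sum(word in s for s in other_sets)
        c = len(day_sets) - a
        d = len(other_sets) - b
        if a * d - b * c > 0:  # keep only words over-represented on the target date
            scores[word] = chi_square(a, b, c, d)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]
```

The sign check matters because the chi-square statistic is symmetric: without it, a word that is unusually rare on the target date would also score high.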

Developing a Sentiment Analysing and Tagging System (감성 분석 및 감성 정보 부착 시스템 구현)

  • Lee, Hyun Gyu;Lee, Songwook
    • KIPS Transactions on Software and Data Engineering / v.5 no.8 / pp.377-384 / 2016
  • Our goal is to build a system that collects tweets from Twitter, analyzes the sentiment of each tweet, and helps users build a sentiment-tagged corpus semi-automatically. After collecting tweets with the Twitter API, we analyze their sentiment using a sentiment dictionary. With the proposed system, users can verify the system's results and insert new sentiment words or dependency relations that carry sentiment information. Sentiment information is tagged in a JSON structure, which is useful for building and accessing the corpus. On a test set, the system achieves about 76% accuracy in classifying the sentiment of sentences as positive, neutral, or negative.
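The dictionary-based classification and JSON tagging described above can be sketched roughly as follows. The toy dictionary, the summed-score rule, and the JSON field names are all assumptions for illustration; the authors' system also uses dependency relations, which this sketch omits.

```python
import json

# Hypothetical toy sentiment dictionary: word -> polarity score.
SENTIMENT_DICT = {"good": 1, "great": 2, "bad": -1, "terrible": -2}


def classify(sentence: str) -> str:
    """Sum dictionary scores over whitespace tokens and map the total to a label."""
    score = sum(SENTIMENT_DICT.get(tok, 0) for tok in sentence.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def tag_tweet(tweet_id: str, text: str) -> str:
    """Return a JSON record for one tweet (hypothetical schema)."""
    record = {
        "id": tweet_id,
        "text": text,
        "sentiment": classify(text),
        # Keep the matched words so a user can verify or correct the tag.
        "matched_words": [t for t in text.lower().split() if t in SENTIMENT_DICT],
    }
    return json.dumps(record, ensure_ascii=False)
```

Storing the matched words alongside the label is one way to support the semi-automatic verification step the abstract mentions.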

Company Name Discrimination in Tweets using Topic Signatures Extracted from News Corpus

  • Hong, Beomseok;Kim, Yanggon;Lee, Sang Ho
    • Journal of Computing Science and Engineering / v.10 no.4 / pp.128-136 / 2016
  • No human can analyze the more than 500 million tweets generated per day. Lexical ambiguity on Twitter makes it difficult to retrieve the desired data and relevant topics. Most solutions to the word sense disambiguation problem rely on knowledge base systems. Unfortunately, manually creating a knowledge base system is expensive and time-consuming, resulting in a knowledge acquisition bottleneck. To overcome this bottleneck, a topic signature is used to disambiguate words. In this paper, we evaluate the effectiveness of various features of news articles for topic signature extraction for word sense discrimination in tweets. Our results show that topic signatures obtained from the snippet feature discriminate company names more accurately than those obtained from the article body. We conclude that topic signatures extracted from news articles improve the accuracy of word sense discrimination in the automated analysis of tweets.
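As a rough illustration of how a topic signature can discriminate senses, the sketch below scores a tweet against one weighted word list per sense and picks the best match. The signatures, weights, and the "unknown" fallback are hypothetical; the abstract does not describe the scoring function, and this sketch does not show how signatures are extracted from news snippets.

```python
# Hypothetical topic signatures: sense -> {signature word: weight}.
SIGNATURES = {
    "company": {"iphone": 3.0, "stock": 2.5, "ceo": 2.0, "mac": 2.0},
    "fruit": {"pie": 2.5, "juice": 2.0, "tree": 1.5, "eat": 1.5},
}


def discriminate(tweet: str, signatures=SIGNATURES) -> str:
    """Return the sense whose signature words overlap the tweet with the highest total weight."""
    tokens = set(tweet.lower().split())
    scores = {
        sense: sum(weight for word, weight in sig.items() if word in tokens)
        for sense, sig in signatures.items()
    }
    best = max(scores, key=scores.get)
    # With no overlapping signature word, abstain rather than guess.
    return best if scores[best] > 0 else "unknown"
```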

Propensity Analysis of Political Attitude of Twitter Users by Extracting Sentiment from Timeline (타임라인의 감정추출을 통한 트위터 사용자의 정치적 성향 분석)

  • Kim, Sukjoong;Hwang, Byung-Yeon
    • Journal of Korea Multimedia Society / v.17 no.1 / pp.43-51 / 2014
  • Social network services have the potential to be used widely and effectively in various fields of society because of their convenient accessibility and clearly expressed user opinions. In particular, Twitter is characterized by simple, open network formation between users and remarkable real-time diffusion. However, actual analysis faces many difficulties: semantic analysis within 140 characters, the limitations of Korean natural language processing, and technical restrictions imposed by Twitter itself. This study focuses on the persistence of people's political attitudes, hypothesizes that incorporating this persistence into the analytic design would increase precision, and demonstrates this experimentally. In an experiment with a tweet corpus collected during the National Assembly election of 11 April 2012, the results were considerably similar to the actual election results. Analysis of individual tweets yielded a precision of 75.4% and a recall of 34.8%, whereas timeline-based analysis of each user's political attitude improved performance by approximately 8% and 5%.
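The idea that political attitudes persist over time suggests aggregating per-tweet predictions across a user's timeline. The sketch below assumes a per-tweet classifier already outputs a label (or None when a tweet shows no stance) and uses a simple majority vote; the labels and the voting rule are assumptions, since the abstract does not specify the aggregation.

```python
from collections import Counter
from typing import Iterable, List, Optional


def user_attitude(tweet_labels: Iterable[Optional[str]]) -> Optional[str]:
    """Majority label over a user's timeline; None if no stance or a tie."""
    counts = Counter(label for label in tweet_labels if label is not None)
    if not counts:
        return None
    (top, top_n), *rest = counts.most_common()
    if rest and rest[0][1] == top_n:
        return None  # tie between the two most frequent labels
    return top


def propagate(tweet_labels: Iterable[Optional[str]]) -> List[Optional[str]]:
    """Fill stance-less tweets with the user-level attitude (hypothetical helper)."""
    labels = list(tweet_labels)
    attitude = user_attitude(labels)
    return [label if label is not None else attitude for label in labels]
```

Filling in tweets that carry no explicit stance is one plausible way a timeline view could raise recall over individual-tweet analysis.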

Construction and Evaluation of a Sentiment Dictionary Using a Web Corpus Collected from Game Domain (게임 도메인 웹 코퍼스를 이용한 감성사전 구축 및 평가)

  • Jeong, Woo-Young;Bae, Byung-Chull;Cho, Sung Hyun;Kang, Shin-Jin
    • Journal of Korea Game Society / v.18 no.5 / pp.113-122 / 2018
  • This paper describes an approach to building and evaluating a sentiment dictionary using a Web corpus in the game domain. To build the sentiment dictionary, we collected vocabulary from game-related web documents on a Korean portal site using the Twitter Korean Processor. From the collected vocabulary, we selected the words tagged as verbs or adjectives and assigned a sentiment score to each selected word. To evaluate the constructed dictionary, we calculated F1 scores from precision and recall using Korean-SWN, which is based on the English SentiWordNet (SWN). The average F1 scores were 0.85 for adjectives and 0.77 for verbs.
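F1 is the harmonic mean of precision and recall. The sketch below assumes both dictionaries are treated as sets of (word, polarity) pairs, so a word counts as correct only if it appears in the reference with the same polarity; the abstract does not state exactly how matches against Korean-SWN were counted, so this matching rule is an assumption.

```python
def precision_recall_f1(predicted: dict, reference: dict):
    """predicted/reference: word -> polarity label. Returns (precision, recall, F1)."""
    pred_pairs = set(predicted.items())
    ref_pairs = set(reference.items())
    tp = len(pred_pairs & ref_pairs)  # same word with the same polarity
    precision = tp / len(pred_pairs) if pred_pairs else 0.0
    recall = tp / len(ref_pairs) if ref_pairs else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, with 2 of 3 predicted entries correct and 4 reference entries, precision is 2/3, recall is 1/2, and F1 is 4/7 (about 0.571).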

Twitter Corpus Collection and Analysis (트위터 말뭉치 수집과 분석)

  • Yoo, Daehoon;Lee, Cheongjae;Kim, Seokhwan;Lee, Gary Geunbae
    • Annual Conference on Human and Language Technology / 2009.10a / pp.136-140 / 2009
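  • Recently, Twitter, a type of microblog distinct from conventional blogs, has emerged as a hot topic on the Internet. Twitter is a microblog that simplifies the many functions of existing blogs and mini-homepages and allows only short text posts. For this reason, Twitter, with its inherent characteristics of simplicity and immediacy, is rapidly becoming known to general Internet users. Analyzing Twitter can provide a window onto the thoughts and opinions of the online public on a variety of topics. In addition, comparing it with Twitter in countries that use other languages can reveal cultural differences between the two countries. In this paper, we analyzed Twitter messages from Korean- and English-speaking users by topic, purpose, and other criteria. The results show that in Korea Twitter is mostly used as a diary for recording personal thoughts, whereas in English-speaking countries it is also used for various other purposes, such as press releases and advertising.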


Extracting Core Event Feature Based on Timeline Analysis and Sentiment Feature in Twitter Corpus (트위터 자료의 시간별 분석과 감성 자질을 이용한 핵심 사건 추출)

  • Kim, Hui-Hwan;Tsolmon, Bayar;Lee, Kyung-Soon
    • Proceedings of the Korea Information Processing Society Conference / 2011.11a / pp.395-398 / 2011
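  • Twitter users want fast, concise, and continuous communication with others about an issue through Twitter, and this characteristic causes the number of tweets to vary with the events of each issue. When an event occurs in connection with a social issue, the number of tweets at that time increases explosively. In this paper, we use this characteristic to analyze Twitter data over time to recognize events, and we extract the core events for each date using sentiment features and chi-square values.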
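The explosive rise in tweet counts described above suggests a simple burst detector for recognizing event dates. The sketch below flags a day whose count exceeds the mean plus k standard deviations of the preceding window; the window size, the threshold k, and the rule itself are assumptions, not the method stated in the paper.

```python
from statistics import mean, pstdev


def burst_days(daily_counts: dict, window: int = 7, k: float = 2.0) -> list:
    """Return dates whose tweet count exceeds mean + k * std of the previous `window` days.

    daily_counts: mapping date -> tweet count, inserted in chronological order.
    """
    dates = list(daily_counts)
    counts = [daily_counts[d] for d in dates]
    bursts = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        threshold = mean(history) + k * pstdev(history)
        if counts[i] > threshold:
            bursts.append(dates[i])
    return bursts
```

Dates flagged this way could then be passed to a per-date ranking step, such as chi-square scoring, to name the core event.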

A Crowdsourcing-based Emotional Words Tagging Game for Building a Polarity Lexicon in Korean (한국어 극성 사전 구축을 위한 크라우드소싱 기반 감성 단어 극성 태깅 게임)

  • Kim, Jun-Gi;Kang, Shin-Jin;Bae, Byung-Chull
    • Journal of Korea Game Society / v.17 no.2 / pp.135-144 / 2017
  • Sentiment analysis refers to analyzing a writer's subjective opinions or feelings expressed in text. Effective sentiment analysis requires an emotional word polarity lexicon. This paper introduces a crowdsourcing-based game that we developed to build a Korean polarity lexicon efficiently. First, we collected a corpus from related Internet communities using a crawler and split it into words using the Twitter POS analyzer. These POS-tagged words are presented in a mobile tagging game in which players voluntarily tag the polarity of each word, and the results are collected into a database. So far, the polarities of about 1,200 words have been tagged. We expect that, by collecting more emotional word data in the future, this research can contribute to Korean sentiment analysis, especially in the game domain.
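The abstract says player tags are collected into a database but does not say how conflicting tags are resolved. One common approach is majority voting with minimum-vote and agreement thresholds. The sketch below uses that approach; both thresholds are hypothetical.

```python
from collections import Counter, defaultdict


def aggregate_polarity(votes, min_votes=3, min_agreement=0.6):
    """votes: iterable of (word, polarity) pairs submitted by players.

    Returns word -> polarity for words with enough votes and sufficient agreement.
    """
    by_word = defaultdict(Counter)
    for word, polarity in votes:
        by_word[word][polarity] += 1
    lexicon = {}
    for word, counter in by_word.items():
        total = sum(counter.values())
        polarity, top = counter.most_common(1)[0]
        # Accept only words that are well covered and mostly agreed upon.
        if total >= min_votes and top / total >= min_agreement:
            lexicon[word] = polarity
    return lexicon
```

Words that fail either threshold stay out of the lexicon until more players have tagged them.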

An Exploratory Analysis of Online Discussion of Library and Information Science Professionals in India using Text Mining

  • Garg, Mohit;Kanjilal, Uma
    • Journal of Information Science Theory and Practice / v.10 no.3 / pp.40-56 / 2022
  • This paper applies a topic modeling technique to extract the topics of online discussions among library professionals in India. Topic modeling is an established text mining technique widely used to model text data from Twitter, Facebook, Yelp, and other social media platforms. The present study modeled the online discussions of Library and Information Science (LIS) professionals posted on Lis Links. The text of these posts was extracted with a program written in R using the package "rvest." The data were pre-processed to remove blank posts, posts with text in non-English scripts, punctuation, URLs, emails, etc. Topic modeling with the Latent Dirichlet Allocation (LDA) algorithm was applied to the pre-processed corpus to identify the topic associated with each post, and the frequency of words in the corpus was calculated. The most frequent words included library, information, university, librarian, book, professional, science, research, paper, question, answer, and management, showing that LIS professionals actively discussed exams, research, and library operations on the Lis Links forum. The study categorized the online discussions on Lis Links into ten topics: "LIS Recruitment," "LIS Issues," "Other Discussion," "LIS Education," "LIS Research," "LIS Exams," "General Information related to Library," "LIS Admission," "Library and Professional Activities," and "Information Communication Technology (ICT)." The majority of the posts belonged to "LIS Exams," followed by "Other Discussion" and "General Information related to Library."
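The study used R with "rvest"; the following is a Python analogue of the pre-processing and word-frequency steps only, not the authors' code. The regular expressions, the small stop-word list, and the treatment of any non-ASCII character as "non-English" are simplifying assumptions. LDA itself is omitted.

```python
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"\S+@\S+\.\S+")
NON_ASCII_RE = re.compile(r"[^\x00-\x7F]")
PUNCT_RE = re.compile(r"[^\w\s]")

# Hypothetical minimal stop-word list.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on"}


def preprocess(posts):
    """Drop blank and non-English posts; strip URLs, emails, punctuation, and stop words."""
    cleaned = []
    for post in posts:
        # Crude proxy: any non-ASCII character marks the post as non-English.
        if not post.strip() or NON_ASCII_RE.search(post):
            continue
        text = URL_RE.sub(" ", post)
        text = EMAIL_RE.sub(" ", text)
        text = PUNCT_RE.sub(" ", text.lower())
        cleaned.append([t for t in text.split() if t not in STOP_WORDS])
    return cleaned


def word_frequencies(token_lists):
    """Count word occurrences across all cleaned posts."""
    return Counter(t for tokens in token_lists for t in tokens)
```

URLs and emails are removed before punctuation so that their pieces (for example "example" or "org") do not survive as stray tokens.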

Hate Speech Detection Using Modified Principal Component Analysis and Enhanced Convolution Neural Network on Twitter Dataset

  • Majed, Alowaidi
    • International Journal of Computer Science & Network Security / v.23 no.1 / pp.112-119 / 2023
  • Traditionally used for networking computers and communications, the Internet has been evolving since its beginning. The Internet is the backbone for many things on the web, including social media. The concept of social networking, which started in the early 1990s, has grown along with the Internet. Social networking sites (SNSs) sprang up and have remained an important element of Internet usage, mainly because of the services they provide on the web. Twitter and Facebook have become the primary means by which most individuals keep in touch with others and carry on substantive conversations. These sites allow users to post photos and videos and support audio and video storage that can be shared among users. Although attractive, these features have also created problems for these sites, such as the posting of offensive material. Some SNS users promote hate through their words or speech, which is difficult to curtail once uploaded. Hence, this article outlines a process for extracting user reviews from the Twitter corpus in order to identify instances of hate speech. Using Modified Principal Component Analysis (MPCA) and an Enhanced Convolutional Neural Network (ECNN), we identify instances of hate speech in the text. With natural language processing (NLP), a fully autonomous system for assessing syntax and meaning can be established. The approach places strong emphasis on pre-processing, feature extraction, and classification. Normalization cleanses the text by removing extra spaces, punctuation, and stop words. Feature extraction is then performed on the pre-processed text using the MPCA algorithm, which takes a set of related features and extracts those that are most informative about the given dataset. The proposed classification method is then applied to detect hate speech or abusive language. ECNN is argued to be superior to other methods for identifying hateful content online: it can process massive amounts of data and quickly return accurate results, especially for larger datasets. As a result, the proposed MPCA+ECNN algorithm improves not only the F-measure but also accuracy, precision, and recall.
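The normalization step is the one part of this pipeline described concretely enough to sketch: removing extra spaces, punctuation, and stop words. The sketch below covers only that step, with a hypothetical stop-word list. MPCA and ECNN are not described in enough detail here to reproduce.

```python
import string

# Hypothetical minimal stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "you"}

# Translation table that deletes ASCII punctuation characters.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize(tweet: str) -> str:
    """Lowercase, strip punctuation, drop stop words, and collapse extra spaces."""
    text = tweet.lower().translate(_PUNCT_TABLE)
    # str.split() with no argument splits on any whitespace run, collapsing extra spaces.
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(tokens)
```

For example, `normalize("  The   match is ON!!  ")` yields `"match on"`.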