Analyzing Dissatisfaction Factors of Weather Service Users Using Twitter and News Headlines

Kim, In-Gyum; Lee, Seung-Wook; Kim, Hye-Min; Lee, Dae-Geun; Lim, Byunghwan

doi:10.5392/IJoC.2019.15.4.065

International Journal of Contents

Volume 15 Issue 4 / Pages 65-73 / 2019 / 1738-6764 (pISSN) / 2093-7504 (eISSN)

The Korea Contents Association (한국콘텐츠학회)


Kim, In-Gyum (Future Strategy Research Team National Institute of Meteorological Sciences) ;
Lee, Seung-Wook (Future Strategy Research Team National Institute of Meteorological Sciences) ;
Kim, Hye-Min (Future Strategy Research Team National Institute of Meteorological Sciences) ;
Lee, Dae-Geun (Future Strategy Research Team National Institute of Meteorological Sciences) ;
Lim, Byunghwan (Observation and Forecast Research Division National Institute of Meteorological Sciences)

Received : 2019.07.23
Accepted : 2019.09.30
Published : 2019.12.28


Abstract

Social media is a massive dataset in which individuals' thoughts are freely recorded, and there have been a variety of efforts to analyze it in order to understand social phenomena. In this study, Twitter was used to identify the moments when negative perceptions of the Korea Meteorological Administration (KMA) were expressed and the reasons people were dissatisfied with the KMA. Machine learning methods for sentiment analysis were trained on tweets that mentioned the KMA from July to October of 2011-2014 to learn the sentiments those tweets implied. The trained models were then used to classify the sentiments of tweets from 2015-2016, and the frequency of negative sentiments was compared with the satisfaction of forecast users. The frequency of negative sentiments was found to increase before satisfaction decreased sharply. Tweet keywords and news headlines were then compared qualitatively to analyze the causes of the negative sentiments. As a result, the individual causes behind the monthly increases in negative sentiments in 2016 were identified. This study demonstrates the value of sentiment analysis as a complement to user satisfaction surveys. Combining Twitter and news headlines also offers a way to analyze causes of dissatisfaction that are difficult to identify with satisfaction surveys alone. The results can help improve user satisfaction with weather services by enabling changes in satisfaction to be managed efficiently.


1. INTRODUCTION

The Korea Meteorological Administration (KMA) uses a traditional survey method to understand the perceptions of forecast users. Surveys are the usual means by which meteorological communities try to expand their points of communication with users [1], [2]. However, surveys cannot be conducted often because of their high cost, and even when a survey is conducted, it is no easy task to pinpoint the exact cause of dissatisfaction. For example, incorrect particulate matter forecasts after April 2016 were presented as one of the main reasons for the drop in overall user satisfaction that year [3]. However, the satisfaction score for the first half of the year, surveyed in June, showed little difference from that of the first half of 2015 [4]. It is difficult to accept that dissatisfaction caused by incorrect particulate matter forecasts in April had not yet been captured in June and was reflected only in the survey for the second half of the year.

According to interviews with 25,000 Koreans conducted by the Korea Internet and Security Agency (KISA), 65.2% of internet users had accessed social media in the past year. Of those social media users, 64.7% used mobile devices, so their use is not restricted by location, and most of them used social media for highly personal purposes such as socializing and maintaining relationships (84.0%) and sharing personal interests such as hobbies and leisure activities (49.1%) [5]. Because subjective opinions are recorded on social media in real time, it is widely used in a variety of fields to understand people's perceptions [6]-[8]. Much research has also used Twitter as a corpus. Because tweets are limited to 140 characters or fewer, Twitter has become known as a useful source for classifying the sentiments of text [8]-[13].

Manually categorizing opinions in large amounts of data is almost impossible because of the time and cost involved. This has created a need for systems that can automatically determine the sentiment of texts, leading to the recent rise of sentiment analysis, also known as opinion mining [14]. The practice is now expanding into a wide range of fields. In business, it has been used to analyze the correlation between the sentiment of tweets and the DJIA (Dow Jones Industrial Average) and to establish corporate marketing strategies [11], [15]; in medicine, it has been used to study public sentiment about breast cancer screening and communication between health care professionals and the general public [16], [17]. Kang and Park used sentiment analysis to evaluate customer satisfaction with mobile application services [18].

Methodologies of sentiment analysis are grouped somewhat differently depending on the researcher [19], [20]. Among the available approaches, supervised learning is very common: a classification function is inferred from labeled training data, and the computer then uses that function to determine the sentiment of new text automatically. Supervised learning methods include Naive Bayes (NB) and Support Vector Machines (SVM), among others [21].

Lexicon acquisition is an essential phase of sentiment analysis. Two of the larger and more popular lexicons are SentiWordNet and the General Inquirer (GI) [22], [23]. However, researchers have generally built their own lexicons to improve the performance of sentiment analysis [24]. In this paper, a custom sentiment lexicon was built manually from tweets posted from July to October of 2011-2014. The constructed lexicon was used to automatically classify the sentiments of tweets from 2015-2016 with Naive Bayes and Support Vector Machine classifiers. The temporal variability of the negative tweets identified by sentiment analysis was compared with user satisfaction levels to examine whether changes in negative sentiment could be used to anticipate user satisfaction. Lastly, the daily frequency of the analyzed sentiments was compared qualitatively with news headlines to investigate the background against which negative tweets were written.

Section 2 introduces the process of data collection and preprocessing and the concept of sentiment analysis. Section 3 verifies the performance of the two classifiers and examines the daily frequency of negative sentiments and the changes in user satisfaction; it also presents news headlines with a representative negative tone, which were extracted to estimate the causes of negative sentiment. Finally, Section 4 concludes the paper. The purpose of this study is to examine changes in user perception through sentiment analysis of Twitter and to show that the results can be useful to decision makers in meteorological communities who need to respond quickly to declines in user satisfaction.

2. MATERIALS AND METHODOLOGIES

2.1 Data collection and preprocessing


Fig. 1. Process of sentiment analysis

Fig. 1 outlines the sentiment analysis process based on supervised learning. Phase 1. Tweets mentioning the KMA ("기상청" in Korean) were collected from 'search.twitter.com'. The collection period was from 15 July to 23 October of each year from 2011 to 2014. This period was selected for comparison with the satisfaction results of the KMA's survey. The surveys show a tendency for satisfaction to be lower in the second half of the year than in the first half, so sentiment analysis focusing on the second half of the year was given priority. The selected period spans Korea's summer (June-August) and autumn (September-November), when severe weather such as the rainy season (called Changma in Korea), typhoons, and heat waves occurs frequently. It is also the vacation season for ordinary citizens, so sentiments may fluctuate greatly. A total of 41,476 tweets were collected over 400 days, and these data were used as training data to construct the sentiment lexicon. A further 57,273 tweets from 1 January 2015 to 31 December 2016 were gathered and used as validation data. Satisfaction data were extracted from the "Public Satisfaction Survey on National Weather Service" reports published by the KMA for the second half of 2015 and 2016.

Phase 2. Three of the authors first agreed on rules for grouping 1,000 sample tweets into two sentiment classes: negative and neutral. Next, the remaining 40,476 tweets were coded with sentiment labels of 1 (negative) or 0 (neutral), and the final label of each tweet was the one chosen by at least two of the three coders. The result was a dataset of 8,888 negative tweets and 32,588 neutral tweets.
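The labeling rule of Phase 2, where a label becomes final when at least two of the three coders chose it, can be sketched in Python as follows. The function name `majority_label` and the sample codes are hypothetical illustrations, not the authors' R code.

```python
from collections import Counter

def majority_label(codes):
    """Return the label chosen by at least two of three coders.

    codes: three binary labels, 1 = negative, 0 = neutral.
    With two classes and three coders, a majority always exists.
    """
    if len(codes) != 3 or any(c not in (0, 1) for c in codes):
        raise ValueError("expected three labels, each 0 or 1")
    label, _count = Counter(codes).most_common(1)[0]
    return label

# Hypothetical labels from three coders for four tweets
coded = [(1, 1, 0), (0, 0, 0), (0, 1, 1), (1, 0, 0)]
final = [majority_label(c) for c in coded]
print(final)  # [1, 0, 1, 0]
```

With an odd number of coders and two classes, ties cannot occur, which is why no tie-breaking rule is needed here.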

Phase 3. Unlike English, which is an isolating language, Korean is an agglutinative language in linguistic typology, so morphological analysis is an essential preprocessing step for sentiment analysis of Korean text. Morphological analysis examines the morphological structure of text, such as root words, endings, and parts of speech (POS). A program that performs morphological analysis automatically is called a morphological analyzer. Among the open Korean morphological analyzers, 'Hannanum' from the Korea Advanced Institute of Science and Technology (KAIST) is a leading one [25], [26]. Given tweet text as input, Hannanum outputs the text divided into raw morphemes and their POS tags. The reference dictionary used for morphological analysis was 'NIADic', released by the National Information Society Agency (NIA). R (version 3.4.0) was used for preprocessing, including morphological analysis of the collected tweets, and for sentiment analysis. The calculation of the sentiment index of the morphemes (Phase 4) and the sentiment analysis through supervised learning (Phase 5) are discussed in detail in the following section.

2.2 Sentiment analysis

2.2.1 Naive Bayes

Naive Bayes is a probabilistic classifier that assigns documents to categories using Bayes' theorem, and it is widely used in sentiment analysis [27], [28]. After the classifier has been trained on the labeled data, the conditional probability of each class given an input vector is calculated, and the input is assigned to one class, i.e. negative or neutral. Each tweet D is composed of multiple morphemes (e), as in Eq. (1).

\(\mathrm{D}=\mathrm{e}_{1}, \mathrm{e}_{2}, \mathrm{e}_{3}, \ldots \ldots, \mathrm{e}_{\mathrm{n}}\) (1)

Eq. (2) gives the conditional probability used to determine which sentiment class a tweet D belongs to. In Eq. (2), Ci denotes the i-th sentiment class to be distinguished; in this study, C is divided into negative and neutral sentiment.

\(\mathrm{P}\left(\mathrm{C}_{\mathrm{i}} | \mathrm{D}\right)=\mathrm{P}\left(\mathrm{C}_{\mathrm{i}} | \mathrm{e}_{1}, \mathrm{e}_{2}, \mathrm{e}_{3}, \ldots \ldots, \mathrm{e}_{\mathrm{n}}\right)\) (2)

Eq. (2) is converted into Eq. (3) through Bayes’ theorem.

\(\mathrm{P}\left(\mathrm{C}_{\mathrm{i}} | \mathrm{D}\right)=\frac{\mathrm{P}\left(\mathrm{D} | \mathrm{C}_{\mathrm{i}}\right) \times \mathrm{P}\left(\mathrm{C}_{\mathrm{i}}\right)}{\mathrm{P}(\mathrm{D})}\) (3)

Naive Bayes selects the sentiment class for which Eq. (3) yields the higher value. In this comparison, P(D) can be omitted because it is the same for every sentiment class. Naive Bayes also assumes that all the morphemes making up a tweet are independent of each other given the class; that is, the appearance of morpheme e1 in a sentence does not influence the appearance of morpheme e2. Because of this independence assumption, Eq. (3) can be expressed as Eq. (4).

\(\mathrm{P}\left(\mathrm{C}_{\mathrm{i}} | \mathrm{D}\right)=\mathrm{P}\left(\mathrm{e}_{1} | \mathrm{C}_{\mathrm{i}}\right) \times \mathrm{P}\left(\mathrm{e}_{2} | \mathrm{C}_{\mathrm{i}}\right) \times \cdots \times \mathrm{P}\left(\mathrm{e}_{\mathrm{n}} | \mathrm{C}_{\mathrm{i}}\right) \times \mathrm{P}\left(\mathrm{C}_{\mathrm{i}}\right)\) (4)

Here, P(en|Ci) is the conditional probability of the individual morpheme en given the sentiment class Ci. However, Eq. (4) has two limitations. First, if a morpheme that appears in the validation data never appeared in class Ci in the training data, its conditional probability is calculated as 0, which makes the whole product 0. Second, because every factor in Eq. (4) is a probability less than one, multiplying many of them for a tweet with many morphemes can cause underflow, in which the result becomes indistinguishable from 0. To resolve these problems, Laplace smoothing and a log transformation were applied to Eq. (4). P(en|Ci) with Laplace smoothing is calculated as in Eq. (5).

\(\mathrm{P}\left(\mathrm{e}_{\mathrm{n}} | \mathrm{C}_{\mathrm{i}}\right)=\frac{\operatorname{count}\left(\mathrm{e}_{\mathrm{n}}, \mathrm{C}_{\mathrm{i}}\right)+1}{\left|\mathrm{C}_{\mathrm{i}}\right|+|\mathrm{V}|}\) (5)

In Eq. (5), |Ci| is the total number of morphemes in the sentiment class Ci, and |V| is the number of distinct morphemes, i.e., the morphemes counted in |Ci| with duplicates removed. Taking the log of both sides of Eq. (4) turns the product into a sum, which prevents underflow, as shown in Eq. (6).

\(\begin{aligned} \log \mathrm{P}\left(\mathrm{C}_{\mathrm{i}} | \mathrm{D}\right)=& \log \left\{\mathrm{P}\left(\mathrm{e}_{1} | \mathrm{C}_{\mathrm{i}}\right)\right\}+\log \left\{\mathrm{P}\left(\mathrm{e}_{2} | \mathrm{C}_{\mathrm{i}}\right)\right\}+\cdots+\\ & \log \left\{\mathrm{P}\left(\mathrm{e}_{\mathrm{n}} | \mathrm{C}_{\mathrm{i}}\right)\right\}+\log \left\{\mathrm{P}\left(\mathrm{C}_{\mathrm{i}}\right)\right\} \end{aligned}\) (6)
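Eqs. (5) and (6) can be illustrated with a minimal Python sketch. It assumes tweets have already been split into morphemes (for example by Hannanum) and takes |V| as the vocabulary of the whole training set, which is the usual multinomial Naive Bayes convention; the per-class definition given above would change only the denominator. The helper names and toy tokens are hypothetical, and this is not the authors' R implementation.

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model on tweets given as morpheme lists.

    Returns (log_prior, counts, totals, vocab):
      log_prior[c] = log P(C_i), estimated from class frequencies
      counts[c][e] = count(e_n, C_i), occurrences of morpheme e in class c
      totals[c]    = |C_i|, total morphemes observed in class c
      vocab        = V, distinct morphemes over the whole training set (assumption)
    """
    classes = sorted(set(labels))
    counts = {c: Counter() for c in classes}
    for doc, c in zip(docs, labels):
        counts[c].update(doc)
    n_per_class = Counter(labels)
    log_prior = {c: math.log(n_per_class[c] / len(labels)) for c in classes}
    totals = {c: sum(counts[c].values()) for c in classes}
    vocab = set().union(*counts.values())
    return log_prior, counts, totals, vocab

def log_score(doc, c, model):
    """Eq. (6): log P(C_i) plus the Laplace-smoothed log P(e_n | C_i) of Eq. (5)."""
    log_prior, counts, totals, vocab = model
    denom = totals[c] + len(vocab)
    return log_prior[c] + sum(math.log((counts[c][e] + 1) / denom) for e in doc)

def classify(doc, model):
    """Pick the class with the highest log score (P(D) is omitted, as in the text)."""
    return max(model[0], key=lambda c: log_score(doc, c, model))

# Toy training data: hypothetical tokens standing in for Korean morphemes
docs = [["forecast", "wrong", "again"], ["wrong", "rain"],
        ["forecast", "rain", "today"], ["heat", "today"]]
labels = ["negative", "negative", "neutral", "neutral"]
model = train_nb(docs, labels)
print(classify(["wrong", "forecast"], model))      # negative
print(classify(["today", "heat", "snow"], model))  # neutral ("snow" is unseen)
```

Because of the +1 in Eq. (5), the unseen morpheme "snow" contributes a small but non-zero factor instead of forcing the score to zero, and summing logs keeps a score for a very long tweet finite where the raw product would underflow to 0.0.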

The morphemes obtained from morphological analysis can be classified into those that appear only in negative tweets and those that appear only in neutral tweets, as in Fig. 1, and these morphemes form an additional lexicon (Ac and Bc in Fig. 1). If a word from this additional lexicon is found in an input tweet, the tweet can immediately be assigned to the corresponding sentiment; if not, sentiment analysis is performed using the sentiment index of the morphemes in A∩B in Fig. 1. Adding this lexicon is a very simple way to supplement Naive Bayes, and Kim et al. (2016) found that it improved the accuracy of sentiment analysis on training data by 27.7% [29].
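The lexicon shortcut described above can be sketched as follows, under two assumptions not stated in the text: labels are exactly "negative" or "neutral", and a tweet containing exclusive morphemes of both classes (or of neither) is passed to the fallback classifier. The names `build_exclusive_lexicon`, `classify_with_lexicon`, and `always_neutral` are hypothetical; in practice the fallback would be the Naive Bayes classifier over the shared morphemes.

```python
def build_exclusive_lexicon(docs, labels, negative="negative", neutral="neutral"):
    """Collect morphemes seen only in negative tweets and only in neutral tweets.

    a: morphemes appearing in negative tweets; b: morphemes appearing in neutral
    tweets. The exclusive parts (a - b, b - a) form the additional lexicon, while
    the shared part a & b is left to the Naive Bayes sentiment index.
    """
    a, b = set(), set()
    for doc, label in zip(docs, labels):
        (a if label == negative else b).update(doc)
    return {negative: a - b, neutral: b - a}

def classify_with_lexicon(doc, lexicon, fallback):
    """Assign a class directly when the tweet contains exclusive morphemes of
    exactly one class; otherwise (none found, or both kinds found) use fallback."""
    hits = {c for c, words in lexicon.items() if any(e in words for e in doc)}
    if len(hits) == 1:
        return hits.pop()
    return fallback(doc)

def always_neutral(doc):
    """Hypothetical stand-in for the Naive Bayes classifier."""
    return "neutral"

docs = [["forecast", "wrong", "again"], ["wrong", "rain"],
        ["forecast", "rain", "today"], ["heat", "today"]]
labels = ["negative", "negative", "neutral", "neutral"]
lexicon = build_exclusive_lexicon(docs, labels)
print(classify_with_lexicon(["wrong", "rain"], lexicon, always_neutral))  # negative
```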

2.2.2 Support Vector Machine

Support Vector Machine is a supervised learning method developed for recognizing patterns in two categories and for data analysis [30]. This study follows a simple method introduced by Hong et al. (2016) [31]. The frequency and malicious value (MV) of each morpheme extracted by Hannanum were calculated. The frequency is the number of tweets in which the morpheme appears, and the MV is the number of those tweets that are negative. The frequency and MV are calculated by Eqs. (7) and (8), respectively, where n is the total number of tweets.

Table 1. Accuracy of sentiment analysis using Naive Bayes and Support Vector Machine.


\(\operatorname{frequency}(\mathrm{i})=\sum_{\mathrm{j}=1}^{\mathrm{n}} \operatorname{include}(\mathrm{i}, \mathrm{j}), \quad \operatorname{include}(\mathrm{i}, \mathrm{j})=\begin{cases} 1 & \text{if morpheme } \mathrm{i} \text{ is contained in tweet } \mathrm{j} \\ 0 & \text{otherwise} \end{cases}\) (7)

\(\operatorname{MV}(\mathrm{i})=\sum_{\mathrm{j}=1}^{\mathrm{n}} \operatorname{include}(\mathrm{i}, \mathrm{j}) \times \mathrm{M}(\mathrm{j}), \quad \mathrm{M}(\mathrm{j})=\begin{cases} 1 & \text{if tweet } \mathrm{j} \text{ is negative} \\ 0 & \text{otherwise} \end{cases}\) (8)

The malicious index (MI) of each morpheme is calculated from its frequency and MV: MI(i) = MV(i) / frequency(i). The MI therefore lies between 0 and 1, and the closer it is to 1, the more often the morpheme was used in negative tweets.

The sentiment lexicon consists of the individual morphemes and their MI values.
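A minimal Python sketch of Eqs. (7) and (8) and the MI follows. It assumes tweets are already morpheme lists; the function name `morpheme_indices` and the toy tweets are hypothetical. Because include(i, j) is 0 or 1, a morpheme repeated within one tweet is counted once.

```python
from collections import Counter

def morpheme_indices(docs, negative_flags):
    """Per-morpheme frequency (Eq. 7), MV (Eq. 8) and MI = MV / frequency.

    docs: list of tweets, each a list of morphemes.
    negative_flags: parallel list of booleans, True where M(j) = 1.
    Returns {morpheme: (frequency, MV, MI)}, i.e. the sentiment lexicon.
    """
    freq, mv = Counter(), Counter()
    for doc, is_negative in zip(docs, negative_flags):
        for e in set(doc):  # include(i, j) is 0/1, so count each tweet once
            freq[e] += 1
            if is_negative:
                mv[e] += 1
    return {e: (freq[e], mv[e], mv[e] / freq[e]) for e in freq}

docs = [["forecast", "wrong", "wrong"], ["wrong", "rain"], ["forecast", "rain"]]
lexicon = morpheme_indices(docs, [True, True, False])
print(lexicon["wrong"])     # (2, 2, 1.0)
print(lexicon["forecast"])  # (2, 1, 0.5)
```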

Each morpheme of a tweet being analyzed is matched against the already constructed sentiment lexicon, and the malicious index of the tweet (MT) is calculated as the average MI of the matched morphemes, as shown in Eq. (9), where m is the total number of matches between the morphemes of the sentiment lexicon and the morphemes contained in the tweet. The frequency and MT of each tweet are used as the input features of the Support Vector Machine.

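\(\operatorname{MT}(\mathrm{i})=\frac{\sum_{\mathrm{j}}\left\{\operatorname{match}(\mathrm{i}, \mathrm{j}) \times \operatorname{MI}(\mathrm{j})\right\}}{\sum_{\mathrm{j}} \operatorname{match}(\mathrm{i}, \mathrm{j})}, \quad \operatorname{match}(\mathrm{i}, \mathrm{j})=\begin{cases} 1 & \text{if morpheme } \mathrm{j} \text{ of tweet } \mathrm{i} \text{ is contained in the sentiment lexicon} \\ 0 & \text{otherwise} \end{cases}\) (9)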
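The MT of a single tweet can be sketched as the mean MI over its matched morphemes. The sketch assumes that a morpheme repeated within a tweet is counted each time it occurs, which the text does not specify. The name `tweet_malicious_index` is hypothetical. Training the SVM itself on these features is not shown.

```python
def tweet_malicious_index(doc, mi_lexicon):
    """MT of one tweet, Eq. (9): the mean MI of its morphemes found in the lexicon.

    doc: morphemes of the tweet (repeats counted each time, an assumption).
    mi_lexicon: {morpheme: MI}.
    Returns (MT, m), where m is the number of matches; MT is None if m == 0.
    """
    matched = [mi_lexicon[e] for e in doc if e in mi_lexicon]
    m = len(matched)
    if m == 0:
        return None, 0
    return sum(matched) / m, m

mi_lexicon = {"wrong": 1.0, "forecast": 0.5, "rain": 0.5}
print(tweet_malicious_index(["forecast", "wrong", "today"], mi_lexicon))  # (0.75, 2)
```

Returning `None` for tweets with no matches makes the "no evidence" case explicit rather than silently treating it as MT = 0.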


Fig. 2. The number of positive and negative morphemes extracted from tweet data for 2011-2014


\(\text { Precision }=\frac{t p}{t p+f p}\), \(\text { Recall }=\frac{t p}{t p+f n}\)

\(\mathrm{F}_{1}\text{-score}=2 \times \frac{\text { Precision } \times \text { Recall }}{\text { Precision }+\text { Recall }}\)

where tp, fp, and fn denote the numbers of true positives, false positives, and false negatives, respectively.
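The metrics above can be computed with the following Python sketch. The helper names are hypothetical. The last two lines recompute the F1-scores reported in Section 3.1 from the reported precision and recall, as a consistency check.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for one class, from paired true/predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)

# Recompute the F1-scores from the reported precision and recall
print(round(f1_score(0.9021, 0.9867), 4))  # 0.9425 (Naive Bayes)
print(round(f1_score(0.8669, 0.8137), 4))  # 0.8395 (SVM)
```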

3. RESULTS

3.1 Accuracy of sentiment analysis for training data

Table 1 shows the results of sentiment analysis on the training data. Precision, recall, and F1-score were calculated to evaluate the performance of Naive Bayes and Support Vector Machine. The precision and recall of Naive Bayes were 0.9021 and 0.9867, respectively, and those of Support Vector Machine were 0.8669 and 0.8137.

The F1-score, the harmonic mean of precision and recall (which weights the two equally), was 0.9425 for Naive Bayes and 0.8395 for SVM. The F1-score is 1 when precision and recall are both perfect and 0 when either of them is zero. However, Naive Bayes does not necessarily perform better when analyzing the sentiments of the validation data. This is because only one round of verification was performed on the training data, and morphemes that were not included in the training data may appear many times in the validation data. Thus, the reliability of the developed models is not guaranteed. Therefore, both methods were used to draw conclusions, which helps overcome the weakness of relying on a single method.

3.2 Daily frequency of sentiments for 2015-2016

Fig. 3 shows the changes in sentiment found in the validation data. The vertical dark grey lines mark the times when the surveys were conducted: four rounds of surveys, each lasting 4-5 days, were conducted in June and October of 2015 and 2016. The x-axis represents days, and the y-axis represents the number of tweets classified as negative or neutral.


Fig. 3. Change of sentiment for 2015-2016.

The graphs show the results of the machine learning methods (a) Naive Bayes and (b) Support Vector Machine (x-axis = days; y-axis = number of tweets classified as negative or neutral; vertical dark grey lines = survey periods).

Because the performance of Support Vector Machine on the training data was lower than that of Naive Bayes, the accuracy of the results in Fig. 3(b) needs to be confirmed. To do this, a correlation analysis was performed on the daily numbers of negative tweets in Fig. 3(a) and (b). The Pearson correlation coefficient was 0.9456 (95% confidence level) with a p-value below 2.2e-16, so the negative tweets showed sufficiently similar patterns under the two methods.
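The correlation check can be sketched in Python with the standard library only. The daily counts below are hypothetical, not the study's data. The t statistic is the usual test of zero correlation with n - 2 degrees of freedom. Turning it into a p-value needs a t-distribution CDF, which is not in the standard library; in R, which the authors used, `cor.test()` reports the coefficient, its confidence interval, and the p-value together.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """Statistic for H0: rho = 0, to compare with a t distribution with n - 2 df."""
    if abs(r) >= 1:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1 - r * r))

# Hypothetical daily negative-tweet counts from the two classifiers
nb_daily = [12, 30, 8, 45, 22, 15]
svm_daily = [10, 27, 9, 41, 25, 13]
r = pearson_r(nb_daily, svm_daily)
print(round(r, 4), round(t_statistic(r, len(nb_daily)), 2))
```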

3.3 Change of satisfaction and reliability

Fig. 4 shows the half-yearly changes in satisfaction and reliability. These are derived from the responses to the question "How satisfied (trusted) are you with the KMA's weather services in the last 6 months?" Respondents chose a point on a 7-point Likert scale from "very satisfied (trusted)" to "very dissatisfied (distrusted)". Each scale point was assigned a score between 0 and 100, with 0 being very dissatisfied (distrusted) and 100 being very satisfied (trusted), so adjacent points differ by approximately 16.7 points. Satisfaction and reliability were then calculated as the arithmetic mean of the scores of all respondents.
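The scoring can be sketched as a linear mapping of the seven scale points onto 0-100 followed by an arithmetic mean. The sketch assumes responses are coded 1 (very dissatisfied) to 7 (very satisfied). The actual coding in the KMA survey is not given here, and the function names are hypothetical.

```python
def likert_to_score(choice, points=7):
    """Map a Likert choice (1 = very dissatisfied ... points = very satisfied) to 0-100.

    Adjacent points differ by 100 / (points - 1), about 16.7 on a 7-point scale.
    """
    if not 1 <= choice <= points:
        raise ValueError(f"choice must be between 1 and {points}")
    return (choice - 1) * 100 / (points - 1)

def mean_score(choices, points=7):
    """Arithmetic mean of the 0-100 scores over all respondents."""
    return sum(likert_to_score(c, points) for c in choices) / len(choices)

print([round(likert_to_score(c), 1) for c in range(1, 8)])
# [0.0, 16.7, 33.3, 50.0, 66.7, 83.3, 100.0]
```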

Fig. 4 shows that both measures scored over 70 points in the first half of 2016, with no large changes. In the second half of 2016, however, satisfaction fell suddenly to 64.1 points. We assume that the negative sentiments in Fig. 3, which increased rapidly after the first half of 2016, were a cause of this drop in satisfaction. A detailed qualitative analysis is therefore needed for the days on which many negative sentiments were expressed.


Fig. 4. Change in satisfaction and reliability scores among respondents in their 20s and 30s

3.4 Negative sentiments and news headlines

Table 2 shows the 20 days with the highest frequency of negative tweets in 2016. Notably, all of the days in Table 2 fall after June 20th, when the survey for the first half of 2016 had been completed. The days that appear in Table 2 for both machine learning methods were analyzed further using news headlines.
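Selecting days for further analysis can be sketched in two steps: take the top 20 days by negative-tweet count for each classifier, then keep the days shared by both lists. The daily counts in the example are hypothetical, `k=2` keeps it small, and breaking ties in favor of the earlier date is an assumption. The function names are illustrative.

```python
def top_days(daily_counts, k=20):
    """The k days with the most negative tweets; ties go to the earlier date.

    daily_counts: {ISO date string 'YYYY-MM-DD': number of negative tweets}.
    """
    ranked = sorted(daily_counts.items(), key=lambda item: (-item[1], item[0]))
    return [day for day, _ in ranked[:k]]

def overlapping_top_days(counts_a, counts_b, k=20):
    """Days in the top-k lists of both classifiers, in date order."""
    return sorted(set(top_days(counts_a, k)) & set(top_days(counts_b, k)))

# Hypothetical counts for two classifiers
nb = {"2016-07-01": 40, "2016-07-02": 35, "2016-07-03": 30, "2016-07-04": 10}
svm = {"2016-07-01": 38, "2016-07-02": 20, "2016-07-03": 36, "2016-07-04": 25}
print(overlapping_top_days(nb, svm, k=2))  # ['2016-07-01']
```

ISO date strings sort chronologically as plain strings, which keeps the tie-breaking and the final ordering simple.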

Table 2. Result of top 20 days of negative sentiment frequency in 2016


Table 3 lists, by month, the nouns that appeared frequently in tweets written on the days common to both methods in Table 2. News articles containing the term 'KMA' were collected from 'news.naver.com', 10,810 items in total, and headlines with a negative tone were searched for using the words in Table 3. Representative samples are shown in Table 4. Table 4 does not include every headline with a negative tone; where possible, headlines from different press outlets were selected to show that the negative tone was not the biased opinion of a minor press outlet.

The most frequent terms for June 22nd were supercomputer, waste, tax, etc. (Table 3). However, among the 477 articles found for June 22nd, the only news related to both supercomputers and the KMA was an editorial from PPSS.

PPSS is a small to mid-sized media outlet that lacks the influence of the major press. Despite this, the content of the PPSS editorial was quoted multiple times in tweets, which led us to infer that Twitter users' negative awareness of the KMA was high. In addition, the last ten days of June were when the KMA's errors in Changma predictions began to appear in the news. However, many negative sentiments did not appear on Twitter until around the end of July. This may be because a public dispute began in this period over the deployment of THAAD (Terminal High Altitude Area Defense) to Korea by the United States as a countermeasure against the Democratic People's Republic of Korea, and other issues were drawing more attention than the weather.

Table 3. High-frequency nouns extracted from tweets on the overlapping days in Table 2


Table 4. News headlines with negative tone found from the words of Table 3


Weather phenomena in Korea such as heavy rain, heat waves, and tropical nights are concentrated in the period from June to August. In particular, most localized rainfall occurs during this period (Fig. 5). Large variations in precipitation within the same region not only make forecasting more difficult but may also reduce the reliability of forecasts in the eyes of users. The accuracy perceived by users of KMA forecasts can therefore be expected to be very low during summer. During the last ten days of July, rainfall forecast errors combined with a continuing heat wave caused an increase in negative sentiments. Articles critical of the rainfall forecast errors on June 22nd may no longer have had much effect by late July, but we believe that, with the discomfort index already high because of July's continuing heat wave, locally inaccurate rainfall forecasts affected the sentiments of Twitter users. Fig. 6 shows that the maximum temperature in Seoul exceeded 30 °C after the rainfall in July.


Fig. 5. Hourly precipitation in Seoul at 23:00 UTC on 28 July


Fig. 6. Change of daily precipitation, maximum temperature, and average temperature from June to September in Seoul

The increase in negative sentiments in August was due to the KMA's repeated incorrect predictions that the heat wave would ease. The KMA predicted that the heat wave would subside somewhat on August 16th, but this date was pushed back several times, to the 18th, 22nd, and 24th, which appears to have provoked public dissatisfaction. Ultimately, the rapid increases in negative sentiments in July and August were caused by a severe heat wave that lasted two months, combined with errors in predicting the end of the heat wave and errors in local precipitation forecasts at the same time. Table 5 shows how severe the heat was in the summer of 2016 [32].

Table 5. The number of days of heat waves and tropical nights in summer since 1973.


4. CONCLUSIONS

Understanding forecast users' attitudes is an important task that weather communities must undertake to improve the usefulness of their services. Many researchers have conducted surveys to analyze the attitudes of forecast users, but in this study, sentiment analysis of Twitter was performed. Twitter's main users are people in their 20s and 30s, so they do not represent all forecast users. However, understanding the perceptions of this group is important because satisfaction with the KMA increases with age, which makes younger users the group with relatively low satisfaction. We used the constructed lexicon to automatically classify the sentiments of tweets with Naive Bayes and Support Vector Machine classifiers. The temporal variability of the negative tweets obtained by sentiment analysis was compared with user satisfaction levels, and the daily frequency of the analyzed sentiments was compared qualitatively with news headlines. The key findings are as follows: (1) by analyzing the sentiments expressed on Twitter, decision makers can infer changes in user attitudes in near real time; (2) the individual causes of negative sentiments can be inferred by comparing negative sentiments with news headlines; and (3) in some cases, negative sentiments do not originate solely from major news media.

Satisfaction, which normally stayed above 70 points, dropped to 64.1 points in the survey for the second half of the year conducted in October 2016, whereas negative sentiments had already begun to increase rapidly in June. This indicates that the increase in negative sentiments was reflected in the drop in satisfaction. The major causes of negative sentiments in the second half of 2016 were the purchase price of a supercomputer in June, errors in heat wave and rainfall forecasts in July, errors in forecasting the end of the heat wave and in rainfall forecasts in August, and errors in predicting strong aftershocks in September. If such factors appear again, a rapid response will be required. Lastly, many people reacted to an editorial written by PPSS, even though PPSS is a very small internet press outlet in Korea and not a leading media outlet. The editorial concerned the purchase price of a supercomputer for the KMA. The KMA needs to monitor not only its services but also the issues that attract public attention, and to provide appropriate publicity on those issues.

The increases, causes, and sources of negative sentiments described above are difficult to analyze through surveys. Sentiment analysis of social media can therefore serve as a useful complement to surveys in supporting decision making for public response and service improvement [33].

ACKNOWLEDGEMENT

This work was funded by the Korea Meteorological Administration Research and Development Program "Support to Use of Meteorological Information and Value Creation" under Grant (1365003084).

References

S. Joslyn and S. Savelli, "Communicating forecast uncertainty: public perception of weather forecast uncertainty," Meteorological Applications, vol. 17, no. 2, Jun. 2010, pp. 180-195. doi: https://doi.org/10.1002/met.190
S. Drobot, A. Anderson, C. Burghardt, and P. Pisano, "U.S. Public preferences for weather and road condition information," Bulletin of the American Meteorological Society, vol. 95, no. 6, Jun. 2014, pp. 849-859. doi: https://doi.org/10.1175/BAMS-D-12-00112.1
KMA, Public Satisfaction Survey on National Weather Service in 2016, Korea Meteorological Administration, 2016.
KMA, Public Satisfaction Survey on National Weather Service in 2015, Korea Meteorological Administration, 2015.
KISA, 2016 Survey on the Internet Usage Summary Report, Korea Internet & Security Agency, 2016.
C. Hutto and E. Gilbert, "VADER: A parsimonious rulebased model for sentiment analysis of social media text," Proc. 8th ICWSM, 2014, pp. 216-225.
E. Kouloumpis, T. Wilson, and J. Moore, "Twitter sentiment analysis: the good the bad and the OMG!," Proc. 5th ICWSM, 2011, pp. 538-541.
H. C. Rah, S. Park, M. Kim, Y. Cho, and K. H. Yoo, "Analysis of social network service data to estimate tourist interests in green tour activities," International Journal of Contents, vol. 14, no. 3, Sep. 2018, pp. 27-31. doi: https://doi.org/10.5392/IJoC.2018.14.3.027
A. Pak and P. Paroubek, "Twitter as a corpus for sentiment analysis and opinion mining," Proc. 7th LREC, 2010, pp. 19-21.
A. Go, L. Huang, and R. Bhayani, Twitter sentiment analysis, Final Projects Report from CS224N, The Stanford Natural Language Processing Group, 2009.
B. Jansen, M. Zhang, K. Sobel, and A. Chowdury, "Twitter power: tweets as electronic word of mouth," Journal of the American Society for Information Science and Technology, vol. 60, no. 11, Nov. 2009, pp. 2169-2188. doi: http://doi.org/10.1002/asi.21149
A. Agarwal, B. Xie, L. Vovsha, O. Rambow, and R. Passonneau, "Sentiment analysis of twitter data," Proc. LSM 2011, 2011, pp. 30-38.
S. Jalali and H. W. Park, "Conversations about open data on twitter," International Journal of Contents, vol. 13, no. 1, Mar. 2017, pp. 31-37. doi: https://doi.org/10.5392/IJoC.2017.13.1.031
M. Taboada, "Sentiment analysis: An overview from linguistics," Annual Reviews of Linguistics, vol. 2, Jan. 2016, pp. 325-347. doi: https://doi.org/10.1146/annurevlingustics-011415-040518
J. Bollen, H. Mao, and X. J. Zeng, "Twitter mood predicts the stock market," Journal of Computational Science, vol. 2, no. 1, Mar. 2011, pp. 1-8. doi: https://doi.org/10/1016/j.jocs.2010.12.007 https://doi.org/10.1016/j.jocs.2010.12.007
S. Nawaz, M. Bilal, M. I. Lali, M. R. Ui, W. Aslam, and S. Jajja, "Effectiveness of social media data in healthcare communication," Journal of Medical Imaging and Health Informatics, vol. 7, no. 6, Oct. 2017, pp. 1365-1371. doi: https://doi.org/10.1166/jmihi.2017.2148
K. Wong, F. Davis, S. Zaiane, and Y. Yasui, "Sentiment analysis of breast cancer screening in the United States using Twitter," Proc. 8th KDIR, 2016, pp. 265-274.
D. Kang and Y. Park, "Review-based measurement of customer satisfaction in mobile service: Sentiment analysis and VIKOR approach," Expert Systems with Applications, vol. 41, no. 4, Mar. 2014, pp. 1041-1050. doi: https://doi.org/10.1016/j.eswa.2013.07.101
A. Kaur and V. Gupta, "A survey on sentiment analysis and opinion mining techniques," Journal of Emerging Technologies in Web Intelligence, vol. 5, no. 4, Nov. 2013, pp. 367-371. doi: https://doi.org/10.4304/jetwi.5.4.367-371
D. Maynard and A. Funk, "Automatic detection of political opinions in tweets," Proc. 8th ESWC, 2011, pp. 88-99.
W. Medhat, A. Hassan, and H. Korashy, "Sentiment analysis algorithms and applications: A survey," Ain Shams Engineering Journal, vol. 5, no. 4, Dec. 2014, pp. 1093-1113. doi: https://doi.org/10.1016/j.asej.2014.04.011
A. Agarwal, V. Sharma, G. Sikka, and R. Dhir, "Opinion mining of news headlines using SentiWordNet," Proc. Symposium on CDAN, 2016, p. 5.
P. J. Stone, D. C. Dunphy, M. S. Smith, and D. M. Ogilvie, The General Inquirer: A Computer Approach to Content Analysis, MIT Press, Cambridge, 1966.
C. S. G. Khoo and S. B. Johnkhan, "Lexicon-based sentiment analysis: Comparative evaluation of six sentiment lexicons," Journal of Information Science, vol. 44, no. 4, Apr. 2017, pp. 491-511. doi: https://doi.org/10.1177/0165551517703514
W. Lee, S. Kim, G. Kim, and K. Choi, "Implementation of modularized morphological analyzer," Proc. 11th HCLT, 1999, pp. 123-136.
H. N. Yeom, M. N. Hwang, M. Hwang, and H. Jung, "Study of machine-learning classifier and feature set selection for intent classification of Korean Tweets about Food Safety," Journal of Information Science Theory and Practice, vol. 2, no. 3, Sep. 2014, pp. 29-39. doi: https://doi.org/10.1633/JISTaP.2014.2.3.3
H. Kang and S. J. Yoo, "Senti-lexicon and improved Naive Bayes algorithms for sentiment analysis of restaurant reviews," Expert Systems with Applications, vol. 39, no. 5, Apr. 2012, pp. 6000-6010. doi: https://doi.org/10.1016/j.eswa.2011.11.107
L. Dhande and G. Patnaik, "Analyzing sentiment of movie review data using Naive Bayes neural classifier," International Journal of Emerging Trends & Technology in Computer Science, vol. 3, no. 4, Jul. 2014, pp. 313-320.
I. G. Kim, H. M. Kim, B. Lim, and K. K. Lee, "Relationship between result of sentiment analysis and user satisfaction - The case of Korean Meteorological Administration," The Journal of the Korea Contents Association, vol. 16, no. 10, Oct. 2016, pp. 393-402. doi: https://doi.org/10.5392/JKCA.2016.10.393
C. Cortes and V. Vapnik, "Support-vector networks," Machine Learning, vol. 20, no. 3, Sep. 1995, pp. 273-297. doi: https://doi.org/10.1023/A:102262741
J. Hong, S. Kim, J. Park, and J. Choi, "A Malicious comments detection technique on the internet using sentiment analysis and SVM," Journal of the Korea Institute of Information and Communication Engineering, vol. 20, no. 2, Feb. 2016, pp. 260-267. doi: http://dx.doi.org/10.6109/jkiice.2016.20.2.260
Interagency of Korea, 2016 Abnormal climate report, Korea Meteorological Administration, 2016.
W. Chamlertwat, P. Bhattarakosol, and T. Rungkasiri, "Discovering consumer insight from Twitter via sentiment analysis," Journal of Universal Computer Science, vol. 18, no. 8, Jan. 2012, pp. 973-992. doi: http://www.jucs.org/doi?doi=10.3217/jucs-018-08-0973