• Title/Summary/Keyword: Social big data analysis

Public Sentiment Analysis of Korean Top-10 Companies: Big Data Approach Using Multi-categorical Sentiment Lexicon (국내 주요 10대 기업에 대한 국민 감성 분석: 다범주 감성사전을 활용한 빅 데이터 접근법)

  • Kim, Seo In;Kim, Dong Sung;Kim, Jong Woo
    • Journal of Intelligence and Information Systems / v.22 no.3 / pp.45-69 / 2016
  • Recently, sentiment analysis using open Internet data has been actively performed for various purposes. As online communication channels become popular, companies try to capture the public's sentiment toward them from open online information sources. This research analyzes public sentiment toward Korea's Top-10 companies using a multi-categorical sentiment lexicon. Whereas existing research on measuring public sentiment with a big data approach classifies sentiment into dimensions, this research classifies public sentiment into multiple categories. The dimensional sentiment structure has been commonly applied in sentiment analysis for various applications because it is academically validated and has a clear advantage in capturing the degree of sentiment and the interrelation of dimensions. However, the dimensional structure is not effective for measuring public sentiment, because human sentiment is too complex to be divided into a few dimensions. In addition, ordinary people would need special training to express their feelings in a dimensional structure. People do not divide their sentiment into dimensions, nor do they need psychological training to feel. They would not express their feelings along dimensions such as positive/negative or active/passive; rather, they express them as categorical sentiments such as sadness, rage, and happiness. That is, the categorical approach to sentiment analysis is more natural than the dimensional approach. Accordingly, this research suggests a multi-categorical sentiment structure as an alternative way to measure social sentiment from the public's point of view. A multi-categorical sentiment structure classifies sentiments the way ordinary people do, although it may contain some subjectivity.
In this research, nine categories ('Sadness', 'Anger', 'Happiness', 'Disgust', 'Surprise', 'Fear', 'Interest', 'Boredom', and 'Pain') are used as the multi-categorical sentiment structure. To capture public sentiment toward Korea's Top-10 companies, Internet news data about the companies were collected over the past 25 months from a representative Korean portal site. Based on sentiment words extracted from previous research, we created a sentiment lexicon and analyzed how frequently its words appear in the news data. The frequency of each sentiment category was calculated as a ratio of the total number of sentiment words, and the categories were ranked by this share. Sentiment comparisons among the top-4 companies ('Samsung', 'Hyundai', 'SK', and 'LG') were visualized separately. Next, the research tested hypotheses to establish the usefulness of the multi-categorical sentiment lexicon, namely how effectively categorical sentiment can serve as a relative comparison index in cross-sectional and time-series analysis. To test the effectiveness of the sentiment lexicon as a cross-sectional comparison index, pair-wise t-tests and Duncan's test were conducted. Two pairs of companies, 'Samsung' and 'Hanjin', and 'SK' and 'Hanjin', were chosen to test whether each categorical sentiment differs significantly in the pair-wise t-test. Because the category 'Sadness' has the largest vocabulary, it was chosen to determine whether the subgroups of companies differ significantly in Duncan's test. The results show that five sentiment categories differ significantly between Samsung and Hanjin, and four between SK and Hanjin. For 'Sadness', six significantly different subgroups were identified. To test the effectiveness of the sentiment lexicon as a time-series comparison index, the 'nut rage' incident of Hanjin was selected as an example case.
The term frequencies of sentiment words in the month of the incident were compared with those in the month before. The sentiment categories were regrouped into positive and negative sentiment to determine whether the event actually had a negative impact on public sentiment toward the company. The difference in each category was visualized, and the change in the word list for the sentiment 'Rage' was shown to make the effect more concrete. As a result, there was a large before-and-after difference in the sentiment that ordinary people feel toward the company. Both hypotheses turned out to be statistically significant; therefore, sentiment analysis in the business area using multi-categorical sentiment lexicons has persuasive power. This research implies that categorical sentiment analysis can be used as an alternative method to supplement dimensional sentiment analysis when gauging public sentiment in a business environment.
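The category-ratio step described above (each category's count divided by the total number of sentiment words, then ranked) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the tiny English `LEXICON` and the function name `category_ratios` are hypothetical stand-ins, not the authors' Korean lexicon or code, and tokenization is reduced to whitespace splitting.

```python
from collections import Counter

# Hypothetical mini-lexicon (word -> sentiment category). The study's real
# lexicon covers nine categories built from prior research; this is a toy stand-in.
LEXICON = {
    "sorrow": "Sadness",
    "grief": "Sadness",
    "outrage": "Anger",
    "furious": "Anger",
    "delight": "Happiness",
    "shocked": "Surprise",
}


def category_ratios(tokens, lexicon=LEXICON):
    """Return each category's share of all sentiment-word occurrences,
    ordered from the largest share to the smallest (the ranking)."""
    counts = Counter(lexicon[t] for t in tokens if t in lexicon)
    total = sum(counts.values())
    if total == 0:
        return {}  # no sentiment words found
    return {cat: n / total for cat, n in counts.most_common()}


if __name__ == "__main__":
    news = "the outrage was furious and the grief deep outrage".split()
    print(category_ratios(news))
```

For the time-series comparison, the same ratios could be computed separately for the incident month and the preceding month and compared category by category.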

Investigating Dynamic Mutation Process of Issues Using Unstructured Text Analysis (비정형 텍스트 분석을 활용한 이슈의 동적 변이과정 고찰)

  • Lim, Myungsu;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.1-18 / 2016
  • Owing to the extensive use of Web media and the development of the IT industry, a large amount of data has been generated, shared, and stored. Nowadays, various types of unstructured data such as images, sound, video, and text are distributed through Web media. Therefore, many attempts have been made in recent years to discover new value through the analysis of these unstructured data. Among these types of unstructured data, text is recognized as the most representative means for users to express and share their opinions on the Web. In this sense, demand for obtaining new insights through text analysis is steadily increasing. Accordingly, text mining is increasingly being used for different purposes in various fields. In particular, issue tracking is being widely studied not only in academia but also in industry, because it can be used to extract various issues from text such as news and SNS (social network service) posts and to analyze the trends of these issues. Conventionally, issue tracking is used to identify major issues sustained over a long period of time through topic modeling and to analyze the detailed distribution of documents involved in each issue. However, because conventional issue tracking assumes that the content composing each issue does not change throughout the entire tracking period, it cannot represent the dynamic mutation process of detailed issues that can be created, merged, divided, and deleted between these periods. Moreover, because only keywords that appear consistently throughout the entire period can be derived as issue keywords, concrete issue keywords such as "nuclear test" and "separated families" may be concealed by more general issue keywords such as "North Korea" in an analysis over a long period of time. This implies that many meaningful but short-lived issues cannot be discovered by conventional issue tracking.
Note that detailed keywords are preferable to general keywords because the former can provide clues for actionable strategies. To overcome these limitations, we performed an independent analysis on the documents of each detailed period. We generated an issue flow diagram based on the similarity of each issue between two consecutive periods. The issue transition pattern among categories was analyzed by using the category information of each document. We then applied the proposed methodology to a real case of 53,739 news articles and derived an issue flow diagram from them. From the experiment, we propose the following useful application scenarios for the issue flow diagram. First, we can identify an issue that actively appears during a certain period and promptly disappears in the next period. Second, the preceding and following issues of a particular issue can be easily discovered from the issue flow diagram. This implies that our methodology can be used to discover the association between inter-period issues. Finally, an interesting pattern of one-way and two-way transitions was discovered by analyzing the transition patterns of issues through category analysis. Specifically, we found that a pair of mutually similar categories induces two-way transitions. In contrast, one-way transitions can be recognized as an indicator that issues in a certain category tend to be influenced by issues in another category. For practical application of the proposed methodology, high-quality word and stop-word dictionaries need to be constructed. In addition, not only the number of documents but also additional meta-information such as read counts, time of writing, and comments should be analyzed. A rigorous performance evaluation or validation of the proposed methodology should be performed in future work.
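The issue flow diagram links issues in consecutive periods by similarity. The abstract does not name the similarity measure, so the sketch below assumes each issue is represented as a set of keywords, uses Jaccard similarity, and keeps links at or above a hypothetical threshold of 0.3. `issue_flow` and `short_lived` are illustrative names, not the authors' implementation. An issue with no link to either neighboring period is one way to operationalize the first application scenario (an issue that appears and promptly disappears).

```python
def jaccard(a, b):
    """Jaccard similarity between two keyword sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def issue_flow(periods, threshold=0.3):
    """Link issues in consecutive periods whose keyword sets are similar.

    periods: list of dicts, one per period, mapping issue name -> keyword set.
    Returns edges as (period_index, source_issue, target_issue, similarity).
    The threshold value is an assumption, not taken from the study.
    """
    edges = []
    for t in range(len(periods) - 1):
        for src, src_kw in periods[t].items():
            for dst, dst_kw in periods[t + 1].items():
                sim = jaccard(src_kw, dst_kw)
                if sim >= threshold:
                    edges.append((t, src, dst, round(sim, 3)))
    return edges


def short_lived(periods, edges):
    """Issues with no incoming or outgoing link (appear, then promptly vanish)."""
    linked = {(t, s) for t, s, _, _ in edges} | {(t + 1, d) for t, _, d, _ in edges}
    return [(t, name) for t, issues in enumerate(periods)
            for name in issues if (t, name) not in linked]


if __name__ == "__main__":
    periods = [
        {"North Korea": {"north korea", "nuclear test", "sanction"},
         "Olympics": {"olympics", "medal"}},
        {"Family reunion": {"north korea", "separated families", "talks"},
         "Sanctions": {"sanction", "nuclear test", "un"}},
    ]
    edges = issue_flow(periods)
    print(edges)
    print(short_lived(periods, edges))
```

Following the edges forward or backward from an issue gives its succeeding or preceding issues, which corresponds to the second scenario in the abstract.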

Development of Sentiment Analysis Model for the hot topic detection of online stock forums (온라인 주식 포럼의 핫토픽 탐지를 위한 감성분석 모형의 개발)

  • Hong, Taeho;Lee, Taewon;Li, Jingjing
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.187-204 / 2016
  • Document classification based on emotional polarity has become an emerging task owing to the explosion of data on the Web. In the big data age, there are too many information sources to refer to when making decisions. For example, when considering travel to a city, a person may search for reviews through a search engine such as Google or social networking services (SNSs) such as blogs, Twitter, and Facebook. The emotional polarity of positive and negative reviews helps a user decide whether or not to make a trip. Sentiment analysis of customer reviews has become an important research topic as data mining technology has become widely accepted for text mining of the Web. Sentiment analysis has been used to classify documents through machine learning techniques such as decision trees, neural networks, and support vector machines (SVMs), and is used to determine the attitude, position, and sensibility of people who write articles about various topics published on the Web. Regardless of the polarity of customer reviews, emotional reviews are very helpful materials for analyzing customers' opinions. Sentiment analysis helps with instantly understanding what customers really want through automated text mining techniques. It applies text mining techniques to text on the Web to extract subjective information, and is used to determine the attitudes or positions of people who wrote articles presenting their opinions about a particular topic. In this study, we developed a model that selects hot topics from user posts on China's online stock forum by using the k-means algorithm and a self-organizing map (SOM). In addition, we developed a detection model to predict hot topics by using machine learning techniques such as logit, decision trees, and SVM.
We employed sentiment analysis to develop our model for the selection and detection of hot topics from China's online stock forum. The sentiment analysis calculates a sentiment value for each document by comparing and classifying its words against a polarity sentiment dictionary (positive or negative). The online stock forum was an attractive site because of its information about stock investment. Users post numerous texts about stock movements, analyzing the market according to government policy announcements, market reports, reports from economic research institutes, and even rumors. We divided the online forum's topics into 21 categories for sentiment analysis. One hundred forty-four topics were selected among the 21 categories of the online stock forum. The posts were crawled to build a positive and negative text database. After preprocessing, we ultimately obtained 21,141 posts on 88 topics from March 2013 to February 2015. An interest index was defined to select the hot topics, and the k-means algorithm and SOM produced equivalent results on this data. We developed decision tree models to detect hot topics with three algorithms: CHAID, CART, and C4.5. The results of CHAID were inferior to those of the others. We also employed SVM to detect hot topics from negative data. The SVM models used the radial basis function (RBF) kernel, with parameters tuned by grid search. Detecting hot topics with sentiment analysis gives investors the latest trends and hot topics in the stock forum, so they no longer need to search vast amounts of information on the Web. Our proposed model is also helpful for rapidly determining customers' signals or attitudes toward government policy and firms' products and services.
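The dictionary-based scoring step can be sketched as below. The abstract does not give the exact formula, so this assumes a common normalized form, (positive hits minus negative hits) divided by total hits, which yields a score in [-1, 1]. The English word lists and the `margin` neutral band are hypothetical placeholders for the study's Chinese polarity dictionary.

```python
# Hypothetical polarity dictionaries; the study used a Chinese-language dictionary.
POSITIVE = {"rally", "gain", "bullish", "surge"}
NEGATIVE = {"crash", "loss", "bearish", "plunge"}


def polarity_score(tokens, pos=POSITIVE, neg=NEGATIVE):
    """Normalized polarity in [-1, 1]: (positive hits - negative hits) / total hits.
    Returns 0.0 when the post contains no dictionary words."""
    p = sum(1 for t in tokens if t in pos)
    n = sum(1 for t in tokens if t in neg)
    return (p - n) / (p + n) if p + n else 0.0


def label(score, margin=0.0):
    """Map a score to a polarity label; margin is an assumed neutral band."""
    if score > margin:
        return "positive"
    if score < -margin:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    post = "bullish rally despite one loss".split()
    s = polarity_score(post)
    print(round(s, 3), label(s))
```

Scores like these could then be aggregated per topic and fed to the clustering (k-means, SOM) and classification (decision tree, SVM) steps; how the authors aggregated them is not stated in the abstract.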

Understanding the Current Status of Research on Traditional Korean Medicine Treatment for the People with Disability and Suggestions for Further Research: Scoping Review (장애인 한의치료 연구의 현황 파악과 후속 연구에 대한 제언을 위한 Scoping Review)

  • Kwon, Miri;Lee, Jungmin;Kang, Doyoung;Jeon, Hyonjun;Kim, Suna;Kim, Mihyun;Lee, Shinhee;Jun, Hyungsun;Kang, Heeseol;Cheong, Moonjoo;Leem, Jungtae
    • Journal of Korean Medicine Rehabilitation / v.32 no.1 / pp.89-106 / 2022
  • Objectives In this study, a scoping review was conducted to inform future decision-making related to traditional Korean medicine for people with disabilities. Methods Seven databases were searched to find previous studies on traditional Korean medicine for people with disabilities. Studies published until August 2021 were considered. Using the scoping review methodology, research on traditional Korean medicine for people with disabilities was reviewed in the following steps: 1) drawing research questions, 2) searching for related studies, 3) selecting studies, 4) extracting data, and 5) analyzing and reporting results. Results Out of 2,072 studies, 7 research papers and 10 reports were finally selected. The research papers included 5 case studies, 1 survey study, and 1 chart review. Most studies used herbal medicine and acupuncture treatment, but the reports on the interventions were not detailed. The reports included policy studies, project performance guidelines, and project results reports, and most of the evaluation indicators tended to be standardized. Conclusions This study reviewed the literature on traditional Korean medicine for people with disabilities. It presents future directions for clinical research on traditional Korean medicine for people with disabilities and can be used to inform healthcare policies and clinical practice. In the future, quantitative research such as clinical trials, meta-analyses, and health insurance big data analysis is needed to understand the current status and effects of traditional Korean medicine for people with disabilities. In addition, qualitative research is necessary to identify unmet demands for traditional Korean medicine among people with disabilities.

Migrant Multi-Cultural Family Women's Life Quality Related to Oral Health: Survey in Dae-Gu (다문화가족 이주여성의 구강건강관련 삶의 질: 대구지역 조사)

  • Jeon, Eun-Suk;An, Seo-Young;Choi, Yeon-Hee
    • Journal of dental hygiene science / v.11 no.3 / pp.181-187 / 2011
  • This study conducted oral examinations and individual interviews with migrant multi-cultural family women in Daegu and measured their socio-demographic characteristics, oral health conditions, and OHIP-14 scores in order to investigate the relationship between the oral health of migrant multi-cultural family women living in some large cities and their quality of life. Based on data finally collected from 189 women, t-tests, ANOVA, and binary logistic regression analysis were conducted, and the conclusions are as follows. The average number of decayed teeth was 2.23, missing teeth 1.48, and treated teeth 5.58. Women from the Philippines had more missing teeth than those from other countries, and women from China had relatively few filled permanent teeth. Oral health-related quality of life was poorer as the number of missing teeth increased. A comparison of oral health-related quality of life by the number of missing teeth showed that it was lowest in the areas of mental discomfort, reduced physical ability, reduced mental ability, reduced social ability, and social disadvantage. Oral health-related quality of life was lower as the number of permanent teeth with decay experience increased and as monthly household income decreased, indicating that these two factors are most strongly related to oral health-related quality of life. Because migrant multi-cultural family women's oral health-related quality of life decreases as the numbers of missing and decayed teeth increase, a program to improve their oral health-related quality of life should be developed, and follow-up research should verify its effect.

Recognition Effect of Cultural Contents : Focusing on Changes in Perception of Sexual Minority (문화콘텐츠의 인정 효과 : 성소수자에 대한 인식변화를 중심으로(1920-2017))

  • Lee, Hye-Mi;Ryu, Seoung-Ho
    • The Journal of the Korea Contents Association / v.18 no.7 / pp.84-94 / 2018
  • This study analyzed domestic media articles from 1920 to 2017 using R 3.4, a big data analysis tool. It examines the discourse on sexual minorities reproduced through the media over about 100 years, focusing on the role of film as an art form that struggles against projected aversion toward sexual minorities. Sexual minorities in movies are not abominable. They are people we already know in our daily lives who differ only in sexual orientation. Because sexual minorities are less likely to be encountered in everyday life, they are generally experienced and perceived through what the media present. Notably, the representation of sexual minorities in the media becomes a major agenda of our society by bringing problems underlying society to the surface. It raises social issues by revealing and highlighting problems that have been alienated and avoided by the mainstream gaze. Media content enables a vivid, three-dimensional experience of subjects that audiences have not experienced themselves, and has a decisive influence on correctly recognizing and judging society. This suggests that media content can be a powerful weapon in the struggle for recognition, one that can naturally counter social hatred without resorting to demonstrations or protests.

Effects of the Personality Traits of Baby Boomers on the Preparation Behaviors for the Old Age -Focused on the Cheon-An Industrial Complex's Workers- (베이비부머의 성격특성이 노후준비행동에 미치는 영향 -천안지역 공단 근로자를 중심으로-)

  • You, In-Soon;Choi, Soo-Il
    • The Journal of the Korea Contents Association / v.12 no.4 / pp.245-262 / 2012
  • The purpose of this study is to identify differences in the personality traits and old-age preparation levels of baby boomers by demographic factors and to analyze the effect of the Big Five personality traits on preparation for old age. A total of 331 questionnaires were distributed to workers born between 1955 and 1963 in the Chunan industrial complex. Frequency analysis, factor analysis, reliability analysis, t-tests, ANOVA, and hierarchical regression analysis were conducted on the collected data with SPSS 18.0. The findings are as follows. First, there are partial differences in personality and preparation for old age by demographic factors. Second, extraversion, openness to experience, and conscientiousness have a positive effect on physical preparation for old age, while neuroticism has a negative effect on it, and agreeableness has no relationship with it. Third, conscientiousness, openness to experience, and extraversion have a positive effect on social preparation for old age, but neuroticism and agreeableness do not affect it. Fourth, neuroticism, extraversion, openness to experience, and conscientiousness have a positive effect on economic preparation for old age, but agreeableness has no relationship with economic preparation.

Effects of Financial College Tuition Support by Korean Parents using a Hierarchical Bayes Model (계층적 베이즈 모형을 이용한 대학등록금에 대한 부모님의 경제적 지원 영향 분석)

  • Oh, Man-Suk;Oh, Hyun Sook;Oh, Min Jung
    • The Korean Journal of Applied Statistics / v.26 no.2 / pp.267-280 / 2013
  • College tuition is a significant economic, social, and political issue in Korea. We conduct a Bayesian analysis of a hierarchical model to address the factors related to college tuition, based on survey data collected by Statistics Korea. A binary response variable indicates whether more than 70% of tuition costs are supported by parents, and a hierarchical probit model is constructed with areas as groups. A set of explanatory variables is selected through a factor analysis of the available variables in the survey. A Markov chain Monte Carlo algorithm is used to estimate the parameters. The results show that income and stress are significantly related to college tuition support from parents. Parents with high income tend to support their children's college tuition, and students with parents' financial support tend to be less mentally stressed; consequently, the economic status of parents significantly affects the mental health of college students. Gender, a healthy lifestyle, and college satisfaction are not significant factors. Comparing areas in terms of the degree of correlation between stress/income and tuition support from parents, students in Kangwon-do are the most mentally stressed when parents' support is limited; in addition, the positive correlation between parents' support and income is stronger in big cities than in provincial areas.
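The model structure described above can be made concrete with a small sketch of the binary response and the probit link. It assumes a random-intercept form, P(y = 1) = Φ(x·β + u_area), where each area's effect u_area is drawn from a common normal distribution; the coefficients below are hypothetical, and the MCMC estimation the authors used is not reproduced here.

```python
import math


def std_normal_cdf(z):
    """Phi(z), the standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def response(parent_share):
    """Binary response: 1 if parents cover more than 70% of tuition, else 0."""
    return 1 if parent_share > 0.7 else 0


def probit_prob(x, beta, area_effect):
    """P(y = 1 | x, area) = Phi(x . beta + u_area) in a random-intercept probit.

    x, beta: explanatory variables and (hypothetical) coefficients.
    area_effect: the group-level intercept u_area for the student's area,
    assumed to come from N(0, tau^2) in the hierarchical layer.
    """
    eta = sum(xi * bi for xi, bi in zip(x, beta)) + area_effect
    return std_normal_cdf(eta)


if __name__ == "__main__":
    print(response(0.8), response(0.6))
    # Same student profile in two areas with different (assumed) area effects.
    for u in (-0.4, 0.3):
        print(u, round(probit_prob([1.0, 0.5], [0.2, 0.4], u), 4))
```

Grouping by area lets the intercept shift between regions, which is what allows the area-by-area comparisons reported in the abstract.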

Analysis of Borrows Demand for Books in Public Libraries Considering Cultural Characteristics (문화적 특성을 고려한 공공도서관 도서 대출수요 분석 : 대구광역시 시립도서관을 사례로)

  • Oh, Min-Ki;Kim, Kyung-Rae;Jeong, Won-Oong;Kim, Keun-Wook
    • Journal of Digital Convergence / v.19 no.3 / pp.55-64 / 2021
  • Public libraries are spaces where residents learn a wide range of knowledge and ideas, and because they are directly connected to daily life, various related studies have been conducted. In most previous studies, variables such as population, traffic accessibility, and environment were found to be highly relevant to library use. This study differs from previous studies in that it analyzes book-borrowing demand and its related factors by incorporating cultural-characteristic variables, based on the book-borrowing history (1,820,407 cases) and member information (297,222 persons). The analysis showed that the demand for book borrowing increased as borrowing of social science and literature books increased relative to technical science books. In addition, various descriptive statistical analyses were used to analyze the characteristics of library book-borrowing demand, and policy implications and limitations of the study were presented based on the results. Considering that cultural characteristics change depending on location and time of day, related research should be continued in the future.

Clickstream Big Data Mining for Demographics based Digital Marketing (인구통계특성 기반 디지털 마케팅을 위한 클릭스트림 빅데이터 마이닝)

  • Park, Jiae;Cho, Yoonho
    • Journal of Intelligence and Information Systems / v.22 no.3 / pp.143-163 / 2016
  • The demographics of Internet users are the most basic and important sources for target marketing or personalized advertisements on digital marketing channels, which include email, mobile, and social media. However, it has gradually become difficult to collect the demographics of Internet users because their activities are anonymous in many cases. Although the marketing department can obtain demographics through online or offline surveys, these approaches are very expensive and time-consuming, and likely to include false statements. Clickstream data is the record an Internet user leaves behind while visiting websites. As the user clicks anywhere in a webpage, the activity is logged in semi-structured website log files. Such data allows us to see what pages users visited, how long they stayed there, how often they visited, when they usually visited, which sites they prefer, what keywords they used to find the site, whether they purchased anything, and so forth. For this reason, some researchers have tried to infer the demographics of Internet users from their clickstream data. They derived various independent variables likely to be correlated with demographics, including search keywords, frequency and intensity by time, day, and month, the variety of websites visited, and text information from the web pages visited. The demographic attributes to predict also vary across papers, covering gender, age, job, location, income, education, marital status, and presence of children. A variety of data mining methods, such as LSA, SVM, decision trees, neural networks, logistic regression, and k-nearest neighbors, were used to build prediction models. However, previous research has not yet identified which data mining method is appropriate for predicting each demographic variable. Moreover, the independent variables studied so far need to be reviewed, combined as needed, and evaluated to build the best prediction model.
The objective of this study is to choose the clickstream attributes most likely to be correlated with demographics, based on the results of previous research, and then to identify which data mining method is best suited to predict each demographic attribute. Among the demographic attributes, this paper focuses on predicting gender, age, marital status, residence, and job. Based on previous research, 64 clickstream attributes are applied to predict the demographic attributes. The overall process of predictive model building consists of four steps. In the first step, we create user profiles that include 64 clickstream attributes and 5 demographic attributes. The second step reduces the dimensionality of the clickstream variables to address the curse of dimensionality and overfitting. We use three approaches, based on decision trees, PCA, and cluster analysis. In the third step, we build alternative predictive models for each demographic variable using SVM, neural networks, and logistic regression. The last step evaluates the alternative models in terms of accuracy and selects the best model. For the experiments, we used clickstream data covering 5 demographic attributes and 16,962,705 online activities for 5,000 Internet users. IBM SPSS Modeler 17.0 was used for the prediction process, and 5-fold cross-validation was conducted to enhance the reliability of the experiments. The experimental results verify that a specific data mining method is well suited to each demographic variable. For example, age prediction performs best with decision-tree-based dimension reduction and a neural network, whereas gender and marital status are predicted most accurately by applying SVM without dimension reduction.
We conclude that the online behaviors of Internet users, captured through clickstream data analysis, can be used effectively to predict their demographics and thus be utilized in digital marketing.
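The 5-fold cross-validation used to evaluate the alternative models can be illustrated with a dependency-free Python sketch. The study itself used IBM SPSS Modeler 17.0; `k_fold_indices`, `cross_validate`, and the `fit_and_score` callback are hypothetical names for illustration, and a real model would be trained and scored inside the callback.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # The first n % k folds receive one extra item so every index is used once.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(idx[start:start + size])
        start += size
    return folds


def cross_validate(n, fit_and_score, k=5, seed=0):
    """Average of fit_and_score(train_idx, test_idx) over k folds.

    Each fold serves once as the test set while the other k-1 folds form
    the training set.
    """
    folds = k_fold_indices(n, k, seed)
    scores = []
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        scores.append(fit_and_score(train, test))
    return sum(scores) / k


if __name__ == "__main__":
    # Placeholder scorer: reports the test-set fraction instead of a real accuracy.
    print(cross_validate(12, lambda train, test: len(test) / 12))
```

With the study's 5,000 users, `k_fold_indices(5000)` would produce five folds of 1,000 users each, with every user appearing in exactly one test fold.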