• Title/Summary/Keyword: 소셜 데이터 분석

Search Result 744, Processing Time 0.026 seconds

A Study on China's SNS Opinion Leader through Social Data (소셜 데이터를 통한 중국의 여론 주도층에 관한 연구)

  • Zheng, Xuan;Lee, Jooyoup
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology
    • /
    • v.6 no.9
    • /
    • pp.59-70
    • /
    • 2016
  • The rapid development of the Chinese version of Twitter, the groom Weibo has become an important communication means for Chinese SNS users to obtain and share information. As a result, in China, the phenomenon of the power shift has emerged from the traditional opinion leaders to SNS opinion leasers. The relationship analysis of demographic variables of the Chinese SNS users and their Information on the relationship between keywords was made by utilizing the centrality analysis using Social Network Program NetMiner. China's SNS opinion leaders have general interest in daily activities with their families or friends rather than in social issues. And in case of SNS opinion leaders of high betweenness centrality, it was analyzed that general users was a key mediator role that organically out lead to the adjacent information. These properties are not independent of demographic variables, such as professional, therefore, the demographic characteristics of SNS opinion leaders showed a significant effect on the parameters of betweenness centrality. This study analyzed the characteristics of SNS users, especially opinion leaders in China by looking at the aspects of Chinese social phenomenon in terms of information. Through this study, we expect to provide basic information about the social characteristics of China through collective communication.

Sentiment Analysis of Movie Review Using Integrated CNN-LSTM Mode (CNN-LSTM 조합모델을 이용한 영화리뷰 감성분석)

  • Park, Ho-yeon;Kim, Kyoung-jae
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.4
    • /
    • pp.141-154
    • /
    • 2019
  • Rapid growth of internet technology and social media is progressing. Data mining technology has evolved to enable unstructured document representations in a variety of applications. Sentiment analysis is an important technology that can distinguish poor or high-quality content through text data of products, and it has proliferated during text mining. Sentiment analysis mainly analyzes people's opinions in text data by assigning predefined data categories as positive and negative. This has been studied in various directions in terms of accuracy from simple rule-based to dictionary-based approaches using predefined labels. In fact, sentiment analysis is one of the most active researches in natural language processing and is widely studied in text mining. When real online reviews aren't available for others, it's not only easy to openly collect information, but it also affects your business. In marketing, real-world information from customers is gathered on websites, not surveys. Depending on whether the website's posts are positive or negative, the customer response is reflected in the sales and tries to identify the information. However, many reviews on a website are not always good, and difficult to identify. The earlier studies in this research area used the reviews data of the Amazon.com shopping mal, but the research data used in the recent studies uses the data for stock market trends, blogs, news articles, weather forecasts, IMDB, and facebook etc. However, the lack of accuracy is recognized because sentiment calculations are changed according to the subject, paragraph, sentiment lexicon direction, and sentence strength. This study aims to classify the polarity analysis of sentiment analysis into positive and negative categories and increase the prediction accuracy of the polarity analysis using the pretrained IMDB review data set. First, the text classification algorithm related to sentiment analysis adopts the popular machine learning algorithms such as NB (naive bayes), SVM (support vector machines), XGboost, RF (random forests), and Gradient Boost as comparative models. Second, deep learning has demonstrated discriminative features that can extract complex features of data. Representative algorithms are CNN (convolution neural networks), RNN (recurrent neural networks), LSTM (long-short term memory). CNN can be used similarly to BoW when processing a sentence in vector format, but does not consider sequential data attributes. RNN can handle well in order because it takes into account the time information of the data, but there is a long-term dependency on memory. To solve the problem of long-term dependence, LSTM is used. For the comparison, CNN and LSTM were chosen as simple deep learning models. In addition to classical machine learning algorithms, CNN, LSTM, and the integrated models were analyzed. Although there are many parameters for the algorithms, we examined the relationship between numerical value and precision to find the optimal combination. And, we tried to figure out how the models work well for sentiment analysis and how these models work. This study proposes integrated CNN and LSTM algorithms to extract the positive and negative features of text analysis. The reasons for mixing these two algorithms are as follows. CNN can extract features for the classification automatically by applying convolution layer and massively parallel processing. LSTM is not capable of highly parallel processing. Like faucets, the LSTM has input, output, and forget gates that can be moved and controlled at a desired time. These gates have the advantage of placing memory blocks on hidden nodes. The memory block of the LSTM may not store all the data, but it can solve the CNN's long-term dependency problem. Furthermore, when LSTM is used in CNN's pooling layer, it has an end-to-end structure, so that spatial and temporal features can be designed simultaneously. In combination with CNN-LSTM, 90.33% accuracy was measured. This is slower than CNN, but faster than LSTM. The presented model was more accurate than other models. In addition, each word embedding layer can be improved when training the kernel step by step. CNN-LSTM can improve the weakness of each model, and there is an advantage of improving the learning by layer using the end-to-end structure of LSTM. Based on these reasons, this study tries to enhance the classification accuracy of movie reviews using the integrated CNN-LSTM model.

Evaluating Global Container Ports' Performance Considering the Port Calls' Attractiveness (기항 매력도를 고려한 세계 컨테이너 항만의 성과 평가)

  • Park, Byungin
    • Journal of Korea Port Economic Association
    • /
    • v.38 no.3
    • /
    • pp.105-131
    • /
    • 2022
  • Even after the improvement in 2019, UNCTAD's Liner Shipping Connectivity Index (LSCI), which evaluates the performance of the global container port market, has limited use. In particular, since the liner shipping connectivity index evaluates the performance based only on the distance of the relationship, the performance index combining the port attractiveness of calling would be more efficient. This study used the modified Huff model, the hub-authority algorithm and the eigenvector centrality of social network analysis, and correlation analysis for 2007, 2017, and 2019 data of Ocean-Commerce, Japan. The findings are as follows: Firstly, the port attractiveness of calling and the overall performance of the port did not always match. However, according to the analysis of the attractiveness of a port calling, Busan remained within the top 10. Still, the attractiveness among other Korean ports improved slowly from the low level during the study period. Secondly, Global container ports are generally specialized for long-term specialized inbound and outbound ports by the route and grow while maintaining professionalism throughout the entire period. The Korean ports continue to change roles from analysis period to period. Lastly, the volume of cargo by period and the extended port connectivity index (EPCI) presented in this study showed a correlation from 0.77 to 0.85. Even though the Atlantic data is excluded from the analysis and the ship's operable capacity is used instead of the port throughput volume, it shows a high correlation. The study result would help evaluate and analyze global ports. According to the study, Korean ports need a long-term strategy to improve performance while maintaining professionalism. In order to maintain and develop the port's desirable role, it is necessary to utilize cooperation and partnerships with the complimentary port and attract shipping companies' services calling to the complementary port. Although this study carried out a complex analysis using a lot of data and methodologies for an extended period, it is necessary to conduct a study covering ports around the world, a long-term panel analysis, and a scientific parameter estimation study of the attractiveness analysis.

The Effect of Health and Environmental Message Framing on Consumer Attitude and WoM: Focused on Vegan Product (건강과 환경 메시지 프레이밍에 따른 소비자 태도와 구전에 미치는 영향: 비건 제품을 중심으로)

  • Park, Seoyoung;Lim, Boram
    • Journal of Service Research and Studies
    • /
    • v.13 no.3
    • /
    • pp.127-146
    • /
    • 2023
  • Recently, digital advertising has shifted towards delivering messages through short ads of less than 15 seconds, and on social media, ads need to convey the message within 5 seconds before consumers skip them. Although the length of advertisements has decreased, advancements in artificial intelligence algorithms and big data analysis have made it possible to deliver personalized messages that cater to consumers' interests. In this changing landscape, the importance of delivering tailored messages through short and efficient ads is increasing. In this study, we examined the effects of message framing as part of effective message delivery. Specifically, we examined the differences in the effects of two framings, "health" and "environment," for vegan products. The growing consumer interest in health and the environment has elevated the interest in vegan products, and the vegan market is expanding rapidly. Consumers purchase vegan products not only for personal health benefits but also due to their ethical responsibility towards the environment, which can be considered ethical consumption. Previous research has not shown the differences in the effects between health and environment message framings, and the research has been limited to vegan food products. This study investigates the differences in the effects of health and environment message framings using a dish soap product category. By identifying which advertising messages, either health or environment, are more effective in promoting vegan products, this study provides insights for companies to enhance their message framing strategies effectively.

The Role of Content Services Within a Firm's Internet Service Portfolio: Case Studies of Naver Webtoon and Google YouTube (기업의 인터넷 서비스 포트폴리오 내 콘텐츠 서비스의 역할: 네이버 웹툰과 구글 유튜브의 사례 연구)

  • Choi, Jiwon;Cho, Wooje;Jung, Yoonhyuk;Kwon, YoungOk
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.1
    • /
    • pp.1-28
    • /
    • 2022
  • In recent years, many Internet giants have begun providing their own content services, which attract online users by offering personalized services based on artificial intelligence technologies. This study investigates the role of two firms' content services within the firms' online service network. We examine the role of Naver Webtoon, which can be characterized as a professional-generated content, within Naver's service portfolio, and that of Google YouTube, which can be characterized as a user-generated content, within Google's service portfolio. Using survey data on viewers' use of the two services, we analyze a valued directed service network, where a node denotes an online service and a relationship between two nodes denotes a sequential use of two services. We found that both Webtoon and YouTube show higher out-degree centrality than in-degree centrality, which implies these content services are more likely to be starting services rather than arriving services within the firms' interactive network. The gap between the out-degree and in-degree centrality of YouTube is much smaller than that of Webtoon. The high centrality of YouTube, a user-generated content service, within the Google service network shows that YouTube's initial role of providing specific-content videos (e.g., entertainment) has expanded into a general search service for users.

Analysis of Behavioral Characteristics by Park Types Displayed in 3rd Generation SNS (제3세대 SNS에 표출된 공원 유형별 이용 특성 분석)

  • Kim, Ji-Eun;Park, Chan;Kim, Ah-Yeon;Kim, Ho Gul
    • Journal of the Korean Institute of Landscape Architecture
    • /
    • v.47 no.2
    • /
    • pp.49-58
    • /
    • 2019
  • There have been studies on the satisfaction, preference, and post occupancy evaluation of urban parks in order to reflect users' preferences and activities, suggesting directions for future park planning and management. Despite using questionnaires that are proven to be affective to get users' opinions directly, there haven been limitations in understanding the latest changes in park use through questionnaires. This study seeks to address the possibility of utilizing the thirdgeneration SNS data, Instagram and Google, to compare behavior patterns and trends in park activities. Instagram keywords and photos representing user's feelings with a specific park name were collected. We also examined reviews, peak time, and popular time zones regarding selected parks through Google. This study tries to analyze users' behaviors, emerging activities, and satisfaction using SNS data. The findings are as follows. People using park near residential areas tend to enjoy programs being operated in indoor facilities and to like to use picnic places. In an adjacent park of commercial areas, eating in the park and extended areas beyond the park boundaries is found to be one of the popular park activities. Programs using open spaces and indoor facilities were active as well. Han River Park as a detached park type offers a popular venue for excercises and scenery appreciation. We also identified companionship characteristics of different park types from texts and photos, and extracted keywords of feelings and reviews about parks posted in $3^{rd}$ generation SNS. SNS data can provide basis to grasp behavioral patterns and satisfaction factors, and changes of park activities in real time. SNS data also can be used to set future directions in park planning and management in accordance with new technologies and policies.

Development of Sentiment Analysis Model for the hot topic detection of online stock forums (온라인 주식 포럼의 핫토픽 탐지를 위한 감성분석 모형의 개발)

  • Hong, Taeho;Lee, Taewon;Li, Jingjing
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.1
    • /
    • pp.187-204
    • /
    • 2016
  • Document classification based on emotional polarity has become a welcomed emerging task owing to the great explosion of data on the Web. In the big data age, there are too many information sources to refer to when making decisions. For example, when considering travel to a city, a person may search reviews from a search engine such as Google or social networking services (SNSs) such as blogs, Twitter, and Facebook. The emotional polarity of positive and negative reviews helps a user decide on whether or not to make a trip. Sentiment analysis of customer reviews has become an important research topic as datamining technology is widely accepted for text mining of the Web. Sentiment analysis has been used to classify documents through machine learning techniques, such as the decision tree, neural networks, and support vector machines (SVMs). is used to determine the attitude, position, and sensibility of people who write articles about various topics that are published on the Web. Regardless of the polarity of customer reviews, emotional reviews are very helpful materials for analyzing the opinions of customers through their reviews. Sentiment analysis helps with understanding what customers really want instantly through the help of automated text mining techniques. Sensitivity analysis utilizes text mining techniques on text on the Web to extract subjective information in the text for text analysis. Sensitivity analysis is utilized to determine the attitudes or positions of the person who wrote the article and presented their opinion about a particular topic. In this study, we developed a model that selects a hot topic from user posts at China's online stock forum by using the k-means algorithm and self-organizing map (SOM). In addition, we developed a detecting model to predict a hot topic by using machine learning techniques such as logit, the decision tree, and SVM. We employed sensitivity analysis to develop our model for the selection and detection of hot topics from China's online stock forum. The sensitivity analysis calculates a sentimental value from a document based on contrast and classification according to the polarity sentimental dictionary (positive or negative). The online stock forum was an attractive site because of its information about stock investment. Users post numerous texts about stock movement by analyzing the market according to government policy announcements, market reports, reports from research institutes on the economy, and even rumors. We divided the online forum's topics into 21 categories to utilize sentiment analysis. One hundred forty-four topics were selected among 21 categories at online forums about stock. The posts were crawled to build a positive and negative text database. We ultimately obtained 21,141 posts on 88 topics by preprocessing the text from March 2013 to February 2015. The interest index was defined to select the hot topics, and the k-means algorithm and SOM presented equivalent results with this data. We developed a decision tree model to detect hot topics with three algorithms: CHAID, CART, and C4.5. The results of CHAID were subpar compared to the others. We also employed SVM to detect the hot topics from negative data. The SVM models were trained with the radial basis function (RBF) kernel function by a grid search to detect the hot topics. The detection of hot topics by using sentiment analysis provides the latest trends and hot topics in the stock forum for investors so that they no longer need to search the vast amounts of information on the Web. Our proposed model is also helpful to rapidly determine customers' signals or attitudes towards government policy and firms' products and services.

Analysis of News Agenda Using Text mining and Semantic Network Analysis: Focused on COVID-19 Emotions (텍스트 마이닝과 의미 네트워크 분석을 활용한 뉴스 의제 분석: 코로나 19 관련 감정을 중심으로)

  • Yoo, So-yeon;Lim, Gyoo-gun
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.1
    • /
    • pp.47-64
    • /
    • 2021
  • The global spread of COVID-19 around the world has not only affected many parts of our daily life but also has a huge impact on many areas, including the economy and society. As the number of confirmed cases and deaths increases, medical staff and the public are said to be experiencing psychological problems such as anxiety, depression, and stress. The collective tragedy that accompanies the epidemic raises fear and anxiety, which is known to cause enormous disruptions to the behavior and psychological well-being of many. Long-term negative emotions can reduce people's immunity and destroy their physical balance, so it is essential to understand the psychological state of COVID-19. This study suggests a method of monitoring medial news reflecting current days which requires striving not only for physical but also for psychological quarantine in the prolonged COVID-19 situation. Moreover, it is presented how an easier method of analyzing social media networks applies to those cases. The aim of this study is to assist health policymakers in fast and complex decision-making processes. News plays a major role in setting the policy agenda. Among various major media, news headlines are considered important in the field of communication science as a summary of the core content that the media wants to convey to the audiences who read it. News data used in this study was easily collected using "Bigkinds" that is created by integrating big data technology. With the collected news data, keywords were classified through text mining, and the relationship between words was visualized through semantic network analysis between keywords. Using the KrKwic program, a Korean semantic network analysis tool, text mining was performed and the frequency of words was calculated to easily identify keywords. The frequency of words appearing in keywords of articles related to COVID-19 emotions was checked and visualized in word cloud 'China', 'anxiety', 'situation', 'mind', 'social', and 'health' appeared high in relation to the emotions of COVID-19. In addition, UCINET, a specialized social network analysis program, was used to analyze connection centrality and cluster analysis, and a method of visualizing a graph using Net Draw was performed. As a result of analyzing the connection centrality between each data, it was found that the most central keywords in the keyword-centric network were 'psychology', 'COVID-19', 'blue', and 'anxiety'. The network of frequency of co-occurrence among the keywords appearing in the headlines of the news was visualized as a graph. The thickness of the line on the graph is proportional to the frequency of co-occurrence, and if the frequency of two words appearing at the same time is high, it is indicated by a thick line. It can be seen that the 'COVID-blue' pair is displayed in the boldest, and the 'COVID-emotion' and 'COVID-anxiety' pairs are displayed with a relatively thick line. 'Blue' related to COVID-19 is a word that means depression, and it was confirmed that COVID-19 and depression are keywords that should be of interest now. The research methodology used in this study has the convenience of being able to quickly measure social phenomena and changes while reducing costs. In this study, by analyzing news headlines, we were able to identify people's feelings and perceptions on issues related to COVID-19 depression, and identify the main agendas to be analyzed by deriving important keywords. By presenting and visualizing the subject and important keywords related to the COVID-19 emotion at a time, medical policy managers will be able to be provided a variety of perspectives when identifying and researching the regarding phenomenon. It is expected that it can help to use it as basic data for support, treatment and service development for psychological quarantine issues related to COVID-19.

Product Community Analysis Using Opinion Mining and Network Analysis: Movie Performance Prediction Case (오피니언 마이닝과 네트워크 분석을 활용한 상품 커뮤니티 분석: 영화 흥행성과 예측 사례)

  • Jin, Yu;Kim, Jungsoo;Kim, Jongwoo
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.1
    • /
    • pp.49-65
    • /
    • 2014
  • Word of Mouth (WOM) is a behavior used by consumers to transfer or communicate their product or service experience to other consumers. Due to the popularity of social media such as Facebook, Twitter, blogs, and online communities, electronic WOM (e-WOM) has become important to the success of products or services. As a result, most enterprises pay close attention to e-WOM for their products or services. This is especially important for movies, as these are experiential products. This paper aims to identify the network factors of an online movie community that impact box office revenue using social network analysis. In addition to traditional WOM factors (volume and valence of WOM), network centrality measures of the online community are included as influential factors in box office revenue. Based on previous research results, we develop five hypotheses on the relationships between potential influential factors (WOM volume, WOM valence, degree centrality, betweenness centrality, closeness centrality) and box office revenue. The first hypothesis is that the accumulated volume of WOM in online product communities is positively related to the total revenue of movies. The second hypothesis is that the accumulated valence of WOM in online product communities is positively related to the total revenue of movies. The third hypothesis is that the average of degree centralities of reviewers in online product communities is positively related to the total revenue of movies. The fourth hypothesis is that the average of betweenness centralities of reviewers in online product communities is positively related to the total revenue of movies. The fifth hypothesis is that the average of betweenness centralities of reviewers in online product communities is positively related to the total revenue of movies. To verify our research model, we collect movie review data from the Internet Movie Database (IMDb), which is a representative online movie community, and movie revenue data from the Box-Office-Mojo website. The movies in this analysis include weekly top-10 movies from September 1, 2012, to September 1, 2013, with in total. We collect movie metadata such as screening periods and user ratings; and community data in IMDb including reviewer identification, review content, review times, responder identification, reply content, reply times, and reply relationships. For the same period, the revenue data from Box-Office-Mojo is collected on a weekly basis. Movie community networks are constructed based on reply relationships between reviewers. Using a social network analysis tool, NodeXL, we calculate the averages of three centralities including degree, betweenness, and closeness centrality for each movie. Correlation analysis of focal variables and the dependent variable (final revenue) shows that three centrality measures are highly correlated, prompting us to perform multiple regressions separately with each centrality measure. Consistent with previous research results, our regression analysis results show that the volume and valence of WOM are positively related to the final box office revenue of movies. Moreover, the averages of betweenness centralities from initial community networks impact the final movie revenues. However, both of the averages of degree centralities and closeness centralities do not influence final movie performance. Based on the regression results, three hypotheses, 1, 2, and 4, are accepted, and two hypotheses, 3 and 5, are rejected. This study tries to link the network structure of e-WOM on online product communities with the product's performance. Based on the analysis of a real online movie community, the results show that online community network structures can work as a predictor of movie performance. The results show that the betweenness centralities of the reviewer community are critical for the prediction of movie performance. However, degree centralities and closeness centralities do not influence movie performance. As future research topics, similar analyses are required for other product categories such as electronic goods and online content to generalize the study results.

Development of Yóukè Mining System with Yóukè's Travel Demand and Insight Based on Web Search Traffic Information (웹검색 트래픽 정보를 활용한 유커 인바운드 여행 수요 예측 모형 및 유커마이닝 시스템 개발)

  • Choi, Youji;Park, Do-Hyung
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.3
    • /
    • pp.155-175
    • /
    • 2017
  • As social data become into the spotlight, mainstream web search engines provide data indicate how many people searched specific keyword: Web Search Traffic data. Web search traffic information is collection of each crowd that search for specific keyword. In a various area, web search traffic can be used as one of useful variables that represent the attention of common users on specific interests. A lot of studies uses web search traffic data to nowcast or forecast social phenomenon such as epidemic prediction, consumer pattern analysis, product life cycle, financial invest modeling and so on. Also web search traffic data have begun to be applied to predict tourist inbound. Proper demand prediction is needed because tourism is high value-added industry as increasing employment and foreign exchange. Among those tourists, especially Chinese tourists: Youke is continuously growing nowadays, Youke has been largest tourist inbound of Korea tourism for many years and tourism profits per one Youke as well. It is important that research into proper demand prediction approaches of Youke in both public and private sector. Accurate tourism demands prediction is important to efficient decision making in a limited resource. This study suggests improved model that reflects latest issue of society by presented the attention from group of individual. Trip abroad is generally high-involvement activity so that potential tourists likely deep into searching for information about their own trip. Web search traffic data presents tourists' attention in the process of preparation their journey instantaneous and dynamic way. So that this study attempted select key words that potential Chinese tourists likely searched out internet. Baidu-Chinese biggest web search engine that share over 80%- provides users with accessing to web search traffic data. Qualitative interview with potential tourists helps us to understand the information search behavior before a trip and identify the keywords for this study. Selected key words of web search traffic are categorized by how much directly related to "Korean Tourism" in a three levels. Classifying categories helps to find out which keyword can explain Youke inbound demands from close one to far one as distance of category. Web search traffic data of each key words gathered by web crawler developed to crawling web search data onto Baidu Index. Using automatically gathered variable data, linear model is designed by multiple regression analysis for suitable for operational application of decision and policy making because of easiness to explanation about variables' effective relationship. After regression linear models have composed, comparing with model composed traditional variables and model additional input web search traffic data variables to traditional model has conducted by significance and R squared. after comparing performance of models, final model is composed. Final regression model has improved explanation and advantage of real-time immediacy and convenience than traditional model. Furthermore, this study demonstrates system intuitively visualized to general use -Youke Mining solution has several functions of tourist decision making including embed final regression model. Youke Mining solution has algorithm based on data science and well-designed simple interface. In the end this research suggests three significant meanings on theoretical, practical and political aspects. Theoretically, Youke Mining system and the model in this research are the first step on the Youke inbound prediction using interactive and instant variable: web search traffic information represents tourists' attention while prepare their trip. Baidu web search traffic data has more than 80% of web search engine market. Practically, Baidu data could represent attention of the potential tourists who prepare their own tour as real-time. Finally, in political way, designed Chinese tourist demands prediction model based on web search traffic can be used to tourism decision making for efficient managing of resource and optimizing opportunity for successful policy.