• Title/Summary/Keyword: Text Data Analysis

A Study on Popular Sentiment for Generation MZ: Through social media (SNS) sentiment analysis (MZ세대에 대한 대중감성 연구: 소셜미디어(SNS) 감성 분석을 통해)

  • Myung-suk Ann
    • The Journal of the Convergence on Culture Technology / v.9 no.1 / pp.19-26 / 2023
  • This study examined public sentiment toward the 'MZ generation' using social media big data sentiment analysis. For the analysis, SNS text from consumer accounts was examined, and positive and negative emotional factors were presented by distinguishing sentiment held outside the MZ generation from the emotions of the MZ generation itself. In conclusion, positive emotions of liking and interest toward the 'MZ generation' accounted for 72.1%, higher than the negative share of 27.9%. Among positive sentiments, the older generation showed 'a favorable feeling for the individuality and dignifiedness of the MZ generation' and 'interest in the MZ generation with new values'. In contrast, the MZ generation itself had a favorable feeling for 'the fact that they are a generation of their own boldness, youthfulness and individuality' and 'small growthism'. Negative sentiment outside the MZ generation was found to be 'A concern about the marriage avoidance, employment difficulties, debt investment, and resignation trends of the MZ generation', 'Hate the MZ generation who treats Kkondae' and 'Difficult to talk to the MZ generation'. On the other hand, the negative emotions felt by the MZ generation itself were 'Rejection of generalization', 'Rejection of generation and gender conflicts', 'Rejection of competition worse than the older generation', 'Relative failure of the rich era', and 'Sadness to live in a predicted climate disaster'. Therefore, the older generation should look at the MZ generation not as a generalized group but as individuals, and should alleviate conflicts through intergenerational understanding and empathy. There is also a need for community-level consideration to solve generational conflicts, gender conflicts, and environmental problems.

Exploring ESG Activities Using Text Analysis of ESG Reports -A Case of Chinese Listed Manufacturing Companies- (ESG 보고서의 텍스트 분석을 이용한 ESG 활동 탐색 -중국 상장 제조 기업을 대상으로-)

  • Wung Chul Jin;Seung Ik Baek;Yu Feng Sun;Xiang Dan Jin
    • Journal of Service Research and Studies / v.14 no.2 / pp.18-36 / 2024
  • As interest in ESG has increased, it has become easy to find papers that empirically show that a company's ESG activities have a positive impact on its performance. However, research on which ESG activities companies should actually engage in is relatively lacking. Accordingly, this study systematically classifies companies' ESG activities and seeks to provide insight to companies planning new ESG activities. It analyzes how Chinese manufacturing companies perform ESG activities based on their dynamic capabilities in the global economy and how their activities differ. The data consisted of the ESG annual reports of 151 Chinese manufacturing companies listed on the Shanghai and Shenzhen Stock Exchanges and the ESG indicators of China Securities Index Company (CSI). This study focused on three research questions. The first is whether ESG activities differ between companies with high ESG scores (TOP-25) and companies with low ESG scores (BOT-25), and the second is whether ESG activities changed over a 10-year period (2010-2019), focusing only on companies with high ESG scores. The results showed a significant difference in ESG activities between high and low ESG scorers, while tracking the year-to-year activities of the TOP-25 companies did not show any difference. For the third question, social network analysis was conducted on the E/S/G keywords. Using a co-occurrence matrix, we visualized the ESG activities of companies in a four-quadrant graph and set the direction for ESG activities based on it.
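The co-occurrence matrix mentioned in this abstract can be illustrated with a minimal sketch. It assumes each report has already been reduced to a list of keywords; the function name `cooccurrence` and the sample keywords are hypothetical and are not taken from the study.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(docs):
    """Count how often each unordered keyword pair appears in the same document."""
    counts = Counter()
    for keywords in docs:
        # Deduplicate and sort so that each pair has a single canonical key.
        for a, b in combinations(sorted(set(keywords)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical keyword lists, one per report (not data from the study).
reports = [
    ["emission", "energy", "safety"],
    ["energy", "safety", "governance"],
    ["emission", "energy"],
]
pairs = cooccurrence(reports)
```

Pairs with high counts would form the edges of a keyword network; how the study weighted or thresholded those edges is not stated in the abstract.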

A Study on Analyzing Sentiments on Movie Reviews by Multi-Level Sentiment Classifier (영화 리뷰 감성분석을 위한 텍스트 마이닝 기반 감성 분류기 구축)

  • Kim, Yuyoung;Song, Min
    • Journal of Intelligence and Information Systems / v.22 no.3 / pp.71-89 / 2016
  • Sentiment analysis is used to identify emotions or sentiments embedded in user-generated data such as customer reviews from blogs, social network services, and so on. Various research fields such as computer science and business management can take advantage of it to analyze customer-generated opinions. Previous studies have treated the star rating of a review as equivalent to the sentiment embedded in its text. However, the star rating does not always correspond to the sentiment polarity, and this assumption limits the accuracy of those studies. To solve this issue, the present study uses a supervised sentiment classification model to measure sentiment polarity more accurately. This study aims to propose an advanced sentiment classifier and to discover the correlation between movie reviews and box-office success. The advanced sentiment classifier is based on two supervised machine learning techniques, Support Vector Machines (SVM) and a Feedforward Neural Network (FNN). The sentiment scores of the movie reviews are measured by the sentiment classifier, and the statistical correlation between movie reviews and box-office success is analyzed. Movie reviews were collected along with their star ratings. The dataset consists of 1,258,538 reviews of 175 films gathered from the Naver Movie website (movie.naver.com). The results show that the proposed sentiment classifier outperforms a Naive Bayes (NB) classifier, with accuracy about 6% higher than NB. Furthermore, the results indicate a positive correlation between the star rating and the number of audiences, which can be regarded as the box-office success of a movie. The study also shows a mild, positive correlation between the sentiment scores estimated by the classifier and the number of audiences. To verify the applicability of the sentiment scores, an independent-samples t-test was conducted. For this, the movies were divided into two groups using the average of the sentiment scores. The two groups differ significantly in their star ratings.
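The verification step at the end of this abstract (split the movies at the mean sentiment score, then compare star ratings between the two groups) might look roughly like the sketch below. The abstract does not say which t-test variant was used, so Welch's statistic is an assumption here. The helper names and all numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def split_by_mean(scores, ratings):
    """Split ratings into two groups by whether the paired score reaches the mean score."""
    cutoff = mean(scores)
    high = [r for s, r in zip(scores, ratings) if s >= cutoff]
    low = [r for s, r in zip(scores, ratings) if s < cutoff]
    return high, low


def welch_t(a, b):
    """Welch's t statistic for two independent samples (assumed variant)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


# Hypothetical per-movie sentiment scores and star ratings (not data from the study).
sentiment = [0.2, 0.4, 0.5, 0.7, 0.8, 0.9]
stars = [5.0, 6.0, 6.5, 8.0, 8.5, 9.0]

high, low = split_by_mean(sentiment, stars)
t_stat = welch_t(high, low)
```

A large positive `t_stat` would indicate that movies with above-average sentiment scores also tend to have higher star ratings; a p-value would additionally require the t distribution, which the standard library does not provide.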

A study on the classification of research topics based on COVID-19 academic research using Topic modeling (토픽모델링을 활용한 COVID-19 학술 연구 기반 연구 주제 분류에 관한 연구)

  • Yoo, So-yeon;Lim, Gyoo-gun
    • Journal of Intelligence and Information Systems / v.28 no.1 / pp.155-174 / 2022
  • From January 2020 to October 2021, more than 500,000 academic studies related to COVID-19 (Coronavirus-2, a fatal respiratory syndrome) were published. The rapid increase in the number of COVID-19 papers places time and technical constraints on healthcare professionals and policy makers who need to find important research quickly. Therefore, in this study, we propose a method of extracting useful information from the text of an extensive body of literature using the LDA and Word2vec algorithms. Papers related to the search keywords were extracted from the COVID-19 papers, and detailed topics were identified. The data came from the CORD-19 data set on Kaggle, a free academic resource prepared by major research groups and the White House to respond to the COVID-19 pandemic and updated weekly. The research method has two main parts. First, 41,062 articles were collected through data filtering and pre-processing of the abstracts of 47,110 academic papers including full text. For this purpose, the number of COVID-19 publications by year was analyzed through exploratory data analysis using a Python program, and the top 10 journals under active research were identified. The LDA and Word2vec algorithms were used to derive research topics related to COVID-19, and after analyzing related words, similarity was measured. Second, papers containing 'vaccine' and 'treatment' were extracted from among the topics derived from all papers, yielding a total of 4,555 papers related to 'vaccine' and 5,971 papers related to 'treatment'. For each collected paper, detailed topics were analyzed using the LDA and Word2vec algorithms, and a clustering method through PCA dimension reduction was applied to visualize groups of papers with similar themes using the t-SNE algorithm. A noteworthy result of this study is that topics that were not derived from topic modeling over all COVID-19 papers were found in the topic modeling results for each research topic. For example, topic modeling of the papers related to 'vaccine' extracted a new topic, Topic 05 'neutralizing antibodies'. A neutralizing antibody is an antibody that protects cells from infection when a virus enters the body, and is said to play an important role in the production of therapeutic agents and vaccine development. In addition, extracting topics from the papers related to 'treatment' revealed a new topic, Topic 05 'cytokine'. A cytokine storm occurs when the body's immune cells attack normal cells instead of defending against attacks. Hidden topics that could not be found across the entire body of papers were thus uncovered by classifying the papers according to keywords and performing topic modeling to find detailed topics. In this study, we proposed a method of extracting topics from a large amount of literature using the LDA algorithm and extracting similar words using Skip-gram, the Word2vec model that predicts the surrounding words from a central word. By combining the LDA and Word2vec models, we sought better performance by identifying both the relationship between documents and LDA topics and the relationships captured by Word2vec. In addition, as a clustering method through PCA dimension reduction, we presented a way to classify documents intuitively by using the t-SNE technique to group documents with similar themes into a structured organization. In a situation where the efforts of many researchers to overcome COVID-19 cannot keep up with the rapid publication of related academic papers, we hope this method will save healthcare professionals and policy makers precious time and effort and help them rapidly gain new insights. It is also expected to be used as basic data for researchers to explore new research directions.
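To make the Skip-gram idea in this abstract concrete, the sketch below generates the (center word, context word) training pairs that a Skip-gram model learns from. It shows only the pair-generation step, not Word2vec training itself. The function name, window size, and sample tokens are assumptions for illustration.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical tokenized sentence (not text from the CORD-19 data set).
tokens = ["neutralizing", "antibodies", "block", "viral", "entry"]
pairs = skipgram_pairs(tokens, window=1)
```

With `window=1`, each word is paired only with its immediate neighbors; a wider window would pull in more distant context at the cost of noisier pairs.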

    Clickstream Big Data Mining for Demographics based Digital Marketing (인구통계특성 기반 디지털 마케팅을 위한 클릭스트림 빅데이터 마이닝)

    • Park, Jiae;Cho, Yoonho
      • Journal of Intelligence and Information Systems / v.22 no.3 / pp.143-163 / 2016
    • The demographics of Internet users are the most basic and important sources for target marketing or personalized advertisements on digital marketing channels, which include email, mobile, and social media. However, it has gradually become difficult to collect the demographics of Internet users because their activities are anonymous in many cases. Although a marketing department can obtain demographics through online or offline surveys, these approaches are very expensive, take a long time, and are likely to include false statements. Clickstream data is the record an Internet user leaves behind while visiting websites. As the user clicks anywhere on a webpage, the activity is logged in semi-structured website log files. Such data allows us to see what pages users visited, how long they stayed there, how often they visited, when they usually visited, which sites they prefer, what keywords they used to find the site, whether they made a purchase, and so forth. For this reason, some researchers have tried to infer the demographics of Internet users from their clickstream data. They derived various independent variables likely to be correlated with the demographics. The variables include search keywords, frequency and intensity by time, day, and month, the variety of websites visited, text information from the web pages visited, etc. The demographic attributes to predict also vary from paper to paper, and cover gender, age, job, location, income, education, marital status, and presence of children. A variety of data mining methods, such as LSA, SVM, decision tree, neural network, logistic regression, and k-nearest neighbors, were used to build prediction models. However, this research has not yet identified which data mining method is appropriate for predicting each demographic variable. Moreover, the independent variables studied so far need to be reviewed, combined as needed, and evaluated to build the best prediction model. The objective of this study is to choose the clickstream attributes most likely to be correlated with the demographics from the results of previous research, and then to identify which data mining method is best suited to predicting each demographic attribute. Among the demographic attributes, this paper focuses on predicting gender, age, marital status, residence, and job. From the results of previous research, 64 clickstream attributes are applied to predict the demographic attributes. The overall process of predictive model building is composed of 4 steps. In the first step, we create user profiles that include 64 clickstream attributes and 5 demographic attributes. The second step performs dimension reduction on the clickstream variables to address the curse of dimensionality and the overfitting problem. We use three approaches, based on decision tree, PCA, and cluster analysis. In the third step, we build alternative predictive models for each demographic variable, using SVM, neural network, and logistic regression. The last step evaluates the alternative models in terms of accuracy and selects the best model. For the experiments, we used clickstream data covering 5 demographics and 16,962,705 online activities for 5,000 Internet users. IBM SPSS Modeler 17.0 was used for the prediction process, and 5-fold cross validation was conducted to enhance the reliability of the experiments. The experimental results verify that there is a specific data mining method well suited to each demographic variable. For example, age prediction performs best with decision tree based dimension reduction and a neural network, whereas the prediction of gender and marital status is most accurate when SVM is applied without dimension reduction. We conclude that the online behaviors of Internet users, captured through clickstream data analysis, can be used effectively to predict their demographics and thus be applied to digital marketing.
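The 5-fold cross validation mentioned in this abstract can be sketched as an index split. The study ran it inside IBM SPSS Modeler; the plain-Python function below is only an illustration of the general technique. The name `k_fold_indices` and the sample size are hypothetical.

```python
def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k (train, test) index pairs without shuffling."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so fold sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds


# Hypothetical sample count (the study used 5,000 user profiles).
folds = k_fold_indices(12, k=5)
```

Each sample appears in exactly one test fold, so every candidate model is scored on data it was not trained on; in practice the indices would usually be shuffled first.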

    A Study on Automatic Classification Model of Documents Based on Korean Standard Industrial Classification (한국표준산업분류를 기준으로 한 문서의 자동 분류 모델에 관한 연구)

    • Lee, Jae-Seong;Jun, Seung-Pyo;Yoo, Hyoung Sun
      • Journal of Intelligence and Information Systems / v.24 no.3 / pp.221-241 / 2018
    • As we enter the knowledge society, the importance of information as a new form of capital is being emphasized. The importance of information classification is also increasing for efficient management of digital information, which is being produced exponentially. In this study, we tried to automatically classify and provide tailored information that can help companies make decisions on technology commercialization. To that end, we propose a method to classify information based on the Korea Standard Industry Classification (KSIC), which indicates the business characteristics of enterprises. The classification of information or documents has been largely based on machine learning, but there is not enough training data categorized on the basis of KSIC. Therefore, this study applied a method of calculating similarity between documents. Specifically, a method and a model for presenting the most appropriate KSIC code are proposed by collecting the explanatory texts of each KSIC code and calculating their similarity with the document to be classified using the vector space model. IPC data were collected and classified by KSIC. The methodology was then verified by comparing the results with the KSIC-IPC concordance table provided by the Korean Intellectual Property Office. The verification showed the highest agreement when the LT method, which is a kind of TF-IDF calculation formula, was applied. In that case, the top-ranked KSIC code matched in 53% of cases, and the cumulative match rate within the top five ranks was 76%. This confirms that the technology, industry, and market information that SMEs need can be classified by KSIC more quantitatively and objectively. In addition, the methods and results provided in this study can be used as basic data to support the qualitative judgment of experts in creating a linkage table between heterogeneous classification systems.
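The similarity-based classification described in this abstract can be sketched with a generic vector space model: weight terms by TF-IDF, then rank each code description by cosine similarity to the document. The weighting below is a common smoothed TF-IDF, not the LT formula the study found best. The code labels and texts are hypothetical placeholders rather than real KSIC entries.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF (an assumed variant) keeps very common terms from vanishing.
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


# Hypothetical code descriptions and target document (not real KSIC or IPC text).
codes = {
    "CODE_A": "manufacture of electronic components and computers".split(),
    "CODE_B": "manufacture of food products".split(),
    "CODE_C": "computer programming and software consultancy".split(),
}
document = "semiconductor electronic components manufacture".split()

names = list(codes)
vectors = tfidf_vectors([codes[name] for name in names] + [document])
code_vectors = dict(zip(names, vectors[:-1]))
doc_vector = vectors[-1]

# Rank codes from most to least similar to the document.
ranking = sorted(names, key=lambda name: cosine(doc_vector, code_vectors[name]), reverse=True)
```

Taking the first entry of `ranking` corresponds to a rank-1 match, and checking whether the correct code appears in the first five entries corresponds to the cumulative top-five match rate reported above.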

    Explicating Personal Health Informatics Experience (퍼스널 헬스케어 디바이스 사용자 경험 연구)

    • Shin, Dong-Hee;Cho, Hoyoun
      • The Journal of the Korea Contents Association / v.17 no.1 / pp.550-566 / 2017
    • Recent advances in wearable devices and the quantified-self movement have increased the number of personal informatics applications, which may raise concerns for the health industry and users. In this light, the goal of this study is to identify more effective ways of designing and evaluating personal informatics applications for self-tracking and delivering health information to users. To this end, this study conducted a real-world study of the processes by which users can assess, be aware of, and self-reflect on their data and behavior activity. In doing so, this study aims to determine the psychological effects of forms of health feedback (comparative vs. non-comparative) and presentation modes (text vs. image) on users' tendencies toward health conservation. Results from a between-subjects experiment revealed that health information in a comparative and textual format was more effective in encouraging health conservation in participants than identical information presented in a non-comparative and image format. In addition, participants' level of health consciousness emerged as a significant predictor. Through this analysis of quantitative data and inferences, this study makes a number of contributions to user affordance research, to the methodology of health informatics studies, and to the design of personal informatics applications that support users' behavior change in various contexts.

    Development and Effectiveness of a Smoking Preventive Program for Elementary Students (초등학생을 위한 흡연예방 프로그램의 개발 및 효과에 관한 연구)

    • Lee, Eun-Hye;Kim, Il-Ok
      • The Journal of Korean Academic Society of Nursing Education / v.9 no.2 / pp.264-275 / 2003
    • The purpose of this study was to develop a smoking prevention education program for elementary students and evaluate its effectiveness. This was a quasi-experimental study using a nonequivalent control group pretest-posttest design. The subjects were 62 elementary school students (31 in each group) attending two elementary schools in different districts. The subjects were matched by grade and were similar in anti-smoking educational background, residence, and family income level. The instrument consisted of 18 criterion-referenced test items, modeled on Dick & Carey and developed by the researchers, for evaluating the subjects' knowledge of and attitude toward smoking. A pretest was administered a week before treatment. The program given to the experimental group was composed of texts explaining the poisonous substances in tobacco, the social and cultural harmfulness of smoking and its harm to the body and psychology, indirect smoking, smoking by pregnant women, motives for smoking, and skills for refusing to smoke; to aid the subjects' understanding and improve learning outcomes, it also included pictures, role play, discussion, text delivered through computer-based multimedia, hidden-picture puzzles, crossword puzzles, and finally compensation. Data were collected for 50 days from mid-September to the end of October 2000 and comprised a formative evaluation, a pretest, and a summative evaluation in 2 sessions. The collected data were analysed by t-test, paired t-test, and repeated measures ANOVA using the SAS program. The findings are summarized as follows. 1. There was a significant difference in knowledge between the experimental group (after 1 wk t=10.4680, p=.0001; after 4 wks t=9.310, p=.0001) and the control group (after 1 wk t=0.0420, p=.9669; after 4 wks t=-0.378, p=.7079) across the results 1 and 4 weeks after education in the summative evaluation (F=27.45, P=.0001). 2. There was no statistically significant difference in attitude between the experimental group (after 1 wk t=1.2292, p=0.2286; after 4 wks t=1.330, p=0.1935) and the control group (after 1 wk t=0.1819, p=0.8569; after 4 wks t=0.2970, p=0.7685) across the results 1 and 4 weeks after education in the summative evaluation (F=0.71, P=0.494). To sum up, the summative evaluation indicates that the program was effective for the children's 'knowledge acquisition' about the harmfulness of smoking. On the other hand, because the children already had a sound attitude toward smoking, the evaluation of attitude showed no significant difference between the control group and the experimental group, although there was a partially significant difference.

    Influential Factors on Text Readability of Self-guided Interpretive Signs (자기안내식(自己案內式) 해설판(解說板) 글자의 가독성(可讀性)에 영향(影響)을 미치는 요인(要因)들)

    • Kim, Sang-Oh
      • Journal of Korean Society of Forest Science / v.94 no.6 / pp.362-369 / 2005
    • Readability, an indicator of how easily text can be read, is an important element that determines the communicative effectiveness of self-guided signs. This study examined how the letter design elements of self-guided signs influence readability, to provide basic information for more effective sign designs. Data were collected from August to November of 2003 at a self-guided trail of Naejangsan National Park, Korea. A total of 375 subjects participated in the questionnaire survey, and 94.7% of them were used for data analysis. Among a total of 19 attributes, five attributes (number of letters, number of type styles, ratio of picture area on the signs, space between letters, and type size) influenced readability. These five attributes explained 50.0% of the variation in readability. The number of letters was the most influential attribute, followed by the number of type styles, ratio of picture area on the signs, space between letters, and type size. The effectiveness of signs may be efficiently increased by managing these five major attributes with more care.

    The Correlation between Social Media and the Behaviors of the Supreme Court in Korea (소셜미디어와 대법원 판결의 상관 관계에 대한 분석)

    • Heo, Junhong;Seo, Yeeun;Lee, Seoyeong;Lee, Sang-Yong Tom
      • Knowledge Management Research / v.22 no.3 / pp.31-53 / 2021
    • As a communication channel for individuals, social media is affecting various areas such as business, economy, politics, and society. One of the less-studied areas is the law. Therefore, this study collected various information from social media and analyzed its impact on legal decisions, especially the Supreme Court decisions in Korea. This study was conducted by compiling information from Internet news articles and public responses. We found that as negative reactions from the public increased, the time until the Supreme Court made its final decision became shorter. However, we were not able to find a significant relationship between social media reactions and either dismissal of appeal or annulment. Our study contributes to information systems and knowledge management research in the sense that social analytics is applied to the area of legal decisions, instead of using a conventional qualitative study methodology. Our study is also meaningful to practitioners because big data analytics can be applied to the field of law by creating a new database for emerging legal technology. Finally, law makers can think of a better way to standardize the legal decision process to minimize adverse effects from social media.

