• Title/Summary/Keyword: Information Category

Participation Level in Online Knowledge Sharing: Behavioral Approach on Wikipedia (온라인 지식공유의 참여정도: 위키피디아에 대한 행태적 접근)

  • Park, Hyun Jung;Lee, Hong Joo;Kim, Jong Woo
    • Journal of Intelligence and Information Systems / v.19 no.4 / pp.97-121 / 2013
  • With the growing importance of knowledge for sustainable competitive advantage and innovation in a volatile environment, much research on knowledge sharing has been conducted. However, previous research has mostly relied on questionnaire surveys, which are subject to the inherent perceptual errors of respondents. The current research examines how primary participant behaviors relate to the participation level in knowledge sharing, drawing on online user behaviors on Wikipedia, a representative community for online knowledge collaboration. Without users' participation in knowledge sharing, knowledge collaboration for creating knowledge cannot succeed. The editing patterns of Wikipedia users are diverse, however, resulting in different revisiting periods for the same number of edits, and thus varying results of shared knowledge. Therefore, we examined the participation level of knowledge sharing from two different angles: the number of edits and the revisiting period. The behavioral dimensions affecting the level of participation in knowledge sharing include article talk for public discussion, user talk for private messaging, and community registration, all of which are observable on the Wiki platform. Public discussion takes place on article talk pages set up for exchanging ideas about each article topic. An article talk page is often divided into several sections, each of which mainly addresses a specific type of issue raised during article development. Opinions ranging from relatively trivial matters, such as what text, links, or images should be added or removed and how they should be restructured, to profound professional insights are shared, negotiated, and improved over the course of discussion. Wikipedia also provides personal user talk pages as a private messaging tool.
On these pages, diverse personal messages such as casual greetings, stories about activities on Wikipedia, and ordinary affairs of life are exchanged. Anyone who wants to communicate with another person visits that person's user talk page and leaves a message. Wikipedia articles are assessed according to seven quality grades, of which the featured article level is the highest. The dataset includes participants' behavioral data related to 2,978 articles that have reached the featured article level, with the editing histories of the articles, their article talk histories, and user talk histories extracted from user talk pages for each article. The time period for analysis runs from the initiation of each article until its promotion to the featured article level. The number of edits represents the total number of participations in the editing of an article, and the revisiting period is the time difference between the first and last edits. First, the participation levels of each user category, classified according to the behavioral dimensions, were analyzed and compared. Then, robust regressions were conducted on the relationships between the independent variables reflecting the degree of each behavioral characteristic and the dependent variable representing the participation level. In particular, by adopting a motivational theory suited to the online environment in setting up the research hypotheses, this work suggests a theoretical framework for the participation level of online knowledge sharing. This work reached the following practical behavioral results, in addition to some theoretical implications. First, both public discussion and private messaging positively affect the participation level in knowledge sharing. Second, public discussion exerts greater influence than private messaging on the participation level.
Third, a synergy effect of public discussion and private messaging on the number of edits was found, whereas a fairly weak negative interaction effect between them on the revisiting period was observed. Fourth, community registration has a significant impact on the revisiting period but an insignificant one on the number of edits. Fifth, regarding the relations generated through private messaging, the frequency or depth of a relation is shown to be more critical to the participation level than its scope.
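
The two participation measures defined in the abstract above (number of edits, and revisiting period as the time between first and last edit) can be illustrated with a minimal sketch. It assumes the measures are computed per user for a single article and that the edit history is available as `(user, timestamp)` pairs; the function name `participation_levels` and the sample log are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from datetime import datetime


def participation_levels(edit_log):
    """Return {user: (number_of_edits, revisiting_period)} for one article.

    edit_log is an iterable of (user, timestamp) pairs. The revisiting
    period is the time between a user's first and last edit.
    """
    stamps = defaultdict(list)
    for user, ts in edit_log:
        stamps[user].append(ts)
    return {
        user: (len(ts_list), max(ts_list) - min(ts_list))
        for user, ts_list in stamps.items()
    }


# Hypothetical edit history for a single article (illustration only).
log = [
    ("alice", datetime(2012, 1, 1)),
    ("bob", datetime(2012, 1, 3)),
    ("alice", datetime(2012, 1, 11)),
    ("alice", datetime(2012, 2, 1)),
]
levels = participation_levels(log)
for user, (n_edits, period) in sorted(levels.items()):
    print(user, n_edits, period.days)
```

A user with a single edit gets a revisiting period of zero, which is one reason the two measures can diverge for the same user.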

A Case Study on Universal Dependency Tagsets (다국어 범용 의존관계 주석체계(Universal Dependencies) 적용 연구 - 한국어와 일본어의 비교를 중심으로)

  • Han, Jiyoon;Lee, Jin;Lee, Chanyoung;Kim, Hansaem
    • Cross-Cultural Studies / v.53 / pp.163-192 / 2018
  • The purpose of this paper was to examine Universal Dependencies (UD) application cases for Korean and Japanese, which have similar morphological characteristics. In addition, UD application and improvement methods for Korean were examined through comparative analysis. Korean and Japanese have highly developed morphology due to their agglutinative characteristics. Therefore, there are many difficulties in applying UD, which is built around inflectional languages such as English. We examined and discussed the application of UPOS and DEPREL as components of UD. For UPOS, we looked at category problems related to predicates, such as AUX, ADJ, and VERB, and examined how to handle units. In relation to the DEPREL annotation system, we discussed how to reflect syntactic problems arising from the basic annotation unit of syntax tags. We investigated problems of case and aux arising from how the head position is set in Korean and Japanese, both head-final languages. We also investigated problems in annotating parallel structures and in setting the basic unit of annotation. Among the various relation annotation tags, case and aux are discussed because they show the most noticeable difference in distribution when annotation tag application patterns are compared with Korean. The case relation is related to particles in both Korean and Japanese. Aux covers secondary verbs in Korean and auxiliary verbs in Japanese. An examination of specific annotation patterns showed that Japanese aux is assigned not only to auxiliary clauses but also to auxiliary elements that add grammatical meaning to the verb and to forms corresponding to Korean endings. In Japanese UD annotation, the basic unit of morphological analysis is defined as the basic unit of syntactic annotation. Thus, it is necessary to consider how the morphological analysis unit can be used as information for dependency annotation in Korean.

Multi-Category Sentiment Analysis for Social Opinion Related to Artificial Intelligence on Social Media (소셜 미디어 상에서의 인공지능 관련 사회적 여론에 대한 다 범주 감성 분석)

  • Lee, Sang Won;Choi, Chang Wook;Kim, Dong Sung;Yeo, Woon Young;Kim, Jong Woo
    • Journal of Intelligence and Information Systems / v.24 no.4 / pp.51-66 / 2018
  • As AI (Artificial Intelligence) technologies have evolved swiftly, many products and services are under development in various fields to improve user experience. With this technological advance, the negative effects of AI technologies have been actively discussed, while positive expectations for them exist at the same time. For instance, many social issues such as the trolley dilemma and system security are being debated, whereas autonomous vehicles based on artificial intelligence have drawn attention in terms of increased stability. Therefore, major social issues around artificial intelligence need to be checked and analysed for its development and societal acceptance. In this paper, multi-category sentiment analysis is conducted on online public opinion about artificial intelligence after identifying the trending topics related to artificial intelligence over the two years from January 2016 to December 2017, a period that includes the match between Lee Sedol and AlphaGo. Online news articles, news headlines, and news comments were crawled from the largest web portal in South Korea. Considering the importance of trending topics, online public opinion was analysed by topic into seven sentiment categories (anger, dislike, fear, happiness, neutrality, sadness, and surprise) rather than into simple positive or negative sentiment. As a result, it was found that the top sentiment is "happiness" for most events, yet the sentiments for each keyword differ. In addition, when the research period was divided into four periods (the first half of 2016, the second half of 2016, the first half of 2017, and the second half of 2017), it was confirmed that the sentiment of 'anger' decreases over time. Based on the results of this analysis, it is possible to grasp the various topics and trends currently discussed regarding artificial intelligence, and the results can be used to prepare countermeasures.
We hope to measure public opinion more precisely in the future by integrating the empathy level of news comments.
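
The half-year breakdown of the seven sentiment categories described in the abstract above can be sketched as follows. This is only an illustration: it assumes each comment has already been assigned one of the seven labels by some classifier (the classifier itself is not shown), and the names `half_year`, `sentiment_shares`, and the sample comments are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date

CATEGORIES = ("anger", "dislike", "fear", "happiness",
              "neutrality", "sadness", "surprise")


def half_year(d):
    """Map a date to a label such as '2016-H1'."""
    return f"{d.year}-H{1 if d.month <= 6 else 2}"


def sentiment_shares(labeled_comments):
    """Return {period: {category: share}} from (date, category) pairs."""
    counts = defaultdict(Counter)
    for d, category in labeled_comments:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        counts[half_year(d)][category] += 1
    shares = {}
    for period, counter in counts.items():
        total = sum(counter.values())
        shares[period] = {c: counter[c] / total for c in CATEGORIES}
    return shares


# Hypothetical labels (illustration only, not data from the paper).
comments = [
    (date(2016, 3, 10), "anger"),
    (date(2016, 3, 12), "happiness"),
    (date(2016, 9, 1), "happiness"),
    (date(2017, 2, 5), "surprise"),
]
shares = sentiment_shares(comments)
for period in sorted(shares):
    print(period, shares[period]["anger"])
```

Comparing the `anger` share across the sorted periods is the kind of check that would reveal a trend such as the decline reported in the abstract.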

A Study on the Characteristics of Intentional Self-Poisoning Patients : Comparison between Non-Prescription and Prescription Drugs (일반의약품과 전문의약품 의도적 음독 자살 시도자 특성 분석 연구)

  • Cho, Eulah;Cho, Ji Hyun;Jho, Kyeng Hyeng;Sim, Hyun-Bo
    • Korean Journal of Psychosomatic Medicine / v.28 no.2 / pp.116-125 / 2020
  • Objectives : Self-poisoning is the leading cause of visits to emergency departments after a suicide attempt. This study aimed to compare patient characteristics according to the category of drugs ingested by patients who attempted suicide. Methods : Medical charts were retrospectively reviewed for all patients who visited the emergency center at Seoul Medical Center due to intentional self-poisoning from April 2011 to July 2019. We investigated the subtype and quantity of the ingested drug, how it was obtained, suicidal history, and psychiatric history, as well as sociodemographic information. Variables were compared between the prescription drug (PD) and non-prescription drug (NPD) poisoning groups. Results : The mean age of the NPD poisoning group was significantly lower than that of the PD poisoning group. The proportions of patients enrolled in national health insurance and living with spouses were significantly higher in the NPD poisoning group. Compared to the NPD poisoning group, the PD poisoning group had a higher incidence of mental illnesses and underlying diseases and a higher ratio of involuntary visits to the emergency department. Among the prescription drugs, the benzodiazepine poisoning group had a higher rate of self-prescription than the non-poisoning group, while the zolpidem poisoning group had a higher rate of using someone else's prescription than other drug groups. Each single-drug poisoning group (benzodiazepine, zolpidem, and antidepressant single-agent) had a higher rate of no mental illness than the corresponding mixed-poisoning group. Conclusions : Guidelines for regulating non-prescription drugs are needed as a matter of suicide prevention. Also, this study suggests that clinicians need to be careful when issuing prescriptions and should consider suicidal risk according to patients' characteristics, duration of follow-up, and type of drug packaging.

Understanding Public Opinion by Analyzing Twitter Posts Related to Real Estate Policy (부동산 정책 관련 트위터 게시물 분석을 통한 대중 여론 이해)

  • Kim, Kyuli;Oh, Chanhee;Zhu, Yongjun
    • Journal of the Korean Society for Library and Information Science / v.56 no.3 / pp.47-72 / 2022
  • This study aims to understand trends in topics related to real estate policies and the public's emotional opinions on those policies. Two keywords related to real estate policies, "real estate policy" and "real estate measure", were used to collect tweets created from February 25, 2008 to August 31, 2021. A total of 91,740 tweets were collected, and sentiment analysis and dynamic topic modeling were applied to the 18,925 tweets that remained after preprocessing and categorization into supply, real estate tax, interest rate, and population variance. The keywords of each category are as follows: supply (rental housing, greenbelt, newlyweds, homeless, supply, reconstruction, sale), real estate tax (comprehensive real estate tax, acquisition tax, holding tax, multiple homeowners, speculation), interest rate (interest rate), and population variance (Sejong, new city). The results of the sentiment analysis showed that one person posted on average one or two positive tweets, whereas for negative and neutral tweets, one person posted two or three. In addition, we found that some people hold positive as well as negative and neutral opinions towards real estate policies. In the dynamic topic modeling analysis, negative reactions to real estate speculative forces and unearned income were identified as major negative topics, while expectations of increased housing supply and benefits for homeless people who purchase houses were identified as positive topics.
Unlike previous studies, which focused on changes in and evaluations of specific real estate policies, this study has academic significance in that it collected posts from Twitter, one of the social media platforms, applied sentiment analysis and dynamic topic modeling, and identified potential topics and trends of real estate policy over time. The results of the study can help create new policies that take public opinion on real estate policies into consideration.

The Effect of Users' Personality on Emotional and Cognitive Evaluation in UCC Web Site Usage (UCC(user-created-contents) 웹 사이트에서 사용자의 인성이 감정적, 인지적 평가와 UCC 활용에 미치는 영향)

  • Moon, Yun-Ji;Kang, So-Ra;Kim, Woo-Gon
    • Asia Pacific Journal of Information Systems / v.20 no.3 / pp.167-190 / 2010
  • The research conducted here focuses on factors that affect the behavior of UCC (User Created Content) website users, other than the user's rational recognition of how useful a UCC website can be. Most discussions in the existing literature on information systems have focused on users' evaluation of how a UCC website can help to attain the users' own goals. However, there are other factors, and this research pays attention to an individual's 'personality,' which is stable and biological in nature. Specifically, I have noted here that 'extroversion' and 'neuroticism,' the two personality factors common to the most representative models, Eysenck's 'EPQ Model' and the 'Big Five Model,' are the two personality factors that affect a site's 'usefulness,' by which I mean how useful the user considers the website and its content. How useful the user considers a site has been regarded as the antecedent factor that influences the adoption of information systems in existing MIS (Management Information System) research. Secondly, as using or creating a UCC website does not guarantee the user's or the creator's extrinsic motivation, unlike when using an information system within an organization, there is a greater likelihood that the increase in users' activities in relation to a UCC website is motivated by emotional factors rather than rational factors. Thus, I have decided to include the relationship between an individual's personality and what they find pleasurable in the research model. Thirdly, according to the S-O-R Paradigm of Mehrabian and Russell, cognitive and emotional factors are affected by a stimulus, and these factors in turn affect an individual's respondent behavior. Therefore, this research assumes that the recognition of how useful the site and its content are, and the emotional pleasure they provide, will ultimately affect the behavior of UCC website users.
Finally, the relationship between the recognition of how useful a site is, how pleasurable it is to use, and UCC usage may differ depending on certain situational conditions. In other words, the relationship between the three factors may vary according to how much users are involved in the creation of the website content. Creation thus emerges as the keyword of UCC. I analyzed the above relationships through the moderating variable of the user's involvement in the creation of the site. The research results show the following. When it comes to the relationship between an individual's personality and what they find pleasurable, it is extroverted users who are more likely to feel pleasure when using a UCC website, as was expected in this research. This in turn leads to more active usage of the UCC website, because an extrovert likes to spend time on activities with other people, is sensitive to new experiences and stimuli, and thus actively responds to them. An extroverted person accepts new UCC activities as part of his/her social life, rather than getting away from this new UCC environment. This is represented by the term 'Folksonomy,' where users meet a variety of users from all over the world and encounter new types of content created by these users. However, neuroticism creates the opposite situation to that created by extroversion. The representative symptoms of neuroticism are instability, stress, and tension. These dispositions are more closely related to stress caused by a new environment than to curiosity or pleasure. Thus, neurotic persons feel uneasy and will eventually avoid situations where their own or others' daily lives are frequently exposed to the open web environment, which eventually gives them a negative attitude towards the web environment.
When it comes to an individual's personality and how useful a site is, the two personality factors of extroversion and neuroticism both have a positive relationship with the recognition of how useful the site and its content are. The positive, curious, and social dispositions of extroverted persons tend to make them consider the future usefulness and possibilities of a new type of information system, or website, based on their positive attitude, which has a significant influence on the recognition of how useful these UCC sites are. Neuroticism also favorably affects the perceived usefulness of a UCC website, through a different mechanism from that of extroversion. As neurotic persons tend to feel uneasy and have much doubt about a new type of information system, they actively explore its usefulness in order to relieve their uncomfortable feelings. In other words, neurotic persons seek out how useful a site can be in order to secure their own stable feelings. Meanwhile, extroverted persons explore how useful a site can be because of their positive attitude and curiosity. As much MIS research has revealed, the recognition of how useful a site can be and how pleasurable it can be to use has a significant effect on UCC activity. However, the relationship between these factors reveals different aspects depending on the user's involvement in creation. This factor of creation gauges the interest of users in the creation of UCC content. Involvement is a variable that shows the level of an individual's mental effort in creating UCC content. When users are highly involved in the creation process and make an enormous effort to create UCC content (classed as part of the high-involvement group), their own pleasure and recognition of how useful the site is have a significantly greater effect on the future usage of UCC content than is the case for users who sit back and just retrieve the UCC content created by others.
The cognitive and emotional response of those in the low-involvement group is unlikely to last long, even if they recognize that the content of a UCC website is pleasurable and useful to them. However, the high-involvement group tends to participate in the creation and the usage of UCC more favorably, connecting the experience with their own goals. In this respect, this research presents an answer to the question of why so many people are participating in the usage of UCC, the representative form of Web 2.0 that has drastically involved more and more people in the creation of UCC, even though they cannot gain any monetary or social compensation. Neither an information system nor a website can succeed unless it secures a certain level of user base. Moreover, it cannot be further developed when the reasons, or problems, for people's participation are not suitably explored, even if it has a certain user base. Thus, what is significant in this research is that it has studied users' respondent behavior based on an individual's innate personality, emotion, and cognitive interaction, unlike the existing research that has focused on 'compensation' to explain users' participation in UCC websites. There are also limitations in this research. Firstly, I divided an individual's personality into extroversion and neuroticism; however, there are many other personality factors, such as neuro-psychiatricism, which also need to be analyzed for their influence on UCC activities. Secondly, as UCC websites come in many types, such as multimedia, wikis, and podcasting, these types need to be included as sub-categories of UCC websites, and their relationship with personality, emotion, cognition, and behavior also needs to be analyzed.

The Impacts of Need for Cognitive Closure, Psychological Wellbeing, and Social Factors on Impulse Purchasing (인지폐합수요(认知闭合需要), 심리건강화사회인소대충동구매적영향(心理健康和社会因素对冲动购买的影响))

  • Lee, Myong-Han;Schellhase, Ralf;Koo, Dong-Mo;Lee, Mi-Jeong
    • Journal of Global Scholars of Marketing Science / v.19 no.4 / pp.44-56 / 2009
  • Impulse purchasing is defined as an immediate purchase with no pre-shopping intentions. Previous studies of impulse buying have focused primarily on factors linked to marketing mix variables, situational factors, and consumer demographics and traits. In previous studies, marketing mix variables such as product category, product type, and atmospheric factors including advertising, coupons, sales events, promotional stimuli at the point of sale, and media format have been used to evaluate product information. Some authors have also focused on situational factors surrounding the consumer. Factors such as the availability of credit card usage, time available, transportability of the products, and the presence and number of shopping companions were found to have a positive impact on impulse buying and/or impulse tendency. Research has also been conducted to evaluate the effects of individual characteristics such as the age, gender, and educational level of the consumer, as well as perceived crowding, stimulation, and the need for touch, on impulse purchasing. In summary, previous studies have found that all products can be purchased impulsively (Vohs and Faber, 2007), that situational factors affect and/or at least facilitate impulse purchasing behavior, and that various individual traits are closely linked to impulse buying. The recent introduction of new distribution channels such as home shopping channels, discount stores, and Internet stores that are open 24 hours a day increases the probability of impulse purchasing. However, previous literature has focused predominantly on situational and marketing variables and thus studies that consider critical consumer characteristics are still lacking. To fill this gap in the literature, the present study builds on this third tradition of research and focuses on individual trait variables, which have rarely been studied. 
More specifically, the current study investigates whether impulse buying tendency has a positive impact on impulse buying behavior, and evaluates how consumer characteristics such as the need for cognitive closure (NFCC), psychological wellbeing, and susceptibility to interpersonal influences affect the tendency of consumers towards impulse buying. The survey results reveal that while consumer affective impulsivity has a strong positive impact on impulse buying behavior, cognitive impulsivity has no impact on impulse buying behavior. Furthermore, affective impulse buying tendency is driven by sub-components of NFCC such as decisiveness and discomfort with ambiguity, psychological wellbeing constructs such as environmental control and purpose in life, and by normative and informational influences. In addition, cognitive impulse tendency is driven by sub-components of NFCC such as decisiveness, discomfort with ambiguity, and close-mindedness, and the psychological wellbeing constructs of environmental control, as well as normative and informational influences. The present study has significant theoretical implications. First, affective impulsivity has a strong impact on impulse purchase behavior. Previous studies based on affectivity and flow theories proposed that low to moderate levels of impulsivity are driven by reduced self-control or a failure of self-regulatory mechanisms. The present study confirms the above proposition. Second, the present study also contributes to the literature by confirming that impulse buying tendency can be viewed as a two-dimensional concept with both affective and cognitive dimensions, and illustrates that impulse purchase behavior is explained mainly by affective impulsivity, not by cognitive impulsivity. Third, the current study accommodates new constructs such as psychological wellbeing and NFCC as potential influencing factors in the research model, thereby contributing to the existing literature. 
Fourth, by incorporating multi-dimensional concepts such as psychological wellbeing and NFCC, more diverse aspects of consumer information processing can be evaluated. Fifth, the current study also extends the existing literature by confirming the two competing routes of normative and informational influences. Normative influence occurs when individuals conform to the expectations of others or seek to enhance their self-image, whereas informational influence occurs when individuals search for information from knowledgeable others or make inferences based upon observations of the behavior of others. The present study shows that these two competing routes of social influence can be attributed to different sources of influence power. The current study also has many practical implications. First, it suggests that people with affective impulsivity may be primary targets to whom companies should pay closer attention. Cultivating a more amenable and mood-elevating shopping environment will appeal to this segment. Second, the present results demonstrate that NFCC is closely related to the cognitive dimension of impulsivity. These people are driven by careless thoughts, not by feelings or excitement. Rational advertising at the point of purchase will attract these customers. Third, people susceptible to normative influences are another potential target market. Retailers and manufacturers could appeal to this segment by advertising their products and/or services as products that can be used to identify with or conform to the expectations of others in the aspiration group. However, retailers should avoid targeting people susceptible to informational influences as a segment market. These people are engaged in an extensive information search relevant to their purchase, and therefore more elaborate, long-term rational advertising messages, which can be internalized into these consumers' thought processes, will appeal to this segment.
The current findings should be interpreted with caution for several reasons. The study used a small convenience sample, and only investigated behavior in two dimensions. Accordingly, future studies should incorporate a sample with more diverse characteristics and measure different aspects of behavior. Future studies should also investigate personality traits closely related to affectivity theories. Trait variables such as sensory curiosity, interpersonal curiosity, and atmospheric responsiveness are interesting areas for future investigation.

Development of Yóukè Mining System with Yóukè's Travel Demand and Insight Based on Web Search Traffic Information (웹검색 트래픽 정보를 활용한 유커 인바운드 여행 수요 예측 모형 및 유커마이닝 시스템 개발)

  • Choi, Youji;Park, Do-Hyung
    • Journal of Intelligence and Information Systems / v.23 no.3 / pp.155-175 / 2017
  • As social data become into the spotlight, mainstream web search engines provide data indicate how many people searched specific keyword: Web Search Traffic data. Web search traffic information is collection of each crowd that search for specific keyword. In a various area, web search traffic can be used as one of useful variables that represent the attention of common users on specific interests. A lot of studies uses web search traffic data to nowcast or forecast social phenomenon such as epidemic prediction, consumer pattern analysis, product life cycle, financial invest modeling and so on. Also web search traffic data have begun to be applied to predict tourist inbound. Proper demand prediction is needed because tourism is high value-added industry as increasing employment and foreign exchange. Among those tourists, especially Chinese tourists: Youke is continuously growing nowadays, Youke has been largest tourist inbound of Korea tourism for many years and tourism profits per one Youke as well. It is important that research into proper demand prediction approaches of Youke in both public and private sector. Accurate tourism demands prediction is important to efficient decision making in a limited resource. This study suggests improved model that reflects latest issue of society by presented the attention from group of individual. Trip abroad is generally high-involvement activity so that potential tourists likely deep into searching for information about their own trip. Web search traffic data presents tourists' attention in the process of preparation their journey instantaneous and dynamic way. So that this study attempted select key words that potential Chinese tourists likely searched out internet. Baidu-Chinese biggest web search engine that share over 80%- provides users with accessing to web search traffic data. 
Qualitative interviews with potential tourists helped us understand information search behavior before a trip and identify the keywords for this study. The selected keywords were classified into three levels according to how directly they relate to "Korean Tourism". This classification helps identify which keywords can explain Youke inbound demand, from the closest category to the most distant. Web search traffic data for each keyword were gathered by a web crawler developed to collect search data from Baidu Index. Using the automatically gathered variables, a linear model was built by multiple regression analysis, which suits operational decision and policy making because the relationships among variables are easy to explain. After the regression models were built, a model composed of traditional variables was compared, in terms of significance and R squared, with a model that adds web search traffic variables to the traditional model. The final model was composed after comparing the performance of the models. The final regression model offers improved explanatory power as well as the advantages of real-time immediacy and convenience over the traditional model. Furthermore, this study demonstrates a system intuitively visualized for general use: the Youke Mining solution provides several functions for tourism-related decision making, including the embedded final regression model. The Youke Mining solution has an algorithm based on data science and a well-designed, simple interface. Finally, this research has three significant implications, in theoretical, practical, and policy terms. Theoretically, the Youke Mining system and the model in this research are a first step toward Youke inbound prediction using an interactive and instant variable: web search traffic information, which represents tourists' attention while they prepare their trips. Baidu holds more than 80% of the web search engine market. 
Practically, Baidu data can therefore represent, in real time, the attention of potential tourists preparing their trips. Finally, in terms of policy, the proposed Chinese tourist demand prediction model based on web search traffic can be used in tourism decision making for efficient resource management and for optimizing opportunities for successful policy.
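
The model comparison described in this abstract (a regression on traditional variables versus the same regression with a search traffic variable added, judged by R squared) can be sketched roughly as follows. This is only an illustration under stated assumptions: the variable names (`exchange_rate`, `search_index`, `arrivals`) and all numbers are hypothetical synthetic values, not data or variables from the study, and the significance tests the authors mention are omitted.

```python
def ols_fit(X, y):
    """Fit y = b0 + b1*x1 + ... by solving the normal equations (X'X) b = X'y."""
    rows = [[1.0] + list(r) for r in X]          # prepend the intercept column
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def r_squared(X, y, beta):
    """Share of the variance in y explained by the fitted linear model."""
    preds = [beta[0] + sum(b * x for b, x in zip(beta[1:], r)) for r in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - p) ** 2 for yi, p in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical synthetic data (NOT from the study).
exchange_rate = [1, 2, 3, 4, 5, 6, 7, 8]         # stand-in "traditional" variable
search_index = [2, 1, 4, 3, 6, 5, 8, 7]          # stand-in keyword search traffic
noise = [0.1, -0.2, 0.05, 0.15, -0.1, 0.2, -0.05, -0.15]
arrivals = [2 * a + 3 * b + e
            for a, b, e in zip(exchange_rate, search_index, noise)]

X_base = [[a] for a in exchange_rate]
X_aug = [[a, b] for a, b in zip(exchange_rate, search_index)]

r2_base = r_squared(X_base, arrivals, ols_fit(X_base, arrivals))
r2_aug = r_squared(X_aug, arrivals, ols_fit(X_aug, arrivals))
print(f"traditional model R^2: {r2_base:.4f}")
print(f"with search traffic R^2: {r2_aug:.4f}")
```

Because the second model nests the first, its R squared can never be lower on the same data, which is why a comparison like this is usually paired with significance tests or an adjusted R squared.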

Stock Price Prediction by Utilizing Category Neutral Terms: Text Mining Approach (카테고리 중립 단어 활용을 통한 주가 예측 방안: 텍스트 마이닝 활용)

  • Lee, Minsik;Lee, Hong Joo
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.2
    • /
    • pp.123-138
    • /
    • 2017
  • Since the stock market is driven by the expectations of traders, studies have been conducted to predict stock price movements by analyzing various sources of text data. Research has addressed not only the relationship between text data and stock price fluctuations but also stock trading based on news articles and social media responses. Studies that predict stock price movements have also applied classification algorithms after constructing a term-document matrix, as in other text mining approaches. Because documents contain many words, it is better to select the words that contribute most when building a term-document matrix. Words with too low a frequency or importance are removed based on word frequency. Words are also selected according to their contribution, measured as the degree to which a word helps classify a document correctly. The basic idea of constructing a term-document matrix has been to collect all the documents to be analyzed and to select and use the words that influence the classification. In this study, we analyze the documents for each individual stock and select the words that are irrelevant to all categories as neutral words. We extract the words around each selected neutral word and use them to generate the term-document matrix. The approach starts from the idea that stock movement is less related to the presence of the neutral words themselves, and that the words surrounding a neutral word are more likely to affect stock price movements. The generated term-document matrix is then applied to an algorithm that classifies stock price fluctuations. We first removed stop words and selected neutral words for each stock. We then excluded, from the selected words, those that also appear in news articles for other stocks. 
Through an online news portal, we collected four months of news articles on the top 10 stocks by market capitalization. We used the first three months of news articles as training data and applied the remaining month of articles to the model to predict the next day's stock price movements. We used SVM, Boosting, and Random Forest to build models and predict stock price movements. The stock market was open for a total of 80 days during the four months (2016/02/01 ~ 2016/05/31); the initial 60 days served as the training set and the remaining 20 days as the test set. The word-based algorithm proposed in this study showed better classification performance than the word selection method based on sparsity. This study predicted stock price volatility by collecting and analyzing news articles on the top 10 stocks by market capitalization. We used a classification model based on the term-document matrix to estimate stock price fluctuations and compared the performance of the existing sparsity-based word extraction method with that of the suggested method of removing words from the term-document matrix. The suggested method differs from the existing word extraction method in that it uses not only the news articles for the corresponding stock but also news for other stocks to determine the words to extract. In other words, it removed not only the words that appeared in both rising and falling cases but also the words that appeared commonly in the news for other stocks. When prediction accuracy was compared, the suggested method showed higher accuracy. A limitation of this study is that stock price prediction was framed as classifying rises and falls, and the experiment was conducted only for the top ten stocks. The 10 stocks used in the experiment do not represent the entire stock market. In addition, it is difficult to show investment performance because stock price fluctuation and profit rate may differ. 
Therefore, further research is needed that uses more stocks and predicts yield through trading simulation.
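
The idea of building a term-document matrix from the words surrounding neutral words can be sketched as below. The abstract does not state the exact criterion for neutrality, so the rule used here (a word counts as neutral when its document frequency is similar in rising and falling articles, within a hypothetical tolerance `tol`) is an assumption, as are the window size and the toy headlines. The classifier step (SVM, Boosting, Random Forest) and the filtering against other stocks' news are left out.

```python
from collections import Counter


def neutral_words(docs, labels, tol=0.2):
    """Assumed criterion: a word is neutral if it occurs in both classes
    with document frequencies that differ by at most `tol`."""
    up = [set(d.split()) for d, l in zip(docs, labels) if l == "up"]
    down = [set(d.split()) for d, l in zip(docs, labels) if l == "down"]
    vocab = set().union(*up, *down)
    neutral = set()
    for w in vocab:
        freq_up = sum(w in d for d in up) / len(up)
        freq_down = sum(w in d for d in down) / len(down)
        if freq_up > 0 and freq_down > 0 and abs(freq_up - freq_down) <= tol:
            neutral.add(w)
    return neutral


def context_terms(doc, neutral, window=1):
    """Return the non-neutral tokens within `window` positions of a neutral word."""
    tokens = doc.split()
    keep = set()
    for i, tok in enumerate(tokens):
        if tok in neutral:
            keep.update(range(max(0, i - window), min(len(tokens), i + window + 1)))
    return [tokens[i] for i in sorted(keep) if tokens[i] not in neutral]


def term_document_matrix(docs, neutral, window=1):
    """Count context terms per document; rows are documents, columns are terms."""
    counts = [Counter(context_terms(d, neutral, window)) for d in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[w] for w in vocab] for c in counts]


# Toy headlines (hypothetical, not from the study's corpus).
docs = [
    "samsung shares surge after earnings",
    "samsung shares plunge after recall",
    "samsung shares rally after launch",
    "samsung shares slump after lawsuit",
]
labels = ["up", "down", "up", "down"]

neutral = neutral_words(docs, labels)
vocab, matrix = term_document_matrix(docs, neutral)
print(sorted(neutral))
print(vocab)
for row in matrix:
    print(row)
```

The resulting rows could then be passed, together with the labels, to any standard classifier; which classifier and which neutrality rule work best is an empirical question the sketch does not address.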

Impact of Semantic Characteristics on Perceived Helpfulness of Online Reviews (온라인 상품평의 내용적 특성이 소비자의 인지된 유용성에 미치는 영향)

  • Park, Yoon-Joo;Kim, Kyoung-jae
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.3
    • /
    • pp.29-44
    • /
    • 2017
  • In Internet commerce, consumers are heavily influenced by product reviews written by other users who have already purchased the product. However, as product reviews accumulate, it takes a lot of time and effort for consumers to check the massive number of reviews individually. Moreover, carelessly written product reviews actually inconvenience consumers. Thus many online vendors provide mechanisms to identify reviews that customers perceive as most helpful (Cao et al. 2011; Mudambi and Schuff 2010). For example, some online retailers, such as Amazon.com and TripAdvisor, allow users to rate the helpfulness of each review and use this feedback to rank and re-order reviews. However, many reviews have only a little feedback or none at all, which makes it hard to identify their helpfulness. Feedback also takes time to accumulate, so newly authored reviews do not have enough of it. For example, only 20% of the reviews in the Amazon Review Dataset (McAuley and Leskovec, 2013) have more than 5 feedbacks (Yan et al, 2014). The purpose of this study is to analyze the factors affecting the usefulness of online product reviews and to derive a forecasting model that selectively provides product reviews that can be helpful to consumers. To do this, we extracted the various linguistic, psychological, and perceptual elements included in product reviews using text-mining techniques and identified the determinants among these elements that affect the usefulness of product reviews. In particular, considering that the characteristics of product reviews and the determinants of usefulness can differ between apparel products (which are experience goods) and electronic products (which are search goods), the characteristics of the product reviews were compared within each product group and the determinants were established for each. This study used 7,498 apparel product reviews and 106,962 electronic product reviews from Amazon.com. 
To understand a review text, we first extract linguistic and psychological characteristics from it, such as word count and the levels of emotional tone and analytical thinking it embeds, using the widely adopted text analysis software LIWC (Linguistic Inquiry and Word Count). We then explore the descriptive statistics of the review text for each category and statistically compare their differences using a t-test. Lastly, we perform regression analysis using the data mining software RapidMiner to find the determinant factors. Comparing the review characteristics of electronic products and apparel products, we found that reviewers used more words as well as longer sentences when writing reviews for electronic products. As for content characteristics, the electronic product reviews included many analytic words, carried more clout, and related to cognitive processes (CogProc) more than the apparel product reviews did, in addition to including many words expressing negative emotions (NegEmo). On the other hand, the apparel product reviews included more personal, authentic, positive emotion (PosEmo), and perceptual process (Percept) expressions than the electronic product reviews. Next, we analyzed the determinants of product review usefulness in the two product groups. In both product groups, the reviews perceived as useful had high product ratings from reviewers, a larger number of total words, many expressions involving perceptual processes, and fewer negative emotions. In addition, apparel product reviews with a large number of comparative expressions, a low expertise index, and concise content with fewer words in each sentence were perceived to be useful. 
In the case of electronic product reviews, those that were analytical, had a high expertise index, and contained many authentic expressions, cognitive processes, and positive emotions (PosEmo) were perceived to be useful. These findings are expected to help consumers effectively identify useful product reviews in the future.
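
The comparison step in this abstract (extract simple text features per review, then compare two product groups with a t-test) might look roughly like the sketch below. It is only an illustration under stated assumptions: LIWC is proprietary software and is not reproduced here, so the two features are simple stand-ins; the reviews are invented examples; and Welch's unequal-variance t statistic is one possible choice, since the abstract does not say which t-test variant was used.

```python
import math
import re
import statistics


def review_features(text):
    """Compute two simple surface features as stand-ins for LIWC output."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    return {
        "word_count": len(words),
        "words_per_sentence": len(words) / max(1, len(sentences)),
    }


def welch_t(a, b):
    """Welch's t statistic for two independent samples with unequal variances."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        var_a / len(a) + var_b / len(b)
    )


# Invented example reviews (NOT from the Amazon dataset used in the study).
electronics = [
    "The battery lasts two days. The screen is sharp and the speakers are loud.",
    "Setup took ten minutes because the manual explains every port clearly.",
    "Fast processor, but the fan gets noisy under load. "
    "Returned it once and the replacement works fine.",
]
apparel = [
    "Fits well. Soft fabric.",
    "Love the color!",
    "Runs small. Order a size up.",
]

elec_counts = [review_features(r)["word_count"] for r in electronics]
app_counts = [review_features(r)["word_count"] for r in apparel]
t_stat = welch_t(elec_counts, app_counts)
print("electronics word counts:", elec_counts)
print("apparel word counts:", app_counts)
print(f"Welch t statistic: {t_stat:.2f}")
```

A positive statistic here only says that the invented electronics reviews are longer on average; a p-value would additionally require the Welch-Satterthwaite degrees of freedom and a t distribution, which the standard library does not provide.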