• Title/Summary/Keyword: Text data


How to improve the accuracy of recommendation systems: Combining ratings and review texts sentiment scores (평점과 리뷰 텍스트 감성분석을 결합한 추천시스템 향상 방안 연구)

  • Hyun, Jiyeon;Ryu, Sangyi;Lee, Sang-Yong Tom
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.1
    • /
    • pp.219-239
    • /
    • 2019
  • As providing customized services to individuals becomes increasingly important, research on personalized recommendation systems is constantly being carried out. Collaborative filtering is one of the most popular approaches in academia and industry. However, it is limited in that recommendations are mostly based on quantitative information such as users' ratings, which lowers accuracy. To solve this problem, many studies have attempted to improve the performance of recommendation systems by using information other than the quantitative ratings. A good example is the use of sentiment analysis on customer review text data. Nevertheless, existing research has not directly combined the results of sentiment analysis with quantitative rating scores in the recommendation system. Therefore, this study aims to reflect the sentiments shown in the reviews in the rating scores. In other words, we propose a new algorithm that directly converts a user's own review into quantitative information and reflects it directly in the recommendation system. To do this, we needed to quantify users' reviews, which were originally qualitative information. In this study, a sentiment score was calculated through the sentiment analysis technique of text mining. The data consisted of movie reviews. Based on the data, a domain-specific sentiment dictionary was constructed for movie reviews. Regression analysis was used to construct the sentiment dictionary: positive and negative dictionaries were each constructed using Lasso regression, Ridge regression, and ElasticNet. The accuracy of each constructed sentiment dictionary was verified through a confusion matrix. The accuracy of the Lasso-based dictionary was 70%, that of the Ridge-based dictionary was 79%, and that of the ElasticNet (${\alpha}=0.3$) dictionary was 83%. 
Therefore, in this study, the sentiment score of each review is calculated based on the ElasticNet dictionary and combined with the rating to create a new rating. In this paper, we show that collaborative filtering that reflects the sentiment scores of user reviews is superior to the traditional method that only considers the existing rating. To show this, the proposed approach is applied to memory-based user-based collaborative filtering (UBCF) and item-based collaborative filtering (IBCF), and to model-based matrix factorization with SVD and SVD++. For each algorithm, the mean absolute error (MAE) and the root mean square error (RMSE) are calculated to compare the recommendation system that combines sentiment scores with ratings against the system that only considers ratings. In terms of MAE, performance improved by 0.059 for UBCF, 0.0862 for IBCF, 0.1012 for SVD, and 0.188 for SVD++. In terms of RMSE, the corresponding figures were 0.0431 for UBCF, 0.0882 for IBCF, 0.1103 for SVD, and 0.1756 for SVD++. These results indicate that predictions based on the sentiment-adjusted ratings proposed in this paper are more accurate than those based on conventional ratings alone. A paired t-test was then conducted to validate that the proposed model is the better approach, and it supported this conclusion. To overcome the limitation of previous research that judges a user's sentiment only by the quantitative rating score, this study quantified the review text so that the user's opinion is reflected in the recommendation system in a more refined way, improving accuracy. 
The findings of this study have managerial implications for recommendation system developers, who need to consider both quantitative and qualitative information. The method of constructing the combined system described in this paper could be used directly by developers.
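
The idea of folding a review's sentiment score into its rating and then comparing prediction error can be sketched as follows. This is only an illustration: the blending rule (a weighted average with a hypothetical weight `w`), the assumption that the sentiment score lies in [0, 1], and all function names are assumptions, since the abstract does not state how the two quantities are actually combined. MAE and RMSE follow their standard definitions.

```python
import math

def blend_rating(rating, sentiment, w=0.5, scale=5.0):
    """Hypothetical blend: weighted average of the star rating and the
    sentiment score (assumed to lie in [0, 1]) rescaled to the rating range."""
    return (1 - w) * rating + w * sentiment * scale

def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))
```

A recommender trained on the blended ratings and one trained on the raw ratings could then be compared by evaluating `mae` and `rmse` on the same held-out set.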

Influence analysis of Internet buzz to corporate performance : Individual stock price prediction using sentiment analysis of online news (온라인 언급이 기업 성과에 미치는 영향 분석 : 뉴스 감성분석을 통한 기업별 주가 예측)

  • Jeong, Ji Seon;Kim, Dong Sung;Kim, Jong Woo
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.4
    • /
    • pp.37-51
    • /
    • 2015
  • With the development of internet technology and the rapid increase of internet data, various studies are being actively conducted on how to use and analyze internet data for various purposes. In particular, a number of recent studies have applied text mining techniques in order to overcome the limitations of relying on structured data alone. Notably, there are various studies on sentiment analysis, which scores opinions based on the distribution of polarity, such as the positivity or negativity of the vocabulary or sentences in documents. As part of this line of work, this study tries to predict the rise and fall of companies' stock prices by performing sentiment analysis on the content of online news about those companies. A variety of news about companies is produced online by different economic agents, and it spreads quickly and is easily accessed on the Internet. So, based on the inefficient market hypothesis, we can expect that news about an individual company can be used to predict fluctuations in its stock price if proper data analysis techniques are applied. However, because companies operate in different areas of business, machine-learning-based analysis of text data needs to consider the characteristics of each company. In addition, since news containing positive or negative information about certain companies has varying impacts on other companies or industries, the stock price of each company needs to be predicted separately. Therefore, this study attempted to predict changes in the stock prices of individual companies by applying sentiment analysis to online news data. Accordingly, this study chose top companies in the KOSPI 200 as the subjects of the analysis, and collected and analyzed two years of online news data for each company from Naver, a representative domestic search portal service. 
In addition, considering that the meaning of vocabulary differs across economic subjects, the study aims to improve performance by building a lexicon for each individual company and applying it to the analysis. The results show that prediction accuracy differs by company and was 56% on average. Comparing prediction accuracy across industry sectors, 'energy/chemical', 'consumer goods for living' and 'consumer discretionary' showed relatively higher accuracy than other industries, while sectors such as 'information technology' and 'shipbuilding/transportation' showed lower accuracy. Only five representative companies were collected for each industry, so it is somewhat difficult to generalize, but the results confirm that prediction accuracy differs by industry sector. At the individual company level, companies such as 'Kangwon Land', 'KT & G' and 'SK Innovation' showed relatively higher prediction accuracy than other companies, while companies such as 'Young Poong', 'LG', 'Samsung Life Insurance', and 'Doosan' had a low prediction accuracy of less than 50%. In this paper, we predicted the stock price movements of individual companies using company-specific lexicons built in advance from online news, with the aim of improving stock price prediction performance. In future work, prediction accuracy may be increased by addressing the problem of unnecessary words being added to the sentiment dictionary.

Construction of Consumer Confidence index based on Sentiment analysis using News articles (뉴스기사를 이용한 소비자의 경기심리지수 생성)

  • Song, Minchae;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.3
    • /
    • pp.1-27
    • /
    • 2017
  • It is known that the economic sentiment index and macroeconomic indicators are closely related because economic agents' judgments and forecasts of business conditions affect economic fluctuations. For this reason, consumer sentiment or confidence provides steady fodder for business and is treated as an important piece of economic information. In Korea, the consumer sentiment index is highly relevant to private consumption and is a very important economic indicator for evaluating and forecasting the domestic economic situation. However, despite offering relevant insights into private consumption and GDP, the traditional survey-based approach to measuring consumer confidence has several limitations. One possible weakness is that it takes considerable time to research, collect, and aggregate the data. If certain urgent issues arise, timely information will not be announced until the end of each month. In addition, the survey only contains information derived from questionnaire items, which means it can be difficult to capture the direct effects of newly arising issues. The survey also faces potential declines in response rates and erroneous responses. Therefore, it is necessary to find a way to complement it. For this purpose, we construct and assess an index designed to measure consumer economic sentiment using sentiment analysis. Unlike the survey-based measures, our index relies on textual analysis to extract sentiment from economic and financial news articles. In particular, text data such as news articles and SNS are timely and cover a wide range of issues; because such sources can quickly capture the economic impact of specific economic issues, they have great potential as economic indicators. Of the two main approaches to the automatic extraction of sentiment from text, we apply the lexicon-based approach, using sentiment lexicons, that is, dictionaries of words annotated with their semantic orientations. 
In creating the sentiment lexicon dictionaries, we enter the semantic orientation of individual words manually, though we do not attempt a full linguistic analysis (one that involves analysis of word senses or argument structure); this is a limitation of our research, and further work in that direction remains possible. In this study, we generate a time series index of economic sentiment in the news. The construction of the index consists of three broad steps: (1) collecting a large corpus of economic news articles on the web, (2) applying lexicon-based methods for sentiment analysis of each article to score the article in terms of sentiment orientation (positive, negative and neutral), and (3) constructing an economic sentiment index of consumers by aggregating monthly time series for each sentiment word. In line with existing scholarly assessments of the relationship between the consumer confidence index and macroeconomic indicators, any new index should be assessed for its usefulness. We examine the new index's usefulness by comparing it with the CSI and other economic indicators. To check the usefulness of the new sentiment-based index, trend and cross-correlation analyses are carried out to analyze the relations and lag structure. Finally, we analyze forecasting power using one-step-ahead out-of-sample prediction. In almost all experiments, the news sentiment index correlates strongly with related contemporaneous key indicators. Furthermore, in most cases, news sentiment shocks predict future economic activity; in head-to-head comparisons, the news sentiment measures outperform survey-based sentiment indices such as the CSI. Policy makers want to understand consumer or public opinions about existing or proposed policies. 
Monitoring various web media, SNS, or news articles enables relevant government decision-makers to respond quickly to such opinions. Textual data, such as news articles and social networks (Twitter, Facebook and blogs), are generated at high speed and cover a wide range of issues; because such sources can quickly capture the economic impact of specific economic issues, they have great potential as economic indicators. Although research using unstructured data in economic analysis is in its early stages, the utilization of such data is expected to increase greatly once its usefulness is confirmed.
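
Steps (2) and (3) of the index construction described in this abstract can be sketched in a few lines. This is a simplified illustration, not the authors' procedure: the toy lexicon, the whitespace tokenizer, and the choice to average article scores per month (rather than aggregating a time series per sentiment word) are all assumptions.

```python
from collections import defaultdict

# Toy lexicon; the words and polarities are made up for illustration.
LEXICON = {"growth": 1, "recovery": 1, "recession": -1, "unemployment": -1}

def score_article(text, lexicon=LEXICON):
    """Sum the polarities of lexicon words found in the article.
    A positive total is read as positive, negative as negative, 0 as neutral."""
    return sum(lexicon.get(token, 0) for token in text.lower().split())

def monthly_index(articles, lexicon=LEXICON):
    """articles: iterable of (month as 'YYYY-MM', text) pairs.
    Returns the mean article score for each month, ordered by month."""
    scores = defaultdict(list)
    for month, text in articles:
        scores[month].append(score_article(text, lexicon))
    return {month: sum(s) / len(s) for month, s in sorted(scores.items())}
```

The resulting monthly series could then be compared with a survey-based index through cross-correlation at different lags.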

A Method of Analyzing Sentiment Polarity of Multilingual Social Media: A Case of Korean-Chinese Languages (다국어 소셜미디어에 대한 감성분석 방법 개발: 한국어-중국어를 중심으로)

  • Cui, Meina;Jin, Yoonsun;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.3
    • /
    • pp.91-111
    • /
    • 2016
  • For social media based marketing, it is crucial to perform sentiment analysis on the unstructured data written by potential consumers of a company's products and services. In particular, companies interested in global business must collect and analyze data from social media in multinational settings (e.g. Youtube, Instagram, etc.). In this case, since the texts are multilingual, the sentences are usually translated into a certain target language before sentiment analysis is conducted. However, because cultural differences are not taken into account and high-quality dictionaries are lacking, translated sentences often fail to convey the true meaning. This decreases the quality of the sentiment analysis. Hence, this study proposes a method for performing multilingual sentiment analysis, focusing on Korean-Chinese cases, while avoiding language translation. To show the feasibility of the proposed idea, we compare the performance of the proposed method with that of legacy methods which rely on language translators. The results suggest that our method outperforms them in terms of RMSE and can be applied by global business institutions.

Service Differentiation in Ad Hoc Networks by a Modified Backoff Algorithm (애드혹 네트워크 상에서 backoff 알고리즘 수정에 의한 서비스 차별화)

  • Seoung-Seok Kang;Jin Kim
    • Journal of KIISE:Information Networking
    • /
    • v.31 no.4
    • /
    • pp.414-428
    • /
    • 2004
  • Many portable devices are becoming commercially successful and provide useful services to mobile users. Mobile devices may request a variety of data types, including text and multimedia data, thanks to the rich content of the Internet. Different types of data and/or different classes of users may need to be treated with different qualities of service. The implementation of service differentiation in wireless networks is very difficult because of device mobility and wireless channel contention when the backoff algorithm is used to resolve contention. Modification of the binary exponential backoff algorithm is one possibility to allow the design of several classes of data traffic flows. We present a study of modifications to the backoff algorithm to support three classes of flows: gold, silver, and bronze. For example, the gold class flows are the highest priority and should satisfy their required target bandwidth, whereas the silver class flows should receive reasonably high bandwidth compared to the bronze class flows. The mixture of the two different transport protocols, UDP and TCP, in ad hoc networks raises significant challenges when defining backoff algorithm modifications. Due to the different characteristics of UDP and TCP, different backoff algorithm modifications are applied to each class of packets from the two transport protocols. Nevertheless, we show by means of simulation that our approach of backoff algorithm modification clearly differentiates service between different classes of flows regardless of the type of transport protocol.
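
One generic way to differentiate classes through binary exponential backoff is to give each class its own contention-window bounds, so that higher-priority flows tend to wait fewer slots. The sketch below illustrates that general idea only; the window values and function names are hypothetical and are not taken from the paper, which does not describe its specific modifications in this abstract.

```python
import random

# Hypothetical per-class contention-window bounds in slots (min, max).
# Smaller windows give a class a shorter expected wait, hence higher priority.
CLASS_WINDOWS = {"gold": (8, 64), "silver": (16, 256), "bronze": (32, 1024)}

def contention_window(flow_class, retries):
    """Binary exponential backoff: the window doubles with each retry,
    starting at the class minimum and capped at the class maximum."""
    cw_min, cw_max = CLASS_WINDOWS[flow_class]
    return min(cw_min * (2 ** retries), cw_max)

def backoff_slots(flow_class, retries, rng=random):
    """Draw a uniformly random backoff in [0, window - 1] slots."""
    return rng.randrange(contention_window(flow_class, retries))
```

With these illustrative values, a gold flow on its first attempt waits at most 7 slots, while a bronze flow waits up to 31.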

An Exploration of the Relationship Between Virtual Museum Exhibitions and Visitors' Responses (미술관, 박물관 가상전시디자인에 대한 관람객의 반응연구)

  • Park, Nam-Jin
    • Archives of design research
    • /
    • v.19 no.1 s.63
    • /
    • pp.181-190
    • /
    • 2006
  • This study began with the assumption that virtual museum exhibitions will continue to be created in the future and that more knowledge is required about designing effective virtual exhibits. This study explored the relationship between virtual exhibitions and visitors' opinions after viewing a virtual exhibit in order to determine the components of a well-constructed virtual exhibit design. To address the research problem, this study explored two aspects of virtual exhibit design: 1) what the components of a well-constructed virtual exhibit are, and 2) how viewing the virtual exhibit changes visitors' opinions about both physical and virtual museum experiences. The study employed surveys, interviews and observations as instruments of data collection. Twenty-five participants were given a survey prior to viewing the on-line exhibit, were then given the opportunity to view the web-site, and were finally surveyed regarding their opinions. From the 25 participants, six were selected for observation to record their behavior while they viewed the site. In addition, five were interviewed for a better understanding of their responses to various aspects of the virtual exhibit experience. Data from the surveys were tabulated as descriptive percentages in order to identify numerical patterns of relationship. Observation data were analyzed for simple frequencies in categories of responses, and interview data were tape recorded and transcribed into text files. Based on the study results, recommendations were made for the future role of interior design in virtual space that stands independent of a physical building and resides only on the Internet.


Application of Advertisement Filtering Model and Method for its Performance Improvement (광고 글 필터링 모델 적용 및 성능 향상 방안)

  • Park, Raegeun;Yun, Hyeok-Jin;Shin, Ui-Cheol;Ahn, Young-Jin;Jeong, Seungdo
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.21 no.11
    • /
    • pp.1-8
    • /
    • 2020
  • In recent years, the exponential increase in internet data has driven the development of many fields such as deep learning, but side effects in the form of commercial advertisements, such as viral marketing, have also emerged. These not only damage the essence of the internet as a place for sharing high-quality information, but also increase the time users must spend searching to find high-quality information. In this study, we define an advertisement as "a text that obscures the essence of information transmission" and we propose a model for filtering information according to that definition. The proposed model consists of an advertisement filtering component and a component for improving filtering performance, and it is designed to improve performance continuously. We collected data for filtering advertisements and trained a document classification model using KorBERT. Experiments were conducted to verify the performance of this model. For data combining five topics, accuracy and precision were 89.2% and 84.3%, respectively. High performance was confirmed even when the atypical characteristics of advertisements are considered. This approach is expected to reduce wasted time and fatigue in searching for information, because our model effectively delivers high-quality information to users through a process of identifying and filtering advertisement paragraphs.

A Direction Computation and Media Retrieval Method of Moving Object using Weighted Vector Sum (가중치 벡터합을 이용한 이동객체의 방향계산 및 미디어 검색방법)

  • Suh, Chang-Duk;Han, Gi-Tae
    • The KIPS Transactions:PartD
    • /
    • v.15D no.3
    • /
    • pp.399-410
    • /
    • 2008
  • This paper suggests a new retrieval method using a weighted vector sum to resolve problems of traditional location-based retrieval methods, the nearest neighbor (NN) query, and the NN query using direction. The proposed method first filters data by radius, and the remaining retrieval area is then filtered by direction information composed of the user's moving direction, a preset direction of interest, and a preset retrieval angle. The moving direction is computed from a single vector or from a weighted sum of several vectors, with weights chosen to suit different cases. The retrieval angle can be set from the traditional $360^{\circ}$ to any desired angle. The data retrieved by this method can be still and moving images recorded with their shooting location, as well as several types of media, such as text, web pages, and pictures, offered to customers together with the location of a company or resort. The suggested method guarantees more accurate retrieval than traditional location-based retrieval methods because it selects data within the radius and then removes data from useless areas, such as areas already passed or areas in a different direction. Moreover, this method is more flexible and includes the direction-based NN query as a special case.
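
The two filtering stages described in this abstract, a radius test followed by a direction test around a heading obtained from a weighted vector sum, can be sketched as below. This is a minimal 2-D illustration under stated assumptions: planar coordinates, headings measured counterclockwise from the +x axis, and function names that are hypothetical. The preset direction of interest mentioned in the abstract is omitted; only the moving direction is used as the sector center.

```python
import math

def moving_direction(vectors, weights):
    """Heading in degrees [0, 360) of the weighted sum of displacement vectors."""
    x = sum(w * vx for (vx, _), w in zip(vectors, weights))
    y = sum(w * vy for (_, vy), w in zip(vectors, weights))
    return math.degrees(math.atan2(y, x)) % 360

def in_retrieval_area(user, point, heading_deg, radius, angle_deg):
    """True if point lies within radius of user and inside the sector of
    total width angle_deg centered on heading_deg."""
    dx, dy = point[0] - user[0], point[1] - user[1]
    if math.hypot(dx, dy) > radius:
        return False  # stage 1: outside the search radius
    if dx == 0 and dy == 0:
        return True  # the user's own position has no bearing; accept it
    bearing = math.degrees(math.atan2(dy, dx)) % 360
    # Smallest absolute angular difference between bearing and heading.
    diff = abs((bearing - heading_deg + 180) % 360 - 180)
    return diff <= angle_deg / 2  # stage 2: inside the direction sector
```

Setting `angle_deg` to 360 reduces the test to the plain radius filter, matching the abstract's remark that the retrieval angle can range from the traditional 360 degrees downward.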

Development of a gridded crop growth simulation system for the DSSAT model using script languages (스크립트 언어를 사용한 DSSAT 모델 기반 격자형 작물 생육 모의 시스템 개발)

  • Yoo, Byoung Hyun;Kim, Kwang Soo;Ban, Ho-Young
    • Korean Journal of Agricultural and Forest Meteorology
    • /
    • v.20 no.3
    • /
    • pp.243-251
    • /
    • 2018
  • The gridded simulation of crop growth, which would be useful for stakeholders and policy makers, often requires specialized computation tasks for preparing weather input data and operating a given crop model. Here we developed an automated system that allows crop growth simulation over a region using the DSSAT (Decision Support System for Agrotechnology Transfer) model. The system consists of modules implemented in R and shell script. One module creates weather input files in a plain text format for each cell. Another module, written in R, was developed for GIS data processing and parallel computing. A third module, which launches the crop model automatically, was implemented as a shell script. As a case study, the automated system was used to determine the maximum soybean yield for a given set of management options in the state of Illinois in the US. The AgMERRA dataset, which is reanalysis data for agricultural models, was used to prepare weather input files for 1981-2005. It took 7.38 hours to create 1,859 weather input files for one year of soybean growth simulation in Illinois using a single CPU core. In contrast, the processing time decreased considerably, to about 35 minutes, when 16 CPU cores were used. The automated system created a map, in a raster data format, of the maturity group and the planting date that resulted in the maximum yield. Our results indicate that the automated system for the DSSAT model would help spatial assessments of crop yield at a regional scale.
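
The timing figures reported in this abstract imply a parallel speedup and efficiency that can be checked with simple arithmetic. The function name below is hypothetical, and the result is approximate because the abstract gives the 16-core time only as roughly 35 minutes.

```python
def speedup_and_efficiency(serial_hours, parallel_minutes, cores):
    """Return (speedup, efficiency) for a job timed on one core and on several."""
    serial_minutes = serial_hours * 60
    speedup = serial_minutes / parallel_minutes
    return speedup, speedup / cores

# Figures from the abstract: 7.38 hours on 1 core, about 35 minutes on 16 cores.
speedup, efficiency = speedup_and_efficiency(7.38, 35, 16)
print(f"speedup = {speedup:.2f}x, efficiency = {efficiency:.0%}")
```

This gives a speedup of roughly 12.7 times, or about 79% parallel efficiency on 16 cores.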

Pattern Analysis for Civil Complaints of Local Governments Using a Text Mining (텍스트마이닝에 의한 지자체 민원청구 패턴 분석)

  • Won, Tae Hong;Yoo, Hwan Hee
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography
    • /
    • v.34 no.3
    • /
    • pp.319-327
    • /
    • 2016
  • Korea faces a wide range of problems in areas such as safety, environment, and traffic due to rapid economic development and urbanization. Despite local governments' efforts to handle electronic civil complaints and solve urban problems, civil complaints have been increasing year by year. In this study, we collected civil complaint data over the last six years from Jinju-si, a small to medium-sized city. To analyze the spatial distribution pattern, we classified the reasons for the civil complaints, mapped their locations through geocoding, and then extracted the locations where complaints occurred in order to analyze the correlation between electronic civil complaints and land use. The results demonstrated that electronic civil complaints in Jinju-si were clustered in residential, central commercial, and residential-industrial mixed-use areas, that is, areas within the city center where land development had been completed. Analysis of the civil complaints by land use revealed that complaints about illegal parking were the most frequent. Analysis of the facility distribution within a 50 m radius of the complaint locations showed that civil complaints occurred frequently in detached housing areas located within the commercial and residential-industrial mixed-use areas. In residential areas (old downtown), civil complaints were concentrated in areas with many ordinary restaurants. This research explored civil complaints in terms of urban space and is expected to be useful in finding solutions to civil complaints.