• Title/Summary/Keyword: Online mining

A Study on the Effects of Online Word-of-Mouth on Game Consumers Based on Sentimental Analysis (감성분석 기반의 게임 소비자 온라인 구전효과 연구)

  • Jung, Keun-Woong;Kim, Jong Uk
    • Journal of Digital Convergence / v.16 no.3 / pp.145-156 / 2018
  • Whereas distributors once sold games through retail stores, they now sell them as digital content through online distribution channels. This study analyzes the effects of eWOM (electronic word of mouth) on the sales volume of games sold on Steam, an online digital content distribution channel. Recently, data mining techniques based on big data have been studied. In this study, a sentiment index of eWOM is derived through sentiment analysis, a text mining technique that can analyze the sentiment of each review. Sentiment analysis is performed with Naive Bayes and SVM classifiers, and the sentiment index is calculated with the SVM classifier, which achieved high accuracy. Regression analysis is performed on the dependent variable, sales variation, using as independent variables the sentiment index, the number of reviews of each game (the size of eWOM), and the user score of each game (the rating of eWOM). The regression analysis revealed that the size of eWOM and the sentiment index of eWOM influenced sales variation. This study suggests which eWOM factors affect sales volume when Korean game companies enter overseas markets through Steam.

Development of Hybrid Recommender System Using Review Data Mining: Kindle Store Data Analysis Case (리뷰 데이터 마이닝을 이용한 하이브리드 추천시스템 개발: Amazon Kindle Store 데이터 분석사례)

  • Yihua Zhang;Qinglong Li;Ilyoung Choi;Jaekyeong Kim
    • Information Systems Review / v.23 no.1 / pp.155-172 / 2021
  • With the recent increase in online product purchases, recommender systems that recommend products based on users' preferences continue to be studied. A recommender system provides personalized product recommendation services to users. Collaborative Filtering (CF), which uses user ratings on products, is one of the most widely used recommendation algorithms. In item-based CF, the ratings a user has left on purchased products are used to compute the similarity between purchased and unpurchased products. CF takes a long time to calculate the similarity between products, and even longer when using text-based big data such as Amazon store review data. This paper proposes a hybrid recommender system that uses a two-phase methodology and text data mining to calculate the similarity between products easily and quickly. To this end, we collected about 980,000 online consumer ratings and reviews from the online commerce store Amazon Kindle Store. Several experiments confirmed that the proposed hybrid recommender system, which reflects users' ratings and review data, achieves similar recommendation time but higher accuracy than the CF-based benchmark recommender systems. The proposed system is therefore expected to increase user satisfaction and sales.
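
The item-based CF baseline described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' hybrid method: the toy `ratings` data, the choice of cosine similarity, and the helper names `item_vectors`, `cosine`, and `predict` are all hypothetical.

```python
from math import sqrt

# Hypothetical toy ratings (user -> {item: rating}); not data from the paper.
ratings = {
    "u1": {"book_a": 5, "book_b": 4},
    "u2": {"book_a": 4, "book_b": 5, "book_c": 2},
    "u3": {"book_b": 1, "book_c": 5},
}

def item_vectors(ratings):
    """Invert user -> item ratings into item -> {user: rating} vectors."""
    items = {}
    for user, rated in ratings.items():
        for item, score in rated.items():
            items.setdefault(item, {})[user] = score
    return items

def cosine(a, b):
    """Cosine similarity between two sparse rating vectors stored as dicts."""
    shared = set(a) & set(b)
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def predict(user, target, ratings):
    """Similarity-weighted average of the user's ratings on other items."""
    items = item_vectors(ratings)
    num = den = 0.0
    for item, score in ratings[user].items():
        if item == target:
            continue
        sim = cosine(items[target], items[item])
        num += sim * score
        den += abs(sim)
    return num / den if den else 0.0

# Estimate how u1 would rate the unpurchased item book_c.
estimate = predict("u1", "book_c", ratings)
```

Every pair of items requires a pass over their shared raters, which is the similarity cost the abstract says grows with large review datasets.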

Incidence of Online Public Opinion on Guangzhou Simultaneous Renting and Purchasing Policy - A data mining application

  • Wang, Yancheng;Li, Haixian
    • Asian Journal for Public Opinion Research / v.5 no.4 / pp.266-284 / 2018
  • This paper adopts a big data research method and draws 491 data items from the Tianya Forum about Guangzhou's Simultaneous Renting and Purchasing policy. The qualitative analysis software NVivo 11 is used to cluster the main questions raised about the policy in the forum. Thirty-six high-frequency words are obtained through text clustering. Through grounded theory analysis, the main factors driving people's doubts are summarized into 9 main categories and 3 core categories, and a model of the driving factors for online forums is established. The study finds that resource factors are the most important driver, economic factors are important drivers, and policy guidance factors are secondary drivers.

EXTENDED ONLINE DIVISIVE AGGLOMERATIVE CLUSTERING

  • Musa, Ibrahim Musa Ishag;Lee, Dong-Gyu;Ryu, Keun-Ho
    • Proceedings of the KSRS Conference / 2008.10a / pp.406-409 / 2008
  • Clustering data streams is important in many applications, such as sensor networks. Existing hierarchical methods follow a semi-fuzzy clustering approach that yields duplicate clusters. To solve this problem, we propose an extended online divisive agglomerative clustering method for data streams. It builds a tree-like, top-down hierarchy of clusters that evolves with the data stream, using a geometric time frame for snapshots. It enhances Online Divisive Agglomerative Clustering (ODAC) with a pruning strategy that avoids duplicate clusters. Its main feature is that update time and memory usage are independent of the number of examples in the data stream. It can be used for clustering sensor data, network monitoring, and web click streams.

Profane or Not: Improving Korean Profane Detection using Deep Learning

  • Woo, Jiyoung;Park, Sung Hee;Kim, Huy Kang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.1 / pp.305-318 / 2022
  • Abusive behavior has become a common issue on many online social media platforms, and profanity is a common form of it. Social media platforms operate filtering systems based on lists of popular profane words, but this method has two drawbacks: it can be bypassed using altered forms of words, and it can flag normal sentences as profanity. In Korean in particular, a syllable is composed of graphemes and a word is composed of multiple syllables, so a word can be decomposed into graphemes without impairing its meaning, and the form of a profane word can carry a different meaning within a sentence. This work focuses on the problem of filtering systems misdetecting normal phrases as profane phrases. To address it, we propose a deep learning-based framework that includes word embedding based on grapheme and syllable separation and an appropriate CNN structure. The proposed model was evaluated on chat content from a famous online game in South Korea and achieved 90.4% accuracy.
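
The grapheme separation this abstract relies on can be sketched with the standard Unicode arithmetic for precomposed Hangul syllables. This is a minimal illustration only; the abstract does not describe the paper's actual preprocessing or embedding code, and the function name `decompose` is hypothetical.

```python
import unicodedata

# Constants from the Unicode Hangul syllable decomposition algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28
S_COUNT = 19 * V_COUNT * T_COUNT  # 11,172 precomposed syllables

def decompose(text):
    """Split each Hangul syllable into conjoining jamo; keep other characters."""
    out = []
    for ch in text:
        index = ord(ch) - S_BASE
        if 0 <= index < S_COUNT:
            # Initial consonant, then vowel, then optional final consonant.
            out.append(chr(L_BASE + index // (V_COUNT * T_COUNT)))
            out.append(chr(V_BASE + (index % (V_COUNT * T_COUNT)) // T_COUNT))
            final = index % T_COUNT
            if final:
                out.append(chr(T_BASE + final))
        else:
            out.append(ch)
    return out

graphemes = decompose("한글")
# Recomposing with NFC recovers the original syllables, so no meaning is lost.
restored = unicodedata.normalize("NFC", "".join(graphemes))
```

A word-list filter that matches whole syllables can miss a word typed as separate graphemes, which is one reason a model might operate on both the syllable and grapheme levels.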

Development Problems and Countermeasures of Rural E-Commerce Logistics in the Context of Big Data and Internet of Things

  • Xianfeng Zhu
    • Journal of Information Processing Systems / v.19 no.2 / pp.267-274 / 2023
  • As the Internet and online shopping have continued to expand in China, many rural areas now have sales outlets. Because of economic conditions, however, rural locations have inadequate e-commerce logistics infrastructure: outlets are few and scattered. For various reasons, the development of rural e-commerce logistics lags far behind that of urban areas. As the Internet of Things and big data grow in popularity, the assurance system for the booming e-commerce in rural areas can be created and enhanced by building a support system for rural online shopping platforms and strengthening the joint distribution of logistics terminals based on data mining, so as to encourage the rapid and healthy growth of rural online shopping.

Analyzing User Feedback on a Fan Community Platform 'Weverse': A Text Mining Approach

  • Thi Thao Van Ho;Mi Jin Noh;Yu Na Lee;Yang Sok Kim
    • Smart Media Journal / v.13 no.6 / pp.62-71 / 2024
  • This study applies topic modeling to uncover user experience and app issues expressed in users' online reviews of the fan community platform Weverse on the Google Play Store. This allows us to identify the features that need to be improved to enhance user experience and those that should be maintained and leveraged to attract more users. To this end, we collected 88,068 first-level English online reviews of Weverse from the Google Play Store with the Google-Play-Scraper tool. After initial preprocessing, a dataset of 31,861 online reviews was analyzed using Latent Dirichlet Allocation (LDA) topic modeling with the Gensim library in Python. The five topics identified highlight significant issues such as network connection errors, delayed notifications, and incorrect translations. The results also revealed the app's effectiveness in fostering not only interaction between fans and artists but also relationships among fans. Consequently, the business can strengthen user engagement and loyalty by addressing the identified drawbacks and leveraging the platform for user communication.

Exploring Simultaneous Presentation in Online Restaurant Reviews: An Analysis of Textual and Visual Content

  • Lin Li;Gang Ren;Taeho Hong;Sung-Byung Yang
    • Asia pacific journal of information systems / v.29 no.2 / pp.181-202 / 2019
  • The purpose of this study is to explore the effect of different types of simultaneous presentation (i.e., reviewer information, textual and visual content, and similarity between textual and visual content) on review usefulness and review enjoyment in online restaurant reviews (ORRs), as they are interrelated yet have rarely been examined together in previous research. Using Latent Dirichlet Allocation (LDA) topic modeling and state-of-the-art machine learning (ML) methodologies, we found that review readability in textual content and salient objects in images in visual content have a significant impact on both review usefulness and review enjoyment. Moreover, similarity between textual and visual content was found to be a major factor in determining review usefulness but not review enjoyment. As for reviewer information, reputation, expertise, and location of residence were found to be significantly related to review enjoyment. This study contributes to the body of knowledge on ORRs and provides valuable implications for general users and managers in the hospitality and tourism industries.

Development of Sentiment Analysis Model for the hot topic detection of online stock forums (온라인 주식 포럼의 핫토픽 탐지를 위한 감성분석 모형의 개발)

  • Hong, Taeho;Lee, Taewon;Li, Jingjing
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.187-204 / 2016
  • Document classification based on emotional polarity has become a welcome emerging task owing to the explosion of data on the Web. In the big data age, there are too many information sources to refer to when making decisions. For example, when considering travel to a city, a person may search for reviews through a search engine such as Google or on social networking services (SNSs) such as blogs, Twitter, and Facebook. The emotional polarity of positive and negative reviews helps a user decide whether or not to make a trip. Sentiment analysis of customer reviews has become an important research topic as data mining technology has become widely accepted for text mining of the Web. Sentiment analysis has been used to classify documents through machine learning techniques such as decision trees, neural networks, and support vector machines (SVMs). It is used to determine the attitude, position, and sensibility of people who write articles about various topics that are published on the Web. Regardless of their polarity, emotional reviews are very helpful material for analyzing customers' opinions. Sentiment analysis helps in understanding instantly what customers really want through automated text mining techniques. It applies text mining techniques to text on the Web to extract the subjective information it contains. In this study, we developed a model that selects hot topics from user posts on China's online stock forum by using the k-means algorithm and a self-organizing map (SOM). In addition, we developed a detection model to predict hot topics by using machine learning techniques such as logit, decision trees, and SVM. We employed sentiment analysis to develop our model for the selection and detection of hot topics from China's online stock forum. The sentiment analysis calculates a sentiment value for a document based on contrast and classification according to a polarity sentiment dictionary (positive or negative). The online stock forum was an attractive site because of its information about stock investment. Users post numerous texts about stock movements, analyzing the market according to government policy announcements, market reports, reports from research institutes on the economy, and even rumors. We divided the online forum's topics into 21 categories to apply sentiment analysis. One hundred forty-four topics were selected from the 21 categories of the online stock forum. The posts were crawled to build a positive and negative text database. After preprocessing the text posted from March 2013 to February 2015, we ultimately obtained 21,141 posts on 88 topics. An interest index was defined to select the hot topics, and the k-means algorithm and SOM produced equivalent results on this data. We developed decision tree models to detect hot topics with three algorithms: CHAID, CART, and C4.5. The results of CHAID were subpar compared to the others. We also employed SVM to detect the hot topics from negative data. The SVM models were trained with the radial basis function (RBF) kernel, tuned by grid search. The detection of hot topics using sentiment analysis provides investors with the latest trends and hot topics in the stock forum, so that they no longer need to search the vast amounts of information on the Web. Our proposed model also helps to rapidly determine customers' signals or attitudes towards government policy and firms' products and services.
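
The dictionary-based scoring step mentioned in this abstract can be sketched as below. This is a minimal illustration under stated assumptions: the word lists `POSITIVE` and `NEGATIVE`, the whitespace tokenization, and the normalized formula in `sentiment_value` are hypothetical stand-ins, since the abstract does not give the authors' dictionary or exact formula.

```python
# Hypothetical polarity dictionary; the study used its own dictionary for Chinese posts.
POSITIVE = {"gain", "rally", "bullish", "profit"}
NEGATIVE = {"loss", "crash", "bearish", "risk"}

def sentiment_value(text):
    """Return a score in [-1, 1]: (positive - negative) / (positive + negative)."""
    tokens = text.lower().split()
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    total = pos + neg
    # Posts without any dictionary words are treated as neutral.
    return (pos - neg) / total if total else 0.0

score = sentiment_value("bullish rally despite risk")
```

Per-post scores of this kind could then be aggregated by topic before clustering or classification, though how the study aggregates them is not stated in the abstract.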

A Clustering Algorithm for Sequence Data Using Rough Set Theory (러프 셋 이론을 이용한 시퀀스 데이터의 클러스터링 알고리즘)

  • Oh, Seung-Joon;Park, Chan-Woong
    • Journal of the Korea Society of Computer and Information / v.13 no.2 / pp.113-119 / 2008
  • The World Wide Web is a dynamic collection of pages that includes a huge number of hyperlinks and huge volumes of usage information. The resulting growth in online information, combined with the largely unstructured nature of web data, necessitates the development of powerful web data mining tools. Recently, a number of approaches have been developed for dealing with specific aspects of web usage mining for the purpose of automatically discovering user profiles. We analyze sequence data such as web logs, protein sequences, and retail transactions. We propose a clustering algorithm for sequence data using rough set theory, and present a simple example and experimental results using a splice dataset and synthetic datasets.
