• Title/Summary/Keyword: Online clustering

Online VQ Codebook Generation using a Triangle Inequality (삼각 부등식을 이용한 온라인 VQ 코드북 생성 방법)

  • Lee, Hyunjin
    • Journal of Digital Contents Society / v.16 no.3 / pp.373-379 / 2015
  • In this paper, we propose an online VQ codebook generation method that updates an existing VQ codebook in real time and assigns newly created data to existing clusters. Such data include text (newspapers, web pages, blogs, tweets) and IoT data from sensors and machines. Without degrading the performance of the batch VQ codebook on the existing data, the method takes advantage of the newly added data by using the triangle inequality to modify the codebook progressively, achieving high accuracy and speed. Applied to test data, the method performed similarly to the batch method.
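The abstract does not spell out how the triangle inequality is applied, so the following is only a minimal sketch of one common use: skipping distance computations during nearest-codeword search, followed by a running-mean update of the winning codeword. The function names `nearest_codeword` and `online_update`, the running-mean step, and the per-codeword `counts` array are assumptions for illustration, not the paper's method.

```python
import numpy as np

def nearest_codeword(x, codebook, cc_dist):
    """Return (index, distance, skipped) for the codeword nearest to x.

    Pruning rule from the triangle inequality: if
    d(c_best, c_j) >= 2 * d(x, c_best), then d(x, c_j) >= d(x, c_best),
    so c_j cannot be closer and its distance is never computed.
    """
    best = 0
    best_d = np.linalg.norm(x - codebook[0])
    skipped = 0
    for j in range(1, len(codebook)):
        if cc_dist[best, j] >= 2.0 * best_d:
            skipped += 1
            continue
        d = np.linalg.norm(x - codebook[j])
        if d < best_d:
            best, best_d = j, d
    return best, best_d, skipped

def online_update(x, codebook, counts):
    """Assign x to its nearest codeword and move that codeword toward x.

    Assumption: a running-mean update. The pairwise codeword distances are
    recomputed in full here for brevity; a real implementation would only
    refresh the row and column of the codeword that moved.
    """
    cc_dist = np.linalg.norm(codebook[:, None, :] - codebook[None, :, :], axis=2)
    k, _, _ = nearest_codeword(x, codebook, cc_dist)
    counts[k] += 1
    codebook[k] += (x - codebook[k]) / counts[k]
    return k
```

The pruned search returns the same codeword as an exhaustive search; only the number of distance evaluations changes. The codebook must be a floating-point array so that the in-place update works.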

TRIB : A Clustering and Visualization System for Responding Comments on Blogs (TRIB: 블로그 댓글 분류 및 시각화 시스템)

  • Lee, Yun-Jung;Ji, Jung-Hoon;Woo, Gyun;Cho, Hwan-Gue
    • The KIPS Transactions:PartD / v.16D no.5 / pp.817-824 / 2009
  • In recent years, the weblog has become the most typical social medium through which citizens share their opinions, and many weblogs reflect social issues. Many internet users actively express their opinions on internet news or weblog articles by posting reply comments in online communities, so it is easy to find blog posts with more than 10,000 reply comments. Because most weblog systems show articles and their comments as a sequential list, it is hard to search for and explore useful messages. In this paper, we propose a clustering and visualization system called TRIB (Telescope for Responding comments for Internet Blog) for large sets of comments responding to a weblog article. TRIB clusters and visualizes the reply comments according to their content, using a predefined user dictionary. TRIB also provides various personalized views that reflect users' interests. To show the usefulness of TRIB, we conducted experiments on its clustering and visualization capabilities using articles that have more than 1,000 comments.

Multi-Document Summarization Method of Reviews Using Word Embedding Clustering (워드 임베딩 클러스터링을 활용한 리뷰 다중문서 요약기법)

  • Lee, Pil Won;Hwang, Yun Young;Choi, Jong Seok;Shin, Young Tae
    • KIPS Transactions on Software and Data Engineering / v.10 no.11 / pp.535-540 / 2021
  • A multi-document is a document that covers various topics rather than a single topic; online reviews are a typical example. There have been several attempts to summarize online reviews because of the vast amount of information they contain. However, summarizing reviews collectively with existing summary models loses the various topics that make up the reviews. Therefore, in this paper, we present a method that summarizes reviews with minimal loss of topics. The proposed method classifies reviews through preprocessing, importance evaluation, embedding substitution using BERT, and embedding clustering. The classified sentences are then passed to a trained Transformer summary model to generate the final summary. The proposed model was compared with the existing summary model, a seq2seq model, using ROUGE scores and cosine similarity, and it produced better summaries than the existing model.

User-Perspective Issue Clustering Using Multi-Layered Two-Mode Network Analysis (다계층 이원 네트워크를 활용한 사용자 관점의 이슈 클러스터링)

  • Kim, Jieun;Kim, Namgyu;Cho, Yoonho
    • Journal of Intelligence and Information Systems / v.20 no.2 / pp.93-107 / 2014
  • In this paper, we report what we have observed with regard to user-perspective issue clustering based on multi-layered two-mode network analysis. This work is significant in the context of data collection by companies about customer needs. Most companies have failed to uncover such needs for products or services properly in terms of demographic data such as age, income levels, and purchase history. Because of excessive reliance on limited internal data, most recommendation systems do not provide decision makers with appropriate business information for current business circumstances. However, part of the problem is the increasing regulation of personal data gathering and privacy. This makes demographic or transaction data collection more difficult, and is a significant hurdle for traditional recommendation approaches because these systems demand a great deal of personal data or transaction logs. Our motivation for presenting this paper to academia is our strong belief, and evidence, that most customers' requirements for products can be effectively and efficiently analyzed from unstructured textual data such as Internet news text. In order to derive users' requirements from textual data obtained online, the proposed approach in this paper attempts to construct double two-mode networks, such as a user-news network and news-issue network, and to integrate these into one quasi-network as the input for issue clustering. One of the contributions of this research is the development of a methodology utilizing enormous amounts of unstructured textual data for user-oriented issue clustering by leveraging existing text mining and social network analysis. In order to build multi-layered two-mode networks of news logs, we need some tools such as text mining and topic analysis. We used not only SAS Enterprise Miner 12.1, which provides a text miner module and cluster module for textual data analysis, but also NetMiner 4 for network visualization and analysis. 
Our approach to user-perspective issue clustering is composed of six main phases: crawling, topic analysis, access pattern analysis, network merging, network conversion, and clustering. In the first phase, we collect visit logs for news sites with a crawler. After gathering unstructured news article data, the topic analysis phase extracts issues from each news article in order to build an article-issue network. For simplicity, 100 topics are extracted from 13,652 articles. In the third phase, a user-article network is constructed from access patterns derived from web transaction logs. The two two-mode networks are then merged into a user-issue quasi-network. Finally, in the user-oriented issue-clustering phase, we classify issues through structural equivalence and compare the results with the clustering results from statistical tools and network analysis. An experiment with a large dataset was performed to build a multi-layered two-mode network, after which we compared the issue-clustering results from SAS with those of the network analysis. The experimental dataset came from a website-ranking site and from the biggest portal site in Korea. The sample dataset contains 150 million transaction logs and 13,652 news articles from 5,000 panel members over one year. User-article and article-issue networks are constructed and merged into a user-issue quasi-network using NetMiner. Our issue-clustering results, obtained with the Partitioning Around Medoids (PAM) algorithm and Multidimensional Scaling (MDS), are consistent with the results from SAS clustering. In spite of extensive efforts to provide user information with recommendation systems, most projects are successful only when companies have sufficient data about users and transactions. Our proposed methodology, user-perspective issue clustering, can provide practical support to decision-making in companies because it enhances user-related data with unstructured textual data. 
To overcome the problem of insufficient data from traditional approaches, our methodology infers customers' real interests by utilizing web transaction logs. In addition, we suggest topic analysis and issue clustering as a practical means of issue identification.
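The network-merging phase can be illustrated with a small sketch. Assuming each two-mode network is stored as an incidence matrix, one simple way to merge a user-article network and an article-issue network into a user-issue quasi-network is a matrix product. The matrices below are hypothetical toy values, and `merge_two_mode` is an illustrative name; the abstract states only that NetMiner was used for this step.

```python
import numpy as np

# Toy incidence matrices (hypothetical values, for illustration only).
# user_article[u, a] = number of times user u visited article a.
user_article = np.array([
    [2, 0, 1],
    [0, 3, 0],
])
# article_issue[a, i] = 1 if article a is associated with issue i.
article_issue = np.array([
    [1, 0],
    [0, 1],
    [1, 1],
])

def merge_two_mode(user_article, article_issue):
    """Merge two two-mode networks that share the article mode.

    Entry [u, i] of the result sums, over all articles, the visits of
    user u to articles associated with issue i.
    """
    return user_article @ article_issue

user_issue = merge_two_mode(user_article, article_issue)
print(user_issue)
```

Each column of `user_issue` then describes one issue by the users who reached it, which is the kind of profile an issue-clustering step can compare.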

Cluster-Based Routing Mechanism for Efficient Data Delivery to Group Mobile Users in Wireless Ad-Hoc Networks (그룹 이동성을 가지는 모바일 사용자들 간의 효율적인 데이터 공유를 위한 클러스터 기반 그룹 라우팅 기법 메커니즘)

  • Yoo, Jinhee;Han, Kyeongah;Jeong, Dahee;Lee, HyungJune
    • The Journal of Korean Institute of Communications and Information Sciences / v.38C no.11 / pp.1060-1073 / 2013
  • In this paper, we present a cluster-based routing scheme that efficiently delivers data to group mobile users in wireless ad-hoc networks by extracting and clustering mobile user groups from beacon message information alone. First, we propose an online clustering mechanism in which each node maintains a local neighbor table, recursively transmits it to neighbor nodes, and forms a group table whose listed nodes are classified as group members, without incurring much overhead. The node that appears most frequently in neighbor tables throughout the network is selected as the cluster-head node and serves as the data gateway for the cluster. Second, we design inter-cluster routing that delivers data from stationary data sources to the selected cluster-head node, and intra-cluster routing that delivers data from the cluster-head node to users. ns-2 simulations of an ad-hoc network with 518 stationary nodes and 20 mobile nodes show that the proposed clustering mechanism achieves a high clustering accuracy of 96% on average. In terms of routing performance, our cluster-based routing scheme outperforms a naive one-to-one routing scheme without clustering, reducing routing cost to as little as 1/20. In addition, our intra-cluster routing through the selected cluster-head node halves the routing cost compared with intra-cluster routing through a randomly selected internal group member.
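The cluster-head rule described above (the node that appears most often across neighbor tables) can be sketched in a few lines. The data layout, the function name `select_cluster_head`, and the tie-breaking rule are assumptions for illustration; the abstract does not specify them.

```python
from collections import Counter

def select_cluster_head(neighbor_tables):
    """Pick the node that appears in the most neighbor tables.

    neighbor_tables maps each node id to the list of neighbors it has heard
    beacons from. Ties are broken by the smallest node id (an assumption).
    Raises ValueError if no neighbors are listed at all.
    """
    counts = Counter()
    for neighbors in neighbor_tables.values():
        counts.update(set(neighbors))  # count each table at most once per node
    return min(counts, key=lambda n: (-counts[n], n))

# Hypothetical neighbor tables for four mobile nodes.
tables = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["B"],
    "D": ["C"],
}
head = select_cluster_head(tables)
print(head)
```

Here node C is listed by three other nodes, more than any other node, so it is chosen as the cluster head.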

Dessert Ateliers Recommendation Methods for Dessert E-commerce Services

  • Son, Yeonbin;Chang, Tai-Woo;Choi, Yerim
    • Journal of Internet Computing and Services / v.21 no.1 / pp.111-117 / 2020
  • Dessert Ateliers (DAs) are small shops that sell high-end homemade desserts such as macaroons, cakes, and cookies, and their popularity is increasing with the emergence of the small-luxury trend. Even though DAs sell the same kinds of desserts, they are differentiated by the personality of their pastry chefs; customers therefore need a way to buy online the desserts they cannot see and buy offline, and dessert e-commerce has emerged in response. However, it is impossible for customers to identify all the information about each DA and to clearly understand customers' preferences when buying desserts through dessert e-commerce. When a dessert e-commerce service provides a DA recommendation service, customers can reduce the time they hesitate before making a decision. Therefore, this paper proposes two DA recommendation methods: a clustering-based method that calculates the similarity between customers' content and DAs, and a dynamic weighting-based method that learns the importance of decision factors according to customer preferences. Various experiments on a real-world dataset were conducted to evaluate the proposed methods, and the results were satisfactory.

A Clustering Algorithm for Sequence Data Using Rough Set Theory (러프 셋 이론을 이용한 시퀀스 데이터의 클러스터링 알고리즘)

  • Oh, Seung-Joon;Park, Chan-Woong
    • Journal of the Korea Society of Computer and Information / v.13 no.2 / pp.113-119 / 2008
  • The World Wide Web is a dynamic collection of pages that includes a huge number of hyperlinks and huge volumes of usage information. The resulting growth in online information, combined with the largely unstructured nature of web data, necessitates the development of powerful web data mining tools. Recently, a number of approaches have been developed for specific aspects of web usage mining with the aim of automatically discovering user profiles. We analyze sequence data such as web logs, protein sequences, and retail transactions, and we propose a clustering algorithm for sequence data that uses rough set theory. We present a simple example and experimental results using a splice dataset and synthetic datasets.

Customer Classification and Market Basket Analysis Using K-Means Clustering and Association Rules: Evidence from Distribution Big Data of Korean Retailing Company (군집분석과 연관규칙을 활용한 고객 분류 및 장바구니 분석: 소매 유통 빅데이터를 중심으로)

  • Liu, Run-Qing;Lee, Young-Chan;Mu, Hong-Lei
    • Knowledge Management Research / v.19 no.4 / pp.59-76 / 2018
  • With the arrival of the big data era, customer data and data mining analysis have gradually come to dominate the process of Customer Relationship Management (CRM). Customer data, together with the use of information technology (IT), have become the basis for building a successful CRM strategy. However, some companies cannot discover valuable information in their large volumes of customer data and therefore fail to form an appropriate business strategy. Without suitable strategies, companies may lose their competitive advantage or even go bankrupt. The purpose of this study is to propose CRM strategies by segmenting customers into VIPs and non-VIPs and by identifying purchase patterns in the VIPs' transaction data from an online shopping mall in Korea, using data mining techniques (K-means clustering and association rules). The results indicate that 227 of 1,866 customers were segmented as VIPs. According to the VIPs' 51,080 transactions, home products and women's wear are frequently associated with food, which means that purchases of home products or women's wear mainly affect purchases of food. Therefore, marketing managers of shopping malls should consider these shopping patterns when they build a CRM strategy.
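To make the association-rule finding concrete, the sketch below computes the standard support, confidence, and lift of a rule such as "home product → food". The transactions are made-up toy data, not the study's 51,080 VIP transactions, and `rule_metrics` is an illustrative name.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    transactions is a list of sets of item categories; antecedent and
    consequent are sets of item categories.
    """
    n = len(transactions)
    n_a = sum(1 for t in transactions if antecedent <= t)
    n_c = sum(1 for t in transactions if consequent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_both / n
    confidence = n_both / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift

# Hypothetical baskets, for illustration only.
baskets = [
    {"home product", "food"},
    {"home product", "food"},
    {"women's wear", "food"},
    {"home product", "food"},
    {"women's wear"},
    {"electronics"},
]
support, confidence, lift = rule_metrics(baskets, {"home product"}, {"food"})
print(support, confidence, lift)
```

A lift above 1 indicates that the antecedent and consequent appear together more often than they would if they were independent.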

A Study of the Classification and Application of Digital Broadcast Program Type based on Machine Learning (머신러닝 기반의 디지털 방송 프로그램 유형 분류 및 활용 방안 연구)

  • Yoon, Sang-Hyeak;Lee, So-Hyun;Kim, Hee-Woong
    • Knowledge Management Research / v.20 no.3 / pp.119-137 / 2019
  • With the recent spread of digital content, more people watch TV programs as digital content on their PCs or mobile devices rather than on TVs. As media use patterns change, the genres (types) of broadcast programs change with the times and with viewers' trends. Programs that were broadcast on TV are now released as digital content, and the perceptions of the people watching them change accordingly. For this reason, the genres (types) of broadcast programs need to be classified anew on the basis of digital content, differently from the conventional genre classification used by broadcasting companies and related industries. Therefore, this study suggests a plan for newly classifying broadcast programs by applying machine learning to the log data of people watching the programs in online media, and for applying the new classification. The study is academically meaningful in that it analyzes and classifies program types on the basis of digital content. It is also meaningful in that it makes use of a program classification algorithm developed in the relevant industries and, in particular, suggests a strategy and plan for applying it.

Automatic Recommendation of (IP)TV programs based on A Rank Model using Collaborative Filtering (협업 필터링을 이용한 순위 정렬 모델 기반 (IP)TV 프로그램 자동 추천)

  • Kim, Eun-Hui;Pyo, Shin-Jee;Kim, Mun-Churl
    • Journal of Broadcast Engineering / v.14 no.2 / pp.238-252 / 2009
  • Due to the rapid increase in available content brought about by the convergence of broadcasting and the internet, efficient access to personally preferred content has become an important issue. In this paper, a recommendation scheme for TV programs using a collaborative filtering technique is studied. To recommend TV programs that a user prefers, the proposed scheme consists of offline and online computation. In the offline computation, we propose implicitly inferring each user's preferences for TV programs in terms of program content, genre, and channel, and clustering users by their genre and channel preferences with a dynamic fuzzy clustering method. After an active user logs in, the online computation, which aims to recommend TV programs with high accuracy, retrieves users similar to the active user using a similarity measure based on the active user's standard preference list, filters out those programs watched by the similar users that do not exist in the EPG, and ranks the remaining TV programs with the proposed rank model. In particular, the BM (Best Match) algorithm is extended so that the recommended TV programs are ranked according to the user's preferences. The experimental results show that the proposed scheme with the extended BM model yields a prediction accuracy of 62.1% for the top five recommendations on the TV watching history of 2,441 people.
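The online computation can be outlined roughly as follows. This is only a sketch under stated assumptions: cosine similarity stands in for the paper's similarity measure, and a similarity-weighted vote stands in for the extended BM rank model, whose details the abstract does not give. All names and data below are hypothetical.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two preference vectors (0.0 if either is zero)."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

def recommend(active, prefs, watched, epg, k=2, top_n=5):
    """Recommend programs for `active` from the k most similar users.

    prefs:   user -> preference vector (e.g. over genres and channels)
    watched: user -> set of programs the user has watched
    epg:     set of programs currently listed in the EPG
    Scoring (an assumption, not the paper's rank model): each program is
    scored by the summed similarity of the similar users who watched it.
    """
    sims = {u: cosine(prefs[active], p) for u, p in prefs.items() if u != active}
    neighbors = sorted(sims, key=lambda u: (-sims[u], u))[:k]
    scores = {}
    for u in neighbors:
        for prog in watched[u]:
            if prog in epg:  # drop programs that are not in the EPG
                scores[prog] = scores.get(prog, 0.0) + sims[u]
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_n]

prefs = {
    "u1": np.array([1.0, 0.0, 1.0]),
    "u2": np.array([1.0, 0.0, 0.9]),
    "u3": np.array([0.0, 1.0, 0.0]),
    "u4": np.array([0.8, 0.1, 1.0]),
}
watched = {
    "u1": set(),
    "u2": {"news", "drama"},
    "u3": {"sports"},
    "u4": {"drama", "old_show"},
}
epg = {"news", "drama", "sports"}
result = recommend("u1", prefs, watched, epg)
print(result)
```

In this toy data, users u2 and u4 are the closest to u1, "old_show" is dropped because it is not in the EPG, and "drama" ranks first because both similar users watched it.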