• Title/Summary/Keyword: Time-based Clustering

Discovering Community Interests Approach to Topic Model with Time Factor and Clustering Methods

  • Ho, Thanh;Thanh, Tran Duy
    • Journal of Information Processing Systems / v.17 no.1 / pp.163-177 / 2021
  • Many methods for discovering social network communities or clustering features are based on the network structure or the content network. This paper proposes a community discovery method based on topic models using a time factor and an unsupervised clustering method. Online community discovery enables organizations and businesses to thoroughly understand trends in users' interest in their products and services. In addition, insight into customer experience on social networks is a tremendous competitive advantage in this era of ecommerce and Internet development. The objective of this work is to find clusters (communities) such that each cluster's nodes contain topics and individuals with similarities in the attribute space. In terms of social media analytics, the method seeks communities whose members have similar features. The method is tested and evaluated on a Vietnamese corpus of comments and messages collected from social networks and ecommerce sites in various sectors from 2016 to 2019. The experimental results demonstrate the effectiveness of the proposed method over other methods.

On the clustering of huge categorical data

  • Kim, Dae-Hak
    • Journal of the Korean Data and Information Science Society / v.21 no.6 / pp.1353-1359 / 2010
  • The basic objective of cluster analysis is to discover natural groupings of items. In general, clustering is conducted on a similarity (or dissimilarity) matrix or on the original input data, and various measures of similarity between objects have been developed. In this paper, we consider the clustering of a huge real categorical data set that records the time, location, and activity of Korean people. Useful similarity measures for the categorical variables in this data set are developed and adopted. Hierarchical and nonhierarchical clustering methods are applied to the data set, which is huge and consists of many categorical variables.
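The abstract does not say which similarity measures the authors developed. As a generic illustration only, the sketch below uses the simple matching coefficient, a common baseline for categorical variables: the fraction of attributes on which two records agree. The function name and the sample time-location-activity records are hypothetical.

```python
def simple_matching_similarity(a, b):
    """Return the fraction of categorical attributes on which two records agree."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    if not a:
        raise ValueError("records must have at least one attribute")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


# Hypothetical (time, location, activity) records, for illustration only.
r1 = ("morning", "home", "sleep")
r2 = ("morning", "home", "meal")
r3 = ("evening", "office", "work")

print(simple_matching_similarity(r1, r2))  # two of three attributes agree
print(simple_matching_similarity(r1, r3))  # no attributes agree
```

A dissimilarity matrix for hierarchical clustering can be built from this as one minus the similarity for every pair of records.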

Plurality Rule-based Density and Correlation Coefficient-based Clustering for K-NN

  • Aung, Swe Swe;Nagayama, Itaru;Tamaki, Shiro
    • IEIE Transactions on Smart Processing and Computing / v.6 no.3 / pp.183-192 / 2017
  • k-nearest neighbor (K-NN) is a well-known classification algorithm in machine learning that classifies a sample according to its nearest training examples in feature space. However, K-NN is a lazy learning method. Therefore, if a K-NN-based system depends heavily on a huge amount of historical data to achieve accurate predictions for a particular task, its processing time gradually degrades. Many researchers consider only classification accuracy, but estimation speed also plays an essential role in real-time prediction systems. To compensate for this weakness, this paper proposes correlation coefficient-based clustering (CCC) to improve the processing speed of K-NN, and plurality rule-based density (PRD) to improve estimation accuracy. For experiments, we used real datasets (on breast cancer, breast tissue, heart, and the iris) from the University of California, Irvine (UCI) machine learning repository. Moreover, real traffic data collected from Ojana Junction, Route 58, Okinawa, Japan, was also used to demonstrate the efficiency of this method. Using these datasets, we demonstrated better processing-time performance with the new approach than with classical K-NN. In addition, through experiments on real-world datasets, we compared the prediction accuracy of our approach with density peaks clustering based on K-NN and principal component analysis (DPC-KNN-PCA).

A Task Scheduling Method after Clustering for Data Intensive Jobs in Heterogeneous Distributed Systems

  • Hajikano, Kazuo;Kanemitsu, Hidehiro;Kim, Moo Wan;Kim, Hee-Dong
    • Journal of Computing Science and Engineering / v.10 no.1 / pp.9-20 / 2016
  • Several task clustering heuristics have been proposed for allocating tasks in heterogeneous systems to achieve a good response time in data intensive jobs. However, one challenging problem is how to schedule tasks after they have been allocated by task clustering. We propose a task scheduling method after task clustering, leveraging worst schedule length (WSL) as an upper bound of the schedule length. In our proposed method, a task in a WSL sequence is scheduled preferentially to make the WSL smaller. Experimental results by simulation show that the response time is improved for several task clustering heuristics. In particular, our proposed scheduling method combined with task clustering outperforms conventional list-based task scheduling methods.

Black-Litterman Portfolio with K-shape Clustering (K-shape 군집화 기반 블랙-리터만 포트폴리오 구성)

  • Yeji Kim;Poongjin Cho
    • Journal of Korean Society of Industrial and Systems Engineering / v.46 no.4 / pp.63-73 / 2023
  • This study explores modern portfolio theory by integrating the Black-Litterman portfolio with time-series clustering, specifically emphasizing the K-shape clustering methodology. K-shape clustering groups time-series data effectively, enhancing the ability to plan and manage stock market investments when combined with the Black-Litterman portfolio. Based on stock market patterns, the objective is to understand, through backtesting, the relationship between past market data and the planning of future investment strategies. Additionally, by examining diverse learning and investment periods, the study identifies optimal strategies to boost portfolio returns while efficiently managing the associated risks. For comparative analysis, the traditional Markowitz portfolio is also assessed in conjunction with clustering techniques using K-Means and K-Means with Dynamic Time Warping. The results suggest that combining K-shape with the Black-Litterman model significantly enhances portfolio optimization in the stock market, providing valuable insights for making stable portfolio investment decisions. The achieved Sharpe ratio of 0.722 indicates significantly higher performance than the other benchmarks, underlining the effectiveness of the K-shape and Black-Litterman integration in portfolio optimization.
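For readers unfamiliar with the metric reported above, the Sharpe ratio is the mean excess return divided by the standard deviation of excess returns. The sketch below is a minimal illustration using the textbook definition; the abstract does not state the paper's return frequency, risk-free rate, or annualization, so `risk_free_rate`, `periods_per_year`, and the sample returns are assumptions.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=1):
    """Mean excess return over its sample standard deviation, optionally annualized."""
    excess = [r - risk_free_rate for r in returns]
    spread = statistics.stdev(excess)  # sample standard deviation; needs >= 2 returns
    if spread == 0:
        raise ValueError("returns have zero variance; Sharpe ratio is undefined")
    return statistics.fmean(excess) / spread * math.sqrt(periods_per_year)


# Hypothetical per-period portfolio returns, for illustration only.
example_returns = [0.02, -0.01, 0.03, 0.01]
print(sharpe_ratio(example_returns))
```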

The Energy Efficiency of Improved Routing Technique Based on The LEACH

  • Gauta, Ganesh;Cho, Seongsoo;Jung, Kyedong;Lee, Jong-Yong
    • International Journal of Internet, Broadcasting and Communication / v.7 no.1 / pp.49-56 / 2015
  • Because a wireless sensor network (WSN) is energy constrained, the energy efficiency of its nodes is important. Because it avoids long-distance communication, clustering that operates in rounds is an efficient way to prolong the lifetime of a WSN, and its performance depends on the duration of a round. A short round time leads to frequent re-clustering, while a long round time increases the energy consumption of cluster heads. Existing clustering schemes therefore determine a proper round time based on the parameters of the initial WSN. However, it is not appropriate to apply a round time derived from initial values throughout the whole network lifetime, because a WSN is a very dynamic network in which nodes can be added or can vanish. In this paper, we propose a new algorithm that calculates the round time from the number of alive nodes to adapt to the dynamic WSN. Simulation results confirm that the proposed algorithm performs better in terms of node energy consumption and data loss rate.

Model-based Clustering of DOA Data Using von Mises Mixture Model for Sound Source Localization

  • Dinh, Quang Nguyen;Lee, Chang-Hoon
    • International Journal of Fuzzy Logic and Intelligent Systems / v.13 no.1 / pp.59-66 / 2013
  • In this paper, we propose a probabilistic framework for model-based clustering of direction of arrival (DOA) data to obtain stable sound source localization (SSL) estimates. Model-based clustering has been shown to be capable of handling highly overlapped and noisy datasets, such as those involved in DOA detection. Although the Gaussian mixture model is commonly used for model-based clustering, we propose using the von Mises mixture model, which suits circular DOA data better than a Gaussian distribution. The expectation-maximization (EM) framework for the von Mises mixture model on a unit hypersphere is reduced to the 2D case and used as such in the proposed method. We also use a histogram of the dataset to initialize the number of clusters and the initial parameter values, thereby saving calculation time and improving efficiency. Experiments using simulated and real-world datasets demonstrate the performance of the proposed method.
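The point about circular data can be made concrete with a small worked example. It is not the paper's EM procedure, only an illustration of why a linear (Gaussian-style) average misbehaves on angles: two directions at 350° and 10° average arithmetically to 180°, the opposite direction, whereas the circular mean, which is also the maximum-likelihood estimate of a von Mises mean direction, lies at about 0°. The function names are illustrative.

```python
import math


def circular_mean_deg(angles_deg):
    """Direction of the summed unit vectors, in degrees within [0, 360]."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c)) % 360.0


def mean_resultant_length(angles_deg):
    """Length of the average unit vector: near 1 means tightly concentrated angles."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.hypot(s, c) / len(angles_deg)


doa = [350.0, 10.0]                  # two DOA readings on either side of 0 degrees
print(sum(doa) / len(doa))           # arithmetic mean points the wrong way
print(circular_mean_deg(doa))        # approximately 0 (or 360, the same direction)
print(mean_resultant_length(doa))    # close to 1: the readings are concentrated
```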

Clustering Algorithm by Grid-based Sampling

  • Park, Hee-Chang;Ryu, Jee-Hyun
    • Proceedings of the Korean Data and Information Science Society Conference / 2003.05a / pp.97-108 / 2003
  • Cluster analysis has been widely used in many applications, such as pattern analysis and recognition, data analysis, image processing, and online and offline market research. Clustering can identify dense and sparse regions among data attributes or object attributes. However, obtaining the desired clusters can take many hours, because clustering is primitive and exploratory and is often applied to large amounts of data. In this paper, we propose a new clustering method that uses grid-based sampling. It is faster than traditional clustering methods while maintaining accuracy, because the grid-based sample reduces running time. Other clustering applications can also become more effective by combining this method with their original methods.
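The abstract gives no algorithmic detail, so the following is only one plausible reading of grid-based sampling, not the authors' method: partition the space into equal-sized cells, keep a few points from each occupied cell, and run any clustering algorithm on the reduced sample. The names `grid_sample`, `cell_size`, and `per_cell` are assumptions.

```python
import math
import random
from collections import defaultdict


def grid_sample(points, cell_size, per_cell=1, seed=0):
    """Keep at most `per_cell` randomly chosen points from each occupied grid cell."""
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")
    rng = random.Random(seed)
    cells = defaultdict(list)
    for p in points:
        # A point's cell is identified by the integer grid index of each coordinate.
        key = tuple(math.floor(coord / cell_size) for coord in p)
        cells[key].append(p)
    sample = []
    for key in sorted(cells):  # sorted so the result is reproducible for a given seed
        members = cells[key]
        sample.extend(rng.sample(members, min(per_cell, len(members))))
    return sample


# Hypothetical 2D points: two dense regions and one isolated point.
points = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.4), (5.1, 5.2), (5.3, 5.4), (9.9, 0.1)]
print(grid_sample(points, cell_size=1.0))  # one representative per occupied cell
```

Because every occupied cell contributes at least one point, sparse regions stay represented while dense regions are thinned, which is where the running-time saving would come from.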

Metro Station Clustering based on Travel-Time Distributions (통행시간 분포 기반의 전철역 클러스터링)

  • Gong, InTaek;Kim, DongYun;Min, Yunhong
    • The Journal of Society for e-Business Studies / v.27 no.2 / pp.193-204 / 2022
  • Smart card data is representative mobility data that can be used for policy development by analyzing public transportation usage behavior. As one such study, this paper addresses the problem of classifying metro stations by their usage patterns. Because previous papers on metro station clustering considered only traffic among usage behaviors, this paper proposes, as a complementary method, clustering that considers travel time. Passengers at each station were classified into four groups (arriving during the morning commute, arriving during the evening commute, departing during the morning commute, and departing during the evening commute), and each group's travel time was modeled with a Weibull distribution, whose estimated shape parameter was defined as a characteristic value of the station. The resulting characteristic vectors were clustered using K-means. The experiment showed that station clustering based on travel time not only produces results similar to those of previous studies but also enables more granular clustering.
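The feature-extraction step described above can be sketched as follows. This is a minimal illustration, assuming a two-parameter Weibull distribution fitted by maximum likelihood; the abstract does not say which estimator the authors used. The shape k solves sum(x^k ln x) / sum(x^k) - 1/k - mean(ln x) = 0, which the code finds by bisection. The function name and the synthetic travel times are hypothetical, and the subsequent K-means step is not shown.

```python
import math
import random


def weibull_shape_mle(samples, lo=0.01, hi=50.0, iterations=100):
    """Estimate the Weibull shape parameter by bisection on the MLE score equation."""
    if len(samples) < 2 or min(samples) <= 0 or min(samples) == max(samples):
        raise ValueError("need at least two distinct positive samples")
    top = max(samples)
    xs = [x / top for x in samples]  # rescaling leaves the shape unchanged and avoids overflow
    logs = [math.log(x) for x in xs]
    mean_log = sum(logs) / len(logs)

    def score(k):
        weights = [x ** k for x in xs]
        weighted_log = sum(w * l for w, l in zip(weights, logs)) / sum(weights)
        return weighted_log - 1.0 / k - mean_log

    for _ in range(iterations):  # score(k) increases with k, so bisection converges
        mid = (lo + hi) / 2.0
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0


# Synthetic travel times (minutes) drawn with scale 30 and shape 2, for illustration only.
rng = random.Random(0)
travel_times = [rng.weibullvariate(30.0, 2.0) for _ in range(5000)]
print(weibull_shape_mle(travel_times))  # should be near the true shape of 2
```

Under this reading, each station would yield four such shape estimates, one per passenger group, forming the characteristic vector passed to K-means.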

Performance of Collaborative Filtering Agent System using Clustering for Better Recommendations (개선된 추천을 위해 클러스터링을 이용한 협동적 필터링 에이전트 시스템의 성능)

  • Hwang, Byeong-Yeon
    • The Transactions of the Korea Information Processing Society / v.7 no.5S / pp.1599-1608 / 2000
  • Automated collaborative filtering is on the verge of becoming a popular technique for reducing information overload as well as for solving problems that content-based information filtering systems cannot handle. In this paper, we describe three algorithms that perform collaborative filtering: GroupLens, the traditional technique; Best N, a modified version; and an algorithm that uses clustering. Based on experimental results using real data, the clustering-based algorithm is compared with the existing representative collaborative filtering agent algorithms, GroupLens and Best N. The experimental results indicate that the clustering-based algorithm is similar to Best N and better than GroupLens in prediction accuracy. The results also demonstrate that the clustering-based algorithm performs best in terms of the standard deviation of the error rate, which means that it gives the most stable and uniform recommendations. In addition, the clustering-based algorithm reduces recommendation time.