• Title/Abstract/Keywords: Online clustering

104 search results

EXTENDED ONLINE DIVISIVE AGGLOMERATIVE CLUSTERING

  • Musa, Ibrahim Musa Ishag;Lee, Dong-Gyu;Ryu, Keun-Ho
    • Korean Society of Remote Sensing: Conference Proceedings (대한원격탐사학회:학술대회논문집)
    • Korean Society of Remote Sensing, 2008 International Symposium on Remote Sensing
    • pp.406-409
    • 2008
  • Clustering data streams is important in many applications, such as sensor networks. Existing hierarchical methods follow a semi-fuzzy clustering approach that yields duplicate clusters. To solve this problem, we propose an extended online divisive agglomerative clustering method for data streams. It builds a tree-like, top-down hierarchy of clusters that evolves with the data stream, using a geometric time frame for snapshots. It enhances Online Divisive Agglomerative Clustering (ODAC) with a pruning strategy that avoids duplicate clusters. Its main feature is that update time and memory space are independent of the number of examples in the data stream. It can be used to cluster sensor data, network-monitoring data, and web click streams.
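
ODAC, the base method this paper extends, is usually described as clustering the streams themselves (one variable per sensor) by correlation, using the dissimilarity rnomc(a, b) = sqrt((1 - corr(a, b)) / 2), and splitting a leaf once the gap between its largest and second-largest pairwise dissimilarities exceeds a Hoeffding bound. The sketch below illustrates only that split test, keeping running sums so that memory per leaf does not grow with the number of examples. The class names, the default delta = 0.05, and the treatment of constant streams are our assumptions; ODAC's tie-breaking rule and aggregation step, and this paper's pruning strategy and geometric time frame, are not shown.

```python
import math
from itertools import combinations


class PairStats:
    """Running sums for the correlation of two streams (constant memory)."""

    def __init__(self):
        self.n = 0
        self.sa = self.sb = self.saa = self.sbb = self.sab = 0.0

    def update(self, a, b):
        self.n += 1
        self.sa += a
        self.sb += b
        self.saa += a * a
        self.sbb += b * b
        self.sab += a * b

    def rnomc(self):
        """sqrt((1 - corr) / 2): 0 for identical trends, 1 for opposite ones."""
        n = self.n
        va = self.saa - self.sa * self.sa / n
        vb = self.sbb - self.sb * self.sb / n
        if va <= 0 or vb <= 0:
            corr = 0.0  # assumption: treat a constant stream as uncorrelated
        else:
            cov = self.sab - self.sa * self.sb / n
            corr = max(-1.0, min(1.0, cov / math.sqrt(va * vb)))
        return math.sqrt((1.0 - corr) / 2.0)


class Leaf:
    """A leaf of the cluster tree: a group of streams plus their pairwise stats."""

    def __init__(self, names):
        self.n = 0
        self.pairs = {pair: PairStats() for pair in combinations(names, 2)}

    def update(self, row):
        """row maps each stream name to its value at the current time step."""
        self.n += 1
        for (x, y), stats in self.pairs.items():
            stats.update(row[x], row[y])

    def split_test(self, delta=0.05):
        """Return the pivot pair to split on, or None if evidence is insufficient."""
        if self.n < 2 or len(self.pairs) < 2:
            return None
        ranked = sorted(self.pairs, key=lambda p: self.pairs[p].rnomc(), reverse=True)
        d1 = self.pairs[ranked[0]].rnomc()
        d2 = self.pairs[ranked[1]].rnomc()
        # Hoeffding bound with range R = 1, since rnomc lies in [0, 1].
        eps = math.sqrt(math.log(1.0 / delta) / (2.0 * self.n))
        return ranked[0] if d1 - d2 > eps else None
```

For example, feeding a leaf the streams a_t = t, b_t = -t and c_t = (-1)^t for 1,000 steps makes it propose splitting on the anti-correlated pair ("a", "b"), while after only three steps the bound is still too wide to split.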

Online nonparametric Bayesian analysis of parsimonious Gaussian mixture models and scenes clustering

  • Zhou, Ri-Gui;Wang, Wei
    • ETRI Journal
    • Vol. 43, No. 1
    • pp.74-81
    • 2021
  • The mixture model is a very powerful and flexible tool for clustering analysis. Based on the Dirichlet process and parsimonious Gaussian distributions, we propose a new nonparametric mixture framework for solving challenging clustering problems. Inference in the model relies on an efficient online variational Bayesian approach, which to some extent enhances the exchange of information between the whole and its parts and scales to large datasets. Experiments on the scene database indicate that the new clustering framework, when combined with a convolutional neural network for feature extraction, has meaningful advantages over other models.
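
The Dirichlet process is what lets such a mixture leave the number of clusters open. A standard way to write its mixture weights is the stick-breaking construction, sketched below with a finite truncation level (our assumption) and Python's standard random module. It does not implement the paper's parsimonious covariance structures or its online variational Bayesian updates.

```python
import random


def stick_breaking_weights(alpha, truncation, rng):
    """Truncated stick-breaking weights for a Dirichlet-process mixture.

    v_k ~ Beta(1, alpha) and pi_k = v_k * prod_{j<k} (1 - v_j); the last
    stick takes whatever length remains so the weights sum to one.
    """
    weights, remaining = [], 1.0
    for k in range(truncation):
        v = 1.0 if k == truncation - 1 else rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights
```

With a small alpha, most of the mass falls on the first few sticks, which is how a Dirichlet-process mixture lets the data settle on an effective number of clusters rather than fixing it in advance.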

Online Clustering Algorithms for Semantic-Rich Network Trajectories

  • Roh, Gook-Pil;Hwang, Seung-Won
    • Journal of Computing Science and Engineering
    • Vol. 5, No. 4
    • pp.346-353
    • 2011
  • With the advent of ubiquitous computing, massive amounts of trajectory data have been published and shared on many websites. Ubiquitous computing also motivates online mining of trajectory data to fit user-specific preferences or context (e.g., time of day). While many trajectory clustering algorithms have been proposed, they have typically focused on offline mining and do not consider the restrictions of the underlying road network or the selection conditions that represent user context. In contrast, we study an efficient clustering algorithm for Boolean + Clustering queries that uses a pre-materialized, summarized data structure. Experimental results on real-life trajectory data demonstrate the efficiency and effectiveness of the proposed method.
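
A minimal, hypothetical illustration of what a Boolean + Clustering query asks for: select the trajectories that match a context predicate (here an hour-of-day window), then group them by how many road-network edges they share. The Trajectory type, the Jaccard similarity, and the greedy leader-style grouping are our assumptions; the paper's pre-materialized summary structure, which is what makes its approach efficient, is not reproduced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trajectory:
    """Hypothetical record: an ID, a context attribute, and traversed road edges."""
    tid: str
    hour: int
    edges: frozenset


def jaccard(a, b):
    """Share of road-network edges two trajectories have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def boolean_then_cluster(trajectories, predicate, min_sim):
    """Keep trajectories satisfying predicate, then add each one to the first
    cluster whose representative (first member) is at least min_sim similar."""
    clusters = []
    for t in filter(predicate, trajectories):
        for cluster in clusters:
            if jaccard(t.edges, cluster[0].edges) >= min_sim:
                cluster.append(t)
                break
        else:  # no sufficiently similar cluster: t starts a new one
            clusters.append([t])
    return clusters
```

With a predicate such as lambda t: 6 <= t.hour < 12, an evening trajectory is discarded before any similarity is computed, which is the benefit of evaluating the Boolean condition as part of the query.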

A Computational Intelligence Based Online Data Imputation Method: An Application For Banking

  • Nishanth, Kancherla Jonah;Ravi, Vadlamani
    • Journal of Information Processing Systems
    • Vol. 9, No. 4
    • pp.633-650
    • 2013
  • All imputation techniques proposed so far in the literature are offline techniques: they require many iterations to learn the characteristics of the data during training and consume a lot of computational time. Hence, they are not suitable for applications that require imputation to be performed on demand and in near real time. This paper proposes a computational-intelligence-based architecture for online data imputation, as well as extended versions of an existing offline data imputation method. The proposed online imputation technique has two stages. In Stage 1, the Evolving Clustering Method (ECM) replaces the missing values with cluster centers, as part of a local learning strategy. Stage 2 refines the resulting approximate values using a General Regression Neural Network (GRNN), as part of a global approximation strategy. The offline imputation techniques employ K-Means or K-Medoids in Stage 1 and a Multi-Layer Perceptron (MLP) or GRNN in Stage 2. Several experiments were conducted on 8 benchmark datasets and 4 bank-related datasets to assess the effectiveness of the proposed online and offline imputation techniques. In terms of Mean Absolute Percentage Error (MAPE), the difference between the best proposed offline method, K-Medoids+GRNN, and the proposed online method, ECM+GRNN, is statistically insignificant at the 1% level of significance. Consequently, the proposed online technique, being less expensive and faster, can be used for imputation instead of the existing and proposed offline techniques; this is the significant outcome of the study. Furthermore, using GRNN in Stage 2 uniformly reduced MAPE in both the offline and online imputation methods on all datasets.
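
Stage 1 relies on ECM, a one-pass clustering method. The sketch below follows the usual description of ECM: a sample that falls inside an existing cluster's radius changes nothing; otherwise the cluster whose enlarged radius would be smallest is grown and its center moved, unless that radius would exceed the threshold dthr, in which case a new cluster starts. It then imputes a record by copying missing features from the nearest center, measuring distance on the observed features only. The value of dthr, the use of None for missing values, and all names are assumptions, and the Stage 2 GRNN refinement is not included.

```python
import math


def _dist(a, b, dims=None):
    """Euclidean distance, optionally restricted to the given feature indices."""
    dims = range(len(a)) if dims is None else dims
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in dims))


class ECM:
    """One-pass Evolving Clustering Method; dthr bounds the cluster radius."""

    def __init__(self, dthr):
        self.dthr = dthr
        self.centers, self.radii = [], []

    def partial_fit(self, x):
        x = [float(v) for v in x]
        if not self.centers:
            self.centers.append(x)
            self.radii.append(0.0)
            return
        d = [_dist(x, c) for c in self.centers]
        # Case 1: x already lies inside some cluster, so nothing is updated.
        if any(dj <= rj for dj, rj in zip(d, self.radii)):
            return
        # Case 2: choose the cluster whose enlarged radius s/2 would be smallest.
        s = [dj + rj for dj, rj in zip(d, self.radii)]
        a = min(range(len(s)), key=s.__getitem__)
        if s[a] > 2 * self.dthr:
            # Case 3: enlarging would exceed dthr, so x seeds a new cluster.
            self.centers.append(x)
            self.radii.append(0.0)
            return
        new_r = s[a] / 2
        # Move the center along the line from x towards the old center so that
        # it ends up exactly new_r away from x (d[a] > 0 because case 1 failed).
        ratio = new_r / d[a]
        self.centers[a] = [xi + (ci - xi) * ratio for xi, ci in zip(x, self.centers[a])]
        self.radii[a] = new_r

    def impute(self, record):
        """Fill None entries from the nearest center, using observed features only."""
        observed = [i for i, v in enumerate(record) if v is not None]
        nearest = min(self.centers, key=lambda c: _dist(record, c, observed))
        return [nearest[i] if v is None else v for i, v in enumerate(record)]
```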

Incremental Fuzzy Clustering Based on a Fuzzy Scatter Matrix

  • Liu, Yongli;Wang, Hengda;Duan, Tianyi;Chen, Jingli;Chao, Hao
    • Journal of Information Processing Systems
    • Vol. 15, No. 2
    • pp.359-373
    • 2019
  • For clustering large-scale data that cannot be loaded into memory entirely, incremental clustering algorithms are very popular. These algorithms usually consider only within-cluster compactness and ignore between-cluster separation. In this paper, we propose two incremental fuzzy compactness and separation (FCS) clustering algorithms, Single-Pass FCS (SPFCS) and Online FCS (OFCS), based on a fuzzy scatter matrix. First, we introduce two incremental clustering methods, the single-pass and online fuzzy C-means algorithms. Then, we combine each of these methods with the weighted fuzzy C-means algorithm so that they can be applied to the FCS algorithm. Next, we optimize the within-cluster and between-cluster matrices simultaneously to obtain the minimum within-cluster distance and the maximum between-cluster distance. As a result, large-scale datasets can be clustered well within limited memory. Experiments on artificial and real datasets show that, compared with SPFCM and OFCM, SPFCS and OFCS are more robust to the value of the fuzzy index m and to noise.
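
Both SPFCS and OFCS start from incremental fuzzy C-means, where each chunk of data is clustered with weighted FCM and then carried forward only as c centers whose weights record how much membership they absorbed. The sketch below illustrates the single-pass variant with the standard FCM membership and center updates. It omits the fuzzy scatter-matrix separation term that distinguishes FCS, and the initialisation (the first c points of each working set) and all names are our assumptions.

```python
import math


def _memberships(points, centers, m):
    """FCM memberships u[i][j] of point j in cluster i (each column sums to 1)."""
    u = [[0.0] * len(points) for _ in centers]
    for j, x in enumerate(points):
        d = [max(math.dist(x, c), 1e-12) for c in centers]  # avoid division by zero
        for i in range(len(centers)):
            u[i][j] = 1.0 / sum((d[i] / dk) ** (2.0 / (m - 1.0)) for dk in d)
    return u


def weighted_fcm(points, weights, init_centers, m=2.0, iters=100, tol=1e-9):
    """Weighted FCM: returns centers and the weighted membership each absorbed."""
    centers = [list(c) for c in init_centers]
    for _ in range(iters):
        u = _memberships(points, centers, m)
        new_centers = []
        for i in range(len(centers)):
            coef = [w * u[i][j] ** m for j, w in enumerate(weights)]
            total = sum(coef)
            new_centers.append([sum(cf * x[k] for cf, x in zip(coef, points)) / total
                                for k in range(len(points[0]))])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    u = _memberships(points, centers, m)
    new_weights = [sum(w * u[i][j] for j, w in enumerate(weights))
                   for i in range(len(centers))]
    return centers, new_weights


def single_pass_fcm(chunks, c, m=2.0):
    """Cluster chunk by chunk; earlier chunks survive only as c weighted centers."""
    centers, weights = [], []
    for chunk in chunks:
        points = centers + [list(p) for p in chunk]
        w = weights + [1.0] * len(chunk)
        # Deterministic initialisation (our assumption): the first c points,
        # i.e. the carried-over centers once the first chunk is processed.
        centers, weights = weighted_fcm(points, w, points[:c], m)
    return centers, weights
```

Because memberships of each point sum to one, the condensed weights always sum to the number of points seen so far, so a center summarising many old points keeps pulling harder than a single new one.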

Dynamic Subspace Clustering for Online Data Streams (온라인 데이터 스트림에서의 동적 부분 공간 클러스터링 기법)

  • 박남훈
    • Journal of Digital Convergence (디지털융복합연구)
    • Vol. 20, No. 2
    • pp.217-223
    • 2022
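  • Subspace clustering of online data streams requires a large amount of memory because every subset of the dimensions of the data space must be examined. To track the continuous change of clusters in a data stream within limited memory, this paper proposes a grid-based subspace clustering algorithm that uses memory resources efficiently. Given an n-dimensional data stream, the distribution of data items in the data space of each dimension is monitored by a list of grid cells. When a grid cell in the first-level grid-cell list becomes a unit grid cell because data items fall into it frequently, a next-level grid-cell list is created as its child node in order to find clusters in all possible subspaces that include that cell. In this way, a subspace grid-cell tree with at most n levels is built, and k-dimensional subspace clusters can be found at level k of the tree. Experiments confirmed that the proposed method maintains accuracy comparable to existing methods while using computing resources more efficiently, because it expands only dense spaces.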
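
A minimal sketch of that level-by-level expansion, assuming fixed-width cells, a single count threshold for a cell to be considered dense, and no decay of old counts (a stream method would normally age them). Child lists only see points that arrive after their parent becomes dense, and each child indexes only higher-numbered dimensions so that a given subspace is not built twice; both choices, and all names, are ours rather than details taken from the paper.

```python
import math


class Cell:
    def __init__(self):
        self.count = 0
        self.children = None  # created once the cell becomes dense


class SubspaceGridTree:
    """Grid-cell subspace tree: a path of length k is a k-dimensional unit."""

    def __init__(self, n_dims, width, threshold):
        self.n_dims, self.width, self.threshold = n_dims, width, threshold
        self.root = {}  # (dim, cell_index) -> Cell

    def insert(self, point):
        self._insert(self.root, point, 0)

    def _insert(self, cells, point, start_dim):
        # Only dimensions above the parent's are indexed, so each set of
        # dimensions appears once along any root-to-leaf path.
        for d in range(start_dim, self.n_dims):
            key = (d, math.floor(point[d] / self.width))
            cell = cells.setdefault(key, Cell())
            cell.count += 1
            if cell.children is not None:
                self._insert(cell.children, point, d + 1)
            elif cell.count >= self.threshold:
                cell.children = {}  # dense now: monitor the next level from here on

    def dense_units(self, level):
        """All dense k-dimensional units (k = level) as tuples of (dim, cell)."""
        out = []

        def walk(cells, path):
            for key, cell in cells.items():
                if cell.count < self.threshold:
                    continue
                if len(path) + 1 == level:
                    out.append(path + (key,))
                elif cell.children:
                    walk(cell.children, path + (key,))

        walk(self.root, ())
        return sorted(out)
```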

A Function Approximation Method for Q-learning of Reinforcement Learning (강화학습의 Q-learning을 위한 함수근사 방법)

  • 이영아;정태충
    • Journal of KIISE: Software and Applications (한국정보과학회논문지:소프트웨어및응용)
    • Vol. 31, No. 11
    • pp.1431-1438
    • 2004
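  • Reinforcement learning learns a strategy for achieving a goal by interacting with an environment online. To accelerate the learning speed of Q-learning, a basic reinforcement learning algorithm, a function approximation method is needed that can handle huge state spaces (the curse of dimensionality) and suits the characteristics of reinforcement learning. To address these problems, this paper proposes Fuzzy Q-Map, which is based on online fuzzy clustering. Fuzzy Q-Map is a function approximation method suited to reinforcement learning: it supports online learning and can represent the uncertainty of the environment. Fuzzy Q-Map was applied to the mountain car problem and was shown to accelerate learning in the early stage of training.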
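
The abstract does not give Fuzzy Q-Map's update equations, so the sketch below is a generic stand-in rather than the paper's method: states are softly assigned to prototypes that are created online whenever a state lies farther than radius from every existing one, Q(s, a) is the membership-weighted sum of per-prototype values, and the Q-learning error is shared among prototypes in proportion to their membership. The Gaussian membership, the radius and sigma parameters, and all names are assumptions.

```python
import math


class FuzzyQApprox:
    """Hypothetical approximator: Q(s, a) = sum_i mu_i(s) * q_i[a]."""

    def __init__(self, n_actions, radius, sigma, alpha=0.1, gamma=0.99):
        self.n_actions, self.radius, self.sigma = n_actions, radius, sigma
        self.alpha, self.gamma = alpha, gamma
        self.protos, self.q = [], []

    def memberships(self, s):
        """Normalized Gaussian memberships; adds a prototype for unseen regions."""
        if not self.protos or min(math.dist(s, p) for p in self.protos) > self.radius:
            self.protos.append(tuple(s))
            self.q.append([0.0] * self.n_actions)
        w = [math.exp(-math.dist(s, p) ** 2 / (2 * self.sigma ** 2)) for p in self.protos]
        total = sum(w)
        return [wi / total for wi in w]

    def values(self, s):
        mu = self.memberships(s)
        return [sum(m * qi[a] for m, qi in zip(mu, self.q)) for a in range(self.n_actions)]

    def update(self, s, a, r, s_next, done):
        """One Q-learning step; returns the temporal-difference error."""
        target = r if done else r + self.gamma * max(self.values(s_next))
        mu = self.memberships(s)
        err = target - sum(m * qi[a] for m, qi in zip(mu, self.q))
        for m, qi in zip(mu, self.q):
            qi[a] += self.alpha * m * err  # credit each prototype by its membership
        return err
```

For mountain car, s would be the (position, velocity) pair; because prototypes appear only where the agent actually goes, the number of stored values grows with the visited region rather than with a fixed grid over the whole state space.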

Online Burning Material Pile Detection on Color Clustering and Quaternion based Edge Detection in Boiler

  • Wang, Weixing;Liu, Sheng
    • KSII Transactions on Internet and Information Systems (TIIS)
    • Vol. 9, No. 1
    • pp.190-207
    • 2015
  • In combustion engineering, decreasing pollution, increasing production efficiency, and keeping the amount of solid burning material in a burner optimally constant online all require a smart method for detecting variations in the amount of burning material in a high-temperature environment. This paper presents an online machine vision system that automatically measures and detects the amount of burning material inside a burner or boiler. A cooling sub-system based on circulating cooling water is built into the system's camera-protecting box. The key intelligent step in the system is detecting the pile profile of the variable burning material. The pile-profile tracing algorithm combines discontinuity-based and similarity-based segmentation of gray level (color): the discontinuity-based sub-algorithm uses quaternion convolution, and the similarity-based sub-algorithm uses region growing with multi-scale clustering. The results of the two sub-algorithms are fused to delineate the final pile profile. The algorithm has been tested and applied in different industrial burners and boilers, and the experiments show that it works satisfactorily.
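
The similarity-based half of the pile-profile algorithm rests on region growing. The sketch below shows only plain single-scale region growing on a gray-level image given as nested lists: starting from a seed pixel, a 4-connected neighbor joins the region if its gray level lies within tol of the current region mean. The seed, tol, and function name are hypothetical, and the paper's multi-scale clustering, quaternion-convolution edge detection, and fusion step are not shown.

```python
from collections import deque


def region_grow(image, seed, tol):
    """Grow a 4-connected region from seed (row, col), accepting pixels whose
    gray level is within tol of the running mean of the region."""
    rows, cols = len(image), len(image[0])
    region = {seed}
    total = float(image[seed[0]][seed[1]])
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in region:
                if abs(image[nr][nc] - total / len(region)) <= tol:
                    region.add((nr, nc))
                    total += image[nr][nc]
                    queue.append((nr, nc))
    return region
```

Comparing against the running mean rather than the seed value lets the region follow gradual shading across a pile while still stopping at a sharp gray-level jump.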

Identifying the Main Price Ranges of Online Product Category (온라인 상품 카테고리 내 주요 가격대 식별)

  • 김준우;임광혁
    • The Journal of the Korea Contents Association (한국콘텐츠학회논문지)
    • Vol. 12, No. 12
    • pp.733-741
    • 2012
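  • Recently, many consumers have been visiting general shopping malls and price-comparison sites to obtain information about product categories of interest. However, these websites often present an excessive amount of information covering many products and sellers, and so fail to support consumers' purchase decisions effectively. Modern online shopping agents therefore need to process retrieved information more intelligently before presenting it to users. This paper proposes a method for identifying the main price ranges in which many products within a given category are concentrated. To this end, the prices of the products in a category are represented as vectors, k-means cluster analysis is applied to form clusters of similar price vectors, and the main price range is then extracted from each cluster. Because price is generally one of the most important factors in consumers' purchase decisions, the extracted main price ranges are expected to help online shoppers search for products effectively.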
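
The abstract does not specify how a category's prices are encoded as vectors, so the sketch below simplifies to clustering raw product prices (one dimension) with Lloyd's k-means and reporting each cluster's minimum and maximum as a price range. The deterministic initialisation, the function names, and the sample prices in the usage note are our assumptions.

```python
def kmeans_1d(values, k, iters=100):
    """Lloyd's k-means on scalars, initialised at the middle element of each of
    k equal-size slices of the sorted data (a deterministic choice of ours)."""
    data = sorted(values)
    n = len(data)
    centers = [data[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # An empty cluster keeps its previous center.
        new = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    return groups


def main_price_ranges(prices, k):
    """Report each non-empty cluster as a (min, max) price range, cheapest first."""
    return sorted((min(g), max(g)) for g in kmeans_1d(prices, k) if g)
```

For instance, main_price_ranges([10, 11, 12, 100, 105, 110], 2) returns [(10, 12), (100, 110)], i.e. a budget band and a premium band.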

Incidence of Online Public Opinion on Guangzhou Simultaneous Renting and Purchasing Policy - A data mining application

  • Wang, Yancheng;Li, Haixian
    • Asian Journal for Public Opinion Research
    • Vol. 5, No. 4
    • pp.266-284
    • 2018
  • This paper adopts a big-data research method and collects 491 data items from the Tianya Forum about Guangzhou's Simultaneous Renting and Purchasing policy. The qualitative analysis software NVivo 11 is used to cluster the main questions about the policy raised in the forum, and text clustering yields 36 high-frequency words. Through grounded theory analysis, people's doubts are summarized into 9 main categories and 3 core categories, and a model of the factors driving opinion in online forums is established. The study finds that resource factors are the key factor, economic factors are important drivers, and policy-guidance factors are secondary drivers.