• Title/Summary/Keyword: Time-based Clustering

Classification of Seoul Metro Stations Based on Boarding/Alighting Patterns Using Machine Learning Clustering (기계학습 클러스터링을 이용한 승하차 패턴에 따른 서울시 지하철역 분류)

  • Min, Meekyung
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.18 no.4 / pp.13-18 / 2018
  • In this study, we classify Seoul metro stations according to their boarding and alighting patterns using machine learning techniques. The target data are the hourly numbers of boarding and alighting passengers, recorded every day from 2008 to 2017 at 233 subway stations and provided by the public data portal. A Gaussian mixture model (GMM) and K-means clustering are used to classify the stations: the GMM models the distributions of passengers' boarding and alighting times, and the K-means algorithm is then applied, as unsupervised learning, to the data obtained from the GMM modeling. As a result, Seoul metro stations are classified into four groups according to their boarding and alighting patterns. The results can serve as basic knowledge for analyzing the characteristics of Seoul subway stations from economic, social, and cultural perspectives. The method can also be applied to public data and big data in other areas that require clustering.
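
The GMM stage of the pipeline above can be sketched as follows. This is a minimal sketch, not the authors' implementation: it assumes each station's daily profile is a 24-bin histogram of passenger counts per hour, fits a two-component 1D Gaussian mixture to it with expectation-maximization (treating each hour as a sample weighted by its count), and returns the component means, variances, and weights. The function names, the choice of two components, the initialization, the variance floor, and the feature layout in the hypothetical `station_features` helper are all illustrative assumptions.

```python
import math


def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(counts, k=2, iters=200, init_means=None, min_var=0.25):
    """Fit a k-component 1D Gaussian mixture to an hourly histogram with EM.

    counts[h] is the number of passengers in hour h; each hour is treated as
    one sample at position h with weight counts[h].
    """
    hours = range(len(counts))
    total = float(sum(counts))
    # Spread the initial means evenly over the day unless given explicitly.
    means = list(init_means) if init_means else [len(counts) * (j + 1) / (k + 1) for j in range(k)]
    variances = [4.0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each hour.
        resp = []
        for h in hours:
            p = [weights[j] * normal_pdf(h, means[j], variances[j]) for j in range(k)]
            s = sum(p) or 1e-300  # guard against underflow far from all components
            resp.append([pj / s for pj in p])
        # M-step: weighted re-estimation, with passenger counts as sample weights.
        for j in range(k):
            nj = sum(counts[h] * resp[h][j] for h in hours)
            if nj == 0:
                continue
            means[j] = sum(counts[h] * resp[h][j] * h for h in hours) / nj
            var = sum(counts[h] * resp[h][j] * (h - means[j]) ** 2 for h in hours) / nj
            variances[j] = max(var, min_var)  # floor keeps a component from collapsing
            weights[j] = nj / total
    return means, variances, weights


def station_features(boarding, alighting, k=2):
    """Hypothetical feature vector: fitted peak hours and weights, sorted by hour."""
    features = []
    for counts in (boarding, alighting):
        means, _, weights = fit_gmm_1d(counts, k=k)
        order = sorted(range(k), key=lambda j: means[j])
        features += [means[j] for j in order] + [weights[j] for j in order]
    return features
```

A station with a morning commuting peak and a smaller evening peak yields two components near those hours. One feature vector per station, built this way, is the kind of input a K-means step could then group; the exact features the paper fed to K-means are not stated in the abstract.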

Automatic Photograph Classification Using Geographical Information (지리정보를 이용한 자동사진분류)

  • Hong, Young-Jin;Kim, Seong-Woon;Yoo, Myung-Hyun;Lee, Yong-Beom;Kim, Sang-Ryong
    • Proceedings of the HCI Society of Korea Conference (한국HCI학회:학술대회논문집) / 2006.02a / pp.692-698 / 2006
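  • As more and more digital cameras and mobile phones are equipped with high-resolution cameras and large storage capacity, users are taking photos more and more often. Soon, technology will be needed to manage and browse the photos stored on mobile phones effectively. This paper presents a system that automatically classifies personal photos, taken with mobile phones, digital cameras, or any other portable device with a camera, using geographical location information. Unlike existing systems, which use time information to classify photos sequentially by the proximity of their capture times, the proposed system uses location information to classify photos non-sequentially by where they were taken. To determine the proximity of capture locations, a density-based clustering algorithm performs a coarse classification of all photos and separates everyday photos from non-everyday photos, a distinction not handled by existing automatic photo classification methods. The coarsely classified photos are then subdivided using time information, with the aim of improving automatic photo classification performance.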
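
The abstract names a density-based clustering algorithm without specifying it. The sketch below assumes DBSCAN with great-circle (haversine) distance on (latitude, longitude) pairs for the coarse, location-based classification, and a simple time-gap rule for the fine, time-based subdivision inside each location cluster. The `eps_km`, `min_pts`, and `gap` parameters and all function names are hypothetical, and how the paper separates everyday from non-everyday photos is not reproduced here.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(h)))


def dbscan(points, eps_km, min_pts):
    """Label each point with a cluster id, or -1 for noise (plain O(n^2) DBSCAN)."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        return [j for j in range(len(points)) if haversine_km(points[i], points[j]) <= eps_km]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # may later be relabelled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbors(j)
            if len(more) >= min_pts:  # core point: keep growing the cluster
                queue.extend(more)
        cluster += 1
    return labels


def split_by_time(timestamps, gap):
    """Split timestamps into events wherever consecutive shots are more than gap apart."""
    events = []
    for t in sorted(timestamps):
        if events and t - events[-1][-1] <= gap:
            events[-1].append(t)
        else:
            events.append([t])
    return events
```

Because DBSCAN groups photos by spatial density rather than capture order, two visits to the same place months apart land in the same coarse cluster. `split_by_time` then separates them into distinct events within that cluster.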

A Hybrid Genetic Algorithm for K-Means Clustering

  • Jun, Sung-Hae;Han, Jin-Woo;Park, Minjae;Oh, Kyung-Whan
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2003.09a / pp.330-333 / 2003
  • For partitioning clustering methods, the initial choice of the number of clusters strongly affects the clustering result. In the K-means algorithm, the result of the cluster analysis varies with the chosen number of clusters K. Usually, this number is determined from prior, subjective information, which may not be optimal, so a more objective method is needed. In our research, we propose a hybrid genetic algorithm, a tree-induction-based evolutionary algorithm, for determining the optimal number of clusters. The initial population of the algorithm is determined by the number of terminal nodes of a tree induction, and the optimal number of clusters is generated from this decision-tree-based initial population. Our fitness function is defined as the inverse of a dissimilarity measure, and a bagging approach is used to save computational time.

Context-Aware Mobile Gateway Relocation Scheme for Clustered Wireless Sensor Networks

  • Encarnacion, Nico N.;Yang, Hyunho
    • Journal of information and communication convergence engineering / v.10 no.4 / pp.365-371 / 2012
  • In recent years, researchers have been drawn to clustering methods to improve communication and data transmission in networks. Compared with traditional wireless networks, wireless sensor networks are energy constrained and have lower data rates. This work explores implementing a clustering algorithm in an existing project on gateway relocation: Low Energy Adaptive Clustering Hierarchy (LEACH) is applied to an existing study on relocating a gateway. The study is further improved by moving the gateway to a specific cluster based on the number or significance of the detected events, and the protocol is modified so that each cluster head can communicate with a mobile gateway. When the mobile gateway is out of the cluster nodes' transmission range, the cluster heads are the only nodes that can communicate with it; once the gateway is in range, the nodes begin transmitting real-time data. This relieves the load on the nodes that would be located closest to the gateway if it were static.
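
LEACH rotates the cluster-head role with a randomized threshold. In round r, an eligible node becomes a cluster head if a uniform random number falls below T(n) = p / (1 - p (r mod 1/p)), where p is the desired fraction of cluster heads; only nodes that have not yet served in the current epoch of 1/p rounds are eligible. The sketch below shows just this standard election step. The paper's gateway relocation and event-significance logic is not modeled, and the function names are illustrative.

```python
import random


def leach_threshold(p, r):
    """LEACH threshold T(n) for an eligible node in round r.

    p is the desired fraction of cluster heads per round; an epoch lasts 1/p rounds.
    """
    epoch = round(1 / p)
    k = r % epoch
    if k == epoch - 1:
        return 1.0  # last round of the epoch: every remaining node must serve
    return p / (1 - p * k)


def run_leach(num_nodes, p, rounds, seed=0):
    """Return the sorted list of cluster-head ids elected in each round."""
    rng = random.Random(seed)
    epoch = round(1 / p)
    eligible = set()
    history = []
    for r in range(rounds):
        if r % epoch == 0:
            eligible = set(range(num_nodes))  # new epoch: all nodes eligible again
        t = leach_threshold(p, r)
        heads = sorted(n for n in eligible if rng.random() < t)
        eligible -= set(heads)  # a node serves at most once per epoch
        history.append(heads)
    return history
```

On average about p × num_nodes heads are elected per round, and every node serves exactly once per epoch, which spreads the energy cost of the cluster-head role across the network.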

A Performance Comparison of Cluster Validity Indices based on K-means Algorithm (K-means 알고리즘 기반 클러스터링 인덱스 비교 연구)

  • Shim, Yo-Sung;Chung, Ji-Won;Choi, In-Chan
    • Asia pacific journal of information systems / v.16 no.1 / pp.127-144 / 2006
  • The K-means algorithm is widely used at the initial stage of data analysis in the data mining process, partly because of its low time complexity and simple implementation. Cluster validity indices are used along with the algorithm to determine the number of clusters and to evaluate the clustering results of datasets. In this paper, we compare the performance of sixteen indices, selected from forty indices in the literature based on their applicability to nonhierarchical clustering algorithms. The datasets used in the experiment are generated from multivariate normal distributions. In particular, four error types (standardization, outlier generation, error perturbation, and noise dimension addition) are considered in the comparison, and the effects of varying the number of points, attributes, and clusters on performance are analyzed. The simulation results show that the Calinski and Harabasz index performs best across all datasets and that the Davies and Bouldin index becomes a strong competitor as the number of points in the dataset increases.
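
To make the comparison concrete, the sketch below computes one of the indices, the Calinski and Harabasz index CH(k) = [B / (k - 1)] / [W / (n - k)], where B is the between-cluster and W the within-cluster sum of squares over n points. It then uses the index to pick k for K-means. The K-means routine (Lloyd iterations with k-means++ seeding and a few seeded restarts), the restart count, and the candidate range of k are assumptions for illustration, not the paper's experimental setup.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def kmeans(points, k, rng, iters=100):
    """Lloyd's K-means with k-means++ seeding; returns (centers, labels)."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # k-means++: sample the next center with probability proportional to D(x)^2.
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=d2)[0])
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda j: sq_dist(p, centers[j]))].append(p)
        # Keep the old center if a cluster becomes empty.
        new_centers = [centroid(g) if g else centers[j] for j, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
    return centers, labels


def calinski_harabasz(points, labels, k):
    """CH = (B / (k - 1)) / (W / (n - k)); larger values indicate better separation."""
    overall = centroid(points)
    between = within = 0.0
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            continue
        c = centroid(members)
        between += len(members) * sq_dist(c, overall)
        within += sum(sq_dist(p, c) for p in members)
    return (between / (k - 1)) / (within / (len(points) - k))


def choose_k(points, candidates, restarts=10, seed=0):
    """Run K-means for each candidate k and return the k with the highest CH index."""
    rng = random.Random(seed)
    scores = {}
    for k in candidates:
        best = None
        for _ in range(restarts):
            centers, labels = kmeans(points, k, rng)
            wss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
            if best is None or wss < best[0]:
                best = (wss, labels)
        scores[k] = calinski_harabasz(points, best[1], k)
    return max(scores, key=scores.get), scores
```

For three well-separated blobs of five points each, the index peaks at k = 3: splitting a blob barely reduces W while the (k - 1) and (n - k) terms penalize the extra cluster, and merging two blobs inflates W.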

An Optimization Approach to Data Clustering

  • Kim, Ju-Mi;Olafsson, Sigurdur
    • Proceedings of the Korean Operations and Management Science Society Conference / 2005.05a / pp.621-628 / 2005
  • Scalability is a critical issue facing the data mining community, particularly for computationally intensive tasks such as data clustering. Random sampling of instances is one possible means of achieving scalability, but a pervasive problem with this approach is how to deal with the noise that sampling introduces into the evaluation of the learning algorithm. This paper develops a new optimization-based clustering approach using an algorithm specifically designed for noisy performance evaluations. Numerical results show that this algorithm can achieve substantial savings in computational time without sacrificing solution quality.

Real-Time Traffic Sign Detection Using K-means Clustering and Neural Network (K-means Clustering 기법과 신경망을 이용한 실시간 교통 표지판의 위치 인식)

  • Park, Jung-Guk;Kim, Kyung-Joong
    • Proceedings of the Korean Information Science Society Conference / 2011.06a / pp.491-493 / 2011
  • Traffic sign detection is a core task of automatic driver assistance systems. Previous work has detected traffic signs using color information; however, color-based methods suffer from ill-posed conditions, and extracting the region of interest is difficult. In our work, we propose a traffic sign detection method using k-means clustering, a back-propagation neural network, and projection histogram features, which is robust to ill-posed conditions. Using the color information of traffic signs allows the k-means algorithm to cluster the region of interest for detection efficiently. At each clustering step, a cluster is verified by the neural network so that it accurately represents the location of a traffic sign. The proposed method is practical and robust to unexpected regions of interest and to multiple detections.

Industrial load forecasting using the fuzzy clustering and wavelet transform analysis

  • Yu, In-Keun
    • Journal of IKEEE / v.4 no.2 s.7 / pp.233-240 / 2000
  • This paper presents a technique based on fuzzy clustering and wavelet transform analysis for industrial hourly load forecasting, for the purpose of peak demand control. First, one year of historical load data is sorted and clustered into several groups using fuzzy clustering; then a wavelet transform with a biorthogonal mother wavelet is applied to forecast the peak load one hour ahead. A 5-level decomposition of the daily industrial load curve is performed to account effectively for the weather-sensitive component of the load. The wavelet coefficients associated with particular frequency and time localizations are adjusted using conventional multiple regression, and the components are reconstructed through a five-scale synthesis to predict the final loads. The outcome of the study clearly indicates that the proposed composite model of fuzzy clustering and wavelet transform can be an attractive and effective means of industrial hourly peak load forecasting.
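
The first stage, grouping daily load curves with fuzzy clustering, can be illustrated with fuzzy c-means, a common fuzzy clustering method. The abstract does not say which variant was used, so this choice, the fuzzifier m = 2, the farthest-first initialization, and the toy four-point load profiles below are assumptions. Each curve receives a degree of membership in every cluster rather than a single hard label. The wavelet decomposition and regression stages are not shown.

```python
import math


def farthest_first(data, c):
    """Deterministic seeding: start at data[0], then repeatedly take the farthest point."""
    centers = [tuple(data[0])]
    while len(centers) < c:
        far = max(data, key=lambda x: min(math.dist(x, ctr) for ctr in centers))
        centers.append(tuple(far))
    return centers


def memberships(x, centers, m):
    """Fuzzy c-means membership of x in each cluster: u_i = 1 / sum_k (d_i / d_k)^(2/(m-1))."""
    d = [math.dist(x, ctr) for ctr in centers]
    zeros = [i for i, di in enumerate(d) if di == 0.0]
    if zeros:  # x sits exactly on a center: give it full membership there
        return [1.0 / len(zeros) if i in zeros else 0.0 for i in range(len(d))]
    e = 2.0 / (m - 1.0)
    return [1.0 / sum((d[i] / dk) ** e for dk in d) for i in range(len(d))]


def fuzzy_c_means(data, c, m=2.0, iters=300, tol=1e-9):
    """Return (centers, U) where U[j][i] is the membership of data[j] in cluster i."""
    centers = farthest_first(data, c)
    for _ in range(iters):
        u = [memberships(x, centers, m) for x in data]
        new_centers = []
        for i in range(c):
            w = [row[i] ** m for row in u]  # fuzzified weights u^m
            s = sum(w)
            new_centers.append(tuple(sum(wj * x[dim] for wj, x in zip(w, data)) / s
                                     for dim in range(len(data[0]))))
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, [memberships(x, centers, m) for x in data]
```

Each row of memberships sums to 1. A curve lying between two typical profiles gets split membership instead of being forced into one group.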

EETCA: Energy Efficient Trustworthy Clustering Algorithm for WSN

  • Senthil, T.;Kannapiran, Dr.B.
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.11 / pp.5437-5454 / 2016
  • A Wireless Sensor Network (WSN) is composed of several sensor nodes that are severely constrained in energy and memory. Energy is the lifeblood of sensors, so energy conservation is a critical necessity in a WSN. This paper proposes a clustering algorithm, the Energy Efficient Trustworthy Clustering Algorithm (EETCA), which consists of three phases: chief node election, chief node recycling, and bi-level trust computation. Chief node election is based on trust, using Dempster-Shafer theory. In the second phase, the selected chief node is recycled according to its currently available energy. The final phase computes bi-level trust, which is triggered at every time interval to check the trustworthiness of the participating nodes; nodes below a fixed trust threshold are blocked to ensure trustworthiness. The system consumes less energy because all the nodes behave normally and unwanted energy consumption is completely weeded out. The experimental results of EETCA are satisfactory in terms of reduced energy consumption and prolonged network lifetime.
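
The chief node election relies on Dempster-Shafer evidence theory. The sketch below shows Dempster's rule of combination under several assumptions: a two-element frame {trusted, untrusted}, neighbor reports expressed as mass functions over its subsets, and election of the node with the highest combined belief in "trusted". The report format and `elect_chief` are hypothetical. The paper's actual trust metrics, energy weighting, and thresholds are not reproduced.

```python
from functools import reduce
from itertools import product

TRUSTED = frozenset({"T"})
UNTRUSTED = frozenset({"U"})
EITHER = TRUSTED | UNTRUSTED  # total ignorance: mass on the whole frame


def combine(m1, m2):
    """Dempster's rule: combine two mass functions keyed by frozenset focal elements."""
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass assigned to contradictory pairs
    if conflict >= 1.0:
        raise ValueError("evidence is totally conflicting")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


def belief(m, hypothesis):
    """Bel(A): total mass of focal elements contained in A."""
    return sum(w for s, w in m.items() if s <= hypothesis)


def elect_chief(reports):
    """Hypothetical election: reports maps node id -> list of neighbour mass functions."""
    def trust(node):
        return belief(reduce(combine, reports[node]), TRUSTED)
    return max(reports, key=trust)
```

As a worked example, combining m1 = {T: 0.6, U: 0.1, Θ: 0.3} with m2 = {T: 0.7, U: 0.2, Θ: 0.1} gives a conflict of 0.6 × 0.2 + 0.1 × 0.7 = 0.19. The normalized mass on "trusted" is then 0.69 / 0.81 ≈ 0.852, higher than either report alone.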

Automatic Categorization of Real World FAQs Using Hierarchical Document Clustering (계층적 문서 클러스터링을 이용한 실세계 질의 메일의 자동 분류)

  • 류중원;조성배
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2001.05a / pp.187-190 / 2001
  • With the recent proliferation of the Internet, the need for automatic document categorization is widely recognized to be growing. Because processing and classifying documents manually is time-consuming and labor-intensive, a system that categorizes them automatically by content is needed. In this paper, we propose an automatic e-mail response system based on two hierarchical document clustering methods. In the first method, all documents are clustered into three groups and a first classifier assigns each input document to its corresponding group; the final result is then obtained from a classifier trained separately within that group. In the second method, the system successively separates the most distinct classes first, according to their similarity. Neural networks are used as classifiers, and dendrograms show the hierarchical similarity between classes. A performance comparison of hierarchical and non-hierarchical classifiers shows that the clustering methods improve classification efficiency.
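
The dendrogram view of class similarity can be reproduced with agglomerative hierarchical clustering. The sketch below assumes a precomputed symmetric distance matrix between classes (for example, derived from document similarity) and average linkage. It returns the merge sequence from which a dendrogram is drawn. The linkage options and the toy matrix are illustrative, not taken from the paper.

```python
def agglomerative(dist, linkage="average"):
    """Agglomerative clustering on a symmetric distance matrix.

    Returns the merge sequence [(cluster_a, cluster_b, distance), ...], which is
    the information a dendrogram plots.
    """
    if linkage not in ("average", "single", "complete"):
        raise ValueError(f"unknown linkage: {linkage}")
    clusters = [(i,) for i in range(len(dist))]
    merges = []

    def cluster_dist(a, b):
        ds = [dist[i][j] for i in a for j in b]
        if linkage == "single":
            return min(ds)
        if linkage == "complete":
            return max(ds)
        return sum(ds) / len(ds)  # average linkage (UPGMA)

    while len(clusters) > 1:
        # Find the closest pair of current clusters.
        d, x, y = min((cluster_dist(clusters[x], clusters[y]), x, y)
                      for x in range(len(clusters)) for y in range(x + 1, len(clusters)))
        merges.append((clusters[x], clusters[y], d))
        merged = tuple(sorted(clusters[x] + clusters[y]))
        clusters = [c for idx, c in enumerate(clusters) if idx not in (x, y)] + [merged]
    return merges
```

Reading the merge sequence from the last merge backwards separates the most dissimilar groups first. That only approximates the paper's second method, which splits off the most distinct classes successively.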
