• Title/Summary/Keyword: clustering techniques

A Simple Tandem Method for Clustering of Multimodal Dataset

  • Cho C.;Lee J.W.;Lee J.W.
    • Proceedings of the Korean Operations and Management Science Society Conference / 2003.05a / pp.729-733 / 2003
  • Local features within clusters, caused by the multi-modal nature of the data, prevent many conventional clustering techniques from working properly. In particular, clustering datasets with non-Gaussian distributions within a cluster can be problematic when a technique that implicitly assumes a Gaussian distribution is used. The current study proposes a simple tandem clustering method composed of a k-means type algorithm and a hierarchical method to solve such problems. The multi-modal dataset is first divided into many small pre-clusters by the k-means or fuzzy k-means algorithm. The pre-clusters found in the first step are then clustered again using an agglomerative hierarchical clustering method with Kullback-Leibler divergence as the measure of dissimilarity. This method is not only effective at extracting multi-modal clusters but also fast and simple in terms of computational complexity, and relatively robust in the presence of outliers. The performance of the proposed method was evaluated on three generated datasets and six publicly available real-world datasets.
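
The two-stage idea in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the Kullback-Leibler divergence between pre-clusters is computed, so the sketch assumes each pre-cluster is summarised as a 1-D Gaussian and uses a symmetrised divergence. The function names (`kmeans_1d`, `symmetric_kl`, `tandem_cluster`), the deterministic initialisation, and the toy data are all hypothetical.

```python
import math


def kmeans_1d(xs, k, iters=20):
    """Lloyd's k-means on 1-D data; returns the non-empty pre-clusters."""
    data = sorted(xs)
    n = len(data)
    # Assumption: deterministic start from k evenly spaced sorted points.
    centers = [data[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            groups[nearest].append(x)
        # Empty groups keep their previous centre.
        centers = [sum(g) / len(g) if g else centers[j]
                   for j, g in enumerate(groups)]
    return [g for g in groups if g]


def gaussian_summary(points, min_var=1e-6):
    """Mean and (floored) variance of one pre-cluster."""
    mean = sum(points) / len(points)
    var = sum((p - mean) ** 2 for p in points) / len(points)
    return mean, max(var, min_var)


def symmetric_kl(a, b):
    """KL(a||b) + KL(b||a) for two clusters modelled as 1-D Gaussians."""
    (m1, v1), (m2, v2) = gaussian_summary(a), gaussian_summary(b)
    kl_ab = 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)
    kl_ba = 0.5 * (math.log(v1 / v2) + (v2 + (m1 - m2) ** 2) / v1 - 1.0)
    return kl_ab + kl_ba


def tandem_cluster(xs, n_pre, n_final):
    """Stage 1: k-means pre-clusters. Stage 2: merge the closest pair repeatedly."""
    clusters = kmeans_1d(xs, n_pre)
    while len(clusters) > n_final:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs,
                   key=lambda p: symmetric_kl(clusters[p[0]], clusters[p[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters)
                    if idx not in (i, j)] + [merged]
    return clusters


# Toy data: two clusters, each with two modes (around 0 and 4, around 50 and 54).
offsets = (-0.4, -0.2, 0.0, 0.2, 0.4)
xs = [c + o for c in (0.0, 4.0, 50.0, 54.0) for o in offsets]
final_clusters = tandem_cluster(xs, n_pre=4, n_final=2)
```

On this toy data the four modes are found as pre-clusters and then merged pairwise, so each final cluster contains both modes of one bimodal group. Real data would need multivariate summaries and many more pre-clusters.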

A Study on Process Data Compression Method by Clustering Method (클러스터링 기법을 이용한 공정 데이터의 압축 저장 기법에 관한 연구)

  • Kim Yoonsik;Mo Kyung Joo;Yoon En Sup
    • Journal of the Korean Institute of Gas / v.4 no.4 s.12 / pp.58-64 / 2000
  • Data compression and retrieval methods are investigated for the effective utilization of measured process data. In this paper, a new data compression method, Clustering Compression (CC), which is based on the k-means clustering algorithm and a piecewise linear approximation method, is suggested. Case studies on industrial data sets showed the superior performance of clustering-based techniques compared to other conventional methods and showed that CC could handle the compression of multi-dimensional data.

Multi-scale Cluster Hierarchy for Non-stationary Functional Signals of Mutual Fund Returns (Mutual Fund 수익률의 비정상 함수형 시그널을 위한 다해상도 클러스터 계층구조)

  • Kim, Dae-Lyong;Jung, Uk
    • Korean Management Science Review / v.24 no.2 / pp.57-72 / 2007
  • Many applications in scientific research have been coupled with functional data signal clustering techniques to discover novel characteristics that can be used to diagnose a variety of issues. In this article we present an interpretable multi-scale cluster hierarchy framework for clustering functional data using its multi-aspect frequency information. The suggested method focuses on how to effectively select transformed features/variables in an unsupervised manner so as to reduce the data dimension and achieve multi-purpose clustering. Specifically, we apply the suggested method to mutual fund returns and form groups of superior-performing funds based on different aspects such as global patterns, seasonal variations, levels of noise, and their combinations. To show that our method produces a quality cluster hierarchy, we give empirical results from a simulation study and a set of real-life data. This research will contribute to financial market analysis and can be flexibly adapted to other research fields with clustering purposes.

Performance Comparison of Clustering Techniques for Spatio-Temporal Data (시공간 데이터를 위한 클러스터링 기법 성능 비교)

  • Kang Nayoung;Kang Juyoung;Yong Hwan-Seung
    • Journal of Intelligence and Information Systems / v.10 no.2 / pp.15-37 / 2004
  • With the growth in the size of datasets, data mining has recently become an important research topic. In particular, interest has grown in spatio-temporal data mining, a method for analyzing massive spatio-temporal data collected from a wide variety of applications such as GPS data, trajectory data from surveillance systems, and earth geographic data. In former approaches, conventional clustering algorithms were applied as spatio-temporal data mining techniques without any modification. In this paper, we focus on SOM, the most common clustering algorithm applied to clustering analysis in data mining, and develop a spatio-temporal data mining module based on it. In addition, we analyze the clustering results of the developed SOM module and compare them with those of the K-means and agglomerative hierarchical algorithms in terms of homogeneity, separation, silhouette width, and accuracy. We also developed a specialized visualization module for more accurate interpretation of mining results.
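
Of the evaluation criteria listed in this abstract, silhouette width is the easiest to show in a few lines. The sketch below uses the standard definition (for each point, `a` is the mean distance to its own cluster and `b` the smallest mean distance to another cluster), restricted to 1-D points with absolute distance for brevity. It is not the paper's evaluation code; the function names and sample values are hypothetical, and at least two clusters are assumed.

```python
def mean_distance(x, points):
    """Average absolute distance from x to a list of 1-D points."""
    return sum(abs(x - p) for p in points) / len(points)


def silhouette_width(clusters):
    """Mean silhouette over all points; clusters is a list of lists of floats."""
    scores = []
    for i, cluster in enumerate(clusters):
        others = [c for j, c in enumerate(clusters) if j != i]
        for idx, x in enumerate(cluster):
            rest = cluster[:idx] + cluster[idx + 1:]
            if not rest:
                scores.append(0.0)  # usual convention for singleton clusters
                continue
            a = mean_distance(x, rest)                     # cohesion
            b = min(mean_distance(x, c) for c in others)   # separation
            denom = max(a, b)
            scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


# A well-separated grouping versus a deliberately mixed one.
tight = silhouette_width([[0.0, 1.0], [10.0, 11.0]])
mixed = silhouette_width([[0.0, 10.0], [1.0, 11.0]])
```

Values near 1 indicate compact, well-separated clusters, and negative values indicate points that sit closer to another cluster than to their own.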

Probability-based Deep Learning Clustering Model for the Collection of IoT Information (IoT 정보 수집을 위한 확률 기반의 딥러닝 클러스터링 모델)

  • Jeong, Yoon-Su
    • Journal of Digital Convergence / v.18 no.3 / pp.189-194 / 2020
  • Recently, various clustering techniques have been studied to efficiently handle data generated by heterogeneous IoT devices. However, existing clustering techniques are not suitable for mobile IoT devices because they focus on statically dividing networks. This paper proposes a probabilistic deep learning-based dynamic clustering model for collecting and analyzing information on IoT devices using edge networks. The proposed model establishes a subnet by applying the frequency of the attribute values collected probabilistically to deep learning. The established subnets are used to group information extracted from seeds into hierarchical structures and improve the speed and accuracy of dynamic clustering for IoT devices. The performance evaluation results showed that the proposed model improved data processing time by 13.8% on average compared to the existing model, and that the server's overhead was 10.5% lower on average than with the existing model. The accuracy of extracting IoT information from servers improved by 8.7% on average over previous models.

Similarity Measurement with Interestingness Weight for Improving the Accuracy of Web Transaction Clustering (웹 트랜잭션 클러스터링의 정확성을 높이기 위한 흥미가중치 적용 유사도 비교방법)

  • Kang, Tae-Ho;Min, Young-Soo;Yoo, Jae-Soo
    • The KIPS Transactions:PartD / v.11D no.3 / pp.717-730 / 2004
  • Recently, much research on the personalization of web sites has been actively conducted. Web personalization predicts the set of the most interesting URLs for each user through data mining approaches such as clustering techniques. To cluster web transactions, most existing methods that use clustering techniques represent the transactions as bit vectors indicating whether or not a user visited a certain URL. The similarity of web transactions is then decided by the degree to which the bit vectors match. However, since the existing methods consider only whether users visit a certain URL or not, users' interest in the URL is excluded from the clustering of web transactions. That is, web transactions with different visit purposes or inclinations may be classified into the same group. In this paper, we propose an enhanced transaction model with an interestingness weight to solve such problems, and a new similarity measuring method that exploits the proposed transaction model. Performance evaluation shows that our similarity measuring method improves the accuracy of web transaction clustering over the existing method.
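
The limitation of bit vectors described in this abstract can be illustrated with a small example. The abstract does not give the paper's weighting scheme or similarity formula, so the sketch assumes, purely for illustration, that interest is approximated by seconds spent on each URL and that similarity is plain cosine similarity. The transactions `t1` and `t2` are made-up values.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def to_bits(transaction):
    """Bit-vector view: 1 if the URL was visited at all, else 0."""
    return [1 if t > 0 else 0 for t in transaction]


# Hypothetical transactions: seconds spent on each of four URLs (0 = not visited).
t1 = [120, 5, 0, 0]   # mostly interested in the first URL
t2 = [4, 90, 0, 0]    # mostly interested in the second URL

bit_similarity = cosine(to_bits(t1), to_bits(t2))
weighted_similarity = cosine(t1, t2)
```

Both users visited the same two URLs, so the bit vectors are identical and their similarity is essentially 1, while the weighted vectors give a similarity below 0.1 because the users' interests differ.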

Comparison of Clustering Techniques in Flight Approach Phase using ADS-B Track Data (공항 근처 ADS-B 항적 자료에서의 클러스터링 기법 비교)

  • Jong-Chan Park;Heon Jin Park
    • The Journal of Bigdata / v.6 no.2 / pp.29-38 / 2021
  • Route deviation is a dangerous factor in aviation safety management that can lead to serious accidents. In this study, an anomaly score is calculated by grouping tracks through clustering and computing the distance from the cluster center. The study was conducted by extracting tracks within 100 km of the airport from ADS-B track data received over one year. The tracks were vectorized using linear interpolation, with latitude, longitude, and altitude as 3D coordinates. Through PCA, the dimension was reduced to the axes representing more than 90% of the overall data distribution, and k-means clustering, hierarchical clustering, and PAM techniques were applied. The number of clusters was selected using the silhouette measure, and an anomaly score was calculated as the distance from the cluster center. In this study, we compare the number of clusters for each clustering technique and evaluate the clustering results through the silhouette measure.
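
The final scoring step described here, distance from the cluster center, might look like the sketch below. It assumes the tracks have already been vectorized, reduced by PCA, and clustered; those steps are omitted. It also assumes Euclidean distance to the nearest center, which the abstract does not specify. The 2-D vectors and function names are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def anomaly_score(track, centers):
    """Euclidean distance from a track vector to its nearest cluster center."""
    return min(math.dist(track, c) for c in centers)


# Hypothetical reduced track vectors, already grouped into two clusters.
cluster_a = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
cluster_b = [[10.0, 10.0], [12.0, 10.0], [10.0, 12.0], [12.0, 12.0]]
centers = [centroid(cluster_a), centroid(cluster_b)]

typical = anomaly_score([2.0, 2.0], centers)   # a member of cluster_a
deviant = anomaly_score([6.0, 6.0], centers)   # halfway between both clusters
```

A track lying between the two clusters scores several times higher than a typical member. A threshold on this score would then flag candidate route deviations.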

A Comparative Study on Statistical Clustering Methods and Kohonen Self-Organizing Maps for Highway Characteristic Classification of National Highway (일반국도 도로특성분류를 위한 통계적 군집분석과 Kohonen Self-Organizing Maps의 비교연구)

  • Cho, Jun Han;Kim, Seong Ho
    • KSCE Journal of Civil and Environmental Engineering Research / v.29 no.3D / pp.347-356 / 2009
  • This paper describes a clustering analysis for traffic-characteristics-based highway classification, in order to move away from the methodologies of existing highway functional classification. This research focuses on comparing the performance of clustering techniques based on total within-group errors and on deriving the optimal number of clusters. It analyzes statistical clustering methods (the hierarchical Ward's minimum-variance method and the nonhierarchical K-means method) and the Kohonen self-organizing maps clustering method for highway characteristic classification. The outcomes of the clustering techniques are compared in terms of the number of samples and the traffic characteristics of the subsets derived from the optimal number of clusters. Overall, the k-means method gives better results than the other methods for fewer than 12 clusters. For more than 20 clusters, Kohonen self-organizing maps give the best results among the clustering methods. The main contribution of this research is expected to be the basic road attribute information produced by the highway characteristic classification.

Implementation of simple statistical pattern recognition methods for harmful gases classification using gas sensor array fabricated by MEMS technology (MEMS 기술로 제작된 가스 센서 어레이를 이용한 유해가스 분류를 위한 간단한 통계적 패턴인식방법의 구현)

  • Byun, Hyung-Gi;Shin, Jeong-Suk;Lee, Ho-Jun;Lee, Won-Bae
    • Journal of Sensor Science and Technology / v.17 no.6 / pp.406-413 / 2008
  • We have implemented simple statistical pattern recognition methods for harmful gas classification using a gas sensor array fabricated by MEMS (Micro Electro Mechanical System) technology. The performance of a pattern recognition method as a gas classifier is highly dependent on the choice of pre-processing techniques for the sensor and sensor array signals and on the choice of an optimal classification algorithm among the various classification techniques. We carried out pre-processing for each sensor's signal as well as for the sensor array signals to extract features for each gas. To classify harmful gases, we adopted simple statistical pattern recognition algorithms: PCA (Principal Component Analysis) for visualization of pattern clustering and MLR (Multi-Linear Regression) for real-time system implementation. Experimental results of the adopted pattern recognition methods with pre-processing techniques showed good clustering performance and are expected to allow easy implementation in a real-time sensing system.

Black-Litterman Portfolio with K-shape Clustering (K-shape 군집화 기반 블랙-리터만 포트폴리오 구성)

  • Yeji Kim;Poongjin Cho
    • Journal of Korean Society of Industrial and Systems Engineering / v.46 no.4 / pp.63-73 / 2023
  • This study explores modern portfolio theory by integrating the Black-Litterman portfolio with time-series clustering, specifically emphasizing the K-shape clustering methodology. K-shape clustering enables time-series data to be grouped effectively, enhancing the ability to plan and manage investments in stock markets when combined with the Black-Litterman portfolio. Based on the patterns of stock markets, the objective is to understand the relationship between past market data and the planning of future investment strategies through backtesting. Additionally, by examining diverse learning and investment periods, optimal strategies are identified to boost portfolio returns while efficiently managing the associated risks. For comparative analysis, the traditional Markowitz portfolio is also assessed in conjunction with clustering techniques utilizing K-Means and K-Means with Dynamic Time Warping. The results suggest that the combination of K-shape and the Black-Litterman model significantly enhances portfolio optimization in the stock market, providing valuable insights for making stable portfolio investment decisions. The achieved Sharpe ratio of 0.722 indicates significantly higher performance compared to other benchmarks, underlining the effectiveness of the K-shape and Black-Litterman integration in portfolio optimization.
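
For readers unfamiliar with the metric quoted in this abstract, the Sharpe ratio is commonly computed as mean excess return divided by the standard deviation of excess returns. The sketch below shows that textbook form only. The abstract does not state the return frequency, risk-free rate, or annualisation used in the paper, so the sample returns and the zero default for `risk_free` are assumptions.

```python
import statistics


def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by the sample standard deviation of excess returns."""
    excess = [r - risk_free for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess)


# Hypothetical per-period portfolio returns.
sample_returns = [0.01, 0.03, 0.02, 0.04, 0.00]
ratio = sharpe_ratio(sample_returns)
```

A higher ratio means more return per unit of volatility, which is why it is often used to compare portfolio strategies like those in this study.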