• Title/Summary/Keyword: K-Means clustering algorithm

Hierarchical and Incremental Clustering for Semi Real-time Issue Analysis on News Articles (준 실시간 뉴스 이슈 분석을 위한 계층적·점증적 군집화)

  • Kim, Hoyong;Lee, SeungWoo;Jang, Hong-Jun;Seo, DongMin
    • The Journal of the Korea Contents Association / v.20 no.6 / pp.556-578 / 2020
  • Many studies address issue analysis over real-time news streams, but few analyze issues hierarchically from news articles, and even the existing hierarchical approach clusters more slowly as the number of articles grows. In this paper, we propose a hierarchical and incremental clustering method for semi-real-time issue analysis on news articles. We trained a weighted cosine similarity model based on a Siamese neural network, applied it within the k-means algorithm to form word clusters, and converted news articles to document vectors using these word clusters. Finally, we initialized an issue cluster tree from the document vectors, updated the tree whenever new articles arrived, and analyzed issues in semi-real time. Experiments showed an improvement of up to about 0.26 in NMI, and incremental clustering was about 10 times faster than before.
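
The abstract does not spell out its weighted cosine similarity, so the following is only a minimal sketch of one common form, assuming a fixed non-negative weight per dimension. The function name `weighted_cosine` and the weight vector `w` are illustrative assumptions; in the paper the weighting is learned by a Siamese network, which is not reproduced here.

```python
import math

def weighted_cosine(a, b, w):
    """Cosine similarity with a non-negative weight per dimension (assumed form)."""
    dot = sum(wi * ai * bi for wi, ai, bi in zip(w, a, b))
    norm_a = math.sqrt(sum(wi * ai * ai for wi, ai in zip(w, a)))
    norm_b = math.sqrt(sum(wi * bi * bi for wi, bi in zip(w, b)))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)
```

With all weights equal to 1 this reduces to the ordinary cosine similarity, so the weights only change which dimensions dominate the comparison.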

Blockchain-Enabled Decentralized Clustering for Enhanced Decision Support in the Coffee Supply Chain

  • Keo Ratanak;Muhammad Firdaus;Kyung-Hyune Rhee
    • Annual Conference of KIPS / 2023.11a / pp.260-263 / 2023
  • Given the growth of blockchain technology, this research aims to improve the efficiency of recommending optimal coffee suppliers within a complex supply chain network. The approach relies on extracting vital transactional data and insights from stakeholders through the interaction between an application interface (e.g., a REST API) and the blockchain network. The extracted data are then processed and fed to machine learning methods to build a robust recommendation system. This approach seeks to support informed decision-making and thereby improve operational efficiency in identifying the most suitable coffee supplier for each customer. The research also uses data visualization to illustrate the clustering patterns generated by the K-Means algorithm, adding a visual dimension to the study's evaluation.
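
For readers unfamiliar with the K-Means algorithm mentioned above, here is a minimal sketch of the standard Lloyd iteration in plain Python. It is not the paper's pipeline: the function name `kmeans`, the caller-supplied initial centers, and the fixed iteration count are all assumptions made for illustration.

```python
def kmeans(points, centers, iterations=10):
    """Lloyd's algorithm on tuples of floats, starting from the given centers."""
    centers = [tuple(c) for c in centers]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(len(centers)),
                key=lambda j: sum((p - c) ** 2 for p, c in zip(pt, centers[j])))
            for pt in points
        ]
        # Update step: move each center to the mean of its members.
        for j in range(len(centers)):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers
```

The result depends on the initial centers, which is why practical implementations usually run several initializations and keep the best one.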

L1-penalized AUC-optimization with a surrogate loss

  • Hyungwoo Kim;Seung Jun Shin
    • Communications for Statistical Applications and Methods / v.31 no.2 / pp.203-212 / 2024
  • The area under the ROC curve (AUC) is one of the most common criteria for measuring the overall performance of binary classifiers across a wide range of machine learning problems. In this article, we propose an L1-penalized AUC-optimization classifier that directly maximizes the AUC for high-dimensional data. To this end, we employ an AUC-consistent surrogate loss function and combine it with the L1-norm penalty, which lets us estimate coefficients and select informative variables simultaneously. In addition, we develop an efficient optimization algorithm that adopts k-means clustering and proximal gradient descent, which offers computational advantages in obtaining solutions for the proposed method. Numerical simulation studies demonstrate that the proposed method performs well in terms of prediction accuracy, variable selection, and computational cost.
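
Two building blocks named in this abstract can be sketched briefly: the empirical AUC as a fraction of correctly ordered positive-negative pairs, and soft-thresholding, which is the standard proximal operator of the L1 penalty used inside proximal gradient descent. The function names are assumptions; the paper's surrogate loss and its use of k-means are not reproduced.

```python
def soft_threshold(x, lam):
    """Proximal operator of lam * |x|: shrink x toward zero by lam."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def empirical_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))
```

Soft-thresholding sets small coefficients exactly to zero, which is how the L1 penalty performs variable selection. The pairwise loop costs one comparison per positive-negative pair, which hints at why the authors look for a cheaper optimization scheme.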

A New Fuzzy Clustering Algorithm (새로운 퍼지 군집화 알고리즘)

  • Kim, Jae-Young;Park, Dong-Chul;Han, Ji-Ho;Thuy, Huynh Thi Thanh;Song, Young-Soo
    • Proceedings of the KIEE Conference / 2009.07a / pp.1905-1906 / 2009
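  • This paper proposes a new clustering algorithm for clustering data efficiently. The proposed algorithm is based on Fuzzy C-Means (FCM). Because FCM relies on memberships derived from distances computed for every data point, it is vulnerable to noise. Several algorithms have been proposed to address this, including PCM (Possibilistic C-Means), FPCM (Fuzzy PCM), and PFCM (Possibilistic FCM). However, these algorithms aggravate the problems of initial parameter setting and heavy computation, and they remain somewhat sensitive to noise. This paper proposes a new clustering algorithm that copes effectively with noise and confirms its usefulness through experiments on the Iris data set, a traditional clustering benchmark.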
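
The noise sensitivity of FCM can be seen in its standard membership update, sketched below for a single point. This shows only the baseline FCM rule, not the algorithm proposed in the paper; the function name `fcm_memberships` and the default fuzzifier `m=2.0` are assumptions.

```python
def fcm_memberships(distances, m=2.0):
    """Memberships of one point, given its distances to each cluster center."""
    # A point that coincides with a center belongs fully to that center.
    zeros = [d == 0.0 for d in distances]
    if any(zeros):
        share = 1.0 / sum(zeros)
        return [share if z else 0.0 for z in zeros]
    power = 2.0 / (m - 1.0)
    # u_i = 1 / sum_j (d_i / d_j)^(2 / (m - 1)); memberships always sum to 1.
    return [1.0 / sum((d_i / d_j) ** power for d_j in distances) for d_i in distances]
```

Because the memberships must sum to 1, a distant outlier that is equally far from two centers still receives a membership of 0.5 in each, for example `fcm_memberships([100.0, 100.0])`, and so it pulls on both centers.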

Effective Image Segmentation using a Locally Weighted Fuzzy C-Means Clustering (지역 가중치 적용 퍼지 클러스터링을 이용한 효과적인 이미지 분할)

  • Alamgir, Nyma;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information / v.17 no.12 / pp.83-93 / 2012
  • This paper proposes an image segmentation framework that modifies the objective function of Fuzzy C-Means (FCM) to improve the performance and computational efficiency of conventional FCM-based image segmentation. The framework includes a locally weighted fuzzy c-means (LWFCM) algorithm that accounts for the influence of neighboring pixels on the center pixel by assigning weights to the neighbors. Distances between a center pixel and its neighboring pixels are calculated within a window, and these form the basis for weights that indicate the importance of the memberships and improve the clustering performance. We analyzed the segmentation performance of the proposed method using four well-known cluster validity functions: partition coefficient ($V_{pc}$), partition entropy ($V_{pe}$), the Xie-Beni function ($V_{xb}$), and the Fukuyama-Sugeno function ($V_{fs}$). Experimental results show that the proposed LWFCM outperforms other FCM algorithms (FCM, modified FCM, spatial FCM, FCM with locally weighted information, and fast generation FCM) on the cluster validity functions as well as in compactness and separation.
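
Two of the validity functions named above have simple textbook definitions that depend only on the membership matrix. The sketch below assumes `u` is a list of rows, one row of memberships per data point; the function names are illustrative, and the Xie-Beni and Fukuyama-Sugeno functions, which also need the cluster centers, are omitted.

```python
import math

def partition_coefficient(u):
    """V_pc: mean of squared memberships; 1.0 means a crisp partition."""
    return sum(x * x for row in u for x in row) / len(u)

def partition_entropy(u):
    """V_pe: mean membership entropy; 0.0 means a crisp partition."""
    return -sum(x * math.log(x) for row in u for x in row if x > 0.0) / len(u)
```

A higher $V_{pc}$ and a lower $V_{pe}$ both indicate a less fuzzy, better separated partition.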

Multiobjective Space Search Optimization and Information Granulation in the Design of Fuzzy Radial Basis Function Neural Networks

  • Huang, Wei;Oh, Sung-Kwun;Zhang, Honghao
    • Journal of Electrical Engineering and Technology / v.7 no.4 / pp.636-645 / 2012
  • This study introduces an information-granule-based fuzzy radial basis function neural network (FRBFNN) built on multiobjective optimization and weighted least squares (WLS). An improved multiobjective space search algorithm (IMSSA) is proposed to optimize the FRBFNN. In the design of the FRBFNN, the premise part of the rules is constructed with the aid of Fuzzy C-Means (FCM) clustering, while the consequent part of the fuzzy rules is developed using four types of polynomials: constant, linear, quadratic, and modified quadratic. Information granulation realized with C-Means clustering helps determine the initial values of the apex parameters of the membership functions of the fuzzy neural network. To enhance the flexibility of the network, we use WLS learning to estimate the coefficients of the polynomials. In contrast to the ordinary least squares commonly used in the design of fuzzy radial basis function neural networks, WLS allows a different type of local model in each rule. Since the performance of the FRBFNN model is directly affected by parameters such as the fuzzification coefficient used in FCM, the number of rules, and the orders of the polynomials in the consequent parts of the rules, we carry out both structural and parametric optimization of the network. The proposed IMSSA, which aims to minimize complexity and maximize accuracy simultaneously, is used to optimize the parameters of the model. Experimental results show that the proposed neural network outperforms some existing neurofuzzy models reported in the literature.

Performance Improvement of Radial Basis Function Neural Networks Using Adaptive Feature Extraction (적응적 특징추출을 이용한 Radial Basis Function 신경망의 성능개선)

  • 조용현
    • Journal of Korea Multimedia Society / v.3 no.3 / pp.253-262 / 2000
  • This paper proposes a new RBF neural network that determines the number and centers of its hidden neurons through adaptive feature extraction from the input data. Principal component analysis is applied to extract features adaptively by reducing the dimension of the input data. The network thereby combines the strengths of principal component analysis, which maps the input data onto a set of statistically independent features, with those of RBF neural networks. The proposed network has been applied to classifying 200 breast cancer data samples into two classes. The simulation results show that the proposed network achieves better learning time and better classification of test data than networks that use the k-means clustering algorithm. It is also less affected than the k-means clustering algorithm by the initial weight setting and the range of the smoothing factor.

Backlit Region Detection Using Adaptively Partitioned Block and Fuzzy C-means Clustering for Backlit Image Enhancement (역광 영상 개선을 위한 퍼지 C-평균 분류기와 적응적 블록 분할을 사용한 역광 영역 검출)

  • Kim, Nahyun;Lee, Seungwon;Paik, Joonki
    • Journal of the Institute of Electronics and Information Engineers / v.51 no.2 / pp.124-132 / 2014
  • In this paper, we present a novel backlit region detection and contrast enhancement method using fuzzy C-means clustering and adaptively partitioned, block-based contrast stretching. The proposed method separates an image into dark backlit and bright background regions using adaptively partitioned blocks, based on the optimal threshold value computed by fuzzy logic. The detected block-wise backlit region is refined with the guided filter to remove block artifacts. A contrast stretching algorithm is then applied to adaptively enhance the detected backlit region. Experimental results show that the proposed method can successfully detect the backlit region without a complicated segmentation algorithm and enhance the object information in the backlit region.
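
As a point of reference for the contrast stretching step, the sketch below shows the plain linear (min-max) form applied to a flat list of intensities. It is an assumption-laden simplification: the paper applies stretching adaptively to the detected backlit region, whereas this hypothetical `stretch_contrast` function stretches whatever pixel list it is given.

```python
def stretch_contrast(pixels, out_min=0, out_max=255):
    """Linearly map the input intensity range onto [out_min, out_max]."""
    lo, hi = min(pixels), max(pixels)
    if lo == hi:  # flat region: nothing to stretch
        return list(pixels)
    scale = (out_max - out_min) / (hi - lo)
    return [round(out_min + (p - lo) * scale) for p in pixels]
```

Passing only the pixels of a dark region spreads their narrow intensity range over the full output range, which is the basic effect the enhancement step relies on.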

Multiple Classifier Fusion Method based on k-Nearest Templates (k-최근접 템플릿기반 다중 분류기 결합방법)

  • Min, Jun-Ki;Cho, Sung-Bae
    • Journal of KIISE:Computing Practices and Letters / v.14 no.4 / pp.451-455 / 2008
  • This paper proposes the k-nearest templates method for combining multiple classifiers effectively. First, the method decomposes the training samples of each class into several subclasses based on the classifier outputs, so that a class is represented by multiple models, and estimates a localized template for each subclass by averaging the outputs. The distances between a test sample and the templates are then calculated. Lastly, the test sample is assigned to the class most frequently represented among the k most similar templates. The C-means clustering algorithm is used as the decomposition method, and k is chosen automatically according to the intra-class compactness and inter-class separation of the given data set. Because the proposed method uses multiple models per class and refers to k models rather than matching only the most similar one, it can obtain stable and high accuracy. Experiments on the UCI and ELENA databases showed that the proposed method performed better than conventional fusion methods.
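
The template and voting steps described above can be sketched as follows. This is a simplified illustration under stated assumptions: subclass assignments are passed in directly (the paper obtains them with C-means clustering), k is supplied by the caller (the paper chooses it automatically), and the names `build_templates`, `classify`, and `class_of_subclass` are hypothetical.

```python
from collections import Counter

def build_templates(outputs, subclass_ids, class_of_subclass):
    """Average classifier-output vectors per subclass to get localized templates."""
    grouped = {}
    for vec, sid in zip(outputs, subclass_ids):
        grouped.setdefault(sid, []).append(vec)
    return [
        (class_of_subclass[sid], [sum(col) / len(vecs) for col in zip(*vecs)])
        for sid, vecs in grouped.items()
    ]

def classify(templates, sample, k):
    """Majority class among the k templates closest to the sample."""
    nearest = sorted(
        templates,
        key=lambda t: sum((a - b) ** 2 for a, b in zip(t[1], sample)),
    )[:k]
    return Counter(label for label, _ in nearest).most_common(1)[0][0]
```

Here each vector in `outputs` stands for the concatenated outputs of the base classifiers on one training sample, and `sample` is the same kind of vector for a test sample.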

Structural damage identification with output-only measurements using modified Jaya algorithm and Tikhonov regularization method

  • Guangcai Zhang;Chunfeng Wan;Liyu Xie;Songtao Xue
    • Smart Structures and Systems / v.31 no.3 / pp.229-245 / 2023
  • The absence of excitation measurements can pose a major challenge in structural damage identification, because substantial effort is needed to reconstruct or identify the unknown input force. To address this issue, this paper develops an iterative strategy for damage identification with partial output-only responses that combines the Tikhonov regularization method for force identification with a modified Jaya algorithm (M-Jaya) for stiffness parameter identification. On the one hand, a probabilistic clustering learning technique and a nonlinear updating equation are introduced to improve the performance of the standard Jaya algorithm. On the other hand, to ease the difficulty of selecting appropriate regularization parameters in traditional Tikhonov regularization, an improved L-curve method based on a B-spline interpolation function is presented. The applicability and effectiveness of the iterative strategy for simultaneously identifying structural damage and unknown input excitation are validated by numerical simulation on a 21-bar truss structure subjected to ambient excitation, under both noise-free and noise-contaminated measurement cases, and by a series of experimental tests on a five-floor steel frame structure excited by a sinusoidal force. The results of these numerical and experimental studies demonstrate that the proposed strategy can accurately and effectively identify damage locations and extents without requiring force measurements. The proposed M-Jaya algorithm performs better than the genetic algorithm, Gaussian bare-bones artificial bee colony, and the Jaya algorithm.
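
For context, the standard Jaya update that M-Jaya builds on moves each candidate toward the current best solution and away from the current worst. The sketch below implements only that baseline, without the paper's clustering-based learning or nonlinear updating equation. The function name `jaya_minimize`, the clipping to bounds, and the default population size, iteration count, and seed are assumptions.

```python
import random

def jaya_minimize(f, bounds, pop_size=10, iterations=50, seed=0):
    """Standard Jaya: move toward the best and away from the worst candidate."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(iterations):
        scores = [f(x) for x in pop]
        best = pop[scores.index(min(scores))]
        worst = pop[scores.index(max(scores))]
        for i, x in enumerate(pop):
            cand = []
            for j, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                v = x[j] + r1 * (best[j] - abs(x[j])) - r2 * (worst[j] - abs(x[j]))
                cand.append(min(max(v, lo), hi))  # clip to the search bounds
            if f(cand) < scores[i]:  # greedy acceptance keeps only improvements
                pop[i] = cand
    return min(pop, key=f)
```

A call such as `jaya_minimize(lambda x: sum(v * v for v in x), [(-5, 5)] * 2)` searches a two-dimensional sphere function. In the paper's setting the objective would instead measure the mismatch between measured and simulated structural responses.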