• Title/Abstract/Keywords: and K-means algorithm

Search results: 1,325 items; processing time: 0.032 seconds

Semantic-Based K-Means Clustering for Microblogs Exploiting Folksonomy

  • Heu, Jee-Uk
    • Journal of Information Processing Systems
    • /
    • Vol. 14, No. 6
    • /
    • pp.1438-1444
    • /
    • 2018
  • Recently, with the development of Internet technologies and the spread of smart devices, the use of microblogs such as Facebook, Twitter, and Instagram has been increasing rapidly. Many users check for new information on microblogs because the content on their timelines is continually updated. Therefore, clustering algorithms are needed to arrange microblog content into groups for users who want the newest information. However, microblogs have word limits, so there is not enough information to analyze for content clustering. In this paper, we propose a semantic-based K-means clustering algorithm that measures not only the similarity between data represented in a vector space model, but also the semantic similarity between the data by exploiting the TagCluster for clustering. Experimental results on the RepLab2013 Twitter dataset show the effectiveness of the semantic-based K-means clustering algorithm.

Initialization of the K-means Clustering Algorithm Using Wavelets

  • 김국환;장우진;이준석
    • 한국경영과학회:학술대회논문집
    • /
    • 대한산업공학회/한국경영과학회 2006년도 춘계공동학술대회 논문집
    • /
    • pp.305-312
    • /
    • 2006
  • The random initialization commonly used in the K-means clustering algorithm has difficulty finding the global minimum. Even when the algorithm is run for many iterations, the global minimum is very hard to find, and for large data sizes this is nearly impossible. To overcome these problems, this paper presents a method that uses wavelets to select optimal initial cluster centers. Specifically, it describes an initialization method that, through effective wavelet-based initialization, reaches the global optimum in only a small number of iterations. When used in clustering algorithms, this initialization method can be of great help for cluster analysis performed online in real time.

Assessment of Premature Ventricular Contraction Arrhythmia by K-means Clustering Algorithm

  • Kim, Kyeong-Seop
    • 한국컴퓨터정보학회논문지
    • /
    • Vol. 22, No. 5
    • /
    • pp.65-72
    • /
    • 2017
  • Premature Ventricular Contraction (PVC) arrhythmia is the most common abnormal heart rhythm and may increase the mortality risk of a cardiac patient. Thus, it is very important to identify the distinctive features of the PVC pattern in a patient. In this paper, we propose a new method to extract the characteristics of the PVC pattern by applying the K-means machine learning algorithm to Heart Rate Variability depicted in a Poincaré plot. For a quantitative analysis that distinguishes the cluster patterns of normal sinus rhythm from those of PVC beats, the Euclidean distance between the clusters was measured. Experimental simulations on the MIT-BIH arrhythmia database show that this inter-cluster distance measure is valid for differentiating the pattern traits of PVC beats. The proposed method therefore offers a simple way to identify the attributes of PVC beats in terms of K-means clusters, especially in long-period electrocardiograms (ECG).
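
The pipeline described in this abstract (Poincaré plot of successive RR intervals, K-means on the plotted points, Euclidean distance between cluster centers) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the RR intervals are synthetic, the values 0.8 s, 0.5 s, and 1.1 s for normal beats, premature beats, and the following pause are assumptions, and `poincare_points` and `kmeans` are hypothetical helper names.

```python
import math
import random

def poincare_points(rr):
    """Pair each RR interval with its successor: (RR_n, RR_{n+1})."""
    return list(zip(rr[:-1], rr[1:]))

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels of the last assignment)."""
    rng = random.Random(seed)
    cents = rng.sample(points, k)
    labels = []
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for every point.
        labels = [min(range(k), key=lambda c: math.dist(p, cents[c])) for p in points]
        new = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                new.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new.append(cents[c])  # keep the old center of an empty cluster
        if new == cents:
            break
        cents = new
    return cents, labels

# Synthetic RR series in seconds (assumed values, not MIT-BIH data):
# mostly normal beats, with a short interval plus a long pause every 10th beat.
rng = random.Random(42)
rr = []
for beat in range(60):
    if beat % 10 == 9:
        rr += [rng.gauss(0.5, 0.02), rng.gauss(1.1, 0.02)]
    else:
        rr.append(rng.gauss(0.8, 0.02))

points = poincare_points(rr)
cents, labels = kmeans(points, k=2)
print("cluster centers:", cents)
print("inter-cluster distance:", math.dist(cents[0], cents[1]))
```

The choice of k = 2 is also an assumption; the abstract does not state how many clusters were used.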

Fuzzy C-means Using Type-2 Fuzzy Sets (A Type 2 Fuzzy C-means)

  • 황철;이정훈
    • 한국지능시스템학회:학술대회논문집
    • /
    • 한국퍼지및지능시스템학회 2001년도 춘계학술대회 학술발표 논문집
    • /
    • pp.16-19
    • /
    • 2001
  • This paper presents a type-2 fuzzy C-means (FCM) algorithm that is an extension of the conventional fuzzy C-means algorithm. In our proposed method, the membership values for each pattern are extended as type-2 fuzzy memberships by assigning membership grades to the type-1 memberships. In doing so, cluster centers that are estimated by type-2 memberships may converge to a more desirable location than cluster centers obtained by a type-1 FCM method in the presence of noise.

Blind linear/nonlinear equalization for heavy noise-corrupted channels

  • Han, Soo-Whan;Park, Sung-Dae
    • Journal of information and communication convergence engineering
    • /
    • Vol. 7, No. 3
    • /
    • pp.383-391
    • /
    • 2009
  • In this paper, blind equalization using a modified Fuzzy C-Means algorithm with Gaussian Weights (MFCM_GW) is applied to heavily noise-corrupted channels. The proposed algorithm can deal with both linear and nonlinear channels, because it searches for the optimal output states of a channel instead of estimating the channel parameters directly. In contrast to the common Euclidean distance in Fuzzy C-Means (FCM), its search procedure uses a Bayesian likelihood fitness function and a Gaussian weighted partition matrix. The channel states selected by MFCM_GW are always close to the optimal set for a channel, even when the channel is heavily corrupted by additive white Gaussian noise (AWGN). Simulation studies demonstrate that the proposed method is relatively superior to existing genetic algorithm (GA) and conventional FCM based methods in terms of accuracy and speed.

K-means based Clustering Method with a Fixed Number of Cluster Members

  • Yi, Faliu;Moon, Inkyu
    • 한국멀티미디어학회논문지
    • /
    • Vol. 17, No. 10
    • /
    • pp.1160-1170
    • /
    • 2014
  • Clustering methods are very useful in many fields such as data mining, classification, and object recognition. Both supervised and unsupervised grouping approaches can classify a series of sample data with a predefined or automatically assigned number of clusters. However, they place no constraint on the number of elements in each cluster, so the cluster sizes obtained from clustering schemes are usually arbitrary. Thus, some clusters possess a large number of elements whereas others have only a few members. In some areas, such as logistics management, a fixed number of members is preferred for each cluster or logistics center. Consequently, it is necessary to design a clustering method that can automatically adjust the number of group elements. In this paper, a k-means based clustering method with a fixed number of cluster members is proposed. In the proposed method, the data samples are first clustered using the k-means algorithm. Then, the number of group elements is adjusted by employing a greedy strategy. Experimental results demonstrate that the proposed clustering scheme can classify data samples efficiently for a fixed number of cluster members.
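
The two stages named in this abstract (k-means first, then a greedy adjustment of cluster sizes) might look roughly like the sketch below. The abstract does not specify the greedy rule, so the one used here is an assumption: visit all (point, cluster) pairs in order of increasing distance and give each point to the closest cluster that still has room. The names `kmeans` and `balance` and the sample data are hypothetical.

```python
import math
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns the list of centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            groups[j].append(p)
        new_centroids = []
        for j, g in enumerate(groups):
            if g:
                new_centroids.append(tuple(sum(x) / len(g) for x in zip(*g)))
            else:
                new_centroids.append(centroids[j])  # keep center of an empty cluster
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids

def balance(points, centroids, size):
    """Greedy step (assumed rule): walk (point, cluster) pairs by increasing
    distance and assign each point to the closest cluster that is not full."""
    k = len(centroids)
    pairs = sorted(
        (math.dist(p, centroids[c]), i, c)
        for i, p in enumerate(points)
        for c in range(k)
    )
    labels = [None] * len(points)
    counts = [0] * k
    for _, i, c in pairs:
        if labels[i] is None and counts[c] < size:
            labels[i] = c
            counts[c] += 1
    return labels

rng = random.Random(1)
# Hypothetical unbalanced data: 9 points near (0, 0) and 3 points near (5, 5).
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(9)]
points += [(rng.gauss(5, 1), rng.gauss(5, 1)) for _ in range(3)]

centroids = kmeans(points, k=2)
labels = balance(points, centroids, size=6)
print(labels.count(0), labels.count(1))
```

Because the total capacity (2 × 6) equals the number of points, every point is assigned and each cluster ends up with exactly six members, even though unconstrained k-means would tend to split this data 9 to 3.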

Hybrid Simulated Annealing for Data Clustering

  • 김성수;백준영;강범수
    • 산업경영시스템학회지
    • /
    • Vol. 40, No. 2
    • /
    • pp.92-98
    • /
    • 2017
  • Data clustering determines groups of patterns in a dataset using a similarity measure, and is one of the most important and difficult techniques in data mining. Clustering can be formally considered a particular kind of NP-hard grouping problem. The K-means algorithm, which is popular and efficient, is sensitive to initialization and can get stuck in a local optimum because it is a hill-climbing clustering method. It is also not computationally feasible in practice, especially for large datasets and large numbers of clusters. Therefore, a robust and efficient clustering algorithm is needed to find the global optimum (not a local optimum), especially now that large amounts of data are collected from many IoT (Internet of Things) devices. The objective of this paper is to propose a new Hybrid Simulated Annealing (HSA) method that combines simulated annealing with K-means for non-hierarchical clustering of big data. Simulated annealing (SA) is useful for diversified search in a large search space, and K-means is useful for converged search in a predetermined search space. The proposed method can balance intensification and diversification to find the global optimal solution in big data clustering. The performance of HSA is validated on the Iris, Wine, Glass, and Vowel datasets from the UCI machine learning repository, compared with previous studies through experiment and analysis. The proposed KSAK (K-means+SA+K-means) and SAK (SA+K-means) perform better than KSA (K-means+SA), SA, and K-means in our simulations. The method significantly improves the accuracy and efficiency of finding the global optimal data clustering solution for complex, real-time, and costly data mining processes.
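
A rough sketch of the SAK ordering mentioned in this abstract (simulated annealing for diversified search, followed by K-means for converged search) is given below. It is not the paper's HSA: the neighborhood move (relabel one random point), the geometric cooling schedule, the parameter values, the synthetic data, and the helper names `centroids_of`, `sse`, `simulated_annealing`, and `kmeans_refine` are all assumptions made for illustration.

```python
import math
import random

def centroids_of(points, labels, k):
    """Mean of each cluster; an empty cluster falls back to the first point."""
    cents = []
    for c in range(k):
        members = [p for p, l in zip(points, labels) if l == c]
        if members:
            cents.append(tuple(sum(x) / len(members) for x in zip(*members)))
        else:
            cents.append(points[0])
    return cents

def sse(points, cents):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(math.dist(p, c) ** 2 for c in cents) for p in points)

def simulated_annealing(points, k, rng, t0=1.0, cooling=0.95, steps=300):
    """Diversified search: relabel one random point per step and accept the
    move by the Metropolis rule. Energy is the SSE of the implied centroids."""
    labels = [rng.randrange(k) for _ in points]
    energy = sse(points, centroids_of(points, labels, k))
    temp = t0
    for _ in range(steps):
        i, new = rng.randrange(len(points)), rng.randrange(k)
        old = labels[i]
        labels[i] = new
        cand = sse(points, centroids_of(points, labels, k))
        if cand <= energy or rng.random() < math.exp((energy - cand) / temp):
            energy = cand      # accept (always if better, sometimes if worse)
        else:
            labels[i] = old    # reject the move
        temp *= cooling        # geometric cooling (assumed schedule)
    return centroids_of(points, labels, k)

def kmeans_refine(points, cents, iters=50):
    """Converged search: Lloyd iterations starting from the given centroids."""
    k = len(cents)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, cents[c])) for p in points]
        new = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                new.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new.append(cents[c])
        if new == cents:
            break
        cents = new
    return cents

rng = random.Random(0)
# Hypothetical data: three Gaussian blobs of 20 points each.
blob_centers = [(0, 0), (4, 0), (2, 3)]
points = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
          for cx, cy in blob_centers for _ in range(20)]

sa_cents = simulated_annealing(points, 3, rng)
final_cents = kmeans_refine(points, sa_cents)
print("SSE after SA:      ", sse(points, sa_cents))
print("SSE after SA+K-means:", sse(points, final_cents))
```

Each Lloyd iteration cannot increase this SSE, so the refinement stage never makes the SA result worse. Whether the combination reaches a better optimum than K-means alone depends on the data and the annealing parameters.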

Clustering-based Collaborative Filtering Using Genetic Algorithms

  • 이수정
    • 창의정보문화연구
    • /
    • Vol. 4, No. 3
    • /
    • pp.221-230
    • /
    • 2018
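  • Collaborative filtering, a major approach to recommender systems, has been successfully implemented and deployed in real commercial online systems. However, the technique has several inherent drawbacks, such as data sparsity, cold start, and scalability problems. To address the scalability problem, collaborative filtering methods that use clustering techniques have been studied. The collaborative filtering system proposed in this study uses a genetic algorithm to remedy the shortcomings of the K-means algorithm, one of the most widely used clustering techniques. In addition, unlike previous studies that pursued an optimized clustering result, the proposed method aims to optimize the performance of the collaborative filtering system that uses the clustering result, and can therefore improve system performance in practice.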

Lane Detection and Tracking Algorithm for 3D Fluorescence Image Analysis

  • 이복주;문혁;최영규
    • 반도체디스플레이기술학회지
    • /
    • Vol. 15, No. 1
    • /
    • pp.27-32
    • /
    • 2016
  • A new lane detection algorithm is proposed for the analysis of DNA fingerprints from a polymerase chain reaction (PCR) gel electrophoresis image. Although several research results have been reported previously, it is still challenging to extract lanes precisely from images with abrupt differences in background brightness and bent lanes. We propose an edge-based algorithm for calculating the average lane width and lane cycle. Our method adopts a sub-pixel algorithm to extract rising edges and falling edges precisely, and estimates the lane width and cycle using the k-means clustering algorithm. To handle curved lanes, we partition the gel image into small portions and track the lane centers in each partitioned image. Thirty-two gel images containing 534 lanes are used to evaluate the performance of our method. Experimental results show that our method is robust to images with background differences and bent lanes, without any preprocessing.