• Title/Abstract/Keywords: Correlation-based clustering algorithm

53 search results (processing time: 0.027 seconds)

Data Correlation-Based Clustering Algorithm in Wireless Sensor Networks

  • Yeo, Myung-Ho;Seo, Dong-Min;Yoo, Jae-Soo
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 3, No. 3 / pp. 331-343 / 2009
  • Many types of sensor data exhibit strong correlation in both space and time. Both temporal and spatial suppression provide opportunities for reducing the energy cost of sensor data collection. Unfortunately, existing clustering algorithms struggle to exploit these spatial or temporal opportunities, because they organize clusters based only on the distribution of sensor nodes or the network topology, not on the correlation of sensor data. In this paper, we propose a novel clustering algorithm based on the correlation of sensor data. We modify the advertisement sub-phase and the TDMA schedule scheme so that clusters are formed from adjacent sensor nodes with similar readings. We also propose a spatio-temporal suppression scheme for our clustering algorithm. To show the superiority of our clustering algorithm, we compare it with existing suppression algorithms in terms of the lifetime of the sensor network and the size of the data collected at the base station. Our experimental results show that the size of the data is reduced and the overall network lifetime is prolonged.
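The idea of forming clusters from adjacent nodes with similar readings can be illustrated with a minimal sketch. It does not reproduce the paper's advertisement sub-phase or TDMA scheduling; the function names, the `tolerance` parameter, and the sample topology are all assumptions made for illustration.

```python
from collections import deque

def cluster_by_similarity(readings, neighbors, tolerance):
    """Group adjacent nodes whose readings stay within `tolerance` of the cluster head.

    Hypothetical helper: the first unassigned node (in sorted order) becomes a head,
    and the cluster grows over the adjacency graph by breadth-first search.
    """
    cluster_of = {}
    clusters = []
    for head in sorted(readings):
        if head in cluster_of:
            continue
        members = [head]
        cluster_of[head] = len(clusters)
        queue = deque([head])
        while queue:
            node = queue.popleft()
            for peer in neighbors.get(node, ()):
                if peer in cluster_of:
                    continue
                # A peer joins only if its reading is close to the head's reading.
                if abs(readings[peer] - readings[head]) <= tolerance:
                    cluster_of[peer] = len(clusters)
                    members.append(peer)
                    queue.append(peer)
        clusters.append(members)
    return clusters

def suppressed_report(readings, clusters):
    """Spatial suppression (assumed form): each cluster reports only its head's reading."""
    return {members[0]: readings[members[0]] for members in clusters}

# Illustrative data, not taken from the paper.
readings = {"a": 20.1, "b": 20.3, "c": 25.0, "d": 25.2, "e": 20.2}
neighbors = {"a": ["b", "c"], "b": ["a", "e"], "c": ["a", "d"], "d": ["c"], "e": ["b"]}
clusters = cluster_by_similarity(readings, neighbors, tolerance=0.5)
report = suppressed_report(readings, clusters)
```

With these sample values, five readings collapse into two reports, which is the kind of data reduction the abstract describes.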

Analysis of COVID-19 Context-awareness based on Clustering Algorithm

  • 이강환
    • Journal of the Korea Institute of Information and Communication Engineering / Vol. 26, No. 5 / pp. 755-762 / 2022
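  • This paper proposes a clustering algorithm capable of learning-based prediction that uses clustering together with disease attribute information, which serves as the context-awareness information for COVID-19. Cluster data processed during clustering are classified so that the correlation of new or newly entered information can be predicted. If the newly entered information is treated as contaminated relative to the comparison information, it deviates from the previously classified clusters and degrades cluster quality. To address this problem when applying the K-means algorithm to COVID-19 disease attribute information, this paper proposes a user cluster analysis method that can extract disease correlation information. The proposed algorithm analyzes and learns the correlations among autonomous user cluster features, forms clusters according to the users' disease attributes, and derives the cluster centroids from the users' accumulated information. Simulation results show that the proposed algorithm, which classifies and learns COVID-19 multi-disease attribute information in cluster units, improves the prediction accuracy of the user management system during the learning process.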

A new Ensemble Clustering Algorithm using a Reconstructed Mapping Coefficient

  • Cao, Tuoqia;Chang, Dongxia;Zhao, Yao
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 14, No. 7 / pp. 2957-2980 / 2020
  • Ensemble clustering commonly integrates multiple basic partitions to obtain a more accurate clustering result than a single partition. However, it faces an inevitable problem: the transformation from the original space to the integrated space is incomplete. In this paper, a novel ensemble clustering algorithm using a newly reconstructed mapping coefficient (ECRMC) is proposed. In the algorithm, a newly reconstructed mapping coefficient between objects and micro-clusters is designed, based on the principle of increasing information entropy, to enhance effective information. This can reduce the information loss in the transformation from micro-clusters to the original space. The correlation of the micro-clusters is then calculated with the Spearman coefficient. As a result, the revised co-association graph between objects can be built more accurately, because the supplementary information helps ensure the completeness of the whole conversion process. Experimental results demonstrate that the ECRMC clustering algorithm has high performance, effectiveness, and feasibility.
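The Spearman coefficient mentioned in the abstract is the Pearson correlation of rank-transformed values. The sketch below shows that computation in plain Python; how ECRMC represents a micro-cluster as a vector is not stated in the abstract, so the sample vectors and all function names are assumptions.

```python
def average_ranks(values):
    """Rank values starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; assumes equal lengths and non-constant inputs."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical micro-cluster profiles (illustrative values only).
m1 = [0.9, 0.7, 0.2, 0.1]
m2 = [0.8, 0.6, 0.3, 0.0]
m3 = [0.1, 0.2, 0.7, 0.9]
same_order = spearman(m1, m2)      # identical rank order
reverse_order = spearman(m1, m3)   # fully reversed rank order
```

Because only ranks matter, `m1` and `m2` correlate perfectly even though their raw values differ.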

The Design of GA-based TSK Fuzzy Classifier and Its Application

  • 곽근창;김승석;유정웅;전명근
    • Proceedings of the Korean Institute of Intelligent Systems Conference / Proceedings of the Korea Fuzzy Logic and Intelligent Systems Society 2001 Fall Conference / pp. 233-236 / 2001
  • In this paper, we propose a TSK-type fuzzy classifier using PCA (Principal Component Analysis), FCM (Fuzzy C-Means) clustering, and a hybrid GA (genetic algorithm). First, the input data is transformed by PCA to reduce correlation among the data components. FCM clustering is then applied to obtain an initial TSK-type fuzzy classifier. Parameter identification is performed by AGA (Adaptive Genetic Algorithm) and RLSE (Recursive Least Square Estimate). We applied the proposed method to Iris data classification problems and obtained better performance than previous works.

Extended Kepler Grid-based System for Diabetes Study Workspace

  • Hazemi, Fawaz Al;Youn, Chan-Hyun
    • Proceedings of the Korea Information Processing Society Conference / Korea Information Processing Society 2011 Spring Conference / pp. 230-233 / 2011
  • Chronic disease is linked to a patient's lifestyle, so a doctor has to monitor the patient over time. This may involve reviewing many reports, finding any changes, and modifying several treatments. One way to reduce this burden is a timeline-based visualization tool in which all reports and medications are integrated in a problem-centric, time-based style so that the doctor can predict and adjust the treatment plan. This solution was proposed by Bui et al. [2] for observing the medical history of a patient. However, it was limited when studying a diabetes patient's history to find the cause of the current development in the patient's condition, or to predict the implications of a change in one of the diabetes-related factors (such as fat, cholesterol, or potassium). In this paper, we propose a Grid-based Interactive Diabetes System (GIDS) to support bioinformatics analysis applications for diabetes. GIDS uses an agglomerative clustering algorithm as its primary correlation clustering algorithm, to focus medical researchers on the findings needed to predict the implications for the diabetes patient under study. The algorithm is Chronological Clustering, proposed by P. Legendre [11] [12].

Electric Power Load Forecasting using Fuzzy Prediction System

  • 방영근;심재선
    • The Transactions of the Korean Institute of Electrical Engineers / Vol. 62, No. 11 / pp. 1590-1597 / 2013
  • Electric power is an important part of economic development. Moreover, an accurate load forecast enables effective financial planning, power supply strategy, and market research. This paper uses a fuzzy logic system to predict regional electric power load. To design the fuzzy prediction system, a correlation-based clustering algorithm and the TSK fuzzy model were used. In addition, to improve the prediction system's capability, the moving average technique and the relative increasing rate were used in the preprocessing procedure. Finally, using four regional electric power loads in Taiwan, this paper verifies the performance of the proposed system and demonstrates its effectiveness and usefulness.
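The two preprocessing steps named in the abstract can be sketched as follows. The abstract does not define either one precisely, so this assumes a simple trailing moving average and a relative increasing rate of the form (x[t] - x[t-1]) / x[t-1]; the function names and load values are illustrative only.

```python
def moving_average(series, window):
    """Trailing moving average; returns len(series) - window + 1 values."""
    if not 1 <= window <= len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]

def relative_increase_rate(series):
    """Assumed definition: (x[t] - x[t-1]) / x[t-1] for each consecutive pair."""
    return [(cur - prev) / prev for prev, cur in zip(series, series[1:])]

# Illustrative load values, not data from the paper.
load = [100.0, 110.0, 121.0, 115.0]
smoothed = moving_average(load, window=2)
rates = relative_increase_rate(load)
```

Working on rates rather than raw loads removes the overall level of the series, which may make regions with different load magnitudes easier to compare.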

A Data-Centric Clustering Algorithm for Reducing Network Traffic in Wireless Sensor Networks

  • 여명호;이미숙;박종국;이석재;유재수
    • Journal of KIISE: Information Networking / Vol. 35, No. 2 / pp. 139-148 / 2008
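  • Depending on the application, sensor networks that require higher-level data processing transmit the data collected by every sensor node to the sink node. Owing to the environmental characteristics of sensor networks, the collected data are generally correlated in time or space. This correlation gives the sink node an opportunity to reconstruct all the data while collecting only part of it. Clustering is widely used as a data collection technique in sensor networks. However, existing clustering techniques form clusters by considering only the locality of sensor nodes, not the correlation of the collected data, so they exploit this opportunity inefficiently. This paper proposes a clustering technique that forms clusters around the collected data and can drastically reduce the size of the data transmitted to the sink node. To demonstrate the superiority of the proposed technique, we evaluated its performance through simulation. Compared with existing techniques, network traffic decreased by about 4% to 40% and network lifetime was extended by about 20% to 30%.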

Partially Evaluated Genetic Algorithm based on Fuzzy Clustering

  • 유시호;조성배
    • Journal of KIISE: Software and Applications / Vol. 31, No. 9 / pp. 1246-1257 / 2004
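  • To find the desired optimal solution, a genetic algorithm should keep its population as large as possible. In real problems, however, evaluating the fitness of an individual is often difficult, so evaluating every individual in a large population requires considerable time and cost. This paper therefore proposes a partially evaluated genetic algorithm based on fuzzy clustering, which keeps the population large but improves efficiency by evaluating only the representative individuals selected by clustering. The remaining individuals receive their fitness indirectly from the representatives. Because hard clustering cannot distribute fitness accurately to individuals that belong to several clusters, we used the fuzzy c-means algorithm and distributed fitness according to the resulting fuzzy membership matrix. On nine benchmark fitness functions, we compared the proposed method with fitness distribution methods based on Euclidean distance and the Pearson correlation coefficient applied with six hard clustering algorithms, and the results confirmed the superior performance of the proposed method.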
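Distributing fitness through a fuzzy membership matrix can be illustrated with a short sketch. The abstract does not give the exact formula, so this assumes a membership-weighted average of the representatives' evaluated fitness; the function name and sample numbers are hypothetical.

```python
def distribute_fitness(membership, representative_fitness):
    """Estimate fitness for every individual from cluster representatives.

    membership[i][j] is individual i's fuzzy membership in cluster j
    (for example, a row of a fuzzy c-means membership matrix).
    representative_fitness[j] is the evaluated fitness of cluster j's representative.
    Assumed rule: membership-weighted average, normalized by the row sum.
    """
    estimates = []
    for row in membership:
        total = sum(row)
        weighted = sum(u * f for u, f in zip(row, representative_fitness))
        estimates.append(weighted / total)
    return estimates

# Illustrative values: three individuals, two clusters.
membership = [
    [1.0, 0.0],    # belongs entirely to cluster 0
    [0.5, 0.5],    # split evenly between both clusters
    [0.25, 0.75],  # mostly cluster 1
]
representative_fitness = [10.0, 20.0]
estimated = distribute_fitness(membership, representative_fitness)
```

Only two fitness evaluations are needed here, while all three individuals still receive a graded estimate. A hard assignment would give the evenly split individual the fitness of one cluster or the other.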

A Clustering Approach for Feature Selection in Microarray Data Classification Using Random Forest

  • Aydadenta, Husna;Adiwijaya, Adiwijaya
    • Journal of Information Processing Systems / Vol. 14, No. 5 / pp. 1167-1175 / 2018
  • Microarray data play an essential role in diagnosing and detecting cancer. Microarray analysis allows the examination of gene expression levels in specific cell samples, where thousands of genes can be analyzed simultaneously. However, microarray data have few samples and high dimensionality. Therefore, to classify microarray data, a dimensionality reduction process is required. Dimensionality reduction can eliminate redundancy in the data, so that the features used in classification are only those that have a high correlation with their class. There are two types of dimensionality reduction, namely feature selection and feature extraction. In this paper, we used the k-means algorithm as the clustering approach for feature selection. The proposed approach can be used to group features that have the same characteristics into one cluster, so that redundancy in microarray data is removed. The clustering result is ranked using the Relief algorithm, such that the best-scoring element of each cluster is obtained. The best elements of all clusters are selected and used as features in the classification process. Next, the Random Forest algorithm is used. Based on the simulation, the proposed approach achieved 85.87%, 98.9%, and 89% accuracy on the Colon, Lung Cancer, and Prostate Tumor datasets, respectively. The accuracy of the proposed approach is therefore higher than that of Random Forest without clustering.
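The selection step described above, keeping the best-scoring feature from each cluster, can be sketched as follows. The sketch assumes the cluster labels (for example, from k-means over features) and the relevance scores (for example, from Relief) have already been computed; the function name and sample values are hypothetical.

```python
def best_feature_per_cluster(labels, scores):
    """Return the index of the highest-scoring feature in each cluster.

    labels[i] is the cluster assigned to feature i.
    scores[i] is the relevance score of feature i.
    Ties keep the feature with the lower index.
    """
    best = {}
    for feature, (label, score) in enumerate(zip(labels, scores)):
        if label not in best or score > scores[best[label]]:
            best[label] = feature
    return sorted(best.values())

# Illustrative values: five features grouped into three clusters.
labels = [0, 0, 1, 1, 2]
scores = [0.2, 0.9, 0.4, 0.1, 0.7]
selected = best_feature_per_cluster(labels, scores)
```

The selected indices would then be used to subset the data before training a classifier such as Random Forest.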

Plurality Rule-based Density and Correlation Coefficient-based Clustering for K-NN

  • Aung, Swe Swe;Nagayama, Itaru;Tamaki, Shiro
    • IEIE Transactions on Smart Processing and Computing / Vol. 6, No. 3 / pp. 183-192 / 2017
  • k-nearest neighbor (K-NN) is a well-known classification algorithm in machine learning that classifies a sample according to the nearest training examples in the feature space. However, K-NN is a lazy learning method. Therefore, if a K-NN-based system depends heavily on a huge amount of historical data to achieve an accurate prediction result for a particular task, its processing time gradually degrades. We have noticed that many researchers consider only classification accuracy, but estimation speed also plays an essential role in real-time prediction systems. To compensate for this weakness, this paper proposes correlation coefficient-based clustering (CCC), aimed at improving the processing time of K-NN, and plurality rule-based density (PRD), aimed at improving estimation accuracy. For experiments, we used real datasets (on breast cancer, breast tissue, heart, and the iris) from the University of California, Irvine (UCI) machine learning repository. Moreover, real traffic data collected from Ojana Junction, Route 58, Okinawa, Japan, was also used to demonstrate the efficiency of this method. Using these datasets, we showed better processing-time performance with the new approach than with classical K-NN. In addition, through experiments on real-world datasets, we compared the prediction accuracy of our approach with density peaks clustering based on K-NN and principal component analysis (DPC-KNN-PCA).