• Title/Summary/Keyword: Correlation-based clustering algorithm

Data Correlation-Based Clustering Algorithm in Wireless Sensor Networks

  • Yeo, Myung-Ho;Seo, Dong-Min;Yoo, Jae-Soo
    • KSII Transactions on Internet and Information Systems (TIIS) / v.3 no.3 / pp.331-343 / 2009
  • Many types of sensor data exhibit strong correlation in both space and time. Both temporal and spatial suppression provide opportunities for reducing the energy cost of sensor data collection. Unfortunately, existing clustering algorithms can hardly exploit these spatial or temporal opportunities, because they organize clusters based only on the distribution of sensor nodes or the network topology, not on the correlation of sensor data. In this paper, we propose a novel clustering algorithm based on the correlation of sensor data. We modify the advertisement sub-phase and the TDMA schedule scheme to organize clusters from adjacent sensor nodes that have similar readings. We also propose a spatio-temporal suppression scheme for our clustering algorithm. To show the superiority of our clustering algorithm, we compare it with existing suppression algorithms in terms of the lifetime of the sensor network and the size of the data collected at the base station. Our experimental results show that the size of the data is reduced and the overall network lifetime is prolonged.
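The temporal half of the suppression idea in this abstract can be illustrated with a minimal sketch. It assumes a generic threshold rule (a node transmits only when its reading has drifted from the last transmitted value by more than a tolerance); the function name `temporal_suppression` and the parameter `epsilon` are hypothetical, and this is not the scheme proposed in the paper.

```python
def temporal_suppression(readings, epsilon):
    """Return the (index, value) pairs a node would actually transmit.

    Assumption: a reading is suppressed when it differs from the last
    transmitted value by at most `epsilon`. This is a generic threshold
    rule, not the paper's spatio-temporal scheme.
    """
    sent = []
    last_sent = None
    for i, value in enumerate(readings):
        # Always send the first reading; afterwards send only on a
        # change larger than the tolerance.
        if last_sent is None or abs(value - last_sent) > epsilon:
            sent.append((i, value))
            last_sent = value
    return sent


if __name__ == "__main__":
    temps = [20.0, 20.1, 20.2, 21.0, 21.05, 19.0]
    print(temporal_suppression(temps, epsilon=0.5))
```

With strongly correlated consecutive readings, most samples fall inside the tolerance and are never sent, which is where the energy saving would come from.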

Analysis of COVID-19 Context-awareness based on Clustering Algorithm (클러스터링 알고리즘기반의 COVID-19 상황인식 분석)

  • Lee, Kangwhan
    • Journal of the Korea Institute of Information and Communication Engineering / v.26 no.5 / pp.755-762 / 2022
  • This paper proposes a clustering algorithm that enables more efficient learning-based prediction of COVID-19 disease by clustering with context-aware attribute information. Typically, clustering of COVID-19 disease data classifies the interrelationships within disease cluster information during the clustering process. The clustered data can become a degrading factor if new or newly processed information is treated as a contaminating factor when interrelationship information is compared. In this paper, we address this problem and develop a clustering algorithm that extracts disease correlation information using the K-means algorithm. Based on the attributes of the disease clusters, and using accumulated information and interrelationship clustering, the proposed algorithm analyzes the feasible disease correlation clusters and their center points. Simulation results show that the proposed algorithm improves the adaptability and prediction accuracy of the classification management system when multiple COVID-19 disease attributes are learned as a group.

A new Ensemble Clustering Algorithm using a Reconstructed Mapping Coefficient

  • Cao, Tuoqia;Chang, Dongxia;Zhao, Yao
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.7 / pp.2957-2980 / 2020
  • Ensemble clustering commonly integrates multiple basic partitions to obtain a more accurate clustering result than a single partition. However, the transformation from the original space to the integrated space is inevitably incomplete. In this paper, a novel ensemble clustering algorithm using a newly reconstructed mapping coefficient (ECRMC) is proposed. In the algorithm, a newly reconstructed mapping coefficient between objects and micro-clusters is designed based on the principle of increasing information entropy to enhance effective information. This reduces the information loss in the transformation from micro-clusters to the original space. The correlation of the micro-clusters is then calculated using the Spearman coefficient. As a result, the revised co-association graph between objects can be built more accurately, because the supplementary information helps ensure the completeness of the whole conversion process. Experimental results demonstrate that the ECRMC clustering algorithm achieves high performance, effectiveness, and feasibility.
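For reference, the Spearman coefficient mentioned in this abstract is the Pearson correlation computed on ranks. The sketch below is a plain standard-library version with average ranks for ties; the helper names are hypothetical, and how ECRMC represents micro-clusters as vectors is not stated in the abstract, so the inputs here are simply two equal-length numeric sequences.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # A monotone but nonlinear relation still gives a coefficient near 1.
    print(spearman([1, 2, 3, 4], [1, 4, 9, 16]))
```

Because it depends only on rank order, the coefficient is insensitive to monotone rescaling of either input.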

The Design of GA-based TSK Fuzzy Classifier and Its application (GA기반 TSK 퍼지 분류기의 설계 및 응용)

  • 곽근창;김승석;유정웅;전명근
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2001.12a / pp.233-236 / 2001
  • In this paper, we propose a TSK-type fuzzy classifier using PCA (Principal Component Analysis), FCM (Fuzzy C-Means) clustering, and a hybrid GA (Genetic Algorithm). First, the input data are transformed by PCA to reduce the correlation among the data components. FCM clustering is then applied to obtain an initial TSK-type fuzzy classifier. Parameter identification is performed by AGA (Adaptive Genetic Algorithm) and RLSE (Recursive Least Square Estimate). We applied the proposed method to Iris data classification problems and obtained better performance than previous works.

Extended Kepler Grid-based System for Diabetes Study Workspace

  • Hazemi, Fawaz Al;Youn, Chan-Hyun
    • Proceedings of the Korea Information Processing Society Conference / 2011.04a / pp.230-233 / 2011
  • Chronic disease is linked to patients' lifestyles. Therefore, a doctor has to monitor each patient over time. This may involve reviewing many reports, finding any changes, and modifying several treatments. One way to reduce this burden is a timeline-based visualization tool in which all reports and medications are integrated in a problem-centric, time-based style, enabling the doctor to predict and adjust the treatment plan. This solution was proposed by Bui et al. [2] to observe the medical history of a patient. However, it was limited in studying a diabetes patient's history to find the cause of the current development in the patient's condition, and in predicting the implication of a current change in one of the diabetes-related factors (such as fat, cholesterol, or potassium). In this paper, we propose a Grid-based Interactive Diabetes System (GIDS) to support bioinformatics analysis applications for diabetes. GIDS uses an agglomerative clustering algorithm as its primary correlation clustering algorithm, focusing medical researchers on the findings needed to predict the implications for the diabetes patient under study. The algorithm is Chronological Clustering, proposed by P. Legendre [11] [12].

Electric Power Load Forecasting using Fuzzy Prediction System (퍼지 예측 시스템을 이용한 전력 부하 예측)

  • Bang, Young-Keun;Shim, Jae-Sun
    • The Transactions of The Korean Institute of Electrical Engineers / v.62 no.11 / pp.1590-1597 / 2013
  • Electric power plays an important part in economic development. Moreover, an accurate load forecast allows financial planning, power supply strategy, and market research to be planned effectively. This paper uses a fuzzy logic system to predict regional electric power load. To design the fuzzy prediction system, a correlation-based clustering algorithm and the TSK fuzzy model were used. In addition, to improve the prediction system's capability, the moving average technique and the relative increasing rate were used in the preprocessing procedure. Finally, using electric power load data from four regions in Taiwan, this paper verifies the performance of the proposed system and demonstrates its effectiveness and usefulness.
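The two preprocessing steps named in this abstract can be sketched as follows. The abstract does not define them, so the sketch assumes a simple trailing moving average and a relative increasing rate of the form (x_t - x_{t-1}) / x_{t-1}; both function names and the sample values are hypothetical.

```python
def moving_average(series, window):
    """Simple moving average over a sliding window of fixed length."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


def relative_increase_rate(series):
    """Relative change between consecutive values.

    Assumption: rate_t = (x_t - x_{t-1}) / x_{t-1}; the paper's exact
    definition is not given in the abstract.
    """
    return [(curr - prev) / prev for prev, curr in zip(series, series[1:])]


if __name__ == "__main__":
    load = [100.0, 110.0, 99.0, 105.0, 120.0]  # illustrative values only
    smoothed = moving_average(load, window=3)
    print(smoothed)
    print(relative_increase_rate(smoothed))
```

Smoothing first and then converting to rates removes the absolute level of the series, which is one common reason to preprocess load data before fitting a fuzzy model.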

A Data-Centric Clustering Algorithm for Reducing Network Traffic in Wireless Sensor Networks (무선 센서 네트워크에서 네트워크 트래픽 감소를 위한 데이타 중심 클러스터링 알고리즘)

  • Yeo, Myung-Ho;Lee, Mi-Sook;Park, Jong-Guk;Lee, Seok-Jae;Yoo, Jae-Soo
    • Journal of KIISE:Information Networking / v.35 no.2 / pp.139-148 / 2008
  • Many types of sensor data exhibit strong correlation in both space and time. Suppression, both temporal and spatial, provides opportunities for reducing the energy cost of sensor data collection. Unfortunately, existing clustering algorithms can hardly exploit these spatial or temporal opportunities, because they organize clusters based only on the distribution of sensor nodes or the network topology, not on the correlation of sensor data. In this paper, we propose a novel clustering algorithm with suppression techniques. To guarantee independent communication among clusters, we allocate multiple channels based on sensor data. We also propose a spatio-temporal suppression technique to reduce the network traffic. To show the superiority of our clustering algorithm, we compare it with existing suppression algorithms in terms of the lifetime of the sensor network and the size of the data collected at the base station. Our experimental results show that the size of the data was reduced by $4{\sim}40\%$ and the overall network lifetime was prolonged by $20{\sim}30\%$.

Partially Evaluated Genetic Algorithm based on Fuzzy Clustering (퍼지 클러스터링 기반의 국소평가 유전자 알고리즘)

  • Yoo, Si-Ho;Cho, Sung-Bae
    • Journal of KIISE:Software and Applications / v.31 no.9 / pp.1246-1257 / 2004
  • To find an optimal solution with a genetic algorithm, it is desirable to keep the population size as large as possible. In some cases, however, the cost of evaluating each individual is relatively high, and it is difficult to maintain a large population. To solve this problem, we propose a novel genetic algorithm based on fuzzy clustering, which considerably reduces the number of evaluations without any significant loss of performance by evaluating only one representative for each cluster. The fitness values of the other individuals are estimated indirectly from the representatives' fitness values. We used the fuzzy c-means algorithm and distributed the fitness using the membership matrix, since it is hard for a hard clustering method to distribute precise fitness values to individuals that belong to multiple groups. Nine benchmark functions were investigated, and the results are compared with six hard clustering algorithms that use Euclidean distance and Pearson correlation coefficients as the fitness distribution method.
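One plausible reading of "distributing the fitness using the membership matrix" is a membership-weighted average of the representatives' fitness values. The sketch below works under that assumption only; the abstract does not give the exact formula, and `estimate_fitness` is a hypothetical name.

```python
def estimate_fitness(membership, representative_fitness):
    """Estimate each individual's fitness from cluster representatives.

    membership[i][k] is the fuzzy membership of individual i in cluster k
    (each row is expected to sum to 1, as in fuzzy c-means).
    representative_fitness[k] is the evaluated fitness of cluster k's
    representative.

    Assumption: estimated fitness is the membership-weighted sum of the
    representative fitness values.
    """
    estimates = []
    for row in membership:
        if len(row) != len(representative_fitness):
            raise ValueError("each row needs one membership per cluster")
        estimates.append(
            sum(u * f for u, f in zip(row, representative_fitness))
        )
    return estimates


if __name__ == "__main__":
    u = [[1.0, 0.0], [0.5, 0.5], [0.25, 0.75]]
    print(estimate_fitness(u, [10.0, 20.0]))
```

Under this reading, only one fitness evaluation per cluster is needed, and an individual that sits between two clusters receives a blend of both representatives' values.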

A Clustering Approach for Feature Selection in Microarray Data Classification Using Random Forest

  • Aydadenta, Husna;Adiwijaya, Adiwijaya
    • Journal of Information Processing Systems / v.14 no.5 / pp.1167-1175 / 2018
  • Microarray data play an essential role in diagnosing and detecting cancer. Microarray analysis allows the examination of gene expression levels in specific cell samples, where thousands of genes can be analyzed simultaneously. However, microarray data have very few samples and high dimensionality. Therefore, to classify microarray data, a dimensionality reduction process is required. Dimensionality reduction can eliminate redundancy in the data, so that the features used in classification are only those that have a high correlation with their class. There are two types of dimensionality reduction, namely feature selection and feature extraction. In this paper, we used the k-means algorithm as the clustering approach for feature selection. The proposed approach groups features that have the same characteristics into one cluster, so that redundancy in the microarray data is removed. The features in each cluster are ranked using the Relief algorithm, so that the best-scoring element of each cluster is obtained. The best elements of all clusters are selected and used as features in the classification process, for which the Random Forest algorithm is used. Based on the simulation, the proposed approach achieved accuracies of 85.87%, 98.9%, and 89% on the Colon, Lung Cancer, and Prostate Tumor datasets, respectively. The accuracy of the proposed approach is therefore higher than that of Random Forest without clustering.
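The "best element per cluster" step in this abstract can be sketched in a few lines. The sketch assumes the cluster labels (for example, from k-means over features) and the per-feature scores (for example, from Relief) are already computed; the function name and the sample values are hypothetical.

```python
def select_best_per_cluster(cluster_labels, scores):
    """Return the index of the highest-scoring feature in each cluster.

    cluster_labels[i] is the cluster assigned to feature i and scores[i]
    is its relevance score. Both are taken as given here; how they are
    computed is outside this sketch.
    """
    if len(cluster_labels) != len(scores):
        raise ValueError("need one label and one score per feature")
    best = {}  # cluster label -> index of its best feature so far
    for idx, (label, score) in enumerate(zip(cluster_labels, scores)):
        if label not in best or score > scores[best[label]]:
            best[label] = idx
    return sorted(best.values())


if __name__ == "__main__":
    labels = [0, 0, 1, 1, 2]
    relevance = [0.1, 0.9, 0.5, 0.4, 0.3]
    print(select_best_per_cluster(labels, relevance))
```

Keeping one feature per cluster yields exactly as many selected features as there are clusters, so the number of clusters directly controls the reduced dimensionality.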

Plurality Rule-based Density and Correlation Coefficient-based Clustering for K-NN

  • Aung, Swe Swe;Nagayama, Itaru;Tamaki, Shiro
    • IEIE Transactions on Smart Processing and Computing / v.6 no.3 / pp.183-192 / 2017
  • k-nearest neighbor (K-NN) is a well-known classification algorithm in machine learning that classifies a sample based on the nearest training examples in the feature space. However, K-NN is a lazy learning method. Therefore, if a K-NN-based system depends heavily on a huge amount of historical data to achieve an accurate prediction for a particular task, it gradually faces degradation in processing-time performance. We have noticed that many researchers consider only classification accuracy, but estimation speed also plays an essential role in real-time prediction systems. To compensate for this weakness, this paper proposes correlation coefficient-based clustering (CCC), aimed at improving the processing time of K-NN, and plurality rule-based density (PRD), aimed at improving estimation accuracy. For experiments, we used real datasets (on breast cancer, breast tissue, heart, and the iris) from the University of California, Irvine (UCI) machine learning repository. Moreover, real traffic data collected from Ojana Junction, Route 58, Okinawa, Japan, were also used to demonstrate the efficiency of this method. Using these datasets, we showed better processing-time performance with the new approach than with classical K-NN. In addition, via experiments on real-world datasets, we compared the prediction accuracy of our approach with that of density peaks clustering based on K-NN and principal component analysis (DPC-KNN-PCA).