• Title/Summary/Keyword: and clustering

Hybrid Simulated Annealing for Data Clustering (데이터 클러스터링을 위한 혼합 시뮬레이티드 어닐링)

  • Kim, Sung-Soo;Baek, Jun-Young;Kang, Beom-Soo
    • Journal of Korean Society of Industrial and Systems Engineering / v.40 no.2 / pp.92-98 / 2017
  • Data clustering groups the patterns in a dataset by a similarity measure and is one of the most important and difficult techniques in data mining. Clustering can be formally considered a particular kind of NP-hard grouping problem. The K-means algorithm, though popular and efficient, is sensitive to initialization and can become stuck in a local optimum because it is a hill-climbing clustering method. It is also not computationally feasible in practice, especially for large datasets and large numbers of clusters. Therefore, a robust and efficient clustering algorithm is needed to find the global optimum rather than a local one, especially now that large amounts of data are collected from many IoT (Internet of Things) devices. The objective of this paper is to propose a new Hybrid Simulated Annealing (HSA) method that combines simulated annealing with K-means for non-hierarchical clustering of big data. Simulated annealing (SA) is useful for diversified search in a large search space, and K-means is useful for converged search in a predetermined search space. The proposed method balances intensification and diversification to find the global optimal solution in big data clustering. The performance of HSA is validated by experiment and analysis on the Iris, Wine, Glass, and Vowel datasets from the UCI machine learning repository, in comparison with previous studies. The proposed KSAK (K-means+SA+K-means) and SAK (SA+K-means) perform better than KSA (K-means+SA), SA, and K-means in our simulations. Our method significantly improves the accuracy and efficiency of finding the global optimal clustering solution for complex, real-time, and costly data mining processes.
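
The SA-then-K-means ordering named in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the neighbourhood move (reassigning one random point), the geometric cooling schedule, and the parameters `t0`, `cooling`, and `steps` are all assumptions chosen for illustration.

```python
import numpy as np

def sse(X, labels, k):
    """Sum of squared distances from each point to its cluster centroid."""
    total = 0.0
    for c in range(k):
        pts = X[labels == c]
        if len(pts):
            total += ((pts - pts.mean(axis=0)) ** 2).sum()
    return total

def kmeans_refine(X, labels, k, iters=100):
    """Lloyd iterations starting from a given labelling (intensification)."""
    labels = labels.copy()
    for _ in range(iters):
        # An empty cluster is re-seeded at a data point picked by index (assumed rule).
        centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else X[c % len(X)]
            for c in range(k)
        ])
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

def sa_then_kmeans(X, k, t0=1.0, cooling=0.95, steps=2000, seed=0):
    """Simulated annealing over labellings (diversification), then K-means."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=len(X))
    cost = sse(X, labels, k)
    best_labels, best_cost = labels.copy(), cost
    t = t0
    for _ in range(steps):
        cand = labels.copy()
        cand[rng.integers(len(X))] = rng.integers(k)  # move one point
        cand_cost = sse(X, cand, k)
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if cand_cost < cost or rng.random() < np.exp((cost - cand_cost) / t):
            labels, cost = cand, cand_cost
            if cost < best_cost:
                best_labels, best_cost = labels.copy(), cost
        t *= cooling
    refined = kmeans_refine(X, best_labels, k)
    return refined, best_cost, sse(X, refined, k)
```

Calling `sa_then_kmeans(X, k=3)` on an `(n, d)` array returns the refined labels, the best cost found by the annealing phase, and the cost after the K-means phase. Because each Lloyd iteration cannot increase the sum of squared errors, the final cost is never worse than the annealing result.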

The Clustering Method Of Central Control System In New Distribution Automation System (배전자동화시스템 중앙제어장치 이중화 적용방안)

  • Cho, Nam-Hun;Ha, Bok-Nam;Lee, Jung-Ho;Lim, Seong-Il
    • Proceedings of the KIEE Conference / 1999.07c / pp.1120-1122 / 1999
  • This paper introduces clustering for the Central Control System in the New Distribution Automation System. Clustering has three primary benefits: improved availability, easier manageability, and more cost-effective scalability. Availability: Clustering can automatically detect the failure of an application or server and quickly restart it on a surviving server, so clients experience only a momentary pause in service. Manageability: Clustering lets administrators quickly inspect the status of all cluster resources and easily move workload onto different servers within a cluster. Scalability: Applications can use the clustering services through the MSCS Application Programming Interface (API) to perform dynamic load balancing and scale across multiple servers within a cluster.

A Study on K-Means Clustering

  • Bae, Wha-Soo;Roh, Se-Won
    • Communications for Statistical Applications and Methods / v.12 no.2 / pp.497-508 / 2005
  • This paper studies K-means clustering, focusing on initialization, which affects the clustering results in K-means cluster analysis. Four methods (the MA method, the KA method, the Max-Min method, and the Space Partition method) were compared. The clustering results show some differences among these methods: in particular, the MA method sometimes leads to incorrect clustering because of inappropriate initialization, depending on the type of data, and the Max-Min method is shown to be more effective than the other methods, especially when the data size is large.
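
The abstract above does not define the Max-Min method. As a rough illustration, the sketch below uses the common farthest-first reading of the name: each new seed is the point whose distance to its nearest already-chosen seed is largest. Whether this matches the paper's exact procedure is an assumption, and the `first` parameter (index of the starting seed) is hypothetical.

```python
import numpy as np

def maxmin_init(X, k, first=0):
    """Choose k initial centers by farthest-first (max-min distance) selection."""
    chosen = [first]
    # d[i] is the distance from point i to its nearest chosen seed so far.
    d = np.linalg.norm(X - X[first], axis=1)
    for _ in range(k - 1):
        nxt = int(d.argmax())  # point farthest from all current seeds
        chosen.append(nxt)
        d = np.minimum(d, np.linalg.norm(X - X[nxt], axis=1))
    return X[chosen]
```

Seeds chosen this way tend to be spread across the data, which is one plausible reason such a scheme would be less prone to poor starting configurations than random selection.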

Semantic-Based K-Means Clustering for Microblogs Exploiting Folksonomy

  • Heu, Jee-Uk
    • Journal of Information Processing Systems / v.14 no.6 / pp.1438-1444 / 2018
  • Recently, with the development of Internet technologies and the spread of smart devices, the use of microblogs such as Facebook, Twitter, and Instagram has been increasing rapidly. Many users check microblogs for new information because the content on their timelines is continually updated. Clustering algorithms are therefore needed to arrange microblog content into groups for users who want the newest information. However, microblogs have word limits, so there is not enough information to analyze for content clustering. In this paper, we propose a semantic-based K-means clustering algorithm that measures not only the similarity between data represented in a vector space model but also the semantic similarity between data by exploiting the TagCluster for clustering. Experimental results on the RepLab2013 Twitter dataset show the effectiveness of the semantic-based K-means clustering algorithm.

Clustering Validity of Social Network Subgroup Using Attribute Similarity (속성유사도에 따른 사회연결망 서브그룹의 군집유효성)

  • Yoon, Han-Seong
    • Journal of Korea Society of Digital Industry and Information Management / v.17 no.1 / pp.75-84 / 2021
  • In big data analysis, social networks are increasingly used through relational data, which describe the connections between entities such as people and objects. When relational data do not exist directly, a social network can be constructed by calculating relational data, such as attribute similarity, from the attribute data of entities and using them as links. This paper suggests a method for composing a social network that uses the attribute similarity between entities as the connection relationship, along with a clustering method that uses subgroups of the constructed network, and evaluates the clustering effectiveness of the results. The analysis results can vary depending on the type and characteristics of the data analyzed, the type of attribute similarity selected, and the criterion value. In addition, the clustering effectiveness may not be consistent across evaluation methods. Selections and experiments are therefore necessary to obtain better analysis results. A limitation is that the results may differ depending on the type and characteristics of the analysis target, the clustering options, and so on. In addition, to evaluate clustering performance, a further study is needed that compares the method of this paper with conventional methods such as k-means.
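
A minimal sketch of the idea described above, building links from attribute similarity and a criterion value, might look like the following. Cosine similarity, the `threshold` parameter, and the use of connected components as subgroups are all assumptions. The abstract does not say which similarity measure or subgroup definition the paper uses. Attribute rows are assumed to be nonzero.

```python
import numpy as np

def similarity_network(attrs, threshold):
    """Link two entities when the cosine similarity of their attributes >= threshold."""
    A = np.asarray(attrs, dtype=float)
    unit = A / np.linalg.norm(A, axis=1, keepdims=True)  # assumes no all-zero rows
    adj = (unit @ unit.T) >= threshold
    np.fill_diagonal(adj, False)  # no self-links
    return adj

def components(adj):
    """Label each node with the index of its connected component (depth-first search)."""
    n = len(adj)
    label = [-1] * n
    current = 0
    for start in range(n):
        if label[start] != -1:
            continue
        label[start] = current
        stack = [start]
        while stack:
            u = stack.pop()
            for v in np.flatnonzero(adj[u]):
                if label[v] == -1:
                    label[v] = current
                    stack.append(int(v))
        current += 1
    return label
```

Raising or lowering `threshold` changes which links exist and therefore which subgroups appear, which is consistent with the abstract's remark that results depend on the criterion value.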

Clustering Meta Information of K-Pop Girl Groups Using Term Frequency-inverse Document Frequency Vectorization (단어-역문서 빈도 벡터화를 통한 한국 걸그룹의 음반 메타 정보 군집화)

  • JoonSeo Hyeon;JaeHyuk Cho
    • Journal of Platform Technology / v.11 no.3 / pp.12-23 / 2023
  • In the 2020s, the K-Pop market has been dominated by girl groups over boy groups and by the fourth generation over the third generation. This paper presents methods and results on lyric clustering to investigate whether the generation of girl groups has started to change. We collected meta-information for 1469 songs by 47 groups released from 2013 to 2022, classified it into lyric information and non-lyric meta-information, and quantified each. The lyric information was preprocessed by applying term frequency-inverse document frequency (TF-IDF) vectorization based on previous studies and then selecting only the top vector values. The non-lyric meta-information was preprocessed with One-Hot Encoding to reduce the bias of using only lyric information and to yield better clustering results. On the preprocessed data, Spherical K-Means scores 129% higher in Silhouette Score and 45% higher in Calinski-Harabasz Score than Hierarchical Clustering. This paper is expected to contribute to the study of Korean popular song development and to the analysis and clustering of girl group lyrics.
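
The TF-IDF vectorization mentioned above can be sketched in a few lines. The whitespace tokenizer and the unsmoothed weighting `tf * log(N / df)` are assumptions for illustration, and the abstract does not state which TF-IDF variant or tokenizer the paper used.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors
```

Under this weighting, a term that occurs in every document receives weight zero, so words common to all lyrics contribute nothing to the distance between songs.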

A New Image Clustering Method Based on the Fuzzy Harmony Search Algorithm and Fourier Transform

  • Bekkouche, Ibtissem;Fizazi, Hadria
    • Journal of Information Processing Systems / v.12 no.4 / pp.555-576 / 2016
  • In conventional clustering algorithms, an object can be assigned to only one group. In reality, however, there are cases where the data do not belong to a single group. In contrast, fuzzy clustering takes into consideration the degree of fuzzy membership of each pixel relative to the different classes. To overcome some shortcomings of traditional clustering methods, such as slow convergence and sensitivity to initialization values, we have used the Harmony Search algorithm, a population-based metaheuristic that imitates the musical improvisation process. The major strength of this algorithm lies in its ability to integrate the key components of population-based methods and local search-based methods in a simple optimization model. In this paper, we propose a new unsupervised clustering method called the Fuzzy Harmony Search-Fourier Transform (FHS-FT). It hybridizes fuzzy clustering with the harmony search algorithm to strengthen exploitation and further improve the generated solution, while the Fourier transform is used to increase the size of the image's data. The results show that the proposed method is able to provide viable solutions compared to previous work.

Mobile User Interface Pattern Clustering Using Improved Semi-Supervised Kernel Fuzzy Clustering Method

  • Jia, Wei;Hua, Qingyi;Zhang, Minjun;Chen, Rui;Ji, Xiang;Wang, Bo
    • Journal of Information Processing Systems / v.15 no.4 / pp.986-1016 / 2019
  • A mobile user interface pattern (MUIP) is a kind of structured representation of interaction design knowledge. Several studies have suggested that MUIPs are a proven solution for recurring mobile interface design problems. To facilitate MUIP selection, an effective clustering method is required to discover hidden knowledge in the pattern data set. In this paper, we employ the semi-supervised kernel fuzzy c-means clustering (SSKFCM) method to cluster MUIP data. To improve clustering performance, the clustering parameters are optimized using the global optimization capability of the particle swarm optimization (PSO) algorithm. Since the PSO algorithm is easily trapped in local optima, a novel PSO algorithm is presented in this paper. It combines an improved intuitionistic fuzzy entropy measure and a new population search strategy to enhance the population search capability and accelerate convergence. Experimental results show the effectiveness and superiority of the proposed clustering method.

Intelligent Clustering in Vehicular ad hoc Networks

  • Aadil, Farhan;Khan, Salabat;Bajwa, Khalid Bashir;Khan, Muhammad Fahad;Ali, Asad
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.8 / pp.3512-3528 / 2016
  • A vehicular ad hoc network (VANET) is a network of highly mobile nodes (vehicles). Many techniques have been proposed to improve the communication efficiency of VANETs; one of them is vehicular node clustering. Cluster nodes (CNs) and cluster heads (CHs) are elected or selected in the clustering process. Longer cluster lifetimes and fewer CHs contribute to efficient networking in VANETs. In this paper, a novel clustering algorithm for VANETs based on Ant Colony Optimization (ACO), named ACONET, is proposed. The algorithm forms optimized clusters to offer robust communication for VANETs. For optimized clustering, the transmission range, direction, and speed of the nodes and the load balance factor (LBF) are considered. ACONET is compared empirically with state-of-the-art methods, including clustering techniques based on Multi-Objective Particle Swarm Optimization (MOPSO) and Comprehensive Learning Particle Swarm Optimization (CLPSO). An extensive set of experiments is performed by varying the grid size of the network, the transmission range of nodes, and the total number of nodes in the network to evaluate the effectiveness of the algorithms under comparison. The results indicate that ACONET significantly outperforms its competitors.

Maximizing Information Transmission for Energy Harvesting Sensor Networks by an Uneven Clustering Protocol and Energy Management

  • Ge, Yujia;Nan, Yurong;Chen, Yi
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.4 / pp.1419-1436 / 2020
  • For an energy harvesting sensor network, when network lifetime is not the only primary goal, maximizing network performance under environmental energy harvesting becomes a more critical issue. However, clustering protocols that aim to provide maximum information throughput have not been thoroughly explored in Energy Harvesting Wireless Sensor Networks (EH-WSNs). In this paper, clustering protocols are studied for maximizing data transmission across the whole network. Based on a long short-term memory (LSTM) energy predictor and models of node energy consumption and supplement, an uneven clustering protocol is proposed in which cluster head selection and cluster size control are designed for this purpose. Simulation results verify that the proposed scheme can outperform some classic schemes, with more data packets received by the cluster heads (CHs) and the base station (BS) under these energy constraints. The outcomes of this paper also provide some insights for choosing clustering routing protocols in EH-WSNs by exploiting factors such as uneven cluster size, number of clusters, multiple CHs, multihop routing strategy, and energy supplementing period.