• Title/Summary/Keyword: Clustering Problem

A Study on Representative Skyline Using Connected Component Clustering

  • Choi, Jong-Hyeok;Nasridinov, Aziz
    • Journal of Multimedia Information System / v.6 no.1 / pp.37-42 / 2019
  • Skyline queries are used in a variety of fields to make optimal decisions. However, as the volume and dimensionality of the data increase, both the number of skyline points and the time needed to discover them increase. Because the number of skyline points matters in many real-life applications, various studies have been proposed. However, previous studies have used k-parameter methods such as top-k and k-means to discover representative skyline points (RSPs) from the entire skyline point set, resulting in high query response time and reduced representativeness due to the dependency on k. To solve this problem, we propose a new Connected Component Clustering based Representative Skyline Query (3CRS) that can discover RSPs quickly even in high-dimensional data through connected component clustering. 3CRS performs fast discovery and clustering of skyline points through hash indexes and connected components and selects RSPs from each cluster. This paper demonstrates the superiority of the proposed method by comparing it with representative skyline queries using k-means and DBSCAN on a real-world dataset.
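
As a rough illustration of the idea in this abstract, the sketch below computes a skyline, groups the skyline points into connected components, and picks one point per component, so no k has to be supplied. It is not the 3CRS algorithm: it uses a naive quadratic skyline scan instead of hash indexes, and the `radius` threshold, the Chebyshev distance, and the "smallest coordinate sum" selection rule are all assumptions made for this example.

```python
from itertools import combinations


def dominates(p, q):
    """True if p is no worse than q in every dimension and strictly better in one.

    Smaller values are treated as better (an assumption of this sketch).
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    """Naive O(n^2) skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def connected_components(points, radius):
    """Group points linked by chains of neighbors within `radius` (Chebyshev distance)."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if max(abs(a - b) for a, b in zip(points[i], points[j])) <= radius:
            parent[find(i)] = find(j)

    clusters = {}
    for i, p in enumerate(points):
        clusters.setdefault(find(i), []).append(p)
    return list(clusters.values())


def representatives(points, radius):
    """One representative per component: smallest coordinate sum, ties broken by value."""
    clusters = connected_components(skyline(points), radius)
    return sorted(min(c, key=lambda p: (sum(p), p)) for c in clusters)
```

For the points `[(1, 9), (2, 8), (5, 5), (6, 4), (9, 1), (7, 7), (9, 9)]` and `radius=1`, the skyline has five points in three components, and `representatives` returns `[(1, 9), (5, 5), (9, 1)]`. The number of representatives follows from the data and the threshold rather than from a preset k.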

An Efficient Clustering Method based on Multi Centroid Set using MapReduce (맵리듀스를 이용한 다중 중심점 집합 기반의 효율적인 클러스터링 방법)

  • Kang, Sungmin;Lee, Seokjoo;Min, Jun-ki
    • KIISE Transactions on Computing Practices / v.21 no.7 / pp.494-499 / 2015
  • As the size of data increases, it becomes important to identify properties by analyzing big data. In this paper, we propose an efficient k-Means based clustering technique, called MCSK-Means (Multi Centroid Set k-Means), using the distributed parallel processing framework MapReduce. A problem with the k-Means algorithm is that the accuracy of clustering depends on the randomly created initial centroids. To alleviate this problem, the MCSK-Means algorithm reduces the dependency on initial centroids by using m sets, each consisting of k centroids. In addition, we apply the agglomerative hierarchical clustering technique to create the final k centroids from the centroids in the m centroid sets that result from the clustering phase. We implemented MCSK-Means on the MapReduce framework to process big data efficiently.
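
The two stages described in this abstract can be sketched on a single machine as follows: run k-means from m independent random initializations, pool the resulting m × k centroids, and merge them agglomeratively until k remain. This is only a sketch under stated assumptions, not the authors' MapReduce implementation: the function names, the centroid-linkage merge rule, the fixed iteration count, and the `seed` parameter are all chosen for illustration.

```python
import random
from math import dist


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def kmeans(points, k, rng, iterations=20):
    """Plain Lloyd's k-means starting from k randomly sampled points."""
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            groups[nearest].append(p)
        # Keep the previous centroid if a group ends up empty.
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids


def merge_centroids(centroids, k):
    """Agglomerative merging (centroid linkage, an assumption) until k groups remain."""
    groups = [[c] for c in centroids]
    while len(groups) > k:
        pairs = [(i, j) for i in range(len(groups)) for j in range(i + 1, len(groups))]
        i, j = min(pairs, key=lambda ij: dist(mean(groups[ij[0]]), mean(groups[ij[1]])))
        groups[i].extend(groups.pop(j))
    return [mean(g) for g in groups]


def multi_centroid_set_kmeans(points, k, m, seed=0):
    """Pool the centroids of m independent k-means runs, then reduce them to k."""
    rng = random.Random(seed)
    pooled = [c for _ in range(m) for c in kmeans(points, k, rng)]
    return sorted(merge_centroids(pooled, k))
```

A call such as `multi_centroid_set_kmeans(points, k=3, m=5)` takes `points` as a list of equal-length tuples. Because each of the m runs is independent, the runs could be distributed across workers, which is presumably where a framework like MapReduce fits in.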

Gene Sequences Clustering for the Prediction of Functional Domain (기능 도메인 예측을 위한 유전자 서열 클러스터링)

  • Han Sang-Il;Lee Sung-Gun;Hou Bo-Kyeng;Byun Yoon-Sup;Hwang Kyu-Suk
    • Journal of Institute of Control, Robotics and Systems / v.12 no.10 / pp.1044-1049 / 2006
  • Multiple sequence alignment is a method to compare two or more DNA or protein sequences. Most multiple sequence alignment tools rely on pairwise alignment and the Smith-Waterman algorithm to generate an alignment hierarchy. Therefore, in existing multiple alignment methods, the runtime increases exponentially as the number of sequences increases. To remedy this problem, we adopted a parallel processing suffix tree algorithm that can search for common subsequences at one time without pairwise alignment. Cross-matching subsequences, which trigger inexact matching, may appear among the common subsequences found, so a cross-matching masking process is suggested in this paper. To identify the function of the clusters generated by suffix tree clustering, BLAST and CDD (Conserved Domain Database) search were combined with a clustering tool. Our clustering and annotating tool consists of constructing a suffix tree, overlapping common subsequences, clustering gene sequences, and annotating gene clusters by BLAST and CDD search. The system was successfully evaluated with 36 gene sequences in the pentose phosphate pathway, producing 10 clusters, finding representative common subsequences, and finally identifying functional domains by searching the CDD database.

Clustering based Routing Algorithm for Efficient Emergency Messages Transmission in VANET (차량 통신 네트워크에서 효율적인 긴급 메시지 전파를 위한 클러스터링 기반의 라우팅 알고리즘)

  • Kim, Jun-Su;Ryu, Min-Woo;Cha, Si-Ho;Lee, Jong-Eon;Cho, Kuk-Hyun
    • Journal of the Korea Academia-Industrial cooperation Society / v.13 no.8 / pp.3672-3679 / 2012
  • A Vehicular Ad hoc Network (VANET) is a next-generation network technology that provides various services using V2V (Vehicle-to-Vehicle) and V2I (Vehicle-to-Infrastructure) communication. Many VANET studies address driver safety; in particular, the use of emergency messages to improve traffic safety has been actively studied. To deliver messages efficiently to moving vehicles, it is important to broadcast them quickly to as many nodes as possible within the communication range of the vehicle nodes. However, existing studies transmit messages through indiscriminate broadcasting, which causes the broadcast storm problem and degrades overall performance. These problems further degrade overall network performance on roads of various forms and where vehicle node density is high, as in urban areas. Therefore, this paper proposes a Clustering based Routing Algorithm (CBRA) to transmit emergency messages efficiently where vehicle density is high, as in urban areas. CBRA manages moving vehicles through clustering when a vehicle transmits an emergency message. Using clustering, CBRA also resolves the link breakage problem between vehicles that arises from the various road forms in urban areas. In addition, it resolves the broadcast storm problem and improves efficiency using a selective flooding method. Simulation results using ns-2 show that the proposed CBRA performs much better than the existing routing protocols.

Building Topic Hierarchy of e-Documents using Text Mining Technology

  • Kim, Han-Joon
    • Proceedings of the CALSEC Conference / 2004.02a / pp.294-301 / 2004
  • A text-mining approach to organizing e-documents by topic hierarchy, based on machine learning and information theory: (1) 'Category (topic) discovery' problem → document bundle-based user-constraint document clustering; (2) 'Automatic categorization' problem → accelerated EM with CU-based active learning; (3) 'Hierarchy construction' problem → unsupervised learning of the category subsumption relation.

Modeling Large S-System using Clustering and Genetic Algorithm

  • Jung, Sung-Won;Lee, Kwang-H.;Lee, Co-Heon
    • Proceedings of the Korean Society for Bioinformatics Conference / 2005.09a / pp.197-201 / 2005
  • When we want to find the regulatory relationships between genes from gene expression data, dimensionality is one of the major problems. In general, the size of the search space in modeling the regulatory relationships grows as $O(n^2)$ as the number of genes n increases. However, it may be reduced to $O(kn)$ for a selected k by applying divide-and-conquer heuristics that depend on some assumptions about the genetic network. In this paper, we approach the modeling problem in a divide-and-conquer manner. We apply clustering to split the problem into small sub-problems, and then apply a hierarchical modeling process to those sub-problems.
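
One way to see where the two growth rates could come from is to count parameters in the standard S-system form. The count below is a worked illustration, not taken from the paper: it assumes that the search space is measured by the number of parameters, that k denotes the cluster size, that the n genes split into n/k equal clusters, and that each cluster is modeled independently.

$$
\frac{dX_i}{dt} = \alpha_i \prod_{j=1}^{n} X_j^{g_{ij}} - \beta_i \prod_{j=1}^{n} X_j^{h_{ij}}, \qquad i = 1, \dots, n
$$

Each gene contributes two rate constants ($\alpha_i$, $\beta_i$) and $2n$ kinetic orders ($g_{ij}$, $h_{ij}$), so the full model has

$$
P_{\text{full}}(n) = n\,(2 + 2n) = 2n(n+1) = O(n^2)
$$

parameters. Under the equal-cluster assumption, each of the $n/k$ sub-models has $2k(k+1)$ parameters, giving

$$
P_{\text{clustered}}(n, k) = \frac{n}{k} \cdot 2k(k+1) = 2n(k+1) = O(kn).
$$

For example, with $n = 100$ and $k = 10$ this is 20,200 parameters for the full model against 2,200 for the clustered one. Parameters linking the clusters to each other are ignored here.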

Generalized Clustering Algorithm for Part-Machine Grouping with Alternative Process Plans (대체가공경로를 가지는 부품-기계 군집 문제를 위한 일반화된 군집 알고리듬)

  • Kim, Chang-Ouk;Park, Yun-Sun;Jun, Jin
    • Journal of Korean Institute of Industrial Engineers / v.27 no.3 / pp.281-288 / 2001
  • In this article, we consider a multi-objective part-machine grouping problem in which parts have alternative process plans and the expected annual demand of each part is known. The problem is to determine part sets and corresponding machine cells optimally, such that the total sum of distances (or dissimilarities) between parts and the total sum of load differences between machines are simultaneously minimized. Two heuristic algorithms are proposed, and examples are given to compare the performance of the algorithms.

Adaptive Clustering Algorithm for Recycling Cell Formation: An Application of Fuzzy ART Neural Networks

  • Seo, Kwang-Kyu;Park, Ji-Hyung
    • Journal of Mechanical Science and Technology / v.18 no.12 / pp.2137-2147 / 2004
  • In the recycling cell formation problem, disposal products are classified into recycling part families using group technology in their end-of-life phase. The status of disposal products is uncertain because of usage influences during the product use phase, and recycling cells are formed considering design, process, and usage attributes. To deal with these uncertainties, fuzzy set theory and a fuzzy logic-based neural network model are applied to the recycling cell formation problem for disposal products. A fuzzy c-means algorithm and a heuristic approach based on the fuzzy ART neural network are suggested. In particular, the modified fuzzy ART neural network is shown to give good clustering results and to provide an extension for systematically generating alternative solutions to the recycling cell formation problem. Disposal refrigerators are used as examples.

Research on the Energy Hole Problem Based on Non-uniform Node Distribution for Wireless Sensor Networks

  • Liu, Tang;Peng, Jian;Wang, Xiao-Fen;Yang, Jin;Guo, Bing
    • KSII Transactions on Internet and Information Systems (TIIS) / v.6 no.9 / pp.2017-2036 / 2012
  • Based on the current solutions to the energy hole problem, this paper proposes a nonuniform node distribution clustering algorithm, NNDC. First, we divide the network into rings, and then analyze and calculate the nodes' energy consumption in each ring of the network when a clustering algorithm is applied to collect data. We also put forward a nonuniform node distribution scheme based on the proportion of the nodes' energy consumption in each ring, and change the nodes' active/hibernating states under a density control mechanism while network coverage is guaranteed. Simulation shows that the NNDC algorithm can satisfactorily balance the nodes' energy consumption and effectively avoid the energy hole problem.

Heuristic for the Pick-up and Delivery Vehicle Routing Problem: Case Study for the Remicon Truck Routing in the Metropolitan Area (배달과 수집을 수행하는 차량경로문제 휴리스틱에 관한 연구: 수도권 레미콘 운송사례)

  • Ji, Chang-Hun;Kim, Mi-Yi;Lee, Young-Hoon
    • Korean Management Science Review / v.24 no.2 / pp.43-56 / 2007
  • This paper studies a VRP (Vehicle Routing Problem) in which two different kinds of missions are to be completed. The objective is to minimize the total vehicle operating distance. A mixed integer programming formulation and a heuristic algorithm for practical use are suggested. The heuristic algorithm consists of three phases: clustering, route construction, and adjustment. In the first phase, customers are clustered so that the supply nodes are grouped with the demand nodes to be served by the same vehicle. In the second phase, vehicle routes are generated within each cluster. In the third phase, clusters and routes are adjusted using the UF (unfitness) rule, which is designed to determine which customers and routes should be moved. Computational experiments show that the suggested heuristic algorithm performs well within a relatively short computational time.