• Title/Summary/Keyword: clustering problem

A Study on Clustering and Identifying Gene Sequences using Suffix Tree Clustering Method and BLAST (서픽스트리 클러스터링 방법과 블라스트를 통합한 유전자 서열의 클러스터링과 기능검색에 관한 연구)

  • Han, Sang-Il;Lee, Sung-Gun;Kim, Kyung-Hoon;Lee, Ju-Yeong;Kim, Young-Han;Hwang, Kyu-Suk
    • Journal of Institute of Control, Robotics and Systems / v.11 no.10 / pp.851-856 / 2005
  • DNA and protein data from diverse species are discovered daily and deposited in public archives in established formats. The database systems of these archives provide not only an easy-to-use, flexible public interface but also in silico analysis tools for unidentified sequence data. Among such tools, multiple sequence alignment [1] methods that rely on pairwise alignment and the Smith-Waterman algorithm [2] make it possible to identify unknown DNA or protein sequences, or the phylogenetic relations among several species. However, in existing multiple alignment methods the runtime increases exponentially as the number of sequences increases. To remedy this problem, we adopted a parallel-processing suffix tree algorithm that can search for common subsequences at one time without pairwise alignment. In addition, cross-matching subsequences that trigger inexact matching may arise among the common subsequences found, so this paper proposes a cross-matching masking process. To identify the function of the clusters generated by suffix tree clustering, BLAST was combined with the clustering tool. Our clustering and annotating tool consists of the following steps: (1) construction of the suffix tree; (2) masking of cross-matching pairs; (3) clustering of gene sequences; and (4) annotation of gene clusters by BLAST search. The system was successfully evaluated with 22 gene sequences in the pyruvate pathway of bacteria, producing 7 clusters and finding representative common subsequences for each cluster.

Context-awareness User Analysis based on Clustering Algorithm (클러스터링 알고리즘기반의 상황인식 사용자 분석)

  • Lee, Kang-whan
    • Journal of the Korea Institute of Information and Communication Engineering / v.24 no.7 / pp.942-948 / 2020
  • In this paper, we propose a clustering algorithm that enables more efficient user distinction within clusters using context-aware attribute information. Typically, when data are clustered, the data used to classify interrelationships within cluster information become a degrading factor if new or newly processed information is treated as contaminated information in the comparison. To solve this problem, we developed a clustering algorithm, based on the K-means algorithm, that can extract users' recognition information. The proposed algorithm analyzes users' clustering attribute parameters from user clusters using accumulated information and clusters users according to their attributes. Simulation results with the proposed algorithm showed that the user management system was more adaptable in classifying and maintaining multiple users in clusters.

Metro Station Clustering based on Travel-Time Distributions (통행시간 분포 기반의 전철역 클러스터링)

  • Gong, InTaek;Kim, DongYun;Min, Yunhong
    • The Journal of Society for e-Business Studies / v.27 no.2 / pp.193-204 / 2022
  • Smart card data are representative mobility data and can be used for policy development by analyzing public transportation usage behavior. As one such study, this paper deals with the problem of classifying metro stations by their usage patterns. Since previous papers on metro station clustering considered only traffic volume among usage behaviors, this paper proposes clustering that considers travel time as a complementary method. Passengers at each station were classified into four groups: those arriving at the start of the workday, arriving at the end of the workday, leaving at the start of the workday, and leaving at the end of the workday. The travel time of each group was then modeled with a Weibull distribution, and the estimated shape parameters were defined as the characteristic values of the station. The resulting characteristic vectors were clustered using the K-means clustering technique. The experiment showed that station clustering based on travel time is not only similar to the clustering results of previous studies but also enables more granular clustering.
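The pipeline described in this abstract (one Weibull shape parameter per passenger group, then K-means on the resulting vectors) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the bisection bounds, the farthest-point initialisation, and the synthetic travel times are all hypothetical choices made for the example.

```python
import math
import random


def weibull_shape(samples, lo=0.05, hi=50.0, iters=60):
    """Maximum-likelihood Weibull shape parameter, found by bisection.

    Assumes all samples are strictly positive and not all identical.
    """
    top = max(samples)
    xs = [x / top for x in samples]  # the shape is scale-invariant; this avoids overflow
    logs = [math.log(x) for x in xs]
    mean_log = sum(logs) / len(logs)

    def score(k):
        # Likelihood equation for the shape k; it increases with k and is zero at the MLE.
        pw = [x ** k for x in xs]
        return sum(p * l for p, l in zip(pw, logs)) / sum(pw) - 1.0 / k - mean_log

    for _ in range(iters):
        mid = (lo + hi) / 2
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


def kmeans(points, k, iters=50):
    """Plain K-means with a deterministic farthest-point initialisation (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def demo_stations(seed=0, n=400):
    """Six synthetic stations, each described by four estimated shape parameters."""
    rng = random.Random(seed)

    def vec(shapes):
        # random.weibullvariate(alpha, beta): alpha is the scale, beta is the shape.
        return tuple(
            weibull_shape([rng.weibullvariate(20.0, s) for _ in range(n)]) for s in shapes
        )

    peaked = [vec((3.0, 2.8, 3.0, 2.8)) for _ in range(3)]
    spread = [vec((1.0, 1.2, 1.0, 1.2)) for _ in range(3)]
    return peaked + spread


if __name__ == "__main__":
    print(kmeans(demo_stations(), k=2))
```

Only the shape parameter is kept as a feature here, following the abstract; whether the scale parameter or any normalisation is also used is not stated in the source.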

Min-Distance Hop Count based Multi-Hop Clustering In Non-uniform Wireless Sensor Networks

  • Kim, Eun-Ju;Kim, Dong-Joo;Park, Jun-Ho;Seong, Dong-Ook;Lee, Byung-Yup;Yoo, Jae-Soo
    • International Journal of Contents / v.8 no.2 / pp.13-18 / 2012
  • In wireless sensor networks, an energy-efficient data gathering scheme is one of the core technologies for processing a query. Cluster-based data gathering methods minimize the energy consumption of sensor nodes by maximizing the efficiency of data aggregation. However, since existing clustering methods consider only uniform network environments, they are not suitable for real-world applications in which sensor nodes can be distributed unevenly. To solve this problem, we propose a balanced multi-hop clustering scheme for non-uniform wireless sensor networks. The proposed scheme constructs a cluster based on the logical distance to the cluster head, measured as a min-distance hop count. To show the superiority of the proposed scheme, we compare it with existing clustering schemes in sensor networks. Our experimental results show that the proposed scheme prolongs lifetime by about 48% on average over the existing methods.
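The idea of assigning each node to a cluster head by minimum hop count can be illustrated with a multi-source breadth-first search. This is only a sketch of the hop-count assignment step: the abstract does not describe how the scheme balances clusters or selects heads, so the function name, the adjacency-dictionary input, and the tie-breaking rule (first head reached wins) are assumptions.

```python
from collections import deque


def assign_by_hop_count(adjacency, heads):
    """Assign every reachable node to the cluster head with the fewest hops.

    adjacency: dict mapping node -> iterable of neighbouring nodes.
    heads: list of cluster-head nodes (assumed to be chosen beforehand).
    Returns (owner, hops): the head each node joins and its hop distance.
    """
    hops = {h: 0 for h in heads}
    owner = {h: h for h in heads}
    queue = deque(heads)
    while queue:
        node = queue.popleft()
        for nb in adjacency[node]:
            if nb not in hops:  # first visit in BFS order is a minimum-hop visit
                hops[nb] = hops[node] + 1
                owner[nb] = owner[node]
                queue.append(nb)
    return owner, hops


if __name__ == "__main__":
    # A chain of seven nodes with heads at both ends.
    chain = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}
    owner, hops = assign_by_hop_count(chain, heads=[0, 6])
    print(owner, hops)
```

Because the search starts from all heads at once, each node is labelled by whichever head reaches it first, which is the head at minimum logical distance.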

New Optimization Algorithm for Data Clustering (최적화에 기반 한 데이터 클러스터링 알고리즘)

  • Kim, Ju-Mi
    • Journal of Intelligence and Information Systems / v.13 no.3 / pp.31-45 / 2007
  • Handling large data is one of the critical issues facing the data mining community. This is particularly true for computationally intensive tasks such as data clustering. Random sampling of instances is one possible way to handle large data, but a pervasive problem with this approach is how to deal with the noise it introduces into the evaluation of the learning algorithm. This paper develops a new optimization-based clustering approach using an algorithm specifically designed for noisy performance. Numerical results show that this algorithm performs better than other algorithms such as PAM and CLARA. In addition, by using partial data, the algorithm achieves substantial savings in computational time without sacrificing solution quality.

Topical Clustering Techniques of Twitter Documents Using Korean Wikipedia (한글 위키피디아를 이용한 트위터 문서의 주제별 클러스터링 기법)

  • Chang, Jae-Young
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.14 no.5 / pp.189-196 / 2014
  • Recently, the need to retrieve documents in SNS environments such as Twitter has been growing. To support Twitter search, a clustering technique that classifies the massive set of retrieved documents by topic is required. However, due to the nature of Twitter, there is a limit to applying previous simple techniques to clustering Twitter documents. To overcome this problem, we propose a new clustering technique suited to the Twitter environment. In the proposed method, we add new terms to the feature vectors representing Twitter documents and recalculate the feature weights using Korean Wikipedia. We also performed experiments with Korean Twitter documents and demonstrated the usability of the proposed method through a performance comparison with previous techniques.

Multi-Objective Genetic Algorithm based on Multi-Robot Positions for Scheduling Problems (스케줄링 문제를 위한 멀티로봇 위치 기반 다목적 유전 알고리즘)

  • Choi, Jong Hoon;Kim, Je Seok;Jeong, Jin Han;Kim, Jung Min;Park, Jahng Hyon
    • Journal of the Korean Society for Precision Engineering / v.31 no.8 / pp.689-696 / 2014
  • This paper addresses a scheduling problem for a high-density robotic workcell using a multi-objective genetic algorithm. We propose a new algorithm based on NSGA-II (Non-dominated Sorting Genetic Algorithm II), which is the most popular algorithm for solving multi-objective optimization problems. To solve the problem efficiently, the proposed algorithm divides it into two processes: clustering and scheduling. In the clustering process, we focus on multi-robot positions because they are fixed in the manufacturing system and have a great effect on task distribution. We test the algorithm by changing the multi-robot positions and compare it with previous work. Test results show that the proposed algorithm is effective under various conditions.
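NSGA-II ranks candidate solutions by Pareto dominance. As a hedged illustration of that building block only (not of the proposed clustering-and-scheduling algorithm, whose details the abstract does not give), the first non-dominated front for minimization objectives can be computed as below. The function names and the example objective tuples are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a is no worse than b everywhere and better somewhere.

    All objectives are assumed to be minimized.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def first_front(solutions):
    """Return the solutions that no other solution dominates (the first Pareto front)."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]


if __name__ == "__main__":
    # Hypothetical (makespan, travel distance) pairs for five candidate schedules.
    candidates = [(1, 5), (2, 2), (3, 3), (5, 1), (4, 4)]
    print(first_front(candidates))
```

This quadratic scan is the simplest form of the idea; NSGA-II itself continues by peeling off further fronts and using crowding distance within each front.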

A Fuzzy Clustering Method based on Genetic Algorithm

  • Jo, Jung-Bok;Do, Kyeong-Hoon;Linhu Zhao;Mitsuo Gen
    • Proceedings of the IEEK Conference / 2000.07b / pp.1025-1028 / 2000
  • In this paper, we apply a genetic algorithm to fuzzy clustering. We propose an initialization procedure and genetic operators, such as selection, crossover, and mutation, that are suitable for solving such problems. To illustrate the effectiveness of the proposed algorithm, we solve the manufacturing cell formation problem and present computational comparisons with the generalized Fuzzy c-Means algorithm.

Improved VRP & GA-TSP Model for Multi-Logistics Center (복수물류센터에 대한 VRP 및 GA-TSP의 개선모델개발)

  • Lee, Sang-Cheol;Yu, Jeong-Cheol
    • Journal of the Korea Academia-Industrial cooperation Society / v.8 no.5 / pp.1279-1288 / 2007
  • The vehicle routing problem with time constraints is one of the most important problems in distribution and logistics. In practice, service for a customer must start and finish within a given delivery time. This study develops a model to optimize vehicle routing for the multi-logistics center problem, using a two-step approach with an improved genetic algorithm. In step one, a sector clustering model is developed by transforming the multi-logistics center problem into a single logistics center problem, which is easier to solve. In step two, we developed a GA-TSP model with an improved genetic algorithm that can search for an optimized vehicle route under the given time constraints. As a result, we developed a Network VRP computer program based on the proposed VRP solution, using ActiveX and distributed object technology.

An Image Contrast Enhancement Technique Using Integrated Adaptive Fuzzy Clustering Model (IAFC 모델을 이용한 영상 대비 향상 기법)

  • 이금분;김용수
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2001.12a / pp.279-282 / 2001
  • This paper presents an image contrast enhancement technique for improving low-contrast images using the improved IAFC (Integrated Adaptive Fuzzy Clustering) model. The poor pictorial information of a low-contrast image is due to the vagueness or fuzziness of the multivalued levels of brightness rather than to randomness. Fuzzy image processing has three main stages: image fuzzification, modification of membership values, and image defuzzification. Using a new model of automatic crossover point selection, the optimal crossover point is selected automatically. The problem of crossover point selection can be considered a two-category classification problem. The improved MEC can classify the image into two classes with an unsupervised learning rule. The proposed method is applied to several experimental images with 256 gray levels, and the results are compared with those of the histogram equalization technique. We used the index of fuzziness as a measure of image quality. The results show that the proposed method is better than the histogram equalization technique.
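The index of fuzziness mentioned in this abstract can be illustrated with the linear index of fuzziness, a standard definition that measures how far membership values are from the crisp values 0 and 1. The abstract does not say which variant the authors used, so the choice of the linear form, the function names, and the simple linear fuzzification of gray levels are assumptions made for this sketch.

```python
def linear_index_of_fuzziness(memberships):
    """Linear index of fuzziness: (2 / n) * sum(min(mu, 1 - mu)).

    Returns 0.0 for a fully crisp set (all values 0 or 1) and 1.0 when every
    membership value is 0.5, the most ambiguous case.
    """
    n = len(memberships)
    return 2.0 / n * sum(min(mu, 1.0 - mu) for mu in memberships)


def gray_to_membership(gray_levels, max_level=255):
    """Hypothetical fuzzification: map gray levels linearly onto [0, 1]."""
    return [g / max_level for g in gray_levels]


if __name__ == "__main__":
    flat = gray_to_membership([120, 125, 130, 135])    # low-contrast pixels
    stretched = gray_to_membership([10, 40, 220, 250])  # after a contrast stretch
    print(linear_index_of_fuzziness(flat), linear_index_of_fuzziness(stretched))
```

Under this measure a lower value indicates less ambiguity in the brightness levels, which is the sense in which it can serve as a quality score for an enhanced image.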
