• Title/Abstract/Keywords: Multiple clustering

MOC: A Multiple-Object Clustering Scheme for High Performance of Page-out in BSD VM

  • 양종철;안우현;오재원
    • Journal of KIISE: Computer Systems and Theory
    • /
    • Vol. 36, No. 6
    • /
    • pp.476-487
    • /
    • 2009
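  • The BSD virtual memory system (BSD VM) uses a clustering technique to reduce the number of disk I/Os during page-out. The technique groups the page-out target page with the modified pages adjacent to it in the virtual memory space into a cluster and writes the cluster to disk in a single disk I/O. However, when an application modifies many pages that are not adjacent to one another in the virtual memory space, clusters become small and clustering becomes less effective. To solve this problem, this paper proposes the Multiple-Object Clustering (MOC) scheme. Instead of issuing one disk I/O per cluster, MOC gathers multiple clusters and pages them out with a single disk write. This page-out method therefore reduces the number of disk I/Os and greatly improves system performance. To verify its performance, MOC was implemented in the FreeBSD 6.2 operating system kernel. In measurements with the NS2, Scimark2 SOR, and nbench LU benchmarks, MOC shortened execution time by 9 to 45% compared with the existing BSD VM.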
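
The difference between per-cluster page-out and gathering several clusters into one write can be illustrated with a minimal Python sketch. This is a toy model that only counts writes, not the FreeBSD kernel implementation; the function names and the `max_pages_per_write` budget are hypothetical and do not come from the paper.

```python
def contiguous_clusters(dirty_pages):
    """Group dirty page indices into runs of adjacent pages (one run = one cluster)."""
    clusters = []
    for page in sorted(set(dirty_pages)):
        if clusters and page == clusters[-1][-1] + 1:
            clusters[-1].append(page)   # extends the current run
        else:
            clusters.append([page])     # starts a new cluster
    return clusters


def io_count_per_cluster(clusters):
    """Baseline model: one disk write per cluster."""
    return len(clusters)


def io_count_batched(clusters, max_pages_per_write):
    """Multi-cluster model: pack whole clusters into each write.

    max_pages_per_write is an assumed per-write page budget (hypothetical).
    A cluster larger than the budget still counts as a single write here.
    """
    writes = 0
    used = 0
    for cluster in clusters:
        size = len(cluster)
        if used == 0 or used + size > max_pages_per_write:
            writes += 1   # open a new write
            used = 0
        used += size
    return writes


if __name__ == "__main__":
    clusters = contiguous_clusters([0, 1, 5, 9, 10, 20])
    print(clusters)
    print(io_count_per_cluster(clusters), io_count_batched(clusters, 4))
```

With scattered dirty pages the baseline count grows with the number of small clusters, while the batched count grows only with the total number of pages divided by the budget.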

Verification of Improving a Clustering Algorithm for Microarray Data with Missing Values

  • Kim, Su-Young
    • The Korean Journal of Applied Statistics
    • /
    • Vol. 24, No. 2
    • /
    • pp.315-321
    • /
    • 2011
  • Gene expression microarray data often include multiple missing values. However, most gene expression analyses (including gene clustering analysis) require a complete data matrix as input. In ordinary clustering methods, a single missing value forces one to discard all data for a gene even if the rest of the data for that gene is intact. The quality of the analysis may decrease seriously as the missing rate increases. Conversely, imputing missing values may introduce artifacts that reduce the reliability of the analysis. To clarify this contradiction in microarray clustering analysis, this paper compares the accuracy of clustering with and without imputation over several microarray datasets with different missing rates. It also tests the clustering efficiency of several imputation methods, including our proposed algorithm. The results show that, for imperfect microarray data, it is worthwhile to check the clustering result in this alternative way, without any imputed data.
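
The abstract does not say how clustering is performed without imputation. One common option, shown here purely as an assumption, is a distance computed only over the values observed in both expression profiles and rescaled by the fraction of usable coordinates. The function name `partial_distance` and the use of `None` for a missing value are illustrative choices.

```python
import math


def partial_distance(a, b):
    """Euclidean distance over co-observed entries only; None marks a missing value.

    The squared sum is rescaled by (total coordinates / usable coordinates) so that
    profiles with many missing values are not artificially close.
    Returns math.inf when the two profiles share no observed coordinate.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return math.inf
    squared = sum((x - y) ** 2 for x, y in pairs)
    return math.sqrt(squared * len(a) / len(pairs))


if __name__ == "__main__":
    gene_a = [1.0, None, 3.0]
    gene_b = [1.0, 2.0, 5.0]
    print(partial_distance(gene_a, gene_b))
```

Such a distance can be fed to any clustering method that accepts a precomputed distance matrix, so a gene with one missing value does not have to be dropped.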

Energy Efficient Cooperative LEACH Protocol for Wireless Sensor Networks

  • Asaduzzaman, Asaduzzaman;Kong, Hyung-Yun
    • Journal of Communications and Networks
    • /
    • Vol. 12, No. 4
    • /
    • pp.358-365
    • /
    • 2010
  • We develop a low complexity cooperative diversity protocol for low energy adaptive clustering hierarchy (LEACH) based wireless sensor networks. A cross layer approach is used to obtain spatial diversity in the physical layer. In this paper, a simple modification to the clustering algorithm of the LEACH protocol is proposed to exploit virtual multiple-input multiple-output (MIMO) based user cooperation. In lieu of selecting a single cluster-head at the network layer, we propose selecting M cluster-heads in each cluster to obtain a diversity order of M in long distance communication. Due to the broadcast nature of wireless transmission, cluster-heads are able to receive data from sensor nodes at the same time. This fact ensures the synchronization required to implement a virtual MIMO based space time block code (STBC) in cluster-head to sink node transmission. An analytical method to evaluate the energy consumption based on the BER curve is presented. Analysis and simulation results show that the proposed cooperative LEACH protocol can save a large amount of energy over the LEACH protocol with the same data rate, bit error rate, delay and bandwidth requirements. Moreover, this proposal can achieve higher order diversity with improved spectral efficiency compared to other virtual MIMO based protocols.

A Density Peak Clustering Algorithm Based on Information Bottleneck

  • Yongli Liu;Congcong Zhao;Hao Chao
    • Journal of Information Processing Systems
    • /
    • Vol. 19, No. 6
    • /
    • pp.778-790
    • /
    • 2023
  • Although density peak clustering can often easily yield excellent results, there is still room for improvement when dealing with complex, high-dimensional datasets. One of the main limitations of this algorithm is its reliance on geometric distance as the sole similarity measurement. To address this limitation, we draw inspiration from the information bottleneck theory, and propose a novel density peak clustering algorithm that incorporates this theory as a similarity measure. Specifically, our algorithm utilizes the joint probability distribution between data objects and feature information, and employs the loss of mutual information as the measurement standard. This approach not only eliminates the potential for subjective error in selecting a similarity method, but also enhances performance on datasets with multiple centers and high dimensionality. To evaluate the effectiveness of our algorithm, we conducted experiments using ten carefully selected datasets and compared the results with three other algorithms. The experimental results demonstrate that our information bottleneck-based density peaks clustering (IBDPC) algorithm consistently achieves high levels of accuracy, highlighting its potential as a valuable tool for data clustering tasks.
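
To make the role of the similarity measure concrete, the sketch below computes the two quantities used by standard density peak clustering: the local density of each point and its distance to the nearest point of higher density. It is the baseline with Euclidean distance, not the IBDPC algorithm. The `distance` parameter is a hypothetical hook marking where a different measure, such as the mutual-information loss described above, would be substituted.

```python
import math


def density_peaks(points, cutoff, distance=math.dist):
    """Return (rho, delta) for standard density peak clustering.

    rho[i]   : number of other points closer than `cutoff` to point i
    delta[i] : distance from point i to the nearest point with higher rho;
               for a point with no denser neighbour, the largest distance from it
    """
    n = len(points)
    d = [[distance(points[i], points[j]) for j in range(n)] for i in range(n)]
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < cutoff) for i in range(n)]
    delta = []
    for i in range(n):
        higher = [d[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(d[i]))
    return rho, delta


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (0, -1), (10, 10), (10, 11)]
    rho, delta = density_peaks(pts, cutoff=1.2)
    print(rho)
    print(delta)
```

Points with both high `rho` and high `delta` are the candidate cluster centers; every quantity depends on the chosen distance, which is why swapping the measure changes the result.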

Cluster Analysis Using Principal Coordinates for Binary Data

  • Chae, Seong-San;Kim, Jeong, Il
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 12, No. 3
    • /
    • pp.683-696
    • /
    • 2005
  • The results of using principal coordinates prior to cluster analysis are investigated on the samples from multiple binary outcomes. The retrieval ability of the known clustering algorithm is significantly improved by using principal coordinates instead of using the distance directly transformed from four association coefficients for multiple binary variables.

A Study on Clustering and Identifying Gene Sequences using Suffix Tree Clustering Method and BLAST

  • 한상일;이성근;김경훈;이주영;김영한;황규석
    • Journal of Institute of Control, Robotics and Systems
    • /
    • Vol. 11, No. 10
    • /
    • pp.851-856
    • /
    • 2005
  • DNA and protein data from diverse species are discovered daily and deposited in public archives according to each established format. Database systems in the public archives provide not only an easy-to-use, flexible interface to the public, but also in silico analysis tools for unidentified sequence data. Among such in silico analysis tools, multiple sequence alignment [1] methods relying on pairwise alignment and the Smith-Waterman algorithm [2] enable us to identify unknown DNA or protein sequences, or phylogenetic relations among several species. However, in the existing multiple alignment method, the runtime increases exponentially as the number of sequences increases. To remedy this problem, we adopted a parallel processing suffix tree algorithm that is able to search for common subsequences at one time without pairwise alignment. The searched common subsequences may also include cross-matching subsequences that trigger inexact matching, so a cross-matching masking process is suggested in this paper. To identify the function of the clusters generated by suffix tree clustering, BLAST was combined with a clustering tool. Our clustering and annotating tool is summarized in the following steps: (1) construction of the suffix tree; (2) masking of cross-matching pairs; (3) clustering of gene sequences; and (4) annotating gene clusters by BLAST search. The system was successfully evaluated with 22 gene sequences in the pyruvate pathway of bacteria, producing 7 clusters and finding representative common subsequences of each cluster.

Non-Keyword Model for the Improvement of Vocabulary Independent Keyword Spotting System

  • 김민제;이정철
    • The Journal of the Acoustical Society of Korea
    • /
    • Vol. 25, No. 7
    • /
    • pp.319-324
    • /
    • 2006
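  • This paper proposes two new non-keyword modeling methods to improve the performance of a speaker-independent, vocabulary-independent keyword spotter. The first improves on K-means-based monophone clustering by clustering monophones at the state level with a decision tree to model non-keywords. The second improves on the single-state multiple-mixture method by modeling non-keywords with syllable-level multi-state multiple mixtures. In the experiments, triphone models were trained on the ETRI standard Korean common speech word DB, and a closed keyword spotting test was performed on speech data not used in training. An open test spotting 100 keywords was then performed on 400 sentences, 100 uttered by each of four speakers in an office environment. The decision-tree-based state clustering method improved keyword spotting performance by 28%/29% (closed/open test) over the existing K-means-based monophone clustering method, and the syllable-level multi-state multiple-mixture method improved it by 22%/2% (closed/open test) over modeling all non-keywords with a single-state model, showing that both proposed algorithms perform well.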

Threshold based User-centric Clustering for Cell-free MIMO Network

  • 류종열;이웅섭;반태원
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • Vol. 26, No. 1
    • /
    • pp.114-121
    • /
    • 2022
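  • This paper considers a user-centric clustering scheme for guaranteeing the performance of all users in the network in a cell-free MIMO environment. In user-centric clustering, each user uses the large-scale fading channel information between itself and the access points (APs) connected to it to form a cluster consisting of the AP with the largest fading coefficient and the APs whose relative fading coefficient is at or above a threshold. Based on these user-centric clusters, the APs design beamforming and power allocation in a distributed manner and use them to cooperatively transmit the users' data. Simulations verify the performance of user-centric clustering in terms of spectral efficiency and identify the threshold that gives the best performance in the given environment.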
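
The cluster-formation rule described above can be sketched in a few lines of Python. The abstract does not define how the relative magnitude is computed, so this sketch assumes it is the ratio of each AP's large-scale fading coefficient to the largest one for that user. The function name and the dictionary input format are likewise hypothetical.

```python
def user_centric_cluster(fading, threshold):
    """Select the serving APs for one user.

    fading    : dict mapping AP id -> large-scale fading coefficient (assumed positive)
    threshold : assumed to be a ratio in [0, 1] relative to the strongest AP
    Returns the strongest AP plus every AP whose coefficient, divided by the
    strongest coefficient, is at or above the threshold.
    """
    if not fading:
        return set()
    best_ap = max(fading, key=fading.get)
    best = fading[best_ap]
    if best <= 0:
        return {best_ap}  # degenerate input: keep only the strongest AP
    return {ap for ap, beta in fading.items()
            if ap == best_ap or beta / best >= threshold}


if __name__ == "__main__":
    betas = {"ap1": 1.0, "ap2": 0.5, "ap3": 0.05}
    print(sorted(user_centric_cluster(betas, 0.3)))
```

Under this assumption, a threshold of 1 reduces each cluster to the single strongest AP and a threshold of 0 makes every connected AP serve the user, which is the range a threshold search would sweep.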

The Clustering Method Of Central Control System In New Distribution Automation System

  • 조남훈;하복남;이중호;임성일
    • Proceedings of the KIEE Conference
    • /
    • Proceedings of the KIEE 1999 Summer Conference, C
    • /
    • pp.1120-1122
    • /
    • 1999
  • This paper introduces clustering for the Central Control System in the New Distribution Automation System. Clustering has three primary benefits: improved availability, easier manageability and more cost-effective scalability. Availability: Clustering can automatically detect the failure of an application or server and quickly restart it on a surviving server. Clients only experience a momentary pause in service. Manageability: Clustering lets administrators quickly inspect the status of all cluster resources and easily move workload onto different servers within a cluster. Scalability: Applications can use the Clustering services through the MSCS Application Programming Interface (API) to do dynamic load balancing and scale across multiple servers within a cluster.

Enhanced Locality Sensitive Clustering in High Dimensional Space

  • Chen, Gang;Gao, Hao-Lin;Li, Bi-Cheng;Hu, Guo-En
    • Transactions on Electrical and Electronic Materials
    • /
    • Vol. 15, No. 3
    • /
    • pp.125-129
    • /
    • 2014
  • A dataset can be clustered by merging the bucket indices that come from the random projection of locality sensitive hashing functions. For this to work, the merging interval must be calculated first. To improve the feasibility of large scale data clustering in high dimensional space, we propose an enhanced Locality Sensitive Hashing Clustering Method. Firstly, multiple hashing functions are generated. Secondly, data points are projected to bucket indices. Thirdly, bucket indices are clustered to get class labels. Experimental results showed that on synthetic datasets this method achieves high accuracy at much improved clustering speeds. These attributes make it well suited to clustering data in high dimensional space.
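
The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' method: the hash family `floor((a·x + b) / width)`, the rule that buckets merge when every index differs by at most `merge_interval`, and all function names are hypothetical, and the abstract does not say how the merging interval is calculated.

```python
import math
import random


def random_projections(dim, count, width, seed=0):
    """Step 1: generate `count` hash functions as (direction vector, offset) pairs."""
    rng = random.Random(seed)
    return [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, width))
            for _ in range(count)]


def bucket_index(point, projections, width):
    """Step 2: project one point to a tuple of integer bucket indices."""
    return tuple(
        math.floor((sum(a_i * x_i for a_i, x_i in zip(a, point)) + b) / width)
        for a, b in projections
    )


def lsh_cluster(points, projections, width, merge_interval=1):
    """Step 3: merge nearby buckets (union-find) and return one label per point."""
    keys = [bucket_index(p, projections, width) for p in points]
    buckets = sorted(set(keys))
    parent = {k: k for k in buckets}

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    # Pairwise comparison of occupied buckets: quadratic, fine for a sketch.
    for i, u in enumerate(buckets):
        for v in buckets[i + 1:]:
            if all(abs(x - y) <= merge_interval for x, y in zip(u, v)):
                parent[find(v)] = find(u)

    roots, labels = {}, []
    for k in keys:
        labels.append(roots.setdefault(find(k), len(roots)))
    return labels


if __name__ == "__main__":
    pts = [(0.1, 0.1), (0.4, 0.2), (1.2, 0.3), (10.2, 10.1), (10.7, 10.5)]
    axis_aligned = [([1.0, 0.0], 0.0), ([0.0, 1.0], 0.0)]  # fixed hashes for a reproducible demo
    print(lsh_cluster(pts, axis_aligned, width=1.0))
```

In practice the projections would come from `random_projections`. The demo passes fixed axis-aligned ones only so the result does not depend on the random seed.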