• Title/Abstract/Keywords: Clustered data


Enhanced Locality Sensitive Clustering in High Dimensional Space

  • Chen, Gang;Gao, Hao-Lin;Li, Bi-Cheng;Hu, Guo-En
    • Transactions on Electrical and Electronic Materials / Vol. 15, No. 3 / pp. 125-129 / 2014
  • A dataset can be clustered by merging the bucket indices produced by the random projections of locality sensitive hashing functions. For this to work, the merging interval must be calculated first. To make large-scale data clustering in high dimensional space more feasible, we propose an enhanced Locality Sensitive Hashing Clustering Method. First, multiple hashing functions are generated. Second, data points are projected to bucket indices. Third, bucket indices are clustered to obtain class labels. Experimental results on synthetic datasets show that the method achieves high accuracy at much improved clustering speed, which makes it well suited to clustering data in high dimensional space.
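
The three steps named in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' method: the hash form `floor((a.x + b) / width)`, the rule for merging neighbouring bucket indices, and all names (`make_hash`, `merge_buckets`, `lsh_cluster`, `width`, `interval`) are assumptions introduced here. In particular, `interval` is passed in as a fixed parameter, whereas the abstract says the merging interval is calculated.

```python
import math
import random

def make_hash(dim, width, rng):
    """Build one random-projection hash h(x) = floor((a.x + b) / width)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, width)
    def h(x):
        return math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / width)
    return h

def merge_buckets(indices, interval):
    """Map each bucket index to a group id; indices whose gap is at most
    `interval` (an assumed, fixed merging interval) share a group."""
    groups, group_id, prev = {}, 0, None
    for idx in sorted(set(indices)):
        if prev is not None and idx - prev > interval:
            group_id += 1
        groups[idx] = group_id
        prev = idx
    return groups

def lsh_cluster(points, n_hashes=4, width=1.0, interval=1, seed=0):
    """Step 1: generate hashes. Step 2: project points to bucket indices.
    Step 3: merge bucket indices and turn the merged groups into labels."""
    rng = random.Random(seed)
    dim = len(points[0])
    signatures = [[] for _ in points]
    for _ in range(n_hashes):
        h = make_hash(dim, width, rng)
        buckets = [h(p) for p in points]
        groups = merge_buckets(buckets, interval)
        for sig, bucket in zip(signatures, buckets):
            sig.append(groups[bucket])
    # Points that fall in the same merged group under every hash share a label.
    labels, seen = [], {}
    for sig in signatures:
        labels.append(seen.setdefault(tuple(sig), len(seen)))
    return labels
```

Calling `lsh_cluster(points)` on a list of equal-length numeric vectors returns one integer label per point. Each hash costs one pass over the data plus a sort of the distinct bucket indices, which is the kind of saving over pairwise distance computation that the abstract appears to aim for.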

Robustness, Data Analysis, and Statistical Modeling: The First 50 Years and Beyond

  • Barrios, Erniel B.
    • Communications for Statistical Applications and Methods / Vol. 22, No. 6 / pp. 543-556 / 2015
  • We present a survey of contributions that defined the nature and extent of robust statistics over the last 50 years. Starting from the pioneering work of Tukey, Huber, and Hampel, which focused on robust estimation of location parameters, we present various generalizations of these estimation procedures that cover a wide variety of models and data analysis methods. Among these extensions, we present linear models, clustered and dependent observations, time series data, binary and discrete data, models for spatial data, nonparametric methods, and forward search methods for outliers. We also present the current interest in robust statistics and conclude with suggestions on the possible future direction of this area for statistical science.

다변량기법을 활용한 용담호 수질측정지점 유사성 연구 (A Study on Measuring the Similarity Among Sampling Sites in Lake Yongdam with Water Quality Data Using Multivariate Techniques)

  • 이요상;권세혁
    • 환경영향평가 / Vol. 18, No. 6 / pp. 401-409 / 2009
  • Multivariate statistical approaches are discussed for designing an optimal water quality monitoring network: sampling sites are classified by measuring their similarity from water quality data, and the characteristics of the resulting clusters are examined. For the empirical study, two years of data (2005, 2006) were collected in Yongdam reservoir at 9 sampling sites, combining 2 depth levels and 7 important water quality variables. The similarity among sampling sites is measured by the Euclidean distance between water quality variables, and the sites are classified by a hierarchical clustering method. The clustered sites are discussed using principal component variables, in view of their geographical characteristics and of reducing the number of measuring sites. The nine sampling sites are clustered as follows. Sites 5, 6, and 7 form one cluster characterized by shallow water depth and the main stream of water. Sites 2 and 4 are clustered into the same group by hydraulic characteristics that derive from the main stream, but their patterns of water quality change appear to differ because site 2 is near the dam. Sites 3, 8, and 9 are each positioned individually because they lie on different tributaries.
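
The distance-then-cluster procedure described in this abstract can be sketched as below. The abstract does not state which linkage was used, so average linkage is an assumption, as are the function names and the small made-up example values in the note that follows (they are not the Yongdam measurements). In practice the variables would probably be standardized first, since water quality variables have different units.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length measurement vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def agglomerate(sites, n_clusters):
    """Average-linkage agglomerative clustering (linkage choice is assumed).

    `sites` maps a site name to its vector of water quality variables.
    Returns a list of sets of site names.
    """
    clusters = [{name} for name in sites]

    def linkage(c1, c2):
        total = sum(euclidean(sites[a], sites[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest average distance.
        i, j = min(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters
```

For example, with hypothetical two-variable sites `{"A": (1.0, 2.0), "B": (1.1, 2.1), "C": (5.0, 6.0), "D": (5.2, 5.9), "E": (9.0, 1.0)}` and `n_clusters=3`, the two close pairs merge and the remaining site stays on its own, much as sites 3, 8, and 9 remain individual in the study.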

에드-혹 네트워크에서 신뢰성 있는 클러스터 기반 그룹 멀티캐스트 방식에 관한 연구 (A Study on a Robust Clustered Group Multicast in Ad-hoc Networks)

  • 박양재;이정현
    • 정보처리학회논문지C / Vol. 10C, No. 2 / pp. 163-170 / 2003
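  • This paper proposes a robust and reliable cluster-based group multicast scheme for ad-hoc networks that applies a combined-weight clustering algorithm. An ad-hoc network is a wireless network composed solely of mobile terminals, without the support of a fixed communication infrastructure. Because of limited bandwidth and high mobility, routing protocols in ad-hoc networks must be robust and simple while minimizing energy consumption. The WCGM (Weighted Cluster Group Multicast) scheme uses a combined-weight multi-cluster structure and applies the combined weight when electing cluster heads, while retaining the data delivery by limited flooding that is the advantage of the existing FGMP (Forwarding Group Multicast Protocol) scheme. Because this yields a stable and robust data delivery structure, it reduced both the overhead of maintaining that structure and the overhead of delivering data.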
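
A rough sketch of what electing cluster heads by a combined weight might look like is given below. The abstract does not list the weight factors, so everything here is hypothetical: the four factors (degree difference, distance sum, mobility, time already served as head), the coefficients, the `ideal_degree` parameter, and the greedy election loop are illustrative assumptions rather than the WCGM definition.

```python
def combined_weight(node, coeffs=(0.7, 0.2, 0.05, 0.05), ideal_degree=3):
    """Hypothetical combined weight; lower is better for a cluster head."""
    w1, w2, w3, w4 = coeffs
    degree_diff = abs(len(node["neighbors"]) - ideal_degree)
    return (w1 * degree_diff + w2 * node["dist_sum"]
            + w3 * node["mobility"] + w4 * node["head_time"])

def elect_heads(nodes):
    """Greedy election: the lowest-weight unassigned node becomes a head,
    and its still-unassigned neighbours join its cluster.

    Returns a dict mapping every node name to the name of its cluster head.
    """
    weights = {name: combined_weight(n) for name, n in nodes.items()}
    assignment = {}
    # Visit nodes from lowest to highest weight; ties are broken by name.
    for name in sorted(nodes, key=lambda n: (weights[n], n)):
        if name in assignment:
            continue
        assignment[name] = name
        for nb in nodes[name]["neighbors"]:
            assignment.setdefault(nb, name)
    return assignment
```

Each node is described by a dict with the keys `neighbors`, `dist_sum`, `mobility`, and `head_time`. Folding several factors into one score is meant to keep head changes infrequent, which is consistent with the reduced maintenance overhead the abstract reports.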

클러스터 기반 센서 망에서 데이터 전달 방법들의 성능 분석 (An Evaluation of Data Delivery Mechanisms in Clustered Sensor Networks)

  • 박태근
    • 한국통신학회논문지 / Vol. 31, No. 3A / pp. 304-310 / 2006
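  • As groundwork for developing an energy-efficient topology management scheme suited to clustered sensor networks, this paper compares and analyzes the performance of three data delivery methods. In the first method, only the header of each cluster activates its radio transceiver module and takes part in the RTS/CTS/DATA/ACK message exchange; in the second method, multiple nodes per cluster take part in the message exchange. In the last method, only the cluster headers activate their radio transceiver modules for the RTS/CTS exchange, and a cluster header that receives an RTS message naming its own cluster ID as the destination cluster activates the radio transceiver modules of multiple nodes so that they take part in receiving the DATA message and sending the ACK message. Through simulation, the energy consumption of the three methods is compared according to the number of nodes activated per cluster, the load, and the packet loss probability.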

Confidence Interval for the Difference or Ratio of Two Median Failure Times from Clustered Survival Data

  • Lee, Seung-Yeoun;Jung, Sin-Ho
    • 응용통계연구 / Vol. 22, No. 2 / pp. 355-364 / 2009
  • A simple method is proposed for constructing nonparametric confidence intervals for the difference or ratio of two median failure times. The method applies when clustered survival data with censoring are randomized under either (I) cluster randomization or (II) subunit randomization. It is simple to calculate and is based on nonparametric density estimation. The proposed method is illustrated with the otology study data and the HL-A antigen study data. Simulation results are also reported for practical sample sizes.

A Clustered Dwarf Structure to Speed up Queries on Data Cubes

  • Bao, Yubin;Leng, Fangling;Wang, Daling;Yu, Ge
    • Journal of Computing Science and Engineering / Vol. 1, No. 2 / pp. 195-210 / 2007
  • Dwarf is a highly compressed structure that compresses a data cube by eliminating semantic redundancies while the cube is computed. Although it has a high compression ratio, Dwarf is slower to query and more difficult to update because of its structural characteristics. Since the original purpose of a data cube is to speed up queries, we propose two novel clustering methods for query optimization: the recursion clustering method, which clusters the nodes recursively to speed up point queries, and the hierarchical clustering method, which clusters the nodes of the same dimension to speed up range queries. To facilitate the implementation, we design a partition strategy and a logical clustering mechanism. Experimental results show that our methods effectively improve query performance on data cubes, and that the recursion clustering method is suitable for both point queries and range queries.

클러스터 이기종 셀룰러 네트워크를 위한 합동 셀 그룹핑 및 사용자 접속 기법 (Joint Cell Grouping and User Association Scheme for Clustered Heterogeneous Cellular Networks)

  • 박진배;이형열;최우리;김광순
    • 한국통신학회논문지 / Vol. 38A, No. 6 / pp. 520-527 / 2013
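  • This paper proposes a joint cell grouping and user association scheme for semi-dynamic cell grouping in clustered heterogeneous cellular networks. Recently, small cells have been deployed alongside existing macro base stations to support the explosive data demand at hotspots. In such clustered heterogeneous cellular networks, performance can degrade because of interference and load imbalance. The proposed scheme handles the two problems jointly so that the proportional fairness of users is maximized. Simulations show that the proposed scheme achieves considerably higher average user rates and proportional fairness among users than existing schemes.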
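
The proportional-fairness objective mentioned in this abstract is commonly written as the sum of the logarithms of the user rates. The sketch below shows only that objective on a toy user association problem. It is not the proposed scheme: cell grouping and interference are left out, and the equal time-sharing rate model, the spectral-efficiency table `se`, and the brute-force search are assumptions made for illustration.

```python
import math
from itertools import product

def pf_utility(assoc, se):
    """Proportional-fairness utility: sum of log user rates.

    assoc[u] is the cell serving user u; se[u][c] is an assumed spectral
    efficiency of user u on cell c. Each cell is assumed to share its time
    equally among its users, so the rate is se[u][c] / (users on cell c).
    """
    load = {}
    for cell in assoc:
        load[cell] = load.get(cell, 0) + 1
    return sum(math.log(se[u][c] / load[c]) for u, c in enumerate(assoc))

def best_association(se, n_cells):
    """Exhaustive search over all associations; only viable for toy sizes."""
    return max(product(range(n_cells), repeat=len(se)),
               key=lambda a: pf_utility(a, se))
```

With `se = [[4.0, 1.0], [4.0, 1.0], [4.0, 3.0]]` (cell 0 standing in for a macro cell, cell 1 for a small cell), attaching every user to its strongest cell overloads cell 0, while moving the third user to cell 1 raises the utility. This is the load-balancing effect that a proportional-fairness objective rewards.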

클러스터드 VOD 서버에서 가변 비트율을 고려한 스트라이핑 정책 (A Striping Strategy Considering Variable Bit Rate in Clustered VOD Servers)

  • 이재호;김종훈;안유정
    • 정보교육학회논문지 / Vol. 2, No. 1 / pp. 10-18 / 1998
  • In a VOD server, media data are usually encoded with a VBR compression technique such as MPEG, so media stream rates vary. We propose a striping strategy, called VCS, that takes VBR compression into account in clustered VOD servers. Simulations are conducted to evaluate the new strategy and compare it with a known striping strategy. The results show that the VCS strategy improves performance.

저수지 수질조사 지점간 유사성 분석 (A Study on Measuring the Similarity Among Sampling Sites in Lake)

  • 이요상;고덕구;이현석
    • 한국수자원학회:학술대회논문집 / 한국수자원학회 2010년도 학술발표회 / pp. 957-961 / 2010
  • Multivariate statistical approaches are used to classify sampling sites by measuring their similarity from water quality data. For the empirical study, two years of data were collected in a reservoir at 9 sampling sites, combining 2 depth levels and 7 important water quality variables. The similarity among sampling sites is measured by the Euclidean distance between water quality variables, and the sites are classified by a hierarchical clustering method. The clustered sites are discussed using principal component variables, in view of their geographical characteristics and of reducing the number of measuring sites. The nine sampling sites are clustered as follows. Sites 5, 6, and 7 form one cluster characterized by shallow water depth and the main stream of water. Sites 2 and 4 are clustered into the same group by hydraulic characteristics that derive from the main stream, but their patterns of water quality change appear to differ because site 2 is near the dam. Sites 3, 8, and 9 are each positioned individually because they lie on different tributaries.
