• Title/Abstract/Keywords: k-NN graph

8 search results (processing time: 0.018 s)

MapReduce 환경에서 재그룹핑을 이용한 Locality Sensitive Hashing 기반의 K-Nearest Neighbor 그래프 생성 알고리즘의 개선 (An Improvement in K-NN Graph Construction using re-grouping with Locality Sensitive Hashing on MapReduce)

  • 이인희;오혜성;김형주
    • 정보과학회 컴퓨팅의 실제 논문지 / Vol. 21, No. 11 / pp. 681-688 / 2015
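  • A k-Nearest Neighbor (k-NN) graph is a data structure that represents the k-NN information of every node. It is used in collaborative filtering, similarity search, and many information retrieval and recommender systems. Despite these advantages, brute-force k-NN graph construction has a time complexity of $O(n^2)$, which makes it impractical for big data sets. For this reason, algorithms have been studied that construct k-NN graphs by applying Locality Sensitive Hashing, which is efficient for high-dimensional, sparse data, on MapReduce, a (key, value)-based distributed environment. This work uses a two-stage method on MapReduce: Locality Sensitive Hashing first places users into neighbor-candidate groups, and similarity is then computed by brute force only for pairs within each candidate group. Because similarity computation takes the most time during graph construction, how the candidate groups are formed is especially important. Existing methods are limited in their ability to prevent large candidate groups. This paper presents an algorithm that regroups large candidate groups for efficient k-NN graph construction. Experiments confirm that the proposed algorithm performs well in terms of graph accuracy and scan rate.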
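
The two-stage idea in this abstract can be sketched on a single machine, as a rough illustration only. The abstract does not name the hash family, the similarity measure, or the regrouping rule, so the MinHash-style bucketing, the Jaccard similarity, and every name below (`knn_graph_lsh`, `minhash_key`, `rounds`, `num_hashes`) are assumptions. Neither the MapReduce distribution nor the proposed regrouping step is reproduced here.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (an assumed measure, not from the paper)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def minhash_key(items, salt, num_hashes):
    """Bucket key: for each seeded hash, the smallest digest over the item set."""
    return tuple(
        min(hashlib.md5(f"{salt}:{seed}:{item}".encode()).hexdigest() for item in items)
        for seed in range(num_hashes)
    )


def knn_graph_lsh(data, k, num_hashes=2, rounds=4):
    """data maps node -> set of items; returns node -> top-k [(similarity, neighbor)]."""
    found = {node: {} for node in data}
    for salt in range(rounds):
        # Stage 1: hash nodes into candidate groups (similar sets tend to collide).
        groups = defaultdict(list)
        for node, items in data.items():
            if items:
                groups[minhash_key(items, salt, num_hashes)].append(node)
        # Stage 2: brute-force similarity only for pairs inside the same group.
        for members in groups.values():
            for u, v in combinations(members, 2):
                sim = jaccard(data[u], data[v])
                found[u][v] = sim
                found[v][u] = sim
    return {
        node: sorted(((s, n) for n, s in nbrs.items()), reverse=True)[:k]
        for node, nbrs in found.items()
    }
```

The cost of stage 2 grows quadratically with group size, which is why a single oversized candidate group can dominate the running time.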

Balanced Canopy Clustering에 기반한 일반적 k-인접 이웃 그래프 생성 알고리즘 (A Generic Algorithm for k-Nearest Neighbor Graph Construction Based on Balanced Canopy Clustering)

  • 박영기;황혜수;이상구
    • 정보과학회 컴퓨팅의 실제 논문지 / Vol. 21, No. 4 / pp. 327-332 / 2015
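  • A k-nearest neighbor graph is a data structure that represents the k-NN information of every vertex, and many information retrieval and recommender systems make use of it. Various methods for constructing k-nearest neighbor graphs have been proposed, but none of them satisfies both of the following conditions: (1) it does not assume a particular similarity measure; (2) its accuracy does not decrease as the number of vertices or dimensions increases. This paper proposes a k-NN graph construction algorithm that satisfies both conditions by using balanced canopy clustering. In experiments, regardless of the number of vertices and dimensions, the algorithm was more than 5 times faster than the basic algorithm while maintaining an accuracy of about 92%. The algorithm can be used effectively when a new similarity measure is adopted or when high accuracy must be guaranteed.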
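
For orientation, plain canopy clustering (the common two-threshold formulation, not the balanced variant proposed in this paper, whose details the abstract does not give) might look like the sketch below. The function name `canopy_clustering`, the thresholds `t1` and `t2`, and the Euclidean distance are assumptions chosen for illustration; the paper itself is stated to work with arbitrary similarity measures.

```python
import math


def canopy_clustering(points, t1, t2, distance=math.dist):
    """Return canopies as lists of point indices.

    t1 is the loose threshold (canopy membership) and t2 the tight threshold
    (removal from further consideration), with t1 >= t2.
    """
    if t1 < t2:
        raise ValueError("t1 must be at least t2")
    remaining = list(range(len(points)))
    canopies = []
    while remaining:
        center = remaining[0]
        # Every remaining point within the loose threshold joins this canopy.
        canopy = [i for i in remaining if distance(points[center], points[i]) <= t1]
        canopies.append(canopy)
        # Points within the tight threshold cannot seed or join later canopies.
        remaining = [i for i in remaining if distance(points[center], points[i]) > t2]
    return canopies
```

A k-NN graph could then be approximated by comparing only points that share a canopy, under the assumption that true neighbors rarely fall into different canopies.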

Robust Similarity Measure for Spectral Clustering Based on Shared Neighbors

  • Ye, Xiucai;Sakurai, Tetsuya
    • ETRI Journal / Vol. 38, No. 3 / pp. 540-550 / 2016
  • Spectral clustering is a powerful tool for exploratory data analysis. Many existing spectral clustering algorithms measure similarity by using a Gaussian kernel function or an undirected k-nearest neighbor (kNN) graph, which cannot reveal the real clusters when the data are not well separated. In this paper, to improve spectral clustering, we consider a robust similarity measure based on the shared nearest neighbors in a directed kNN graph. We propose two novel algorithms for spectral clustering: one based on the number of shared nearest neighbors, and one based on their closeness. The proposed algorithms are able to explore the underlying similarity relationships between data points, and are robust to datasets that are not well separated. Moreover, the proposed algorithms have only one parameter, k. We evaluated the proposed algorithms using synthetic and real-world datasets. The experimental results demonstrate that the proposed algorithms not only achieve a good level of performance but also outperform the traditional spectral clustering algorithms.
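
A minimal sketch of the first idea, counting shared nearest neighbors in a directed kNN graph, is given below. It is only an illustration: the abstract does not give the exact similarity formula, so the raw count, the Euclidean distance, and the names `knn_sets` and `shared_neighbor_similarity` are assumptions, and the closeness-based variant is not covered.

```python
import math


def knn_sets(points, k):
    """Directed kNN graph: index -> set of its k nearest other indices."""
    neighbors = {}
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbors[i] = set(others[:k])
    return neighbors


def shared_neighbor_similarity(points, k):
    """Matrix S where S[i][j] counts the nearest neighbors shared by i and j."""
    nn = knn_sets(points, k)
    n = len(points)
    return [
        [len(nn[i] & nn[j]) if i != j else 0 for j in range(n)]
        for i in range(n)
    ]
```

With two well-separated groups of points, pairs inside a group share neighbors while pairs across groups share none, so the resulting matrix is block-structured before any spectral step is applied.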

Spectral Clustering with Sparse Graph Construction Based on Markov Random Walk

  • Cao, Jiangzhong;Chen, Pei;Ling, Bingo Wing-Kuen;Yang, Zhijing;Dai, Qingyun
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 9, No. 7 / pp. 2568-2584 / 2015
  • Spectral clustering has become one of the most popular clustering approaches in recent years. The similarity graph constructed on the data is one of the key factors that influence the performance of spectral clustering. However, the similarity graphs constructed by existing methods usually contain some unreliable edges. To construct a reliable similarity graph for spectral clustering, an efficient method based on the Markov random walk (MRW) is proposed in this paper. In the proposed method, the MRW model is defined on the raw k-NN graph, and the neighbors of each sample are determined by the probability of the MRW. Since the high-order transition probabilities carry complex relationships among data, the neighbors in the graph determined by the proposed method are more reliable than those of existing methods. Experiments are performed on synthetic and real-world datasets for performance evaluation and comparison. The results show that the graph obtained by the proposed method reflects the structure of the data better than those of state-of-the-art methods and can effectively improve the performance of spectral clustering.
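
As a rough sketch of how a random walk on a k-NN graph can rank neighbors, the code below row-normalizes an adjacency (weight) matrix into one-step transition probabilities, raises it to the t-th step, and keeps the k most probable destinations for each node. The abstract does not specify how the probabilities are combined or thresholded, so the plain t-step rule and the names `t_step_probabilities` and `walk_neighbors` are assumptions.

```python
def row_normalize(weights):
    """Turn a non-negative weight matrix into one-step transition probabilities."""
    result = []
    for row in weights:
        total = sum(row)
        result.append([w / total if total else 0.0 for w in row])
    return result


def mat_mul(a, b):
    """Plain dense matrix product of two lists of lists."""
    return [
        [sum(a[i][m] * b[m][j] for m in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def t_step_probabilities(weights, t):
    """Probability of reaching each node in exactly t steps (t >= 1)."""
    p = row_normalize(weights)
    result = p
    for _ in range(t - 1):
        result = mat_mul(result, p)
    return result


def walk_neighbors(weights, t, k):
    """For each node, the k other nodes with the highest t-step probability."""
    probs = t_step_probabilities(weights, t)
    neighbors = []
    for i, row in enumerate(probs):
        others = [j for j in range(len(row)) if j != i]
        others.sort(key=lambda j: row[j], reverse=True)
        neighbors.append(others[:k])
    return neighbors
```

On a three-node path graph 0-1-2, a two-step walk from node 0 reaches node 2 with probability 0.5, so node 2 becomes its top-ranked neighbor even though the two nodes share no direct edge.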

TextRank 알고리즘을 이용한 문서 범주화 (Text Categorization Using TextRank Algorithm)

  • 배원식;차정원
    • 한국정보과학회논문지:컴퓨팅의 실제 및 레터 / Vol. 16, No. 1 / pp. 110-114 / 2010
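  • This paper describes a text categorization method that uses the TextRank algorithm. TextRank is a graph-based ranking algorithm. A graph can be built from a document by treating each word in the document as a node and creating edges from the co-occurrence of words. TextRank is applied to this graph to select words of high importance, and each selected word is combined with an adjacent word to form a single feature used for text classification. We hypothesized that co-occurrence features (pairs of adjacent words) make the meaning of a single word clearer and can therefore serve as good features for text classification. The classifiers used include support vector machines, a Bayesian classifier, a maximum entropy model, and a k-NN classifier. In experiments on the 20 Newsgroups collection, the proposed method improved classification performance for all classifiers.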
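
The pipeline described here (co-occurrence graph, TextRank scores, adjacent-word pair features) can be sketched as follows. This is an illustration under assumptions: the window size, the damping factor 0.85, the iteration count, and the names `cooccurrence_graph`, `textrank`, and `pair_features` are not taken from the paper, and the classification step is omitted.

```python
from collections import defaultdict


def cooccurrence_graph(tokens, window=2):
    """Undirected graph linking words that occur within `window` tokens."""
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def textrank(graph, damping=0.85, iterations=50):
    """Unweighted TextRank: S(v) = (1 - d) + d * sum(S(u) / deg(u)) over neighbors u."""
    scores = {word: 1.0 for word in graph}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(graph[n]) for n in graph[word])
            for word in graph
        }
    return scores


def pair_features(tokens, scores, top_n):
    """Adjacent word pairs in which at least one word is among the top_n by score."""
    top = set(sorted(scores, key=scores.get, reverse=True)[:top_n])
    return {(a, b) for a, b in zip(tokens, tokens[1:]) if a in top or b in top}
```

For example, with the token sequence `a b a c a d`, the word `a` is linked to every other word and receives the highest score, so every adjacent pair containing `a` becomes a feature.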

Smooth Formation Navigation of Multiple Mobile Robots for Avoiding Moving Obstacles

  • Chen Xin;Li Yangmin
    • International Journal of Control, Automation, and Systems / Vol. 4, No. 4 / pp. 466-479 / 2006
  • This paper addresses formation navigation for a group of mobile robots that pass through an environment with static or moving obstacles while keeping a fixed formation shape. Based on a Lyapunov function and graph theory, an NN formation control is proposed, which guarantees that the formation is maintained if the formation pattern is $C^k,\;k\geq1$. During navigation, the leader can generate a proper trajectory to lead the formation and avoid moving obstacles according to the obtained information. An evolutionary computational technique using particle swarm optimization (PSO) is proposed for motion planning so that the formation is kept as a $C^1$ function. The simulation results demonstrate that the algorithm is effective, and the experimental studies validate the formation ability of the multiple mobile robot system.

7구역진단기와 심박변이도의 연관성에 대한 임상연구 (A Clinical Study of the Relation between the 7-Zone-Diagnostic System and Heart Rate Variability)

  • 송범용;권경숙
    • Journal of Acupuncture Research / Vol. 25, No. 1 / pp. 15-23 / 2008
  • Objectives: The aim of our study was to demonstrate the clinical application of a diagnosis relating the 7-zone-diagnostic system and heart rate variability. Materials and Methods: Subjects were divided into two groups according to the factor AA form of the 7-zone-diagnostic system (VEGA-DFM722, VEGA, Germany). Subjects in group A showed a factor-AA red bar graph in which zone 2 was higher than the normal range and zone 6 was lower than the normal range. Subjects in group B showed a factor-AA red bar graph in which zone 2 was lower than the normal range and zone 6 was higher than the normal range. We investigated how the indices of heart rate variability (HRV; LX-3202, LAXTHA, Korea) differed between the groups. We performed independent-samples t-tests and evaluated the HRV results at the 5% significance level using SPSS 10.0 for Windows. Results: The differences in the MeanRR, MeanHRV, SDNN, complexity, HRV-index, RMSSD, SDSD, and pNN50 values between the groups were not significant. The differences in the Ln(TP), Ln(VLF), Ln(HF), LF/(LF+HF), LF/HF, norm LF, and norm HF values between the groups were also not significant. Although the differences were not significant, the values of SDNN, complexity, RMSSD, SDSD, Ln(VLF), Ln(HF), and norm LF were generally higher for group B than for group A, and the values of pNN50 and norm HF were lower for group B than for group A. Conclusions: This study suggests that the differences in the HRV values between the groups were not significant, but that group B tended to be healthier than group A with respect to stress. Accordingly, further study will be required.
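
Several of the time-domain indices named in this abstract have standard definitions that can be computed directly from a series of RR (NN) intervals. The sketch below uses those textbook definitions; whether the LX-3202 device computes them in exactly this way (for example, sample versus population standard deviation) is not stated in the abstract, so the choices here and the function name `hrv_time_domain` are assumptions.

```python
import math
import statistics


def hrv_time_domain(rr_ms):
    """Time-domain HRV indices from successive RR intervals in milliseconds."""
    # Successive differences between adjacent intervals.
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return {
        "MeanRR": statistics.mean(rr_ms),
        # SDNN: standard deviation of all intervals (sample form assumed).
        "SDNN": statistics.stdev(rr_ms),
        # RMSSD: root mean square of successive differences.
        "RMSSD": math.sqrt(sum(d * d for d in diffs) / len(diffs)),
        # SDSD: standard deviation of successive differences.
        "SDSD": statistics.stdev(diffs),
        # pNN50: percentage of successive differences larger than 50 ms.
        "pNN50": 100.0 * sum(abs(d) > 50 for d in diffs) / len(diffs),
    }
```

For the intervals `[800, 810, 790, 850, 800]`, the successive differences are 10, -20, 60, and -50 ms; only one of the four exceeds 50 ms in magnitude, so pNN50 is 25%.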

7구역진단기의 Factor AA 제4구역 유형과 심박변이도(HRV)와의 상관성 연구 (A Study on the Correlation between the Patterns of the Zone 4 of Factor AA in 7-Zone-diagnostic System and Heart Rate Variability)

  • 유정석;조이현;이진석;이휘용;송범용
    • Journal of Acupuncture Research / Vol. 25, No. 4 / pp. 71-80 / 2008
  • Objectives: The 7-zone-diagnostic system is a diagnostic device that measures the energy of the body at predetermined bodily locations. This study investigated the relation between the different patterns of Zone 4 of Factor AA in the VEGA DFM 722 (VEGA, Germany), a 7-zone-diagnostic system, and heart rate variability. Methods: We formed three groups according to the Factor AA patterns of the VEGA DFM 722. In Group A, the red bar graph of zone 4 was higher than the normal range. In Group B, the red bar graph of zone 4 was within the normal range. In Group C, the red bar graph of zone 4 was lower than the normal range. We investigated how the indices of heart rate variability (HRV; LX-3202, LAXTHA, Korea) differed among the groups. Results: The complexity, HRV-index, RMSSD, and SDSD values of Group B were higher than those of the other groups. The pNN50 values of Group B were lower than those of the other groups. The Ln(TP), Ln(VLF), Ln(LF), and Ln(HF) values of Group B were higher than those of the other groups. Conclusions: We presumed that Group B was healthier than the other groups with respect to stress.
