• Title/Summary/Keyword: 유사도 조인 (similarity join)


A Sampling-based Algorithm for Top-${\kappa}$ Similarity Joins (Top-${\kappa}$ 유사도 조인을 위한 샘플링 기반 알고리즘)

  • Park, Jong Soo
    • Journal of KIISE:Databases / v.41 no.4 / pp.256-261 / 2014
  • The top-${\kappa}$ set similarity join problem is to find, between two sets of input records, the ${\kappa}$ record pairs with the highest similarities. We propose an efficient algorithm that returns the top-${\kappa}$ similarity join pairs using a sampling technique. From a sample of the input records, we construct a histogram of set similarity join results and then, based on statistical inference, compute from the histogram an estimated similarity threshold for the top-${\kappa}$ join pairs within an error bound at the 95% confidence level. Finally, the estimated threshold is applied to a traditional similarity join algorithm that uses a min-heap to obtain the top-${\kappa}$ similarity joins. Experimental results on large real datasets show that the proposed algorithm performs well.
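
A minimal sketch of the final step described above, assuming Jaccard similarity over token sets: a min-heap of size k holds the best pairs seen so far, and the `threshold` argument stands in for the sampling-based estimate. The sampling and histogram estimation are not reproduced, and all names here are illustrative, not the author's code.

```python
import heapq
from itertools import product


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity |a & b| / |a | b| of two token sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def top_k_similarity_join(left, right, k, threshold=0.0):
    """Return the k most similar (score, i, j) pairs between two record lists.

    `threshold` is a hypothetical stand-in for the estimated threshold:
    pairs scoring below it are skipped before they touch the heap.
    """
    heap = []  # min-heap of (score, i, j); heap[0] is the weakest kept pair
    for (i, a), (j, b) in product(enumerate(left), enumerate(right)):
        score = jaccard(a, b)
        if score < threshold:
            continue  # pruned by the (estimated) threshold
        if len(heap) < k:
            heapq.heappush(heap, (score, i, j))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, i, j))  # evict the weakest pair
    return sorted(heap, reverse=True)  # best pair first
```

A higher threshold means fewer heap operations, which is the point of estimating it: with `left = [frozenset("abc"), frozenset("xyz")]` and `right = [frozenset("abd"), frozenset("xy")]`, `k=2` keeps pairs (1, 1) and (0, 0), while `threshold=0.6` leaves only (1, 1).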

A Similarity Join Algorithm Using a Median as a Filter (중앙값을 필터로 이용한 유사도 조인 알고리즘)

  • Park, Jong Soo
    • KIPS Transactions on Software and Data Engineering / v.4 no.2 / pp.71-76 / 2015
  • In similarity join processing, a common technique employs a generation-verification framework with two phases: the first phase generates a set of candidate pairs from a collection of records, and the second phase verifies each candidate pair by computing its actual similarity. To reduce the number of candidate pairs reaching the verification phase, this paper uses the median of one record in each candidate pair as a filter to test whether the other record can have the required number of overlapping tokens. We propose a similarity join algorithm with this median filter and show, through extensive experiments on real-world datasets, that it achieves shorter execution times than recent algorithms without the filter.
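
A rough sketch of the generation-verification framework with a median-based overlap bound. The abstract does not give the exact filter, so the bound used here is an assumption: split both sorted records at the median token $m$ of record $x$; tokens below $m$ can only match tokens below $m$, so $|x \cap y| \le \min(|x_{<m}|, |y_{<m}|) + \min(|x_{\ge m}|, |y_{\ge m}|)$. A candidate is discarded when this bound is below the overlap required for Jaccard threshold $t$, namely $\lceil t(|x|+|y|)/(1+t) \rceil$. All function names are hypothetical.

```python
import math
from bisect import bisect_left
from collections import defaultdict


def required_overlap(len_x: int, len_y: int, t: float) -> int:
    """Smallest overlap o with o / (len_x + len_y - o) >= t (Jaccard)."""
    # Small epsilon guards against float round-up at exact boundaries.
    return math.ceil(t / (1 + t) * (len_x + len_y) - 1e-9)


def median_upper_bound(x: list, y: list) -> int:
    """Upper bound on the overlap of two sorted, duplicate-free token lists.

    Hypothetical filter: both records are split at x's median token m.
    """
    mid = len(x) // 2
    m = x[mid]
    x_lo, x_hi = mid, len(x) - mid  # tokens of x below / at-or-above m
    y_lo = bisect_left(y, m)        # tokens of y below m
    y_hi = len(y) - y_lo
    return min(x_lo, y_lo) + min(x_hi, y_hi)


def similarity_self_join(records, t):
    """Return index pairs (i, j), i < j, with Jaccard similarity >= t."""
    recs = [sorted(set(r)) for r in records]
    index = defaultdict(list)  # token -> ids of earlier records
    results = []
    for j, y in enumerate(recs):
        # Generation phase: earlier records sharing any token are candidates.
        candidates = {i for tok in y for i in index[tok]}
        for i in sorted(candidates):
            x = recs[i]
            alpha = required_overlap(len(x), len(y), t)
            if median_upper_bound(x, y) < alpha:
                continue  # pruned without computing the real overlap
            # Verification phase: exact overlap against the requirement.
            if len(set(x) & set(y)) >= alpha:
                results.append((i, j))
        for tok in y:
            index[tok].append(j)
    return results
```

The bound costs one binary search per candidate, which is cheaper than a full overlap computation. That trade-off is the rationale for filtering before verification.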

Efficient Similarity Joins by Adaptive Prefix Filtering (맞춤 접두 필터링을 이용한 효율적인 유사도 조인)

  • Park, Jong Soo
    • KIPS Transactions on Software and Data Engineering / v.2 no.4 / pp.267-272 / 2013
  • Similarity join, an important operation in many applications such as data cleaning and duplicate detection, finds all pairs of records in a dataset whose similarities are above a given threshold, and computing it efficiently is challenging. We propose a new algorithm that uses the prefix filtering principle as a strong constraint on candidate-pair generation for fast similarity joins. A candidate pair is generated only when the current prefix token of a probing record matches one of the prefix tokens of an indexing record within the prefix constrained by the principle. This generation method does not need to compute an upper bound on the overlap between two records, which reduces execution time. Experimental results on real datasets show that our algorithm significantly outperforms previous prefix filtering-based algorithms.
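
For reference, plain prefix filtering (not the adaptive variant proposed in this paper) can be sketched as follows, assuming Jaccard similarity with threshold $t$. Each record's tokens are sorted by a global order (rarest first), a prefix of length $|r| - \lceil t|r| \rceil + 1$ is kept, and a candidate is generated only when two prefixes share a token. Any pair with Jaccard similarity of at least $t$ is guaranteed to pass. Function names are illustrative.

```python
import math
from collections import Counter, defaultdict


def prefix_length(size: int, t: float) -> int:
    """Prefix length |r| - ceil(t * |r|) + 1 for Jaccard threshold t."""
    # Epsilon avoids float round-up when t * size is an exact integer.
    return size - math.ceil(t * size - 1e-9) + 1


def jaccard(x, y) -> float:
    inter = len(set(x) & set(y))
    return inter / (len(x) + len(y) - inter)


def prefix_filter_join(records, t):
    """Self-join returning pairs (i, j), i < j, with Jaccard >= t."""
    # Global order: rare tokens first, so prefixes stay selective.
    freq = Counter(tok for r in records for tok in set(r))
    ordered = [sorted(set(r), key=lambda tok: (freq[tok], tok)) for r in records]
    index = defaultdict(list)  # prefix token -> ids of indexed records
    pairs = []
    for j, y in enumerate(ordered):
        if not y:
            continue  # an empty record has no prefix to probe with
        prefix = y[:prefix_length(len(y), t)]
        # Probe: only records whose prefixes share a token become candidates.
        candidates = {i for tok in prefix for i in index[tok]}
        for i in sorted(candidates):
            if jaccard(ordered[i], y) >= t:  # verification
                pairs.append((i, j))
        for tok in prefix:
            index[tok].append(j)  # index only the prefix, not the full record
    return pairs
```

Because only prefix tokens are indexed, each record is probed against a small fraction of the collection. The approach described in the abstract tightens candidate generation further, without computing an overlap upper bound.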

Experimental Study on the Material Properties of Unreinforced Masonry Considering Earthquake Load (지진하중을 고려한 비보강 조적조의 재료특성 평가에 관한 실험연구)

  • 김희철;김관중;박진호;홍원기
    • Journal of the Earthquake Engineering Society of Korea / v.5 no.2 / pp.93-101 / 2001
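  • This paper presents an experimental study that evaluates material properties in order to investigate the seismic performance of unreinforced masonry in Korea. Based on the test results, a compressive-strength equation for masonry mortar is proposed. In addition, the compressive-strength characteristics of masonry prisms are compared for different mortar mix ratios. An approximate equation is presented for obtaining the elastic modulus of masonry from the compressive strength of masonry prisms; compared with shear modulus values obtained from diagonal-tension masonry tests, the approximate equation is judged to be valid. Using the material property values obtained from the tests, a pseudo-dynamic analysis was performed on a two-story masonry multi-family house. The shear stress obtained from the analysis was confirmed to be similar to the allowable shear stress of the diagonal-tension masonry specimens that exhibited shear failure.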

Matrix-based Filtering and Load-balancing Algorithm for Efficient Similarity Join Query Processing in Distributed Computing Environment (분산 컴퓨팅 환경에서 효율적인 유사 조인 질의 처리를 위한 행렬 기반 필터링 및 부하 분산 알고리즘)

  • Yang, Hyeon-Sik;Jang, Miyoung;Chang, Jae-Woo
    • The Journal of the Korea Contents Association / v.16 no.7 / pp.667-680 / 2016
  • As distributed computing platforms such as Hadoop MapReduce have been developed, query processing techniques that were designed to run on a single machine need to be executed efficiently in distributed computing environments. In particular, similarity join query processing, which retrieves all pairs of data with high similarity between two given data sets, has been studied in distributed computing environments. However, existing similarity join schemes for distributed environments suffer from skewed computing load across clusters because they consider only data transmission cost. In this paper, we propose a matrix-based load-balancing algorithm for efficient similarity join query processing in distributed computing environments. To balance the load uniformly across clusters, the proposed algorithm estimates the expected computing cost using a matrix and generates partitions based on the estimated cost. In addition, it reduces computing load by filtering out data that are not used in query processing in each cluster. Finally, our performance evaluation shows that the proposed algorithm outperforms the existing scheme in query processing performance.
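
The abstract does not specify the matrix layout or the partitioning rule, so the following is only a hypothetical illustration of cost-based load balancing. It makes three assumptions. First, both data sets are bucketed along one dimension with bucket width at least the similarity radius, so only cells whose bucket indices differ by at most `max_gap` can hold similar pairs. Second, the cost of a cell is the product of its bucket sizes. Third, cells are assigned to workers with the greedy longest-processing-time rule.

```python
import heapq


def cost_matrix(r_counts, s_counts, max_gap=1):
    """Estimated comparisons for each (R-bucket, S-bucket) cell.

    Hypothetical filter: cells whose bucket indices differ by more than
    `max_gap` cannot contain similar pairs, so their cost is 0.
    """
    return [[r * s if abs(i - j) <= max_gap else 0
             for j, s in enumerate(s_counts)]
            for i, r in enumerate(r_counts)]


def balance(matrix, workers):
    """Greedy longest-processing-time assignment of cells to workers."""
    # Most expensive cells first; zero-cost (filtered) cells are dropped.
    cells = sorted(((c, i, j) for i, row in enumerate(matrix)
                    for j, c in enumerate(row) if c > 0), reverse=True)
    heap = [(0, w) for w in range(workers)]  # (current load, worker id)
    assignment = {w: [] for w in range(workers)}
    for cost, i, j in cells:
        load, w = heapq.heappop(heap)  # least-loaded worker so far
        assignment[w].append((i, j))
        heapq.heappush(heap, (load + cost, w))
    loads = {w: load for load, w in heap}
    return assignment, loads
```

With `r_counts = s_counts = [10, 1, 1]` and two workers, the dominant cell (0, 0) gets a worker to itself and the remaining cells go to the other worker. A scheme that considers only data transmission cost would not see this skew.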

XML Join Query Processing using Structured Information from Multiple Documents (다중 문서에서 구조 정보를 이용한 XML 조인 질의 처리)

  • 정성호;김병곤;정헌석;이재호;임해철
    • Proceedings of the Korean Information Science Society Conference / 2002.10c / pp.100-102 / 2002
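  • To support diverse queries over XML documents, query languages such as XQL, XML-QL, XML-GL, and XQUERY have been proposed at the W3C. These query languages can classify and express various query types, but for joins they support only simple join queries, and research on more diverse join queries that exploit the similarity of XML document structure or text information has been lacking. In this paper, to express join queries over multiple documents systematically and effectively, we classify document join queries into several types. In addition, for efficient query processing, we extend the XML query language QUILT to support a similarity join operator, which supports various general join queries and information retrieval functions, and a structured join operator, which supports purely structure-based joins. In particular, for queries that use only structural information, the design uses the depth of the structure to set the query search scope according to user requirements and allows queries over XML documents to be expressed more concisely.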

A Face Verification using Iterative Light Enhancement in Low Light Environment (저조도 환경에서의 반복적 조도 향상을 이용한 얼굴 검증)

  • Lee, Sanghoon
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2022.06a / pp.1222-1225 / 2022
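  • This paper proposes a method that improves face verification accuracy by enhancing the illumination of images captured in low-light environments. Face detection accuracy is improved by enhancing the illumination of the input image, and face verification uses multiple feature vectors generated by iteratively enhancing the illumination of the detected face. The K-FACE dataset was used to measure face detection and verification accuracy. For verification images captured in low-light environments, the proposed feature-vector synthesis method tended to reduce the standard deviation of the similarity score distributions for both same-person and different-person pairs, which in turn improved verification performance.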
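
The abstract does not describe how the multiple feature vectors are synthesized, so this sketch assumes one common choice: average the L2-normalized vectors, renormalize, and compare the result with a reference vector by cosine similarity against a threshold. The vectors would come from a face-recognition model applied to each enhanced version of the detected face. That model, the enhancement step, and the `threshold` value are hypothetical and not shown.

```python
import math


def normalize(v):
    """Scale a non-zero vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def fuse(vectors):
    """Assumed synthesis step: mean of unit vectors, renormalized."""
    units = [normalize(v) for v in vectors]
    mean = [sum(col) / len(units) for col in zip(*units)]
    return normalize(mean)


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def verify(reference, probe_vectors, threshold=0.5):
    """Return (is_same_person, score) for one reference/probe comparison."""
    score = cosine(reference, fuse(probe_vectors))
    return score >= threshold, score
```

Averaging several noisy embeddings of the same face tends to cancel per-image noise. This is one plausible reason a fused vector could narrow the score distributions the abstract reports, though the paper's actual method may differ.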

Convergence study on the antioxidant effect of crude extracts of Nelumbo nucifera Gaertner (연밥 조추출물의 항산화 효능에 관한 융합 연구)

  • Kim, Hyun-Jin
    • Journal of the Korea Convergence Society / v.7 no.3 / pp.53-58 / 2016
  • The present study was performed to evaluate the antioxidant effect of crude extracts of Nelumbo nucifera Gaertner. The antioxidant effect was analysed by DPPH-radical scavenging activity, lipid peroxidation, and superoxide dismutase (SOD)-like activity. The DPPH-radical scavenging activity and SOD-like activity of linoleic acid, glutamic acid, and the ethyl acetate and ethyl alcohol crude extracts of Nelumbo nucifera Gaertner increased in a dose-dependent manner, whereas lipid peroxidation for glutamic acid, linoleic acid, and the ethyl acetate and ethyl alcohol crude extracts decreased in a time-dependent manner. These data suggest that crude extracts of Nelumbo nucifera Gaertner may be a putative antioxidant substance that could be applied to medicine development through convergence research.

A Framework to Evaluate Communication Quality of Operators in Nuclear Power Plants Using Cosine Similarity (코사인 유사도를 이용한 원자력발전소 운전원 커뮤니케이션 품질 평가 프레임워크)

  • Kim, Seung-Hwan;Park, Jin-Kyun;Han, Sang-Yong
    • Journal of the Korea Society of Computer and Information / v.15 no.9 / pp.165-172 / 2010
  • Communication problems have been regarded as one of the biggest causes of trouble in many industries, which has led to extensive research on communication as part of human error analysis. Existing research has shown that maintaining good communication quality is essential to securing the safety of large and complex process systems. In this paper, we propose a method to measure the quality of operator communication during off-normal situations in the main control room of nuclear power plants. The method evaluates the cosine similarity between the sentences of two operators, computed as the cosine of the angle between their sentence vectors. To check whether the method is applicable to evaluating communication quality, we compared its communication quality results with the operating performance of operators in a simulated environment.
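
A minimal sketch of sentence-level cosine similarity, assuming each sentence is represented as a bag-of-words term-frequency vector. The paper's tokenization and term weighting are not specified here, and the function names are illustrative.

```python
import math
import re
from collections import Counter


def term_vector(sentence: str) -> Counter:
    """Lower-cased word counts; a simple stand-in for the paper's vectors."""
    return Counter(re.findall(r"\w+", sentence.lower()))


def cosine_similarity(s1: str, s2: str) -> float:
    """Cosine of the angle between two sentences' term-frequency vectors."""
    a, b = term_vector(s1), term_vector(s2)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0  # 0.0 when either sentence is empty
```

For example, a repeated-back instruction such as "Open valve three" against "open valve three now" scores about 0.87, while an exact repetition scores 1.0. In this sketch, higher values mean the receiving operator's wording tracks the sender's more closely.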