Search | Korea Science

Skewed Data Handling Technique Using an Enhanced Spatial Hash Join Algorithm (개선된 공간 해쉬 조인 알고리즘을 이용한 편중 데이터 처리 기법)

Shim Young-Bok;Lee Jong-Yun
- The KIPS Transactions: Part D / v.12D no.2 s.98 / pp.179-188 / 2005
Spatial joins have been studied extensively over the last decade. In this paper, we focus on the filtering step that selects candidate objects for a spatial join when neither input table is indexed. Many algorithms have been presented for this case, and they show excellent performance on most spatial data. However, if the data sets of the input tables are skewed, join performance degrades dramatically, and little research has attempted to solve this problem. We therefore propose a spatial hash strip join (SHSJ) algorithm that combines properties of the existing spatial hash join (SHJ) algorithm, which is based on spatial partitioning according to the distribution of the input data set, with those of the SSSJ algorithm. Finally, to show that SHSJ outperforms SHJ in both uniform and skewed cases, we run SHSJ on the TIGER/Line data sets and compare it with the SHJ algorithm.
https://doi.org/10.3745/KIPSTD.2005.12D.2.179

A Skewed Data Handling Method using Spatial Hash Join Algorithm (공간 해쉬 조인 알고리즘을 이용한 편중 데이터 처리 기법)

Shim Young-Bok;Lee Jong-Yun
- Proceedings of the Korean Information Science Society Conference / 2004.04b / pp.19-21 / 2004
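This paper focuses on the filter step of the spatial join on two input tables for which no index exists. Representative related work includes the Spatial Hash Join (SHJ) and Scalable Sweeping-Based Spatial Join (SSSJ) algorithms. However, their performance degrades sharply when the objects in the input tables are skewed. To solve this problem, this paper proposes the Spatial Hash Strip Join (SHSJ) algorithm, which draws on the characteristics of the existing SHJ and SSSJ algorithms. It differs from the existing SHJ algorithm in two ways: no limit is placed on bucket capacity when the input data set is assigned to buckets, and the SSSJ algorithm, which has good I/O performance, is used in the bucket join step. Finally, experiments on real TIGER/Line data verified that the proposed SHSJ algorithm far outperforms the existing SHJ and SSSJ algorithms for joins on skewed input tables.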

Design of a Spatial Hash Strip Join Algorithm using Efficient Bucket Partitioning and Joining Methods (효율적인 버킷 분할과 조인 방법을 이용한 공간 해쉬 스트립 조인 알고리즘 설계)

Shim, Young-Bok;Lee, Jong-Yun;Jung, Soon-Key
- Proceedings of the Korea Information Processing Society Conference / 2003.11c / pp.1367-1370 / 2003
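This paper proposes a spatial hash join algorithm that can perform an optimal join even on two input relations for which no index exists. It combines the existing Spatial Hash Join (SHJ) and Scalable Sweeping-Based Spatial Join (SSSJ) algorithms, both of which are used to process relations without indexes, into a Spatial Hash Strip Join (SHSJ) algorithm that addresses the performance degradation on skewed data that has been pointed out as a weakness of SHJ. Whereas SHJ handles hash bucket overflow caused by skewed data by repartitioning buckets, the proposed SHSJ algorithm inserts the data into the buckets without repartitioning and, during the join, applies the SSSJ algorithm to the buckets that overflowed, thereby improving performance on skewed input relations.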
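The idea in the abstract above (buckets that are never repartitioned, followed by a sweep-based join inside each bucket) can be illustrated with a minimal in-memory sketch. This is not the authors' implementation: the uniform grid used as the hash function, the `cell` size, the replication of a rectangle into every cell it overlaps, and all names (`Rect`, `partition`, `sweep_join`, `shsj_filter`) are assumptions made for illustration. The sketch also sweeps every bucket pair rather than only overflowed ones, and an in-memory plane sweep stands in for the external-memory SSSJ algorithm.

```python
from collections import defaultdict
from typing import NamedTuple


class Rect(NamedTuple):
    """Minimum bounding rectangle of a spatial object (hypothetical layout)."""
    id: str
    xlo: float
    ylo: float
    xhi: float
    yhi: float


def intersects(a: Rect, b: Rect) -> bool:
    return a.xlo <= b.xhi and b.xlo <= a.xhi and a.ylo <= b.yhi and b.ylo <= a.yhi


def cells(r: Rect, cell: float):
    """Grid cells overlapped by r; the grid plays the role of the spatial hash."""
    for cx in range(int(r.xlo // cell), int(r.xhi // cell) + 1):
        for cy in range(int(r.ylo // cell), int(r.yhi // cell) + 1):
            yield (cx, cy)


def partition(rects, cell):
    """Assign rectangles to buckets with no capacity limit and no repartitioning."""
    buckets = defaultdict(list)
    for r in rects:
        for c in cells(r, cell):
            buckets[c].append(r)
    return buckets


def sweep_join(rs, ss):
    """Plane sweep along x; returns the set of (r.id, s.id) pairs that intersect."""
    rs = sorted(rs, key=lambda r: r.xlo)
    ss = sorted(ss, key=lambda s: s.xlo)
    out, i, j = set(), 0, 0
    while i < len(rs) and j < len(ss):
        if rs[i].xlo <= ss[j].xlo:
            r, k = rs[i], j
            # Scan S rectangles whose left edge lies inside r's x-extent.
            while k < len(ss) and ss[k].xlo <= r.xhi:
                if r.ylo <= ss[k].yhi and ss[k].ylo <= r.yhi:
                    out.add((r.id, ss[k].id))
                k += 1
            i += 1
        else:
            s, k = ss[j], i
            # Symmetric case: scan R rectangles starting inside s's x-extent.
            while k < len(rs) and rs[k].xlo <= s.xhi:
                if s.ylo <= rs[k].yhi and rs[k].ylo <= s.yhi:
                    out.add((rs[k].id, s.id))
                k += 1
            j += 1
    return out


def shsj_filter(r_set, s_set, cell=10.0):
    """Filter step: candidate pairs whose bounding rectangles intersect."""
    rb, sb = partition(r_set, cell), partition(s_set, cell)
    out = set()  # a set removes duplicates caused by replication across cells
    for c in rb.keys() & sb.keys():
        out |= sweep_join(rb[c], sb[c])
    return out
```

Because a heavily loaded bucket is simply handed to the sweep, skew increases the work done in that bucket but never triggers a repartitioning pass.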

Efficient Filter Step of DOT Spatial Join Algorithm (DOT 공간조인 알고리즘의 효율적인 여과단계 처리)

Yu, Yong-Hyuk;Back, Hyun;Yoon, Jee-Hee;Lee, Keon-Bae
- Proceedings of the Korean Information Science Society Conference / 2000.04b / pp.39-41 / 2000
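The spatial join is one of the most expensive operations in geographic information systems. The DOT spatial indexing technique can use the primary index of a conventional database system, and spatial objects are sorted and clustered by Hilbert value so that their mutual proximity is preserved. The DOT spatial join algorithm, which exploits these properties, requires fewer disk accesses than the well-known R-tree-based spatial join algorithm when an adequate buffer size is maintained, but its overall performance is unsatisfactory because computing the joinable region requires a large number of spatial transformation operations. To improve the DOT spatial join algorithm, this paper presents an efficient filter step that minimizes the number of these spatial transformation operations. It then compares the execution times of the resulting DOT spatial join algorithm and the R-tree spatial join algorithm and shows that the DOT spatial join algorithm performs up to about 2 times better.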
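To make the Hilbert-value clustering mentioned above concrete, here is a minimal sketch of the standard conversion from a 2-D grid cell to its position along a Hilbert curve, followed by a sort on that value. It illustrates only the ordering step. How DOT maps a spatial object to the point being ordered is not described in the abstract, so the use of plain integer grid coordinates and the names `hilbert_index` and `sort_by_hilbert` are assumptions.

```python
def hilbert_index(order: int, x: int, y: int) -> int:
    """Position of cell (x, y) along a Hilbert curve over a 2**order x 2**order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def sort_by_hilbert(points, order):
    """Cluster points so that neighbours on the curve are neighbours in space."""
    return sorted(points, key=lambda p: hilbert_index(order, p[0], p[1]))
```

Consecutive Hilbert values always belong to adjacent grid cells, which is why storing records in this order tends to keep spatially close objects on the same or nearby disk pages.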

Implementation of Parallel Hash Join Algorithms in a Database Sharing System (데이타베이스 공유 시스템에서 병렬 해쉬 조인 알고리즘의 구현)

김창현;조행래
- Proceedings of the Korean Information Science Society Conference / 2002.04b / pp.43-45 / 2002
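Most parallel join algorithms proposed so far assume a database partitioning system, in which the database is partitioned and stored across several processing nodes. A database partitioning system can connect a large number of nodes and can support geographically distributed environments, but its load balancing and system availability are worse than those of a database sharing system. This paper implements parallel hash join algorithms for a parallel query processor in a database sharing system. To this end, we construct a parallel query processor that can be applied to a database sharing system and describe the processing steps of the parallel hash join algorithm.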
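For readers unfamiliar with the technique, the following is a minimal sketch of a partitioned parallel hash join: both inputs are split by a hash of the join key, and each pair of partitions is joined independently with a build phase and a probe phase. It is a generic illustration, not the implementation described in the paper. Threads stand in for processing nodes, rows are plain tuples, and the names `hash_join` and `parallel_hash_join` are hypothetical. The sketch does not model shared-disk access or cache coherency in a database sharing system.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def hash_join(r_part, s_part, r_key, s_key):
    """Local join of one partition pair: build on R, probe with S."""
    table = defaultdict(list)
    for r in r_part:                      # build phase
        table[r[r_key]].append(r)
    return [(r, s) for s in s_part for r in table.get(s[s_key], [])]  # probe phase


def parallel_hash_join(r, s, r_key, s_key, workers=4):
    """Split both inputs by hash of the join key, then join partitions in parallel."""
    r_parts = [[] for _ in range(workers)]
    s_parts = [[] for _ in range(workers)]
    for row in r:
        r_parts[hash(row[r_key]) % workers].append(row)
    for row in s:
        s_parts[hash(row[s_key]) % workers].append(row)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(
            lambda pair: hash_join(pair[0], pair[1], r_key, s_key),
            zip(r_parts, s_parts),
        )
        return [match for part in results for match in part]
```

Rows with equal keys always land in the same partition pair, so the union of the local results equals the full join. The order of the combined output depends on the partitioning and should not be relied on.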

A Comparison of Multi-Way Join Algorithms in MapReduce (맵리듀스를 이용한 멀티웨이 조인 알고리즘의 비교)

Myung, Jae-Seok;Lee, Sang-Goo
- Proceedings of the Korean Information Science Society Conference / 2011.06c / pp.127-130 / 2011
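MapReduce is a framework that supports distributed and parallel data processing, and it has attracted a great deal of research since open-source implementations such as Hadoop were released. Joins in MapReduce are essential for large-scale data analysis, and research continues on multi-way join algorithms that join several tables in a single MapReduce job. This paper analyzes the strengths and weaknesses of iteration-based and replication-based multi-way join algorithms. It also presents a unified two-phase multi-way semijoin that compensates for the weaknesses of both approaches and compares it with the existing methods. The two-phase multi-way semijoin reduces I/O cost compared with the iteration-based join and reduces communication cost compared with the replication-based join.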
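The abstract does not give the details of the two-phase multi-way semijoin, so the following is only a rough single-process sketch of the general semijoin idea it builds on: a first phase discards rows that cannot contribute to the result, and a second phase joins the reduced tables. The restriction to one shared join key, the dictionary row format, and the names `semijoin_reduce` and `multiway_join` are all assumptions. No MapReduce framework is involved.

```python
from collections import defaultdict
from itertools import product


def semijoin_reduce(tables, key):
    """Phase 1: keep only rows whose join key occurs in every table."""
    common = set.intersection(*(set(row[key] for row in t) for t in tables))
    return [[row for row in t if row[key] in common] for t in tables]


def multiway_join(tables, key):
    """Phase 2: join the reduced tables on the shared key."""
    reduced = semijoin_reduce(tables, key)
    groups = []
    for t in reduced:
        g = defaultdict(list)
        for row in t:
            g[row[key]].append(row)
        groups.append(g)
    out = []
    for k in groups[0]:
        # Every key left after phase 1 is present in all tables.
        for combo in product(*(g[k] for g in groups)):
            merged = {}
            for row in combo:
                merged.update(row)
            out.append(merged)
    return out
```

In a distributed setting the benefit of the first phase is that rows removed there never have to be shuffled or replicated in the second phase.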

Segment Join Technique for Processing XML Queries Fast (빠른 XML질의 처리를 위한 세그먼트 조인 기법)

;Moon Bongki;Lee Sukho
- Journal of KIISE: Databases / v.32 no.3 / pp.334-343 / 2005
Complex queries such as path and twig patterns have been the focus of much research on processing XML data. Structural join algorithms use encoded structural information about the elements of an XML document to facilitate join processing. Recently, structural join algorithms such as TwigStack and TSGeneric+ have been developed to process such complex queries, and their processing costs have been shown to be linearly proportional to the total size of the input data. However, these algorithms have the shortcoming that their processing costs increase with the length of the query. To overcome this shortcoming, we propose the segment join technique, which augments the structural join with structural indexes such as the 1-Index. The SegmentTwig algorithm, which is based on the segment join technique, performs joins between pairs of segments, each of which is a series of query nodes, rather than between pairs of query nodes. Consequently, a query can be processed by reading only one query node per segment. Our experimental study shows that segment join algorithms outperform the structural join methods consistently and considerably on various data sets.
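As background for the structural joins mentioned above, the sketch below shows one common form of encoded structural information, a (start, end) region per element, and a stack-based ancestor-descendant join over it. It is a generic illustration under that assumption and is not TwigStack, TSGeneric+, or SegmentTwig. The function names `encode` and `structural_join` are hypothetical.

```python
import xml.etree.ElementTree as ET


def encode(root):
    """Assign each element a (tag, start, end) region by a depth-first numbering."""
    regions = []
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        start = counter
        for child in node:
            visit(child)
        counter += 1
        regions.append((node.tag, start, counter))

    visit(root)
    return regions


def structural_join(regions, anc_tag, desc_tag):
    """Return (ancestor_start, descendant_start) pairs for the pattern anc//desc."""
    ancs = sorted((r for r in regions if r[0] == anc_tag), key=lambda r: r[1])
    descs = sorted((r for r in regions if r[0] == desc_tag), key=lambda r: r[1])
    out, stack, i = [], [], 0
    for d in descs:
        # Push every ancestor candidate that starts before d.
        while i < len(ancs) and ancs[i][1] < d[1]:
            while stack and stack[-1][2] < ancs[i][1]:
                stack.pop()
            stack.append(ancs[i])
            i += 1
        # Drop candidates that closed before d started.
        while stack and stack[-1][2] < d[1]:
            stack.pop()
        # Everything left on the stack is a nested chain that contains d.
        out.extend((a[1], d[1]) for a in stack)
    return out
```

An element `a` is an ancestor of `d` exactly when `a.start < d.start` and `d.end < a.end`, so the join needs only the two sorted lists of regions and never touches the document tree again.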

Moving Objects Join Algorithms using TB-Tree (TB-Tree 를 이용한 이동객체 조인 알고리즘)

Lee, Jai-Ho;Lee, Seong-Ho
- Proceedings of the Korea Information Processing Society Conference / 2005.05a / pp.125-128 / 2005
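In moving object database systems, the spatio-temporal join is an important operation for combining moving objects, and its execution time grows exponentially as the number of moving objects increases. An efficient spatio-temporal join is therefore essential. This paper applies techniques used in conventional spatial joins to the join of moving objects. We present depth-first and breadth-first join algorithms based on the TB-Tree, a spatio-temporal index that preserves the trajectory information of moving objects well, and report experimental results comparing the performance of the implemented algorithms.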

A Spatial Hash Strip Join Algorithm for Effective Handling of Skewed Data (편중 데이타의 효율적인 처리를 위한 공간 해쉬 스트립 조인 알고리즘)

Shim Young-Bok;Lee Jong-Yun
- Journal of KIISE: Databases / v.32 no.5 / pp.536-546 / 2005
In this paper, we focus on the filtering step that selects candidate objects for a spatial join when neither input table is indexed. Over the last decade, several spatial join algorithms for input tables with an index have been studied extensively. Those algorithms show excellent performance on most spatial data, but little research has addressed the performance degradation that occurs in the presence of skewed data. We therefore propose a spatial hash strip join (SHSJ) algorithm that mitigates the skewed-data problem of the conventional spatial hash join (SHJ) algorithm. The basic idea is similar to that of the conventional SHJ algorithm. The differences are that bucket capacities are not limited when data are allocated to buckets and that the SSSJ algorithm is applied to the bucket join operations. Finally, in experiments on the TIGER/Line data set, the spatial hash strip join performed better than the existing SHJ and SSSJ algorithms.

Uniform Load Distribution Using Sampling-Based Cost Estimation in Parallel Join (병렬 조인에서 샘플링 기반 비용 예측 기법을 이용한 균등 부하 분산)

Park, Ung-Gyu
- The Transactions of the Korea Information Processing Society / v.6 no.6 / pp.1468-1480 / 1999
In database systems, join operations are the most complex and time-consuming operations, and they limit the performance of such systems. Many parallel join algorithms have been proposed for these systems, but they do not consider data skew such as attribute value skew (AVS) and join product skew (JPS). In skewed environments, their performance degrades. We propose a framework for uniform load distribution and an efficient parallel join algorithm that uses the framework to handle AVS and JPS. In our algorithm, we estimate the data distributions of the input and output relations of join operations by sampling and evaluate the join cost for the estimated distributions. Finally, using the histogram equalization method, we distribute data among the nodes to achieve good load balancing in the local joining phase. For performance comparison, we present a simulation model of our algorithm and other join algorithms, together with the results of several simulation experiments. The results indicate that our algorithm outperforms the other algorithms in the skewed case.
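The abstract does not specify its cost model or equalization procedure, so the sketch below only illustrates the general approach it describes. It samples the join keys of both relations, estimates a per-key cost from the sampled frequencies, and chooses range boundaries so that each node receives roughly the same estimated cost. The cost formula (tuples read plus estimated output size), the range partitioning, and the names `split_points` and `node_for` are assumptions.

```python
import random
from bisect import bisect_left
from collections import Counter


def split_points(r_keys, s_keys, nodes, sample_size=1000, seed=0):
    """Choose key-range boundaries that equalize estimated join cost per node."""
    rng = random.Random(seed)
    r_hist = Counter(rng.sample(r_keys, min(sample_size, len(r_keys))))
    s_hist = Counter(rng.sample(s_keys, min(sample_size, len(s_keys))))
    # Assumed cost of key k: tuples read from both inputs plus join output size.
    cost = {
        k: r_hist[k] + s_hist[k] + r_hist[k] * s_hist[k]
        for k in set(r_hist) | set(s_hist)
    }
    target = sum(cost.values()) / nodes
    bounds, acc = [], 0
    for k in sorted(cost):
        acc += cost[k]
        # Close the current range once its cumulative cost reaches its share.
        if len(bounds) < nodes - 1 and acc >= target * (len(bounds) + 1):
            bounds.append(k)
    return bounds


def node_for(key, bounds):
    """Node index for a key: node i receives keys in (bounds[i-1], bounds[i]]."""
    return bisect_left(bounds, key)
```

The product term in the cost is what captures join product skew: a key that is frequent in both inputs is charged for its large output, not only for its input tuples. A single very heavy key still cannot be split by range boundaries alone, so this sketch balances load only up to the cost of the heaviest key.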

