Parallel Distributed Implementation of GHT on Ethernet Multicluster

  • Kim, Yeong-Soo (Div. of Computer Information, Yeungnam College of Science & Technology) ;
  • Kim, Myung-Ho (Div. of Computer Information, Yeungnam College of Science & Technology) ;
  • Choi, Heung-Moon (School of Electrical Engineering & Computer Science, Kyungpook National University)
  • Published : 2009.05.25

Abstract

Extending the scale of distributed processing in a single Ethernet cluster is physically limited by the maximum number of ports per switch. This paper presents an MPI-based multicluster built from multiple Ethernet switches to extend the scale of distributed processing of the generalized Hough transform (GHT), together with an asymptotic analysis of the communication overhead based on an execution-time analysis model. To determine the optimal task partitioning, we analyzed the processing time of various partitioning schemes and chose the accumulator array partitioning (AAP) scheme to minimize the overall communication overhead. The range of data partitioned under AAP was enlarged to match the increased number of nodes, and a suitable load-balancing algorithm was implemented. We reduced the communication overhead by using pipelined broadcast, flat-tree-based result gathering, and overlapping of communication with computation. In particular, a linear pipelined broadcast was used to reduce the intercluster communication overhead, because the clusters are interconnected by a single link. Experimental results show nearly linear speedup for the proposed parallel distributed GHT implemented on an MPI-based Ethernet multicluster with four 100 Mbps Ethernet switches and up to 128 Pentium PC nodes.
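The AAP idea can be illustrated with a minimal single-process sketch. It assumes a GHT with fixed scale and rotation (an R-table indexed by quantized gradient direction), splits the accumulator into contiguous row bands as one possible partitioning, and sizes the bands in proportion to hypothetical per-node weights as a stand-in for load balancing; the paper's actual partitioning and load-balancing algorithm may differ. Each simulated node sees all edge points (the role of the broadcast), votes only into its own band, and reports just its local peak, so the final `max()` plays the role of the gather at the root. All function names here are illustrative, not taken from the paper.

```python
import math
from collections import defaultdict

def quantize(phi, n_bins):
    """Map a gradient direction (radians) to one of n_bins R-table bins."""
    return int((phi % (2 * math.pi)) / (2 * math.pi) * n_bins) % n_bins

def build_r_table(template_edges, ref_point, n_bins=8):
    """R-table: direction bin -> list of offsets (x - xr, y - yr) to the reference point."""
    xr, yr = ref_point
    table = defaultdict(list)
    for x, y, phi in template_edges:
        table[quantize(phi, n_bins)].append((x - xr, y - yr))
    return table

def partition_rows(height, weights):
    """Hypothetical load balancing: contiguous row bands proportional to node weights."""
    total = sum(weights)
    bounds, acc = [0], 0
    for w in weights:
        acc += w
        bounds.append(round(height * acc / total))
    return list(zip(bounds[:-1], bounds[1:]))

def vote_band(edges, r_table, band, width, n_bins=8):
    """One node's work: vote only inside its band, return (votes, (x, y)) of its peak."""
    lo, hi = band
    acc = [[0] * width for _ in range(hi - lo)]
    for x, y, phi in edges:  # every node receives all edge points
        for dx, dy in r_table.get(quantize(phi, n_bins), ()):
            cx, cy = x - dx, y - dy  # candidate reference point
            if lo <= cy < hi and 0 <= cx < width:
                acc[cy - lo][cx] += 1
    best = (0, None)
    for i, row in enumerate(acc):
        for j, v in enumerate(row):
            if v > best[0]:
                best = (v, (j, lo + i))
    return best

def parallel_ght(edges, r_table, height, width, weights, n_bins=8):
    """Simulate the nodes sequentially; max() stands in for gathering local peaks."""
    bands = partition_rows(height, weights)
    local_peaks = [vote_band(edges, r_table, b, width, n_bins) for b in bands]
    return max(local_peaks, key=lambda t: t[0])

# Usage: an 8-point template, then the same shape shifted by (20, 30) in the image.
offsets = [(9, 5), (8, 8), (5, 9), (2, 8), (1, 5), (2, 2), (5, 1), (8, 2)]
template = [(x, y, k * math.pi / 4 + 0.1) for k, (x, y) in enumerate(offsets)]
r_table = build_r_table(template, ref_point=(5, 5))
image = [(x + 20, y + 30, phi) for x, y, phi in template]
print(parallel_ght(image, r_table, height=64, width=64, weights=[2, 1, 1]))
# → (8, (25, 35))
```

In this sketch each node returns a single (votes, position) pair instead of its whole band, so the result messages stay small however large the accumulator is.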

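In an Ethernet cluster, extending the scale of distributed processing is physically constrained by the maximum number of ports per switch (currently 48). In this study, to extend the scale of distributed processing of the generalized Hough transform (GHT) on an MPI-based Ethernet cluster, we built a multicluster from multiple Ethernet switches, analyzed the communication overhead that this extension introduces using a parallel distributed time-analysis model and a communication performance model, and then implemented a high-speed version. We evaluated the task-partitioning policies available in the multicluster distributed-processing environment and applied a modified Hough-space accumulator array partitioning (AAP) policy to minimize the number of inter-node communications and the communication time. As the number of nodes increases, the range of data partitioned under AAP is enlarged, and a matching load-balancing algorithm was also implemented. To minimize the intercluster communication delay caused by the single-link bottleneck, we used a linear pipelined broadcast to distribute the work and a linear flat tree to gather the small result messages, thereby overlapping computation and communication in time as much as possible. We analyzed the performance of the proposed parallel distributed GHT on the Ethernet multicluster asymptotically and experimentally: a 128-node MPI-based multicluster built with four Fast Ethernet switches achieved a nearly linear speedup.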
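Why a linear pipeline suits the single intercluster link can be sketched with the standard Hockney (alpha-beta) cost model, used here as a stand-in; it is not necessarily the communication performance model used in the paper. Sending m bytes over one hop costs alpha + beta*m. The flat tree has the root send the whole message P - 1 times, while the linear pipeline cuts it into k segments and forwards them along a chain of P nodes, finishing after P + k - 2 segment steps. Minimizing (P + k - 2)(alpha + beta*m/k) over k gives k* = sqrt((P - 2)*beta*m/alpha). The parameter values below (message size, latency) are hypothetical, and the link-crossing count assumes the chain visits each cluster's nodes contiguously, starting at the root.

```python
import math

def flat_tree_bcast(p, m, alpha, beta):
    """Root sends the full m-byte message to each of the other p - 1 nodes in turn."""
    return (p - 1) * (alpha + beta * m)

def linear_pipeline_bcast(p, m, k, alpha, beta):
    """m bytes cut into k segments and forwarded along a chain of p nodes.
    The last segment reaches the tail after (p - 1) + (k - 1) segment steps."""
    return (p - 2 + k) * (alpha + beta * m / k)

def best_segment_count(p, m, alpha, beta):
    """Integer close to the real minimizer sqrt((p - 2) * beta * m / alpha), at least 1."""
    return max(1, round(math.sqrt((p - 2) * beta * m / alpha)))

def copies_leaving_root_cluster(p, clusters, scheme):
    """Full-message copies crossing the root cluster's single uplink (equal-size clusters)."""
    if scheme == "flat":
        return p - p // clusters  # one copy per node outside the root's cluster
    if scheme == "chain":
        return 1  # a contiguous chain leaves the root's cluster exactly once
    raise ValueError(scheme)

# Hypothetical parameters: 128 nodes in 4 clusters, 1 MB of data,
# 50 us per-message latency, 100 Mbit/s links (8e-8 s per byte).
P, C, M, ALPHA, BETA = 128, 4, 1_000_000, 50e-6, 8e-8
k = best_segment_count(P, M, ALPHA, BETA)
print("segments:", k)
print("flat tree:", flat_tree_bcast(P, M, ALPHA, BETA))
print("pipeline:", linear_pipeline_bcast(P, M, k, ALPHA, BETA))
print("uplink copies:", copies_leaving_root_cluster(P, C, "flat"),
      copies_leaving_root_cluster(P, C, "chain"))
```

With k = 1 the pipeline cost equals the flat-tree cost, so in this model the gain comes entirely from segmentation. The chain also sends only one copy of the data over the root cluster's uplink, versus one copy per remote node for the flat tree.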
