Design of MAHA Supercomputing System for Human Genome Analysis

MAHA: A High-Performance Computing System for Large-Scale Genome Analysis

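  • 김영우 (Electronics and Telecommunications Research Institute, ETRI) ;
  • 김홍연 (ETRI) ;
  • 배승조 (ETRI) ;
  • 김학영 (ETRI, Server Platform Research Team) ;
  • 우영춘 (ETRI, High Performance Computing System Research Team) ;
  • 박수준 (ETRI, Bio-Medical IT Convergence Research Department) ;
  • 최완 (ETRI, Cloud Computing Research Department)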
  • Received : 2013.01.08
  • Accepted : 2013.01.11
  • Published : 2013.02.28

Abstract

Over the past decade, the computing field has seen many changes and continuing efforts to develop new technologies. The "brick wall" in computing, and the power wall in particular, has changed the computing paradigm, from computing hardware (including processor and system architecture) to programming environments and application usage. High-performance computing (HPC) in particular has undergone drastic changes and is now regarded as a key to national competitiveness. In the late 2000s, many leading countries rushed to develop exascale supercomputing systems, and as a result systems of tens of PetaFLOPS are now prevalent. Korea has a well-developed ICT sector and is considered one of the world's leading countries in ICT, but not in supercomputing. In this paper, we describe the architecture design of the MAHA supercomputing system, which targets a 300 TeraFLOPS system for bioinformatics applications such as human genome analysis and protein-protein docking. The MAHA supercomputing system consists of four major parts: computing hardware, file system, system software, and bio-applications. It is designed to use heterogeneous computing accelerators (co-processors such as GPGPUs and MICs) to improve performance/$, performance/area, and performance/power. To provide high-speed data movement and large capacity, the MAHA file system has an asymmetric cluster architecture and consists of a metadata server, a data server, and a client file system on top of SSD and MAID storage servers. The MAHA system software is designed to be user-friendly and easy to use, built on integrated system management components such as Bio Workflow management, Integrated Cluster management, and Heterogeneous Resource management. The MAHA supercomputing system was first installed in December 2011, with a theoretical performance of 50 TeraFLOPS and a measured performance of 30.3 TeraFLOPS on 32 computing nodes. The MAHA system will be upgraded to 100 TeraFLOPS in January 2013.

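Over the past decade or so, the computing field has advanced remarkably through a wide range of research and change. Advances in semiconductor technology are driving paradigm shifts in processor and system architecture and in programming environments. High-performance computing (HPC) in particular is a field in which leading-edge technologies are concentrated, and it is regarded as a measure of national competitiveness. Since the late 2000s, leading countries have been accelerating the development of exascale supercomputing technology, whereas Korea, having concentrated on ICT, urgently needs to secure the relevant core technologies. This paper presents the architecture of the MAHA supercomputing system, a high-performance computing system intended to secure supercomputing technology and to support large-scale genome analysis and protein structure analysis, and describes its design and implementation. The MAHA supercomputing system consists of computing hardware, a file system, system software, and bio-applications, and its high-performance computing architecture is based on heterogeneous many-core processing units in order to improve performance/$, performance/area, and performance/power. For fast processing of large-scale data, we designed a high-performance, low-power file system based on SSD and MAID systems, together with system software that improves bio-application performance through user convenience and effective use of heterogeneous many-core resources. In December 2011, the MAHA supercomputing system was designed and built on 32 computing nodes, with a theoretical performance of 50 TeraFLOPS and a measured performance of 30.3 TeraFLOPS (system efficiency 56.2%), and it is scheduled to be expanded to the 100 TeraFLOPS scale in 2013.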
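
The performance figures quoted above can be cross-checked with simple arithmetic. The sketch below assumes that system efficiency means measured performance divided by theoretical peak, which is the usual convention but is not spelled out in the abstract; the function and variable names are illustrative only. With the rounded 50 TeraFLOPS peak the ratio comes to about 60.6%, so the reported 56.2% presumably reflects an unrounded peak near 53.9 TeraFLOPS. That reading is an inference, not something the source states.

```python
# Figures quoted in the abstract (TeraFLOPS). Names are illustrative assumptions.
THEORETICAL_TFLOPS = 50.0     # stated theoretical performance
MEASURED_TFLOPS = 30.3        # stated measured performance
REPORTED_EFFICIENCY = 0.562   # stated system efficiency (56.2%)
NODES = 32                    # stated number of computing nodes


def efficiency(measured: float, peak: float) -> float:
    """Assumed definition: measured performance divided by theoretical peak."""
    return measured / peak


def implied_peak(measured: float, eff: float) -> float:
    """Theoretical peak that would reproduce a given efficiency."""
    return measured / eff


def per_node(total: float, nodes: int) -> float:
    """Average performance per computing node."""
    return total / nodes


if __name__ == "__main__":
    eff = efficiency(MEASURED_TFLOPS, THEORETICAL_TFLOPS)
    peak = implied_peak(MEASURED_TFLOPS, REPORTED_EFFICIENCY)
    print(f"efficiency from quoted figures: {eff:.1%}")
    print(f"peak implied by 56.2% efficiency: {peak:.1f} TFLOPS")
    print(f"measured per node: {per_node(MEASURED_TFLOPS, NODES):.2f} TFLOPS")
```

The gap between the two efficiency values is small enough to be explained by rounding of the peak figure, but the full paper would be needed to confirm the exact peak used.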

Cited by

  1. Design and Implementation of an Alternate System Interconnect based on PCI Express vol.52, pp.8, 2015, https://doi.org/10.5573/ieie.2015.52.8.074