Introduction to numba library in Python for efficient statistical computing

효율적인 통계 계산을 위한 파이썬 numba 라이브러리의 소개

  • Cho, Younsang (Department of Statistics, Inha University)
  • Yu, Donghyeon (Department of Statistics, Inha University)
  • Son, Won (Department of Information Statistics, Dankook University)
  • Park, Seoncheol (Pacific Climate Impacts Consortium, University of Victoria)
  • Received: 2020.09.22
  • Accepted: 2020.10.16
  • Published: 2020.12.31

Abstract

This paper introduces the numba library in Python, which improves the computational efficiency of code written in pure Python by applying just-in-time (JIT) compilation. To apply JIT compilation, numba only requires adding a decorator to the target Python function. We provide implementation examples with numba for the permutation test and for parameter estimation of the Gaussian mixture distribution. We also demonstrate the efficiency of numba numerically by comparing, for each application, the total computation time of the pure Python implementation with that of the numba implementation.

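This paper introduces the usage and applications of the numba library, which can improve overall computation speed by applying just-in-time (JIT) compilation to operations written purely in Python. As examples of applying numba to practical statistical computing problems, we consider two problems that require loops: the permutation test and the EM algorithm for parameter estimation of the Gaussian mixture distribution. We demonstrate the efficiency of numba numerically by comparing the computation time of implementations using pure Python syntax and loops with that of implementations using numba.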
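
To make the decorator-based usage concrete, here is a minimal sketch of a two-sample permutation test compiled with numba. It is not the authors' implementation: the function name `perm_test_mean_diff`, its parameters, and the choice of the absolute difference in means as the test statistic are assumptions made for illustration. The `try`/`except` fallback is also an assumption, added only so the sketch still runs (without speedup) when numba is not installed.

```python
import numpy as np

try:
    from numba import njit  # nopython-mode JIT decorator
except ImportError:
    # Fallback (illustrative assumption): run as plain Python if numba is absent.
    def njit(func):
        return func


@njit
def perm_test_mean_diff(x, y, n_perm, seed):
    """Two-sided permutation p-value for a difference in means (hypothetical helper)."""
    np.random.seed(seed)                 # seed the generator used inside this function
    n_x = x.shape[0]
    pooled = np.concatenate((x, y))      # pool both samples under the null hypothesis
    observed = np.abs(x.mean() - y.mean())
    count = 0
    for _ in range(n_perm):              # explicit loop: the part JIT compilation speeds up
        np.random.shuffle(pooled)        # random relabeling of the pooled sample
        stat = np.abs(pooled[:n_x].mean() - pooled[n_x:].mean())
        if stat >= observed:
            count += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (count + 1) / (n_perm + 1)


x = np.arange(10.0)
y = np.arange(10.0) + 100.0
p_separated = perm_test_mean_diff(x, y, 999, 0)
p_identical = perm_test_mean_diff(x, x.copy(), 999, 0)
```

The only numba-specific line is the `@njit` decorator, and the loop body is ordinary Python and NumPy. The first call includes compilation time, so timing comparisons are usually made on a second call.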
