One-step spectral clustering of weighted variables on single-cell RNA-sequencing data

  • 박민영 (Department of Statistics, Sungkyunkwan University) ;
  • 박세영 (Department of Statistics, Sungkyunkwan University)
  • Received : 2020.05.20
  • Reviewed : 2020.07.14
  • Published : 2020.08.31

Abstract

Single-cell RNA-sequencing (scRNA-seq) data record the gene expression of individual cells extracted from a tissue, and a main purpose of such data is to identify heterogeneity among cells. However, because of sampling and technical limitations, scRNA-seq data have a high proportion of missing values and a high level of noise, which limits the applicability of traditional clustering methods. Motivated by the analysis of scRNA-seq data, this paper proposes a spectral clustering method that assigns a different weight to each gene when computing the similarity matrix between cells, which distinguishes it from existing methods for single-cell data analysis. The proposed framework assigns gene weights and clusters cells simultaneously. The resulting non-convex optimization problem is solved with an iterative algorithm. A real-data application and a simulation study both suggest that the proposed method identifies the underlying clusters better than existing clustering methods.
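To make the idea of gene-weighted similarity concrete, the following is a minimal sketch and not the authors' formulation. It assumes a Gaussian kernel on a weighted squared Euclidean distance; the function name `weighted_similarity`, the bandwidth `sigma`, the toy data, and the hand-picked weights are all hypothetical. In the proposed method the weights are learned jointly with the clustering, whereas here they are fixed only to show their effect on the similarity matrix.

```python
import math


def weighted_similarity(cells, weights, sigma=1.0):
    """Gaussian similarity between cells using per-gene weights.

    cells:   list of expression vectors, one per cell (equal length).
    weights: non-negative per-gene weights (assumed fixed in this sketch).
    sigma:   kernel bandwidth (an illustrative choice, not from the paper).
    """
    n = len(cells)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Weighted squared Euclidean distance: genes with larger
            # weights contribute more to the distance between two cells.
            d2 = sum(
                w * (a - b) ** 2
                for w, a, b in zip(weights, cells[i], cells[j])
            )
            sim[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return sim


# Toy data: gene 0 separates cells {0, 1} from cells {2, 3}; gene 1 is noise.
cells = [[0.0, 5.0], [0.1, -4.0], [3.0, 4.5], [3.1, -5.0]]

uniform = weighted_similarity(cells, [0.5, 0.5])   # equal gene weights
informed = weighted_similarity(cells, [1.0, 0.0])  # noisy gene down-weighted
```

With equal weights, the noisy gene dominates the distance, so cell 0 appears closer to cell 2 than to cell 1. Once the noisy gene is down-weighted, cells 0 and 1 become the most similar pair. A similarity matrix of this kind would then be passed to a standard spectral clustering step.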
