Variational Autoencoder Based Dimension Reduction and Clustering for Single-Cell RNA-seq Gene Expression

  • Chi, Sang-Mun (Department of Computer Science, Kyungsung University)
  • Received : 2021.08.22
  • Accepted : 2021.09.15
  • Published : 2021.11.30

Abstract

Because single-cell RNA sequencing provides the expression profile of each individual cell, it resolves differences between cells more finely than traditional bulk RNA sequencing. Clustering analysis is generally applied to single-cell RNA sequencing data to identify cell types and to understand high-level biological processes. To process high-dimensional single-cell RNA sequencing data effectively for clustering analysis, this paper uses a variational autoencoder to transform the high-dimensional data space into a lower-dimensional latent space, with the expectation that this latent space yields more accurate clustering results. We cluster the features in the transformed latent space and compare the performance of various classical clustering methods on single-cell RNA sequencing data. Experimental results demonstrate that the proposed framework outperforms many state-of-the-art methods under various clustering performance metrics.

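Single-cell RNA-seq provides the gene expression of individual cells and therefore gives differential, high-resolution information for each cell. Clustering is performed on single-cell RNA-seq data to understand cell types and high-level biological processes. To process single-cell RNA-seq data effectively, given that they are very high-dimensional and large in volume, this paper uses a variational autoencoder to transform the high-dimensional data space into a low-dimensional latent space, producing a feature space in which more accurate clustering can be performed. The approach of applying various clustering methods to the dimension-reduced latent space was compared in performance with various traditional single-cell RNA-seq clustering methods. In the clustering experiments, the proposed method improved on existing methods under various clustering performance criteria.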
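
The two-stage pipeline described in the abstract (encode the expression matrix into a low-dimensional latent space with a variational autoencoder, then run a classical clustering method on the latent features) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names `train_vae` and `kmeans`, the single tanh hidden layer, the squared-error reconstruction term, the Adam optimizer settings, the choice of k-means as the clustering method, and the synthetic toy matrix that stands in for a preprocessed cells-by-genes expression matrix.

```python
import numpy as np

def train_vae(X, latent_dim=2, hidden=16, epochs=300, lr=1e-2, seed=0):
    """Train a tiny Gaussian VAE (full batch, Adam); return an encoder and the loss history."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    init = lambda a, b: rng.normal(0.0, 1.0 / np.sqrt(a), size=(a, b))
    p = {"W1": init(d, hidden), "b1": np.zeros(hidden),
         "Wm": init(hidden, latent_dim), "bm": np.zeros(latent_dim),
         "Wv": init(hidden, latent_dim), "bv": np.zeros(latent_dim),
         "W2": init(latent_dim, hidden), "b2": np.zeros(hidden),
         "W3": init(hidden, d), "b3": np.zeros(d)}
    m = {k: np.zeros_like(v) for k, v in p.items()}   # Adam first moments
    v = {k: np.zeros_like(w) for k, w in p.items()}   # Adam second moments
    beta1, beta2, losses = 0.9, 0.999, []
    for t in range(1, epochs + 1):
        # Forward pass: encoder -> reparameterized sample -> decoder.
        h = np.tanh(X @ p["W1"] + p["b1"])
        mu = h @ p["Wm"] + p["bm"]
        logvar = h @ p["Wv"] + p["bv"]
        eps = rng.standard_normal(mu.shape)
        std = np.exp(0.5 * logvar)
        z = mu + std * eps
        hd = np.tanh(z @ p["W2"] + p["b2"])
        xhat = hd @ p["W3"] + p["b3"]
        # Negative ELBO per cell: squared-error reconstruction + KL(q(z|x) || N(0, I)).
        recon = 0.5 * np.sum((xhat - X) ** 2) / n
        kl = -0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar)) / n
        losses.append(recon + kl)
        # Backward pass: manual gradients of the loss above.
        g = {}
        dx = (xhat - X) / n
        g["W3"], g["b3"] = hd.T @ dx, dx.sum(0)
        da2 = (dx @ p["W3"].T) * (1.0 - hd ** 2)
        g["W2"], g["b2"] = z.T @ da2, da2.sum(0)
        dz = da2 @ p["W2"].T
        dmu = dz + mu / n
        dlv = dz * eps * 0.5 * std + 0.5 * (np.exp(logvar) - 1.0) / n
        g["Wm"], g["bm"] = h.T @ dmu, dmu.sum(0)
        g["Wv"], g["bv"] = h.T @ dlv, dlv.sum(0)
        da1 = (dmu @ p["Wm"].T + dlv @ p["Wv"].T) * (1.0 - h ** 2)
        g["W1"], g["b1"] = X.T @ da1, da1.sum(0)
        # Adam update with bias correction.
        for k in p:
            m[k] = beta1 * m[k] + (1.0 - beta1) * g[k]
            v[k] = beta2 * v[k] + (1.0 - beta2) * g[k] ** 2
            m_hat = m[k] / (1.0 - beta1 ** t)
            v_hat = v[k] / (1.0 - beta2 ** t)
            p[k] -= lr * m_hat / (np.sqrt(v_hat) + 1e-8)
    # The latent feature of a cell is the posterior mean mu(x).
    encode = lambda A: np.tanh(A @ p["W1"] + p["b1"]) @ p["Wm"] + p["bm"]
    return encode, losses

def kmeans(Z, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns the labels from the last assignment step."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(iters):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dist.argmin(1)
        # Keep the previous center if a cluster becomes empty.
        centers = np.array([Z[labels == j].mean(0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return labels

# Toy stand-in for a preprocessed expression matrix: 150 cells x 20 genes, 3 groups.
rng = np.random.default_rng(1)
true_groups = np.repeat(np.arange(3), 50)
group_means = rng.normal(0.0, 3.0, size=(3, 20))
X = group_means[true_groups] + rng.normal(size=(150, 20))
X = (X - X.mean(0)) / X.std(0)          # standardize each gene

encode, losses = train_vae(X, latent_dim=2)
Z = encode(X)                            # latent features, one row per cell
labels = kmeans(Z, k=3)                  # cluster in the latent space
print(Z.shape, np.bincount(labels, minlength=3))
```

Using the posterior mean as the latent feature, instead of a random sample, keeps the embedding deterministic for a fixed model, so repeated clustering runs see the same inputs. Any other clustering method that accepts a feature matrix could replace `kmeans` in the last step.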

Acknowledgement

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (2021R1I1A304651111).
