Conditional Variational Autoencoder-based Generative Model for Gene Expression Data Augmentation

  • Hyunsoo Bong (봉현수; Department of Data Technology, Myongji University)
  • Minsik Oh (오민식; Department of Data Technology, Myongji University)
  • Received : 2023.03.21
  • Accepted : 2023.04.26
  • Published : 2023.05.30

Abstract

Gene expression data can support research aimed at understanding disease and realizing precision medicine, such as predicting disease prognosis and drug response. However, collecting a sufficient amount of data is costly. In this paper, we propose a gene expression data generation model based on the Conditional Variational Autoencoder (Conditional VAE). We compare it with a previous gene expression generation model based on the Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP) and with the tabular data generation models CTGAN and TVAE, and show that the proposed model generates synthetic data that are more meaningful both biologically and statistically.
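
The abstract does not describe the model's architecture or loss, so the following is only a generic sketch of the three pieces that any conditional VAE relies on: the closed-form KL term against a standard normal prior, the reparameterized latent sample, and the concatenation of the latent code with a condition vector before decoding. The function names, the two-dimensional latent space, and the three-class one-hot condition (standing in for a hypothetical label such as tissue type) are illustrative assumptions, not details from the paper.

```python
import math
import random


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var)
    )


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), one value per dimension."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mu, log_var)
    ]


def one_hot(index, num_classes):
    """Encode a class index as a one-hot condition vector (hypothetical label)."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


def decoder_input(z, condition):
    """A conditional decoder receives the latent code joined with the condition."""
    return list(z) + list(condition)


# Illustrative values only: 2 latent dimensions, 3 condition classes.
rng = random.Random(0)
mu, log_var = [0.0, 0.0], [0.0, 0.0]
z = reparameterize(mu, log_var, rng)
cond = one_hot(1, 3)
x_in = decoder_input(z, cond)
```

To generate a synthetic sample for a chosen condition, one would draw `z` from the prior (here `mu = 0`, `log_var = 0`) and pass `decoder_input(z, cond)` to a trained decoder network, which this sketch deliberately leaves out.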

Acknowledgement

This work was supported by the Technology Innovation Program (2022353, IoMT artificial intelligence and NFT interface standard development for metaverse) funded by the Ministry of Trade, Industry & Energy (MOTIE, Korea).
