Knowledge-guided artificial intelligence technologies for decoding complex multiomics interactions in cells

  • Lee, Dohoon (Bioinformatics Institute, Seoul National University)
  • Kim, Sun (Interdisciplinary Program in Bioinformatics, Seoul National University)
  • Received: 2021.09.15
  • Reviewed: 2021.10.21
  • Published: 2022.05.15

Abstract

Cells survive and proliferate through complex interactions among diverse molecules across multiomics layers. Conventional experimental approaches for identifying these interactions have built a firm foundation for molecular biology, but they scale poorly relative to the rapid accumulation of multiomics data measured by high-throughput technologies. The need for data-driven computational modeling of interactions within cells has therefore been highlighted in recent years. The complexity of multiomics interactions stems primarily from their nonlinearity: modeling them accurately requires capturing intricate conditional dependencies, synergies, or antagonisms among the genes or proteins under consideration, which also makes experimental validation difficult. Artificial intelligence (AI) technologies, including deep learning models, are well suited to handling complex nonlinear relationships between features, and they scale to large amounts of data. They thus have great potential for modeling multiomics interactions. Although many AI-driven models exist for computational biology applications, relatively few explicitly incorporate prior knowledge into their model architectures or training procedures. Guiding models with domain knowledge can greatly reduce the amount of data needed for training and constrain their vast expressive power to the biologically relevant space. It can thereby enhance a model's interpretability, reduce spurious interactions, and help establish its validity and utility. To facilitate further development of knowledge-guided AI technologies for modeling multiomics interactions, we review representative bioinformatics applications of deep learning models for multiomics interactions developed to date, categorizing them by mode of guidance.
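As an illustration of the two guidance modes named in the abstract (architecture and training procedure), the following is a minimal NumPy sketch, not a method taken from any specific reviewed model. The gene names, pathway annotations, and function names are hypothetical. The first function restricts a dense layer to annotated gene-pathway connections; the second computes a graph Laplacian penalty that could be added to a training loss so that interacting molecules receive similar embeddings.

```python
import numpy as np

# Hypothetical prior knowledge: which genes are annotated to which pathways.
genes = ["g1", "g2", "g3", "g4"]
pathways = {"pathway_A": ["g1", "g2"], "pathway_B": ["g3", "g4"]}

# Binary mask of shape (n_genes, n_pathways); 1 where a gene belongs to a pathway.
mask = np.array(
    [[1.0 if g in members else 0.0 for members in pathways.values()] for g in genes]
)

rng = np.random.default_rng(0)
weights = rng.normal(size=mask.shape)


def knowledge_guided_layer(x, weights, mask):
    """Architecture guidance: a dense layer whose connections are
    limited to annotated gene-pathway pairs by elementwise masking."""
    return np.tanh(x @ (weights * mask))


def laplacian_penalty(embedding, adjacency):
    """Training guidance: trace(E^T L E), which equals the sum over edges
    of squared differences between the embeddings of connected nodes."""
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    return float(np.trace(embedding.T @ laplacian @ embedding))


# One sample with four gene-level features gives two pathway-level activations.
x = np.array([[0.5, -1.0, 2.0, 0.3]])
pathway_activity = knowledge_guided_layer(x, weights, mask)
```

Because of the mask, changing the value of `g3` can only affect the `pathway_B` activation, which is what makes each hidden unit interpretable as a pathway in this sketch.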

Funding Information

This research was supported by the Collaborative Genome Program for Fostering New Post-Genome Industry of the National Research Foundation (NRF) funded by the Ministry of Science and ICT (MSIT) (NRF-2014M3C9A3063541); a grant from the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI) funded by the Ministry of Health & Welfare, Republic of Korea (HI15C3224) and the Bio & Medical Technology Development Program of the NRF (NRF-2019M3E5D4065965); and a grant from the Institute of Information & Communications Technology Planning & Evaluation (IITP) funded by the Korean government (MSIT) (No. 2021-0-01343, Artificial Intelligence Graduate School Program, Seoul National University).
