Acknowledgement
This research was supported by the Collaborative Genome Program for Fostering New Post-Genome Industry of the National Research Foundation (NRF) funded by the Ministry of Science and ICT (MSIT) (NRF-2014M3C9A3063541); a grant from the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI) funded by the Ministry of Health & Welfare, Republic of Korea (HI15C3224) and the Bio & Medical Technology Development Program of the NRF (NRF-2019M3E5D4065965); and a grant from the Institute of Information & Communications Technology Planning & Evaluation (IITP) funded by the Korean government (MSIT) (no. 2021-0-01343, Artificial Intelligence Graduate School Program, Seoul National University).
References
- Fields S, Song OK. A novel genetic system to detect protein-protein interactions. Nature 1989;340:245-6. https://doi.org/10.1038/340245a0
- Joung JK, Ramm EI, Pabo CO. A bacterial two-hybrid selection system for studying protein-DNA and protein-protein interactions. Proc Natl Acad Sci USA 2000;97:7382-7. https://doi.org/10.1073/pnas.110149297
- Min S, Lee B, Yoon S. Deep learning in bioinformatics. Brief Bioinform 2017;18:851-69. https://doi.org/10.1093/bib/bbw068
- Pawson T, Nash P. Protein-protein interactions define specificity in signal transduction. Genes Dev 2000;14:1027-47. https://doi.org/10.1101/gad.14.9.1027
- Lemos B, Meiklejohn CD, Hartl DL. Regulatory evolution across the protein interaction network. Nat Genet 2004;36:1059-60. https://doi.org/10.1038/ng1427
- Gavin AC, Bosche M, Krause R, Grandi P, Marzioch M, Bauer A, et al. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 2002;415:141-7. https://doi.org/10.1038/415141a
- Munoz Descalzo S, Rue P, Faunes F, Hayward P, Jakt LM, Balayo T, et al. A competitive protein interaction network buffers Oct4-mediated differentiation to promote pluripotency in embryonic stem cells. Mol Syst Biol 2013;9:694. https://doi.org/10.1038/msb.2013.49
- Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021;596:583-9. https://doi.org/10.1038/s41586-021-03819-2
- Senior AW, Evans R, Jumper J, Kirkpatrick J, Sifre L, Green T, et al. Improved protein structure prediction using potentials from deep learning. Nature 2020;577:706-10. https://doi.org/10.1038/s41586-019-1923-7
- Mardt A, Pasquali L, Wu H, Noe F. VAMPnets for deep learning of molecular kinetics. Nat Commun 2018;9:5. https://doi.org/10.1038/s41467-017-02388-1
- Luscombe NM, Austin SE, Berman HM, Thornton JM. An overview of the structures of protein-DNA complexes. Genome Biol 2000;1:REVIEWS001.
- Wang J, Zhuang J, Iyer S, Lin X, Whitfield TW, Greven MC, et al. Sequence features and chromatin structure around the genomic regions bound by 119 human transcription factors. Genome Res 2012;22:1798-812. https://doi.org/10.1101/gr.139105.112
- Park PJ. ChIP-seq: advantages and challenges of a maturing technology. Nat Rev Genet 2009;10:669-80. https://doi.org/10.1038/nrg2641
- Riley TR, Slattery M, Abe N, Rastogi C, Liu D, Mann RS, et al. SELEX-seq: a method for characterizing the complete repertoire of binding site preferences for transcription factor complexes. In: Graba Y, Rezsohazy R, editors. Hox genes. New York: Humana Press, 2014:255-78.
- Bailey TL, Williams N, Misleh C, Li WW. MEME: discovering and analyzing DNA and protein sequence motifs. Nucleic Acids Res 2006;34:W369-73. https://doi.org/10.1093/nar/gkl198
- Heinz S, Benner C, Spann N, Bertolino E, Lin YC, Laslo P, et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell 2010;38:576-89. https://doi.org/10.1016/j.molcel.2010.05.004
- Alipanahi B, Delong A, Weirauch MT, Frey BJ. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat Biotechnol 2015;33:831-8. https://doi.org/10.1038/nbt.3300
- LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proc IEEE 1998;86:2278-324. https://doi.org/10.1109/5.726791
- Hassanzadeh HR, Wang MD. DeeperBind: enhancing prediction of sequence specificities of DNA binding proteins. Proceedings (IEEE Int Conf Bioinformatics Biomed) 2016;2016:178-83.
- Lipton ZC, Berkowitz J, Elkan C. A critical review of recurrent neural networks for sequence learning. arXiv:1611.05777 [Preprint]. 2016 [cited 2021 Sep 2]. Available from: https://doi.org/10.48550/arXiv.1611.05777.
- Quang D, Xie X. DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences. Nucleic Acids Res 2016;44:e107. https://doi.org/10.1093/nar/gkw226
- Ruder S. An overview of multi-task learning in deep neural networks. arXiv:1706.05098 [Preprint]. 2017 [cited 2021 Sep 3]. Available from: https://doi.org/10.48550/arXiv.1706.05098.
- Avsec Z, Weilert M, Shrikumar A, Krueger S, Alexandari A, Dalal K, et al. Base-resolution models of transcription-factor binding reveal soft motif syntax. Nat Genet 2021;53:354-66. https://doi.org/10.1038/s41588-021-00782-6
- Shrikumar A, Tian K, Avsec Z, Shcherbina A, Banerjee A, Sharmin M, et al. Technical note on transcription factor motif discovery from importance scores (TF-MoDISco) version 0.5.6.5. arXiv:1811.00416 [Preprint]. 2018 [cited 2021 Sep 5]. Available from: https://doi.org/10.48550/arXiv.1811.00416.
- He Q, Johnston J, Zeitlinger J. ChIP-nexus enables improved detection of in vivo transcription factor binding footprints. Nat Biotechnol 2015;33:395-401. https://doi.org/10.1038/nbt.3121
- Kelley DR, Reshef YA, Bileschi M, Belanger D, McLean CY, Snoek J. Sequential regulatory activity prediction across chromosomes with convolutional neural networks. Genome Res 2018;28:739-50. https://doi.org/10.1101/gr.227819.117
- Shiraki T, Kondo S, Katayama S, Waki K, Kasukawa T, Kawaji H, et al. Cap analysis gene expression for high-throughput analysis of transcriptional starting point and identification of promoter usage. Proc Natl Acad Sci USA 2003;100:15776-81. https://doi.org/10.1073/pnas.2136655100
- Zhou J, Theesfeld CL, Yao K, Chen KM, Wong AK, Troyanskaya OG. Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk. Nat Genet 2018;50:1171-9. https://doi.org/10.1038/s41588-018-0160-6
- Kelley DR. Cross-species regulatory sequence activity prediction. PLoS Comput Biol 2020;16:e1008050. https://doi.org/10.1371/journal.pcbi.1008050
- Agarwal V, Shendure J. Predicting mRNA abundance directly from genomic sequence using deep convolutional neural networks. Cell Rep 2020;31:107663. https://doi.org/10.1016/j.celrep.2020.107663
- Avsec Z, Agarwal V, Visentin D, Ledsam JR, Grabska-Barwinska A, Taylor KR, et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nat Methods 2021;18:1196-203. https://doi.org/10.1038/s41592-021-01252-x
- Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. In: 31st Conference on Neural Information Processing Systems (NIPS 2017); Long Beach (CA), USA.
- Mount SM. A catalogue of splice junction sequences. Nucleic Acids Res 1982;10:459-72. https://doi.org/10.1093/nar/10.2.459
- Reed R, Maniatis T. The role of the mammalian branchpoint sequence in pre-mRNA splicing. Genes Dev 1988;2:1268-76. https://doi.org/10.1101/gad.2.10.1268
- Jaganathan K, Panagiotopoulou SK, McRae JF, Darbandi SF, Knowles D, Li YI, et al. Predicting splicing from primary sequence with deep learning. Cell 2019;176:535-48. https://doi.org/10.1016/j.cell.2018.12.015
- Zhou J, Troyanskaya OG. Predicting effects of noncoding variants with deep learning-based sequence model. Nat Methods 2015;12:931-4. https://doi.org/10.1038/nmeth.3547
- ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 2012;489:57-74. https://doi.org/10.1038/nature11247
- Kundaje A, Meuleman W, Ernst J, Bilenky M, Yen A, Heravi-Moussavi A, et al. Integrative analysis of 111 reference human epigenomes. Nature 2015;518:317-30. https://doi.org/10.1038/nature14248
- Kelley DR, Snoek J, Rinn JL. Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. Genome Res 2016;26:990-9. https://doi.org/10.1101/gr.200535.115
- Angermueller C, Lee HJ, Reik W, Stegle O. DeepCpG: accurate prediction of single-cell DNA methylation states using deep learning. Genome Biol 2017;18:67. https://doi.org/10.1186/s13059-017-1189-z
- Tian Q, Zou J, Tang J, Fang Y, Yu Z, Fan S. MRCNN: a deep learning model for regression of genome-wide DNA methylation. BMC Genomics 2019;20:1-10. https://doi.org/10.1186/s12864-018-5379-1
- Schreiber J, Durham T, Bilmes J, Noble WS. Avocado: a multi-scale deep tensor factorization method learns a latent representation of the human epigenome. Genome Biol 2020;21:81. https://doi.org/10.1186/s13059-020-01977-6
- Chaudhary K, Poirion OB, Lu L, Garmire LX. Deep learning-based multi-omics integration robustly predicts survival in liver cancer. Clin Cancer Res 2018;24:1248-59. https://doi.org/10.1158/1078-0432.CCR-17-0853
- Sharifi-Noghabi H, Zolotareva O, Collins CC, Ester M. MOLI: multi-omics late integration with deep neural networks for drug response prediction. Bioinformatics 2019;35:i501-9. https://doi.org/10.1093/bioinformatics/btz318
- Springenberg JT, Dosovitskiy A, Brox T, Riedmiller M. Striving for simplicity: the all convolutional net. arXiv:1412.6806 [Preprint]. 2014 [cited 2021 Sep 10]. Available from: https://doi.org/10.48550/arXiv.1412.6806.
- Shrikumar A, Greenside P, Kundaje A. Learning important features through propagating activation differences. In: Proceedings of the 34th International Conference on Machine Learning; Sydney, Australia. PMLR 2017;70:3145-53.
- Sundararajan M, Taly A, Yan Q. Axiomatic attribution for deep networks. In: Proceedings of the 34th International Conference on Machine Learning; Sydney, Australia. PMLR 2017;70:3319-28.
- Stark C, Breitkreutz BJ, Reguly T, Boucher L, Breitkreutz A, Tyers M. BioGRID: a general repository for interaction datasets. Nucleic Acids Res 2006;34(Database issue):D535-9. https://doi.org/10.1093/nar/gkj109
- Szklarczyk D, Gable AL, Nastou KC, Lyon D, Kirsch R, Pyysalo S, et al. The STRING database in 2021: customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Res 2021;49(D1):D605-12. https://doi.org/10.1093/nar/gkaa1074
- Hwang S, Kim CY, Yang S, Kim E, Hart T, Marcotte EM, et al. HumanNet v2: human gene networks for disease research. Nucleic Acids Res 2019;47:D573-80. https://doi.org/10.1093/nar/gky1126
- Fabregat A, Jupe S, Matthews L, Sidiropoulos K, Gillespie M, Garapati P, et al. The Reactome pathway knowledgebase. Nucleic Acids Res 2020;48:D498-503.
- Scarselli F, Gori M, Tsoi AC, Hagenbuchner M, Monfardini G. The graph neural network model. IEEE Trans Neural Netw 2008;20:61-80. https://doi.org/10.1109/TNN.2008.2005605
- Rhee S, Seo S, Kim S. Hybrid approach of relation network and localized graph convolutional filtering for breast cancer subtype classification. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI-18); Stockholm (Sweden); 2018 Jul 13-19. IJCAI 2018;3527-34.
- Kanehisa M, Goto S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res 2000;28:27-30. https://doi.org/10.1093/nar/28.1.27
- Lee S, Lim S, Lee T, Sung I, Kim S. Cancer subtype classification and modeling by pathway attention and propagation. Bioinformatics 2020;36:3818-24. https://doi.org/10.1093/bioinformatics/btaa203
- Ma T, Zhang A. Multi-view factorization autoencoder with network constraints for multi-omic integrative analysis. In: 2018 IEEE International Conference on Bioinformatics and Biomedicine; 2018 Dec 3-6. IEEE BIBM 2018;702-7.
- Ando RK, Zhang T. Learning on graph with Laplacian regularization. In: Scholkopf B, Platt J, Hofmann T, et al. Advances in Neural Information Processing Systems 19: Proceedings of the 2006 Conference. 2007;19:25.
- Fortelny N, Bock C. Knowledge-primed neural networks enable biologically interpretable deep learning on single-cell sequencing data. Genome Biol 2020;21:190. https://doi.org/10.1186/s13059-020-02100-5
- Gene Ontology Consortium. The gene ontology resource: 20 years and still GOing strong. Nucleic Acids Res 2019;47:D330-8. https://doi.org/10.1093/nar/gky1055
- Lipscomb CE. Medical subject headings (MeSH). Bull Med Libr Assoc 2000;88:265-6.
- Ma J, Yu MK, Fong S, Ono K, Sage E, Demchak B, et al. Using deep learning to model the hierarchical structure and function of a cell. Nat Methods 2018;15:290-8. https://doi.org/10.1038/nmeth.4627
- Kuenzi BM, Park J, Fong SH, Sanchez KS, Lee J, Kreisberg JF, et al. Predicting drug response and synergy using a deep learning model of human cancer cells. Cancer Cell 2020;38:672-84. https://doi.org/10.1016/j.ccell.2020.09.014
- Khan A, Fornes O, Stigliani A, Gheorghe M, Castro-Mondragon JA, Van Der Lee R, et al. JASPAR 2018: update of the open-access database of transcription factor binding profiles and its web framework. Nucleic Acids Res 2018;46:D260-6. https://doi.org/10.1093/nar/gkx1126
- Matys V, Kel-Margoulis OV, Fricke E, Liebich I, Land S, Barre-Dirrie A, et al. TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes. Nucleic Acids Res 2006;34:D108-10. https://doi.org/10.1093/nar/gkj143
- Kolmykov S, Yevshin I, Kulyashov M, Sharipov R, Kondrakhin Y, Makeev VJ, et al. GTRD: an integrated view of transcription regulation. Nucleic Acids Res 2021;49:D104-11. https://doi.org/10.1093/nar/gkaa1057
- Ploenzke MS, Irizarry RA. Interpretable convolution methods for learning genomic sequence motifs. bioRxiv [Preprint]. 2018 [cited 2021 Sep 3]. Available from: https://doi.org/10.1101/411934.
- Kang M, Lee S, Lee D, Kim S. Learning cell-type-specific gene regulation mechanisms by multi-attention-based deep learning with regulatory latent space. Front Genet 2020;11:869. https://doi.org/10.3389/fgene.2020.00869
- Lanchantin J, Qi Y. Graph convolutional networks for epigenetic state prediction using both sequence and 3D genome data. Bioinformatics 2020;36(Suppl_2):i659-67. https://doi.org/10.1093/bioinformatics/btaa793